Compare commits

725 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
7414b30308 Merge pull request #4116 from pipecat-ai/changelog-0.0.107
Release 0.0.107 - Changelog Update
2026-03-23 20:13:49 -07:00
aconchillo
3268cb93d5 Update changelog for version 0.0.107 2026-03-23 20:11:31 -07:00
Aleix Conchillo Flaque
9211379720 update uv.lock 2026-03-23 20:06:28 -07:00
Filipi da Silva Fuchter
12dc429761 Merge pull request #4104 from pipecat-ai/filipi/audio_issue
Allow defining whether to insert silence in the output transport.
2026-03-23 17:17:37 -04:00
filipi87
066b206b3d Renaming insert_silence to auto_silence 2026-03-23 18:12:26 -03:00
filipi87
ddd1b71b56 Renaming audio_out_insert_silence to audio_out_auto_silence 2026-03-23 17:57:42 -03:00
filipi87
8612c9f50a Updating to use daily-python 0.27.0 2026-03-23 17:52:41 -03:00
Mark Backman
fd0bfe141f Merge pull request #4109 from pipecat-ai/pk/tiny-fix 2026-03-23 15:17:19 -04:00
filipi87
3042929989 Fixing changelog description. 2026-03-23 15:57:25 -03:00
Mark Backman
f283cc5bc6 Merge pull request #4091 from pipecat-ai/mb/gradium-multiplexing-setup
feat: send per-context setup in Gradium TTS multiplexing
2026-03-23 12:00:53 -04:00
Mark Backman
70552d7697 Add changelog entry for #4091 2026-03-23 11:58:14 -04:00
Mark Backman
84c2a24c9f feat: send per-context setup messages in Gradium TTS multiplexing
Send a setup message with client_req_id before the first text message
for each context, matching Gradium multiplexing protocol. This allows
Gradium to associate each session with its setup configuration when
using close_ws_on_eos=False.
2026-03-23 11:58:14 -04:00
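The per-context setup rule above can be sketched as a small outbox that emits a setup message the first time a context id is seen. This is a hypothetical illustration: the class name and the message fields (`type`, `text`, and the setup payload keys) are assumptions, not Gradium's documented schema or Pipecat's implementation. Only `client_req_id` comes from the commit message.

```python
import json
from typing import Dict, List, Set


class MultiplexedTTSSession:
    """Hypothetical outbox that sends a setup message before a context's first text."""

    def __init__(self, setup_config: Dict[str, str]):
        self._setup_config = setup_config
        self._contexts_with_setup: Set[str] = set()
        self.outbox: List[str] = []

    def send_text(self, context_id: str, text: str) -> None:
        # The first text for a context is preceded by its setup message, so the
        # server can associate that session with its configuration.
        if context_id not in self._contexts_with_setup:
            setup = {"type": "setup", "client_req_id": context_id, **self._setup_config}
            self.outbox.append(json.dumps(setup))
            self._contexts_with_setup.add(context_id)
        # Message shape is an assumption for illustration only.
        message = {"type": "text", "client_req_id": context_id, "text": text}
        self.outbox.append(json.dumps(message))


session = MultiplexedTTSSession({"voice_id": "example-voice"})
session.send_text("ctx-1", "Hello")
session.send_text("ctx-1", "world")
session.send_text("ctx-2", "Hi")
```

Tracking contexts in a set keeps the check O(1) per message, and later texts on an existing context go out without repeating the setup.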
Paul Kompfner
e93b0ace06 Remove an unnecessary check in SyncParallelPipeline 2026-03-23 10:00:32 -04:00
filipi87
e6602f9244 Disabling auto_silence for tavus video service. 2026-03-22 18:28:57 -03:00
filipi87
9a30b18f21 Configuring Daily CustomAudioSource to automatically inject silence or not. 2026-03-22 17:29:01 -03:00
filipi87
936a39f4a1 Updating tavus examples to not send silence. 2026-03-22 14:41:23 -03:00
filipi87
3b1cb30926 Adding changelog entry. 2026-03-22 13:26:00 -03:00
filipi87
ce36487143 Allow defining whether to insert silence in the output transport. 2026-03-22 13:09:09 -03:00
Mark Backman
ec3bd8c5b1 Merge pull request #4097 from pipecat-ai/mb/update-minimax-docs-link
Update MiniMaxHttpTTSService platform docs link
2026-03-21 07:08:40 -04:00
Mark Backman
622ebd5d74 Update MiniMaxHttpTTSService platform docs link 2026-03-21 07:02:06 -04:00
Mark Backman
a9a1941a45 Merge pull request #4093 from poislagarde/fix/genesys-pong-parameters 2026-03-20 19:52:58 -04:00
Pablo Ois Lagarde
53e0136366 chore: rename changelog fragment to PR #4093
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 16:46:35 -03:00
Pablo Ois Lagarde
bc0e7130b8 fix: always include parameters field in Genesys AudioHook messages
The AudioHook protocol requires every message to carry a `parameters`
object. `_create_message` conditionally included it only when parameters
were truthy, so pong responses and closed responses without
outputVariables were sent without the field.

Clients that validate message structure (including the Genesys reference
implementation) rejected these messages, which broke server sequence
tracking and prevented outputVariables from reaching the Architect flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 16:37:53 -03:00
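The fix amounts to defaulting `parameters` to an empty object instead of omitting the key when it is falsy. Below is a minimal sketch of such a builder, assuming AudioHook-style envelope fields (`version`, `seq`, `clientseq`, `id`). The helper name and those field names are illustrative assumptions, not a copy of Pipecat's `_create_message`.

```python
from typing import Any, Dict, Optional


def create_message(
    msg_type: str,
    seq: int,
    client_seq: int,
    session_id: str,
    parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an AudioHook-style server message (hypothetical helper)."""
    return {
        "version": "2",          # assumed envelope field
        "type": msg_type,
        "seq": seq,              # assumed envelope field
        "clientseq": client_seq, # assumed envelope field
        "id": session_id,        # assumed envelope field
        # Always present, even for pong or closed messages with no payload.
        "parameters": parameters if parameters is not None else {},
    }
```

With this shape, `create_message("pong", 3, 5, "abc")` carries `"parameters": {}` instead of dropping the key, which is what strict validators expect.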
Mark Backman
d8af4447ff Merge pull request #3449 from kingster/telemetry-fix-system-message
fix: Record correct system_instruction in LLM spans for LLM services
2026-03-20 13:42:47 -04:00
Mark Backman
c89e366739 refactor: align tracing attributes with OpenTelemetry GenAI conventions
- gen_ai.system -> gen_ai.provider.name (deprecated)
- system / system_instructions -> gen_ai.system_instructions
- gen_ai.usage.cache_read_input_tokens -> gen_ai.usage.cache_read.input_tokens
- gen_ai.usage.cache_creation_input_tokens -> gen_ai.usage.cache_creation.input_tokens
2026-03-20 13:36:20 -04:00
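The renames listed above can be written as a lookup table. The hypothetical helper below applies it to a dict of span attributes, for example when migrating saved trace queries or dashboards. It is a sketch based only on the mapping in this commit message and is not part of Pipecat.

```python
from typing import Any, Dict

# Old attribute name -> OpenTelemetry GenAI name, per the commit message above.
ATTRIBUTE_RENAMES: Dict[str, str] = {
    "gen_ai.system": "gen_ai.provider.name",
    "system": "gen_ai.system_instructions",
    "system_instructions": "gen_ai.system_instructions",
    "gen_ai.usage.cache_read_input_tokens": "gen_ai.usage.cache_read.input_tokens",
    "gen_ai.usage.cache_creation_input_tokens": "gen_ai.usage.cache_creation.input_tokens",
}


def migrate_span_attributes(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `attributes` with legacy keys renamed (hypothetical helper)."""
    # Keys without a mapping are kept unchanged.
    return {ATTRIBUTE_RENAMES.get(key, key): value for key, value in attributes.items()}
```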
Mark Backman
b5c362d6e6 refactor: rename tracing span attribute "system" to "system_instructions"
Align with the OpenTelemetry GenAI semantic convention
gen_ai.system_instructions for system prompts. The old "system"
attribute name was unrelated to gen_ai.system (which is for
provider name).
2026-03-20 13:20:03 -04:00
Mark Backman
e5aaa4c4eb fix: read system_instruction from _settings instead of removed attribute
Replace adapter-based extraction in traced_llm with direct reads from
_settings.system_instruction (priority) and context messages (fallback).
The old approach had three bugs: signature mismatch with Anthropic
adapter, key name inconsistency, and unnecessary overhead from full
message/tools conversion.

Also deduplicate the system instruction in spans -- it was appearing as
both "system" and "param.system_instruction".
2026-03-20 13:12:40 -04:00
Varun Singh
a12ad27348 enable_dialout should not depend on sip_caller_phone being set (#4087)
* enable_dialout should not depend on SIP being set

* we still need room_prefix to have pipecat-sip, s/sip/telephony in room prefix
2026-03-20 10:01:31 -07:00
Mark Backman
44504efdc7 Merge pull request #4090 from pipecat-ai/mb/fix-tts-audio-context-routing
fix: route TTS audio through audio context queue in Fish, LMNT, Neuph…
2026-03-20 11:06:53 -04:00
Mark Backman
da8070e98e Add changelog entry for #4090 2026-03-20 10:46:36 -04:00
Mark Backman
b98ad7fb64 fix: route TTS audio through audio context queue in Fish, LMNT, Neuphonic, Rime NonJson
These services were pushing audio frames directly via push_frame() in their
WebSocket receive loops, bypassing the base TTSService audio context
serialization queue. This causes incorrect frame ordering and broken
interruption handling.

Changes per service:
- Fish Audio: use append_to_audio_context(), replace _handle_interruption
  with on_audio_context_interrupted()
- LMNT: use append_to_audio_context(), remove redundant push_frame override
- Neuphonic: use append_to_audio_context(), remove redundant push_frame and
  process_frame overrides (base class handles pause/resume)
- Rime NonJson: use append_to_audio_context(), remove redundant push_frame
  override
2026-03-20 10:41:43 -04:00
Mark Backman
10ddf45015 Merge pull request #4088 from pipecat-ai/mb/add-community-integrations-README
Add community integrations to README
2026-03-20 10:34:09 -04:00
Filipi da Silva Fuchter
e41cb2cd0c Merge pull request #4083 from pipecat-ai/filipi/deepgram_sagemaker_tts_improvements
Improvements to DeepgramSageMakerTTSService
2026-03-20 10:30:48 -04:00
Filipi da Silva Fuchter
a69abcc67a Merge pull request #4082 from pipecat-ai/filipi/sarvam_tts_improvements
Improvements to SarvamTTSService.
2026-03-20 10:28:02 -04:00
Mark Backman
a11c48d5b0 Add community integrations to README 2026-03-20 10:09:58 -04:00
Kinshuk Bairagi
7caec9018b Merge branch 'pipecat-ai:main' into telemetry-fix-system-message 2026-03-20 18:36:31 +05:30
kompfner
08052d8880 Merge pull request #4085 from pipecat-ai/pk/remove-broken-05a-example
Remove 05a example, which was broken and isn't currently a priority t…
2026-03-19 15:59:39 -04:00
Paul Kompfner
4c456ada04 Remove 05a example, which was broken and isn't currently a priority to fix 2026-03-19 15:52:48 -04:00
kompfner
488dc1d07e Merge pull request #4074 from pipecat-ai/pk/openai-responses-llm-service
feat: add OpenAI Responses API LLM service
2026-03-19 15:44:26 -04:00
Paul Kompfner
dafbb2eb66 fix: typo "conversatione" → "conversation" in 20- examples 2026-03-19 15:38:38 -04:00
Paul Kompfner
ea1534f9f8 docs: note input_audio coming soon, no conversion needed
The LLMContext format already matches the expected Responses API
shape for input_audio, so no adapter conversion will be needed
once OpenAI enables support.
2026-03-19 15:36:23 -04:00
kompfner
f6e7599e49 Merge pull request #4029 from pipecat-ai/pk/sync-parallel-pipeline-fixes
`SyncParallelPipeline` and related fixes
2026-03-19 14:41:16 -04:00
Paul Kompfner
6424c36666 refactor: remove model init param from OpenAIResponsesLLMService
Model is only configurable via settings, matching the canonical API.
2026-03-19 14:38:01 -04:00
Paul Kompfner
05e344b9ec docs: port _closing comments from BaseOpenAILLMService 2026-03-19 14:30:34 -04:00
Paul Kompfner
4ec7be8850 feat: include cached_tokens and reasoning_tokens in usage metrics 2026-03-19 14:23:39 -04:00
Paul Kompfner
0533ea7b7f refactor: use direct attribute access for typed stream events
Replace getattr() calls with direct attribute access and isinstance()
checks on the strongly-typed OpenAI SDK event models.
2026-03-19 14:19:10 -04:00
Paul Kompfner
a3431d3b01 fix: prefer _full_model_name over _settings.model in tracing
The API-provided full model name is more specific than the
user-provided model name (e.g. includes version/snapshot details).
Reorder the lookup in _get_model_name and add a comment where the
Responses service sets the field.
2026-03-19 13:58:35 -04:00
Paul Kompfner
348df9d4ce fix: remove redundant instructions override in run_inference
The override would re-add `instructions` after the adapter had
intentionally converted it to a developer message for empty contexts.
Added a regression test.
2026-03-19 13:34:41 -04:00
Filipi da Silva Fuchter
a9256ebc35 Merge pull request #4075 from pipecat-ai/filipi/tts_frame_order
Fixing TTS frame order
2026-03-19 13:30:28 -04:00
filipi87
a0f311158d Changelog entry for the DeepgramSageMakerTTSService improvements. 2026-03-19 11:46:49 -03:00
filipi87
d3ca034c4f Routing the audio through the audio context queue. 2026-03-19 11:40:43 -03:00
filipi87
39425a675a Improvements to DeepgramSageMakerTTSService. 2026-03-19 11:32:56 -03:00
filipi87
c4d1b89049 Adding changelog entry for the Sarvam fixes. 2026-03-19 11:17:39 -03:00
filipi87
fd8c6c88bb Improvements to SarvamTTSService. 2026-03-19 11:13:17 -03:00
Paul Kompfner
57fd29f0c4 Remove changelog fragment that no longer applies after a rebase 2026-03-19 09:57:26 -04:00
Paul Kompfner
06f7da44f1 Clarify SyncParallelPipeline docstrings
Rewrite docstrings to more clearly explain what SyncParallelPipeline
does: hold all output until every parallel branch finishes, so frames
produced in response to a single input are released together.
2026-03-19 09:43:51 -04:00
Paul Kompfner
d702ebd6a2 Add frame_order parameter to SyncParallelPipeline
Adds a FrameOrder enum with ARRIVAL (default, existing behavior) and
PIPELINE (pushes frames in pipeline definition order). This lets callers
guarantee output ordering between parallel pipelines — e.g. ensuring
image frames precede audio frames — without needing a separate reordering
processor downstream.

Updates the 05-sync-speech-and-image example to use FrameOrder.PIPELINE,
removing the ImageBeforeAudioReorderer class entirely.
2026-03-19 09:43:51 -04:00
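The difference between the two `FrameOrder` modes can be sketched as the ordering applied to frames held back until every branch finishes. This is a simplified model, not SyncParallelPipeline's internals. The enum member names come from the commit message, while their values, the `release_order` function, and the `(branch_index, frame)` representation are assumptions.

```python
from enum import Enum
from typing import List, Tuple


class FrameOrder(Enum):
    ARRIVAL = "arrival"    # release frames in the order they were produced (assumed value)
    PIPELINE = "pipeline"  # release frames grouped by branch definition order (assumed value)


def release_order(buffered: List[Tuple[int, str]], order: FrameOrder) -> List[str]:
    """Order buffered (branch_index, frame) pairs before pushing them downstream.

    Hypothetical sketch: branch_index is the producing branch's position in
    the pipeline definition, and `buffered` is in arrival order.
    """
    if order is FrameOrder.PIPELINE:
        # sorted() is stable, so frames from one branch keep their arrival order.
        buffered = sorted(buffered, key=lambda item: item[0])
    return [frame for _, frame in buffered]
```

For example, if branch 0 produces an image and branch 1 produces audio that happens to arrive first, `PIPELINE` still releases the image ahead of the audio. That is the guarantee the 05-sync-speech-and-image example relies on.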
Paul Kompfner
26fc238eb7 Add changelog entry for Whisker debugger fix 2026-03-19 09:43:51 -04:00
Paul Kompfner
61ff53f2b9 Add changelog entries for PR #4029 2026-03-19 09:43:51 -04:00
Paul Kompfner
5e7639812a Add ImageBeforeAudioReorderer to sync-speech-and-image example
Add a processor after SyncParallelPipeline that ensures each image frame
precedes its corresponding TTS audio frames. SyncParallelPipeline batches
them together but doesn't guarantee branch ordering. The reorderer detects
when TTS frames arrive before their image (via context_id tracking) and
holds them until the image arrives.

Also rename ImageAudioSync to MarkImageForPlaybackSync for clarity.
2026-03-19 09:43:51 -04:00
Paul Kompfner
ba779f920f Revert a couple of logs that were changed from trace to debug just for debugging 2026-03-19 09:43:51 -04:00
Paul Kompfner
c3d6e965d8 Use TextAggregationMode.TOKEN in the 05-sync-speech-and-image
example since the SentenceAggregator already provides complete sentences.
2026-03-19 09:43:37 -04:00
Paul Kompfner
0f1ff16af1 Add sync_with_audio support for OutputImageRawFrame
Add a `sync_with_audio` field to `OutputImageRawFrame` that routes image
frames through the audio queue in the output transport, ensuring images
are only displayed after all preceding audio has been sent. This enables
proper audio/image synchronization in pipelines like the calendar month
narration example.

Update the 05-sync-speech-and-image example to use an `ImageAudioSync`
processor that sets this flag on image frames.
2026-03-19 09:41:21 -04:00
Paul Kompfner
1ede8460a2 Fix SyncParallelPipeline race condition with concurrent SystemFrame processing
The FrameProcessor two-queue architecture processes SystemFrames and
non-SystemFrames on separate concurrent async tasks. Both paths called
SyncParallelPipeline.process_frame(), which used the same per-pipeline
sink queues. A SystemFrame's wait_for_sync could steal frames from a
concurrent non-SystemFrame's wait_for_sync, corrupting synchronization
and stalling the pipeline.

This was triggered by the auto-embedded RTVI processor (added in
v0.0.101) which floods OutputTransportMessageUrgentFrame SystemFrames
through the pipeline during LLM responses.

Fix: SystemFrames (except EndFrame) now take a fast path — passed
through internal pipelines and pushed downstream directly without
touching the sink queues or drain logic. EndFrame retains the full
drain behavior as a lifecycle frame.
2026-03-19 09:41:21 -04:00
Paul Kompfner
463db59bb5 Minor comment typo fix 2026-03-19 09:41:21 -04:00
Paul Kompfner
0be4084683 Fix bug resulting in SyncParallelPipeline breaking the Whisker debugger 2026-03-19 09:41:21 -04:00
filipi87
8f6dfc4777 Mentioning the frame order fix in the changelog. 2026-03-19 10:26:58 -03:00
filipi87
6841c0719b Always appending TTSTextFrame to the audio context. 2026-03-19 10:12:01 -03:00
filipi87
2836b1ea7e Fixing the frame ordering of the AggregatedTextFrame. 2026-03-19 10:07:25 -03:00
filipi87
5fd98e1391 Fixing TTS frame order. 2026-03-19 09:43:40 -03:00
Mark Backman
ef419cd87a Merge pull request #4073 from joachimchauvet/fix/livekit-mixer-invalidstate-log-spam
Suppress InvalidState log spam from audio mixer during interruptions in LiveKit transport
2026-03-19 08:39:42 -04:00
Aleix Conchillo Flaqué
8750c26cdc Merge pull request #4080 from pipecat-ai/changelog-0.0.106
Release 0.0.106 - Changelog Update
2026-03-18 23:39:22 -07:00
aconchillo
3e0c536fe7 Update changelog for version 0.0.106 2026-03-18 23:36:18 -07:00
Aleix Conchillo Flaqué
7ee5fa9e20 Merge pull request #4079 from pipecat-ai/aleix/fix-tavus-dtmf-callback
Add missing on_dtmf_event callback to Tavus transport
2026-03-18 21:47:28 -07:00
Aleix Conchillo Flaqué
7dfcaf8096 Add missing on_dtmf_event callback to Tavus transport
The on_dtmf_event callback was added to DailyCallbacks in #4047 but
the Tavus transport was not updated, causing a missing argument error.
2026-03-18 21:46:06 -07:00
Filipi da Silva Fuchter
4aea7784c9 Fixed the ordering of _maybe_pause_frame_processing call in TTSService (#4071)
* Fixing the invocation of pause_frame_processing at the correct time when receiving LLMFullResponseEndFrame and EndFrame.
2026-03-18 16:55:59 -04:00
Mark Backman
bad10177d4 Add WakePhraseUserTurnStartStrategy (#4064)
- Add WakePhraseUserTurnStartStrategy for gating interaction behind wake
  phrase detection, with timeout and single_activation modes
- Add default_user_turn_start_strategies() and
  default_user_turn_stop_strategies() helper functions
- Deprecate WakeCheckFilter in favor of the new strategy
- Extend ProcessFrameResult to stop strategies for short-circuit evaluation
- Fix MinWordsUserTurnStartStrategy including filtered text in output
2026-03-18 16:47:17 -04:00
Mark Backman
c4be513044 Improvements for Nova Sonic LLM and TTS output frames (#4042)
* Fix empty user transcription causing spurious interruption in Nova Sonic

Skip _report_user_transcription_ended() when _user_text_buffer is empty,
which happens when the initial prompt is text-only. Previously, an empty
TranscriptionFrame was pushed upstream, triggering a chain reaction:
on_user_turn_stopped → UserStartedSpeakingFrame → interruption →
premature BotStoppedSpeaking → multiple response start/stop cycles.

* Improve TextFrame and assistant end of turn logic

Now, SPECULATIVE text results are used to push the LLMTextFrame,
AggregatedTextFrame, and TTSTextFrame. Additionally, the TTSTextFrames
are pushed at the end of the corresponding audio segment.

* Remove BotStoppedSpeakingFrame fallback from Nova Sonic

Now that assistant response end is detected directly from Nova Sonic
contentEnd events (END_TURN and INTERRUPTED), the BotStoppedSpeakingFrame
handler is no longer needed. Inline the cleanup logic in reset_conversation.
2026-03-18 16:04:12 -04:00
Mark Backman
4b704e6d3a GradiumSTTService improvements (#4066)
* Remove duplicate reconnection logic from Gradium STT

The _receive_messages method had its own while-True reconnect loop,
duplicating the reconnection handling already provided by
WebsocketService._receive_task_handler (exponential backoff, max
retries, error reporting). Flatten to just the inner message loop
and let the base class handle reconnection.

* Align Gradium STT VAD handling with base class patterns

Replace the process_frame override with a _handle_vad_user_stopped_speaking
override, which is the proper hook provided by STTService. Move
start_processing_metrics() into run_stt (matching Gladia's pattern).
Remove unused FrameDirection and VADUserStartedSpeakingFrame imports.

* Add transcript aggregation delay after flushed to capture trailing tokens

The Gradium flushed response can arrive before all text tokens have been
delivered. Instead of finalizing immediately on flushed, start a short
timer (100ms) that allows trailing tokens to accumulate before pushing
the final TranscriptionFrame.

* Add changelog for PR #4066

* Change default encoding to pcm_16000

* Decouple encoding from sample_rate in Gradium STT

The encoding parameter now takes just the base type (pcm, wav, opus)
and the sample rate is derived from the pipeline audio_in_sample_rate,
assembled dynamically via input_format_from_encoding(). This fixes the
mismatch where SAMPLE_RATE=24000 was passed to the base class while
encoding defaulted to pcm_16000.
2026-03-18 15:57:34 -04:00
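The aggregation delay described above can be sketched with asyncio. On `flushed`, a short timer is started instead of finalizing immediately, and any tokens that arrive before it fires are included in the final transcript. The class and method names are hypothetical, and joining tokens with spaces is an assumption. This is not GradiumSTTService's code.

```python
import asyncio
from typing import List, Optional


class FlushAggregator:
    """Hypothetical sketch: delay finalization after 'flushed' to catch trailing tokens."""

    def __init__(self, delay: float = 0.1):  # 100 ms, as in the commit message
        self._delay = delay
        self._tokens: List[str] = []
        self._timer: Optional[asyncio.Task] = None
        self.finals: List[str] = []

    def on_token(self, token: str) -> None:
        self._tokens.append(token)

    def on_flushed(self) -> None:
        # Must be called from a running event loop. Restart a pending timer.
        if self._timer and not self._timer.done():
            self._timer.cancel()
        self._timer = asyncio.create_task(self._finalize_after_delay())

    async def _finalize_after_delay(self) -> None:
        await asyncio.sleep(self._delay)
        if self._tokens:
            # Joining with spaces is an assumption about token formatting.
            self.finals.append(" ".join(self._tokens))
            self._tokens.clear()


async def demo() -> List[str]:
    agg = FlushAggregator(delay=0.05)
    agg.on_token("hello")
    agg.on_flushed()
    agg.on_token("world")  # trailing token arrives after the flushed event
    await asyncio.sleep(0.15)
    return agg.finals
```

Running `asyncio.run(demo())` yields a single final transcript containing both tokens, because the late token lands inside the delay window.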
Paul Kompfner
b1a8588209 feat: add 12- and 14d- image/video examples for OpenAI Responses 2026-03-18 15:39:06 -04:00
Paul Kompfner
5de794e1da feat: add service_tier support to OpenAIResponsesLLMService 2026-03-18 15:29:04 -04:00
Paul Kompfner
891966346c feat: add 55zi update-settings example for OpenAI Responses 2026-03-18 15:17:16 -04:00
Paul Kompfner
2001ab4577 feat: add 20a persistent context example for OpenAI Responses 2026-03-18 15:14:28 -04:00
Paul Kompfner
0449df828c chore: update previous_response_id comment 2026-03-18 15:07:10 -04:00
Paul Kompfner
951bb0c1a7 feat: set store=False and add run_inference tests
Set store=False in Responses API calls since we send full conversation
history as input items and don't use previous_response_id.

Add 5 run_inference tests for OpenAIResponsesLLMService using real
LLMContext and adapter (only HTTP client mocked).
2026-03-18 14:47:12 -04:00
Paul Kompfner
21b1812c71 chore: add note about previous_response_id and empty input handling 2026-03-18 14:26:51 -04:00
Paul Kompfner
c4f21ef76b test: add run_inference tests for OpenAIResponsesLLMService
Uses real LLMContext and adapter (only HTTP client is mocked) to test
basic inference, client exception propagation, system_instruction
override, empty context fallback, and max_tokens override.
2026-03-18 14:17:21 -04:00
Paul Kompfner
a7167ad121 test: add run_inference tests for OpenAIResponsesLLMService
Tests cover basic inference, client exception propagation,
system_instruction override, and max_tokens override.
2026-03-18 14:09:17 -04:00
Paul Kompfner
eaccb96454 docs: add changelog for OpenAI Responses API service 2026-03-18 11:46:49 -04:00
Paul Kompfner
45186cc4ce feat: add OpenAI Responses API LLM service
Add OpenAIResponsesLLMService using the Responses API, with a dedicated
adapter that converts LLMContext messages to Responses API input items
(system→developer, tool_calls→function_call, tool→function_call_output,
multimodal content conversion, and tools schema flattening).

- New adapter: open_ai_responses_adapter.py
- New service: openai/responses/llm.py
- Examples: 07-interruptible and 14-function-calling variants
- 19 unit tests for adapter conversion logic
- Eval entries for both examples
2026-03-18 11:45:23 -04:00
joachimchauvet
0378fb0d91 fix(livekit): suppress InvalidState log spam from audio mixer during interruptions 2026-03-18 16:04:42 +02:00
Mark Backman
53388e0426 Merge pull request #4063 from pipecat-ai/mb/wake-word-start-strategy 2026-03-17 21:05:10 -04:00
Mark Backman
edf16c5533 fix: pass list-type Deepgram settings as lists instead of stringifying
List-valued settings like keyterm, keywords, search, redact, and replace
were being converted to strings before being passed to the SDK connect()
method. The SDK expects lists so its encode_query can produce repeated
query params (keyterm=a&keyterm=b).
2026-03-17 18:24:20 -04:00
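The difference between stringifying a list and passing it as a list is easy to see with Python's standard `urllib.parse.urlencode`, used here as a stand-in for the Deepgram SDK's `encode_query`. The settings values are illustrative.

```python
from urllib.parse import parse_qs, urlencode

settings = {"model": "nova-3", "keyterm": ["pipecat", "Daily"]}

# Stringifying the list produces a single malformed value: "['pipecat', 'Daily']".
stringified = urlencode({k: str(v) if isinstance(v, list) else v for k, v in settings.items()})

# Keeping the list and using doseq=True produces repeated query params.
repeated = urlencode(settings, doseq=True)

print(repeated)  # model=nova-3&keyterm=pipecat&keyterm=Daily
```

Only the repeated form decodes back to two separate `keyterm` values on the server side.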
Mark Backman
d4f69dd333 Merge pull request #4046 from pipecat-ai/mb/fix-4045
Fix SonioxSTTService crash when language_hints contains plain strings…
2026-03-17 16:41:11 -04:00
Mark Backman
a32f558b07 Merge pull request #4026 from pipecat-ai/mb/fix-deepgram-base-url
Fix DeepgramSTTService base_url forcing HTTPS/WSS schemes
2026-03-17 16:39:24 -04:00
Mark Backman
4e99cb39b0 Merge pull request #4056 from pipecat-ai/mb/fix-filter-turns-deprecation
Fix deprecation warning when using filter_incomplete_user_turns
2026-03-17 16:23:43 -04:00
Mark Backman
10b3bff525 Merge pull request #4058 from pipecat-ai/mb/improve-stt-tts-language-code-robustness
fix: resolve raw language strings through Language enum for proper service conversion
2026-03-17 16:20:12 -04:00
Mark Backman
95ee096622 Merge pull request #4057 from pipecat-ai/mb/fix-4053
Fix stale state in user turn stop strategies between turns
2026-03-17 16:19:31 -04:00
Mark Backman
6799995b0a Merge pull request #4062 from pipecat-ai/mb/update-pyasn1-0.6.3
Update uv.lock with pyasn1 v0.6.3
2026-03-17 16:19:13 -04:00
Mark Backman
05abc95b5f Update uv.lock with pyasn1 v0.6.3 2026-03-17 16:10:35 -04:00
Mark Backman
18e654b3f0 docs: add changelog for #4058 2026-03-17 12:01:50 -04:00
Mark Backman
790a23d2e5 fix: resolve raw language strings through Language enum for proper service conversion
Raw strings like "de-DE" passed as the language parameter to TTS/STT services
were bypassing the Language enum resolution logic, causing silent failures
(e.g. ElevenLabs expects "de" not "de-DE"). Now raw strings are first converted
to Language enums so they go through the same resolve_language() path, with a
warning logged for unrecognized strings.
2026-03-17 12:00:28 -04:00
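The resolution path described above can be sketched as follows: convert a raw string to an enum member first, then map it to the service's code, warning on unknown strings. Everything here is a hypothetical stand-in, including the three-member `Language` enum, `SERVICE_LANGUAGE_MAP`, `resolve_service_language`, and the pass-through fallback. It is not Pipecat's actual `Language` enum or `resolve_language()`.

```python
import logging
from enum import Enum
from typing import Dict, Optional, Union

logger = logging.getLogger(__name__)


class Language(str, Enum):
    # Tiny illustrative subset of a language enum.
    DE = "de"
    DE_DE = "de-DE"
    EN_US = "en-US"


# Hypothetical map for a service that expects bare codes such as "de".
SERVICE_LANGUAGE_MAP: Dict[Language, str] = {
    Language.DE: "de",
    Language.DE_DE: "de",
    Language.EN_US: "en",
}


def resolve_service_language(language: Union[Language, str]) -> Optional[str]:
    """Convert an enum member or raw string to the service's language code."""
    if not isinstance(language, Language):
        try:
            # Raw strings like "de-DE" are turned into enum members first...
            language = Language(language)
        except ValueError:
            # Assumed fallback: warn and pass the unrecognized string through.
            logger.warning("Unrecognized language %r; passing it through", language)
            return str(language)
    # ...so they follow the same mapping path as enum inputs.
    return SERVICE_LANGUAGE_MAP.get(language)
```

With this ordering, `"de-DE"` and `Language.DE_DE` both resolve to `"de"`, instead of the raw string reaching the service unconverted.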
Mark Backman
d70df1d8b0 Add changelog for #4057 2026-03-17 11:35:38 -04:00
Mark Backman
5000b040dd Fix stale state in user turn stop strategies between turns
Reset stop strategies at turn start (not just turn stop) so that late
transcriptions arriving between turns do not leave stale _text that
causes premature stops on the next turn. Also cancel pending timeout
tasks in reset() for both SpeechTimeout and TurnAnalyzer strategies.
2026-03-17 11:31:08 -04:00
Mark Backman
248419a7c4 Merge pull request #4050 from pipecat-ai/copilot/update-enable-dialout-to-false
Fix PSTN runner defaulting enable_dialout to True
2026-03-17 11:07:23 -04:00
Mark Backman
024e2ebd4e Fix deprecation warning when using filter_incomplete_user_turns 2026-03-17 10:51:01 -04:00
Mark Backman
091f88e42e feat: add enable_dialout parameter to configure() for dial-out rooms
Expose enable_dialout as a configure() parameter (default False) so
dial-out examples can opt in without needing to build DailyRoomProperties
manually.
2026-03-17 09:03:50 -04:00
Mark Backman
e11b486312 fix: clean up configure() type hints, deduplicate token expiry, and improve comment
Narrow misleading Optional type hints on parameters that never accept
None, extract the duplicated token_exp_duration * 60 * 60 calculation,
remove unnecessary forward-reference quotes on DailyMeetingTokenProperties,
and clarify why enable_dialout is explicitly set to False.
2026-03-17 08:54:07 -04:00
Mark Backman
f54b3c6884 Merge pull request #4048 from julienvantyghem/daily-audio-only-docstring
update enable_recording param documentation
2026-03-17 08:21:50 -04:00
copilot-swe-agent[bot]
7e60320a74 fix: set enable_dialout to False in PSTN runner to prevent room creation failures
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-03-17 04:04:11 +00:00
copilot-swe-agent[bot]
89cb0f089e Initial plan 2026-03-17 04:01:00 +00:00
Julien Vantyghem
e5b4403ed4 update docstring following https://github.com/pipecat-ai/pipecat/pull/3916 2026-03-16 19:54:04 -06:00
Mark Backman
a0595adbdc Merge pull request #4012 from pipecat-ai/mb/deprecate-old-local-smart-turn 2026-03-16 21:09:26 -04:00
Mark Backman
dc1632bbac Merge pull request #4023 from pipecat-ai/mb/update-small-webrtc-prebuilt-2.4.0 2026-03-16 21:09:08 -04:00
Mark Backman
53f49ac094 Merge pull request #4024 from pipecat-ai/mb/fix-lang-enum-stt-tts 2026-03-16 21:08:48 -04:00
Mark Backman
bf02d61418 Merge pull request #4025 from pipecat-ai/mb/fix-example-system-instruction 2026-03-16 21:07:01 -04:00
Mark Backman
154a8d1987 Merge pull request #4035 from pipecat-ai/mb/bump-pyjwt-version 2026-03-16 21:06:31 -04:00
Mark Backman
fa5b757408 Merge pull request #4044 from pipecat-ai/mb/pyopenssl-upgrade 2026-03-16 21:06:09 -04:00
Aleix Conchillo Flaqué
c765bc98d3 Merge pull request #4047 from pipecat-ai/aleix/daily-python-0.25.0-dtmf-events
Update daily-python to 0.25.0 and add DTMF input events
2026-03-16 18:05:10 -07:00
Aleix Conchillo Flaqué
59486d5abf Add changelog entries for PR #4047 2026-03-16 17:58:12 -07:00
Aleix Conchillo Flaqué
5cb6aecc9f Add DTMF input event support to Daily transport
Handle Daily's on_dtmf_event callback, convert it to an
InputDTMFFrame pushed into the input transport. Also add __str__
methods to InputDTMFFrame and OutputDTMFFrame for better logging.
2026-03-16 17:57:39 -07:00
Aleix Conchillo Flaqué
5c685c35d7 pyproject: update daily-python to 0.25.0 2026-03-16 17:41:44 -07:00
Aleix Conchillo Flaqué
1a1d5e6a84 Merge pull request #4006 from pipecat-ai/aleix/task-frame-flush-ordering
handle EndTaskFrame, StopTaskFrame and CancelTaskFrame downstream
2026-03-16 17:35:11 -07:00
Mark Backman
abb8bae6f7 Add changelog for #4046 2026-03-16 19:51:37 -04:00
Mark Backman
2801439e48 Fix OpenAI STT crash when language is a plain string instead of Language enum 2026-03-16 19:48:49 -04:00
Mark Backman
3b8d040e41 Fix SonioxSTTService crash when language_hints contains plain strings (#4045)
Refactor language_to_soniox_language to use resolve_language + LANGUAGE_MAP
pattern consistent with other services. Fix resolve_language fallback to use
str(language) instead of language.value so plain strings don't crash.
2026-03-16 19:45:03 -04:00
Mark Backman
538b9fa2d9 Bump pyopenssl in uv.lock to 26.0.0 2026-03-16 17:58:44 -04:00
Mark Backman
b437cbe126 Merge pull request #4037 from omChauhanDev/fix/llm-switcher-timeout-secs
forward timeout_secs in LLMSwitcher register methods
2026-03-15 10:08:11 -04:00
Om Chauhan
ed0f5ab09b added changelog 2026-03-15 19:15:18 +05:30
Om Chauhan
a6ad8a355b forward timeout_secs in LLMSwitcher register methods 2026-03-15 19:10:32 +05:30
Mark Backman
e8415b7451 Add changelog for #4035 2026-03-15 08:56:54 -04:00
Mark Backman
24c3d23229 Bump PyJWT minimum version to 2.12.0 for CVE-2026-32597
Addresses Dependabot alert #165 (GHSA-752w-5fwx-jx9f) where PyJWT
<= 2.11.0 accepts unknown `crit` header extensions.
2026-03-15 08:53:06 -04:00
Mark Backman
2f7c441c1c Add changelog for #4026 2026-03-13 13:55:27 -04:00
Mark Backman
79b7a0f969 Fix DeepgramSTTService base_url forcing HTTPS/WSS schemes
The base_url parameter previously forced wss:// and https:// schemes,
breaking air-gapped or private deployments that need ws:// or http://.
Extract URL derivation into _derive_deepgram_urls() helper that respects
the developer's scheme choice while deriving the paired WebSocket and
HTTP URLs the Deepgram SDK requires.

Closes #4019
2026-03-13 13:53:06 -04:00
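The scheme-preserving derivation described above can be sketched as a pairing between WebSocket and HTTP schemes: `ws` pairs with `http`, and `wss` pairs with `https`. The function below is a hypothetical illustration, not the `_derive_deepgram_urls()` helper itself. Defaulting bare hosts to the secure pair is an assumption.

```python
from typing import Tuple

# Each WebSocket scheme pairs with the HTTP scheme of the same security level.
_WS_TO_HTTP = {"ws": "http", "wss": "https"}
_HTTP_TO_WS = {"http": "ws", "https": "wss"}


def derive_deepgram_urls(base_url: str) -> Tuple[str, str]:
    """Return (websocket_url, http_url), keeping the caller's chosen scheme.

    Hypothetical sketch; a bare host without a scheme is assumed to mean wss/https.
    """
    if "://" in base_url:
        scheme, rest = base_url.split("://", 1)
        scheme = scheme.lower()
    else:
        scheme, rest = "wss", base_url
    rest = rest.rstrip("/")
    if scheme in _WS_TO_HTTP:
        return f"{scheme}://{rest}", f"{_WS_TO_HTTP[scheme]}://{rest}"
    if scheme in _HTTP_TO_WS:
        return f"{_HTTP_TO_WS[scheme]}://{rest}", f"{scheme}://{rest}"
    raise ValueError(f"Unsupported scheme in base_url: {scheme!r}")
```

An air-gapped deployment passing `ws://deepgram.internal:8080` keeps plain `ws://` and gets `http://` for the REST side, rather than being forced onto TLS.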
Mark Backman
978a1a2083 Update the system_instruction wording in the foundational examples to not mention WebRTC call 2026-03-13 12:22:10 -04:00
Mark Backman
0ec5f5e5ac Add missing language deprecations for XTTSService, LmntTTSService 2026-03-13 11:33:59 -04:00
Mark Backman
1ea23ad362 Add changelog for #4024 2026-03-13 10:58:51 -04:00
Mark Backman
9f2f73b6b4 Remove redundant per-service language conversion from subclasses
Now that the base TTSService and STTService handle Language enum
conversion at init time, subclasses no longer need to convert in their
own __init__ methods. Remove conversion calls from hardcoded defaults,
params paths, and deprecated direct arg paths across 22 service files.

Services just pass raw Language enums and let the base class convert
via language_to_service_language() polymorphic dispatch.
2026-03-13 10:57:04 -04:00
Mark Backman
8467058e48 Fix Language enum conversion at init time in base TTS/STT services
When a Language enum (e.g. Language.ES) is passed via
settings=Service.Settings(language=Language.ES), it gets stored as-is
without conversion to the service-specific code. The base
_update_settings() handles this for runtime updates, but at init time
apply_update() copies the raw enum. This causes API errors because
services send the unconverted enum value.

Add language conversion in TTSService.__init__ and STTService.__init__
after super().__init__(), using the subclass language_to_service_language()
via normal method resolution.
2026-03-13 10:56:33 -04:00
Mark Backman
7365ebfdf9 Add changelog for #4023 2026-03-13 10:22:58 -04:00
Mark Backman
1064482ade Update pipecat-ai-small-webrtc-prebuilt to 2.4.0 2026-03-13 10:20:51 -04:00
Mark Backman
ed0b8dadb5 Add changelog for #4012 2026-03-12 17:22:13 -04:00
Mark Backman
de38ca626d Deprecate LocalSmartTurnAnalyzerV2 and LocalCoreMLSmartTurnAnalyzer
Both analyzers are superseded by LocalSmartTurnAnalyzerV3. Added
deprecation warnings and docstring notices following the existing
pattern from LocalSmartTurnAnalyzer.
2026-03-12 17:19:32 -04:00
kompfner
30d95e3b84 Merge pull request #4009 from pipecat-ai/pk/perplexity-message-ordering-strictness
Add PerplexityLLMAdapter for message ordering strictness
2026-03-12 16:51:11 -04:00
Paul Kompfner
99f28120b7 Remove trailing system→user conversion for cross-call stability
Perplexity appears to have statefulness within a conversation, so
converting a system message to "user" in one call and then back to
"system" in the next (after more messages are appended) causes API
errors. Remove the trailing system→user conversion entirely — if the
context only has system messages, the API call will fail but the
mistake will be caught right away.
2026-03-12 16:07:39 -04:00
Paul Kompfner
e69f5a76e1 Add test for trailing assistant+system ordering, improve docstring
Add test exercising the step 3 ordering where stripping a trailing
assistant exposes a system message that then gets converted to user.
Move the reasoning about when a trailing system message can occur
into the docstring.
2026-03-12 15:24:17 -04:00
Paul Kompfner
7f98cc9921 Remove initial system message merging, handle trailing system messages
Perplexity allows multiple initial system messages, so don't merge them.
Instead, skip system-system pairs during the consecutive same-role merge
step. Broaden the trailing message fix to convert any trailing system
message to user (not just a lone system message), so contexts with only
system messages don't fail.
2026-03-12 15:14:56 -04:00
Mark Backman
43a2d55c61 Merge pull request #4010 from pipecat-ai/mb/quickstart-cloud-build
Update quickstart to use cloud builds
2026-03-12 15:07:06 -04:00
Paul Kompfner
e4bf6281c6 Add changelog for #4009 2026-03-12 14:56:37 -04:00
Paul Kompfner
0373f85b85 Add PerplexityLLMAdapter to enforce Perplexity's message ordering constraints
Perplexity's API is stricter than OpenAI about conversation history:
- Requires strict alternation between user/tool and assistant messages
- Disallows system messages except as the initial message
- Requires the last message to be user or tool

The new adapter transforms messages before sending to satisfy all three
constraints: merging consecutive initial system messages, converting
non-initial system to user, merging consecutive same-role messages, and
removing trailing assistant messages.

Also adds dual-system-instruction warnings to Cerebras, Fireworks,
Mistral, Perplexity, and SambaNova services (matching the existing
BaseOpenAILLMService pattern), and updates the warning text in
BaseOpenAILLMService to be more descriptive.
2026-03-12 14:56:30 -04:00
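A minimal sketch of the four transformations this commit describes, assuming plain `{"role", "content"}` dicts with string content. `normalize_for_perplexity` is a hypothetical name, not the adapter's actual API, and tool messages are simply passed through unmerged.

```python
from typing import Dict, List

Message = Dict[str, str]


def normalize_for_perplexity(messages: List[Message]) -> List[Message]:
    """Hypothetical sketch of the ordering fixes; not PerplexityLLMAdapter itself."""
    result: List[Message] = []
    i = 0

    # 1. Merge consecutive initial system messages into a single one.
    initial_system: List[str] = []
    while i < len(messages) and messages[i]["role"] == "system":
        initial_system.append(messages[i]["content"])
        i += 1
    if initial_system:
        result.append({"role": "system", "content": "\n\n".join(initial_system)})

    for msg in messages[i:]:
        # 2. Any non-initial system message becomes a user message.
        role = "user" if msg["role"] == "system" else msg["role"]
        # 3. Merge consecutive user or assistant messages (tool messages kept as-is).
        if result and role in ("user", "assistant") and result[-1]["role"] == role:
            merged = result[-1]["content"] + "\n\n" + msg["content"]
            result[-1] = {"role": role, "content": merged}
        else:
            result.append({"role": role, "content": msg["content"]})

    # 4. Drop trailing assistant messages so the list ends with user or tool.
    while result and result[-1]["role"] == "assistant":
        result.pop()
    return result
```

Later commits in this log (7f98cc9921 and 99f28120b7) revise step 1 to keep multiple initial system messages separate and drop the trailing system-to-user conversion, so this sketch reflects only this commit's description.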
Mark Backman
38a4d4ff23 Update quickstart to use cloud builds 2026-03-12 14:46:49 -04:00
Aleix Conchillo Flaqué
f6f08d19a8 Add changelog for #4006 2026-03-12 11:34:25 -07:00
Aleix Conchillo Flaqué
2eccd28cf0 handle EndTaskFrame, StopTaskFrame and CancelTaskFrame downstream
EndTaskFrame and StopTaskFrame are now ControlFrames instead of
SystemFrames, so they flow through the pipeline and queue behind
pending work. This prevents races where EndFrame could overtake
in-flight frames (e.g. function call responses).

CancelTaskFrame and InterruptionTaskFrame remain SystemFrames
(via the new TaskSystemFrame base), since they need immediate propagation.

The sink now catches EndTaskFrame, StopTaskFrame and CancelTaskFrame
downstream and re-queues them upstream to the task, ensuring the full
pipeline drains before shutdown begins.
2026-03-12 11:34:25 -07:00
Aleix Conchillo Flaqué
374bfd4068 Merge pull request #4007 from pipecat-ai/aleix/fix-parallel-pipeline-flush-and-tts-stop-order
Fix ParallelPipeline flush ordering and TTS stop sequence
2026-03-12 10:21:31 -07:00
Aleix Conchillo Flaqué
a461b2b9e6 Add changelog entries for PR #4007 2026-03-12 10:16:29 -07:00
Aleix Conchillo Flaqué
1a66bdef8e Fix TTS stop ordering to drain audio contexts before canceling
Wait for _audio_context_task to finish draining the contexts queue
before canceling _stop_frame_task, ensuring all pending audio
contexts are processed during shutdown.
2026-03-12 10:16:29 -07:00
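The ordering can be illustrated with a small asyncio sketch. It assumes the audio-context worker stops on a `None` sentinel; `audio_context_worker`, `stop_frame_worker` and `shutdown` are hypothetical stand-ins for `_audio_context_task` and `_stop_frame_task`, not the TTSService code.

```python
import asyncio
from typing import List, Optional, Tuple


async def audio_context_worker(queue: "asyncio.Queue[Optional[str]]", played: List[str]) -> None:
    while True:
        ctx = await queue.get()
        if ctx is None:  # sentinel: no more contexts will arrive
            break
        await asyncio.sleep(0)  # stand-in for playing the context's audio
        played.append(ctx)


async def stop_frame_worker() -> None:
    while True:  # stand-in for a long-running watchdog task
        await asyncio.sleep(3600)


async def shutdown(queue: asyncio.Queue, audio_task: asyncio.Task, stop_task: asyncio.Task) -> None:
    await queue.put(None)
    await audio_task  # 1. let the worker drain every pending context
    stop_task.cancel()  # 2. only then cancel the other task
    try:
        await stop_task
    except asyncio.CancelledError:
        pass


async def demo() -> Tuple[List[str], bool]:
    played: List[str] = []
    queue: asyncio.Queue = asyncio.Queue()
    for ctx in ("ctx-1", "ctx-2", "ctx-3"):
        queue.put_nowait(ctx)
    audio_task = asyncio.create_task(audio_context_worker(queue, played))
    stop_task = asyncio.create_task(stop_frame_worker())
    await shutdown(queue, audio_task, stop_task)
    return played, stop_task.cancelled()
```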
Aleix Conchillo Flaqué
73a56f5d81 Fix ParallelPipeline flush ordering and buffered frame handling
Flush buffered frames before pushing the synchronization frame so
downstream processors see the buffered frames first.  Switch to a
while-loop with pop(0) so frames added to the buffer during flush
are also drained.
2026-03-12 10:16:29 -07:00
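The flush pattern described here can be sketched in a few lines. `flush_then_sync` is a hypothetical helper and frames are plain strings for illustration; the point is that `while buffer: buffer.pop(0)` re-checks the list on each pass, so frames appended during the flush are drained before the synchronization frame goes out.

```python
from typing import Callable, List


def flush_then_sync(buffer: List[str], push: Callable[[str], None], sync_frame: str) -> None:
    """Hypothetical sketch: drain buffered frames, then push the sync frame."""
    # The loop condition is re-evaluated after every pop, so frames
    # appended to the buffer while we are pushing are drained too.
    while buffer:
        push(buffer.pop(0))
    # Downstream processors see every buffered frame before the sync frame.
    push(sync_frame)
```

`list.pop(0)` is O(n); for long buffers a `collections.deque` with `popleft()` would be the usual choice.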
kompfner
383300979d Merge pull request #4004 from pipecat-ai/pk/service-settings-update-frame-can-target-specific-service
Add optional `service` field to `ServiceUpdateSettingsFrame` for targ…
2026-03-12 11:48:41 -04:00
Paul Kompfner
27b686db8c Don't bother honoring the new LLMUpdateSettingsFrame.service field in the deprecated OpenAIRealtimeBetaLLMService 2026-03-12 11:04:49 -04:00
Mark Backman
3ffa72170b Merge pull request #3457 from ahoshaiyan/fix/reduce-tool-result-context-size
Reduce Tool Result Context Size by Using UTF-8 for JSON Serialization
2026-03-12 10:41:33 -04:00
Mark Backman
1fe1f0f439 Apply ensure_ascii=False to remaining LLM services and fix changelog format 2026-03-12 10:35:19 -04:00
Ali Alhoshaiyan
765fbeec63 Add changelog 2026-03-12 10:35:19 -04:00
Ali Alhoshaiyan
84538b0ca8 Reduce Call Tool Result Context Size by Allowing UTF-8 in JSON Serialization 2026-03-12 10:35:19 -04:00
Mark Backman
1c676c2073 Merge pull request #4005 from pipecat-ai/add-sip-provider-room-geo-to-configure
Add sip_provider and room_geo params to configure()
2026-03-12 09:28:28 -04:00
Mark Backman
bf66ae7e46 Add changelog for #4005 2026-03-12 09:22:31 -04:00
Varun Singh
7a7d600985 Add sip_provider and room_geo parameters to configure()
Add convenience parameters to configure() so callers don't need to
manually construct DailyRoomProperties/DailyRoomSipParams for common
SIP provider and geo configuration.
2026-03-11 21:50:10 -07:00
Paul Kompfner
36b57252b4 Add changelog for PR #4004 2026-03-11 21:47:51 -04:00
Paul Kompfner
65e4e365dc Add optional service field to ServiceUpdateSettingsFrame for targeting a specific service instance
When `service` is set and doesn't match, the service forwards the frame instead of consuming it. This allows targeting a specific service when multiple services of the same type exist in the pipeline.
2026-03-11 21:41:43 -04:00
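A toy model of the targeting rule, assuming the `service` field holds a reference to the target instance. `UpdateSettingsFrame`, `ToyService` and `run_chain` are hypothetical stand-ins, not Pipecat classes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class UpdateSettingsFrame:
    """Hypothetical stand-in for ServiceUpdateSettingsFrame."""
    settings: Dict[str, Any]
    service: Optional[object] = None  # target instance; None means "first service of this type"


class ToyService:
    def __init__(self, name: str) -> None:
        self.name = name
        self.settings: Dict[str, Any] = {}

    def process_frame(self, frame: UpdateSettingsFrame) -> bool:
        """Return True if the frame was consumed, False if it should be forwarded."""
        if frame.service is not None and frame.service is not self:
            return False  # addressed to another instance: pass it along
        self.settings.update(frame.settings)
        return True


def run_chain(services: List[ToyService], frame: UpdateSettingsFrame) -> None:
    # Each service either consumes the frame or forwards it to the next one.
    for svc in services:
        if svc.process_frame(frame):
            return
```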
kompfner
36f9a6d809 Merge pull request #4003 from pipecat-ai/pk/fix-deprecated-vad-analyzer-usage
Fix deprecated vad_analyzer usage in examples
2026-03-11 20:55:39 -04:00
Mark Backman
904331bba1 Merge pull request #4001 from pipecat-ai/mb/simli-settings
Migrate SimliVideoService to AIService with Settings pattern
2026-03-11 17:45:59 -04:00
Mark Backman
11b14b7857 Add changelog for PR #4001 2026-03-11 17:40:53 -04:00
Mark Backman
c0a3cdd35c Merge pull request #4002 from pipecat-ai/mb/update-quickstart-0.0.105
Update quickstart example for 0.0.105
2026-03-11 17:39:07 -04:00
Paul Kompfner
69e7677f4f Remove changelog for #4003 2026-03-11 17:33:20 -04:00
Paul Kompfner
9a0568e6fe Add changelog for #4003 2026-03-11 17:32:39 -04:00
Paul Kompfner
ccc2549c0c Broaden the vad_analyzer deprecation warning in BaseInputTransport to account for use-cases where there is no LLMUserAggregator at play 2026-03-11 17:28:26 -04:00
Paul Kompfner
e456a6bb23 Move away from remaining deprecated TransportParams.vad_analyzer usage in example files. Skip updates to deprecated services. 2026-03-11 17:17:40 -04:00
Mark Backman
2d9dc2fa1c Update quickstart example for 0.0.105 2026-03-11 17:12:59 -04:00
Mark Backman
59dc30a84d Merge pull request #3997 from pipecat-ai/mb/sarvam-package-0.1.26
Update sarvamai dependency from 0.1.26a2 to 0.1.26
2026-03-11 16:59:32 -04:00
Mark Backman
a54aa2d1f8 Migrate SimliVideoService to AIService with Settings pattern
Align Simli with HeyGen/Tavus by extending AIService instead of
FrameProcessor and using a ServiceSettings dataclass. InputParams is
preserved but deprecated; its fields are promoted to direct init params.
Lifecycle handling moves to start()/stop()/cancel() methods.
2026-03-11 16:56:41 -04:00
Mark Backman
3ceff3d5fd Merge pull request #4000 from pipecat-ai/mb/fix-openai-default-model
Fix: Restore default model to gpt-4.1 for OpenAI, Azure
2026-03-11 16:29:51 -04:00
kompfner
52057d628e Merge pull request #3999 from pipecat-ai/pk/camb-voice-int
Override CambTTSSettings.voice type from str to int to match Camb.ai'…
2026-03-11 16:18:59 -04:00
Mark Backman
4a45145cba Restored the default model to gpt-4.1 for OpenAI and Azure LLM services
The default model for OpenAILLMService and AzureLLMService was still set
to gpt-4o. Restored it to gpt-4.1. Also, removed hardcoded gpt-4o/gpt-4o-mini
model references from examples so they pick up the new default.
2026-03-11 16:18:47 -04:00
Paul Kompfner
080ed22ff5 Override CambTTSSettings.voice type from str to int to match Camb.ai's integer voice IDs 2026-03-11 15:44:05 -04:00
Mark Backman
71e6158861 Add changelog for PR #3997 2026-03-11 14:18:47 -04:00
Mark Backman
a9e124b84f Update sarvamai dependency from 0.1.26a2 to 0.1.26
Bump the Sarvam AI SDK to the stable release version.
2026-03-11 14:17:40 -04:00
kompfner
65561a1d83 Merge pull request #3996 from pipecat-ai/pk/prefer-nested-settings-alias
Prefer nested settings alias
2026-03-11 13:41:29 -04:00
Paul Kompfner
e5b60ba095 Make deprecated-init-param warnings recommend the preferred Service.Settings(...) pattern
Move the warning helper into AIService as _warn_init_param_moved_to_settings.
It now uses type(self).__name__ to produce messages like
"Use settings=AnthropicLLMService.Settings(model=...)" instead of the raw
settings class name "AnthropicLLMSettings(model=...)". Callers no longer need
to pass the settings class explicitly.
2026-03-11 13:04:15 -04:00
Paul Kompfner
eb9212f152 Update COMMUNITY_INTEGRATIONS.md code sample to prefer Settings alias over raw settings class name 2026-03-11 12:37:43 -04:00
Paul Kompfner
51a8a28a99 Prefer Service.ThinkingConfig over raw ThinkingConfig class names in Anthropic and Google services and examples 2026-03-11 12:34:10 -04:00
Paul Kompfner
6b168d6bbb Prefer Service.Settings over raw settings class names across all services
Replace direct references to settings class names (e.g. `FooSettings`) with the nested `Settings` alias form throughout all 87 service files:
- Type annotations: `Settings`
- Runtime code: `self.Settings`
- Docstrings: `ServiceClass.Settings`
- Cross-file inheritance: `ParentService.Settings`

This makes the `Settings` alias the canonical way to reference a service's settings, keeping only the class definition and alias assignment as the remaining hits for each raw settings class name.
2026-03-11 12:15:00 -04:00
kompfner
cbb4835e7b Merge pull request #3991 from pipecat-ai/pk/fix-out-of-date-docstrings
Fix out of date docstrings
2026-03-11 10:54:40 -04:00
Paul Kompfner
3cbd27d202 Add changelog for PR #3991 2026-03-11 10:44:15 -04:00
Paul Kompfner
42262d10bb Move OpenAIRealtimeSTTService's noise_reduction into its Settings object, as it might be useful to update it at runtime, and fix outdated OpenAIRealtimeSTTService docstring example 2026-03-11 10:44:15 -04:00
Paul Kompfner
df82df8e39 Fix outdated Google + Gemini TTS service docstring examples 2026-03-11 10:14:18 -04:00
Paul Kompfner
0ebcb55582 Fix outdated DeepgramSageMakerTTSService docstring example 2026-03-11 10:11:26 -04:00
Paul Kompfner
264ce681f7 Fix outdated DeepgramSageMakerSTTService docstring example 2026-03-11 10:10:15 -04:00
Paul Kompfner
916936d3ee Fix outdated Sarvam TTS docstring examples 2026-03-11 10:07:07 -04:00
Paul Kompfner
087abc9bb9 Fix outdated CambTTSService docstring example 2026-03-11 10:03:21 -04:00
Aleix Conchillo Flaqué
7e88b13421 Merge pull request #3983 from pipecat-ai/changelog-0.0.105
Release 0.0.105 - Changelog Update
2026-03-10 17:59:02 -07:00
aconchillo
610dc25fb1 Update changelog for version 0.0.105 2026-03-10 17:58:32 -07:00
Aleix Conchillo Flaqué
327bcfa8d2 Merge pull request #3982 from pipecat-ai/aleix/fix-examples
Fix Groq, Google, and Nvidia examples
2026-03-10 17:37:26 -07:00
Aleix Conchillo Flaqué
4c19337d89 Fix examples: Groq model, Google settings class, Nvidia system instruction 2026-03-10 15:29:52 -07:00
Aleix Conchillo Flaqué
a4310d4335 Merge pull request #3980 from pipecat-ai/aleix/move-google-vertex-openai
Move Google Vertex and OpenAI LLM modules to subpackages
2026-03-10 13:37:02 -07:00
Aleix Conchillo Flaqué
23218aaed7 Add changelog for #3980 2026-03-10 13:04:16 -07:00
Aleix Conchillo Flaqué
7be2c43e1d Update imports to use new google.gemini_live.vertex path 2026-03-10 13:00:31 -07:00
Aleix Conchillo Flaqué
ea09586db6 Add deprecation stub for google/gemini_live/llm_vertex.py 2026-03-10 13:00:02 -07:00
Aleix Conchillo Flaqué
d086b9f138 Move google/gemini_live/llm_vertex.py to google/gemini_live/vertex/llm.py 2026-03-10 12:59:36 -07:00
Aleix Conchillo Flaqué
b23652caa6 Update imports to use new google.vertex and google.openai paths 2026-03-10 12:58:04 -07:00
Aleix Conchillo Flaqué
4fa3890cec Add deprecation stub for google/llm_openai.py 2026-03-10 12:55:16 -07:00
Aleix Conchillo Flaqué
8ea006739c Move google/llm_openai.py to google/openai/llm.py 2026-03-10 12:54:37 -07:00
Aleix Conchillo Flaqué
b159d02b0c Add deprecation stub for google/llm_vertex.py 2026-03-10 12:54:05 -07:00
Aleix Conchillo Flaqué
0df421de9c Move google/llm_vertex.py to google/vertex/llm.py 2026-03-10 12:53:13 -07:00
Aleix Conchillo Flaqué
ed5b061716 Merge pull request #3979 from pipecat-ai/aleix/daily-optional-transcription-settings
Clean up start_transcription to use its settings parameter
2026-03-10 12:51:31 -07:00
kollaikal-rupesh
80bd935c19 Add ServiceSwitcherStrategyFailover for automatic failover on service errors (#3870)
* Add ServiceSwitcherStrategyFailover for automatic error-based service switching

Introduce a strategy hierarchy: ServiceSwitcherStrategy (base) →
ServiceSwitcherStrategyManual (handles ManuallySwitchServiceFrame) →
ServiceSwitcherStrategyFailover (adds error-based failover). ServiceSwitcher
now defaults to ServiceSwitcherStrategyManual with strategy_type optional.
Non-fatal ErrorFrames are forwarded to the strategy via handle_error().

* Move metadata request into _set_active_if_available

Requesting metadata is part of making a service active, so it belongs
alongside setting _active_service and firing on_service_switched. This
removes the duplicate queue_frame calls from ServiceSwitcher push_frame
and process_frame.
2026-03-10 15:37:30 -04:00
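A toy sketch of the strategy hierarchy described above. The class names loosely mirror the commit message, but the constructors, method signatures and the wrap-around failover order are assumptions for illustration, not Pipecat's API.

```python
from typing import List, Optional


class Strategy:
    """Base: tracks the available services and which one is active."""

    def __init__(self, services: List[str]) -> None:
        self.services = services
        self.active: Optional[str] = services[0] if services else None

    def handle_error(self, error: str) -> Optional[str]:
        return None  # the base strategy ignores errors


class ManualStrategy(Strategy):
    def handle_manual_switch(self, service: str) -> Optional[str]:
        if service in self.services:
            self.active = service
            return service
        return None


class FailoverStrategy(ManualStrategy):
    def handle_error(self, error: str) -> Optional[str]:
        # Non-fatal error on the active service: switch to the next one (wrapping around).
        if self.active is None or len(self.services) < 2:
            return None
        idx = self.services.index(self.active)
        self.active = self.services[(idx + 1) % len(self.services)]
        return self.active
```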
Mark Backman
43a9e9a1b5 Merge pull request #3899 from pipecat-ai/mb/tracing-service-settings-comment
Add defensive comment for given_fields() usage in tracing
2026-03-10 15:33:57 -04:00
Aleix Conchillo Flaqué
11a0c11050 Fix start_transcription ignoring its settings argument
DailyTransportClient.start_transcription() accepted a settings
parameter but always used self._params.transcription_settings
instead, silently discarding any custom settings passed by callers.
2026-03-10 12:08:53 -07:00
Aleix Conchillo Flaqué
4a2d57511d Make DailyParams.transcription_settings optional
Change transcription_settings to Optional[DailyTranscriptionSettings]
defaulting to None. The default settings are now applied at the call
site when transcription is started, and start_transcription receives
the serialized settings dict directly.
2026-03-10 11:55:38 -07:00
Aleix Conchillo Flaqué
743e2ac277 Merge pull request #3831 from pipecat-ai/aleix/custom-video-tracks
Replace VirtualCameraDevice with CustomVideoTrack + custom video track support
2026-03-10 11:44:29 -07:00
Aleix Conchillo Flaqué
86597cc9ec Add changelog entries for PR #3831 2026-03-10 11:32:16 -07:00
Aleix Conchillo Flaqué
14dd028b8f Add custom video track example with per-track params 2026-03-10 11:32:16 -07:00
Aleix Conchillo Flaqué
18e99123af Replace VirtualCameraDevice with CustomVideoTrack and add custom video track support
Use CustomVideoSource/CustomVideoTrack for the default camera output instead of
VirtualCameraDevice, mirroring how audio already uses CustomAudioSource/CustomAudioTrack.
Add support for custom video destinations (register_video_destination, add/remove
custom video tracks, routing in write_video_frame) so multiple video tracks can be
published simultaneously.
2026-03-10 11:32:16 -07:00
kompfner
6c4a46dc79 Merge pull request #3978 from pipecat-ai/pk/fix-inaccurate-comment
Fix an out-of-date comment for accuracy. In the OpenAI LLM service, w…
2026-03-10 14:29:39 -04:00
Mark Backman
9b26faff05 Merge pull request #3961 from ai-coustics/goekmengoergen/sys-663-re-enable-enhancement-level-feature-on-pipecat
Add enhancement_level support to `AICFilter`.
2026-03-10 14:24:15 -04:00
Paul Kompfner
3790640322 Fix an out-of-date comment for accuracy. In the OpenAI LLM service, we *don't* replace any context system messages with system instructions from the constructor. 2026-03-10 13:59:01 -04:00
Aleix Conchillo Flaqué
c25d5af8c8 Merge pull request #3970 from pipecat-ai/aleix/update-daily-python
Update daily-python to 0.24.0
2026-03-10 10:49:27 -07:00
Aleix Conchillo Flaqué
6e52623959 Merge pull request #3976 from pipecat-ai/aleix/fix-google-system-instruction-priority
Fix Google LLM system instruction priority
2026-03-10 10:48:28 -07:00
Mark Backman
912f1be31c Add system_instruction parameter to run_inference (#3968)
* Add system_instruction parameter to run_inference

Allow callers to provide a custom system instruction directly when calling
run_inference, without having to construct provider-specific context objects.

For OpenAI, the instruction is prepended as a system message (preserving
existing messages). For Anthropic, Google, and AWS Bedrock, it overrides the
single system field with a warning when an existing system instruction is
present in the context.

* Use system_instruction parameter in _generate_summary

Pass the summarization prompt via run_inference's system_instruction
parameter instead of embedding it as a system message in the context.

* Add changelog for #3968
2026-03-10 12:57:23 -04:00
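The two behaviors described in this commit can be sketched as separate helpers. Both function names are hypothetical, and the warning text is illustrative rather than the message Pipecat logs.

```python
import warnings
from typing import Dict, List, Optional

Message = Dict[str, str]


def with_openai_instruction(messages: List[Message], instruction: Optional[str]) -> List[Message]:
    # OpenAI-style: prepend a system message and keep existing messages intact.
    if not instruction:
        return list(messages)
    return [{"role": "system", "content": instruction}] + list(messages)


def with_single_field_instruction(existing: Optional[str], instruction: Optional[str]) -> Optional[str]:
    # Single-system-field providers: the explicit instruction wins, with a warning
    # when the context already carried its own system instruction.
    if not instruction:
        return existing
    if existing:
        warnings.warn("system_instruction overrides the system instruction in the context")
    return instruction
```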
Mark Backman
0817a57f4c Merge pull request #3974 from pipecat-ai/mb/azure-stt-region-optional
Make Azure STT region optional when private_endpoint is used
2026-03-10 12:31:39 -04:00
Aleix Conchillo Flaqué
db27aaa790 Add changelog for #3976 2026-03-10 09:26:26 -07:00
Aleix Conchillo Flaqué
153705f05b Fix Google LLM system instruction priority
Constructor/settings system_instruction now takes priority over the
context system message. Previously the context value would overwrite
the constructor value on every call. Warn when both are set.
2026-03-10 09:25:42 -07:00
Mark Backman
54c767cce3 Merge pull request #3960 from sysradium/fix-realtime-calls
Treat conversation_already_has_active_response as non-fatal in Realtime API
2026-03-10 11:40:34 -04:00
Mark Backman
2ce9179662 Merge pull request #3958 from pipecat-ai/mb/deepgram-tts-audio-context
Route Deepgram WebSocket TTS audio through audio context queue
2026-03-10 11:37:39 -04:00
Mark Backman
50cc01a578 Guard against None context ID in append_to_audio_context
After interruption, both _playing_context_id and _turn_context_id are
None. If a subclass calls append_to_audio_context(None, frame), the
recovery path matches (None == None) and creates a bogus audio context
that blocks the handler from ever processing the real context.

Early-return when context_id is falsy to prevent this.
2026-03-10 11:34:03 -04:00
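A simplified model of why the guard matters. It assumes the recovery path creates a context only when the incoming ID equals the playing context ID; `ToyAudioContexts` is hypothetical and much simpler than the real TTS base class.

```python
from typing import Dict, List, Optional


class ToyAudioContexts:
    """Hypothetical model of audio-context bookkeeping, not TTSService itself."""

    def __init__(self) -> None:
        self.contexts: Dict[str, List[bytes]] = {}
        self.playing_context_id: Optional[str] = None

    def append_to_audio_context(self, context_id: Optional[str], frame: bytes) -> None:
        # After an interruption the playing ID is None; without this guard,
        # None == None would let the recovery path below create a bogus context.
        if not context_id:
            return
        if context_id not in self.contexts:
            if context_id != self.playing_context_id:
                return  # stale frame for an unknown context: drop it
            self.contexts[context_id] = []  # recovery path for the playing context
        self.contexts[context_id].append(frame)
```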
Mark Backman
d5c0789ab5 Add changelog for #3958 2026-03-10 11:34:03 -04:00
Mark Backman
92b5185165 Route Deepgram WebSocket TTS audio through audio context queue
The Deepgram TTS service was bypassing Pipecat's audio context management
system, pushing audio frames directly via push_frame() instead of routing
them through append_to_audio_context(). This caused stale audio to leak
into the pipeline after interruptions and bypassed the ordered playback
guarantees.

- Route audio frames through append_to_audio_context() with context
  availability checks to discard stale post-interruption frames
- Handle Flushed responses by appending TTSStoppedFrame and removing
  the audio context to signal completion
- Replace _handle_interruption override with on_audio_context_interrupted
  hook (the recommended pattern used by ElevenLabs and Cartesia)
- Remove redundant process_frame override that caused double-flush
  (base class already flushes via on_turn_context_completed)
- Remove redundant start_tts_usage_metrics call (base class handles
  aggregated usage metrics)
2026-03-10 11:34:03 -04:00
sysradium
ba0ebd5525 Treat conversation_already_has_active_response as non-fatal in Realtime API 2026-03-10 15:57:55 +01:00
Gökmen Görgen
3a6f848a5b update test description. 2026-03-10 14:54:49 +01:00
kompfner
c660152a84 Merge pull request #3966 from pipecat-ai/pk/add-some-more-missing-55-examples
Add missing 55-* update-settings examples for OpenPipe LLM and XTTS TTS
2026-03-10 09:45:58 -04:00
Gökmen Görgen
a96702acfc fix test. 2026-03-10 14:41:18 +01:00
Gökmen Görgen
780559dc32 address feedback. 2026-03-10 14:23:00 +01:00
Gökmen Görgen
8e1c8a38e4 don't change enhancement level if bypass toggled. 2026-03-10 14:18:45 +01:00
Gökmen Görgen
483f6689ed address feedback, use one logging. 2026-03-10 13:52:13 +01:00
Gökmen Görgen
bc11bf9673 remove _is_filter_enabled from AICFilter and refactor related logic and tests. 2026-03-10 13:48:32 +01:00
Gökmen Görgen
82b300298a add changelog. 2026-03-10 13:36:15 +01:00
Gökmen Görgen
0c87fcc48c re-add bypass parameter support to AICFilter and update related unit tests. 2026-03-10 13:36:15 +01:00
Gökmen Görgen
df64f3f943 add enhancement_level support to AICFilter.
2026-03-10 13:36:15 +01:00
Mark Backman
db22bf0f75 Merge pull request #3973 from yuki901/fish-audio-s2-pro
Update Fish Audio default model from s1 to s2-pro
2026-03-10 07:57:27 -04:00
Mark Backman
edc65fc45e Add changelog for #3974 2026-03-10 07:48:02 -04:00
Mark Backman
233867fdfb Make region optional and validate Azure STT config
Make `region` optional so users can provide only `private_endpoint`.
Raise ValueError if neither is provided, and warn if both are given
(private_endpoint takes priority).
2026-03-10 07:47:05 -04:00
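The validation rules here, combined with the mutual exclusivity noted in f533dc3203, can be sketched as a helper that builds keyword arguments for the Speech SDK's `SpeechConfig`. `speech_config_kwargs` is hypothetical, and the `subscription`/`region`/`endpoint` keyword names are an assumption based on the Azure Speech SDK's documented constructor.

```python
import warnings
from typing import Dict, Optional


def speech_config_kwargs(
    api_key: str, region: Optional[str] = None, private_endpoint: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical helper: pass exactly one of endpoint/region, never both."""
    if private_endpoint:
        if region:
            warnings.warn("Both region and private_endpoint given; using private_endpoint")
        return {"subscription": api_key, "endpoint": private_endpoint}
    if region:
        return {"subscription": api_key, "region": region}
    raise ValueError("Provide either region or private_endpoint")
```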
yukiobata1
ceb53e044b Add changelog for #3973 2026-03-10 19:29:47 +09:00
yukiobata1
c7ef23dd22 Update Fish Audio default model from s1 to s2-pro 2026-03-10 18:22:20 +09:00
Kinshuk Bairagi
9cc2644719 Improve system message extraction in traced_llm
Enhanced the logic for extracting the system message in the traced_llm decorator to support LLMContext via adapter and handle exceptions gracefully. This improves compatibility with different context types and ensures better tracing information.
2026-03-10 11:23:29 +05:30
Aleix Conchillo Flaqué
d79e35d84f Add changelog for #3970 2026-03-09 20:47:42 -07:00
Aleix Conchillo Flaqué
00eb190424 Update daily-python to 0.24.0 2026-03-09 20:47:13 -07:00
Mark Backman
0dc95692ba Merge pull request #3967 from pipecat-ai/mb/fix-azure-stt-private-endpoint 2026-03-09 21:57:10 -04:00
Mark Backman
07b901c2a5 Add changelog for #3967 2026-03-09 15:05:20 -04:00
Mark Backman
f533dc3203 Fix Azure STT SpeechConfig failing when private_endpoint is provided
SpeechConfig does not accept both `region` and `endpoint` simultaneously —
they are mutually exclusive. The previous code always passed both, which
raises ValueError when a user supplies a private_endpoint URL. Now we
conditionally pass either `endpoint` or `region`, never both.
2026-03-09 15:05:20 -04:00
Paul Kompfner
20c3f553b2 Add missing 55-* update-settings examples for OpenPipe LLM and XTTS TTS 2026-03-09 14:36:15 -04:00
kompfner
02791cd503 Merge pull request #3965 from pipecat-ai/pk/fix-integration-tests
Fix broken `test_unified_function_calling_anthropic` due to use of an…
2026-03-09 13:42:35 -04:00
kompfner
f2debd9b1d Merge pull request #3963 from pipecat-ai/pk/improve-claude-changelog-skill
Improve changelog skill: prioritize user-facing language and update e…
2026-03-09 13:00:32 -04:00
kompfner
c0c49d0ddc Merge pull request #3964 from pipecat-ai/pk/add-some-missing-55-examples
Add missing 55-* update-settings examples for Piper TTS, Kokoro TTS, …
2026-03-09 12:59:36 -04:00
Mark Backman
3d1f866e73 Merge pull request #3951 from pipecat-ai/mb/remove-unused-imports-2026-03-07
Remove unused imports, 2026-03-07
2026-03-09 12:49:08 -04:00
Mark Backman
786279f143 Remove unused imports, 2026-03-07 2026-03-09 12:44:47 -04:00
Paul Kompfner
9423d22051 Fix broken test_unified_function_calling_anthropic due to use of an unsupported/deprecated model.
Update the tests in test_integration_unified_function_calling.py to not specify particular models but instead just use service defaults (the tests shouldn't be model-dependent anyway)
2026-03-09 12:07:56 -04:00
Paul Kompfner
f1bb065823 Add missing 55-* update-settings examples for Piper TTS, Kokoro TTS, Whisper STT, and Whisper MLX STT
Also fix 13e-whisper-mlx.py to pass MLXModel.LARGE_V3_TURBO.value instead of the enum directly.
2026-03-09 11:54:25 -04:00
Filipi da Silva Fuchter
0c5e936aa5 Merge pull request #3936 from pipecat-ai/filipi/fix_push_aggregation
Fixed TTS context not being appended to the assistant message history
2026-03-09 11:14:38 -04:00
filipi87
f0c5925a79 Fixing Piper test. 2026-03-09 12:07:45 -03:00
Paul Kompfner
7f9169269c Improve changelog skill: prioritize user-facing language and update example changelog 2026-03-09 10:45:33 -04:00
Mark Backman
c16e534f73 Merge pull request #3952 from pipecat-ai/mb/settings-alias
Add Settings class attribute alias to all service classes
2026-03-09 10:45:10 -04:00
filipi87
8ec160f71e Making the changelog more user friendly. 2026-03-09 11:37:11 -03:00
Filipi da Silva Fuchter
1e615cd095 Merge pull request #3962 from pipecat-ai/filipi/smallwebrtc_queue
Queuing the messages received before the data channel is ready
2026-03-09 10:29:05 -04:00
filipi87
ba87d1609c Only marking self._is_yielding_frames_synchronously if receiving TTSAudioRawFrame 2026-03-09 11:24:36 -03:00
Mark Backman
f7dc13c0de Update COMMUNITY_INTEGRATIONS.md for Settings alias class 2026-03-09 10:24:24 -04:00
filipi87
c5ce667387 Retrieving the context_id from the TTSStartedFrame 2026-03-09 11:10:42 -03:00
filipi87
097f9c0896 Fixed to push LLMAssistantPushAggregationFrame when the base TTSService class is responsible for pushing the TTSStoppedFrame. 2026-03-09 11:04:09 -03:00
Filipi da Silva Fuchter
16336a3ea4 Merge pull request #3937 from pipecat-ai/filipi/fix_orphan_function_call
Fix context summarization leaving orphaned tool responses in kept context.
2026-03-09 09:19:17 -04:00
Mark Backman
9eaa99c8e2 Merge pull request #3957 from pipecat-ai/mb/user-turn-completion-system-instruction
Move turn completion instructions to system_instruction
2026-03-09 09:17:06 -04:00
filipi87
4557ef8c42 Renaming method to _get_earliest_function_call_not_resolved_in_range 2026-03-09 10:16:02 -03:00
filipi87
aa693bb5ee Adding changelog entry for the SmallWebRTCConnection fix. 2026-03-09 10:11:40 -03:00
filipi87
74a06a6968 Adding extra comment. 2026-03-09 10:06:38 -03:00
filipi87
322e317a00 Adding guardrails in case the data channel is never established. 2026-03-09 10:04:33 -03:00
filipi87
25165d6e2b Queuing the messages received before the data channel is ready to send them. 2026-03-09 09:47:45 -03:00
Mark Backman
8a02e6fbc5 Merge pull request #3959 from ajmeraharsh/fix/livekit-call-state-updated-args
fix(livekit): remove redundant self arg in on_call_state_updated
2026-03-09 08:38:12 -04:00
Mark Backman
d85ba75dda Merge pull request #3953 from pipecat-ai/mb/deepgram-flux-on-the-fly
Add on-the-fly Configure support for Deepgram Flux STT
2026-03-09 08:36:00 -04:00
ajmeraharsh
ae6f159b18 chore: add changelog entry for #3959 2026-03-09 09:15:03 +04:00
Aleix Conchillo Flaqué
30d0cccef0 Merge pull request #3947 from pipecat-ai/aleix/summary-applied-event
Expose on_summary_applied event on LLMAssistantAggregator
2026-03-08 19:05:50 -07:00
Aleix Conchillo Flaqué
3b947b7844 Add changelog for #3947 2026-03-08 19:02:51 -07:00
Aleix Conchillo Flaqué
1f8cc3d216 Expose on_summary_applied event on LLMAssistantAggregator
Forward the on_summary_applied event from the internal summarizer to
the aggregator so users can listen for it without accessing private
members. Update summarization examples to use the new public event.
2026-03-08 19:02:51 -07:00
ajmeraharsh
57c4d72bf0 fix(livekit): remove redundant self arg in on_call_state_updated event
_on_call_state_updated passes (self, state) to _call_event_handler,
but _run_handler already prepends self when invoking the handler.
This causes handlers to receive 3 positional arguments instead of 2,
making the on_call_state_updated event unusable.

This aligns with how _on_first_participant_joined correctly passes
only the data arg without self.
2026-03-09 02:51:35 +04:00
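A toy reproduction of the argument-passing bug. `ToyTransport` and `register` are hypothetical; only the `_run_handler`/`_call_event_handler` split follows the commit message.

```python
from typing import Any, Callable, Dict, List


class ToyTransport:
    """Hypothetical model of the event-handler plumbing, not LiveKitTransport."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[..., Any]]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._handlers.setdefault(name, []).append(fn)

    def _run_handler(self, handler: Callable[..., Any], *args: Any) -> None:
        handler(self, *args)  # self is prepended here...

    def _call_event_handler(self, name: str, *args: Any) -> None:
        for handler in self._handlers.get(name, []):
            self._run_handler(handler, *args)

    def _on_call_state_updated(self, state: str) -> None:
        # ...so pass only the event data; passing (self, state) would give
        # handlers three positional arguments instead of two.
        self._call_event_handler("on_call_state_updated", state)
```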
Mark Backman
64155e8f06 Add changelog for #3957 2026-03-08 10:44:45 -04:00
Mark Backman
efda57de5c Move turn completion instructions to system_instruction
Turn completion instructions were being injected as a system message in
the LLM context, which caused warning spam when system_instruction was
also set, did not persist across full context updates, and broke LLMs
that do not support consecutive system messages.

Instead, compose the turn completion instructions into the LLM service
system_instruction field. This is managed via _base_system_instruction
which stores the original value for restoration when turn completion is
disabled.
2026-03-08 10:41:40 -04:00
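A minimal sketch of composing and restoring the instruction. It assumes the turn completion text is appended after the original instruction with a blank line; `ToyLLMService` and its enable/disable methods are hypothetical, and only `_base_system_instruction` comes from the commit message.

```python
from typing import Optional


class ToyLLMService:
    """Hypothetical sketch of composing turn completion text into system_instruction."""

    def __init__(self, system_instruction: Optional[str] = None) -> None:
        self.system_instruction = system_instruction
        self._base_system_instruction: Optional[str] = None
        self._turn_completion_enabled = False

    def enable_turn_completion(self, instructions: str) -> None:
        if not self._turn_completion_enabled:
            # Remember the original value so it can be restored later.
            self._base_system_instruction = self.system_instruction
            self._turn_completion_enabled = True
        base = self._base_system_instruction
        self.system_instruction = f"{base}\n\n{instructions}" if base else instructions

    def disable_turn_completion(self) -> None:
        if self._turn_completion_enabled:
            self.system_instruction = self._base_system_instruction
            self._turn_completion_enabled = False
```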
Mark Backman
764c3c4f32 Merge pull request #3938 from koriyoshi2041/fix/replace-bare-except-handlers
fix: replace bare except handlers with specific exception types
2026-03-08 09:04:49 -04:00
Mark Backman
a32e0be120 Merge pull request #3956 from radhikagpt1208/fix/turn-completion-mixin-state-reset
Fix turn completion mixin not resetting state when no `InterruptionFrame` is emitted
2026-03-08 08:54:34 -04:00
radhikagpt1208
b14c8e0e94 Fix turn completion mixin not resetting state after each LLM response 2026-03-08 08:46:45 -04:00
kigland
57f0b6d75b fix: address review feedback on exception handling
- mcp_service.py: remove unnecessary try/except around debug log,
  use len(available_tools.tools) to match actual iteration target
- bedrock_adapter.py, aws/llm.py: add AttributeError to except tuple
  to handle None content (previously caught by bare except)
2026-03-08 12:28:03 +08:00
Mark Backman
edd568b002 Merge pull request #3954 from pipecat-ai/mb/revert-quickstart-changes
Revert changes to quickstart
2026-03-07 15:49:30 -05:00
Mark Backman
807759b874 Revert changes to quickstart 2026-03-07 15:44:26 -05:00
Mark Backman
cd28c82de3 Update examples to use the class Settings alias 2026-03-07 09:15:24 -05:00
Mark Backman
4ebdacdea2 Add changelog for #3953 2026-03-07 08:48:11 -05:00
Mark Backman
c5da3cf2bd Add on-the-fly Configure support for Deepgram Flux STT
Wire up the existing settings update infrastructure to send a Configure
WebSocket message when keyterm, eot_threshold, eager_eot_threshold, or
eot_timeout_ms change mid-stream, avoiding a full reconnect.
2026-03-07 08:37:27 -05:00
Mark Backman
26631a9c31 Add Settings class attribute alias to all service classes
Add a `Settings` class-level alias on every STT, LLM, TTS, image,
vision, and video service class pointing to its settings dataclass.
This lets developers discover the right settings class via the service
class itself (e.g. `GoogleSTTService.Settings(...)`) without needing
to know or import the separate settings class name.
2026-03-07 08:17:40 -05:00
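The alias pattern amounts to a single class attribute. `ExampleSTTService` and `ExampleSTTSettings` are hypothetical names used only to illustrate it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleSTTSettings:
    """Hypothetical settings dataclass for an example service."""
    model: Optional[str] = None
    language: Optional[str] = None


class ExampleSTTService:
    # Class-level alias: callers can write ExampleSTTService.Settings(...)
    # without importing ExampleSTTSettings separately.
    Settings = ExampleSTTSettings

    def __init__(self, settings: Optional[ExampleSTTSettings] = None) -> None:
        self.settings = settings or self.Settings()
```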
Mark Backman
fdf9fb6f02 Merge pull request #3946 from pipecat-ai/mb/tts-settings-review
Review TTS settings
2026-03-07 07:48:26 -05:00
Mark Backman
1cfaea2007 Address code review feedback 2026-03-07 07:42:42 -05:00
kompfner
dc97ffc909 Merge pull request #3943 from pipecat-ai/pk/llm-settings-updates
Minor findings from auditing LLM settings
2026-03-06 22:39:22 -05:00
Paul Kompfner
622d9279cb Use exact service class names in LLMSettings docstrings 2026-03-06 22:35:55 -05:00
Paul Kompfner
6088e6eb52 Make budget_tokens optional in AnthropicThinkingConfig
budget_tokens is required when type is "enabled" and rejected when type is "disabled" (this is validated by the server)
2026-03-06 22:32:14 -05:00
Paul Kompfner
256c8f87b4 Add missing step 3 comment to LLM service init methods
Adds the explicit "no params object" step 3 comment to all
LLM services that skip from step 2 to step 4 in their
settings initialization sequence, matching the pattern
established in services that do have a params object.
2026-03-06 22:32:14 -05:00
Mark Backman
f630d79900 Merge pull request #3944 from pipecat-ai/mb/gemini-live-settings-examples-fixes 2026-03-06 22:14:32 -05:00
Mark Backman
ec93cd1d51 Fix settings update handling in additional STT services 2026-03-06 21:52:45 -05:00
Mark Backman
9c42d27f4d Support runtime language updates in Azure STT
Extract recognizer setup/teardown into _connect/_disconnect so
_update_settings can reconnect when language changes at runtime.
2026-03-06 21:07:03 -05:00
Mark Backman
536f1e178a Fix race condition in Deepgram STT disconnect causing error flood
Clear self._connection before sending close stream so run_stt stops
sending audio immediately during the WebSocket close handshake.
2026-03-06 21:01:31 -05:00
Mark Backman
750b87dc24 Fix AWS examples, update to sonnet 4.6 2026-03-06 20:53:22 -05:00
Mark Backman
671e9a6846 TTS service and example updates 2026-03-06 20:53:22 -05:00
Mark Backman
2c85d2056c Examples fixes for Gemini Live 2026-03-06 18:42:22 -05:00
Mark Backman
a97a086dbd Fix GeminiLiveLLMService init referencing undefined _params variable
Replace references to undefined `_params` with `self._settings` for
language and VAD config. Add missing `system_instruction` to default
settings to satisfy validate_complete(). Remove redundant line that
read language from the deprecated `params` arg.
2026-03-06 18:41:54 -05:00
Mark Backman
4ed3480e4b Update TTSSettings docstrings with the corresponding class name(s) 2026-03-06 16:40:38 -05:00
Mark Backman
d59c0ea6c1 Merge pull request #3941 from pipecat-ai/mb/stt-settings-updates
STT services: settings and examples fixes
2026-03-06 15:21:30 -05:00
Mark Backman
7d41049b35 Review feedback, clarify corresponding class in STTSettings docstrings 2026-03-06 15:17:02 -05:00
Mark Backman
6431ad8e2a Fix service settings init ordering and example bugs
- Speechmatics: move config build after super().__init__ and settings
  delta so turn_detection_mode (e.g. ADAPTIVE) takes effect
- Google STT: fix example passing bare Language enum instead of list
- Google TTS: add missing explicit defaults for all custom settings fields
- Soniox: fix accidental tuple wrapping of STT service in example
- Speechmatics examples: fix system->user role in kick-off messages
- Deepgram Flux: move tag from settings to __init__ (billing metadata)
- ElevenLabs STT: default tag_audio_events to None (use API default)
- Fal STT: simplify language default handling
- Google TTS: rename GoogleStreamTTSSettings to GoogleTTSSettings
2026-03-06 15:17:01 -05:00
Mark Backman
c3794956ef Add deprecation version, fix foundational example double system message 2026-03-06 15:16:58 -05:00
Mark Backman
940da9eeeb Add vad_threshold to AssemblyAISTTSettings
Wire vad_threshold through Settings, default_settings, the deprecated
connection_params path, and _build_ws_url query params.
2026-03-06 15:16:10 -05:00
Mark Backman
696e431e96 Broaden Service Settings docs to cover all AI service types
Use "AI service" language instead of listing specific types, add
ServiceSettings as a fallback for direct AIService subclasses, and
clarify delta mode description with a concrete frame example.
2026-03-06 15:16:10 -05:00
kompfner
1a1c5668de Merge pull request #3942 from pipecat-ai/pk/aws-nova-sonic-audio-config
Add AudioConfig class to AWSNovaSonicLLMService for non-deprecated au…
2026-03-06 14:58:22 -05:00
Filipi da Silva Fuchter
4b9fc8a30c Merge pull request #3804 from pipecat-ai/filipi/concurrent_audio_contexts
Allowing concurrent audio contexts
2026-03-06 14:49:57 -05:00
Paul Kompfner
9b7a86bb12 Add AudioConfig class to AWSNovaSonicLLMService for non-deprecated audio configuration
The audio fields (sample rates, sample sizes, channel counts) on the deprecated `Params` class had no non-deprecated equivalent. This adds an `AudioConfig` class and `audio_config` init arg so users can specify audio configuration without relying on the deprecated `params` parameter.
2026-03-06 14:39:53 -05:00
filipi87
3000037dec Changelog entries for the TTS improvements and fixes. 2026-03-06 16:16:25 -03:00
filipi87
07abd3d60f Fixed BotStoppedSpeakingFrame emission: now emitted as soon as TTSStoppedFrame is received, with a fallback silence-based timeout increased to reduce false positives 2026-03-06 16:16:11 -03:00
filipi87
88ff7c451b Refactored all 25+ TTS service implementations to use the new push_start_frame=True pattern 2026-03-06 16:15:59 -03:00
filipi87
24430d8d45 Fixing Piper test. 2026-03-06 16:15:26 -03:00
filipi87
921e9e1fc9 Refactoring TTS services to allow concurrent audio contexts. 2026-03-06 16:15:10 -03:00
filipi87
c243850cf1 Removing observer from the inworld example. 2026-03-06 16:14:23 -03:00
kompfner
817f88e90b Merge pull request #3940 from pipecat-ai/pk/grok-realtime-settings-pattern
Adopt the `settings` pattern for Grok Realtime session properties
2026-03-06 14:09:25 -05:00
Aleix Conchillo Flaqué
e65ceb4edc Merge pull request #3931 from pipecat-ai/aleix/examples-always-use-user-role
Update foundational examples to use system_instruction
2026-03-06 10:41:33 -08:00
Aleix Conchillo Flaqué
593b75bc8b Update foundational examples to use "user" role
Use system_instruction on LLM service constructors instead of adding
system messages to LLMContext. Messages added to context now use
"user" role.
2026-03-06 09:53:33 -08:00
Paul Kompfner
f4c039048c Adopt the settings pattern for Grok Realtime session properties
Move `session_properties` into `GrokRealtimeLLMSettings`, making `settings` the canonical way to configure Grok Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=GrokRealtimeLLMSettings(session_properties=...)`.

`system_instruction` is synced bidirectionally between the top-level settings field and `session_properties.instructions`, with top-level taking precedence on conflict. (Unlike OpenAI Realtime, Grok's `SessionProperties` has no `model` field, so no model sync is needed.)
2026-03-06 12:53:26 -05:00
kompfner
d84a250b62 Merge pull request #3939 from pipecat-ai/pk/openai-realtime-settings-pattern
Adopt the `settings` pattern for OpenAI Realtime session properties
2026-03-06 12:39:08 -05:00
Paul Kompfner
2b8a6d9ca4 In OpenAI/Azure Realtime examples, migrate to settings=OpenAIRealtimeLLMSettings(...) pattern
Move `session_properties` and `system_instruction` into the `settings` arg, matching the canonical pattern used across the codebase.
2026-03-06 12:00:41 -05:00
mattie ruth backman
18494658c3 rename models_vX to models.py and models_deprecated.py 2026-03-06 11:49:59 -05:00
mattie ruth backman
da0975a4e0 Fix forward reference 2026-03-06 11:49:59 -05:00
mattie ruth backman
49fba5209c copilot feedback 2026-03-06 11:49:59 -05:00
mattie ruth backman
158424aa28 Convert RTVI framework into a structured package
Replace the monolithic rtvi.py with a proper package split by concern and
protocol version:
  - models_v0.py: deprecated pre-1.0 Pydantic models
  - models_v1.py: current RTVI protocol v1 message models
  - frames.py: RTVI pipeline frame dataclasses
  - observer.py: RTVIObserver and RTVIObserverParams
  - processor.py: RTVIProcessor (now lean, imports from submodules)
  - __init__.py: re-exports full public API for backward compatibility
2026-03-06 11:49:59 -05:00
Paul Kompfner
bd4229ea9d Adopt the settings pattern for OpenAI Realtime session properties
Move `session_properties` into `OpenAIRealtimeLLMSettings`, making `settings` the canonical way to configure OpenAI Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=OpenAIRealtimeLLMSettings(session_properties=...)`.

`model` and `system_instruction` are synced bidirectionally between the top-level settings fields and `session_properties.model`/`.instructions`, with top-level taking precedence on conflict.
2026-03-06 11:46:21 -05:00
kigland
848f35f5df fix: replace bare except handlers with specific exception types 2026-03-06 23:05:02 +08:00
kompfner
ac80b787bf Merge pull request #3877 from pipecat-ai/pk/service-init-cleanup
Add `settings` as canonical init arg for all AIService descendants, d…
2026-03-06 10:01:50 -05:00
Paul Kompfner
5b270fec8e In AWS Nova Sonic examples, migrate to the newer pattern of passing in settings with voice and system_instruction, instead of passing voice_id as a direct init arg and the system instruction as the first message in the context 2026-03-06 09:57:57 -05:00
Paul Kompfner
a1641f3762 Add system_instruction to realtime service settings
Add `system_instruction=None` to `default_settings` for OpenAIRealtimeLLMService, GrokRealtimeLLMService, UltravoxRealtimeLLMService, AWSNovaSonicLLMService (Azure inherits from OpenAI), and OpenAIRealtimeBetaLLMService (Azure Beta inherits from OpenAI Beta).

Deprecate `system_instruction` init arg in AWSNovaSonicLLMService in favor of `settings=AWSNovaSonicLLMSettings(system_instruction=...)`. Use `self._settings.system_instruction` directly instead of storing a separate `self._system_instruction`.

Deprecation of `params` and `session_properties` in favor of `settings` for realtime services will be tackled in future work.
2026-03-06 09:57:34 -05:00
Paul Kompfner
78deaa735d Move system_instruction into LLMSettings
Add `system_instruction` field to `LLMSettings` so it is runtime-updatable via settings.
For Google (GoogleLLMService, GoogleVertexLLMService), deprecate the init-time arg since it was already shipped. For Anthropic, AWS Bedrock, and OpenAI, remove the init-time arg entirely since it was never shipped.

Still need to handle realtime services (OpenAI Realtime, Grok Realtime, Gemini Live).
2026-03-06 09:57:08 -05:00
filipi87
524b87f087 Adding changelog entry for the summarization fix. 2026-03-06 11:45:20 -03:00
filipi87
4ef3b52c72 Fix context summarization leaving orphaned tool responses in kept context. 2026-03-06 11:40:27 -03:00
Mark Backman
ee2895a783 Update COMMUNITY_INTEGRATIONS.md with full Service Settings guidance
Broaden the "Dynamic Settings Updates" section into "Service Settings"
covering the complete settings pattern: defining a Settings subclass,
wiring it into __init__ with defaults + apply_update, and distinguishing
init-only config from runtime-updatable fields.
2026-03-06 08:44:15 -05:00
Mark Backman
ab37185208 Update run_eval_pipeline with the latest settings, system_instruction patterns 2026-03-06 08:32:59 -05:00
Mark Backman
8a203dd98f Update more examples, misc services 2026-03-06 08:30:00 -05:00
Mark Backman
62554a2390 Update examples 2026-03-06 08:30:00 -05:00
Mark Backman
14c3a88f02 Fix tests 2026-03-06 08:29:14 -05:00
Mark Backman
939d753c2b Update LLMs 2026-03-06 08:29:14 -05:00
Mark Backman
a4375274b2 Add Settings subclasses to all services and auto-discovered init tests
- Add dedicated Settings subclasses to 20 LLM services that were
  borrowing parent Settings classes (e.g. AzureLLMSettings,
  GroqLLMSettings) so users don't need cross-module imports
- Fix field defaults to NOT_GIVEN in BaseWhisperSTTSettings,
  OpenAIRealtimeSTTSettings, and NvidiaSegmentedSTTSettings for
  delta-mode safety
- Fix incomplete default_settings in AWS, Cartesia, ElevenLabs,
  Fish, and Whisper services so validate_complete() passes
- Add auto-discovered tests that verify all Settings classes default
  to NOT_GIVEN (delta safety) and all services initialize with
  complete settings (store completeness)
2026-03-06 08:29:14 -05:00
Mark Backman
034e81ff18 Update STT service settings 2026-03-06 08:29:14 -05:00
Mark Backman
3cb792a801 Update TTS service settings 2026-03-06 08:29:14 -05:00
Mark Backman
1274bb2c55 Update deprecation version to 0.0.105 2026-03-06 08:29:14 -05:00
Mark Backman
f31bfcf4ec Clean up CartesiaTTSSettings: separate init-only vs runtime-updatable fields
Move output_container, output_encoding, output_sample_rate out of
CartesiaTTSSettings into plain instance attributes since they cannot
change at runtime without breaking the audio pipeline. Remove deprecated
speed/emotion fields and their dead references in _build_msg() and
run_tts(). Remove the from_mapping override that only existed to
destructure those now-removed output format fields.
2026-03-06 08:29:14 -05:00
Mark Backman
07f1d0cd96 Change _warn_deprecated_param to accept type references instead of strings
Update all ~192 call sites across 84 service files to pass class references
(e.g. `CartesiaTTSSettings`) instead of string names (`"CartesiaTTSSettings"`)
to `_warn_deprecated_param()`. This enables better IDE refactoring support.

Also fix `from_mapping` return type annotations in 5 settings subclasses to
use `typing.Self` instead of forward reference strings.
2026-03-06 08:29:14 -05:00
Mark Backman
bc2843e30a Fix deprecation version 2026-03-06 08:29:14 -05:00
Paul Kompfner
5dc312ce0c Add settings as canonical init arg for all AIService descendants, deprecate redundant model/voice/params args
ServiceSettings types were introduced for runtime updates via ServiceUpdateSettingsFrame, but there was tension between init-time and runtime APIs: overlapping-but-different InputParams vs ServiceSettings classes, and runtime-updatable fields like `model` and `voice` scattered as direct init args rather than living in a settings object. This unifies them so developers use the same settings type at both init and runtime, improving ergonomics and consistency.

Every concrete AIService subclass (LLM, TTS, STT, ImageGen, Vision, Video) now accepts a `settings` parameter for runtime-updatable config. Old init args (`model`, `voice_id`, `params`/`InputParams`) still work but emit DeprecationWarnings pointing to the new API. When both are provided, `settings` takes precedence. Leaf classes emit warnings; base classes do not, avoiding double warnings in inheritance chains.
2026-03-06 08:29:14 -05:00
Aleix Conchillo Flaqué
3199168d3e scripts(evals): use context.add_message() 2026-03-05 19:14:06 -08:00
Aleix Conchillo Flaqué
ea8f5f2e22 Merge pull request #3933 from pipecat-ai/aleix/misc-fixes
Fix Daily transport log level and eval script import
2026-03-05 18:48:14 -08:00
Aleix Conchillo Flaqué
1221e2dd76 Fix Daily transport log level and eval script import
Change participant_updated log from debug to trace (too noisy).
Fix deepgram LiveOptions import in eval script.
2026-03-05 16:37:02 -08:00
Aleix Conchillo Flaqué
5b598265c4 update uv.lock 2026-03-05 16:28:55 -08:00
Mark Backman
79131dd6c6 Merge pull request #3930 from dakshdua/main
Add `push_empty_transcripts` param to `BaseWhisperSTTService` to push received empty transcripts downstream
2026-03-05 19:25:15 -05:00
Aleix Conchillo Flaqué
5b808872d1 Merge pull request #3932 from pipecat-ai/aleix/system-instruction-conflict-warning
Warn when both system_instruction and context system message are set
2026-03-05 16:24:06 -08:00
Aleix Conchillo Flaqué
fda4cb6732 Add changelog for #3932 2026-03-05 16:16:41 -08:00
Daksh Dua
789ce2fd5e Add param to push empty transcripts 2026-03-05 16:16:24 -08:00
Aleix Conchillo Flaqué
f4b8245241 Warn when both system_instruction and context system message are set
system_instruction from the constructor always takes precedence. A
warning is now logged when the context also contains a system message
so users can spot the conflict.
2026-03-05 16:16:17 -08:00
Mark Backman
ca27e12c84 Merge pull request #3926 from pipecat-ai/mb/update-deps-2026-03-05
Update dependency version ranges for flexibility
2026-03-05 18:09:04 -05:00
Mark Backman
671ef5b6cc Merge pull request #3928 from zkleb-aai/simplify-assemblyai-examples
Update AssemblyAI turn detection example to use keyterms_prompt
2026-03-05 16:11:08 -05:00
zack
380726cfd3 Update AssemblyAI turn detection example to use keyterms_prompt
Change the commented example from prompt string format to keyterms_prompt
list format for better clarity and consistency with API best practices.
2026-03-05 15:47:54 -05:00
Mark Backman
f4dfeb0f8b Merge pull request #3927 from zkleb-aai/add-assemblyai-vad-threshold
feat(assemblyai): add vad_threshold parameter for U3 Pro
2026-03-05 15:36:23 -05:00
zack
11024ccc2c Add changelog entries for vad_threshold and parameter cleanup 2026-03-05 15:32:09 -05:00
zack
acfb07f859 feat(assemblyai): add vad_threshold parameter for U3 Pro
Add vad_threshold parameter to AssemblyAIConnectionParams to support
voice activity detection threshold configuration for the u3-rt-pro model.

This parameter allows users to align AssemblyAI's VAD threshold with
their external VAD systems (e.g., Silero VAD) to avoid the "dead zone"
where AssemblyAI transcribes speech that the external VAD hasn't
detected yet, which can delay interruption handling.

- Range: 0.0 to 1.0 (lower = more sensitive)
- Default: 0.3 (API default when not sent)
- Only applicable to u3-rt-pro model
- Automatically included in WebSocket query parameters

Recommended usage: Set vad_threshold to match your VAD's activation
threshold (e.g., both at 0.3) for optimal performance.
2026-03-05 15:27:13 -05:00
Mark Backman
06e49d597b Update dev dependencies 2026-03-05 15:23:07 -05:00
Mark Backman
60e9e26164 revert onnxruntime to onnxruntime~=1.23.2 to maintain Python 3.10 support 2026-03-05 15:13:28 -05:00
Mark Backman
3f97c91983 Update optional dependency version ranges and remove SDK dependencies
Widen version ranges for stable packages (anthropic, azure, deepgram,
groq, livekit, nvidia-riva-client, fastapi, ormsgpack, opentelemetry,
faster-whisper) and add upper bounds to previously uncapped packages
(hume, pyjwt, livekit-api, camb).

Replace CartesiaHttpTTSService's internal use of the Cartesia SDK with
direct aiohttp calls, accepting an optional aiohttp_session parameter.

Replace fal-client SDK calls in FalSTTService and FalImageGenService
with direct HTTP to bypass the SDK's aggressive retry/backoff logic
that caused significant latency regressions.
2026-03-05 15:06:54 -05:00
Mark Backman
05fa727c22 Update core dependency version ranges for flexibility
Widen version ranges for stable packages (aiofiles, docstring_parser,
onnxruntime) while adding upper bounds to previously uncapped packages
(transformers, numba, wait_for2). Bump soxr to 1.0.0 and pyloudnorm
to 0.2.0. Make the silero extra empty since onnxruntime is now a core dependency.
2026-03-05 13:13:55 -05:00
Aleix Conchillo Flaqué
06be260e54 Merge pull request #3919 from pipecat-ai/aleix/daily-transport-event-logging
Add logging to Daily transport event handlers
2026-03-05 08:35:28 -08:00
Mark Backman
691d1d309e Merge pull request #3920 from pipecat-ai/mb/remove-hathora 2026-03-05 07:00:52 -05:00
Mark Backman
eeb8ed8588 Remove Hathora service integration
Hathora is shutting down on March 5, 2026. Remove the STT/TTS services,
examples, and related references.
2026-03-04 22:10:06 -05:00
Aleix Conchillo Flaqué
fd545cabab update uv.lock 2026-03-04 17:40:24 -08:00
Aleix Conchillo Flaqué
1aadb8bd73 Merge pull request #3918 from pipecat-ai/aleix/system-instruction-openai-anthropic
Wire up system_instruction in OpenAI, Anthropic, and AWS Bedrock
2026-03-04 17:40:00 -08:00
Aleix Conchillo Flaqué
3c60b0c8af Add changelog for #3918 2026-03-04 17:37:32 -08:00
Aleix Conchillo Flaqué
0004a116d8 examples(foundational): use system_instruction in all examples 2026-03-04 17:37:32 -08:00
Aleix Conchillo Flaqué
01f0caf252 wire up system_instruction in OpenAI, Anthropic and AWS Bedrock 2026-03-04 17:37:32 -08:00
Vanessa Pyne
b42dfa4734 Merge pull request #3916 from pipecat-ai/vp-add-cloud-audio-only
daily-transport: add cloud-audio-only recording option
2026-03-04 16:58:39 -06:00
vipyne
aa31ced32f add changelog for 3916 2026-03-04 16:58:28 -06:00
vipyne
9ca900cc4a daily-transport: add cloud-audio-only recording option 2026-03-04 16:58:28 -06:00
Aleix Conchillo Flaqué
96062972db Add logging to Daily transport event handlers
Add appropriate log levels to dial-in/dial-out, participant, transcription,
and recording event handlers. Move transcription error log from client
callback to transport handler to keep logging consistent at the transport
level.
2026-03-04 13:30:43 -08:00
Mark Backman
4b9fe5afe3 Merge pull request #3915 from pipecat-ai/mb/function_call_timeout_secs-error-msg
Add per-tool function call timeout_secs
2026-03-04 15:01:34 -05:00
Mark Backman
f76b8d2982 Merge pull request #3917 from pipecat-ai/mb/sagemaker-init.py
Add missing __init__.py to sagemaker module
2026-03-04 12:25:44 -05:00
Mark Backman
27ae6a0349 Add missing __init__.py to sagemaker module 2026-03-04 11:50:37 -05:00
Mark Backman
97e4e7c647 Add changelog for #3915 2026-03-04 09:42:01 -05:00
Mark Backman
df35ceca2c Add per-tool timeout_secs to register_function and register_direct_function
The default function call timeout (10s) causes silent failures for
long-running tools. This adds an optional timeout_secs parameter to
register_function() and register_direct_function() so individual tools
can override the global function_call_timeout_secs. The warning message
now mentions both the per-tool and global timeout options.
2026-03-04 09:37:56 -05:00
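A minimal sketch of the per-tool override described in the commit above, assuming a simple registry that wraps each handler in `asyncio.wait_for`. `ToolRegistry`, `call`, and the handler signature are hypothetical; `register_function`, `timeout_secs`, and `function_call_timeout_secs` only mirror the names in the commit message, and this is not Pipecat's implementation.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

# Hypothetical handler shape: takes an args dict, returns any result.
Handler = Callable[[dict], Awaitable[object]]


class ToolRegistry:
    """Hypothetical registry: a per-tool timeout_secs overrides the global default."""

    def __init__(self, function_call_timeout_secs: float = 10.0) -> None:
        # 10s mirrors the default mentioned in the commit message.
        self._default_timeout = function_call_timeout_secs
        self._tools: Dict[str, Tuple[Handler, Optional[float]]] = {}

    def register_function(
        self, name: str, handler: Handler, timeout_secs: Optional[float] = None
    ) -> None:
        # None means "fall back to the global timeout".
        self._tools[name] = (handler, timeout_secs)

    async def call(self, name: str, args: dict) -> object:
        handler, timeout_secs = self._tools[name]
        timeout = timeout_secs if timeout_secs is not None else self._default_timeout
        try:
            return await asyncio.wait_for(handler(args), timeout=timeout)
        except asyncio.TimeoutError:
            # Surface the failure instead of failing silently, naming both knobs.
            raise TimeoutError(
                f"{name} timed out after {timeout}s; raise timeout_secs for this tool "
                "or the global function_call_timeout_secs"
            ) from None
```

The per-tool value only replaces the global default for the tool it was registered with; every other tool keeps the global timeout.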
Mark Backman
e5ae5e6f2d Merge pull request #3914 from pipecat-ai/mb/optional-summarization-thresholds
Make max_context_tokens and max_unsummarized_messages independently optional
2026-03-04 08:57:16 -05:00
Mark Backman
6789aee9e8 Add changelog for #3914 2026-03-03 20:09:26 -05:00
Mark Backman
b358657a79 Make max_context_tokens and max_unsummarized_messages independently optional
Allow either threshold to be set to None to cleanly disable that trigger,
instead of requiring users to set a very large number as a workaround.
At least one of the two must remain set (validated at construction time).
2026-03-03 20:08:22 -05:00
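A minimal sketch of the "independently optional thresholds" rule from the commit above, assuming a plain dataclass. `SummarizationThresholds` and `should_summarize` are hypothetical names, and the `>=` comparison is an assumption; only the two field names and the "at least one must remain set" check come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummarizationThresholds:
    """Hypothetical stand-in; None disables the corresponding trigger."""

    max_context_tokens: Optional[int]
    max_unsummarized_messages: Optional[int]

    def __post_init__(self) -> None:
        # Validated at construction time: disabling both triggers is an error.
        if self.max_context_tokens is None and self.max_unsummarized_messages is None:
            raise ValueError(
                "At least one of max_context_tokens or max_unsummarized_messages must be set"
            )

    def should_summarize(self, token_count: int, unsummarized_count: int) -> bool:
        # A disabled (None) trigger never fires; either enabled trigger is enough.
        token_hit = (
            self.max_context_tokens is not None
            and token_count >= self.max_context_tokens
        )
        message_hit = (
            self.max_unsummarized_messages is not None
            and unsummarized_count >= self.max_unsummarized_messages
        )
        return token_hit or message_hit
```

Using None as the "off" value avoids the old workaround of setting a very large number to effectively disable one trigger.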
Mark Backman
9186f65952 Merge pull request #3908 from pipecat-ai/mb/uv-lock-2026-03-03
uv.lock update
2026-03-03 13:28:27 -05:00
Mark Backman
bdeeacec51 uv.lock update 2026-03-03 10:37:35 -05:00
Mark Backman
8f04f894d5 Merge pull request #3907 from pipecat-ai/mb/update-docs-skill-new-services 2026-03-03 09:48:01 -05:00
Mark Backman
ca0ec16373 Merge pull request #3889 from ai-coustics/goedev/aic-voice-focus-and-memoryview-fix
AIC Voice Focus version update & concurrency safety issue on audio buffer.
2026-03-03 09:28:13 -05:00
Filipi da Silva Fuchter
150c8b92e5 Merge pull request #3848 from pipecat-ai/filipi/deepgram
Upgrading Deepgram to version 6
2026-03-03 09:07:10 -05:00
filipi87
0fdf7dc16a Fixing sagemaker merge conflicts. 2026-03-03 11:03:51 -03:00
filipi87
fc905a7ef5 Merge branch 'main' into filipi/deepgram
# Conflicts:
#	src/pipecat/services/deepgram/stt_sagemaker.py
2026-03-03 10:54:30 -03:00
Mark Backman
aca92745cd Update update-docs skill to register new services in docs.json and supported-services.mdx
When the skill creates a new service documentation page, it now also adds
the page to docs.json navigation and the supported-services.mdx table.
2026-03-03 08:40:12 -05:00
Aleix Conchillo Flaqué
5940731dd0 Merge pull request #3906 from pipecat-ai/changelog-0.0.104
Release 0.0.104 - Changelog Update
2026-03-02 21:24:05 -08:00
aconchillo
62260454a2 Update changelog for version 0.0.104 2026-03-02 21:23:39 -08:00
Aleix Conchillo Flaqué
d1ad7a9580 Merge pull request #3905 from pipecat-ai/aleix/tavus-fix-callback-on-joined
transport(tavus): fix on_joined callback
2026-03-02 21:11:12 -08:00
Aleix Conchillo Flaqué
252f17e1ca transport(tavus): fix on_joined callback 2026-03-02 21:06:49 -08:00
Mark Backman
c79a739c85 Merge pull request #3856 from zkleb-aai/assemblyai-u3-rt-pro
Assemblyai u3 rt pro
2026-03-02 20:28:28 -05:00
Mark Backman
038f6a77d1 Linting 2026-03-02 20:24:30 -05:00
Aleix Conchillo Flaqué
5952ea711c update uv.lock 2026-03-02 16:42:58 -08:00
Mark Backman
aad1211a57 Merge pull request #3885 from pipecat-ai/mb/latency-breakdown
Add latency breakdown to UserBotLatencyObserver
2026-03-02 19:27:35 -05:00
Mark Backman
7dbb130666 Add chronological_events utility function to display UserBotLatencyObserver report 2026-03-02 19:23:42 -05:00
zack
c6c2c5ba05 Fix end_of_turn_confidence_threshold: set to 1.0 (not 0.0) for universal-streaming
- u3-rt-pro: Does not set parameter (not used)
- universal-streaming models: Set to 1.0 to maintain fast response
- This ensures fast response time matches previous implementation
2026-03-02 18:25:25 -05:00
Aleix Conchillo Flaqué
141b0ee014 Merge pull request #3902 from pipecat-ai/aleix/deepgram-sagemaker-move
Move Deepgram SageMaker modules to sagemaker/ subpackage
2026-03-02 15:25:17 -08:00
Aleix Conchillo Flaqué
303616599f Add changelog for #3902 2026-03-02 15:20:52 -08:00
Aleix Conchillo Flaqué
088eb9b01c examples: update to new sagemaker packages 2026-03-02 15:20:52 -08:00
zack
32773b42d6 Improve terminology: rename file and replace 'STT mode' with 'AssemblyAI turn detection'
- Rename 07o-interruptible-assemblyai-stt.py -> 07o-interruptible-assemblyai-turn-detection.py
- Replace 'STT mode' with 'AssemblyAI turn detection mode' throughout codebase
- Replace 'Mode 1'/'Mode 2' with descriptive 'Pipecat turn detection'/'AssemblyAI turn detection'
- Update changelog to use 'built-in turn detection' terminology
- Addresses PR feedback about confusing terminology
2026-03-02 18:08:46 -05:00
Mark Backman
c039e08741 Merge pull request #3900 from pipecat-ai/filipi/lemonslice
Adding the LemonSlice transport integration
2026-03-02 17:56:18 -05:00
zack
b449515410 Address PR review feedback: remove debug logs, fix hasattr logic, add VADAnalyzer 2026-03-02 17:54:31 -05:00
Mark Backman
aae9136df9 Review feedback 2026-03-02 17:52:39 -05:00
Aleix Conchillo Flaqué
fdeddd7c95 Add deprecation shims for moved stt_sagemaker/tts_sagemaker modules
Re-export from the new pipecat.services.deepgram.sagemaker.{stt,tts}
paths so existing imports keep working with a deprecation warning.
2026-03-02 14:47:17 -08:00
Aleix Conchillo Flaqué
11783520c0 services(deepgram): move stt|tts_sagemaker to sagemaker/stt|tts.py 2026-03-02 14:43:34 -08:00
filipi87
49c73bb0a3 Merge branch 'main' into filipi/lemonslice
# Conflicts:
#	README.md
#	uv.lock
2026-03-02 19:24:52 -03:00
filipi87
f07e55a4ed Wrap LemonSlice session creation params in LemonSliceNewSessionRequest 2026-03-02 19:15:18 -03:00
filipi87
daf14f5065 Renaming LemonSlice utils file to api. 2026-03-02 19:08:17 -03:00
filipi87
ebb794995b Changing the log levels. 2026-03-02 19:06:13 -03:00
zkleb-aai
5c2ca0ce64 Update changelog/3856.changed.md
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:54 -05:00
zkleb-aai
6729f4366a Update src/pipecat/services/assemblyai/stt.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:42 -05:00
zkleb-aai
7648b62e6e Update src/pipecat/services/assemblyai/stt.py
Co-authored-by: Mark Backman <m.backman@gmail.com>
2026-03-02 17:04:17 -05:00
filipi87
7afd7068b5 Retrieving the elevenlabs voice ID from environment variable 2026-03-02 19:02:51 -03:00
filipi87
07fdd610ca Using a default voice in case it is not provided. 2026-03-02 19:02:33 -03:00
Mark Backman
a4796a2373 Merge pull request #3898 from pipecat-ai/mb/revert-processing-metrics-deprecation
Revert processing metrics deprecation
2026-03-02 16:39:02 -05:00
Aleix Conchillo Flaqué
44466cfa07 Merge pull request #3896 from pipecat-ai/aleix/broadcast-interruption
Add broadcast_interruption() to FrameProcessor
2026-03-02 13:36:39 -08:00
Mark Backman
68d7e98f95 Add defensive comment for given_fields() usage in tracing 2026-03-02 16:33:25 -05:00
Aleix Conchillo Flaqué
741ff14d3a Rename changelog files to use PR #3896 and mark breaking change 2026-03-02 13:26:45 -08:00
Aleix Conchillo Flaqué
4a61d5bfad Add broadcast_interruption() to FrameProcessor
Replace the round-trip push_interruption_task_frame_and_wait() mechanism
with broadcast_interruption(), which pushes an InterruptionFrame both
upstream and downstream directly from the calling processor.

This eliminates race conditions (transcription arriving before the
InterruptionFrame comes back), swallowed-event timeouts (frame blocked
before reaching the sink), and the complexity of _wait_for_interruption
flag / queue bypass / frame.complete() obligations.

- Add broadcast_interruption() to FrameProcessor
- Deprecate push_interruption_task_frame_and_wait() (delegates to new method)
- Remove event field and complete() from InterruptionFrame/InterruptionTaskFrame
- Remove _wait_for_interruption flag and all special-case logic
- Remove frame.complete() calls in stt_mute_filter and llm_response_universal
- Update all 17 call sites to use broadcast_interruption()
- Update tests
2026-03-02 13:26:45 -08:00
Mark Backman
d0ecb3c7a8 Revert "Deprecate processing metrics (ProcessingMetricsData)" (#3852)
This reverts commit 127b52bad5.
2026-03-02 16:26:29 -05:00
Mark Backman
8f66272de7 Update changelog 2026-03-02 16:16:38 -05:00
Mark Backman
ff5b985009 Convert observer data models to Pydantic BaseModel with timestamps
Enables .model_dump() serialization for Pipecat Cloud collection.
All metrics now include start_time (Unix timestamp) for timeline
plotting alongside duration_secs.
2026-03-02 16:11:43 -05:00
Mark Backman
a738a4d82b Add function call latency tracking to LatencyBreakdown 2026-03-02 16:11:43 -05:00
Mark Backman
ddba1b84a9 Add first-bot-speech latency to UserBotLatencyObserver
Measure time from ClientConnectedFrame to first BotStartedSpeakingFrame,
emitting a one-time on_first_bot_speech_latency event with breakdown.
2026-03-02 16:11:43 -05:00
Mark Backman
18155b6a63 Add latency breakdown to UserBotLatencyObserver
Add per-service latency breakdown metrics alongside existing user-to-bot
latency measurement. When enable_metrics=True, the observer now emits an
on_latency_breakdown event with TTFB, text aggregation, and user turn
duration metrics collected between VADUserStoppedSpeakingFrame and
BotStartedSpeakingFrame.

- Add LatencyBreakdown dataclass with ttfb, text_aggregation,
  user_turn_secs fields
- Accumulate MetricsFrame data during user→bot cycles
- Reset accumulators on InterruptionFrame to discard stale metrics
- Measure user_turn_secs from actual user silence (VAD timestamp -
  stop_secs) to turn release (UserStoppedSpeakingFrame)
- Filter zero-value TTFB entries from startup metric resets
- Add frame deduplication using bounded deque + set pattern
- Update example 29 with latency breakdown display
2026-03-02 16:11:43 -05:00
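A minimal sketch of the "bounded deque + set" deduplication pattern named in the commit above, assuming frames are identified by a hashable ID. `BoundedDeduplicator` and `is_new` are hypothetical names, not the observer's actual code. The set gives O(1) membership checks, and the deque records insertion order so the oldest ID can be evicted once the window is full, keeping memory bounded.

```python
from collections import deque
from typing import Deque, Hashable, Set


class BoundedDeduplicator:
    """Remembers the most recent `maxlen` IDs and reports whether an ID is new."""

    def __init__(self, maxlen: int = 100) -> None:
        self._order: Deque[Hashable] = deque()  # insertion order, oldest on the left
        self._seen: Set[Hashable] = set()       # fast membership test
        self._maxlen = maxlen

    def is_new(self, item_id: Hashable) -> bool:
        if item_id in self._seen:
            return False
        self._order.append(item_id)
        self._seen.add(item_id)
        # Evict the oldest ID from both structures once the window overflows.
        if len(self._order) > self._maxlen:
            oldest = self._order.popleft()
            self._seen.discard(oldest)
        return True
```

An ID that has been evicted from the window is treated as new again, which is the usual trade-off for keeping the structure bounded.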
Mark Backman
ac69b3441e Fix tracing to use ServiceSettings API instead of dict access
The ServiceSettings refactor (PR #3714) changed self._settings from
dicts to dataclass subclasses, but tracing code still used .items(),
`in` containment checks, and subscript access, causing AttributeError on
every traced call. Use given_fields() for iteration and attribute
access for named fields.
2026-03-02 16:11:43 -05:00
Mark Backman
98bd530574 Add changelog for #3881 2026-03-02 16:11:42 -05:00
Mark Backman
b1e55fd6c2 Merge pull request #3881 from pipecat-ai/mb/startup-observer
Add StartupTimingObserver
2026-03-02 16:07:28 -05:00
Mark Backman
dbdb54ce0f Add on_connected event handler to DailyTransport for cross-transport consistency 2026-03-02 15:44:37 -05:00
Mark Backman
c1743dcffd Rename Tavus event, on_connected 2026-03-02 15:22:44 -05:00
Mark Backman
389d0c3fb6 Use on_pipeline_started from PipelineTask for startup report
Replace the PipelineSink detection in StartupTimingObserver with an
on_pipeline_started() callback from PipelineTask via TaskObserver.
This fixes premature report emission when using ParallelPipeline,
which has its own inner PipelineSinks per branch.
2026-03-02 14:33:55 -05:00
Mark Backman
a88eae7849 Merge pull request #3895 from pipecat-ai/aleix/update-nvidia-example-model
Update Nvidia example to use llama-3.3-70b-instruct
2026-03-02 14:27:53 -05:00
Mark Backman
0cfd953a90 Use _ArrivalInfo dataclass instead of tuple for arrival tracking 2026-03-02 14:15:41 -05:00
Mark Backman
bbbfdfd321 Replace per-processor start_time with start_offset_secs
Use start_offset_secs (offset from StartFrame) on ProcessorStartupTiming
instead of a wall-clock timestamp. Reports keep a single start_time
anchor for dashboard visualization. Remove _mono_to_wall conversion.
2026-03-02 14:07:34 -05:00
Aleix Conchillo Flaqué
193f93c2ce Update Nvidia example to use llama-3.3-70b-instruct model 2026-03-02 10:16:27 -08:00
Mark Backman
75669b12a2 Convert observer data models to Pydantic BaseModel with timestamps
Switch ProcessorStartupTiming, StartupTimingReport, and
TransportTimingReport from dataclasses to Pydantic BaseModel. Add
start_time (Unix timestamp) fields and wall clock conversion for
monotonic observer timestamps.
2026-03-02 13:10:09 -05:00
Mark Backman
68e8732e72 Add BotConnectedFrame and on_transport_timing_report event
Add BotConnectedFrame (SystemFrame) pushed by SFU transports (Daily,
LiveKit, HeyGen, Tavus) when the bot joins the room. Replace the
on_transport_readiness_measured event with on_transport_timing_report
which includes both bot_connected_secs and client_connected_secs.
2026-03-02 13:10:09 -05:00
Mark Backman
de87894778 Update changelog for #3881 2026-03-02 13:10:09 -05:00
Mark Backman
0836066898 Add ClientConnectedFrame and transport readiness timing
Introduce ClientConnectedFrame (SystemFrame) pushed by all transports
when a client connects. StartupTimingObserver uses this to measure
transport readiness — the time from StartFrame to first client
connection — via a new on_transport_readiness_measured event.
2026-03-02 13:10:09 -05:00
Mark Backman
58aa8e1ba5 Add changelog for #3881 2026-03-02 13:10:09 -05:00
Mark Backman
670e5000d2 Merge pull request #3893 from pipecat-ai/mb/fix-azure-error-propagation
Propagate Azure TTS/STT cancellation errors to the pipeline
2026-03-02 13:04:54 -05:00
Mark Backman
e6b9c5c4dc Propagate Azure TTS/STT cancellation errors to the pipeline
Azure TTS _handle_canceled was putting None (the normal completion
signal) into the audio queue for all cancellation reasons, so run_tts
treated errors identically to success—silently producing no audio.
Now error cancellations put an Exception marker in the queue, which
run_tts converts to an ErrorFrame.

Azure STT had no canceled event handler at all, so auth failures,
network errors, and rate-limit cancellations were invisible. Added
_on_handle_canceled which pushes an ErrorFrame upstream via push_error.

Fixes pipecat-ai/pipecat#3892
2026-03-02 12:36:08 -05:00
Mark Backman
c54232bdb4 Add StartupTimingObserver for measuring processor start() times
Tracks how long each processor start method takes during pipeline
startup by measuring StartFrame arrive/leave deltas. Emits a timing
report via the on_startup_timing_report event and auto-logs a summary.
Internal pipeline processors are excluded from reports by default.
2026-03-02 10:48:50 -05:00
Mark Backman
5a6a93e277 Merge pull request #3886 from dhruvladia-sarvam/add/user-agent
fix(sarvam): standardize STT/TTS User-Agent headers
2026-03-02 10:21:23 -05:00
dhruvladia-sarvam
f386722ef9 removing unnecessary logs 2026-03-02 20:38:39 +05:30
Mark Backman
7c07e090a4 Merge pull request #3891 from pipecat-ai/mb/fix-update-docs-oidc
Fix update-docs workflow OIDC failure with pull_request_target
2026-03-02 09:29:35 -05:00
filipi87
8b09f7bbb4 Upgrading Deepgram to version 6. 2026-03-02 11:22:33 -03:00
Mark Backman
07ba255073 Fix update-docs workflow OIDC failure with pull_request_target
The switch from pull_request to pull_request_target (for fork PR
secret access) broke claude-code-action's default OIDC-based GitHub
App authentication. Pass github_token explicitly to bypass OIDC.
2026-03-02 09:20:24 -05:00
Mark Backman
eb7a4b7aee Merge pull request #3874 from pipecat-ai/mb/pr-3873
Changelog for PR 3873, docstrings change
2026-03-02 08:36:05 -05:00
Rupesh
ad74d19c6b Remove resampling warning log for consistency with rest of codebase 2026-03-02 13:24:00 +00:00
Rupesh
5e8d722bf2 Use soxr for high-quality audio resampling instead of numpy linear interpolation 2026-03-02 13:24:00 +00:00
Rupesh
a7f6db8436 Add changelog fragment for #3857 2026-03-02 13:24:00 +00:00
Rupesh
442ea6a97e Fix Smart Turn v3 producing incorrect predictions at non-16kHz sample rates
The Whisper-based ONNX model expects 16 kHz audio, but the
_predict_endpoint method had five hardcoded references to 16000 without
checking the actual pipeline sample rate. When running at 8 kHz (e.g.
Twilio telephony), audio was fed to the feature extractor at the wrong
rate, causing the model to perceive speech at 2x speed with shifted
formant frequencies and produce incorrect end-of-turn predictions.

Add automatic resampling via numpy interpolation before feature
extraction and replace all hardcoded sample rate values with a
_MODEL_SAMPLE_RATE constant. Also fix the WAV debug logger to write
files with the correct sample rate header.

Fixes #3844
2026-03-02 13:24:00 +00:00
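A minimal sketch of the idea behind this fix, assuming audio arrives as a plain Python list of float samples: upsample to the model's 16 kHz before feature extraction. `resample_linear` and `MODEL_SAMPLE_RATE` are hypothetical names used for illustration (the commit itself names the constant `_MODEL_SAMPLE_RATE`). A later commit in this log replaced linear interpolation with soxr for higher-quality resampling.

```python
MODEL_SAMPLE_RATE = 16000  # the Whisper-based model expects 16 kHz audio


def resample_linear(samples: list[float], src_rate: int, dst_rate: int = MODEL_SAMPLE_RATE) -> list[float]:
    """Linearly interpolate `samples` from src_rate to dst_rate (illustrative only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Position of output sample i on the input sample grid.
        pos = i * src_rate / dst_rate
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last input sample: hold the final value.
            out.append(samples[-1])
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


# 8 kHz telephony audio (e.g. Twilio) is upsampled before feature extraction.
print(resample_linear([0.0, 1.0], 8000))  # [0.0, 0.5, 1.0, 1.0]
```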
Mark Backman
018ead8551 Changelog for PR 3873, docstrings change 2026-03-02 08:08:43 -05:00
Mark Backman
5e99aeedf5 Merge pull request #3888 from pipecat-ai/mb/fix-filter-incomplete-turns
Re-inject turn completion instructions after LLM context reset
2026-03-02 08:03:08 -05:00
Mark Backman
c579749d8a Merge pull request #3875 from pipecat-ai/mb/foundational-ex-updates
Miscellaneous foundational example updates
2026-03-02 08:02:51 -05:00
Mark Backman
094de42f0c Merge pull request #3879 from pipecat-ai/mb/fix-tracing-settings
Fix tracing to use ServiceSettings API instead of dict access
2026-03-02 08:01:45 -05:00
dhruvladia-sarvam
1242f1c10e changelog entry 2026-03-02 18:09:51 +05:30
dhruvladia-sarvam
55a641e258 fix(sarvam): standardize STT/TTS User-Agent headers 2026-03-02 18:09:51 +05:30
Gökmen Görgen
7575ea7e07 add changelog entries for Voice Focus 2.0 support and buffer resize fix in AICFilter. 2026-03-02 12:47:59 +01:00
Gökmen Görgen
d6aab6b52e simplify parameter setup in AICFilter.
ParameterFixedError is deprecated.
2026-03-02 12:24:52 +01:00
Gökmen Görgen
8ff3e21654 use new version of vf model. 2026-03-02 11:22:51 +01:00
Gökmen Görgen
ea59695551 don't use memoryview for concurrency safety.
Snapshot the blocks into immutable bytes and trim the buffer BEFORE any await, so no memoryview is
held across async yield points. Without this, a concurrent filter() or stop() call could try to
extend() or clear() the bytearray while a memoryview still exports it, raising "Existing exports
of data: object cannot be re-sized".
2026-03-02 10:55:25 +01:00
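The failure mode described here can be reproduced with the standard library alone. The sketch below assumes a hypothetical `BlockBuffer` that hands fixed-size blocks to an async processor; it is not AICFilter's actual code, only an illustration of the snapshot-then-trim pattern.

```python
import asyncio


def resize_while_exported() -> str:
    """Show why holding a memoryview across an await is unsafe."""
    buf = bytearray(b"abc")
    view = memoryview(buf)  # exports buf's memory
    try:
        buf.extend(b"d")  # resizing is refused while the export is alive
        return "resized"
    except BufferError as exc:
        return str(exc)
    finally:
        view.release()


class BlockBuffer:
    """Hypothetical fixed-size block buffer using the snapshot-then-trim pattern."""

    def __init__(self, block_size: int):
        self._block_size = block_size
        self._buffer = bytearray()

    def append(self, data: bytes) -> None:
        self._buffer.extend(data)

    async def drain(self, process) -> None:
        end = (len(self._buffer) // self._block_size) * self._block_size
        if end == 0:
            return
        # Copy to immutable bytes and trim BEFORE any await, so no view of
        # self._buffer is alive while another coroutine may extend() or clear() it.
        blocks = bytes(self._buffer[:end])
        del self._buffer[:end]
        for start in range(0, end, self._block_size):
            await process(blocks[start:start + self._block_size])


async def main() -> None:
    received = []

    async def process(block: bytes) -> None:
        await asyncio.sleep(0)  # a yield point, like an async model call
        received.append(block)

    buf = BlockBuffer(block_size=4)
    buf.append(b"abcdefghij")
    await buf.drain(process)
    print(received, bytes(buf._buffer))  # [b'abcd', b'efgh'] b'ij'


print(resize_while_exported())
asyncio.run(main())
```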
Gökmen Görgen
16c676a921 add a test reproducing the user-reported issue first. 2026-03-02 10:34:50 +01:00
Mark Backman
91c46ffbf4 Re-inject turn completion instructions after LLM context reset
When filter_incomplete_user_turns is enabled and an LLMMessagesUpdateFrame
replaces the context via set_messages(), the turn completion instructions
system message was lost. This caused the LLM to stop emitting turn
completion markers. Re-inject the instructions after set_messages() to
fix this.
2026-03-01 16:37:07 -05:00
Mark Backman
024c62946f Merge pull request #3878 from pipecat-ai/mb/fix-update-docs-workflow-secrets
fix: use pull_request_target for docs workflow to access secrets from fork PRs
2026-03-01 14:53:53 -05:00
Mark Backman
9b969736f6 Merge pull request #3764 from kedar389/add-support-for-private-endpoint-azure-stt
feat: Add support for private endpoint in Azure STT
2026-03-01 14:50:34 -05:00
Radek Sedlák
6fc718947d Merge branch 'pipecat-ai:main' into add-support-for-private-endpoint-azure-stt 2026-03-01 17:55:45 +01:00
zack
cb7e612738 Remove test files and testing documentation from PR 2026-03-01 11:51:51 -05:00
zack
36b9c05730 Fix changelog entries to use proper markdown bullet format 2026-03-01 11:45:24 -05:00
zack
6968d83ccb Add changelog entries for PR #3856 2026-03-01 11:44:51 -05:00
zack
42f91a9056 Apply ruff formatting fixes 2026-03-01 11:44:37 -05:00
zack
5de495cc98 Use logger.warning instead of warnings.warn for deprecation message
- Makes deprecation warning visible in logs without needing Python warning flags
- Users will see the warning during normal operation
2026-03-01 11:39:00 -05:00
zack
d1cbc81108 Fix 07o example to use new min_turn_silence parameter name in docs and comments 2026-03-01 11:36:46 -05:00
zack
66fca7e382 Add backward compatibility for min_end_of_turn_silence_when_confident parameter
- Keep old parameter name for backward compatibility
- Add deprecation warning when old parameter is used
- Automatically migrate old parameter value to new min_turn_silence parameter
- Exclude deprecated parameter from WebSocket URL to avoid sending it to API
- New parameter takes precedence if both are set
2026-03-01 11:33:22 -05:00
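A minimal sketch of the migration pattern listed in this commit, using a hypothetical `TurnParams` dataclass rather than the AssemblyAI service's real parameter class, and the standard `logging` module standing in for the project's logger. It warns on the old name, lets the new name win when both are set, and keeps the deprecated name out of the query string.

```python
import logging
from dataclasses import asdict, dataclass
from typing import Optional
from urllib.parse import urlencode

logger = logging.getLogger(__name__)

_DEPRECATED = "min_end_of_turn_silence_when_confident"


@dataclass
class TurnParams:
    """Hypothetical stand-in for the service's connection params."""

    min_turn_silence: Optional[int] = None
    min_end_of_turn_silence_when_confident: Optional[int] = None  # deprecated alias

    def __post_init__(self) -> None:
        old = self.min_end_of_turn_silence_when_confident
        if old is not None:
            logger.warning(f"{_DEPRECATED} is deprecated, use min_turn_silence instead")
            # The new name takes precedence if both are set.
            if self.min_turn_silence is None:
                self.min_turn_silence = old

    def query_string(self) -> str:
        # Never send the deprecated name to the API.
        params = {k: v for k, v in asdict(self).items() if v is not None and k != _DEPRECATED}
        return urlencode(params)


print(TurnParams(min_end_of_turn_silence_when_confident=100).query_string())  # min_turn_silence=100
```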
zack
07ae4b8d38 Update AssemblyAI examples to use u3-rt-pro and improve 55d example
- Update 13d-assemblyai-transcription.py to explicitly use u3-rt-pro model
- Update 55d-update-settings-assemblyai-stt.py to demonstrate keyterms updates instead of language updates
- Add helpful logging to show before/after keyterms boosting effect
- Use difficult names (Xiomara, Saoirse, Krzystof) to demonstrate boosting effectiveness
2026-03-01 11:27:31 -05:00
zack
21a409e447 Update prompt warning and rename min_end_of_turn_silence_when_confident to min_turn_silence
- Add "beta feature" note to custom prompt warning
- Rename min_end_of_turn_silence_when_confident parameter to min_turn_silence across all AssemblyAI code
- Update documentation, examples, and test files to use new parameter name
2026-03-01 11:17:39 -05:00
Aleix Conchillo Flaqué
903dc6c1a9 Merge pull request #3883 from pipecat-ai/aleix/queue-frame-direction
Add direction parameter to PipelineTask.queue_frame() and queue_frames()
2026-03-01 06:04:28 -08:00
Mark Backman
dee94b3cb8 Merge pull request #3795 from omChauhanDev/fix/realtime-cancel-not-active
fix(realtime): handle response_cancel_not_active as non-fatal
2026-03-01 07:29:59 -05:00
Om Chauhan
ece4343839 changed log level to debug 2026-03-01 12:25:42 +05:30
Aleix Conchillo Flaqué
94a59de4e1 Add changelog for #3883 2026-02-28 17:28:44 -08:00
Aleix Conchillo Flaqué
f37fd39cdb Add optional direction parameter to PipelineTask.queue_frame() and queue_frames()
Allow pushing frames upstream through the pipeline by passing
FrameDirection.UPSTREAM. Downstream frames use the existing push queue,
while upstream frames are pushed directly from the pipeline sink.
2026-02-28 17:28:44 -08:00
Mark Backman
9d4955054c Fix tracing to use ServiceSettings API instead of dict access
The ServiceSettings refactor (PR #3714) changed self._settings from
dicts to dataclass subclasses, but tracing code still used .items(),
`in` containment checks, and subscript access, causing AttributeError on
every traced call. Use given_fields() for iteration and attribute
access for named fields.
2026-02-27 22:41:40 -05:00
Mark Backman
6464230627 fix: use pull_request_target for docs workflow to access secrets from fork PRs
The update-docs workflow intermittently failed with "Input required and
not supplied: token" because pull_request events from fork PRs don't
have access to repository secrets. Switching to pull_request_target
runs the workflow in the base repo's context, ensuring secrets are
always available. This is safe since the workflow only runs on
already-merged PRs.
2026-02-27 22:22:35 -05:00
Mark Backman
950a8628dc Miscellaneous foundational example updates 2026-02-27 19:49:45 -05:00
Mark Backman
17205c1647 Merge pull request #3871 from rupesh-svg/fix/rtvi-processor-double-insert
Fix PipelineTask double-inserting RTVIProcessor with custom RTVIObserver
2026-02-27 19:34:46 -05:00
Mark Backman
2a776d0c1e Merge pull request #3873 from rimelabs/matt/rime/add_speedAlpha_param_to_arcana
[RimeTTS] Add `speedAlpha` parameter support to the `arcana` model
2026-02-27 19:27:56 -05:00
zack
d7ce1eedd9 Add foundational examples for AssemblyAI u3-rt-pro
- 07o-interruptible-assemblyai.py: Basic example using Pipecat VAD mode
- 07o-interruptible-assemblyai-stt.py: Advanced example using STT-controlled
  turn detection with comprehensive documentation on u3-rt-pro features
  (turn detection tuning, prompt-based enhancement, speaker diarization)
2026-02-27 17:58:18 -05:00
zack
ef00f27d53 Fix incorrect await on synchronous request_finalize() method
The request_finalize() method in STTService is synchronous (sets a flag),
but was being called with await in the VAD turn endpoint handling code.
This caused "object NoneType can't be used in 'await' expression" errors.

Also includes automatic formatting improvements from ruff.
2026-02-27 17:58:05 -05:00
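The error comes from awaiting the `None` that a synchronous method returns. A sketch with a hypothetical `FakeSTTService` standing in for `STTService`; the exact wording of the `TypeError` varies between Python versions, so the example prints it rather than hard-coding it.

```python
import asyncio


class FakeSTTService:
    """Hypothetical stand-in: request_finalize() is synchronous and only sets a flag."""

    def __init__(self) -> None:
        self.finalize_requested = False

    def request_finalize(self) -> None:
        self.finalize_requested = True


async def on_vad_turn_end_buggy(stt: FakeSTTService) -> None:
    await stt.request_finalize()  # awaits the returned None, raising TypeError


async def on_vad_turn_end_fixed(stt: FakeSTTService) -> None:
    stt.request_finalize()  # plain call; there is nothing to await


try:
    asyncio.run(on_vad_turn_end_buggy(FakeSTTService()))
except TypeError as exc:
    print(f"TypeError: {exc}")

asyncio.run(on_vad_turn_end_fixed(FakeSTTService()))
```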
Rupesh
56f2564ed1 Use local variable instead of instance variable for RTVI prepend decision
Replace _rtvi_external instance variable with a local prepend_rtvi flag
since it is only used during __init__ to decide whether to prepend the
RTVIProcessor to the pipeline.
2026-02-27 14:45:37 -08:00
macaki
000d38e253 [Rime] Both mist and arcana now support the speedAlpha parameter. 2026-02-27 15:17:23 -07:00
Filipi da Silva Fuchter
36edef489e Merge pull request #3863 from pipecat-ai/filipi/manual_summarization
Manual context summarization
2026-02-27 16:46:37 -05:00
filipi87
d077a810ae Fixing context summarization tests 2026-02-27 18:42:50 -03:00
filipi87
0839e3813f Refactoring the examples to use the new context summarization classes. 2026-02-27 18:42:39 -03:00
filipi87
69414e8a5a Added example 54b-context-summarization-manual-openai.py demonstrating on-demand summarization triggered via a function call tool. 2026-02-27 18:42:23 -03:00
filipi87
dfd0a515f3 Changelog entries for the context summarization improvements. 2026-02-27 18:42:13 -03:00
filipi87
ed7f0a2c08 Adding support for on-demand summarization 2026-02-27 18:41:55 -03:00
filipi87
08d93ce9b6 Renamed LLMAssistantAggregatorParams fields for clarity. 2026-02-27 18:41:17 -03:00
filipi87
f11d4b6944 Refactored LLMContextSummarizationConfig into two focused classes, LLMContextSummaryConfig and LLMAutoContextSummarizationConfig. 2026-02-27 18:40:41 -03:00
filipi87
51a3310e78 Added LLMSummarizeContextFrame: push this frame anywhere in the pipeline to trigger on-demand context summarization (e.g. from a function call tool). 2026-02-27 18:39:57 -03:00
Rupesh
6f33aff0c6 Fix PipelineTask double-inserting RTVIProcessor when custom RTVIObserver is provided
When the user places an RTVIProcessor inside their pipeline and provides
a custom RTVIObserver subclass in observers, PipelineTask correctly
detects both and logs "skipping default ones." However it then
unconditionally prepends self._rtvi to the pipeline, causing the
processor to appear twice in the frame chain.

Track whether the RTVIProcessor was found externally (inside the user
pipeline) vs created internally. Only prepend it when created internally.

Fixes #3867
2026-02-27 13:29:01 -08:00
zack
45532a9478 Remove info logs and unused import per PR feedback
- Remove unused Mapping import
- Remove info logs at initialization (connection params)
- Remove info logs in _handle_transcription (transcript details, text sent to LLM)
- Remove info logs in _build_ws_url (WebSocket URL and params)
- Keep debug logs (less verbose, appropriate for development)
2026-02-27 16:15:49 -05:00
Mark Backman
4eb993c980 Merge pull request #3868 from wollerman/wollerman/numba-version-pin-update
fix: Update numba version pin from == to >=0.61.2
2026-02-27 16:04:20 -05:00
Mark Backman
83e29eb478 Merge pull request #3855 from pipecat-ai/mb/context-summarization-improvements
Improve context summarization with dedicated LLM, timeout, and observability
2026-02-27 15:24:38 -05:00
zack
6ba9f780b0 Remove unnecessary SpeechStarted fallback in STT mode
u3-rt-pro guarantees SpeechStarted is always sent before transcripts,
so the fallback UserStartedSpeakingFrame broadcast is never needed.

This ensures clean pairing of UserStarted/StoppedSpeakingFrame:
- Start: Always from _handle_speech_started
- Stop: Always from _handle_transcription on final turn
2026-02-27 15:00:38 -05:00
zack
aa7e9a17d5 Fix finalization pattern: Use request/confirm in Pipecat mode, finalized flag in STT mode
- Add request_finalize() before sending ForceEndpoint in Pipecat mode
- Keep confirm_finalize() when receiving formatted finals in Pipecat mode
- Remove confirm_finalize() from STT mode (use finalized=True instead)

This follows Pipecat's two-step finalization pattern where request_finalize()
is called when sending a finalize request to the STT service, and
confirm_finalize() is called when receiving confirmation back.
2026-02-27 14:55:22 -05:00
Matt
acff172bf2 create changelog entry 2026-02-27 14:52:37 -05:00
Mark Backman
9747e8da4a Merge pull request #3866 from pipecat-ai/mb/fix-docs-workflow-version
Fix docs workflow to add auto-docs label
2026-02-27 13:09:36 -05:00
Mark Backman
8fc63352d9 Merge pull request #3865 from pipecat-ai/mb/elevenlabs-realtime-stt-finalized
Set finalized flag on ElevenLabs Realtime STT for manual commit strategy
2026-02-27 13:09:17 -05:00
Matt
6ebfea4746 update numba version pin to >= 2026-02-27 12:44:31 -05:00
Mark Backman
f74af9b9c7 Always apply a timeout to summarization LLM calls
Even when summarization_timeout is explicitly set to None, use a
DEFAULT_SUMMARIZATION_TIMEOUT (120s) fallback so the LLM call can
never hang indefinitely. Applied in both LLMService and the dedicated
LLM path in LLMContextSummarizer.
2026-02-27 12:09:00 -05:00
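A sketch of the timeout behavior from this commit and the related "Add summarization timeout" commit further down, assuming a hypothetical `SummarizerSketch` class; only `DEFAULT_SUMMARIZATION_TIMEOUT` (120 s) comes from the commit message. `asyncio.wait_for` bounds the call, and the `finally` clause resets the in-progress flag so a timeout cannot block later summarizations.

```python
import asyncio
from typing import Awaitable, Callable, Optional

DEFAULT_SUMMARIZATION_TIMEOUT = 120.0  # seconds, per the commit message


class SummarizerSketch:
    """Hypothetical summarizer; not Pipecat's LLMContextSummarizer."""

    def __init__(self, timeout: Optional[float] = None) -> None:
        self.timeout = timeout
        self.in_progress = False

    async def summarize(self, generate: Callable[[], Awaitable[str]]) -> Optional[str]:
        # None no longer means "wait forever": fall back to the default.
        effective = self.timeout if self.timeout is not None else DEFAULT_SUMMARIZATION_TIMEOUT
        self.in_progress = True
        try:
            return await asyncio.wait_for(generate(), timeout=effective)
        except asyncio.TimeoutError:
            return None  # an error result instead of hanging
        finally:
            # Always reset so later summarizations are not blocked.
            self.in_progress = False


async def hung_llm() -> str:
    await asyncio.sleep(10)
    return "never"


s = SummarizerSketch(timeout=0.05)
print(asyncio.run(s.summarize(hung_llm)), s.in_progress)  # None False
```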
Mark Backman
82c249608f Move dedicated LLM summarization into LLMContextSummarizer
The dedicated LLM logic lived in LLMAssistantAggregator, creating two
code paths and requiring the aggregator to call a private LLMService
method. Move it into the summarizer which already owns the config and
summarization lifecycle, keeping the aggregator handler as a single-line
upstream push.
2026-02-27 12:09:00 -05:00
Mark Backman
98e737b4e9 Add tests for context summarization improvements
Cover summary message role, template, on_summary_applied event,
summarization timeout, and dedicated LLM routing/error handling.
2026-02-27 12:08:43 -05:00
Mark Backman
ec9ddb3199 Add changelog entries for context summarization improvements (#3855) 2026-02-27 12:07:34 -05:00
Mark Backman
712305c5b1 Add example 54c showing custom context summarization 2026-02-27 12:07:34 -05:00
Mark Backman
be8ea818c8 Add on_summary_applied event for observability
Emits a SummaryAppliedEvent after context summarization completes,
providing message counts so applications can track compression
metrics.
2026-02-27 12:07:34 -05:00
Mark Backman
50710e9c3f Add summarization timeout to prevent hung LLM calls
Adds a configurable summarization_timeout (default 120s) that cancels
summary generation if the LLM hangs. On timeout, an error result is
returned so _summarization_in_progress resets and future
summarizations are unblocked.
2026-02-27 12:07:34 -05:00
Mark Backman
a489bfaf00 Add optional dedicated LLM for context summarization
Adds an  field to LLMContextSummarizationConfig that allows
routing summarization to a separate LLM service (e.g., Gemini Flash)
instead of the pipeline's primary model. This avoids paying for
expensive inference when compressing context in long-running sessions.
2026-02-27 12:07:34 -05:00
Mark Backman
945a523eed Add configurable summary_message_template to LLMContextSummarizationConfig
Allows applications to customize how the summary is wrapped when
injected into context (e.g., XML tags, custom delimiters) so system
prompts can distinguish summaries from live conversation.
2026-02-27 12:07:34 -05:00
Mark Backman
790c434a08 Update summary message role: use user instead of assistant
The context summary is information provided to the assistant, not
something the assistant said.
2026-02-27 12:07:34 -05:00
Filipi da Silva Fuchter
db40a354be Merge pull request #3794 from omChauhanDev/fix/context-summarization-llm-specific-message
skipping provider-specific messages during summarization
2026-02-27 10:57:34 -05:00
filipi87
aa6d3b38b3 Add explanatory comments for LLMSpecificMessage guards in context summarization, and fixed the missing guard in LLMContextSummarizer._apply_summary when searching for the first system message. 2026-02-27 12:53:25 -03:00
Mark Backman
41d6470e4a Fix docs workflow: add auto-docs label, remove version info 2026-02-27 10:39:37 -05:00
Mark Backman
601822e3e5 Add changelog for PR #3865 2026-02-27 10:25:48 -05:00
Mark Backman
3a32d91c66 Set finalized flag on ElevenLabs Realtime STT transcriptions for manual commit strategy 2026-02-27 10:21:10 -05:00
Filipi da Silva Fuchter
35b3803ebc Merge pull request #3845 from pipecat-ai/filipi/fix_tts_speak_frame
Add TTSSpeakFrame.push_assistant_aggregation to force context flush after TTS.
2026-02-27 09:59:33 -05:00
filipi87
3b427a47b6 Fixing Piper test. 2026-02-27 11:57:11 -03:00
filipi87
d701c3427c Changelog entry for the TTSSpeakFrame fix. 2026-02-27 11:57:03 -03:00
filipi87
1f45e80f9d Updated the 52-live-translation.py example to demonstrate the fix 2026-02-27 11:56:52 -03:00
filipi87
bc6f8e51de Fixed TTSSpeakFrame not automatically committing spoken text to the conversation context when used outside of an LLM response (e.g., for bot greeting messages or injected speech) 2026-02-27 11:56:44 -03:00
filipi87
deba2515f9 Added a new LLMAssistantPushAggregationFrame control frame that signals LLMAssistantAggregator to immediately flush its text buffer to the conversation context 2026-02-27 11:56:36 -03:00
Mark Backman
127b52bad5 Merge pull request #3852 from pipecat-ai/mb/deprecate-processing-metrics
Deprecate processing metrics (ProcessingMetricsData)
2026-02-27 09:50:29 -05:00
Mark Backman
0697f72dae Merge pull request #3864 from pipecat-ai/mb/auto-docs-update
Add automated docs update workflow
2026-02-27 09:36:27 -05:00
Mark Backman
c259a6a73b Deprecate processing metrics (ProcessingMetricsData)
Add deprecation warnings to start_processing_metrics() and
stop_processing_metrics() on FrameProcessorMetrics and FrameProcessor.
Mark ProcessingMetricsData as deprecated in docstring. All existing
behavior is preserved — the warnings inform users that these will be
removed in a future version.
2026-02-27 09:22:29 -05:00
Mark Backman
3e04f5d05f Add GitHub Actions workflow to auto-update docs on PR merge
Runs Claude Code Action after PRs merge to main when source files
in services/transports/serializers/processors/audio/turns/observers/pipeline
are changed. Creates a docs PR on pipecat-ai/docs with targeted edits
following the existing update-docs skill instructions.
2026-02-27 09:18:15 -05:00
zack
cd07937c5d Fix missing imports: Add UserStartedSpeakingFrame and UserStoppedSpeakingFrame 2026-02-26 22:18:02 -05:00
zack
72934bd8ae Add u3-rt-pro support and improvements to AssemblyAI STT service
- Fix speaker diarization: Add field alias for speaker_label → speaker
  mapping in TurnMessage model
- Add warning for non-optimal min_end_of_turn_silence_when_confident
  values (recommends 100ms for best latency)
- Improve max_turn_silence override warning message clarity
- Update custom prompt warning (remove 88% accuracy claim)
- Add comprehensive logging for debugging:
  - Log final connection params after modifications
  - Log WebSocket URL and parsed parameters
  - Log speaker field in transcripts
  - Log text sent to LLM with speaker formatting
- Support dynamic configuration updates via STTUpdateSettingsFrame:
  - keyterms_prompt (when AssemblyAI API supports it)
  - prompt
  - max_turn_silence
  - min_end_of_turn_silence_when_confident
2026-02-26 22:04:21 -05:00
Mark Backman
2a6a993869 Merge pull request #3850 from rupesh-svg/fix/genesys-remove-audio-chunk-logging
Remove verbose audio chunk logging from GenesysAudioHookSerializer
2026-02-26 21:52:54 -05:00
Rupesh
bbaa79fef0 Add changelog for PR #3850 2026-02-26 14:00:34 -08:00
Rupesh
fff9db0d8f Remove verbose audio chunk logging from GenesysAudioHookSerializer
Fixes #3777
2026-02-26 13:51:05 -08:00
kompfner
7fe458fe59 Merge pull request #3817 from pipecat-ai/pk/service-settings-fix-back-compat-for-nested-external-sdk-types
Flatten `LiveOptions` into individual fields on `DeepgramSTTSettings`…
2026-02-26 11:08:27 -05:00
Paul Kompfner
faed775d90 Extract _DeepgramSTTSettingsBase with shared _merge_live_options_delta to deduplicate LiveOptions merge logic between __init__ and apply_update, and between the Deepgram STT and SageMaker variants; make top-level model/language take precedence over conflicting live_options values in updates; remove unnecessary Language enum-to-string conversion (Language is a StrEnum) 2026-02-26 11:02:44 -05:00
Mark Backman
b63ca524f5 Merge pull request #3806 from pipecat-ai/mb/ultravox-updates
Align Ultravox Realtime service with OpenAI/Gemini patterns
2026-02-26 10:49:21 -05:00
Mark Backman
907ff58d41 Align Ultravox Realtime service with OpenAI/Gemini patterns
- Add InterruptionFrame handling with stop_all_metrics()
- Add processing metrics (start/stop) at response boundaries
- Fix agent transcript handling for voice and text modalities:
  - Voice mode: push LLMTextFrame (append_to_context=False) and
    TTSTextFrame for deltas, skip duplicated final text
  - Text mode: push LLMTextFrame with proper response lifecycle,
    no TTSTextFrame (downstream TTS handles audio)
- Add output_medium parameter to AgentInputParams and OneShotInputParams
- Improve TTFB measurement using VAD speech end time
- Update example with user turn strategies and transcript events
- Add text-only output example (50a-ultravox-realtime-text.py)
2026-02-26 10:44:36 -05:00
Mark Backman
97b93ebe57 Merge pull request #3696 from pipecat-ai/mb/streaming-tts-input
Improve streaming TTS input support, add TextAggregationMetricsData
2026-02-26 10:26:53 -05:00
Mark Backman
3ae173520e Code review feedback 2026-02-26 10:23:35 -05:00
Paul Kompfner
c184ac09b8 Inline _build_live_options into _connect in DeepgramSTTService and DeepgramSageMakerSTTService since it's trivial and only called from one place 2026-02-26 09:42:15 -05:00
Paul Kompfner
3c20eda8bf Keep model/language in LiveOptions at construction time so apply_update's bidirectional sync is sufficient; simplify _build_live_options to only add sample_rate 2026-02-26 09:32:52 -05:00
Mark Backman
d69a337def Add text_aggregation_mode parameter to TTSService
Move the sentence vs token aggregation concern into text aggregators
so all text flows through them regardless of mode. This enables
pattern detection and tag handling to work in TOKEN mode.

- Add TextAggregationMode enum (SENTENCE, TOKEN) as the user-facing
  TTS setting, separate from the internal AggregationType
- Add TOKEN mode support to Simple, SkipTags, and PatternPair aggregators
- Add text_aggregation_mode parameter to TTSService and all TTS subclasses
- Deprecate aggregate_sentences in favor of text_aggregation_mode
- Merge TTSService._process_text_frame() into a single codepath
2026-02-26 08:55:41 -05:00
Mark Backman
f7434cdde1 Add text aggregation time metric for TTS sentence aggregation
Add TextAggregationMetricsData measuring the time from the first LLM
token to the first complete sentence, representing the latency cost of
sentence aggregation in the TTS pipeline.
2026-02-26 08:48:47 -05:00
Paul Kompfner
e21e8585f0 Add deepgram and sagemaker extras to CI test dependencies so Deepgram and Deepgram Sagemaker settings tests can run 2026-02-25 18:59:59 -05:00
Paul Kompfner
8b6aa4b912 Unflatten LiveOptions back into a single live_options field on DeepgramSTTSettings and DeepgramSageMakerSTTSettings; add apply_update override with delta-merge semantics and from_mapping override for backward-compatible dict-style updates 2026-02-25 18:25:11 -05:00
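The delta-merge semantics can be sketched with `dataclasses.replace`, assuming a hypothetical frozen `LiveOptionsSketch` dataclass in place of the SDK's `LiveOptions`. Fields named in the update change and all others are preserved; rejecting unknown keys is an assumption of this sketch, not something the commit states.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class LiveOptionsSketch:
    """Hypothetical subset of streaming options; not the Deepgram SDK's LiveOptions."""

    model: Optional[str] = None
    language: Optional[str] = None
    punctuate: Optional[bool] = None
    smart_format: Optional[bool] = None


def merge_delta(current: LiveOptionsSketch, delta: Mapping[str, Any]) -> LiveOptionsSketch:
    """Apply a sparse update: keys in `delta` change, every other field is kept."""
    known = {f.name for f in fields(current)}
    unknown = set(delta) - known
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return replace(current, **delta)


current = LiveOptionsSketch(model="model-a", language="en", punctuate=True)
print(merge_delta(current, {"language": "es"}))
# LiveOptionsSketch(model='model-a', language='es', punctuate=True, smart_format=None)
```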
Paul Kompfner
a4b6db6fb4 Flatten LiveOptions into individual fields on DeepgramSTTSettings and DeepgramSageMakerSTTSettings for backward-compatible dict-style updates via STTUpdateSettingsFrame; during the big service settings refactor, we accidentally got rid of the ability to update individual LiveOptions fields with a sparse update 2026-02-25 17:39:31 -05:00
Mark Backman
edc79d374a Merge pull request #3836 from pipecat-ai/mb/small-webrtc-prebuilt-2.3.0
Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0
2026-02-25 17:18:32 -05:00
Mark Backman
e521aef5df Merge pull request #3842 from pipecat-ai/mb/claude-plugin-docs
Add /update-docs skill to claude-plugin
2026-02-25 16:38:16 -05:00
kompfner
3cfff51205 Merge pull request #3827 from pipecat-ai/pk/gemini-tts-service-remove-model-ivar
Remove unnecessary `_model` ivar from `GeminiTTSService`, using `_set…
2026-02-25 16:14:38 -05:00
Paul Kompfner
3d8e3a4043 Remove unnecessary _model ivar from ElevenLabs STT services, using _settings.model instead. 2026-02-25 16:07:33 -05:00
Paul Kompfner
7ee0400c4c Remove unnecessary _model ivar from Hathora TTS and STT services, using _settings.model instead. 2026-02-25 16:07:26 -05:00
Paul Kompfner
781d191509 Remove unnecessary _model ivar from GeminiTTSService, using _settings.model instead 2026-02-25 15:59:38 -05:00
kompfner
a8cb2a26d1 Merge pull request #3841 from pipecat-ai/pk/groq-tweaks
A few Groq-related tweaks:
2026-02-25 15:54:33 -05:00
kompfner
b1df1ba5d4 Merge pull request #3834 from pipecat-ai/pk/make-ai-service-exclusive-syncer-of-model-name-to-metrics
Make it so that `AIService` is the exclusive "syncer" of model name t…
2026-02-25 15:53:59 -05:00
Mark Backman
eee2ef7e85 Add /update-docs skill to claude-plugin 2026-02-25 15:45:16 -05:00
Paul Kompfner
ff0f3dce32 A few Groq-related tweaks:
- Wire up passing speed setting to Groq, even though only a value of 1.0 is supported today
- Update the 55y example to switch voices instead of changing speed
- Add a 55zn example to exercise runtime updates of Groq STT
2026-02-25 15:10:48 -05:00
Paul Kompfner
bca42f7d68 Fix Hathora 55 series examples, and fix Hathora missing settings field warning 2026-02-25 14:48:40 -05:00
Paul Kompfner
27940d83a2 Make it so that AIService is the exclusive "syncer" of model name to metrics.
The only (rare) exception, where a service still needs to call `self._sync_model_name_to_metrics()` directly, is when the model name needs to be "pulled" from another field (or nested field) in settings up to settings.model on a settings update. This only occurs in Deepgram services, where we use the voice as the model name.

This change has the side-effect of bringing model name to metrics for a number of services that were accidentally omitting it before.
2026-02-25 14:48:24 -05:00
Mark Backman
937c691f2a Merge pull request #3838 from pipecat-ai/mb/remove-playht
Remove PlayHT TTS services
2026-02-25 14:34:15 -05:00
Mark Backman
6803d38d3f Merge pull request #3833 from pipecat-ai/mb/add-performance-changelog-fragment
Add Performance as a changelog fragment option
2026-02-25 14:33:52 -05:00
Mark Backman
44993fe9e3 Remove PlayHT TTS services 2026-02-25 14:12:39 -05:00
Mark Backman
0fe4c732b7 Merge pull request #3837 from alts/alts/append-trailing-space
Add `append_trailing_space` to all Rime websocket services
2026-02-25 13:35:07 -05:00
Stephen Altamirano
ceead60ef2 Add append_trailing_space to all Rime websocket services
This was added in 31daa889e8, but only
to `RimeTTSService`, not to `RimeNonJsonTTSService`. Bringing these
to parity means that users switching between the two, with the same
inputs, have more consistent vocalization behaviors.
2026-02-25 10:02:38 -08:00
Mark Backman
e028194dbe Update the pipecat-ai-small-webrtc-prebuilt to 2.3.0 2026-02-25 12:23:13 -05:00
Mark Backman
81f4672535 Add Performance as a changelog fragment option 2026-02-25 09:47:42 -05:00
Mark Backman
9273b158ea Merge pull request #3825 from pipecat-ai/mb/llm-user-aggregator-interim-transcription
Consume InterimTranscriptionFrame and TranslationFrame in LLMUserAggregator
2026-02-25 09:06:34 -05:00
Mark Backman
353a28842c Merge pull request #3807 from pipecat-ai/mb/update-openai-realtime-1.5
Update OpenAI Realtime default model to gpt-realtime-1.5
2026-02-25 09:06:19 -05:00
Mark Backman
3e6c59c736 Merge pull request #3809 from pipecat-ai/mb/krisp-viva-result
Add Krisp API key support and debug logging
2026-02-25 09:05:12 -05:00
Mark Backman
0ca8c850fb Add TurnMetricsData and e2e processing time for KrispVivaTurn
Introduce a generic TurnMetricsData class for turn detection metrics,
replacing the service-specific SmartTurnMetricsData (now deprecated).
Add end-to-end processing time measurement to KrispVivaTurn, tracking
the interval from VAD speech-to-silence transition to model threshold
crossing. Consume metrics in the strategy's _handle_input_audio path
so they are pushed immediately when fresh.
2026-02-25 09:01:21 -05:00
Mark Backman
73ee4da7d4 Add Krisp API key support for new SDK licensing requirement
The Krisp VIVA SDK v1.8.0 requires a license key in globalInit(). Add
api_key parameter to KrispVivaSDKManager, KrispVivaTurn, and
KrispVivaFilter with fallback to KRISP_API_KEY env var. Maintain
backwards compatibility with older SDK versions by catching TypeError
and falling back to the old 3-arg signature.
2026-02-25 09:01:00 -05:00
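A sketch of the backward-compatibility fallback, assuming hypothetical `old_init`/`new_init` functions in place of the Krisp SDK's `globalInit()`, whose real parameters are not listed in this log. One trade-off of catching `TypeError`: it also swallows a `TypeError` raised inside a newer SDK's init, so a real implementation may prefer checking the SDK version.

```python
import os
from typing import Any, Callable, Optional


def call_global_init(
    global_init: Callable[..., Any], *base_args: Any, api_key: Optional[str] = None
) -> bool:
    """Pass a license key if the SDK accepts one; return whether it was passed."""
    key = api_key or os.environ.get("KRISP_API_KEY", "")
    try:
        global_init(*base_args, key)  # newer SDKs: base args plus license key
        return True
    except TypeError:
        # Older SDKs accept only the original three arguments.
        global_init(*base_args)
        return False


def old_init(a, b, c):  # hypothetical 3-arg signature
    pass


def new_init(a, b, c, license_key):  # hypothetical 4-arg signature
    pass


print(call_global_init(new_init, 1, 2, 3, api_key="k"), call_global_init(old_init, 1, 2, 3))  # True False
```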
Filipi da Silva Fuchter
2f60074da3 Merge pull request #3814 from pipecat-ai/filipi/fix_close_context
Fixed an issue where the TTS providers did not close the context after the audio context finished processing all audio.
2026-02-25 08:21:04 -05:00
filipi87
751b1b8100 Adding the changelog entries for the tts fixes. 2026-02-25 10:18:25 -03:00
filipi87
d899f0af11 Refactored all AudioContextTTSService based providers to override the new callbacks instead of _handle_interruption(), making provider-specific cleanup cleaner and more explicit 2026-02-25 10:18:16 -03:00
filipi87
c09ae6ba6d Added two new lifecycle callbacks to AudioContextTTSService: on_audio_context_interrupted() and on_audio_context_completed() 2026-02-25 10:17:54 -03:00
Mark Backman
a187a4b3b2 Merge pull request #3830 from pipecat-ai/aleix/restore-dev-skills 2026-02-25 06:33:16 -05:00
Aleix Conchillo Flaqué
68e19a730b Restore dev skills and add marketplace for maintainer workflows
Brings back the 6 development workflow skills (changelog, cleanup,
code-review, docstring, pr-description, pr-submit) that were moved
to pipecat-ai/skills, and adds a .claude-plugin/marketplace.json so
other pipecat-ai repos can install them. Updates README contributing
section with installation instructions.
2026-02-24 23:47:06 -08:00
Mark Backman
67cb7d575f Merge pull request #3828 from pipecat-ai/mb/skip-empty-audio-filter-frames
Skip empty audio frames after filter buffering
2026-02-24 23:27:22 -05:00
Mark Backman
a84930dc3e Skip empty audio frames after filter buffering
Audio filters like RNNoise, KrispViva, and AIC return empty bytes while
buffering audio to accumulate their required frame size. These empty
frames were flowing downstream, causing misleading "Empty audio frame
received for STT service" warnings.

Skip the frame in BaseInputTransport when audio is empty, preventing
unnecessary processing in VAD and downstream processors.

Fixes #3517
2026-02-24 23:21:52 -05:00
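A minimal sketch of why buffering filters produce empty chunks and how skipping them keeps downstream stages quiet. `BufferingFilter` and `forward_filtered_audio` are hypothetical stand-ins for a filter such as RNNoise and for the input transport loop; they are not Pipecat classes.

```python
from typing import Iterable, List


class BufferingFilter:
    """Hypothetical filter that only emits audio in multiples of `frame_size` bytes."""

    def __init__(self, frame_size: int) -> None:
        self._frame_size = frame_size
        self._buffer = bytearray()

    def filter(self, audio: bytes) -> bytes:
        self._buffer.extend(audio)
        ready = (len(self._buffer) // self._frame_size) * self._frame_size
        if ready == 0:
            return b""  # still accumulating: nothing to emit yet
        out = bytes(self._buffer[:ready])
        del self._buffer[:ready]
        return out


def forward_filtered_audio(chunks: Iterable[bytes], audio_filter: BufferingFilter) -> List[bytes]:
    """Run chunks through the filter and forward only non-empty results."""
    forwarded: List[bytes] = []
    for chunk in chunks:
        filtered = audio_filter.filter(chunk)
        if not filtered:
            continue  # skip empty output instead of pushing it to VAD/STT
        forwarded.append(filtered)
    return forwarded
```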
kompfner
54fd73c460 Merge pull request #3821 from pipecat-ai/pk/fix-missing-field-warning-rime-tts
Fix missing field warning in `RimeTTSService`
2026-02-24 21:58:17 -05:00
kompfner
c954e1c898 Merge pull request #3820 from pipecat-ai/pk/fix-breakage-when-sending-generic-settings-update
Fix breakage when using a generic settings update (e.g. a `TTSSetting…
2026-02-24 21:58:05 -05:00
kompfner
db76cd052a Merge pull request #3819 from pipecat-ai/pk/make-update-settings-frames-uninterruptible
Make `ServiceUpdateSettingsFrame` uninterruptible—settings updates ar…
2026-02-24 21:57:41 -05:00
Mark Backman
167e68672b Add changelog for InterimTranscriptionFrame/TranslationFrame fix 2026-02-24 20:52:16 -05:00
Mark Backman
69d916ca51 Consume InterimTranscriptionFrame and TranslationFrame in LLMUserAggregator
These frames were falling through to the else branch and being pushed
downstream, unlike TranscriptionFrame which is explicitly consumed.
This aligns with how the assistant aggregator already filters them.
2026-02-24 20:51:41 -05:00
Mark Backman
9890b93d08 Merge pull request #3822 from pipecat-ai/fix/stt-ttfb-timeout-timestamp
Fix STT TTFB timeout measuring to use transcript arrival time
2026-02-24 19:45:15 -05:00
Mark Backman
f928206b3a Add changelog for STT TTFB timeout fix 2026-02-24 19:02:40 -05:00
Mark Backman
f421ad9cf6 Fix STT TTFB timeout measuring to timeout expiry instead of transcript time
PR #3776 replaced manual timestamp tracking with stop_ttfb_metrics() in
the timeout handler, but without an end_time it uses time.time() at
timeout expiry—producing TTFB = timeout + stop_secs (~2.2s) instead of
the actual transcript latency.

Restore _last_transcript_time tracking so the timeout handler measures
to when the transcript arrived, and skip reporting if none arrived.
2026-02-24 18:57:38 -05:00
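A minimal sketch of the corrected measurement, assuming the timeout handler knows when measurement started. `TTFBTimeoutTracker` is hypothetical; it only illustrates measuring to the recorded transcript arrival time rather than to the moment the timeout fires, and skipping the report when no transcript arrived.

```python
from typing import Optional


class TTFBTimeoutTracker:
    """Hypothetical helper: measure TTFB to transcript arrival, not timeout expiry."""

    def __init__(self) -> None:
        self._start_time: Optional[float] = None
        self._last_transcript_time: Optional[float] = None

    def start(self, now: float) -> None:
        # Begin a new measurement and forget any transcript from a previous turn.
        self._start_time = now
        self._last_transcript_time = None

    def on_transcript(self, now: float) -> None:
        # Remember when the transcript actually arrived.
        self._last_transcript_time = now

    def on_timeout(self, now: float) -> Optional[float]:
        """Return the TTFB to report, or None if nothing should be reported."""
        if self._start_time is None or self._last_transcript_time is None:
            return None  # no transcript arrived: skip reporting
        # Measuring to `now` here would report roughly the timeout duration instead.
        return self._last_transcript_time - self._start_time
```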
Paul Kompfner
d918a20b75 Fix missing field warning in RimeTTSService 2026-02-24 18:14:16 -05:00
Paul Kompfner
d91c230b85 Fix breakage when using a generic settings update (e.g. a TTSSettings) instead of a more specific one (e.g. a RimeTTSSettings). Both should work, assuming you're only changing fields present in the generic settings. 2026-02-24 18:05:27 -05:00
Paul Kompfner
b6f21ab15d Make ServiceUpdateSettingsFrame uninterruptible—settings updates are generally independent of specific utterances.
Before this change, settings updates were often not applied. For example, a `TTSUpdateSettingsFrame` queued while the bot was speaking would only have an effect at the end of the bot's reply, and any interruption before the end of the reply would "cancel" the update.
2026-02-24 17:47:53 -05:00
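A minimal sketch of the behavior change, assuming interruption handling drops queued frames unless they are marked uninterruptible. The `QueuedFrame` type and its `uninterruptible` flag are hypothetical; the log does not show how Pipecat marks such frames internally.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class QueuedFrame:
    name: str
    uninterruptible: bool = False


def handle_interruption(queue: Deque[QueuedFrame]) -> Deque[QueuedFrame]:
    """Drop pending frames on interruption, keeping only uninterruptible ones."""
    return deque(frame for frame in queue if frame.uninterruptible)
```

With this rule, a settings update queued while the bot is speaking survives an interruption instead of being discarded along with the rest of the reply.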
kompfner
6f0061ab96 Merge pull request #3812 from pipecat-ai/pk/service-settings-storage-v-delta-mode
Make clearer the distinction between "storage-mode" and "delta-mode" …
2026-02-24 15:37:49 -05:00
Aleix Conchillo Flaqué
761397d1f9 Merge pull request #3816 from pipecat-ai/aleix/use-skills-repo
Move skills to pipecat-ai/skills repo, add README instructions
2026-02-24 12:11:20 -08:00
Aleix Conchillo Flaqué
d9bb4d07c6 Merge pull request #3815 from pipecat-ai/fix/sentry-metrics-signatures
Fix SentryMetrics method signatures to match base class
2026-02-24 12:11:07 -08:00
Aleix Conchillo Flaqué
ee46cbce4c Move skills to pipecat-ai/skills repo, add README instructions
Remove bundled Claude Code skills (changelog, cleanup, code-review,
docstring, pr-description, pr-submit) that now live in
https://github.com/pipecat-ai/skills. Add a section to the README
with installation instructions. The update-docs skill remains as
it is specific to this repository.
2026-02-24 11:41:19 -08:00
Aleix Conchillo Flaqué
b4b9976b9c Fix SentryMetrics method signatures to match base class
Update start_ttfb_metrics, stop_ttfb_metrics, start_processing_metrics,
and stop_processing_metrics to accept start_time/end_time keyword
arguments matching the updated FrameProcessorMetrics signatures.

Closes #3808
2026-02-24 11:26:34 -08:00
Paul Kompfner
b78a293ffb Flatten input_params into individual fields on SonioxSTTSettings and GladiaSTTSettings
This makes each service-specific field individually visible to the delta/update mechanism (`apply_update`, `given_fields`) and removes the need for complex sync logic between `input_params` and top-level fields like `model`.

- Soniox: replace `input_params: SonioxInputParams` with 8 individual fields, simplify `_update_settings` by removing model sync logic, remove unused `is_given` import
- Gladia: replace `input_params: GladiaInputParams` with 11 individual fields, resolve deprecated `language` → `language_config` at init time rather than at `_prepare_settings` time
2026-02-24 14:01:43 -05:00
Paul Kompfner
0a89d24f70 Update some more services to ensure that there are no un-initialized fields in self._settings 2026-02-24 14:01:43 -05:00
Paul Kompfner
8c9ccf8f82 Bump various deprecation messages from mentioning version 0.0.103 to 0.0.104 2026-02-24 14:01:43 -05:00
Paul Kompfner
bcc2b4def4 Make clearer the distinction between "storage-mode" and "delta-mode" usage of *Settings objects
- Storage mode: for use in `self._settings`. All fields should be specified, i.e. should not be `NOT_GIVEN`.
- Delta mode: for use in `*UpdateSettingsFrame`.

In service of this, this commit:
- Adds a runtime check that all fields are specified in storage mode
- Updates all services to specify all fields in stored settings
- Updates all services to no longer check for `is_given` in stored settings (not necessary anymore)
- Updates relevant docstrings
- Renames `update` to `delta` in `*UpdateSettingsFrame`
- Updates community integrations guide
2026-02-24 14:01:28 -05:00
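A minimal sketch of the two modes, assuming a `NOT_GIVEN` sentinel and a settings dataclass. `ExampleTTSSettings`, `validate_storage_mode`, and `apply_delta` are hypothetical names. The sketch returns a dict mapping each changed field to its previous value and records only fields whose value actually differs; Pipecat's own `apply_update` may differ in those details.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Union


class _NotGiven:
    """Sentinel type meaning 'this field was not specified'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class ExampleTTSSettings:
    model: Union[str, _NotGiven] = NOT_GIVEN
    voice: Union[str, _NotGiven] = NOT_GIVEN
    speed: Union[float, _NotGiven] = NOT_GIVEN

    def validate_storage_mode(self) -> None:
        """Storage mode: every field must hold a real value."""
        missing = [f.name for f in fields(self) if isinstance(getattr(self, f.name), _NotGiven)]
        if missing:
            raise ValueError(f"stored settings missing fields: {missing}")

    def apply_delta(self, delta: "ExampleTTSSettings") -> Dict[str, Any]:
        """Delta mode: copy only the given fields; return {field: old_value}."""
        changed: Dict[str, Any] = {}
        for f in fields(delta):
            new_value = getattr(delta, f.name)
            if isinstance(new_value, _NotGiven):
                continue  # field is not part of this update
            old_value = getattr(self, f.name)
            if new_value != old_value:
                changed[f.name] = old_value
                setattr(self, f.name, new_value)
        return changed
```

Keeping `NOT_GIVEN` out of stored settings is what lets service code read `self._settings` fields directly, without `is_given` checks.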
Filipi da Silva Fuchter
57d25c564c Merge pull request #3786 from pipecat-ai/filipi/refactor_word_tts_service
Refactoring the services using the WordTTSService
2026-02-24 13:53:58 -05:00
filipi87
6cda2ff941 Changelog entry for word timestamp refactor and deprecation notes. 2026-02-24 15:49:02 -03:00
filipi87
323477bfa4 Refactoring the services using the WordTTSService. 2026-02-24 15:48:46 -03:00
Mark Backman
fa692ec989 Merge pull request #3813 from pipecat-ai/mb/fix-stt-ttfb
Fix STT TTFB metrics for Soniox and AWS Transcribe
2026-02-24 13:12:32 -05:00
Mark Backman
23ad181515 Fix Soniox processing metrics to measure token-to-transcript time
Move start_processing_metrics from run_stt (called per audio chunk,
producing noisy 0ms logs) to _receive_messages when the first final
token arrives for a new utterance. The existing stop_processing_metrics
in send_endpoint_transcript completes the pair, giving a meaningful
measurement of time from first recognition to finalized transcript.
2026-02-24 13:09:29 -05:00
Mark Backman
6f7664846c Add can_generate_metrics to Soniox and AWS Transcribe STT services
Commit 859cd7c9 refactored STT TTFB measurement to use the base class
start_ttfb_metrics/stop_ttfb_metrics, which are gated behind
can_generate_metrics(). Soniox and AWS Transcribe never overrode this
method (default returns False), so TTFB was silently never reported.
2026-02-24 12:59:44 -05:00
Mark Backman
081aaa50dc Merge pull request #3811 from pipecat-ai/mb/nltk-upgrade
Bump nltk minimum version from 3.9.1 to 3.9.3
2026-02-24 10:28:32 -05:00
Mark Backman
aff8ab8a40 Update OpenAI Realtime default model to gpt-realtime-1.5 2026-02-24 09:07:31 -05:00
Mark Backman
0f7e6e14ab Bump nltk minimum version from 3.9.1 to 3.9.3
Resolves a security vulnerability flagged by Dependabot (#164).
2026-02-24 08:56:00 -05:00
Mark Backman
65f563ad34 Add debug logging to KrispVivaTurn analyze_end_of_turn and update example
Move speech detection tracking outside the per-frame loop in append_audio
since is_speech applies to the whole buffer. Add debug log in
analyze_end_of_turn to show state and probability at decision time. Update
the Krisp VIVA example to use Cartesia TTS and turn analyzer strategy.
2026-02-23 21:35:35 -05:00
Mark Backman
9c2ac661a3 Merge pull request #3805 from pipecat-ai/mb/dataclass-basemodel
Add dataclass vs Pydantic BaseModel convention to CLAUDE.md
2026-02-23 19:32:31 -05:00
kompfner
cdd65b6c0a Merge pull request #3714 from pipecat-ai/pk/service-settings-refactor
Broad refactor of service settings and how they’re updated at runtime
2026-02-23 17:15:15 -05:00
Paul Kompfner
71fc078c24 Refine ServiceSettings docstring: clarify NOT_GIVEN semantics and fix frame reference
Use wildcard `*UpdateSettingsFrame` to cover all frame types. Clarify that NOT_GIVEN only appears in update deltas, not in the service's current settings state.
2026-02-23 16:55:11 -05:00
Paul Kompfner
7556427862 Revise changelog entries for service settings refactor
Split the single "changed" entry into separate "added", "changed", and "deprecated" entries for clarity. Add a note about the subtle behavior change in the deprecated set_model/set_voice/set_language methods.
2026-02-23 16:52:11 -05:00
Paul Kompfner
bcf11ecbd4 It looks like the Deepgram Sagemaker TTS services aren't yet able to successfully disconnect and reconnect to apply runtime settings updates. For now, mark them as not yet supporting runtime settings updates. 2026-02-23 16:02:00 -05:00
Paul Kompfner
ff174dd1c2 Fix STT/TTS Deepgram Sagemaker 55-series examples (examples updating settings at runtime) 2026-02-23 16:02:00 -05:00
Paul Kompfner
e804060e17 Update COMMUNITY_INTEGRATIONS.md _update_settings examples
Simplify the reconnect example to show a common pattern (reconnect on any change) and improve the _warn_unhandled_updated_settings example to show selective handling of specific fields.
2026-02-23 15:45:00 -05:00
Paul Kompfner
30db5fea7c Clarify that ServiceSettings and subclasses represent runtime-updatable settings
Update docstrings for ServiceSettings, LLMSettings, TTSSettings, and STTSettings to make clear these capture only the subset of service configuration that can be changed while the pipeline is running via UpdateSettingsFrame, not all constructor parameters.
2026-02-23 15:38:57 -05:00
Mark Backman
c527e1f30f Add dataclass vs Pydantic BaseModel rule to CLAUDE.md
Document the existing convention: use @dataclass for frames and
internal pipeline data, use Pydantic BaseModel for configuration,
parameters, metrics, and external API data.
2026-02-23 14:26:16 -05:00
Paul Kompfner
029f3dbefb Updating 55o ElevenLabsTTSService example to also exercise switching voices, which requires reconnect 2026-02-23 12:08:13 -05:00
kompfner
03cb0054f9 Merge branch 'main' into pk/service-settings-refactor 2026-02-23 11:46:03 -05:00
Mark Backman
6a7e9358c6 Merge pull request #3803 from pipecat-ai/mb/inline-smart-turn-v3-deps
Inline local-smart-turn-v3 deps for Poetry compatibility
2026-02-23 09:29:52 -05:00
Mark Backman
6a3718d33d Inline local-smart-turn-v3 deps for Poetry compatibility
Replace self-referential `pipecat-ai[local-smart-turn-v3]` extra in core
dependencies with the actual packages (`transformers`, `onnxruntime`).
Self-referential extras are not supported by Poetry and cause dependency
resolution failures. Since these are required by the default turn stop
strategy (LocalSmartTurnAnalyzerV3), they belong in core dependencies.

- Remove `local-smart-turn-v3` optional extra from pyproject.toml
- Remove try/except ModuleNotFoundError guard (now always installed)
- Remove `--extra local-smart-turn-v3` from CI workflows
2026-02-23 09:00:36 -05:00
Om Chauhan
b390dc369c added changelog 2026-02-21 18:33:29 +05:30
Om Chauhan
a18aa738e0 fix(realtime): handle response_cancel_not_active as non-fatal 2026-02-21 18:26:31 +05:30
Om Chauhan
9476b5d184 added changelog 2026-02-21 17:35:08 +05:30
Om Chauhan
f49658de15 skipping provider-specific messages during summarization 2026-02-21 17:19:50 +05:30
Aleix Conchillo Flaqué
b67af19d47 Merge pull request #3793 from pipecat-ai/changelog-0.0.103
Release 0.0.103 - Changelog Update
2026-02-20 16:42:46 -08:00
aconchillo
6d9c07b945 Update changelog for version 0.0.103 2026-02-20 16:39:36 -08:00
Aleix Conchillo Flaqué
18429f80f1 github(changelog): allow performance type 2026-02-20 16:32:40 -08:00
Aleix Conchillo Flaqué
0a54dc9721 Merge pull request #3792 from pipecat-ai/aleix/update-anthropic-default-model
Update default Anthropic model to claude-sonnet-4-6
2026-02-20 16:28:08 -08:00
Aleix Conchillo Flaqué
521f669051 Add changelog entries for PR #3792 2026-02-20 16:18:21 -08:00
Aleix Conchillo Flaqué
abb20f34ba Update default Anthropic model to claude-sonnet-4-6
Update the default model in AnthropicLLMService and remove the
now-unnecessary explicit model from the function calling example.
2026-02-20 16:17:51 -08:00
Joshua Primas
d38b1d97d4 Added changelog 2026-02-20 16:13:44 -08:00
Joshua Primas
0b4568843b Improved logging + error handling + pipecat bot name usage 2026-02-20 15:59:52 -08:00
Joshua Primas
35aba4128c Adding the LemonSlice transport integration 2026-02-20 15:24:48 -08:00
Aleix Conchillo Flaqué
b1e72ad4b7 Merge pull request #3789 from pipecat-ai/aleix/fix-missing-await-and-interruption-hang
Fix missing await and interruption timeout hang
2026-02-20 14:59:11 -08:00
Aleix Conchillo Flaqué
f610fb95f9 Add changelog entries for PR #3789 2026-02-20 14:56:46 -08:00
Aleix Conchillo Flaqué
827032fefb Unblock push_interruption_task_frame_and_wait after timeout
When the InterruptionFrame does not complete within the timeout the
caller was stuck in an infinite loop logging warnings. Now the event
is set after the first timeout so the processor can continue.

Also adds a keyword timeout parameter so callers can customize the
wait duration.
2026-02-20 14:56:42 -08:00
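A minimal asyncio sketch of the "give up once instead of looping forever" behavior, with a keyword-only `timeout`. The function name and the 1.0-second default are hypothetical; the log does not state Pipecat's actual default.

```python
import asyncio
import logging

logger = logging.getLogger("interruption-sketch")


async def wait_for_interruption(done: asyncio.Event, *, timeout: float = 1.0) -> bool:
    """Wait for an interruption to finish; stop waiting after the first timeout.

    Returns True if `done` was set in time, False if the timeout expired.
    """
    try:
        await asyncio.wait_for(done.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        logger.warning("Interruption did not complete within %.2fs; continuing", timeout)
        # Set the event so this caller and any other waiters stop blocking.
        done.set()
        return False
```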
Aleix Conchillo Flaqué
af4ef95dc6 Fix missing await on add_audio_frames_message in Google audio examples
The method is async but was being called without await, silently
discarding the coroutine.
2026-02-20 14:24:22 -08:00
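A minimal sketch of the bug class fixed here. `ExampleContext` is a hypothetical stand-in that borrows the method name from the log; calling an `async def` method without `await` only creates a coroutine object, so its body never runs.

```python
import asyncio
from typing import Tuple


class ExampleContext:
    """Hypothetical stand-in for a context object with an async helper."""

    def __init__(self) -> None:
        self.messages = []

    async def add_audio_frames_message(self, audio: bytes) -> None:
        await asyncio.sleep(0)  # simulate async work before mutating state
        self.messages.append(audio)


async def demo() -> Tuple[int, int]:
    context = ExampleContext()
    coro = context.add_audio_frames_message(b"\x00\x01")  # bug: missing await
    before = len(context.messages)  # the coroutine body has not run
    coro.close()  # discard explicitly; otherwise Python warns it was never awaited
    await context.add_audio_frames_message(b"\x00\x01")  # fix: await the call
    after = len(context.messages)
    return before, after
```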
Paul Kompfner
af4226adbf Add changelog entries for service settings refactor PR #3714 2026-02-20 15:26:17 -05:00
Paul Kompfner
29e2a861dc Update AIService.set_model_name to AIService._sync_model_name_to_metrics to:
- indicate clearly that it's not meant for public use
- make it clear that `self._settings` is the single source of truth for model information
- set the stage for an upcoming change where `AIService` subclasses won't have to ever worry about explicitly calling an `AIService` method to sync model name to metrics

Across all services, switch from accessing `self._model_name` or `self.model_name` in favor of `self._settings.model`.
2026-02-20 15:02:50 -05:00
Paul Kompfner
f5b86d9cdc Actually, revert the change making it so that STTService takes model and language args at init time. It'll be up to the subclasses to append those to _settings (or better yet, provide their own service-specific _settings). This avoids rocking the boat too much. 2026-02-20 11:26:28 -05:00
Paul Kompfner
f4e9825c03 Remove self._voice_id from TTS Service implementations in favor of self._settings.voice 2026-02-20 10:52:57 -05:00
Paul Kompfner
5d8a5bf750 Add initialization of self._settings to service superclasses (STTService, TTSService, LLMService), using "generic" settings for those services (STTSettings, TTSSettings, LLMSettings) 2026-02-20 09:31:22 -05:00
Paul Kompfner
fb27642190 Add self._settings to 6 remaining services
- AWSNovaSonicLLMService: new `AWSNovaSonicLLMSettings` with `voice_id` and `endpointing_sensitivity`; remove `self._params` entirely, storing audio I/O config as plain instance variables
- NeuphonicHttpTTSService: reuse `NeuphonicTTSSettings`; use inherited `language` field instead of bespoke `lang_code`
- NvidiaTTSService: new `NvidiaTTSSettings` with `quality`
- PiperTTSService / PiperHttpTTSService: new `PiperTTSSettings` / `PiperHttpTTSSettings` (no extra fields)
- SpeechmaticsTTSService: new `SpeechmaticsTTSSettings` with `max_retries`

Also remove redundant `lang_code` from `NeuphonicTTSSettings` (both WS and HTTP services now use the inherited `TTSSettings.language` field, with automatic enum conversion via the base class).

HTTP services (Neuphonic HTTP, Piper HTTP, Speechmatics) don't override `_update_settings` since the base class applies changes to `self._settings` and subsequent requests read from it automatically.
2026-02-19 18:35:59 -05:00
Paul Kompfner
463ea3725b Update Deepgram Flux with the new service settings pattern 2026-02-19 17:12:24 -05:00
Paul Kompfner
6c609031ee Add more 55-series examples
Also:
- remove unnecessary pass-through `_update_settings` implementation in `FalSTTService`
- warn that `AsyncAITTSService` doesn't currently support runtime settings updates
- update how `GradiumTTSService._update_settings` checks for voice changes
- remove a couple of unnecessary args (because they specified defaults) in other examples
2026-02-19 16:46:14 -05:00
Paul Kompfner
ebb42a3c6d Fix forward reference crash in Google and Anthropic LLM ThinkingConfig
ThinkingConfig was defined as an inner class on the service but referenced in the Settings dataclass declared before the service class, causing a crash at import time. Move ThinkingConfig to a standalone class defined before Settings, and keep a class attribute alias for backward compatibility.
2026-02-19 15:06:48 -05:00
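A minimal sketch of the fix, using hypothetical `ExampleLLMSettings` and `ExampleLLMService` classes and a hypothetical `budget_tokens` field. On Python versions that evaluate class annotations eagerly, a dataclass field annotated with a class nested inside a service defined later raises `NameError` at import time. Defining the config first and aliasing it on the service avoids that while keeping the old attribute path working.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThinkingConfig:
    """Defined at module level, before any class that annotates with it."""

    budget_tokens: Optional[int] = None


@dataclass
class ExampleLLMSettings:
    # Safe: ThinkingConfig already exists when this annotation is evaluated.
    thinking: Optional[ThinkingConfig] = None


class ExampleLLMService:
    # Backward-compatible alias so `ExampleLLMService.ThinkingConfig` keeps working.
    ThinkingConfig = ThinkingConfig
```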
Paul Kompfner
cc54ff4708 Add more 55-series examples 2026-02-19 14:55:21 -05:00
Paul Kompfner
421696e1c2 Replace Any with specific types and add | _NotGiven to all *Settings field annotations across 49 service files
Every `*Settings` dataclass field whose default is `NOT_GIVEN` now carries `_NotGiven` in its type union so the type system accurately reflects the three-state semantics (real value, `None` where applicable, or not-yet-specified). Fields previously typed as bare `Any`, `str`, `float`, `bool`, `list`, `dict`, or `Optional[X]` are now narrowed to the specific type from the corresponding `InputParams` Pydantic model.
2026-02-19 11:28:29 -05:00
Paul Kompfner
a7edd8e441 Fix 55zp example 2026-02-18 17:15:22 -05:00
Paul Kompfner
2a07138abf Fix Grok Realtime dynamic session properties updating, and update corresponding 55zo example 2026-02-18 17:12:36 -05:00
Paul Kompfner
ad942f6e4c Update 55zn example (Ultravox dynamic settings updates) to exercise changing modality, which is a setting that supports dynamic updates 2026-02-18 16:33:05 -05:00
Paul Kompfner
97d34ef9e1 Update OpenAI Realtime to warn when you try to update settings that can't be updated dynamically.
Update corresponding example to demonstrate updating output modality.
2026-02-18 16:16:06 -05:00
Paul Kompfner
c054780477 Fix 55zh example 2026-02-18 15:59:34 -05:00
Paul Kompfner
88a2dbdb82 Update 55zf example to update a setting that is supported by the default Camb TTS model 2026-02-18 15:48:50 -05:00
Paul Kompfner
d386a0efda Update Sarvam TTS to apply all changes to settings, not just voice 2026-02-18 15:31:08 -05:00
Paul Kompfner
b718a23c17 Tweak 55zd example 2026-02-18 15:25:50 -05:00
Paul Kompfner
e38f7d9451 Fix 55zc example 2026-02-18 15:23:23 -05:00
Paul Kompfner
b00d454842 Fix Inworld TTS settings updating 2026-02-18 15:19:57 -05:00
Paul Kompfner
0fa51811ea Fix 55z example 2026-02-18 15:11:04 -05:00
Paul Kompfner
323ee00b83 Fix 55w example 2026-02-18 14:51:48 -05:00
Paul Kompfner
0c73b77327 Update Lmnt TTS to support updating settings dynamically 2026-02-18 14:47:38 -05:00
Paul Kompfner
416e1cf877 Update Rime TTS services to store voice in the standard settings.voice field, as opposed to the nonstandard speaker field 2026-02-18 14:46:47 -05:00
Paul Kompfner
b4c5cb258b Tweak 55r example to make the settings update more pronounced 2026-02-18 14:15:14 -05:00
Paul Kompfner
728a97ade3 Update Deepgram TTS to support updating settings dynamically 2026-02-18 14:11:51 -05:00
Paul Kompfner
28677ec829 Tweak 55p example to make the settings update more pronounced 2026-02-18 13:49:32 -05:00
Paul Kompfner
17886d14e8 Fix ElevenLabsTTSService settings update code 2026-02-18 13:47:02 -05:00
Paul Kompfner
caf5dacbe8 Update 55j example to avoid console warning 2026-02-18 12:37:50 -05:00
Paul Kompfner
b8b531b66a In Cartesia TTS service, we don't need to override _update_settings. Parent class handling is enough, as new settings are picked up on the next run_tts (no need to reconnect). 2026-02-18 12:37:34 -05:00
Paul Kompfner
a14690e3a0 Fix the 55i example 2026-02-18 11:55:14 -05:00
Paul Kompfner
d913d954db Fix SpeechmaticsSTTService settings update code, and augment test file to better exercise it 2026-02-18 11:34:52 -05:00
Paul Kompfner
e98bb1df66 Simplify 55* examples: inline the settings update directly in the on_client_connected handler instead of wrapping it in a separate async task 2026-02-18 11:06:33 -05:00
Paul Kompfner
a7ada79fd9 Fix ElevenLabsRealtimeSTTService:
- Move `CommitStrategy` up in the file so it could be used by `ElevenLabsRealtimeSTTSettings`
- Fix a bug where `run_stt` would erroneously try to reconnect if a reconnection was already in flight (like a reconnection triggered by `_update_settings`)
2026-02-18 10:50:53 -05:00
Paul Kompfner
7910f20e14 Update comment in Azure TTS explaining how we could support dynamic settings updates in the future 2026-02-18 10:07:33 -05:00
Paul Kompfner
d7d94a29f0 Add foundational examples (55) for runtime settings updates via *UpdateSettingsFrame
42 examples covering STT (13), TTS (21), LLM (4), and realtime (4) services. Each demonstrates updating service settings 10 seconds after client connects, verifying the typed settings machinery end-to-end for every provider.
2026-02-18 09:46:23 -05:00
Paul Kompfner
ce51df677c Add backward-compat _aliases and from_mapping overrides to TTS settings
The migration from plain-dict `self._settings` to typed dataclasses renamed keys and flattened nested dicts. The deprecated dict-based `TTSUpdateSettingsFrame(settings={...})` code path calls `from_mapping`, which silently dropped old keys into `extra`.

- Add `_aliases` so renamed flat keys (e.g. `sample_rate` → `fish_sample_rate`, camelCase Inworld keys) resolve correctly.
- Override `from_mapping` to destructure nested dicts (`output_format`, `prosody`, `audioConfig`, `voice_setting`, `audio_setting`) into their flat field equivalents.
- Fix AsyncAI constructor bug passing `output_format={...}` dict instead of individual `output_container`/`output_encoding`/`output_sample_rate` fields.
2026-02-17 17:07:14 -05:00
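A minimal sketch of the `_aliases` plus `from_mapping` approach for old dict-based settings. The class, the `voice_id` alias, and the nested `output_format` keys are hypothetical examples; the flat `output_*` field names echo the AsyncAI fields mentioned above, but the real per-service mappings are not shown in the log.

```python
from dataclasses import dataclass, field, fields
from typing import Any, ClassVar, Dict, Mapping, Optional


@dataclass
class ExampleTTSSettings:
    voice: Optional[str] = None
    output_container: Optional[str] = None
    output_encoding: Optional[str] = None
    output_sample_rate: Optional[int] = None
    extra: Dict[str, Any] = field(default_factory=dict)

    # Old flat key -> new field name (hypothetical example).
    _aliases: ClassVar[Dict[str, str]] = {"voice_id": "voice"}

    @classmethod
    def from_mapping(cls, settings: Mapping[str, Any]) -> "ExampleTTSSettings":
        flat: Dict[str, Any] = {}
        for key, value in settings.items():
            if key == "output_format" and isinstance(value, Mapping):
                # Destructure the old nested dict into flat output_* fields.
                for sub_key, sub_value in value.items():
                    flat[f"output_{sub_key}"] = sub_value
            else:
                flat[cls._aliases.get(key, key)] = value
        known = {f.name for f in fields(cls)} - {"extra"}
        kwargs = {k: v for k, v in flat.items() if k in known}
        # Anything still unrecognized lands in `extra` instead of being lost.
        leftovers = {k: v for k, v in flat.items() if k not in known}
        return cls(**kwargs, extra=leftovers)
```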
Paul Kompfner
68ebd3d063 Migrate HumeTTSService to standard TTSSettings pattern and remove dead TTSService.update_setting
HumeTTSService now stores its params (description, speed, trailing_silence) in a proper `HumeTTSSettings` dataclass instead of a separate `_params` Pydantic model, making it work with `TTSUpdateSettingsFrame(update=...)`. The old `update_setting(key, value)` method is kept but deprecated.

Also removes the unused no-op `TTSService.update_setting` base method, which was never called by the `TTSUpdateSettingsFrame` pipeline.
2026-02-17 15:44:41 -05:00
Radek Sedlák
5ea2d47d39 feat: Add support for private endpoint in Azure STT 2026-02-17 21:42:00 +01:00
Paul Kompfner
94a651cee2 Remove dead ServiceSettings.to_dict method 2026-02-17 15:15:18 -05:00
Paul Kompfner
1cad4210ce Deprecate dict-based *UpdateSettingsFrame(settings={...}) code path in STT, TTS, and LLM services.
The dataclass-based API (`*UpdateSettingsFrame(update=*Settings(...))`) is the preferred path since 0.0.103. The dict path still works but now emits a `DeprecationWarning`.
2026-02-17 15:09:39 -05:00
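A minimal sketch of keeping the dict path working while steering callers to the dataclass path with a `DeprecationWarning`. `ExampleSettings` and `resolve_update` are hypothetical; they only illustrate the pattern, not Pipecat's frame handling.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ExampleSettings:
    model: Optional[str] = None
    voice: Optional[str] = None


def resolve_update(
    settings: Optional[Mapping[str, Any]] = None,
    delta: Optional[ExampleSettings] = None,
) -> ExampleSettings:
    """Accept the preferred dataclass delta, or the deprecated dict path."""
    if delta is not None:
        return delta
    if settings is not None:
        warnings.warn(
            "Passing a settings dict is deprecated; pass a settings dataclass instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return ExampleSettings(**dict(settings))
    raise ValueError("either settings or delta is required")
```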
Paul Kompfner
1cec8d119d Expand language field docstrings to clarify storage invariant.
The union type reflects the input side; after construction and `_update_settings`, the stored value is always a service-specific string.
2026-02-17 14:57:38 -05:00
Paul Kompfner
7dc16b1d92 Type language fields and centralize conversion in STT services.
Change `TTSSettings.language` and `STTSettings.language` from `Any` to `Language | str | _NotGiven`. Add `language_to_service_language` base method and centralized `isinstance`-guarded conversion in `STTService._update_settings` (mirroring TTS). Update the TTS guard from `is not None` to `isinstance(…, Language)` so raw strings pass through unchanged.

Remove now-redundant per-service language conversion from `_update_settings` overrides (ElevenLabs, Azure, Fal, Whisper). Add `language_to_service_language` to Azure STT so the centralized conversion picks it up. Fix AWS and NVIDIA STT `__init__` to convert language at construction time, then simplify their runtime accessors to read `_settings.language` directly.
2026-02-17 14:49:26 -05:00
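A minimal sketch of the `isinstance`-guarded conversion. The `Language` enum, its members, and the mapping are hypothetical stand-ins, and the `language_to_service_language` signature is assumed from the description above. Enum members are converted to the service's own code, while raw strings are assumed to already be service-specific and pass through unchanged; an `is not None` guard would have sent those raw strings into the enum conversion too.

```python
from enum import Enum
from typing import Dict, Optional, Union


class Language(str, Enum):
    """Hypothetical stand-in for a framework-wide language enum."""

    EN_US = "en-US"
    ES = "es"


class ExampleSTTService:
    _LANGUAGE_MAP: Dict[Language, str] = {Language.EN_US: "en", Language.ES: "es"}

    def language_to_service_language(self, language: Language) -> Optional[str]:
        return self._LANGUAGE_MAP.get(language)

    def normalize_language(self, value: Union[Language, str]) -> Optional[str]:
        # Convert enum members; pass raw service-specific strings through unchanged.
        if isinstance(value, Language):
            return self.language_to_service_language(value)
        return value
```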
Paul Kompfner
d2372c127a Add specific type annotations to ServiceSettings fields, replacing Any with str, float, int unions as appropriate. 2026-02-17 11:56:37 -05:00
Paul Kompfner
3b1ba57452 Change apply_update / _update_settings return type from set[str] to dict[str, Any]. The dict maps each changed field name to its pre-update value, enabling services to do granular diffing of complex settings objects. Existing call-site patterns ("field" in changed, if changed, iteration) work unchanged; set-difference sites use changed.keys() - {...}. 2026-02-17 11:49:15 -05:00
Paul Kompfner
02c2778b8d Document _warn_unhandled_updated_settings pattern in COMMUNITY_INTEGRATIONS.md. 2026-02-17 11:08:26 -05:00
Paul Kompfner
fa6a6dabee Fix DeepgramSageMakerSTTService._update_settings live_options sync to match DeepgramSTTService pattern.
Add missing reverse sync (live_options → top-level model/language) and `set_model_name()` call.
2026-02-17 11:02:13 -05:00
Paul Kompfner
3a77b4c1d8 In services that don't handle runtime settings updates—or don't handle them for *all* available settings—log a warning about which fields specifically aren't handled. Revert new apply-settings-updates logic across various services, to reduce PR testing scope. This logic can be added service by service gradually as future work.
Note that for services that previously handled applying updates (through methods like `set_model` and `set_language`), we're keeping the update-applying logic (some or most of which is already well-tested) and expanding it to cover all relevant settings fields. Services under this bucket are:
- Deepgram STT
- Deepgram Sagemaker STT
- Elevenlabs STT
- Google STT
- Gradium STT
- OpenAI STT
- Speechmatics STT
2026-02-17 10:58:29 -05:00
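A minimal sketch of the "warn about unhandled fields, reconnect at most once" pattern. `ExampleSTTService`, `update_settings`, `_warn_unhandled`, and the plain-dict settings are hypothetical simplifications; the returned dict maps each changed field to its previous value, and the `changed.keys() - {...}` set difference mirrors the pattern described in 3b1ba57452.

```python
import asyncio
import logging
from typing import Any, Dict, Set

logger = logging.getLogger("settings-sketch")


class ExampleSTTService:
    """Hypothetical service that applies `model` and `language` by reconnecting."""

    RECONNECT_FIELDS: Set[str] = {"model", "language"}

    def __init__(self) -> None:
        self.settings: Dict[str, Any] = {"model": "m1", "language": "en", "punctuate": True}
        self.reconnects = 0

    async def _reconnect(self) -> None:
        self.reconnects += 1  # stand-in for closing and reopening a connection

    def _warn_unhandled(self, unhandled: Set[str]) -> None:
        logger.warning("Updated settings stored but not applied at runtime: %s", sorted(unhandled))

    async def update_settings(self, delta: Dict[str, Any]) -> Dict[str, Any]:
        """Apply `delta` and return {field: previous_value} for fields that changed."""
        changed: Dict[str, Any] = {}
        for key, value in delta.items():
            previous = self.settings.get(key)
            if previous != value:
                changed[key] = previous
                self.settings[key] = value
        unhandled = changed.keys() - self.RECONNECT_FIELDS
        if unhandled:
            self._warn_unhandled(unhandled)
        if changed.keys() & self.RECONNECT_FIELDS:
            await self._reconnect()  # one reconnect, however many handled fields changed
        return changed
```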
Paul Kompfner
66b7b4a5d4 Update COMMUNITY_INTEGRATIONS.md for the new dataclass-based service settings pattern. 2026-02-13 16:04:49 -05:00
Paul Kompfner
b08548af9d Remove typed-settings migration scaffolding and rename _update_settings_from_typed to _update_settings.
Now that all services use typed `ServiceSettings` objects, this removes the interim scaffolding that supported both dict-based and typed settings paths in parallel. Specifically: removes old dict-based `_update_settings(settings: Mapping)` methods from base classes, removes `isinstance(self._settings, ServiceSettings)` guards, simplifies `process_frame` branching, and renames `_update_settings_from_typed` to `_update_settings` across all ~30 service implementations. Also renames the no-arg `_update_settings()` helper on realtime services to `_send_session_update()` to avoid collision, adds `from_mapping` overrides on `GoogleLLMSettings` and `AnthropicLLMSettings` for ThinkingConfig dict-to-object conversion, and replaces a broken no-arg `_update_settings()` call in Gemini Live with a TODO.
2026-02-13 15:12:26 -05:00
Paul Kompfner
ab92a0e1d7 Remove/deprecate service-specific set_model and set_voice overrides.
- NvidiaSTTService.set_model: convert to proper DeprecationWarning (model can't change at runtime for Riva streaming STT)
- NvidiaTTSService.set_model: same treatment for Riva TTS
- NvidiaSegmentedSTTService.set_model: remove — base class now routes through _update_settings_from_typed which re-creates the recognition config
- GeminiTTSService.set_voice: remove — move AVAILABLE_VOICES validation into _update_settings_from_typed so it fires on both legacy and new paths
2026-02-13 15:12:26 -05:00
Paul Kompfner
e37f2f99c4 Deprecate set_model, set_voice, and set_language in favor of *UpdateSettingsFrame. 2026-02-13 15:12:26 -05:00
Paul Kompfner
e43351f5f8 Add class-level _settings type annotations to all service classes for better editor support.
Standardize all STT, TTS, and LLM service classes to declare `_settings` with the narrowed Settings type as a class-level annotation. This gives editors and type checkers the specific type when hovering or autocompleting on `self._settings` in each service and its subclasses. Inline `self._settings: Type = ...` assignments are replaced with plain `self._settings = ...`.
2026-02-13 15:12:26 -05:00
Paul Kompfner
444cbb6499 Add turn-completion fields to LLMSettings and handle them in the typed-service-settings path.
`filter_incomplete_user_turns` and `user_turn_completion_config` were only handled in the legacy dict-based `_update_settings` code path. This adds them to `LLMSettings` and introduces `LLMService._update_settings_from_typed` so the typed path handles them too.
2026-02-13 15:12:26 -05:00
Paul Kompfner
8a4ab611be Broad service settings refactor, with the primary aim of making service settings discoverable and strongly-typed. Service settings can be updated at runtime with *UpdateSettingsFrames.
Does not (yet) touch `InputParams`, to avoid scope creep and touching something currently part of the public API. But there is a lot of overlap between `*Settings` object fields and `InputParams` fields.

Other than discoverability/typing, these are some other improvements brought by this refactor:
- There is now a single code path (see `_update_settings_from_typed`) where services can respond to settings changes (by, say, reconnecting if needed), improving maintainability and guaranteeing one and only one reconnection no matter which settings changed
- `set_language`/`set_model`/`set_voice`—which we're assuming are usable as public methods, though *not* recommended over `*UpdateSettingsFrame`—all use the same code path as settings updates. They're also now all consistent in that, if a service needs to respond to a change (by, say, reconnecting if needed), any of these methods will kick off that process. Note that this is technically a behavior change.
- Several services now properly react to changed settings by reconnecting:
  - `AWSTranscribeSTTService`
  - `AzureSTTService`
  - `SonioxSTTService`
  - `GladiaSTTService`
  - `AssemblyAISTTService`
  - `CartesiaSTTService`
  - `FishAudioTTSService` (would previously only reconnect when `model` changed)
  - `GoogleSTTService`
  - `SpeechmaticsSTTService` (which previously only handled *some* settings updates through a nonstandard public `update_params` method)
  - `GradiumSTTService`
  - `NvidiaSegmentedSTTService` (which previously only handled changes to language)
- Bookkeeping across various services has been reduced, mostly by deduping ivars; the `self._settings` ivar is treated as the source of truth

NOTE: I pretty much guarantee that there are services missed in this PR in terms of bringing to consistency with how updates are handled (like whether changes in certain fields trigger reconnects when they need to). We can squash remaining inconsistencies as we stumble onto them, service by service. The goal here is to get things *mostly* in order, and establish the infrastructure and patterns we'll need going forward.
2026-02-13 15:12:26 -05:00
590 changed files with 46748 additions and 14304 deletions

@@ -0,0 +1,27 @@
{
  "name": "pipecat-dev-skills",
  "owner": {
    "name": "Pipecat"
  },
  "metadata": {
    "description": "Development workflow skills for contributing to the Pipecat project",
    "version": "1.0.0"
  },
  "plugins": [
    {
      "name": "pipecat-dev",
      "description": "Development workflow skills for contributing to the Pipecat project",
      "version": "1.0.0",
      "source": "./",
      "skills": [
        "./.claude/skills/changelog",
        "./.claude/skills/cleanup",
        "./.claude/skills/code-review",
        "./.claude/skills/docstring",
        "./.claude/skills/pr-description",
        "./.claude/skills/pr-submit",
        "./.claude/skills/update-docs"
      ]
    }
  ]
}

@@ -32,6 +32,20 @@ Create changelog files for the important commits in this PR. The PR number is pr
6. Use ⚠️ emoji prefix for breaking changes.
7. **Write changes in user-facing terms first.** Lead with what users of the framework will notice: new APIs, changed behavior, new parameters, fixed bugs they might have hit, etc. Implementation details (internal refactoring, how something is wired up under the hood) can be included as secondary context after the user-facing description, but should never be the *only* content of a changelog entry when there is a user-visible effect.
**Good** (user-facing first, implementation detail as context):
```
- Turn completion instructions now persist correctly across full context updates when using `system_instruction`. Previously they were injected as a context system message, which caused warning spam and didn't survive context updates.
```
**Bad** (implementation detail only, no user-facing framing):
```
- Fixed turn completion instructions being injected as a context system message instead of using `system_instruction`.
```
Ask yourself: "If I'm a developer building on Pipecat, what would I notice changed?" Start there.
## Example
For PR #3519 with a new feature and a bug fix:
@@ -43,5 +57,5 @@ For PR #3519 with a new feature and a bug fix:
`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly.
- Fixed an issue where something was not working correctly in some user-visible scenario. The root cause was an internal implementation detail.
```

@@ -1,6 +1,6 @@
# Code Cleanup Skill
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecats architecture, coding standards, and example patterns**.
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
It focuses on **readability, correctness, performance, and consistency**, while avoiding breaking changes.
---
@@ -28,9 +28,9 @@ This skill analyzes all changes introduced in your branch and performs the follo
Invoke the skill using any of the following commands:
- Clean up my branch code
- Refactor the changes in my branch
- Review and improve my branch code
- "Clean up my branch code"
- "Refactor the changes in my branch"
- "Review and improve my branch code"
- `/cleanup`
---

@@ -3,21 +3,20 @@ name: docstring
description: Document a Python module and its classes using Google style
---
Document a Python module and its classes using Google-style docstrings following project conventions. The class name is provided as an argument.
Document a Python module or class using Google-style docstrings following project conventions. The argument can be a class name or a module path.
## Instructions
1. First, find the class in the codebase:
```
Search for "class ClassName" in src/pipecat/
```
1. Determine what to document based on the argument:
2. If multiple files contain that class name:
- List all matches with their file paths
- Ask the user which one they want to document
- Wait for confirmation before proceeding
**If a module path is provided** (e.g. `src/pipecat/audio/vad/vad_analyzer.py`):
- Use that file directly
3. Once the file is identified, read the module to understand its structure:
**If a class name is provided** (e.g. `VADAnalyzer`):
- Search for `class ClassName` in `src/pipecat/`
- If multiple files contain that class name, list all matches with their file paths, ask the user which one they want to document, and wait for confirmation
2. Once the file is identified, read the module to understand its structure:
- Identify all classes, functions, and important type aliases
- Understand the purpose of each component

@@ -157,7 +157,11 @@ After processing all mapped pairs, check for two kinds of gaps:
**Missing sections**: Mapped doc pages that are missing standard sections compared to the source. For example, a transport page with no Configuration section, or a service page with no InputParams table when the source defines `InputParams(BaseModel)`. Flag these and offer to add the missing sections.
If the user wants a new page, create it using this template structure:
If the user wants a new page, do all three of the following:
#### 8a: Create the doc page
Create the new `.mdx` file using this template structure:
```
---
title: "Service Name"
@@ -207,6 +211,53 @@ pip install "pipecat-ai[package-name]"
[Event table and example code]
```
#### 8b: Add to docs.json
Add the new page path to `DOCS_PATH/docs.json` in the correct navigation group. The path format is `server/services/{category}/{provider}` (without the `.mdx` extension).
Find the matching group in the navigation structure:
- **STT** → `"group": "Speech-to-Text"` under Services
- **TTS** → `"group": "Text-to-Speech"` under Services
- **LLM** → `"group": "LLM"` under Services
- **S2S** → `"group": "Speech-to-Speech"` under Services
- **Transport** → `"group": "Transport"` under Services
- **Serializer** → `"group": "Serializers"` under Services
- **Image generation** → `"group": "Image Generation"` under Services
- **Video** → `"group": "Video"` under Services
- **Memory** → `"group": "Memory"` under Services
- **Vision** → `"group": "Vision"` under Services
- **Analytics** → `"group": "Analytics & Monitoring"` under Services
Insert the new entry **alphabetically** within the group's `pages` array. For example, adding a new STT service "foo":
```json
{
  "group": "Speech-to-Text",
  "pages": [
    "server/services/stt/assemblyai",
    "server/services/stt/aws",
    ...
    "server/services/stt/foo",
    ...
  ]
}
```
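Step 8b's alphabetical insertion can be sketched in Python. `find_group` and `add_page` are hypothetical helpers, not part of the skill or of Pipecat. The sketch assumes groups appear as objects with `"group"` and `"pages"` keys somewhere in the `docs.json` navigation tree (as in the snippet above), and that each group's `pages` list holds only plain strings in sorted order.

```python
import bisect
from typing import Any, Optional


def find_group(node: Any, name: str) -> Optional[dict]:
    """Depth-first search for the first object whose "group" equals `name`."""
    if isinstance(node, dict):
        if node.get("group") == name and isinstance(node.get("pages"), list):
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_group(child, name)
        if found is not None:
            return found
    return None


def add_page(docs: dict, group_name: str, page: str) -> bool:
    """Insert `page` alphabetically; return False if the group is missing or the page exists."""
    group = find_group(docs, group_name)
    if group is None or page in group["pages"]:
        return False
    # Assumes `pages` is already sorted and contains only strings;
    # nested sub-groups (dicts) would make the comparison fail.
    bisect.insort(group["pages"], page)
    return True
```

In practice you would `json.load` the file, call `add_page(docs, "Speech-to-Text", "server/services/stt/foo")`, and write it back with `json.dump(docs, f, indent=2)`. Review the resulting diff, since re-serializing may alter the file's existing formatting.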
#### 8c: Add to supported-services.mdx
Add a new row to the correct category table in `DOCS_PATH/server/services/supported-services.mdx`.
Use this format:
```
| [DisplayName](/server/services/{category}/{provider}) | `pip install "pipecat-ai[package]"` |
```
To determine the correct values:
- **DisplayName**: Use the service's human-readable name (e.g., "ElevenLabs", "AWS Polly", "Google Gemini")
- **package**: Look at the service's `pyproject.toml` extras or the import pattern in the source code. For example, if the service is in `src/pipecat/services/foo/`, the package is typically `foo`.
- If no pip dependencies are required, use `No dependencies required` instead.
Insert the new row **alphabetically** within the table. Match the column alignment of the existing rows.
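The row format and alphabetical placement from Step 8c can be sketched as follows. `make_row` and `insert_row` are hypothetical helpers. The sketch assumes the table's first two lines are the header and separator, orders rows case-insensitively by their link text, and does not pad columns, so alignment still has to be matched by hand.

```python
import re
from typing import Optional

# Captures the display name from a row such as "| [ElevenLabs](/server/...) | ... |".
_NAME = re.compile(r"^\|\s*\[([^\]]+)\]")


def make_row(display_name: str, category: str, provider: str, package: Optional[str]) -> str:
    """Build a supported-services row; `package=None` means no pip dependencies."""
    link = f"[{display_name}](/server/services/{category}/{provider})"
    install = f'`pip install "pipecat-ai[{package}]"`' if package else "No dependencies required"
    return f"| {link} | {install} |"


def insert_row(table_lines: list[str], row: str) -> list[str]:
    """Return a new list with `row` placed alphabetically among the data rows."""
    new_name = _NAME.match(row).group(1).lower()
    header, body = table_lines[:2], table_lines[2:]
    index = len(body)  # default: append at the end
    for i, line in enumerate(body):
        match = _NAME.match(line)
        if match and match.group(1).lower() > new_name:
            index = i
            break
    return header + body[:index] + [row] + body[index:]
```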
### Step 9: Output summary
After all edits are complete, print a summary:
@@ -221,6 +272,9 @@ After all edits are complete, print a summary:
### Updated guides
- `guides/learn/speech-to-text.mdx` — Updated code example (renamed `old_param` → `new_param`)
### New service pages
- `server/services/tts/newprovider.mdx` — Created page, added to docs.json (Text-to-Speech), added to supported-services.mdx
### Unmapped source files
- `src/pipecat/services/newprovider/tts.py` — NewProviderTTSService (no doc page exists)
@@ -247,4 +301,6 @@ Before finishing, verify:
- [ ] New parameters have accurate types and defaults from source
- [ ] Formatting matches the existing page style
- [ ] Guides referencing changed APIs were checked and updated
- [ ] New service pages were added to `docs.json` in the correct group, alphabetically
- [ ] New service pages were added to `supported-services.mdx` in the correct table, alphabetically
- [ ] Unmapped files were reported to the user

@@ -37,11 +37,12 @@ jobs:
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra deepgram \
--extra google \
--extra langchain \
--extra livekit \
--extra local-smart-turn-v3 \
--extra piper \
--extra sagemaker \
--extra tracing \
--extra websocket

@@ -86,7 +86,7 @@ jobs:
fi
# Validate fragment types
VALID_TYPES="added changed deprecated removed fixed security other"
VALID_TYPES="added changed deprecated removed fixed performance security other"
INVALID_FRAGMENTS=""
for file in changelog/*.md; do

@@ -41,11 +41,12 @@ jobs:
uv sync --group dev \
--extra anthropic \
--extra aws \
--extra deepgram \
--extra google \
--extra langchain \
--extra livekit \
--extra local-smart-turn-v3 \
--extra piper \
--extra sagemaker \
--extra tracing \
--extra websocket

.github/workflows/update-docs.yml

@@ -0,0 +1,147 @@
name: Update Documentation on PR Merge

on:
  pull_request_target:
    types: [closed]
    branches: [main]
    paths:
      - "src/pipecat/services/**"
      - "src/pipecat/transports/**"
      - "src/pipecat/serializers/**"
      - "src/pipecat/processors/**"
      - "src/pipecat/audio/**"
      - "src/pipecat/turns/**"
      - "src/pipecat/observers/**"
      - "src/pipecat/pipeline/**"
  workflow_dispatch:
    inputs:
      pr_number:
        description: "PR number to generate docs for"
        required: true
        type: string

jobs:
  update-docs:
    if: >-
      github.event_name == 'workflow_dispatch' ||
      github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    timeout-minutes: 15
    permissions:
      contents: read
      pull-requests: read
      id-token: write

    steps:
      - name: Checkout pipecat
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Checkout docs
        uses: actions/checkout@v4
        with:
          repository: pipecat-ai/docs
          token: ${{ secrets.DOCS_SYNC_TOKEN }}
          path: _docs

      - name: Resolve PR number
        id: pr
        run: |
          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
            echo "number=${{ inputs.pr_number }}" >> "$GITHUB_OUTPUT"
          else
            echo "number=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
          fi

      - name: Update documentation
        uses: anthropics/claude-code-action@v1
        env:
          DOCS_SYNC_TOKEN: ${{ secrets.DOCS_SYNC_TOKEN }}
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          github_token: ${{ secrets.GITHUB_TOKEN }}
          prompt: |
            You are updating documentation for the pipecat-ai/docs repository based on
            changes merged in PR #${{ steps.pr.outputs.number }} of pipecat-ai/pipecat.

            ## Setup

            1. Read the skill instructions at `.claude/skills/update-docs/SKILL.md`
            2. Read the source-to-doc mapping at `.claude/skills/update-docs/SOURCE_DOC_MAPPING.md`
            3. The docs repository is checked out at `./_docs/`

            ## Get the diff

            Run `gh pr diff ${{ steps.pr.outputs.number }}` to see what changed in the PR.
            Also run `gh pr diff ${{ steps.pr.outputs.number }} --name-only` to get the list of changed files.
            Filter to source files matching the directories listed in SKILL.md Step 3.
            If no relevant source files were changed, exit with "No documentation changes needed."

            ## Follow the skill instructions

            Apply the SKILL.md workflow (Steps 3-9) with these adaptations for automation:

            ### Docs path

            Use `./_docs/` — it's already checked out. Do not ask for a path.

            ### Branch management

            - Branch name: `docs/pr-${{ steps.pr.outputs.number }}`
            - Work inside `./_docs/` for all doc edits and git operations
            - Check if the branch already exists on the remote:
              ```bash
              cd _docs && git fetch origin docs/pr-${{ steps.pr.outputs.number }} 2>/dev/null
              ```
            - If it exists: check it out (supports workflow re-runs)
            - If not: create it from main

            ### Git config

            Before committing in `_docs`, set:
            ```bash
            git config user.name "github-actions[bot]"
            git config user.email "github-actions[bot]@users.noreply.github.com"
            ```

            ### No interactive questions

            Do not ask questions. If you encounter gaps (unmapped files, missing sections,
            ambiguous changes), note them in the PR body under "## Gaps identified".

            ### Creating the docs PR

            After committing all changes in `_docs`, push and create a PR:
            ```bash
            cd _docs
            git push -u origin docs/pr-${{ steps.pr.outputs.number }}
            GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
              --repo pipecat-ai/docs \
              --label auto-docs \
              --title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
              --body "$(cat <<'BODY'
            Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).

            ## Changes

            <summarize each doc page updated and what changed>

            ## Gaps identified

            <any unmapped files, missing doc pages, or missing sections — or "None">
            BODY
            )"
            ```

            ### Re-run handling

            If `gh pr create` fails because a PR from that branch already exists,
            push the updated commits and use `gh pr edit` to update the body instead.

            ### No-op

            If after analyzing the diff you determine no documentation changes are needed
            (e.g., only skip-listed files changed, or changes don't affect public API docs),
            exit cleanly without creating a branch or PR. Output "No documentation changes needed."

            ## Important rules

            - Only modify files inside `./_docs/` — never modify pipecat source code
            - Follow the conservative editing rules from SKILL.md Step 6
            - Read each doc page fully before editing (SKILL.md Guidelines)
            - Use `GH_TOKEN=$DOCS_SYNC_TOKEN` for all `gh` commands targeting pipecat-ai/docs
          claude_args: |
            --model claude-sonnet-4-5-20250929
            --max-turns 30
            --allowedTools "Read,Write,Edit,Glob,Grep,Bash"

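The `paths:` trigger and the prompt's "filter to source files" step both reduce to "does a changed file live under one of these directories?". For a local dry run, that check can be approximated in Python. This is a hypothetical helper, not part of the workflow; it only handles patterns of the form `dir/**` (which is all the list above uses) via a prefix check, and it does not reproduce GitHub's full glob semantics.

```python
from collections.abc import Iterable

# Patterns copied from the `paths:` filter of update-docs.yml.
WATCHED_PATTERNS = (
    "src/pipecat/services/**",
    "src/pipecat/transports/**",
    "src/pipecat/serializers/**",
    "src/pipecat/processors/**",
    "src/pipecat/audio/**",
    "src/pipecat/turns/**",
    "src/pipecat/observers/**",
    "src/pipecat/pipeline/**",
)


def relevant_files(changed: Iterable[str], patterns: Iterable[str] = WATCHED_PATTERNS) -> list[str]:
    """Keep changed paths that fall under any `dir/**` pattern.

    Only the `dir/**` form is supported; other glob shapes raise ValueError.
    """
    prefixes = []
    for pattern in patterns:
        if not pattern.endswith("/**"):
            raise ValueError(f"unsupported pattern: {pattern!r}")
        prefixes.append(pattern[:-2])  # "src/pipecat/services/**" -> "src/pipecat/services/"
    return [path for path in changed if path.startswith(tuple(prefixes))]
```

The input could come from `gh pr diff <number> --name-only`, the same command the prompt runs, for example `relevant_files(output.splitlines())`.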
File diff suppressed because it is too large.

@@ -25,7 +25,7 @@ uv run pytest tests/test_name.py
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
towncrier build --draft --version Unreleased
uv run towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
@@ -74,7 +74,7 @@ All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
@@ -90,23 +90,26 @@ All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
### Key Directories
| Directory | Purpose |
|---------------------------|----------------------------------------------------|
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/`| Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
| Directory | Purpose |
| -------------------------- | -------------------------------------------------- |
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
- `@dataclass`: Frame types, context aggregator pairs, internal data containers
- `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
@@ -152,4 +155,3 @@ When adding a new service:
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.

@@ -25,7 +25,6 @@ Your repository must contain these components:
- **Source code** - Complete implementation following Pipecat patterns
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
- **README.md** - Must include:
  - Introduction and explanation of your integration
  - Installation instructions
  - Usage instructions with Pipecat Pipeline
@@ -110,7 +109,6 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Key requirements:
- **Frame sequence:** Output must follow this frame sequence pattern:
  - `LLMFullResponseStartFrame` - Signals the start of an LLM response
  - `LLMTextFrame` - Contains LLM content, typically streamed as tokens
  - `LLMFullResponseEndFrame` - Signals the end of an LLM response
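A small checker makes the required ordering concrete. It is a hypothetical helper, not one of Pipecat's test utilities. It works on frame class names (for example, `type(frame).__name__` collected from a test run), ignores unrelated frame types, and assumes several complete responses may appear back to back.

```python
from collections.abc import Iterable

START = "LLMFullResponseStartFrame"
TEXT = "LLMTextFrame"
END = "LLMFullResponseEndFrame"


def follows_llm_frame_pattern(frame_names: Iterable[str]) -> bool:
    """Check Start -> Text* -> End ordering, ignoring unrelated frames.

    Text frames are only valid inside an open response, responses cannot
    nest, and every response must be closed by the end of the sequence.
    """
    in_response = False
    for name in frame_names:
        if name == START:
            if in_response:
                return False  # a new response started before the previous one ended
            in_response = True
        elif name == TEXT:
            if not in_response:
                return False  # text emitted outside a response
        elif name == END:
            if not in_response:
                return False  # end without a matching start
            in_response = False
    return not in_response
```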
@@ -233,24 +231,137 @@ def can_generate_metrics(self) -> bool:
return True
```
### Dynamic Settings Updates
### Service Settings
STT, LLM, and TTS services support `ServiceUpdateSettingsFrame` for dynamic configuration changes. The base STTService has an `_update_settings()` method that handles settings, and the private `_settings` `Dict` is used to store settings and provide access to the subclass.
Every AI service (STT, LLM, TTS, image generation, etc.) exposes a **Settings dataclass** that serves two roles:
1. **Store mode** — the service's `self._settings` holds the current value of every runtime-updatable field.
2. **Delta mode** — an update frame (e.g. `TTSUpdateSettingsFrame`) specifies only the fields that should change; unspecified fields remain `NOT_GIVEN`.
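The store/delta split can be illustrated with a stripped-down model. `DemoSettings`, its `apply_update`, and the `NOT_GIVEN` sentinel below are simplified stand-ins written for this sketch, not Pipecat's actual classes. In particular, having `apply_update` return a dict of pre-update values is an assumption borrowed from the `_update_settings` contract this guide describes for runtime changes.

```python
from dataclasses import dataclass, field, fields
from typing import Any


class _NotGiven:
    """Sentinel type meaning "this field is not part of the update"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN: Any = _NotGiven()


@dataclass
class DemoSettings:
    """Hypothetical settings; every field defaults to NOT_GIVEN (delta mode)."""

    model: Any = field(default_factory=lambda: NOT_GIVEN)
    voice: Any = field(default_factory=lambda: NOT_GIVEN)
    speaking_rate: Any = field(default_factory=lambda: NOT_GIVEN)

    def apply_update(self, delta: "DemoSettings") -> dict[str, Any]:
        """Copy each given field of `delta` into self; return {name: previous value}."""
        changed: dict[str, Any] = {}
        for f in fields(delta):
            new = getattr(delta, f.name)
            if new is NOT_GIVEN:
                continue  # unspecified in the delta: leave the stored value alone
            old = getattr(self, f.name)
            if old != new:  # design choice: re-sending the current value is not a change
                changed[f.name] = old
                setattr(self, f.name, new)
        return changed


# Store mode: every field holds a real value.
store = DemoSettings(model="my-model-v1", voice="default-voice", speaking_rate=1.0)
# Delta mode: only `voice` is given.
changed = store.apply_update(DemoSettings(voice="custom-voice"))
print(changed)      # {'voice': 'default-voice'}
print(store.voice)  # custom-voice
```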
#### Defining your Settings class
Extend `STTSettings`, `TTSSettings`, `LLMSettings`, or `ImageGenSettings` (or, if your service directly subclasses `AIService`, `ServiceSettings`). The base classes already provide common fields (e.g. `model`, `voice`, `language`). You only need to add **service-specific knobs that should be runtime-updatable**:
```python
async def set_language(self, language: Language):
    """Set the recognition language and reconnect.

    Args:
        language: The language to use for speech recognition.
    """
    logger.info(f"Switching STT language to: [{language}]")
    self._settings["language"] = language
    await self._disconnect()
    await self._connect()
```

```python
from dataclasses import dataclass, field

from pipecat.services.settings import TTSSettings, NOT_GIVEN


@dataclass
class MyTTSSettings(TTSSettings):
    """Settings for MyTTS service.

    Parameters:
        speaking_rate: Speed multiplier (0.5–2.0).
    """

    speaking_rate: float | None = field(default_factory=lambda: NOT_GIVEN)
```
Note that, in this example, Deepgram requires the websocket connection be disconnected and reconnected to reinitialize the service with the new value. Consider if your service requires reconnection.
**What goes in Settings vs. `__init__` params:**
| Belongs in Settings | Stays as `__init__` params |
| -------------------------------------------------------- | ----------------------------------------- |
| Model name, voice, language | API keys, auth tokens |
| Service-specific tuning knobs (rate, pitch, temperature) | Base URLs, endpoint overrides |
| Anything users may want to change mid-session | Audio encoding, sample format |
| | Connection parameters (timeouts, retries) |
The rule of thumb: if a caller might send an update frame to change it at runtime, it belongs in Settings. Everything else is init-only config stored as `self._xxx`.
#### Wiring settings into `__init__`
Accept an **optional** `settings` parameter. Build a `default_settings` object with all fields set to real values, then merge any caller overrides with `apply_update`.
Add a `Settings` **class attribute** that points to your settings dataclass. This lets callers access the settings class through the service itself (e.g. `MyTTSService.Settings(...)`) without a separate import:
```python
from typing import Optional


class MyTTSService(TTSService):
    Settings = MyTTSSettings
    _settings: Settings

    def __init__(
        self,
        *,
        api_key: str,
        settings: Optional[Settings] = None,
        **kwargs,
    ):
        # 1. Defaults — every field has a real value (store mode).
        default_settings = self.Settings(
            model="my-model-v1",
            voice="default-voice",
            language="en",
            speaking_rate=1.0,
        )

        # 2. Merge caller overrides (only given fields win).
        if settings is not None:
            default_settings.apply_update(settings)

        # 3. Pass the fully-populated settings to the base class.
        super().__init__(settings=default_settings, **kwargs)

        # 4. Init-only config stored separately.
        self._api_key = api_key
```
This pattern lets callers override only what they care about:
```python
# Uses all defaults
svc = MyTTSService(api_key="sk-xxx")

# Overrides just the voice — access Settings through the service class
svc = MyTTSService(
    api_key="sk-xxx",
    settings=MyTTSService.Settings(voice="custom-voice"),
)
```
#### Reacting to runtime changes
AI services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
To react to runtime setting changes, override `_update_settings`. The base implementation applies the delta to `self._settings` and returns a `dict` mapping each changed field name to its **pre-update** value. Your override should call `super()` first, then act on the changed fields. A common implementation might look like:
```python
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
    """Apply a settings update, reconfiguring the connection if needed."""
    changed = await super()._update_settings(update)
    if not changed:
        return changed

    await self._disconnect()
    await self._connect()
    return changed
```
The dict keys work like a set for membership tests (`"language" in changed`) and truthiness (`if changed`). Use `changed.keys() - {"language"}` for set difference, or `changed["language"]` to inspect the previous value of a field.
Note that, in this example, the service reconnects whenever any setting changes. Consider, for each setting, whether your service requires reconnection or can apply changes in-place.
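The dict semantics described earlier are plain Python behavior, so they can be checked in isolation. The `changed` value here is a made-up example of what `_update_settings` might return.

```python
from typing import Any

# Made-up example: changed field name -> value before the update.
changed: dict[str, Any] = {"language": "en", "voice": "default-voice"}

if changed:  # a non-empty dict is truthy
    if "language" in changed:  # set-like membership test on the keys
        print(f"language changed (was {changed['language']!r})")
    others = changed.keys() - {"language"}  # dict_keys supports set difference
    print(sorted(others))
```

Running it prints `language changed (was 'en')` followed by `['voice']`. Note that `changed.keys() - {...}` yields an ordinary `set`, so its iteration order is not the dict's insertion order.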
If your service can't yet apply certain settings at runtime, call `self._warn_unhandled_updated_settings(changed)` with any unhandled field names so users get a clear log message:
```python
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
    changed = await super()._update_settings(update)
    if not changed:
        return changed

    if "language" in changed:
        await self._update_language()

    # TODO: this should be temporary - handle changes to other settings soon!
    unhandled = changed.keys() - {"language"}
    if unhandled:
        self._warn_unhandled_updated_settings(unhandled)

    return changed
```
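The sketch below combines these pieces by routing each changed field to one of three outcomes: apply in place, reconnect once, or warn. `DemoBaseService` and `DemoTTSService` are self-contained stand-ins, not Pipecat classes (settings are a plain dict, and connections are recorded as events). The split of fields into `RECONNECT_FIELDS` and `IN_PLACE_FIELDS` is an assumption that each real service has to decide for its own provider API.

```python
import asyncio
import logging
from typing import Any

logger = logging.getLogger(__name__)


class DemoBaseService:
    """Hypothetical stand-in for a service base class; settings are a plain dict."""

    def __init__(self, **settings: Any) -> None:
        self._settings: dict[str, Any] = dict(settings)

    async def _update_settings(self, update: dict[str, Any]) -> dict[str, Any]:
        # Record pre-update values for fields whose value actually differs, then apply them.
        changed = {k: self._settings.get(k) for k, v in update.items() if self._settings.get(k) != v}
        for name in changed:
            self._settings[name] = update[name]
        return changed

    def _warn_unhandled_updated_settings(self, names: set[str]) -> None:
        logger.warning("Settings not applied at runtime: %s", sorted(names))


class DemoTTSService(DemoBaseService):
    # Assumption: which fields need a reconnect depends on the provider's API.
    RECONNECT_FIELDS = frozenset({"model", "language"})
    IN_PLACE_FIELDS = frozenset({"speaking_rate"})

    def __init__(self, **settings: Any) -> None:
        super().__init__(**settings)
        self.events: list[str] = []  # records side effects for illustration

    async def _connect(self) -> None:
        self.events.append("connect")

    async def _disconnect(self) -> None:
        self.events.append("disconnect")

    async def _update_settings(self, update: dict[str, Any]) -> dict[str, Any]:
        changed = await super()._update_settings(update)
        if not changed:
            return changed

        if changed.keys() & self.IN_PLACE_FIELDS:
            self.events.append("apply-in-place")  # e.g. send a config message on the open socket
        if changed.keys() & self.RECONNECT_FIELDS:
            # A single reconnect, however many connection-level fields changed.
            await self._disconnect()
            await self._connect()

        unhandled = changed.keys() - self.RECONNECT_FIELDS - self.IN_PLACE_FIELDS
        if unhandled:
            self._warn_unhandled_updated_settings(unhandled)
        return changed


async def main() -> None:
    svc = DemoTTSService(model="m1", language="en", speaking_rate=1.0, voice="v1")
    changed = await svc._update_settings({"language": "fr", "speaking_rate": 1.2, "voice": "v2"})
    print(changed)     # {'language': 'en', 'speaking_rate': 1.0, 'voice': 'v1'}
    print(svc.events)  # ['apply-in-place', 'disconnect', 'connect']


if __name__ == "__main__":
    asyncio.run(main())
```

Here `voice` lands in the unhandled bucket and is logged, mirroring the role of `_warn_unhandled_updated_settings` above.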
### Sample Rate Handling
@@ -260,7 +371,7 @@ Sample rates are set via PipelineParams and passed to each frame processor at in
async def start(self, frame: StartFrame):
"""Start the service."""
await super().start(frame)
self._settings["output_format"]["sample_rate"] = self.sample_rate
self._settings.output_sample_rate = self.sample_rate
await self._connect()
```

@@ -49,12 +49,12 @@ Every pull request that makes a user-facing change should include a changelog en
```
2. Choose the appropriate type:
- `added.md` - New features
- `changed.md` - Changes in existing functionality
- `deprecated.md` - Soon-to-be removed features
- `removed.md` - Removed features
- `fixed.md` - Bug fixes
- `performance.md` - Performance improvements
- `security.md` - Security fixes
- `other.md` - Other changes (documentation, dependencies, etc.)
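Fragment names can be checked locally before CI runs. This is a sketch under stated assumptions: the accepted types mirror the `VALID_TYPES` list in the CI check shown earlier (now including `performance`), and the `<PR>.<type>.md` and `<PR>.<type>.<n>.md` shapes are inferred from examples in this document such as `changelog/3519.fixed.md` and `changelog/1234.changed.2.md`. The real CI script may apply different or additional rules.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

VALID_TYPES = frozenset(
    {"added", "changed", "deprecated", "removed", "fixed", "performance", "security", "other"}
)

# Assumed shapes: <PR>.<type>.md or <PR>.<type>.<n>.md
_FRAGMENT = re.compile(r"^(?P<pr>\d+)\.(?P<type>[a-z]+)(?:\.(?P<n>\d+))?\.md$")


def fragment_error(path: str) -> Optional[str]:
    """Return a human-readable problem with a fragment file name, or None if it looks valid."""
    name = PurePosixPath(path).name
    match = _FRAGMENT.match(name)
    if match is None:
        return f"{name}: expected <PR>.<type>.md or <PR>.<type>.<n>.md"
    if match["type"] not in VALID_TYPES:
        return f"{name}: unknown type {match['type']!r}"
    return None
```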
@@ -80,7 +80,6 @@ Every pull request that makes a user-facing change should include a changelog en
```markdown
- Updated service configuration:
  - Changed default timeout to 30 seconds
  - Added retry logic for failed connections
```
@@ -105,7 +104,6 @@ changelog/1234.changed.2.md
```markdown
- Updated service configuration:
  - Changed default timeout to 30 seconds
  - Added retry logic for failed connections
```

@@ -55,6 +55,20 @@ Looking for help debugging your pipeline and processors? Check out [Whisker](htt
Love terminal applications? Check out [Tail](https://github.com/pipecat-ai/tail), a terminal dashboard for Pipecat.
### 🤖 Claude Code Skills
Use [Pipecat Skills](https://github.com/pipecat-ai/skills) with [Claude Code](https://claude.ai/code) to scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace with:
```
claude plugin marketplace add pipecat-ai/skills
```
and install any of the available plugins.
### 🧩 Community Integrations
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
### 📺️ Pipecat TV Channel
Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
@@ -71,19 +85,20 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
## 🧩 Available services
| Category | Services |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/server/services/community-integrations) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
@@ -163,6 +178,15 @@ You can get started with Pipecat running on your local machine, then move your a
> **Note**: Some extras (local, gstreamer) require system dependencies. See the documentation if you encounter build errors.
### Claude Code Skills
Install development workflow skills for contributing to Pipecat with [Claude Code](https://claude.ai/code):
```
claude plugin marketplace add pipecat-ai/pipecat
claude plugin install pipecat-dev@pipecat-dev-skills
```
### Running tests
To run all tests, from the root directory:

@@ -1 +0,0 @@
- Fixed race condition where `RTVIObserver` could send messages before `DailyTransport` join completed. Outbound messages are now queued & delivered after the transport is ready.
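The fix amounts to gating outbound messages on transport readiness. Here is a minimal sketch of that pattern. It assumes a hypothetical `ReadyGatedSender` wrapper around an async send callable and is not the actual `RTVIObserver` or `DailyTransport` code:

```python
import asyncio
from typing import Awaitable, Callable, List


class ReadyGatedSender:
    """Hypothetical helper: buffers outbound messages until the transport is ready."""

    def __init__(self, send: Callable[[str], Awaitable[None]]) -> None:
        self._send = send  # the underlying async send function
        self._ready = False
        self._pending: List[str] = []

    async def send(self, message: str) -> None:
        if not self._ready:
            # Transport has not finished joining yet: keep the message for later.
            self._pending.append(message)
            return
        await self._send(message)

    async def mark_ready(self) -> None:
        # Flush queued messages in order *before* opening the gate, so any
        # message sent while flushing is appended to the queue instead of
        # jumping ahead of older ones.
        while self._pending:
            await self._send(self._pending.pop(0))
        self._ready = True
```

Flushing before setting `_ready` is the design choice that keeps ordering intact when other coroutines call `send()` during the flush.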

@@ -1 +0,0 @@
- Added `"timestampTransportStrategy": "ASYNC"` to `InworldAITTSService`. This lets timestamp information trail the arrival of audio chunks, resulting in much lower first-audio-chunk latency.

@@ -1 +0,0 @@
- Added model-specific `InputParams` to `RimeTTSService`: arcana params (`repetition_penalty`, `temperature`, `top_p`) and mistv2 params (`no_text_normalization`, `save_oovs`, `segment`). Model, voice, and param changes now trigger WebSocket reconnection.

@@ -1 +0,0 @@
- ⚠️ `RimeTTSService` now defaults to `model="arcana"` and the `wss://users-ws.rime.ai/ws3` endpoint. `InputParams` defaults changed from mistv2-specific values to `None` — only explicitly-set params are sent as query params.
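Sending "only explicitly-set params" can be sketched as dropping `None` values before building the WebSocket URL. The dataclass is hypothetical. The `model` and `voice` query keys are placeholders and are not Rime's documented parameter names. Only the endpoint URL comes from the entry above:

```python
from dataclasses import asdict, dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class ArcanaParamsSketch:
    """Hypothetical stand-in for model params; None means 'not set'."""

    repetition_penalty: Optional[float] = None
    temperature: Optional[float] = None
    top_p: Optional[float] = None


def build_ws_url(base_url: str, model: str, voice: str, params: ArcanaParamsSketch) -> str:
    # "model" and "voice" are placeholder query keys for illustration only.
    query = {"model": model, "voice": voice}
    # Forward only params the caller actually set; None values are skipped.
    query.update({k: v for k, v in asdict(params).items() if v is not None})
    return f"{base_url}?{urlencode(query)}"
```

Because model, voice, and params are encoded into the connection URL, changing any of them means opening a new WebSocket. That matches the reconnection behavior described in the previous entry.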

@@ -1,3 +0,0 @@
- `AICFilter` now shares read-only AIC models via a singleton `AICModelManager` in `aic_filter.py`.
- Multiple filters using the same model path or `(model_id, model_download_dir)` share one loaded model, with reference counting and concurrent load deduplication.
- Model file I/O runs off the event loop so the filter does not block.
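The sharing scheme above has four parts: one loaded model per key, reference counting, deduplicated concurrent loads, and file I/O off the event loop. It can be sketched with `asyncio` alone. `SharedModelCache` and its methods are hypothetical and are not the API of `AICModelManager`:

```python
import asyncio
from typing import Any, Callable, Dict, Hashable, List


class SharedModelCache:
    """Hypothetical reference-counted cache: one loaded model per key."""

    def __init__(self) -> None:
        # key -> [load task, reference count]
        self._entries: Dict[Hashable, List[Any]] = {}

    async def acquire(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            # First user: start a single load in a worker thread so blocking
            # file I/O does not stall the event loop.
            task = asyncio.create_task(asyncio.to_thread(loader))
            entry = [task, 0]
            self._entries[key] = entry
        entry[1] += 1
        # Concurrent callers await the same task, so the loader runs once.
        return await entry[0]

    def release(self, key: Hashable) -> None:
        entry = self._entries[key]
        entry[1] -= 1
        if entry[1] == 0:
            # Last user gone: forget the model so it can be garbage collected.
            del self._entries[key]
```

The check-and-insert in `acquire()` contains no `await`, so it is atomic with respect to other coroutines on the same loop. A production version would also evict entries whose load raised.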

@@ -1 +0,0 @@
- Fixed async generator cleanup in OpenAI LLM streaming to prevent `AttributeError` with uvloop on Python 3.12+ (MagicStack/uvloop#699).
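A common way to make async generator cleanup deterministic is to close the generator explicitly when the consumer stops early, rather than leaving finalization to garbage collection. The sketch below shows that general pattern with a hypothetical `token_stream`. It is not the actual OpenAI service change:

```python
import asyncio
from typing import AsyncGenerator, List, Optional


async def token_stream(log: List[str]) -> AsyncGenerator[str, None]:
    """Hypothetical stand-in for a streaming LLM response."""
    try:
        for token in ["Hel", "lo", "!"]:
            yield token
    finally:
        log.append("stream closed")


async def first_token(log: List[str]) -> Optional[str]:
    stream = token_stream(log)
    try:
        async for token in stream:
            return token  # stop early, as an interruption would
        return None
    finally:
        # Close explicitly so the generator's cleanup runs now, on this
        # event loop, instead of whenever it is garbage collected.
        await stream.aclose()
```

On Python 3.10+, `async with contextlib.aclosing(token_stream(log)) as stream:` expresses the same idea.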

@@ -1 +0,0 @@
- Added `X-User-Agent` and `X-Request-Id` headers to `InworldTTSService` for better traceability.

@@ -1 +0,0 @@
- Fixed `SmallWebRTCTransport` input audio resampling to properly handle all sample rates, including 8kHz audio.

@@ -1 +0,0 @@
- Fixed a race condition in `RTVIObserver` where bot output messages could be sent before the bot-started-speaking event.

@@ -1 +0,0 @@
- Added `write_transport_frame()` hook to `BaseOutputTransport` allowing transport subclasses to handle custom frame types that flow through the audio queue.

@@ -1 +0,0 @@
- Added `DailySIPTransferFrame` and `DailySIPReferFrame` to the Daily transport. These frames queue SIP transfer and SIP REFER operations with audio, so the operation executes only after the bot finishes its current utterance.

@@ -1 +0,0 @@
- `DailyUpdateRemoteParticipantsFrame` is no longer deprecated and is now queued with audio like other transport frames.

@@ -1 +0,0 @@
- Fixed Grok Realtime `session.updated` event parsing failure caused by the API returning prefixed voice names (e.g. `"human_Ara"` instead of `"Ara"`).

@@ -1 +0,0 @@
- Bumped Pillow dependency upper bound from `<12` to `<13` to allow Pillow 12.x.

@@ -1 +0,0 @@
- Fixed context ID reuse issue in `ElevenLabsTTSService`, `InworldTTSService`, `RimeTTSService`, `CartesiaTTSService`, `AsyncAITTSService`, and `PlayHTTTSService`. Services now properly reuse the same context ID across multiple `run_tts()` invocations within a single LLM turn, preventing context tracking issues and incorrect lifecycle signaling.

@@ -1 +0,0 @@
- Fixed word timestamp interleaving issue in `ElevenLabsTTSService` when processing multiple sentences within a single LLM turn.

@@ -1 +0,0 @@
- Added keepalive support to `SarvamSTTService` to prevent idle connection timeouts (e.g. when used behind a `ServiceSwitcher`).

@@ -1 +0,0 @@
- Moved STT keepalive mechanism from `WebsocketSTTService` to the `STTService` base class, allowing any STT service (not just websocket-based ones) to use idle-connection keepalive via the `keepalive_timeout` and `keepalive_interval` parameters.
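A transport-independent keepalive can be sketched as a background task that periodically checks how long the connection has been idle. The class is hypothetical. The parameter semantics are also an assumption: `keepalive_timeout` is read as the idle time before keepalives start, and `keepalive_interval` as how often idleness is checked.

```python
import asyncio
import time
from typing import Awaitable, Callable


class KeepaliveSketch:
    """Hypothetical idle-connection keepalive, independent of any transport."""

    def __init__(
        self,
        send_keepalive: Callable[[], Awaitable[None]],
        keepalive_timeout: float,
        keepalive_interval: float,
    ) -> None:
        self._send_keepalive = send_keepalive
        self._timeout = keepalive_timeout  # assumed: idle time before keepalives start
        self._interval = keepalive_interval  # assumed: how often idleness is checked
        self._last_activity = time.monotonic()

    def record_activity(self) -> None:
        # Call whenever real audio is sent to the service.
        self._last_activity = time.monotonic()

    async def run(self) -> None:
        # Runs until cancelled; sends a keepalive on every check while idle.
        while True:
            await asyncio.sleep(self._interval)
            if time.monotonic() - self._last_activity >= self._timeout:
                await self._send_keepalive()
```

Because the loop only needs an async `send_keepalive` callable, nothing in it depends on WebSockets. That is the point of hosting it in a shared base class.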

@@ -1,3 +0,0 @@
- Improved audio context management in `AudioContextTTSService` by moving context ID tracking to the base class and adding `reuse_context_id_within_turn` parameter to control concurrent TTS request handling.
- Added helper methods: `has_active_audio_context()`, `get_active_audio_context_id()`, `remove_active_audio_context()`, `reset_active_audio_context()`
- Simplified Cartesia, ElevenLabs, Inworld, Rime, AsyncAI, and Gradium TTS implementations by removing duplicate context management code
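Reusing one context ID across all TTS requests in a turn can be modelled as a small toy. It assumes the turn boundary is signalled explicitly, and it assumes that disabling reuse means every request gets a fresh ID. `TurnContextIds` and its methods are hypothetical and deliberately differ from the real helper method names listed above.

```python
import uuid
from typing import Optional


class TurnContextIds:
    """Hypothetical toy: hands out audio context IDs, reused within a turn."""

    def __init__(self, reuse_within_turn: bool = True) -> None:
        self.reuse_within_turn = reuse_within_turn
        self._active: Optional[str] = None

    def context_id_for_request(self) -> str:
        # Start a new context for the first request of a turn, or for every
        # request when reuse is disabled.
        if self._active is None or not self.reuse_within_turn:
            self._active = str(uuid.uuid4())
        return self._active

    def end_turn(self) -> None:
        # The next request starts a fresh context.
        self._active = None
```

Keeping the active ID in one shared place is what lets per-service code drop its own bookkeeping.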

@@ -1 +0,0 @@
- Deprecated unused `Traceable`, `@traceable`, `@traced`, and `AttachmentStrategy` in `pipecat.utils.tracing.class_decorators`. This module will be removed in a future release.

@@ -1 +0,0 @@
- Fixed tracing service decorators executing the wrapped function twice when the function itself raised an exception (e.g., LLM rate limit, TTS timeout).

@@ -1 +0,0 @@
- Fixed `LLMUserAggregator` broadcasting mute events before `StartFrame` reaches downstream processors.

@@ -1 +0,0 @@
- Fixed `UserIdleController` false idle triggers caused by gaps between user and bot activity frames. The idle timer now starts only after `BotStoppedSpeakingFrame` and is suppressed during active user turns and function calls.

@@ -1 +0,0 @@
- Added `UserIdleTimeoutUpdateFrame` to enable or disable user idle detection at runtime by updating the timeout dynamically.

@@ -1 +0,0 @@
- `UserIdleController` is now always created with a default timeout of 0 (disabled). The `user_idle_timeout` parameter changed from `Optional[float] = None` to `float = 0` in `UserTurnProcessor`, `LLMUserAggregatorParams`, and `UserIdleController`.
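The three `UserIdleController` changes can be modelled as a small state machine driven by explicit timestamps. The rules it encodes are:

- A timeout of 0 disables detection.
- The timer starts only after the bot stops speaking.
- The timer is suppressed during user turns and function calls.
- The timeout can be updated at runtime.

`IdleTimerSketch` and its event method names are hypothetical and are not the controller's real interface:

```python
from typing import Optional


class IdleTimerSketch:
    """Hypothetical model of the idle-timer rules described above."""

    def __init__(self, timeout: float = 0.0) -> None:
        self.timeout = timeout  # 0 disables idle detection
        self._started_at: Optional[float] = None
        self._user_turn_active = False
        self._function_calls = 0

    def update_timeout(self, timeout: float) -> None:
        # Runtime update, in the spirit of UserIdleTimeoutUpdateFrame.
        self.timeout = timeout
        if timeout <= 0:
            self._started_at = None

    def on_bot_stopped_speaking(self, now: float) -> None:
        # The timer only starts once the bot has finished speaking.
        if self.timeout > 0 and not self._user_turn_active and self._function_calls == 0:
            self._started_at = now

    def on_user_turn_started(self) -> None:
        self._user_turn_active = True
        self._started_at = None

    def on_user_turn_ended(self) -> None:
        self._user_turn_active = False

    def on_function_call_started(self) -> None:
        self._function_calls += 1
        self._started_at = None

    def on_function_call_finished(self) -> None:
        self._function_calls = max(0, self._function_calls - 1)

    def is_idle(self, now: float) -> bool:
        return self._started_at is not None and now - self._started_at >= self.timeout
```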

@@ -1 +0,0 @@
- Switched `GradiumTTSService` from `InterruptibleWordTTSService` to `AudioContextWordTTSService`, eliminating websocket disconnect/reconnect on every interruption by using `client_req_id`-based multiplexing.

@@ -1 +0,0 @@
- Changed the version specifier for the `speechmatics-voice` package from `>=0.2.8` to `~=0.2.8`, which accepts future 0.2.x patch releases while excluding 0.3 and later.

@@ -1 +0,0 @@
- Updated `InworldTTSService` and `InworldHttpTTSService` to use the `ASYNC` timestamp transport strategy by default.

@@ -1 +0,0 @@
- Fixed incorrect `sample_rate` assignment in `TavusInputTransport._on_participant_audio_data` (was using `audio.audio_frames` instead of `audio.sample_rate`).

@@ -1 +0,0 @@
- Added `broadcast_sibling_id` field to the base `Frame` class. This field is automatically set by `broadcast_frame()` and `broadcast_frame_instance()` to the ID of the paired frame pushed in the opposite direction, allowing receivers to identify broadcast pairs.

@@ -1 +0,0 @@
- Fixed `RTVIObserver` not processing upstream-only frames. Previously, all upstream frames were filtered out to avoid duplicate messages from broadcasted frames. Now only upstream copies of broadcasted frames are skipped.
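The sibling-ID scheme and the observer's filtering rule can be sketched together. The first entry describes the `broadcast_sibling_id` field. This entry describes skipping only the upstream copy of a broadcast pair. `FrameSketch`, `broadcast_pair`, and `should_observe` are hypothetical names, not Pipecat's frame classes:

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Tuple

_ids = itertools.count(1)


class Direction(Enum):
    DOWNSTREAM = 1
    UPSTREAM = 2


@dataclass
class FrameSketch:
    name: str
    id: int = field(default_factory=lambda: next(_ids))
    broadcast_sibling_id: Optional[int] = None


def broadcast_pair(name: str) -> Tuple[FrameSketch, FrameSketch]:
    # Push one copy each way and point each copy at the other.
    down, up = FrameSketch(name), FrameSketch(name)
    down.broadcast_sibling_id = up.id
    up.broadcast_sibling_id = down.id
    return down, up


def should_observe(frame: FrameSketch, direction: Direction) -> bool:
    # Skip only the upstream half of a broadcast pair; genuinely
    # upstream-only frames (no sibling) are still observed.
    return not (direction is Direction.UPSTREAM and frame.broadcast_sibling_id is not None)
```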

@@ -1 +0,0 @@
- Added `start_time` and `end_time` parameters to `start_ttfb_metrics()`, `stop_ttfb_metrics()`, `start_processing_metrics()`, and `stop_processing_metrics()` in `FrameProcessor` and `FrameProcessorMetrics`, allowing custom timestamps for metrics measurement. `STTService` now uses these instead of custom TTFB tracking.
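Optional timestamps let a caller measure from a moment recorded earlier instead of from "now". Here is a minimal sketch of that idea, using a hypothetical `TTFBTimerSketch` rather than the real `FrameProcessorMetrics` methods:

```python
import time
from typing import Optional


class TTFBTimerSketch:
    """Hypothetical timer whose start/stop accept caller-supplied timestamps."""

    def __init__(self) -> None:
        self._start: Optional[float] = None

    def start(self, start_time: Optional[float] = None) -> None:
        # Default to the current monotonic time, or use a timestamp the
        # caller captured earlier.
        self._start = time.monotonic() if start_time is None else start_time

    def stop(self, end_time: Optional[float] = None) -> Optional[float]:
        # Returns the elapsed seconds, or None if the timer was not started.
        if self._start is None:
            return None
        end = time.monotonic() if end_time is None else end_time
        elapsed = end - self._start
        self._start = None
        return elapsed
```

Usage: `t.start(start_time=10.0)` followed by `t.stop(end_time=10.25)` yields `0.25`. A service can therefore report a TTFB that began before the timer object was ever touched.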

@@ -1 +0,0 @@
- Added `ignored_sources` parameter to `RTVIObserverParams` and `add_ignored_source()`/`remove_ignored_source()` methods to `RTVIObserver` to suppress RTVI messages from specific pipeline processors (e.g. a silent evaluation LLM).

@@ -1 +0,0 @@
- Fixed mutable default arguments in `LLMContextAggregatorPair.__init__()` that could cause shared state across instances.
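Python evaluates default argument values once, at function definition time. A mutable default such as `[]` is therefore shared by every instance that relies on it. The class names below are hypothetical illustrations of the bug and the usual `None`-sentinel fix:

```python
from typing import List, Optional


class BuggyAggregator:
    def __init__(self, messages: List[str] = []):  # one list shared by all instances
        self.messages = messages


class FixedAggregator:
    def __init__(self, messages: Optional[List[str]] = None):
        # Create a fresh list per instance when none is passed.
        self.messages = messages if messages is not None else []
```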

@@ -1 +0,0 @@
- Fixed `DeepgramSageMakerSTTService` to properly track finalize lifecycle using `request_finalize()` / `confirm_finalize()` and use `is_final` (instead of `is_final and speech_final`) for final transcription detection, matching `DeepgramSTTService` behavior.

@@ -1 +0,0 @@
- Added `DeepgramSageMakerTTSService` for running Deepgram TTS models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the Deepgram TTS protocol (Speak, Flush, Clear, Close), interruption handling, and per-turn TTFB metrics.

@@ -1 +0,0 @@
- Fixed a race condition in `AudioContextTTSService` where the audio context could time out between consecutive TTS requests within the same turn, causing audio to be discarded.

@@ -42,7 +42,7 @@ This script:
- Creates a fresh virtual environment
- Installs all dependencies as specified in requirements files
- Handles conflicting dependencies (like grpcio versions for Riva and PlayHT)
- Handles conflicting dependencies (like grpcio versions for Riva)
- Builds the documentation in an isolated environment
- Provides detailed logging of the build process
@@ -74,7 +74,6 @@ start _build/html/index.html
├── index.rst # Main documentation entry point
├── requirements-base.txt # Base documentation dependencies
├── requirements-riva.txt # Riva-specific dependencies
├── requirements-playht.txt # PlayHT-specific dependencies
├── build-docs.sh # Local build script
└── rtd-test.py # ReadTheDocs test build script
```

@@ -86,9 +86,6 @@ GROK_API_KEY=...
# Groq
GROQ_API_KEY=...
# Hathora
HATHORA_API_KEY=...
# Heygen
HEYGEN_API_KEY=...
HEYGEN_LIVE_AVATAR_API_KEY=...
@@ -104,9 +101,14 @@ INWORLD_API_KEY=...
KRISP_MODEL_PATH=...
# Krisp Viva
KRISP_VIVA_API_KEY=...
KRISP_VIVA_FILTER_MODEL_PATH=...
KRISP_VIVA_TURN_MODEL_PATH=...
# LemonSlice
LEMONSLICE_API_KEY=...
LEMONSLICE_AGENT_ID=...
# LiveKit
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
@@ -146,10 +148,6 @@ KOALA_ACCESS_KEY=...
# Piper
PIPER_BASE_URL=...
# PlayHT
PLAYHT_USER_ID=...
PLAYHT_API_KEY=...
# Plivo
PLIVO_AUTH_ID=...
PLIVO_AUTH_TOKEN=...

@@ -39,7 +39,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = PiperHttpTTSService(
base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
base_url=os.getenv("PIPER_BASE_URL"),
aiohttp_session=session,
sample_rate=24000,
)
task = PipelineTask(

@@ -39,8 +39,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
aiohttp_session=session,
settings=RimeHttpTTSService.Settings(
voice="rex",
),
)
task = PipelineTask(

@@ -37,7 +37,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
task = PipelineTask(

@@ -29,7 +29,9 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
pipeline = Pipeline([tts, transport.output()])

@@ -37,7 +37,9 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
runner = PipelineRunner()

@@ -39,17 +39,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
messages = [
{
"role": "system",
"content": "You are an LLM in a WebRTC session, and this is a 'hello world' demo. Say hello to the world.",
}
]
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
task = PipelineTask(
Pipeline([llm, tts, transport.output()]),
@@ -59,7 +59,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Register an event handler so we can play the audio when the client joins
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
await task.queue_frames([LLMContextFrame(LLMContext(messages)), EndFrame()])
context = LLMContext()
context.add_message({"role": "user", "content": "Say hello to the world."})
await task.queue_frames([LLMContextFrame(context), EndFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

@@ -45,7 +45,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)

@@ -37,7 +37,9 @@ async def main():
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)

@@ -67,19 +67,19 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -109,7 +109,7 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -50,19 +50,19 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o")
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -91,7 +91,9 @@ async def main():
async def on_first_participant_joined(transport, participant):
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "user", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_participant_left")

@@ -55,24 +55,21 @@ async def main():
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),

@@ -16,11 +16,12 @@ from pipecat.frames.frames import (
Frame,
LLMContextFrame,
LLMFullResponseStartFrame,
OutputImageRawFrame,
TextFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.sync_parallel_pipeline import FrameOrder, SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.sentence import SentenceAggregator
@@ -30,6 +31,7 @@ from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.tts_service import TextAggregationMode
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -44,6 +46,18 @@ class MonthFrame(DataFrame):
return f"{self.name}(month: {self.month})"
class MarkImageForPlaybackSync(FrameProcessor):
"""Marks output image frames to be synchronized with audio playback."""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, OutputImageRawFrame):
frame.sync_with_audio = True
await self.push_frame(frame, direction)
class MonthPrepender(FrameProcessor):
def __init__(self):
super().__init__()
@@ -98,11 +112,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaHttpTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
# No need to aggregate by sentences (the default), as we already know we're getting full sentences
# (Otherwise the service will unnecessarily wait for follow-up input to confirm the sentence is complete,
# which, sadly, actually breaks the synchronization mechanism)
text_aggregation_mode=TextAggregationMode.TOKEN,
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
@@ -115,17 +137,26 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# that, each pipeline runs concurrently and `SyncParallelPipeline` will
# wait for the input frame to be processed.
#
# We use `FrameOrder.PIPELINE` so that each synchronized batch of output
# frames is pushed in the order the pipelines are listed: image first,
# then audio. This ensures the transport receives the image before the
# audio frames it should accompany.
#
# Note that `SyncParallelPipeline` requires the last processor in each
# of the pipelines to be synchronous. In this case, we use
# `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
# `FalImageGenService` and `CartesiaHttpTTSService` which make HTTP
# requests and wait for the response.
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
SyncParallelPipeline( # Run pipelines in parallel aggregating the result
[
imagegen, # Generate image
MarkImageForPlaybackSync(), # Mark image as needing sync w/audio during playback
],
[month_prepender, tts], # Create "Month: sentence" and output audio
[imagegen], # Generate image
frame_order=FrameOrder.PIPELINE,
),
transport.output(), # Transport output
]
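The ordering described in the comment block above (branches run concurrently per input, then each batch is emitted in the order the pipelines are listed) can be sketched with plain `asyncio`. This is an illustration under stated assumptions and not pipecat's `SyncParallelPipeline`. The coroutines `image_branch` and `audio_branch` are hypothetical stand-ins. The "pipeline order" behavior is modeled with `asyncio.gather`, which returns results in argument order regardless of which branch finishes first.

```python
import asyncio


# Hypothetical stand-ins for the two branches; not pipecat processors.
async def image_branch(sentence):
    await asyncio.sleep(0.02)  # image generation finishes last
    return [f"IMG({sentence})"]


async def audio_branch(sentence):
    await asyncio.sleep(0.01)  # audio finishes first
    return [f"AUD({sentence})#{i}" for i in range(3)]


async def sync_parallel(sentence, branches):
    # Run every branch on the same input concurrently and wait for all of them.
    results = await asyncio.gather(*(branch(sentence) for branch in branches))
    # gather() preserves argument order, so flattening emits the batch in the
    # order the branches were listed (image first, then its audio).
    return [frame for batch in results for frame in batch]


async def main():
    out = []
    for sentence in ["January", "February"]:
        out.extend(await sync_parallel(sentence, [image_branch, audio_branch]))
    return out


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The image for each sentence always comes out ahead of its audio, even though the audio branch completes sooner. That is the property the transport relies on to pair each image with the audio it accompanies.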
@@ -148,7 +179,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]:
messages = [
{
"role": "system",
"role": "user",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]

@@ -1,198 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import tkinter as tk
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import (
Frame,
LLMContextFrame,
OutputAudioRawFrame,
TextFrame,
TTSAudioRawFrame,
URLImageRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
tk_root = tk.Tk()
tk_root.title("Calendar")
runner = PipelineRunner()
async def get_month_data(month):
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
class ImageDescription(FrameProcessor):
def __init__(self):
super().__init__()
self.text = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
self.text = frame.text
await self.push_frame(frame, direction)
class AudioGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.audio = bytearray()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TTSAudioRawFrame):
self.audio.extend(frame.audio)
self.frame = OutputAudioRawFrame(
bytes(self.audio), frame.sample_rate, frame.num_channels
)
await self.push_frame(frame, direction)
class ImageGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, URLImageRawFrame):
self.frame = frame
await self.push_frame(frame, direction)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
sentence_aggregator = SentenceAggregator()
description = ImageDescription()
audio_grabber = AudioGrabber()
image_grabber = ImageGrabber()
# With `SyncParallelPipeline` we synchronize audio and images by
# pushing them basically in order (e.g. I1 A1 A1 A1 I2 A2 A2 A2 A2
# I3 A3). To do that, each pipeline runs concurrently and
# `SyncParallelPipeline` will wait for the input frame to be
# processed.
#
# Note that `SyncParallelPipeline` requires the last processor in
# each of the pipelines to be synchronous. In this case, we use
# `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
# requests and wait for the response.
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
description, # Store sentence
SyncParallelPipeline(
[tts, audio_grabber], # Generate and store audio for the given sentence
[imagegen, image_grabber], # Generate and store image for the given sentence
),
]
)
task = PipelineTask(pipeline)
await task.queue_frame(LLMContextFrame(LLMContext(messages)))
await task.stop_when_done()
await runner.run(task)
return {
"month": month,
"text": description.text,
"image": image_grabber.frame,
"audio": audio_grabber.frame,
}
transport = TkLocalTransport(
tk_root,
TkTransportParams(
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
)
pipeline = Pipeline([transport.output()])
task = PipelineTask(pipeline)
# We only specify a few months as we create tasks all at once and we
# might get rate limited otherwise.
months: list[str] = [
"January",
"February",
]
# We create one task per month. This will be executed concurrently.
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
# Now we wait for each month task in the order they're completed. The
# benefit is we'll have as little delay as possible before the first
# month, and likely no delay between months, but the months won't
# display in order.
async def show_images(month_tasks):
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
await task.queue_frames([data["image"], data["audio"]])
await runner.stop_when_done()
async def run_tk():
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(runner.run(task), show_images(month_tasks), run_tk())
if __name__ == "__main__":
asyncio.run(main())
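The `show_images` comment in the deleted example trades in-order display for the shortest wait before the first month. That trade-off comes from `asyncio.as_completed`, which yields results in completion order. Below is a minimal sketch contrasting it with `asyncio.gather`, which returns results in submission order. `fetch_month` is a hypothetical stand-in for `get_month_data`, and its delays only simulate variable API latency.

```python
import asyncio


async def fetch_month(month, delay):
    # Hypothetical stand-in for get_month_data(); delay simulates API latency.
    await asyncio.sleep(delay)
    return month


async def completion_order(delays):
    tasks = [asyncio.create_task(fetch_month(m, d)) for m, d in delays.items()]
    results = []
    # as_completed() yields awaitables as they finish, not in submission order.
    for next_done in asyncio.as_completed(tasks):
        results.append(await next_done)
    return results


async def submission_order(delays):
    tasks = [asyncio.create_task(fetch_month(m, d)) for m, d in delays.items()]
    # gather() still runs the tasks concurrently but returns results in the
    # order the tasks were passed in.
    return await asyncio.gather(*tasks)
```

With `{"January": 0.1, "February": 0.01}`, `completion_order` yields February first. `submission_order` keeps calendar order, at the cost of waiting on any slow earlier month.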

@@ -83,21 +83,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
ml = MetricsLogger()
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -129,7 +129,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -100,19 +100,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),

@@ -6,6 +6,7 @@
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
@@ -52,64 +53,68 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
async with aiohttp.ClientSession() as session:
stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
aiohttp_session=session,
settings=CartesiaHttpTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "user", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(task)
async def bot(runner_args: RunnerArguments):

@@ -4,7 +4,6 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
@@ -22,9 +21,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.playht.tts import PlayHTHttpTTSService
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -54,22 +53,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = PlayHTHttpTTSService(
user_id=os.getenv("PLAYHT_USER_ID"),
api_key=os.getenv("PLAYHT_API_KEY"),
voice_url="s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json",
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIResponsesLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -55,19 +55,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +98,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -21,7 +21,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -93,7 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
# focus_speakers=["S1"],
@@ -104,32 +103,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
voice_id="sarah",
settings=SpeechmaticsTTSService.Settings(
voice="sarah",
),
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
settings=OpenAILLMService.Settings(
temperature=0.75,
system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
"Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
),
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -160,7 +148,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
context.add_message({"role": "user", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -22,7 +22,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -76,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
@@ -84,31 +83,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
voice_id="sarah",
settings=SpeechmaticsTTSService.Settings(
voice="sarah",
),
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
settings=OpenAILLMService.Settings(
temperature=0.75,
system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
),
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -139,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
context.add_message({"role": "user", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -71,15 +71,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
"You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
MessagesPlaceholder("chat_history"),
("human", "{input}"),

@@ -10,6 +10,7 @@ import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -55,24 +56,32 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramFluxSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
params=DeepgramFluxSTTService.InputParams(min_confidence=0.3),
settings=DeepgramFluxSTTService.Settings(
min_confidence=0.3,
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies(),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
@@ -100,7 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -59,20 +59,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramHttpTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-2-andromeda-en",
settings=DeepgramHttpTTSService.Settings(
voice="aura-2-andromeda-en",
),
aiohttp_session=session,
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "user", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService
from pipecat.services.deepgram.stt_sagemaker import DeepgramSageMakerSTTService
from pipecat.services.deepgram.tts_sagemaker import DeepgramSageMakerTTSService
from pipecat.services.aws.llm import AWSBedrockLLMService, AWSBedrockLLMSettings
from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService
from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -69,23 +69,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramSageMakerTTSService(
endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
voice="aura-2-andromeda-en",
settings=DeepgramSageMakerTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = AWSBedrockLLMService(
aws_region=os.getenv("AWS_REGION"),
model="us.amazon.nova-pro-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
settings=AWSBedrockLLMSettings(
model="us.amazon.nova-pro-v1:0",
temperature=0.8,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -116,7 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -7,7 +7,6 @@
import os
from deepgram import LiveOptions
from dotenv import load_dotenv
from loguru import logger
@@ -56,21 +55,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
settings=DeepgramSTTService.Settings(
vad_events=True,
utterance_end_ms="1000",
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -101,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -55,18 +55,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -97,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -63,20 +63,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = ElevenLabsHttpTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
aiohttp_session=session,
settings=ElevenLabsHttpTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -107,7 +107,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "user", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -57,19 +57,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
settings=ElevenLabsTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -100,7 +100,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -65,17 +65,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
settings=AzureLLMService.Settings(
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -65,17 +65,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
settings=AzureLLMService.Settings(
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -106,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -11,7 +11,6 @@ from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -55,22 +54,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = OpenAISTTService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
settings=OpenAISTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="ballad",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
)
messages = [
{
"role": "system",
"content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +106,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -55,27 +55,28 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = OpenAIRealtimeSTTService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
language=Language.EN,
# Uses local VAD by default.
# To enable server-side VAD, set turn_detection=None or
# a dict with server_vad settings.
# turn_detection={"type": "server_vad", "threshold": 0.5},
settings=OpenAIRealtimeSTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
language=Language.EN,
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="ballad",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
)
messages = [
{
"role": "system",
"content": "You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -107,7 +108,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -57,7 +57,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
timestamp = int(time.time())
@@ -65,16 +67,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
tags={"conversation_id": f"pipecat-{timestamp}"},
settings=OpenPipeLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -105,7 +103,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -59,20 +59,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = XTTSService(
aiohttp_session=session,
voice_id="Claribel Dervla",
settings=XTTSService.Settings(
voice="Claribel Dervla",
),
base_url="http://localhost:8000",
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -103,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "user", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.config import LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
@@ -58,7 +58,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY", ""),
region=os.getenv("GLADIA_REGION"),
params=GladiaInputParams(
settings=GladiaSTTService.Settings(
language_config=LanguageConfig(
languages=[Language.EN],
),
@@ -68,19 +68,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY", ""),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
@@ -114,7 +114,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -23,7 +23,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.config import LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transcriptions.language import Language
@@ -57,7 +57,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GladiaSTTService(
api_key=os.getenv("GLADIA_API_KEY", ""),
region=os.getenv("GLADIA_REGION"),
params=GladiaInputParams(
settings=GladiaSTTService.Settings(
language_config=LanguageConfig(
languages=[Language.EN],
)
@@ -66,19 +66,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY", ""),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY", ""))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY", ""),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": f"You are a helpful LLM. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -109,7 +109,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -54,18 +54,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = LmntTTSService(api_key=os.getenv("LMNT_API_KEY"), voice_id="morgan")
tts = LmntTTSService(
api_key=os.getenv("LMNT_API_KEY"),
settings=LmntTTSService.Settings(
voice="morgan",
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -96,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -55,19 +55,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GroqSTTService(api_key=os.getenv("GROQ_API_KEY"))
llm = GroqLLMService(
api_key=os.getenv("GROQ_API_KEY"), model="meta-llama/llama-4-maverick-17b-128e-instruct"
api_key=os.getenv("GROQ_API_KEY"),
settings=GroqLLMService.Settings(
model="llama-3.1-8b-instant",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
tts = GroqTTSService(api_key=os.getenv("GROQ_API_KEY"))
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -98,7 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -95,13 +95,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = AWSPollyTTSService(
region="us-west-2", # only specific regions support generative TTS
voice_id="Joanna",
params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
settings=AWSPollyTTSService.Settings(
voice="Joanna",
engine="generative",
rate="1.1",
),
)
# Create Strands agent processor
try:
agent = build_agent(model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0", max_tokens=8000)
agent = build_agent(model_id="us.anthropic.claude-sonnet-4-6", max_tokens=8000)
llm = StrandsAgentsProcessor(agent=agent)
logger.info("Successfully created Strands agent for NAB customer service coaching")
except Exception as e:
@@ -149,7 +152,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
messages=[
{
"role": "user",
"content": f"Greet the user and introduce yourself.",
"content": f"Greet the user and introduce yourself. Don't use emojis.",
}
],
run_llm=True,

@@ -54,24 +54,23 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = AWSPollyTTSService(
region="us-west-2", # only specific regions support generative TTS
voice_id="Joanna",
params=AWSPollyTTSService.InputParams(engine="generative", rate="1.1"),
settings=AWSPollyTTSService.Settings(
voice="Joanna",
engine="generative",
rate="1.1",
),
)
llm = AWSBedrockLLMService(
aws_region="us-west-2",
model="us.anthropic.claude-haiku-4-5-20251001-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
settings=AWSBedrockLLMService.Settings(
model="us.anthropic.claude-sonnet-4-6",
temperature=0.8,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +101,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "user", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

@@ -70,30 +70,30 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
),
)
tts = GoogleTTSService(
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleTTSService.InputParams(language=Language.EN_US),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
settings=GoogleTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash-image",
# model="gemini-3-pro-image-preview", # A more powerful model, but slower
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash-image",
# model="gemini-3-pro-image-preview", # A more powerful model, but slower,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -124,7 +124,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation with a styled introduction
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")


@@ -54,15 +54,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot with Gemini TTS")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
tts = GeminiTTSService(
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
model="gemini-2.5-flash-tts",
voice_id="Charon",
params=GeminiTTSService.InputParams(
settings=GeminiTTSService.Settings(
model="gemini-2.5-flash-tts",
voice="Charon",
language=Language.EN_US,
prompt="You are a helpful AI assistant. Speak in a natural, conversational tone.",
),
@@ -71,13 +73,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
)
# System message that instructs the AI on how to speak
messages = [
{
"role": "system",
"content": """You are a helpful AI assistant in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way.
settings=GoogleLLMService.Settings(
system_instruction="""You are a helpful assistant in a voice conversation.
IMPORTANT: You're using Gemini TTS which supports expressive markup tags. You can use these tags in your responses:
- [sigh] - Insert a sigh sound
@@ -94,11 +91,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
- "[whispering] Let me tell you a secret."
- "The answer is... [long pause] ...42!"
Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.""",
},
]
Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Keep responses concise. Respond to what the user said in a creative and helpful way.""",
),
)
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -129,9 +126,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation
messages.append(
context.add_message(
{
"role": "system",
"role": "user",
"content": "You are an AI assistant. You can help with a variety of tasks. Introduce yourself and ask the user what they would like to know.",
}
)


@@ -54,34 +54,34 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
# Add model to use a specific model
# model="chirp_3",
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
location="us",
)
tts = GoogleHttpTTSService(
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleHttpTTSService.InputParams(language=Language.EN_US),
settings=GoogleHttpTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -112,7 +112,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")


@@ -54,34 +54,34 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GoogleSTTService(
params=GoogleSTTService.InputParams(languages=Language.EN_US, model="chirp_3"),
settings=GoogleSTTService.Settings(
languages=[Language.EN_US],
# Add model to use a specific model
# model="chirp_3",
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
location="us",
)
tts = GoogleTTSService(
voice_id="en-US-Chirp3-HD-Charon",
params=GoogleTTSService.InputParams(language=Language.EN_US),
settings=GoogleTTSService.Settings(
voice="en-US-Chirp3-HD-Charon",
language=Language.EN_US,
),
credentials=os.getenv("GOOGLE_TEST_CREDENTIALS"),
)
llm = GoogleLLMService(
api_key=os.getenv("GOOGLE_API_KEY"),
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# params=GoogleLLMService.InputParams(
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096)
# ),
settings=GoogleLLMService.Settings(
model="gemini-2.5-flash",
# force a certain amount of thinking if you want it
# thinking=GoogleLLMService.ThinkingConfig(thinking_budget=4096),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -112,7 +112,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")


@@ -0,0 +1,178 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import os

from dotenv import load_dotenv
from loguru import logger

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.assemblyai.stt import AssemblyAISTTService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies

load_dotenv(override=True)

# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
    "twilio": lambda: FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
}


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    """AssemblyAI u3-rt-pro with Built-in Turn Detection

    This example demonstrates using AssemblyAI's u3-rt-pro Speech-to-Text model
    with AssemblyAI's built-in turn detection for more natural conversation flow.

    Key features:

    1. AssemblyAI Turn Detection
       - Set `vad_force_turn_endpoint=False` to use AssemblyAI's built-in turn detection
       - AssemblyAI's model determines when user starts/stops speaking
       - Uses `ExternalUserTurnStrategies` to delegate turn control to AssemblyAI
       - More natural turn detection based on speech patterns and pauses

    2. Advanced Turn Detection Tuning
       - `min_turn_silence`: Minimum silence (ms) when confident about end-of-turn.
         Lower values = faster responses. Default: 100ms
       - `max_turn_silence`: Maximum silence (ms) before forcing end-of-turn.
         Prevents long pauses. Default: 1000ms

    3. Prompt-Based Transcription Enhancement
       - Use `prompt` parameter to improve accuracy for specific names/terms
       - Particularly useful for proper nouns, technical terms, domain vocabulary
       - Example: "Names: Xiomara, Saoirse, Krzystof. Technical terms: API, OAuth."

    4. Speaker Diarization (Optional)
       - Enable with `speaker_labels=True`
       - Automatically identifies different speakers in multi-party conversations
       - TranscriptionFrame includes speaker_id field (e.g., "Speaker A", "Speaker B")

    5. Language Detection (Optional, multilingual model only)
       - Enable with `language_detection=True`
       - Automatically detects spoken language
       - Available with universal-streaming-multilingual model

    For more information: https://www.assemblyai.com/docs/speech-to-text/streaming
    """
    logger.info(f"Starting bot")

    stt = AssemblyAISTTService(
        api_key=os.getenv("ASSEMBLYAI_API_KEY"),
        vad_force_turn_endpoint=False,  # Use AssemblyAI's built-in turn detection
        settings=AssemblyAISTTService.Settings(
            model="u3-rt-pro",
            # Optional: Tune turn detection timing (defaults shown below)
            # min_turn_silence=100,  # Default
            # max_turn_silence=1000,  # Default
            # Optional: Boost accuracy for specific names/terms
            # keyterms_prompt=["Xiomara", "Saoirse", "Krzystof", "API", "OAuth"],
            # Optional: Enable speaker diarization
            # speaker_labels=True,
        ),
    )

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        settings=CartesiaTTSService.Settings(
            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
        ),
    )

    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        settings=OpenAILLMService.Settings(
            system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
        ),
    )

    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            user_turn_strategies=ExternalUserTurnStrategies(),
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    pipeline = Pipeline(
        [
            transport.input(),  # Transport user input
            stt,  # STT
            user_aggregator,  # User responses
            llm,  # LLM
            tts,  # TTS
            transport.output(),  # Transport bot output
            assistant_aggregator,  # Assistant spoken responses
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
        context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info(f"Client disconnected")
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
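The `min_turn_silence` / `max_turn_silence` tuning described in the docstring of the new AssemblyAI example works as a two-threshold rule. The following is a simplified, hypothetical model of that rule, included only for illustration. `should_end_turn` and `confident_end_of_turn` are invented names, and this is not AssemblyAI's actual end-of-turn algorithm, which runs on AssemblyAI's side.

```python
def should_end_turn(
    silence_ms: int,
    confident_end_of_turn: bool,
    min_turn_silence: int = 100,
    max_turn_silence: int = 1000,
) -> bool:
    """Hypothetical two-threshold end-of-turn rule (not AssemblyAI's real logic).

    The defaults mirror the documented defaults of 100 ms and 1000 ms.
    """
    # When the model is confident the user has finished, only the short
    # minimum silence is required before ending the turn.
    if confident_end_of_turn and silence_ms >= min_turn_silence:
        return True
    # Otherwise, force the turn to end once silence reaches the ceiling.
    return silence_ms >= max_turn_silence


print(should_end_turn(150, confident_end_of_turn=True))  # True
print(should_end_turn(150, confident_end_of_turn=False))  # False
print(should_end_turn(1000, confident_end_of_turn=False))  # True
```

In this model, lowering `min_turn_silence` shortens the wait only on confident turns, and `max_turn_silence` caps the wait on all other turns. This matches the "faster responses" and "prevents long pauses" descriptions in the docstring.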


@@ -59,19 +59,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +102,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")


@@ -31,6 +31,8 @@ from pipecat.audio.filters.krisp_viva_filter import KrispVivaFilter
from pipecat.audio.turn.krisp_viva_turn import KrispVivaTurn
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.metrics.metrics import TurnMetricsData
from pipecat.observers.loggers.metrics_log_observer import MetricsLogObserver
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -41,32 +43,37 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.deepgram.tts import DeepgramTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
krisp_viva_filter = KrispVivaFilter()
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_in_filter=KrispVivaFilter(),
audio_in_filter=krisp_viva_filter,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_in_filter=KrispVivaFilter(),
audio_in_filter=krisp_viva_filter,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
audio_in_filter=KrispVivaFilter(),
audio_in_filter=krisp_viva_filter,
),
}
@@ -76,18 +83,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-helios-en")
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = [
{
"role": "system",
"content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
@@ -117,13 +127,14 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
observers=[MetricsLogObserver(include_metrics={TurnMetricsData})],
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
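Both the AssemblyAI example and the Krisp example build `transport_params` as a dict of lambdas. As a result, only the factory for the transport selected at runtime is ever called. The Krisp change replaces a `KrispVivaFilter()` constructed inside each lambda with a single module-level `krisp_viva_filter`. The following is a minimal pure-Python sketch of that pattern. `FakeParams`, `make_params`, and `select_params` are hypothetical stand-ins, not Pipecat APIs.

```python
from dataclasses import dataclass


@dataclass
class FakeParams:
    """Hypothetical stand-in for a transport params class such as DailyParams."""

    name: str
    audio_in_filter: object


# Created once at import time, like the module-level krisp_viva_filter.
shared_filter = object()

# Records which factories actually ran, to show that creation is deferred.
created: list[str] = []


def make_params(name: str) -> FakeParams:
    created.append(name)
    return FakeParams(name=name, audio_in_filter=shared_filter)


transport_params = {
    "daily": lambda: make_params("daily"),
    "twilio": lambda: make_params("twilio"),
    "webrtc": lambda: make_params("webrtc"),
}


def select_params(transport_type: str) -> FakeParams:
    # Only the lambda for the selected transport is invoked.
    return transport_params[transport_type]()
```

Calling `select_params("webrtc")` runs only the `"webrtc"` factory. Every params object it returns references the same filter instance. In the previous form, each factory call would construct a new filter.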
