Compare commits

..

599 Commits

Author SHA1 Message Date
Aleix Conchillo Flaqué
644babe241 scripts(evals): run evals in parallel 2026-03-27 15:37:01 -07:00
kompfner
b33df03724 Merge pull request #4179 from pipecat-ai/pk/fix-gemini-live-vertex
Don't send history_config for Gemini Live Vertex (unsupported)
2026-03-27 17:34:29 -04:00
Paul Kompfner
28fbe1db08 Don't send history_config for Gemini Live Vertex (unsupported) 2026-03-27 17:30:47 -04:00
kompfner
9240e92d9f Merge pull request #4177 from pipecat-ai/pk/tweak-26i-for-gemini-3.1-flash-live-support
Tweak 26i example system instruction for Gemini 3.1 Flash Live compat…
2026-03-27 17:20:06 -04:00
Paul Kompfner
5caf53f086 Tweak 26i example system instruction for Gemini 3.1 Flash Live compatibility
Gemini 3.1 Flash Live won't reliably report ending its turn until
after it says something following a tool call. Restructure the system
instruction so the model says goodbye *after* calling
end_conversation, and add a comment explaining the deferred EndFrame
behavior that makes this work.
2026-03-27 17:13:17 -04:00
Mark Backman
ac2716811c Merge pull request #4176 from pipecat-ai/mb/fix-websocket-rtvi-messages
Fix RTVI events not delivered over WebSocket transports
2026-03-27 16:50:37 -04:00
Mark Backman
d313d56776 Fix RTVI events not delivered over WebSocket transports
The base serializer filters out RTVI protocol messages by default
(ignore_rtvi_messages=True) to prevent them from being sent over
telephony media streams. ProtobufFrameSerializer is used by WebSocket
transports, which are the delivery channel for these messages, so
disable the filter there.
2026-03-27 16:47:11 -04:00
kompfner
159776f106 Merge pull request #4175 from pipecat-ai/pk/gemini-live-dropped-support-for-text-modality
Warn when TEXT modality is set for Gemini Live, and remove 26d text example
2026-03-27 16:26:36 -04:00
kompfner
a23803478f Merge pull request #4171 from pipecat-ai/pk/fix-gemini-3.1-flash-live-video
Gate Gemini Live sending real-time input messages to the API until it…
2026-03-27 16:26:03 -04:00
Mark Backman
bae193ab4d Merge pull request #4172 from pipecat-ai/mb/rime-tts-fixes
Fix Rime TTS stop-frame handling and handle done message
2026-03-27 16:22:25 -04:00
Paul Kompfner
04adb697be Warn when TEXT modality is set for Gemini Live, and remove 26d text example
All recent Gemini Live models (including the default
gemini-2.5-flash-native-audio-preview-12-2025, and going at least as
far back as gemini-2.5-flash-native-audio-preview-09-2025) only
support AUDIO as a response modality. We considered using
`modalities=TEXT` as a Pipecat-level signal to suppress audio output
frames (so developers could pair Gemini Live with an external TTS),
but the output transcription from the API arrives too late relative
to the audio to be useful for driving an external TTS service.

For now, just log a warning when a TEXT modality is configured
(at init or via set_model_modalities) and proceed as normal. The 26d
text-modality example is removed since it no longer represents a
viable configuration.
2026-03-27 16:21:15 -04:00
Mark Backman
4f9c8a6860 Merge pull request #4174 from pipecat-ai/fix/deepgram-sdk-6.1.0-compat
Fix Deepgram STT compatibility with deepgram-sdk 6.1.0
2026-03-27 15:11:43 -04:00
Mark Backman
a1a29b3933 Add changelog for #4174 2026-03-27 14:50:12 -04:00
Mark Backman
0798803c70 Bump deepgram-sdk minimum version to 6.1.0 2026-03-27 14:46:17 -04:00
Mark Backman
6422661d08 Fix Deepgram STT compatibility with deepgram-sdk 6.1.0
The SDK now requires explicit message objects for send_keep_alive,
send_close_stream, and send_finalize instead of no-arg calls.
2026-03-27 14:40:48 -04:00
Mark Backman
ed94b65d83 Merge pull request #4173 from pipecat-ai/filipi/updating_inworld_examples
Removing the models from the Inworld example so we can use the default model.
2026-03-27 14:02:55 -04:00
filipi87
f9670b9601 Removing the models from the Inworld example so we can use the default model. 2026-03-27 14:23:20 -03:00
Paul Kompfner
5b2991f47f Gate Gemini Live sending real-time input messages to the API until it's ready, i.e. after we've sent the initial conversation history (or determined that we don't need to).
This fixes the 26c example when using Gemini 3.1 Flash Live, which seems to be more strict about not receiving real-time input (at least, video messages) before conversation history.
2026-03-27 12:41:05 -04:00
Mark Backman
fc3186dc0d Add changelog entries for PR #4172 2026-03-27 12:38:53 -04:00
Mark Backman
1808b447c9 Handle done message from Rime TTS to avoid stop-frame timeout
Rime's WebSocket API sends a done message when synthesis completes.
Handle it to stop TTFB metrics, push TTSStoppedFrame, and remove the
audio context immediately instead of relying on the 3-second
stop_frame_timeout_s fallback.
2026-03-27 12:37:03 -04:00
Mark Backman
70df9d3fe4 Fix duplicate TTSStoppedFrame in TTS service timeout path 2026-03-27 12:07:37 -04:00
Filipi da Silva Fuchter
a8bfc23d3a Merge pull request #4167 from pipecat-ai/filipi/inworld_improvements
InworldTTSService improvements.
2026-03-27 11:15:14 -04:00
filipi87
e2870fc2ac Changing to debug the log when we are not able to append audio to the context. 2026-03-27 12:12:16 -03:00
filipi87
e851f8c1d5 Adding changelog entry for the fix. 2026-03-27 12:11:35 -03:00
filipi87
b31bece617 Not trying to recreate the context. 2026-03-27 12:06:21 -03:00
kompfner
9e350bcc2f Merge pull request #4147 from pipecat-ai/cb/gemini-transcript-fixes
Fix Gemini Live to handle bundled server_content fields
2026-03-27 11:00:10 -04:00
Paul Kompfner
9c2594c484 Remove brittle test 2026-03-27 10:56:39 -04:00
Mark Backman
900fc88430 Merge pull request #4128 from pipecat-ai/mb/end-of-turn-assembly 2026-03-27 10:47:09 -04:00
filipi87
4ef5ac6f0c InworldTTSService improvements. 2026-03-27 11:33:32 -03:00
Mark Backman
cbb3d99493 Merge pull request #4166 from pipecat-ai/mb/fix-example-ordering-56
Fix example numbering, add LemonSlice to evals
2026-03-27 10:29:07 -04:00
Filipi da Silva Fuchter
fb1996cedc Merge pull request #4143 from pipecat-ai/cb/sagemaker-flux
Add Deepgram Flux STT service for AWS SageMaker
2026-03-27 10:27:49 -04:00
Filipi da Silva Fuchter
95c55ec6c3 Merge pull request #4145 from pipecat-ai/filipi/tts_improvements_remove_reset
TTS improvements.
2026-03-27 10:24:59 -04:00
Mark Backman
a45de9af7f Merge pull request #4161 from tanmayc25/fix/lemonslice-missing-dtmf-callback
fix(lemonslice): add missing on_dtmf_event callback in DailyCallbacks construction
2026-03-27 10:19:54 -04:00
Mark Backman
5e61a57582 Fix changelog entry for #4161 2026-03-27 10:16:25 -04:00
Mark Backman
d8b0ed18fd Fix example numbering, add LemonSlice to evals 2026-03-27 10:11:37 -04:00
Mark Backman
789275a57b Merge pull request #4164 from pipecat-ai/mb/update-community-integrations-guide
docs: update COMMUNITY_INTEGRATIONS.md for accuracy
2026-03-27 09:38:31 -04:00
Filipi da Silva Fuchter
38c961a363 Merge pull request #4113 from inworld-ai/ian/lang-timestamps
fix(inworld): fallback to full text when TTS timestamps are not received
2026-03-27 09:34:05 -04:00
Mark Backman
41a86a51bf docs: update COMMUNITY_INTEGRATIONS.md for accuracy
- Replace deprecated TTS classes (AudioContextWordTTSService, WordTTSService)
  with current hierarchy (WebsocketTTSService, InterruptibleTTSService, TTSService)
- Add WebsocketSTTService and SDK-based STTService categories
- Fix LLM section: document _process_context, adapter_class, remove deprecated
  create_context_aggregator guidance, add thought frames for reasoning models
- Fix Vision section: run_vision takes UserImageRawFrame not LLMContext,
  yields Vision*Frame types not TextFrame
- Fix push_error API: takes (error_msg, exception) not ErrorFrame
- Fix frame name: TTSRawAudioFrame → TTSAudioRawFrame
- Remove stale v13+ version reference
- Clarify @traced_stt method convention
2026-03-27 09:22:32 -04:00
Filipi da Silva Fuchter
e1bfa4cf21 Merge pull request #4152 from vpalmisano/vpalmisano-patch-1
Fix audio transcript check in base_llm.py
2026-03-27 08:34:15 -04:00
filipi87
537d57449e Fixing the format and including the changelog. 2026-03-27 09:29:46 -03:00
Tanmay Chaudhari
33e146decd fix(lemonslice): add missing on_dtmf_event callback in DailyCallbacks construction
DailyCallbacks gained a required on_dtmf_event field in PR #4047.
PR #4079 fixed this for TavusTransportClient but
LemonSliceTransportClient.setup() was not updated, causing a pydantic
ValidationError at pipeline setup time.
2026-03-27 12:06:26 +05:30
Mark Backman
eee47deb34 Merge pull request #4060 from alpsencer/fix/empty-tool-call-arguments
fix(openai): handle tool calls with empty/null arguments
2026-03-26 22:04:37 -04:00
Mark Backman
21a729ae5d Merge pull request #4146 from pipecat-ai/mb/gemini-live-local-vad 2026-03-26 17:48:21 -04:00
Filipi da Silva Fuchter
1870f4010e Merge pull request #4158 from pipecat-ai/filipi/flux_refactor
Creating a base class, DeepgramFluxSTTBase, to reuse Deepgram Flux logic
2026-03-26 17:33:35 -04:00
filipi87
28683a7296 Moving flux_stt.py to deepgram/flux/sagemaker/stt.py 2026-03-26 17:43:51 -03:00
filipi87
0e504d876d Creating a base class DeepgramFluxSTTBase so we can reuse Deepgram Flux logic. 2026-03-26 17:37:37 -03:00
Mark Backman
5c51981207 Merge pull request #4149 from pipecat-ai/mb/fix-service-switcher-passthrough-errors
Fix ServiceSwitcher reacting to pass-through ErrorFrames
2026-03-26 16:34:45 -04:00
Mark Backman
a13c4d1248 Narrow ServiceSwitcher error check to active service only
Only trigger handle_error for ErrorFrames originating from the active
service, not any managed service. This prevents edge cases where errors
from a non-active service could incorrectly trigger failover.
2026-03-26 15:28:19 -04:00
filipi87
ca1b4ad124 Organizing the methods from Deepgram Flux and Flux SageMaker in the same position. 2026-03-26 16:05:17 -03:00
Mark Backman
533dcdba3f Merge pull request #4154 from pipecat-ai/mb/deprecate-sambanova-stt
Remove SambaNovaSTTService
2026-03-26 14:10:14 -04:00
Mark Backman
7eec03cb77 Merge pull request #4156 from pipecat-ai/mb/mem0-improvements
fix(mem0): improve Mem0 service reliability and add get_memories() method
2026-03-26 14:09:34 -04:00
Mark Backman
83911dced6 docs: add changelog entries for #4156 2026-03-26 13:30:00 -04:00
Mark Backman
4e4a8c45d5 build(mem0): bump mem0ai dependency to >=1.0.8,<2 2026-03-26 13:28:41 -04:00
Mark Backman
9c6d51c570 feat(mem0): add get_memories() convenience method to Mem0MemoryService
Expose a public method for retrieving all stored memories outside the
pipeline, avoiding the need for callers to reimplement client branching,
OR filter construction, and asyncio.to_thread wrapping. Simplify the
example get_initial_greeting() to use it.
2026-03-26 13:28:41 -04:00
Mark Backman
9152d85824 fix(mem0): filter to user/assistant roles before storing in Mem0
Mem0 API only accepts user and assistant roles. Filter out system,
developer, and other roles before calling add() to avoid 400 errors.
2026-03-26 13:28:41 -04:00
Mark Backman
6a87d0e87d fix(mem0): make memory service non-blocking and use position parameter
Move blocking Mem0 API calls off the event loop using asyncio.to_thread().
Store messages as a fire-and-forget background task via create_task() since
the result is not needed. Insert memory messages at the configured position
in the context instead of always appending.

Closes #1741
2026-03-26 13:28:41 -04:00
Mark Backman
fe0633ecd1 Add 14s to release evals 2026-03-26 12:27:27 -04:00
Mark Backman
ca2bfd6f12 Remove SambaNovaSTTService
SambaNova no longer offers speech-to-text audio models.
2026-03-26 12:22:06 -04:00
kompfner
345ccc0abe Merge pull request #4148 from pipecat-ai/khk/gemini-transcription-fixes-addon
Fix bundled Gemini Live transcription ordering
2026-03-26 11:33:10 -04:00
namanbansal013
800fd6a916 Changelog entry for the websocket word context leak. 2026-03-26 11:52:34 -03:00
filipi87
d286991257 Changelog entry for the changes involving add_word_timestamp. 2026-03-26 11:51:31 -03:00
namanbansal013
a06bf47ed2 Discard any pre-audio word timestamps from the interrupted turn. 2026-03-26 11:42:24 -03:00
Mark Backman
5ad4aa9bea Merge pull request #4153 from pipecat-ai/mb/deepgram-stt-try-except
Handle Deepgram SDK 6.x send_media() exceptions
2026-03-26 10:15:21 -04:00
filipi87
c4466ba678 Adding changelog for the InterruptibleTTSService race condition fix 2026-03-26 10:58:57 -03:00
filipi87
df602b900d Preventing a race condition in the InterruptibleTTSServices in cases where run_tts has been invoked but the BotStartedSpeakingFrame has not yet been received. 2026-03-26 10:39:44 -03:00
Mark Backman
c331c75d66 Add tests for send_media() exception handling in DeepgramSTTService 2026-03-26 09:20:58 -04:00
filipi87
f7ec6befe1 Invoking superclass method when audio context is interrupted or completed. 2026-03-26 10:14:21 -03:00
Mark Backman
6a6ee8d563 Merge pull request #4150 from pipecat-ai/mb/resolve-dependabot-2026-03-25
Bump nltk minimum version to 3.9.4 to resolve CVE-2026-33230
2026-03-26 09:10:47 -04:00
Mark Backman
259f5e124c Add changelog for #4153 2026-03-26 08:48:45 -04:00
Mark Backman
cfe91d11ec Handle Deepgram SDK 6.x send_media() exceptions
Deepgram SDK 6.x surfaces connection errors from send_media() instead
of silently swallowing them. This causes error floods when the WebSocket
disconnects since every queued audio frame hits the dead connection.

Wrap send_media() in try/except: on failure, log one warning and set
self._connection = None so subsequent frames skip until the existing
_connection_handler reconnects.
2026-03-26 08:45:42 -04:00
Vittorio Palmisano
467184e63e Fix audio transcript check in base_llm.py
In some cases the openai provider could answer with a `chunk.choices[0].delta.audio = None`, so the process context fails with error:
```
pipecat/services/openai/base_llm.py:552): Error during completion: 'NoneType' object has no attribute 'get'
```
2026-03-26 13:09:36 +01:00
Mark Backman
af566ac936 Merge pull request #4151 from ajmeraharsh/fix/livekit-clear-audio-queue-on-interruption
fix(livekit): clear AudioSource buffer on interruption
2026-03-26 00:52:26 -04:00
ajmeraharsh
62484a4fc3 fix(livekit): clear AudioSource buffer on interruption
When an InterruptionFrame arrives, the Python-side audio task is
cancelled but frames already submitted to rtc.AudioSource continue
playing from its internal buffer. This causes the bot to keep speaking
for several seconds after being interrupted.

Fix by overriding process_frame in LiveKitOutputTransport to call
audio_source.clear_queue() on InterruptionFrame, immediately flushing
the buffered audio.
2026-03-26 09:47:00 +05:30
Mark Backman
7fef3b01eb Merge pull request #4142 from pipecat-ai/mb/grok-move-to-xai-module
Consolidate Grok services into xai module
2026-03-25 23:32:18 -04:00
Mark Backman
6d1918f12a Update GROK_API_KEY to XAI_API_KEY 2026-03-25 23:23:58 -04:00
Mark Backman
e58740e948 Bump nltk minimum version to 3.9.4 to resolve CVE-2026-33230 2026-03-25 23:16:46 -04:00
Mark Backman
ddfe44940d Add changelog for #4149 2026-03-25 22:54:25 -04:00
Mark Backman
fdbdbc8be3 Fix ServiceSwitcher reacting to pass-through ErrorFrames from other pipeline stages
ErrorFrames propagating upstream from downstream processors (e.g. TTS) would
enter the ServiceSwitcher via process_frame, traverse the active service sub-pipeline,
and reach push_frame where they incorrectly triggered failover. Now only errors whose
processor is one of the managed services trigger handle_error. Also fix the log in
handle_error to attribute errors to the actual source processor rather than the
current active_service.

Closes #4139
2026-03-25 22:53:04 -04:00
Kwindla Hultman Kramer
3cd7d882fb Fix bundled Gemini Live transcription ordering 2026-03-25 18:56:00 -07:00
Chad Bailey
2d78533d77 Add changelog for Gemini Live server_content fix 2026-03-25 23:42:42 +00:00
Chad Bailey
c1dd44f947 Fix Gemini Live message handling to process all server_content fields
Gemini 3.x can bundle multiple fields (e.g. model_turn and
output_transcription) on the same server_content message. The previous
elif chain would only process the first matching field and silently
drop the rest. Switch to independent if checks so every field is
handled.
2026-03-25 23:42:07 +00:00
Mark Backman
9db15e7942 Add changelog entry for #4146 2026-03-25 18:07:05 -04:00
Mark Backman
503e5e9106 Fix Gemini Live local VAD by sending correct activity events to server
When Gemini Live was configured with local VAD (server-side VAD disabled),
the service was listening for the wrong frame types and not sending
ActivityStart/ActivityEnd events to the server. Now it listens for
VADUserStartedSpeakingFrame/VADUserStoppedSpeakingFrame and sends the
appropriate activity signals when local VAD is in use.

Also removes the unnecessary local SileroVADAnalyzer from server-side VAD
examples and adds a new 26a example demonstrating local VAD configuration.
2026-03-25 18:00:13 -04:00
filipi87
2ff4b3f4a3 Improving docstring based on the recent changes. 2026-03-25 17:52:05 -03:00
filipi87
b4096f9a11 Refactoring to remove "Reset" and "TTSStoppedFrame" from word. 2026-03-25 17:47:24 -03:00
filipi87
c4253a7d98 Refactoring to invoke append_to_audio_context instead of direct queue put. 2026-03-25 17:21:55 -03:00
Filipi da Silva Fuchter
2441c4f801 Merge pull request #4135 from pipecat-ai/filipi/audio_buffer
Fixed audio crackling and popping artifacts in AudioBufferProcessor
2026-03-25 15:40:17 -04:00
Mark Backman
a7a55dd30e Merge pull request #4136 from pipecat-ai/mb/bump-package-version-nvidia
Upgrade protobuf to 6.x for nvidia-riva-client compatibility
2026-03-25 15:27:48 -04:00
Mark Backman
de6a7223ba Suppress verbose gRPC C-core logging in nvidia services
Set GRPC_VERBOSITY=ERROR by default so users do not see noisy fork
handler and abseil warnings from the gRPC C library. Users can still
override by setting GRPC_VERBOSITY themselves.
2026-03-25 15:23:54 -04:00
Mark Backman
165932e1cc Add changelog for #4136 2026-03-25 15:23:54 -04:00
Mark Backman
1f0d9ad01a Upgrade protobuf to 6.x for nvidia-riva-client 2.25.1 compatibility
nvidia-riva-client 2.25.1 ships with gencode compiled against protobuf
6.31.1, which requires a runtime >= 6.31.1. Update protobuf from 5.29.6
to >=6.31.1,<7 and grpcio-tools from 1.67.1 to 1.78.0 to match.
Regenerate frames_pb2.py with the new compiler.
2026-03-25 15:23:53 -04:00
Chad Bailey
052075c244 updated changelog 2026-03-25 19:12:37 +00:00
Chad Bailey
a8d0e1de9f Update changelog filename with PR number 2026-03-25 19:10:20 +00:00
Chad Bailey
4f0b2066c0 Add Deepgram Flux STT service for AWS SageMaker
Add DeepgramFluxSageMakerSTTService that combines SageMaker's HTTP/2
transport with Flux's JSON turn detection protocol (StartOfTurn,
EndOfTurn, EagerEndOfTurn, TurnResumed). Includes mid-stream Configure
support, silence watchdog, and an example bot.
2026-03-25 19:09:52 +00:00
filipi87
413dbaf974 Automated tests to validate the silence injection guards. 2026-03-25 16:05:58 -03:00
Ian Lee
5645909d34 [inworld] add falbback for empty timestamps from server 2026-03-25 11:55:09 -07:00
filipi87
da3f184316 Automated tests to validate the silence injection guards. 2026-03-25 15:38:21 -03:00
filipi87
e5a2723632 Fixed audio crackling and popping artifacts in AudioBufferProcessor. 2026-03-25 15:29:50 -03:00
Mark Backman
4ee4002d5d Merge pull request #4137 from pipecat-ai/mb/language-string-log-level-debug
Downgrade unrecognized language string log from warning to debug
2026-03-25 12:26:46 -04:00
Mark Backman
54a17ab1f3 Add changelog for #4142 2026-03-25 12:22:37 -04:00
Mark Backman
1c99a537b2 Consolidate Grok services into xai module
Both GrokLLMService and XAIHttpTTSService use the same xAI API (api.x.ai),
so move Grok source files into the xai module. Leave deprecation shims in
the old grok/ paths for backward compatibility.
2026-03-25 12:07:40 -04:00
Mark Backman
ff5d055b3c Merge pull request #4031 from niczy/xai-tts-service
Add xAI TTS service
2026-03-25 10:57:08 -04:00
Mark Backman
adc003d6c7 Code review cleanup 2026-03-25 10:53:07 -04:00
Nicholas Zhao
bbd14de9c5 Address PR review: rename to XAIHttpTTSService, add language map, clean up API
- Rename XAITTSService → XAIHttpTTSService and XAITTSSettings → XAIHttpTTSSettings
- Add language_to_xai_language() with explicit LANGUAGE_MAP using resolve_language()
- Remove deprecated InputParams, params, voice, language init params
- Remove XAI_DEFAULT_SAMPLE_RATE and XAI_PCM_CODEC constants; add encoding param
- Set sample_rate=None default (picked up from PipelineParams or user)
- Use Language.EN enum instead of string "en" for default language
- Add changelog/4031.added.md
- Add 07e-interruptible-xai.py foundational example
- Update 14g-function-calling-grok.py to use XAIHttpTTSService
- Register 07e in run-release-evals.py
2026-03-25 10:46:54 -04:00
Nicholas Zhao
02b97035f8 Add xAI TTS service 2026-03-25 10:45:15 -04:00
Mark Backman
f470ff193e Update language tests to expect debug instead of warning 2026-03-25 10:26:10 -04:00
Mark Backman
7bc8b89a54 Add changelog for #4137 2026-03-25 10:21:44 -04:00
Mark Backman
a8eff6fbbf Downgrade unrecognized language string log from warning to debug
Service-specific language strings like Deepgram's "multi" are valid
pass-through values, not issues worth warning about.
2026-03-25 10:20:36 -04:00
kompfner
86e086c6b5 Merge pull request #4130 from pipecat-ai/pk/realtime-services-init-v-context-system-instructions-cleanup
Prefer init-provided system instructions in realtime services
2026-03-25 09:13:52 -04:00
Paul Kompfner
4bdfe1cf31 Add changelog for realtime system instruction preference change 2026-03-24 17:34:50 -04:00
Paul Kompfner
bb33045389 Add system instruction conflict resolution tests for realtime adapters
Test that OpenAI Realtime, Grok Realtime, and Nova Sonic adapters
prefer init-provided system_instruction over context-provided, warn
on conflicts, and don't warn for developer messages.
2026-03-24 17:30:35 -04:00
Paul Kompfner
ac2b1ecd47 Prefer init-provided system instruction in Grok Realtime
Add system_instruction parameter to the Grok Realtime adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.

Update the Grok Realtime example to use settings.system_instruction
instead of session_properties.instructions.
2026-03-24 17:29:19 -04:00
Paul Kompfner
e7dd84b552 Prefer init-provided system instruction in OpenAI Realtime
Add system_instruction parameter to the OpenAI Realtime adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.
2026-03-24 17:21:53 -04:00
Paul Kompfner
39329aaddb Prefer init-provided system instruction in Nova Sonic
Add system_instruction parameter to the Nova Sonic adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.

Remove the service-side fallback logic, as the adapter now handles
resolution.
2026-03-24 17:18:44 -04:00
Paul Kompfner
56a56a4174 Prefer init-provided system instruction in Gemini Live
Pass self._system_instruction_from_init to the adapter's
get_llm_invocation_params(), which calls _resolve_system_instruction()
to prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.

Also fix the reconnect check to only reconnect when the resolved
system instruction actually differs from what the initial connection
used, avoiding unnecessary reconnects.
2026-03-24 17:06:56 -04:00
kompfner
b80328e038 Merge pull request #4125 from pipecat-ai/pk/gemini-live-endframe-deferral-issue
Gemini Live: fix EndFrame-deferral hang
2026-03-24 17:02:46 -04:00
kompfner
3a80be760b Merge pull request #4089 from pipecat-ai/pk/system-and-developer-message-handling-update
Centralize system message handling in adapters; add developer message support
2026-03-24 16:24:11 -04:00
Mark Backman
b66c892100 Add changelog for #4128 2026-03-24 16:15:00 -04:00
Mark Backman
6c30371295 Fix Deepgram Flux event handler docstring to match implementation
Update documented event signatures to include transcript argument
where the code actually passes it. Remove stale on_speech_started
and on_utterance_end entries that were never registered.
2026-03-24 16:12:25 -04:00
Mark Backman
ddf6a41854 Add on_end_of_turn event handler to AssemblyAI STT
Fires after the final transcript is pushed in both Pipecat and
AssemblyAI turn detection modes, giving users a reliable hook
that arrives after all transcript frames. Matches the existing
Deepgram Flux on_end_of_turn pattern.
2026-03-24 16:11:35 -04:00
Paul Kompfner
e0c49927cf Remove hard-coded model overrides from Together and Groq examples
Prefer service defaults — the hard-coded models we were using are no
longer available on these providers.
2026-03-24 16:05:15 -04:00
Paul Kompfner
45926a7135 Update Together.ai default model to openai/gpt-oss-20b
The previous default (meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo) is
no longer available as a serverless Together.ai model and now requires a
custom deployment. The new default is openai/gpt-oss-20b, one of
Together's recommended models for small & fast use-cases.
2026-03-24 16:05:15 -04:00
Paul Kompfner
8c678c1c98 Set supports_developer_role = False for more OpenAI-compatible services
DeepSeek, Mistral, OLLama, Qwen, SambaNova, and Together don't support
the "developer" message role.
2026-03-24 16:05:15 -04:00
Paul Kompfner
4c121332cf Convert developer messages to user for Cerebras (and lay groundwork for other incompatible services)
OpenAI-compatible services that don't support the "developer" message
role can now set supports_developer_role = False on the service class.
BaseOpenAILLMService passes this as convert_developer_to_user to the
adapter, which converts developer messages to user messages before
sending them to the API. Applied to Cerebras and Perplexity.

Also removes the now-redundant developer→user conversion step from
PerplexityLLMAdapter (handled by the parent adapter via the flag).
2026-03-24 16:05:15 -04:00
Paul Kompfner
74686f9190 Add changelog for Gemini Live system_instruction fix 2026-03-24 16:05:15 -04:00
Paul Kompfner
19bcc8620c Fix Gemini Live not honoring settings.system_instruction
_system_instruction_from_init was being set from the deprecated
`system_instruction` constructor parameter instead of
`self._settings.system_instruction`, so system instructions provided
via settings were silently ignored.
2026-03-24 16:05:15 -04:00
Paul Kompfner
0530722c58 Convert developer messages to user in Perplexity adapter
Perplexity doesn't support the "developer" role. Developer messages are
now converted to "user" before other transformations are applied.
2026-03-24 16:05:15 -04:00
Paul Kompfner
0d1b834770 Add developer message support to realtime adapters
OpenAI Realtime, Grok Realtime, and AWS Nova Sonic adapters now convert
"developer" role messages to "user" (consistent with all other non-OpenAI
adapters). Previously these messages were silently dropped. Adds starter
unit tests for all three realtime adapters.
2026-03-24 16:05:15 -04:00
Paul Kompfner
7a0f7b58d1 Remove bit of unintentionally-left-in debugging logic 2026-03-24 16:05:15 -04:00
Paul Kompfner
5806a3f0fa Use "developer" role for remaining developer-intent messages in examples 2026-03-24 16:05:04 -04:00
Paul Kompfner
27fabfc1b3 Improve warning message wording and formatting 2026-03-24 16:02:42 -04:00
Paul Kompfner
d779a5b4ea Use "developer" role for programmatic conversation-kickoff messages
These messages are developer instructions to the assistant (e.g. "Please
introduce yourself to the user"), not simulated user input. The
"developer" role is semantically correct for this purpose.
2026-03-24 16:02:42 -04:00
Paul Kompfner
2bb36b5b66 Update changelog for developer message simplification 2026-03-24 16:02:42 -04:00
Paul Kompfner
e0bc9c73c6 Add Anthropic interruptible example (07e) and register in release evals 2026-03-24 16:02:42 -04:00
Paul Kompfner
2135557689 Simplify: don't promote developer messages to system instruction
Developer messages are now always converted to "user" in non-OpenAI
adapters, never promoted to the system instruction. This removes an
inconsistency where adding an unrelated message to context would change
whether a developer message got promoted.

Simplifications:
- Rename _extract_initial_system_or_developer → _extract_initial_system
- Return Optional[str] instead of Tuple (role is always "system")
- Drop initial_context_message_role from _resolve_system_instruction
- Drop system_role fields from all ConvertedMessages dataclasses
2026-03-24 16:02:42 -04:00
Paul Kompfner
a0393b9af6 Fix: warn on system_instruction conflict even with single system message
When the only message in context was a system message,
_extract_initial_system_or_developer would convert it to "user" (to
prevent empty history) without warning about the conflict with
system_instruction. Now warns inline before converting, with a message
explaining both the conflict and the user-role conversion.
2026-03-24 16:02:42 -04:00
Paul Kompfner
64ba013b68 Move OpenAI Responses adapter tests into test_get_llm_invocation_params.py
Consolidates all adapter get_llm_invocation_params tests in one file.
Adds new tests for developer message handling in the Responses adapter.
2026-03-24 16:02:42 -04:00
Paul Kompfner
7377d88cf5 Move system_instruction tests into test_get_llm_invocation_params.py 2026-03-24 16:02:42 -04:00
Paul Kompfner
3bbec0a2c8 Broaden docstring: all non-OpenAI providers need non-empty messages 2026-03-24 16:02:42 -04:00
Paul Kompfner
e29a63e1ae Improve _extract_initial_system_or_developer docstring clarity 2026-03-24 16:02:42 -04:00
Paul Kompfner
45178972d7 Fix stale docstring in PerplexityLLMAdapter 2026-03-24 16:02:42 -04:00
Paul Kompfner
bb7199d143 Add changelog entries for #4089 2026-03-24 16:02:42 -04:00
Paul Kompfner
d4dea30407 Centralize system message handling in adapters; add developer message support
Two goals:

1. Centralize system_instruction vs context system message resolution into
   the LLM adapters. This eliminates duplication between in-pipeline and
   out-of-band (run_inference) code paths across ~16 locations in service
   llm.py files.

2. Add support for "developer" role messages in conversation context, which
   is facilitated by the above centralization.

Shared helpers on BaseLLMAdapter:
- _extract_initial_system_or_developer: extracts/converts messages[0]
  based on role and whether system_instruction is provided
- _resolve_system_instruction: warns on conflicts between system_instruction
  and context system messages, returns the effective instruction

Developer message handling (new):
- Non-OpenAI adapters: an initial "developer" message is promoted to the
  system instruction when no system_instruction is provided; otherwise it
  is converted to "user". Subsequent "developer" messages are always
  converted to "user". No conflict warning is emitted for developer
  messages (unlike "system" messages).
- OpenAI adapter: "developer" messages pass through in conversation
  history without triggering conflict warnings.
- OpenAI Responses adapter: "developer" messages are kept as "developer"
  role (same as "system", which is also converted to "developer" for the
  Responses API).

Other behavior changes:
- Gemini: "initial" system message detection now checks messages[0] only
  (previously searched anywhere in the list)
- Bedrock: a lone system message is now converted to "user" instead of
  being extracted to an empty message list (matches existing Anthropic
  behavior)
2026-03-24 16:02:42 -04:00
Mark Backman
b49bf1c83f Merge pull request #4127 from pipecat-ai/mb/tts-text-frame-ordering
Fix LLMFullResponseEndFrame racing ahead of final TTSTextFrame
2026-03-24 15:39:06 -04:00
Mark Backman
1b0f7ecb0e Merge pull request #4126 from pipecat-ai/mb/fix-tts-flush-phantom-contexts
Fix TTS flush creating phantom contexts on ElevenLabs
2026-03-24 15:33:58 -04:00
Mark Backman
8e57dd67a2 Add changelog for #4127 2026-03-24 15:10:48 -04:00
Mark Backman
5d71de8aad Fix LLMFullResponseEndFrame racing ahead of final TTSTextFrame
Route LLMFullResponseEndFrame through the serialization queue instead
of pushing it directly downstream when push_text_frames is enabled.
This ensures the frame is emitted only after the audio context is
fully drained, preserving correct ordering relative to TTSTextFrames.

Previously, the final sentence TTSTextFrame would arrive at the
LLMAssistantAggregator after LLMFullResponseEndFrame, causing it to
be dropped from the conversation context (especially with RTVI text
input where no subsequent interruption would flush the orphaned text).
2026-03-24 15:09:42 -04:00
Paul Kompfner
dc56cb2ccc Gemini Live: reset _bot_is_responding when releasing deferred EndFrame
Without this, the released EndFrame re-enters process_frame, sees
_bot_is_responding is still True, defers again, and loops indefinitely.
2026-03-24 15:01:07 -04:00
Paul Kompfner
063955b7eb Gemini Live: clean up EndFrame deferral state on disconnect
Cancel the deferral timeout task and clear the pending EndFrame during
disconnect, which could otherwise be left dangling after a
CancelFrame-triggered shutdown.
2026-03-24 14:30:14 -04:00
Mark Backman
e05bd54743 Add changelog for #4126 2026-03-24 13:43:07 -04:00
Mark Backman
35f52f70ab Fix TTS flush creating phantom contexts on providers like ElevenLabs
When an interruption arrives before any LLM text reaches run_tts, the
turn context ID exists but was never registered via create_audio_context.
Calling flush_audio for this unregistered context sends a message to the
provider (e.g. ElevenLabs) with a context_id it has never seen, which
implicitly creates a server-side context that is never closed. After
enough rapid interruptions these phantom contexts accumulate and exceed
the providers limit (ElevenLabs: 5 simultaneous contexts, 1008 policy
violation).

Guard the flush call with audio_context_available so it only fires when
the context was actually opened.

Fixes #4114
2026-03-24 13:42:01 -04:00
Paul Kompfner
d05eb02b98 Add changelog for #4125 2026-03-24 12:54:50 -04:00
Paul Kompfner
4abd4d031d Gemini Live: add safety timeout to EndFrame deferral to prevent indefinite pipeline hang
When an EndFrame arrives while the bot is mid-response, it is deferred
until turn_complete is received. If turn_complete never arrives, the
EndFrame gets stuck forever and the pipeline hangs indefinitely.

Add a 30-second timeout: if turn_complete hasn't arrived by then, the
deferred EndFrame is released anyway with a warning log. The timeout
is cancelled if turn_complete arrives normally.
2026-03-24 12:50:07 -04:00
Paul Kompfner
7e42998e9e Gemini Live: fix potential EndFrame-deferral hang by handling turn_complete without usage_metadata
We observed a case where a deferred EndFrame was never released in
Gemini Live, causing the pipeline to hang indefinitely. The EndFrame
deferral mechanism waits for _handle_msg_turn_complete to set
_bot_is_responding back to False, but turn_complete messages were only
processed if they also contained usage_metadata. If Gemini ever sent
turn_complete without usage_metadata, the message would be silently
dropped and the deferred EndFrame would never be released.

Now turn_complete is always handled regardless of usage_metadata
presence, with usage_metadata processing only when available.

Note: we have not actually observed a turn_complete without
usage_metadata in practice, so this is a theoretical fix for the
EndFrame-deferral hang. The actual root cause of the observed hang
may lie elsewhere.
2026-03-24 12:32:14 -04:00
Filipi da Silva Fuchter
28eb4544d3 Merge pull request #4122 from pipecat-ai/filipi/inworld_follow_up
Invoking on_turn_context_created when we receive a TTSSpeakFrame.
2026-03-24 12:28:00 -04:00
Filipi da Silva Fuchter
b45dcb1ae0 Merge pull request #4028 from inworld-ai/ian/close-on-turn-complete
fix(inworld): close context at end of turn instead of relying on idle timeout
2026-03-24 12:07:51 -04:00
Mark Backman
6eb988b729 Merge pull request #4092 from harshitajain165/harshita/smallest-tts-only
Add Smallest AI TTS service integration
2026-03-24 11:54:34 -04:00
Mark Backman
f68b3222b3 Fix SmallestTTSService to use InterruptibleTTSService audio context system
- Route audio through audio contexts (append_to_audio_context) instead of
  pushing frames directly, enabling proper turn management and interruptions
- Add push_stop_frames and push_start_frame so the base class handles
  TTSStartedFrame/TTSStoppedFrame lifecycle
- Remove manual context_id tracking (self._context_id) in favor of
  get_active_audio_context_id()
- Don't call remove_audio_context on "complete" — Smallest sends one
  per request, not per turn; let the base class timeout handle cleanup
- Guard v2-only params (consistency, similarity, enhancement) so they
  aren't sent to lightning-v3.1
- Remove request_id from request payload (not a documented request field)
- Add flush_audio override to send flush to WebSocket
2026-03-24 11:46:28 -04:00
filipi87
3274235ea1 Adding missing changelog entry. 2026-03-24 12:42:56 -03:00
filipi87
05b9c514fb Invoking TTSSpeakFrame when we receive a TTSSpeakFrame. 2026-03-24 12:39:28 -03:00
Filipi da Silva Fuchter
03c0d7c345 Merge pull request #4013 from inworld-ai/ian/prewarm-context-inworld-v2
[inworld] Pre-open WebSocket TTS context on LLM response start
2026-03-24 11:37:28 -04:00
Filipi da Silva Fuchter
0783edb185 Merge pull request #4120 from pipecat-ai/filipi/krisp-viva-vad-support
Added cleanup() method to VADAnalyzer base class
2026-03-24 11:26:53 -04:00
Mark Backman
51d28b4a9f Code review fixes 2026-03-24 11:21:04 -04:00
kompfner
cf083b8411 Merge pull request #4078 from pipecat-ai/cb/gemini-updates
Updates for Gemini Live
2026-03-24 11:18:00 -04:00
Harshita Jain
099814d74a Add Smallest AI TTS service integration
Adds SmallestTTSService, a WebSocket-based TTS service using Smallest AI's
Lightning v3.1 model. Follows current Pipecat service conventions:

- SmallestTTSSettings dataclass with runtime-updatable settings (voice,
  language, speed, etc.)
- Reconnects on model change; keepalive every 30s to prevent idle timeout
- TTS settings default to None so the API applies its own defaults
- Model enum: SmallestTTSModel.LIGHTNING_V3_1

Includes a foundational example (07zl-interruptible-smallest.py) using
Deepgram STT + Smallest TTS + OpenAI LLM.

STT integration will follow in a separate PR once the hallucination/finalize
behaviour is resolved.

Made-with: Cursor
2026-03-24 11:11:10 -04:00
Mark Backman
dd45843c42 Merge pull request #4117 from m-ods/feat/assemblyai-domain-param
feat(assemblyai): add domain parameter for Medical Mode
2026-03-24 11:02:01 -04:00
Mark Backman
fe15d8654b Add changelog for #4117 2026-03-24 10:57:55 -04:00
Paul Kompfner
68a440ae2e Move inference_on_context_initialization comment to constructor level 2026-03-24 10:49:45 -04:00
Paul Kompfner
8109ab6135 Further tweaks and improvements to Gemini 3 support in Gemini Live
Gets Gemini 3 support to the point where it works with:
- The "legacy" pattern from the previous (removed) 26- example
- inference_on_context_initialization=True (the default)
- inference_on_context_initialization=False
2026-03-24 10:45:41 -04:00
Filipi da Silva Fuchter
f311a0b6e4 Merge pull request #4084 from pipecat-ai/filipi/refactor_stop_frame
Refactoring the way we automatically push TTSStoppedFrame.
2026-03-24 10:06:02 -04:00
filipi87
9df8985d60 Refactoring the way we automatically push TTSStoppedFrame. 2026-03-24 11:00:06 -03:00
filipi87
b3a25e0ebe Adding changelog entry for cleanup method. 2026-03-24 10:53:07 -03:00
filipi87
02cfb129d3 Invoke cleanup method on VAD analyzer. 2026-03-24 10:49:14 -03:00
filipi87
311afef7da Fixing Krisp Viva example. 2026-03-24 10:48:22 -03:00
Filipi da Silva Fuchter
5ed183d215 Merge pull request #4022 from krispai/krisp-viva-vad-support
Draft Implementation for Krisp VIVA VAD.
2026-03-24 09:44:32 -04:00
Mark Backman
5c3d3aea2b Merge pull request #4115 from pipecat-ai/mb/user-turn-stop-warnings
Warn when VAD stop_secs misconfiguration may degrade turn detection
2026-03-24 09:32:20 -04:00
Mark Backman
0651569a4e Merge pull request #4119 from Alex-wuhu/novita-integration
feat: add Novita AI as LLM provider
2026-03-24 09:30:05 -04:00
Mark Backman
bf04ea2043 Add changelog for #4119 2026-03-24 09:23:40 -04:00
Mark Backman
aa0b49d69f Code review fixes 2026-03-24 09:22:08 -04:00
Alex-wuhu
8c6f4a8d7b Add Novita AI LLM service provider 2026-03-24 09:20:50 -04:00
Mark Backman
bbaa5971c4 Merge pull request #3981 from dhruvladia-sarvam/feat/sarvam-llm-integration
Sarvam LLM Integration
2026-03-24 09:00:01 -04:00
Mark Backman
cdd8c3e5bb Fix examples 2026-03-24 08:53:56 -04:00
Mark Backman
1c8a8f51d4 Code review fixes 2026-03-24 08:46:03 -04:00
dhruvladia-sarvam
349b8645f3 Merge branch 'main' into feat/sarvam-llm-integration 2026-03-24 16:34:12 +05:30
dhruvladia-sarvam
696196e30c alignment with pr 4081 2026-03-24 16:29:58 +05:30
Garegin Harutyunyan
dacffccd3a fixed runtime issue. 2026-03-24 12:56:19 +04:00
Martin Schweiger
f21b262969 feat(assemblyai): add domain parameter for Medical Mode
Add `domain` field to AssemblyAISTTSettings to support AssemblyAI's
streaming API `domain` query parameter, enabling specialized recognition
modes like Medical Mode (`medical-v1`).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 13:09:42 +08:00
Aleix Conchillo Flaqué
7414b30308 Merge pull request #4116 from pipecat-ai/changelog-0.0.107
Release 0.0.107 - Changelog Update
2026-03-23 20:13:49 -07:00
aconchillo
3268cb93d5 Update changelog for version 0.0.107 2026-03-23 20:11:31 -07:00
Aleix Conchillo Flaque
9211379720 update uv.lock 2026-03-23 20:06:28 -07:00
Mark Backman
42cab7eea0 Add changelog entry for #4115 2026-03-23 18:01:04 -04:00
Mark Backman
483b643b07 Warn when VAD stop_secs misconfiguration may degrade turn detection
Add warnings in SpeechTimeoutUserTurnStopStrategy and
TurnAnalyzerUserTurnStopStrategy when stop_secs differs from the
recommended default (0.2s) or when stop_secs >= STT p99 latency,
which collapses the STT wait timeout to 0s. Document the stop_secs=0.2
assumption in stt_latency.py.
2026-03-23 17:57:51 -04:00
Filipi da Silva Fuchter
12dc429761 Merge pull request #4104 from pipecat-ai/filipi/audio_issue
Allow defining whether to insert silence in the output transport.
2026-03-23 17:17:37 -04:00
filipi87
066b206b3d Renaming insert_silence to auto_silence 2026-03-23 18:12:26 -03:00
filipi87
ddd1b71b56 Renaming audio_out_insert_silence to audio_out_auto_silence 2026-03-23 17:57:42 -03:00
filipi87
8612c9f50a Updating to use daily-python 0.27.0 2026-03-23 17:52:41 -03:00
Mark Backman
d314e2831a Simplify 26 name, update evals 2026-03-23 15:46:13 -04:00
Mark Backman
fd0bfe141f Merge pull request #4109 from pipecat-ai/pk/tiny-fix 2026-03-23 15:17:19 -04:00
filipi87
3042929989 Fixing changelog description. 2026-03-23 15:57:25 -03:00
dhruvladia-sarvam
0f6cc231cf removing error wraps and model validation check 2026-03-24 00:06:15 +05:30
Chad Bailey
844555c520 removed old Gemini Live example 2026-03-23 18:31:36 +00:00
dhruvladia-sarvam
3428a4c6ad fixes post PR 4081 2026-03-23 23:45:27 +05:30
Mark Backman
f283cc5bc6 Merge pull request #4091 from pipecat-ai/mb/gradium-multiplexing-setup
feat: send per-context setup in Gradium TTS multiplexing
2026-03-23 12:00:53 -04:00
Mark Backman
70552d7697 Add changelog entry for #4091 2026-03-23 11:58:14 -04:00
Mark Backman
84c2a24c9f feat: send per-context setup messages in Gradium TTS multiplexing
Send a setup message with client_req_id before the first text message
for each context, matching Gradium multiplexing protocol. This allows
Gradium to associate each session with its setup configuration when
using close_ws_on_eos=False.
2026-03-23 11:58:14 -04:00
Garegin Harutyunyan
f8c7414ea7 format fix. 2026-03-23 18:58:19 +04:00
Garegin Harutyunyan
f1f51de962 Merge branch 'main' into krisp-viva-vad-support 2026-03-23 18:35:58 +04:00
Paul Kompfner
e93b0ace06 Remove an unnecessary check in SyncParallelPipeline 2026-03-23 10:00:32 -04:00
Garegin Harutyunyan
c32240e14b Fixed review comments. 2026-03-23 17:44:48 +04:00
filipi87
e6602f9244 Disabling auto_silence for tavus video service. 2026-03-22 18:28:57 -03:00
filipi87
9a30b18f21 Configuring Daily CustomAudioSource to automatically inject silence or not. 2026-03-22 17:29:01 -03:00
filipi87
936a39f4a1 Updating tavus examples to not send silence. 2026-03-22 14:41:23 -03:00
filipi87
3b1cb30926 Adding changelog entry. 2026-03-22 13:26:00 -03:00
filipi87
ce36487143 Allow defining whether to insert silence in the output transport. 2026-03-22 13:09:09 -03:00
Mark Backman
ec3bd8c5b1 Merge pull request #4097 from pipecat-ai/mb/update-minimax-docs-link
Update MiniMaxHttpTTSService platform docs link
2026-03-21 07:08:40 -04:00
Mark Backman
622ebd5d74 Update MiniMaxHttpTTSService platform docs link 2026-03-21 07:02:06 -04:00
Mark Backman
a9a1941a45 Merge pull request #4093 from poislagarde/fix/genesys-pong-parameters 2026-03-20 19:52:58 -04:00
Pablo Ois Lagarde
53e0136366 chore: rename changelog fragment to PR #4093
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 16:46:35 -03:00
Pablo Ois Lagarde
bc0e7130b8 fix: always include parameters field in Genesys AudioHook messages
The AudioHook protocol requires every message to carry a `parameters`
object. `_create_message` conditionally included it only when parameters
were truthy, so pong responses and closed responses without
outputVariables were sent without the field.

Clients that validate message structure (including the Genesys reference
implementation) rejected these messages, which broke server sequence
tracking and prevented outputVariables from reaching the Architect flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 16:37:53 -03:00
Mark Backman
d8af4447ff Merge pull request #3449 from kingster/telemetry-fix-system-message
fix: Record correct system_instruction in LLM spans for LLM services
2026-03-20 13:42:47 -04:00
Mark Backman
c89e366739 refactor: align tracing attributes with OpenTelemetry GenAI conventions
- gen_ai.system -> gen_ai.provider.name (deprecated)
- system / system_instructions -> gen_ai.system_instructions
- gen_ai.usage.cache_read_input_tokens -> gen_ai.usage.cache_read.input_tokens
- gen_ai.usage.cache_creation_input_tokens -> gen_ai.usage.cache_creation.input_tokens
2026-03-20 13:36:20 -04:00
Ian Lee
e9f3086ea3 add on_turn_context_created hook instead 2026-03-20 10:33:25 -07:00
Mark Backman
b5c362d6e6 refactor: rename tracing span attribute "system" to "system_instructions"
Align with the OpenTelemetry GenAI semantic convention
gen_ai.system_instructions for system prompts. The old "system"
attribute name was unrelated to gen_ai.system (which is for
provider name).
2026-03-20 13:20:03 -04:00
Mark Backman
e5aaa4c4eb fix: read system_instruction from _settings instead of removed attribute
Replace adapter-based extraction in traced_llm with direct reads from
_settings.system_instruction (priority) and context messages (fallback).
The old approach had three bugs: signature mismatch with Anthropic
adapter, key name inconsistency, and unnecessary overhead from full
message/tools conversion.

Also deduplicate the system instruction in spans -- it was appearing as
both "system" and "param.system_instruction".
2026-03-20 13:12:40 -04:00
Varun Singh
a12ad27348 enable_dialout should not depend on sip_caller_phone being set (#4087)
* enable_dialout should not depend on SIP being set

* we still need room_prefix to have pipecat-sip, s/sip/telephony in room prefix
2026-03-20 10:01:31 -07:00
Mark Backman
44504efdc7 Merge pull request #4090 from pipecat-ai/mb/fix-tts-audio-context-routing
fix: route TTS audio through audio context queue in Fish, LMNT, Neuph…
2026-03-20 11:06:53 -04:00
Mark Backman
da8070e98e Add changelog entry for #4090 2026-03-20 10:46:36 -04:00
Mark Backman
b98ad7fb64 fix: route TTS audio through audio context queue in Fish, LMNT, Neuphonic, Rime NonJson
These services were pushing audio frames directly via push_frame() in their
WebSocket receive loops, bypassing the base TTSService audio context
serialization queue. This causes incorrect frame ordering and broken
interruption handling.

Changes per service:
- Fish Audio: use append_to_audio_context(), replace _handle_interruption
  with on_audio_context_interrupted()
- LMNT: use append_to_audio_context(), remove redundant push_frame override
- Neuphonic: use append_to_audio_context(), remove redundant push_frame and
  process_frame overrides (base class handles pause/resume)
- Rime NonJson: use append_to_audio_context(), remove redundant push_frame
  override
2026-03-20 10:41:43 -04:00
Mark Backman
10ddf45015 Merge pull request #4088 from pipecat-ai/mb/add-community-integrations-README
Add community integrations to README
2026-03-20 10:34:09 -04:00
Filipi da Silva Fuchter
e41cb2cd0c Merge pull request #4083 from pipecat-ai/filipi/deepgram_sagemaker_tts_improvements
Improvements to DeepgramSageMakerTTSService
2026-03-20 10:30:48 -04:00
Filipi da Silva Fuchter
a69abcc67a Merge pull request #4082 from pipecat-ai/filipi/sarvam_tts_improvements
Improvements to SarvamTTSService.
2026-03-20 10:28:02 -04:00
Mark Backman
a11c48d5b0 Add community integrations to README 2026-03-20 10:09:58 -04:00
Kinshuk Bairagi
7caec9018b Merge branch 'pipecat-ai:main' into telemetry-fix-system-message 2026-03-20 18:36:31 +05:30
kompfner
08052d8880 Merge pull request #4085 from pipecat-ai/pk/remove-broken-05a-example
Remove 05a example, which was broken and isn't currently a priority t…
2026-03-19 15:59:39 -04:00
Paul Kompfner
4c456ada04 Remove 05a example, which was broken and isn't currently a priority to fix 2026-03-19 15:52:48 -04:00
kompfner
488dc1d07e Merge pull request #4074 from pipecat-ai/pk/openai-responses-llm-service
feat: add OpenAI Responses API LLM service
2026-03-19 15:44:26 -04:00
Paul Kompfner
dafbb2eb66 fix: typo "conversatione" → "conversation" in 20- examples 2026-03-19 15:38:38 -04:00
Paul Kompfner
ea1534f9f8 docs: note input_audio coming soon, no conversion needed
The LLMContext format already matches the expected Responses API
shape for input_audio, so no adapter conversion will be needed
once OpenAI enables support.
2026-03-19 15:36:23 -04:00
kompfner
f6e7599e49 Merge pull request #4029 from pipecat-ai/pk/sync-parallel-pipeline-fixes
`SyncParallelPipeline` and related fixes
2026-03-19 14:41:16 -04:00
Paul Kompfner
6424c36666 refactor: remove model init param from OpenAIResponsesLLMService
Model is only configurable via settings, matching the canonical API.
2026-03-19 14:38:01 -04:00
Paul Kompfner
05e344b9ec docs: port _closing comments from BaseOpenAILLMService 2026-03-19 14:30:34 -04:00
Paul Kompfner
4ec7be8850 feat: include cached_tokens and reasoning_tokens in usage metrics 2026-03-19 14:23:39 -04:00
Paul Kompfner
0533ea7b7f refactor: use direct attribute access for typed stream events
Replace getattr() calls with direct attribute access and isinstance()
checks on the strongly-typed OpenAI SDK event models.
2026-03-19 14:19:10 -04:00
Paul Kompfner
a3431d3b01 fix: prefer _full_model_name over _settings.model in tracing
The API-provided full model name is more specific than the
user-provided model name (e.g. includes version/snapshot details).
Reorder the lookup in _get_model_name and add a comment where the
Responses service sets the field.
2026-03-19 13:58:35 -04:00
Paul Kompfner
348df9d4ce fix: remove redundant instructions override in run_inference
The override would re-add `instructions` after the adapter had
intentionally converted it to a developer message for empty contexts.
Added a regression test.
2026-03-19 13:34:41 -04:00
Filipi da Silva Fuchter
a9256ebc35 Merge pull request #4075 from pipecat-ai/filipi/tts_frame_order
Fixing TTS frame order
2026-03-19 13:30:28 -04:00
filipi87
a0f311158d Changelog entry for the DeepgramSageMakerTTSService improvements. 2026-03-19 11:46:49 -03:00
filipi87
d3ca034c4f Routing the audio through the audio context queue. 2026-03-19 11:40:43 -03:00
filipi87
39425a675a Improvements to DeepgramSageMakerTTSService. 2026-03-19 11:32:56 -03:00
filipi87
c4d1b89049 Adding changelog entry for the Sarvam fixes. 2026-03-19 11:17:39 -03:00
filipi87
fd8c6c88bb Improvements to SarvamTTSService. 2026-03-19 11:13:17 -03:00
Paul Kompfner
57fd29f0c4 Remove changelog fragment that no longer applies after a rebase 2026-03-19 09:57:26 -04:00
Paul Kompfner
06f7da44f1 Clarify SyncParallelPipeline docstrings
Rewrite docstrings to more clearly explain what SyncParallelPipeline
does: hold all output until every parallel branch finishes, so frames
produced in response to a single input are released together.
2026-03-19 09:43:51 -04:00
Paul Kompfner
d702ebd6a2 Add frame_order parameter to SyncParallelPipeline
Adds a FrameOrder enum with ARRIVAL (default, existing behavior) and
PIPELINE (pushes frames in pipeline definition order). This lets callers
guarantee output ordering between parallel pipelines — e.g. ensuring
image frames precede audio frames — without needing a separate reordering
processor downstream.

Updates the 05-sync-speech-and-image example to use FrameOrder.PIPELINE,
removing the ImageBeforeAudioReorderer class entirely.
2026-03-19 09:43:51 -04:00
Paul Kompfner
26fc238eb7 Add changelog entry for Whisker debugger fix 2026-03-19 09:43:51 -04:00
Paul Kompfner
61ff53f2b9 Add changelog entries for PR #4029 2026-03-19 09:43:51 -04:00
Paul Kompfner
5e7639812a Add ImageBeforeAudioReorderer to sync-speech-and-image example
Add a processor after SyncParallelPipeline that ensures each image frame
precedes its corresponding TTS audio frames. SyncParallelPipeline batches
them together but doesn't guarantee branch ordering. The reorderer detects
when TTS frames arrive before their image (via context_id tracking) and
holds them until the image arrives.

Also rename ImageAudioSync to MarkImageForPlaybackSync for clarity.
2026-03-19 09:43:51 -04:00
Paul Kompfner
ba779f920f Revert a couple of logs that were changed from trace to debug just for debugging 2026-03-19 09:43:51 -04:00
Paul Kompfner
c3d6e965d8 Use TextAggregationMode.TOKEN in the 05-sync-speech-and-image
example since the SentenceAggregator already provides complete sentences.
2026-03-19 09:43:37 -04:00
Paul Kompfner
0f1ff16af1 Add sync_with_audio support for OutputImageRawFrame
Add a `sync_with_audio` field to `OutputImageRawFrame` that routes image
frames through the audio queue in the output transport, ensuring images
are only displayed after all preceding audio has been sent. This enables
proper audio/image synchronization in pipelines like the calendar month
narration example.

Update the 05-sync-speech-and-image example to use an `ImageAudioSync`
processor that sets this flag on image frames.
2026-03-19 09:41:21 -04:00
Paul Kompfner
1ede8460a2 Fix SyncParallelPipeline race condition with concurrent SystemFrame processing
The FrameProcessor two-queue architecture processes SystemFrames and
non-SystemFrames on separate concurrent async tasks. Both paths called
SyncParallelPipeline.process_frame(), which used the same per-pipeline
sink queues. A SystemFrame's wait_for_sync could steal frames from a
concurrent non-SystemFrame's wait_for_sync, corrupting synchronization
and stalling the pipeline.

This was triggered by the auto-embedded RTVI processor (added in
v0.0.101) which floods OutputTransportMessageUrgentFrame SystemFrames
through the pipeline during LLM responses.

Fix: SystemFrames (except EndFrame) now take a fast path — passed
through internal pipelines and pushed downstream directly without
touching the sink queues or drain logic. EndFrame retains the full
drain behavior as a lifecycle frame.
2026-03-19 09:41:21 -04:00
Paul Kompfner
463db59bb5 Minor comment typo fix 2026-03-19 09:41:21 -04:00
Paul Kompfner
0be4084683 Fix bug resulting in SyncParallelPipeline breaking the Whisker debugger 2026-03-19 09:41:21 -04:00
filipi87
8f6dfc4777 Mentioning the frame order fix in the changelog. 2026-03-19 10:26:58 -03:00
filipi87
6841c0719b Always appending TTSTextFrame to the audio context. 2026-03-19 10:12:01 -03:00
filipi87
2836b1ea7e Fixing the frame ordering of the AggregatedTextFrame. 2026-03-19 10:07:25 -03:00
filipi87
5fd98e1391 Fixing TTS frame order. 2026-03-19 09:43:40 -03:00
Mark Backman
ef419cd87a Merge pull request #4073 from joachimchauvet/fix/livekit-mixer-invalidstate-log-spam
Suppress InvalidState log spam from audio mixer during interruptions in LiveKit transport
2026-03-19 08:39:42 -04:00
Aleix Conchillo Flaqué
8750c26cdc Merge pull request #4080 from pipecat-ai/changelog-0.0.106
Release 0.0.106 - Changelog Update
2026-03-18 23:39:22 -07:00
aconchillo
3e0c536fe7 Update changelog for version 0.0.106 2026-03-18 23:36:18 -07:00
Aleix Conchillo Flaqué
7ee5fa9e20 Merge pull request #4079 from pipecat-ai/aleix/fix-tavus-dtmf-callback
Add missing on_dtmf_event callback to Tavus transport
2026-03-18 21:47:28 -07:00
Aleix Conchillo Flaqué
7dfcaf8096 Add missing on_dtmf_event callback to Tavus transport
The on_dtmf_event callback was added to DailyCallbacks in #4047 but
the Tavus transport was not updated, causing a missing argument error.
2026-03-18 21:46:06 -07:00
Chad Bailey
05157129e2 added changelog 2026-03-18 23:31:18 +00:00
Chad Bailey
4a0411cbc4 disabled single responses for gemini 3 live models 2026-03-18 23:23:45 +00:00
Chad Bailey
6cd39b8b42 updates 2026-03-18 23:04:22 +00:00
Chad Bailey
38d7882f0f updated context seeding to allow gemini 3.1 to greet the user 2026-03-18 21:28:17 +00:00
Filipi da Silva Fuchter
4aea7784c9 Fixed the ordering of _maybe_pause_frame_processing call in TTSService (#4071)
* Fixing the invocation of pause_frame_processing at the correct time when receiving LLMFullResponseEndFrame and EndFrame.
2026-03-18 16:55:59 -04:00
Mark Backman
bad10177d4 Add WakePhraseUserTurnStartStrategy (#4064)
- Add WakePhraseUserTurnStartStrategy for gating interaction behind wake                                                                            
  phrase detection, with timeout and single_activation modes                                                                                        
- Add default_user_turn_start_strategies() and                                                                                                      
  default_user_turn_stop_strategies() helper functions                                                                                              
- Deprecate WakeCheckFilter in favor of the new strategy
- Extend ProcessFrameResult to stop strategies for short-circuit evaluation
- Fix MinWordsUserTurnStartStrategy including filtered text in output
2026-03-18 16:47:17 -04:00
Mark Backman
c4be513044 Improvements for Nova Sonic LLM and TTS output frames (#4042)
* Fix empty user transcription causing spurious interruption in Nova Sonic

Skip _report_user_transcription_ended() when _user_text_buffer is empty,
which happens when the initial prompt is text-only. Previously, an empty
TranscriptionFrame was pushed upstream, triggering a chain reaction:
on_user_turn_stopped → UserStartedSpeakingFrame → interruption →
premature BotStoppedSpeaking → multiple response start/stop cycles.

* Improve TextFrame and assistant end of turn logic

Now, SPECULATIVE text results are used to push the LLMTextFrame,
AggregatedTextFrame, and TTSTextFrame. Additionally, the TTSTextFrames
are push at the end of the corresponding audio segment. 

* Remove BotStoppedSpeakingFrame fallback from Nova Sonic

Now that assistant response end is detected directly from Nova Sonic
contentEnd events (END_TURN and INTERRUPTED), the BotStoppedSpeakingFrame
handler is no longer needed. Inline the cleanup logic in reset_conversation.
2026-03-18 16:04:12 -04:00
Mark Backman
4b704e6d3a GradiumSTTService improvements (#4066)
* Remove duplicate reconnection logic from Gradium STT

The _receive_messages method had its own while-True reconnect loop,
duplicating the reconnection handling already provided by
WebsocketService._receive_task_handler (exponential backoff, max
retries, error reporting). Flatten to just the inner message loop
and let the base class handle reconnection.

* Align Gradium STT VAD handling with base class patterns

Replace the process_frame override with a _handle_vad_user_stopped_speaking
override, which is the proper hook provided by STTService. Move
start_processing_metrics() into run_stt (matching Gladia's pattern).
Remove unused FrameDirection and VADUserStartedSpeakingFrame imports.

* Add transcript aggregation delay after flushed to capture trailing tokens

Gradium flushed response can arrive before all text tokens have been
delivered. Instead of finalizing immediately on flushed, start a short
timer (100ms) that allows trailing tokens to accumulate before pushing
the final TranscriptionFrame.

* Add changelog for PR #4066

* Change default encoding to pcm_16000

* Decouple encoding from sample_rate in Gradium STT

The encoding parameter now takes just the base type (pcm, wav, opus)
and the sample rate is derived from the pipeline audio_in_sample_rate,
assembled dynamically via input_format_from_encoding(). This fixes the
mismatch where SAMPLE_RATE=24000 was passed to the base class while
encoding defaulted to pcm_16000.
2026-03-18 15:57:34 -04:00
Paul Kompfner
b1a8588209 feat: add 12- and 14d- image/video examples for OpenAI Responses 2026-03-18 15:39:06 -04:00
Paul Kompfner
5de794e1da feat: add service_tier support to OpenAIResponsesLLMService 2026-03-18 15:29:04 -04:00
Paul Kompfner
891966346c feat: add 55zi update-settings example for OpenAI Responses 2026-03-18 15:17:16 -04:00
Paul Kompfner
2001ab4577 feat: add 20a persistent context example for OpenAI Responses 2026-03-18 15:14:28 -04:00
Paul Kompfner
0449df828c chore: update previous_response_id comment 2026-03-18 15:07:10 -04:00
Paul Kompfner
951bb0c1a7 feat: set store=False and add run_inference tests
Set store=False in Responses API calls since we send full conversation
history as input items and don't use previous_response_id.

Add 5 run_inference tests for OpenAIResponsesLLMService using real
LLMContext and adapter (only HTTP client mocked).
2026-03-18 14:47:12 -04:00
Paul Kompfner
21b1812c71 chore: add note about previous_response_id and empty input handling 2026-03-18 14:26:51 -04:00
Paul Kompfner
c4f21ef76b test: add run_inference tests for OpenAIResponsesLLMService
Uses real LLMContext and adapter (only HTTP client is mocked) to test
basic inference, client exception propagation, system_instruction
override, empty context fallback, and max_tokens override.
2026-03-18 14:17:21 -04:00
Paul Kompfner
a7167ad121 test: add run_inference tests for OpenAIResponsesLLMService
Tests cover basic inference, client exception propagation,
system_instruction override, and max_tokens override.
2026-03-18 14:09:17 -04:00
Paul Kompfner
eaccb96454 docs: add changelog for OpenAI Responses API service 2026-03-18 11:46:49 -04:00
Paul Kompfner
45186cc4ce feat: add OpenAI Responses API LLM service
Add OpenAIResponsesLLMService using the Responses API, with a dedicated
adapter that converts LLMContext messages to Responses API input items
(system→developer, tool_calls→function_call, tool→function_call_output,
multimodal content conversion, and tools schema flattening).

- New adapter: open_ai_responses_adapter.py
- New service: openai/responses/llm.py
- Examples: 07-interruptible and 14-function-calling variants
- 19 unit tests for adapter conversion logic
- Eval entries for both examples
2026-03-18 11:45:23 -04:00
joachimchauvet
0378fb0d91 fix(livekit): suppress InvalidState log spam from audio mixer during interruptions 2026-03-18 16:04:42 +02:00
Mark Backman
53388e0426 Merge pull request #4063 from pipecat-ai/mb/wake-word-start-strategy 2026-03-17 21:05:10 -04:00
Mark Backman
edf16c5533 fix: pass list-type Deepgram settings as lists instead of stringifying
List-valued settings like keyterm, keywords, search, redact, and replace
were being converted to strings before being passed to the SDK connect()
method. The SDK expects lists so its encode_query can produce repeated
query params (keyterm=a&keyterm=b).
2026-03-17 18:24:20 -04:00
Mark Backman
d4f69dd333 Merge pull request #4046 from pipecat-ai/mb/fix-4045
Fix SonioxSTTService crash when language_hints contains plain strings…
2026-03-17 16:41:11 -04:00
Mark Backman
a32f558b07 Merge pull request #4026 from pipecat-ai/mb/fix-deepgram-base-url
Fix DeepgramSTTService base_url forcing HTTPS/WSS schemes
2026-03-17 16:39:24 -04:00
Mark Backman
4e99cb39b0 Merge pull request #4056 from pipecat-ai/mb/fix-filter-turns-deprecation
Fix deprecation warning when using filter_incomplete_user_turns
2026-03-17 16:23:43 -04:00
Mark Backman
10b3bff525 Merge pull request #4058 from pipecat-ai/mb/improve-stt-tts-language-code-robustness
fix: resolve raw language strings through Language enum for proper service conversion
2026-03-17 16:20:12 -04:00
Mark Backman
95ee096622 Merge pull request #4057 from pipecat-ai/mb/fix-4053
Fix stale state in user turn stop strategies between turns
2026-03-17 16:19:31 -04:00
Mark Backman
6799995b0a Merge pull request #4062 from pipecat-ai/mb/update-pyasn1-0.6.3
Update uv.lock with pyasn1 v0.6.3
2026-03-17 16:19:13 -04:00
Mark Backman
05abc95b5f Update uv.lock with pyasn1 v0.6.3 2026-03-17 16:10:35 -04:00
Yavuz Alp Sencer ÖZTÜRK
9a55eb67cf fix(openai): handle tool calls with empty/null arguments
When an LLM returns a tool call with no arguments (arguments=null in
the streaming chunks), the tool call is silently dropped because:

1. `tool_call.function.arguments` is None, so nothing is accumulated
   and `arguments` stays as "" (empty string)
2. `if function_name and arguments:` treats "" as falsy, skipping the
   entire tool call execution

OpenAI always sends arguments="{}" even for parameterless tools,
masking this bug. But vLLM, Ollama, and other OpenAI-compatible
providers may omit arguments entirely when the tool schema has no
required parameters, causing tool calls to be silently ignored.

Fix: check only `function_name` (not `arguments`) and default empty
arguments to "{}" so `json.loads` produces an empty dict. Apply the
same fallback for intermediate tool calls in multi-tool responses.
2026-03-17 19:44:59 +03:00
Mark Backman
18e654b3f0 docs: add changelog for #4058 2026-03-17 12:01:50 -04:00
Mark Backman
790a23d2e5 fix: resolve raw language strings through Language enum for proper service conversion
Raw strings like "de-DE" passed as the language parameter to TTS/STT services
were bypassing the Language enum resolution logic, causing silent failures
(e.g. ElevenLabs expects "de" not "de-DE"). Now raw strings are first converted
to Language enums so they go through the same resolve_language() path, with a
warning logged for unrecognized strings.
2026-03-17 12:00:28 -04:00
Mark Backman
d70df1d8b0 Add changelog for #4057 2026-03-17 11:35:38 -04:00
Mark Backman
5000b040dd Fix stale state in user turn stop strategies between turns
Reset stop strategies at turn start (not just turn stop) so that late
transcriptions arriving between turns do not leave stale _text that
causes premature stops on the next turn. Also cancel pending timeout
tasks in reset() for both SpeechTimeout and TurnAnalyzer strategies.
2026-03-17 11:31:08 -04:00
Mark Backman
248419a7c4 Merge pull request #4050 from pipecat-ai/copilot/update-enable-dialout-to-false
Fix PSTN runner defaulting enable_dialout to True
2026-03-17 11:07:23 -04:00
Mark Backman
024e2ebd4e Fix deprecation warning when using filter_incomplete_user_turns 2026-03-17 10:51:01 -04:00
Mark Backman
091f88e42e feat: add enable_dialout parameter to configure() for dial-out rooms
Expose enable_dialout as a configure() parameter (default False) so
dial-out examples can opt in without needing to build DailyRoomProperties
manually.
2026-03-17 09:03:50 -04:00
Mark Backman
e11b486312 fix: clean up configure() type hints, deduplicate token expiry, and improve comment
Narrow misleading Optional type hints on parameters that never accept
None, extract the duplicated token_exp_duration * 60 * 60 calculation,
remove unnecessary forward-reference quotes on DailyMeetingTokenProperties,
and clarify why enable_dialout is explicitly set to False.
2026-03-17 08:54:07 -04:00
Mark Backman
f54b3c6884 Merge pull request #4048 from julienvantyghem/daily-audio-only-docstring
update enable_recording param  documentation
2026-03-17 08:21:50 -04:00
copilot-swe-agent[bot]
7e60320a74 fix: set enable_dialout to False in PSTN runner to prevent room creation failures
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-03-17 04:04:11 +00:00
copilot-swe-agent[bot]
89cb0f089e Initial plan 2026-03-17 04:01:00 +00:00
Julien Vantyghem
e5b4403ed4 update docstring following https://github.com/pipecat-ai/pipecat/pull/3916 2026-03-16 19:54:04 -06:00
Mark Backman
a0595adbdc Merge pull request #4012 from pipecat-ai/mb/deprecate-old-local-smart-turn 2026-03-16 21:09:26 -04:00
Mark Backman
dc1632bbac Merge pull request #4023 from pipecat-ai/mb/update-small-webrtc-prebuilt-2.4.0 2026-03-16 21:09:08 -04:00
Mark Backman
53f49ac094 Merge pull request #4024 from pipecat-ai/mb/fix-lang-enum-stt-tts 2026-03-16 21:08:48 -04:00
Mark Backman
bf02d61418 Merge pull request #4025 from pipecat-ai/mb/fix-example-system-instruction 2026-03-16 21:07:01 -04:00
Mark Backman
154a8d1987 Merge pull request #4035 from pipecat-ai/mb/bump-pyjwt-version 2026-03-16 21:06:31 -04:00
Mark Backman
fa5b757408 Merge pull request #4044 from pipecat-ai/mb/pyopenssl-upgrade 2026-03-16 21:06:09 -04:00
Aleix Conchillo Flaqué
c765bc98d3 Merge pull request #4047 from pipecat-ai/aleix/daily-python-0.25.0-dtmf-events
Update daily-python to 0.25.0 and add DTMF input events
2026-03-16 18:05:10 -07:00
Aleix Conchillo Flaqué
59486d5abf Add changelog entries for PR #4047 2026-03-16 17:58:12 -07:00
Aleix Conchillo Flaqué
5cb6aecc9f Add DTMF input event support to Daily transport
Handle Daily's on_dtmf_event callback, convert it to an
InputDTMFFrame pushed into the input transport. Also add __str__
methods to InputDTMFFrame and OutputDTMFFrame for better logging.
2026-03-16 17:57:39 -07:00
Aleix Conchillo Flaqué
5c685c35d7 pyproject: update daily-python to 0.25.0 2026-03-16 17:41:44 -07:00
Aleix Conchillo Flaqué
1a1d5e6a84 Merge pull request #4006 from pipecat-ai/aleix/task-frame-flush-ordering
handle EndTaskFrame, StopTaskFrame and CancelTaskFrame downstream
2026-03-16 17:35:11 -07:00
Mark Backman
abb8bae6f7 Add changelog for #4046 2026-03-16 19:51:37 -04:00
Mark Backman
2801439e48 Fix OpenAI STT crash when language is a plain string instead of Language enum 2026-03-16 19:48:49 -04:00
Mark Backman
3b8d040e41 Fix SonioxSTTService crash when language_hints contains plain strings (#4045)
Refactor language_to_soniox_language to use resolve_language + LANGUAGE_MAP
pattern consistent with other services. Fix resolve_language fallback to use
str(language) instead of language.value so plain strings don't crash.
2026-03-16 19:45:03 -04:00
Mark Backman
538b9fa2d9 Bump pyopenssl in uv.lock to 26.0.0 2026-03-16 17:58:44 -04:00
dhruvladia-sarvam
8a4f6b486e wrapper fixes 2026-03-17 02:47:47 +05:30
dhruvladia-sarvam
8745f20330 fix llm wrapper redundancy and restore run_inference parity 2026-03-15 22:24:06 +05:30
Mark Backman
b437cbe126 Merge pull request #4037 from omChauhanDev/fix/llm-switcher-timeout-secs
forward timeout_secs in LLMSwitcher register methods
2026-03-15 10:08:11 -04:00
Om Chauhan
ed0f5ab09b added changelog 2026-03-15 19:15:18 +05:30
Om Chauhan
a6ad8a355b forward timeout_secs in LLMSwitcher register methods 2026-03-15 19:10:32 +05:30
Mark Backman
e8415b7451 Add changelog for #4035 2026-03-15 08:56:54 -04:00
Mark Backman
24c3d23229 Bump PyJWT minimum version to 2.12.0 for CVE-2026-32597
Addresses Dependabot alert #165 (GHSA-752w-5fwx-jx9f) where PyJWT
<= 2.11.0 accepts unknown `crit` header extensions.
2026-03-15 08:53:06 -04:00
Ian Lee
3e5be23bd8 fix(inworld): close context at end of turn instead of relying on idle timeout
The Inworld WS TTS plugin previously relied on the base TTS service's 3-second AUDIO_CONTEXT_TIMEOUT to detect when audio was done, then sent close_context in on_audio_context_completed. This added unnecessary latency before TTSStoppedFrame was emitted.

The original implementation likely borrowed this idea from the 11labs' impelementation. But it's likely better to mirror the Cartesia plugin pattern where on_audio_context_completed is a no-op because the server signals completion directly.

Now close_context is sent in on_turn_context_completed (right after flush_context), so the server responds with contextClosed immediately after the last audio byte. The existing receive handler already calls remove_audio_context on contextClosed, which exits the audio context handler cleanly.
2026-03-13 12:52:07 -07:00
Mark Backman
2f7c441c1c Add changelog for #4026 2026-03-13 13:55:27 -04:00
Mark Backman
79b7a0f969 Fix DeepgramSTTService base_url forcing HTTPS/WSS schemes
The base_url parameter previously forced wss:// and https:// schemes,
breaking air-gapped or private deployments that need ws:// or http://.
Extract URL derivation into _derive_deepgram_urls() helper that respects
the developers scheme choice while deriving the paired WebSocket and
HTTP URLs the Deepgram SDK requires.

Closes #4019
2026-03-13 13:53:06 -04:00
Mark Backman
978a1a2083 Update the system_instruction wording in the foundational examples to not mention WebRTC call 2026-03-13 12:22:10 -04:00
Mark Backman
0ec5f5e5ac Add missing language deprecations for XTTSService, LmntTTSService 2026-03-13 11:33:59 -04:00
Garegin Harutyunyan
33f042b500 format fix. 2026-03-13 19:32:39 +04:00
Garegin Harutyunyan
0722784f3a tests for VAD. 2026-03-13 19:30:03 +04:00
Mark Backman
1ea23ad362 Add changelog for #4024 2026-03-13 10:58:51 -04:00
Mark Backman
9f2f73b6b4 Remove redundant per-service language conversion from subclasses
Now that the base TTSService and STTService handle Language enum
conversion at init time, subclasses no longer need to convert in their
own __init__ methods. Remove conversion calls from hardcoded defaults,
params paths, and deprecated direct arg paths across 22 service files.

Services just pass raw Language enums and let the base class convert
via language_to_service_language() polymorphic dispatch.
2026-03-13 10:57:04 -04:00
Mark Backman
8467058e48 Fix Language enum conversion at init time in base TTS/STT services
When a Language enum (e.g. Language.ES) is passed via
settings=Service.Settings(language=Language.ES), it gets stored as-is
without conversion to the service-specific code. The base
_update_settings() handles this for runtime updates, but at init time
apply_update() copies the raw enum. This causes API errors because
services send the unconverted enum value.

Add language conversion in TTSService.__init__ and STTService.__init__
after super().__init__(), using the subclass language_to_service_language()
via normal method resolution.
2026-03-13 10:56:33 -04:00
Garegin Harutyunyan
cbc1c275b3 num_frames_required() implementation. 2026-03-13 18:28:22 +04:00
Mark Backman
7365ebfdf9 Add changelog for #4023 2026-03-13 10:22:58 -04:00
Garegin Harutyunyan
14ca70f13e Fixed format issue. 2026-03-13 18:22:56 +04:00
Mark Backman
1064482ade Update pipecat-ai-small-webrtc-prebuilt to 2.4.0 2026-03-13 10:20:51 -04:00
Garegin Harutyunyan
f7568a91b1 Draft Implementation for Krisp VIVA VAD. 2026-03-13 18:12:21 +04:00
Ian Lee
dfe5fec8f9 [inworld] prewarm context on llm response start 2026-03-12 15:34:57 -07:00
Mark Backman
ed0b8dadb5 Add changelog for #4012 2026-03-12 17:22:13 -04:00
Mark Backman
de38ca626d Deprecate LocalSmartTurnAnalyzerV2 and LocalCoreMLSmartTurnAnalyzer
Both analyzers are superseded by LocalSmartTurnAnalyzerV3. Added
deprecation warnings and docstring notices following the existing
pattern from LocalSmartTurnAnalyzer.
2026-03-12 17:19:32 -04:00
kompfner
30d95e3b84 Merge pull request #4009 from pipecat-ai/pk/perplexity-message-ordering-strictness
Add PerplexityLLMAdapter for message ordering strictness
2026-03-12 16:51:11 -04:00
Paul Kompfner
99f28120b7 Remove trailing system→user conversion for cross-call stability
Perplexity appears to have statefulness within a conversation, so
converting a system message to "user" in one call and then back to
"system" in the next (after more messages are appended) causes API
errors. Remove the trailing system→user conversion entirely — if the
context only has system messages, the API call will fail but the
mistake will be caught right away.
2026-03-12 16:07:39 -04:00
Paul Kompfner
e69f5a76e1 Add test for trailing assistant+system ordering, improve docstring
Add test exercising the step 3 ordering where stripping a trailing
assistant exposes a system message that then gets converted to user.
Move the reasoning about when a trailing system message can occur
into the docstring.
2026-03-12 15:24:17 -04:00
Paul Kompfner
7f98cc9921 Remove initial system message merging, handle trailing system messages
Perplexity allows multiple initial system messages, so don't merge them.
Instead, skip system-system pairs during the consecutive same-role merge
step. Broaden the trailing message fix to convert any trailing system
message to user (not just a lone system message), so contexts with only
system messages don't fail.
2026-03-12 15:14:56 -04:00
Mark Backman
43a2d55c61 Merge pull request #4010 from pipecat-ai/mb/quickstart-cloud-build
Update quickstart to use cloud builds
2026-03-12 15:07:06 -04:00
Paul Kompfner
e4bf6281c6 Add changelog for #4009 2026-03-12 14:56:37 -04:00
Paul Kompfner
0373f85b85 Add PerplexityLLMAdapter to enforce Perplexity's message ordering constraints
Perplexity's API is stricter than OpenAI about conversation history:
- Requires strict alternation between user/tool and assistant messages
- Disallows system messages except as the initial message
- Requires the last message to be user or tool

The new adapter transforms messages before sending to satisfy all three
constraints: merging consecutive initial system messages, converting
non-initial system to user, merging consecutive same-role messages, and
removing trailing assistant messages.

Also adds dual-system-instruction warnings to Cerebras, Fireworks,
Mistral, Perplexity, and SambaNova services (matching the existing
BaseOpenAILLMService pattern), and updates the warning text in
BaseOpenAILLMService to be more descriptive.
2026-03-12 14:56:30 -04:00
Mark Backman
38a4d4ff23 Update quickstart to use cloud builds 2026-03-12 14:46:49 -04:00
Aleix Conchillo Flaqué
f6f08d19a8 Add changelog for #4006 2026-03-12 11:34:25 -07:00
Aleix Conchillo Flaqué
2eccd28cf0 handle EndTaskFrame, StopTaskFrame and CancelTaskFrame downstream
EndTaskFrame and StopTaskFrame are now ControlFrames instead of
SystemFrames, so they flow through the pipeline and queue behind
pending work. This prevents races where EndFrame could overtake
in-flight frames (e.g. function call responses).

CancelTaskFrame and InterruptionTaskFrame remain SystemFrames
(via new TaskSystemFrame base): since they need immediate propagation.

The sink now catches EndTaskFrame, StopTaskFrame and CancelTaskFrame
downstream and re-queues it upstream to the task, ensuring the full
pipeline drains before shutdown begins.
2026-03-12 11:34:25 -07:00
Aleix Conchillo Flaqué
374bfd4068 Merge pull request #4007 from pipecat-ai/aleix/fix-parallel-pipeline-flush-and-tts-stop-order
Fix ParallelPipeline flush ordering and TTS stop sequence
2026-03-12 10:21:31 -07:00
Aleix Conchillo Flaqué
a461b2b9e6 Add changelog entries for PR #4007 2026-03-12 10:16:29 -07:00
Aleix Conchillo Flaqué
1a66bdef8e Fix TTS stop ordering to drain audio contexts before canceling
Wait for _audio_context_task to finish draining the contexts queue
before canceling _stop_frame_task, ensuring all pending audio
contexts are processed during shutdown.
2026-03-12 10:16:29 -07:00
Aleix Conchillo Flaqué
73a56f5d81 Fix ParallelPipeline flush ordering and buffered frame handling
Flush buffered frames before pushing the synchronization frame so
downstream processors see the buffered frames first.  Switch to a
while-loop with pop(0) so frames added to the buffer during flush
are also drained.
2026-03-12 10:16:29 -07:00
kompfner
383300979d Merge pull request #4004 from pipecat-ai/pk/service-settings-update-frame-can-target-specific-service
Add optional `service` field to `ServiceUpdateSettingsFrame` for targ…
2026-03-12 11:48:41 -04:00
Paul Kompfner
27b686db8c Don't bother honoring the new LLMUpdateSettingsFrame.service field in the deprecated OpenAIRealtimeBetaLLMService 2026-03-12 11:04:49 -04:00
Mark Backman
3ffa72170b Merge pull request #3457 from ahoshaiyan/fix/reduce-tool-result-context-size
Reduce Tool Result Context Size by Using UTF-8 for JSON Serialization
2026-03-12 10:41:33 -04:00
Mark Backman
1fe1f0f439 Apply ensure_ascii=False to remaining LLM services and fix changelog format 2026-03-12 10:35:19 -04:00
Ali Alhoshaiyan
765fbeec63 Add changelog 2026-03-12 10:35:19 -04:00
Ali Alhoshaiyan
84538b0ca8 Reduce Call Tool Result Context Size by Allowing UTF-8 in JSON Serialization 2026-03-12 10:35:19 -04:00
Mark Backman
1c676c2073 Merge pull request #4005 from pipecat-ai/add-sip-provider-room-geo-to-configure
Add sip_provider and room_geo params to configure()
2026-03-12 09:28:28 -04:00
Mark Backman
bf66ae7e46 Add changelog for #4005 2026-03-12 09:22:31 -04:00
Varun Singh
7a7d600985 Add sip_provider and room_geo parameters to configure()
Add convenience parameters to configure() so callers don't need to
manually construct DailyRoomProperties/DailyRoomSipParams for common
SIP provider and geo configuration.
2026-03-11 21:50:10 -07:00
Paul Kompfner
36b57252b4 Add changelog for PR #4004 2026-03-11 21:47:51 -04:00
Paul Kompfner
65e4e365dc Add optional service field to ServiceUpdateSettingsFrame for targeting a specific service instance
When `service` is set and doesn't match, the service forwards the frame instead of consuming it. This allows targeting a specific service when multiple services of the same type exist in the pipeline.
2026-03-11 21:41:43 -04:00
kompfner
36f9a6d809 Merge pull request #4003 from pipecat-ai/pk/fix-deprecated-vad-analyzer-usage
Fix deprecated vad_analyzer usage in examples
2026-03-11 20:55:39 -04:00
Mark Backman
904331bba1 Merge pull request #4001 from pipecat-ai/mb/simli-settings
Migrate SimliVideoService to AIService with Settings pattern
2026-03-11 17:45:59 -04:00
Mark Backman
11b14b7857 Add changelog for PR #4001 2026-03-11 17:40:53 -04:00
Mark Backman
c0a3cdd35c Merge pull request #4002 from pipecat-ai/mb/update-quickstart-0.0.105
Update quickstart example for 0.0.105
2026-03-11 17:39:07 -04:00
Paul Kompfner
69e7677f4f Remove changelog for #4003 2026-03-11 17:33:20 -04:00
Paul Kompfner
9a0568e6fe Add changelog for #4003 2026-03-11 17:32:39 -04:00
Paul Kompfner
ccc2549c0c Broaden the vad_analyzer deprecation warning in BaseInputTransport to account for use-cases where there is no LLMUserAggregator at play 2026-03-11 17:28:26 -04:00
Paul Kompfner
e456a6bb23 Move away from remaining deprecated TransportParams.vad_analyzer usage in example files. Skip updates to deprecated services. 2026-03-11 17:17:40 -04:00
Mark Backman
2d9dc2fa1c Update quickstart example for 0.0.105 2026-03-11 17:12:59 -04:00
Mark Backman
59dc30a84d Merge pull request #3997 from pipecat-ai/mb/sarvam-package-0.1.26
Update sarvamai dependency from 0.1.26a2 to 0.1.26
2026-03-11 16:59:32 -04:00
Mark Backman
a54aa2d1f8 Migrate SimliVideoService to AIService with Settings pattern
Align Simli with HeyGen/Tavus by extending AIService instead of
FrameProcessor and using a ServiceSettings dataclass. InputParams is
preserved but deprecated; its fields are promoted to direct init params.
Lifecycle handling moves to start()/stop()/cancel() methods.
2026-03-11 16:56:41 -04:00
Mark Backman
3ceff3d5fd Merge pull request #4000 from pipecat-ai/mb/fix-openai-default-model
Fix: Restore default model to gpt-4.1 for OpenAI, Azure
2026-03-11 16:29:51 -04:00
kompfner
52057d628e Merge pull request #3999 from pipecat-ai/pk/camb-voice-int
Override CambTTSSettings.voice type from str to int to match Camb.ai'…
2026-03-11 16:18:59 -04:00
Mark Backman
4a45145cba Restored the default model to gpt-4.1 for OpenAI and Azure LLM services
The default model for OpenAILLMService and AzureLLMService was still set
to gpt-4o. Restored it to gpt-4.1. Also, removed hardcoded gpt-4o/gpt-4o-mini
model references from examples so they pick up the new default.
2026-03-11 16:18:47 -04:00
Paul Kompfner
080ed22ff5 Override CambTTSSettings.voice type from str to int to match Camb.ai's integer voice IDs 2026-03-11 15:44:05 -04:00
Mark Backman
71e6158861 Add changelog for PR #3997 2026-03-11 14:18:47 -04:00
Mark Backman
a9e124b84f Update sarvamai dependency from 0.1.26a2 to 0.1.26
Bump the Sarvam AI SDK to the stable release version.
2026-03-11 14:17:40 -04:00
kompfner
65561a1d83 Merge pull request #3996 from pipecat-ai/pk/prefer-nested-settings-alias
Prefer nested settings alias
2026-03-11 13:41:29 -04:00
Paul Kompfner
e5b60ba095 Make deprecated-init-param warnings recommend the preferred Service.Settings(...) pattern
Move the warning helper into AIService as _warn_init_param_moved_to_settings.
It now uses type(self).__name__ to produce messages like
"Use settings=AnthropicLLMService.Settings(model=...)" instead of the raw
settings class name "AnthropicLLMSettings(model=...)". Callers no longer need
to pass the settings class explicitly.
2026-03-11 13:04:15 -04:00
Paul Kompfner
eb9212f152 Update COMMUNITY_INTEGRATIONS.md code sample to prefer Settings alias over raw settings class name 2026-03-11 12:37:43 -04:00
Paul Kompfner
51a8a28a99 Prefer Service.ThinkingConfig over raw ThinkingConfig class names in Anthropic and Google services and examples 2026-03-11 12:34:10 -04:00
Paul Kompfner
6b168d6bbb Prefer Service.Settings over raw settings class names across all services
Replace direct references to settings class names (e.g. `FooSettings`) with the nested `Settings` alias form throughout all 87 service files:
- Type annotations: `Settings`
- Runtime code: `self.Settings`
- Docstrings: `ServiceClass.Settings`
- Cross-file inheritance: `ParentService.Settings`

This makes the `Settings` alias the canonical way to reference a service's settings, keeping only the class definition and alias assignment as the remaining hits for each raw settings class name.
2026-03-11 12:15:00 -04:00
kompfner
cbb4835e7b Merge pull request #3991 from pipecat-ai/pk/fix-out-of-date-docstrings
Fix out of date docstrings
2026-03-11 10:54:40 -04:00
Paul Kompfner
3cbd27d202 Add changelog for PR #3991 2026-03-11 10:44:15 -04:00
Paul Kompfner
42262d10bb Move OpenAIRealtimeSTTService's noise_reduction into its Settings object, as it might be useful to update it at runtime, and fix outdated OpenAIRealtimeSTTService docstring example 2026-03-11 10:44:15 -04:00
Paul Kompfner
df82df8e39 Fix outdated Google + Gemini TTS service docstring examples 2026-03-11 10:14:18 -04:00
Paul Kompfner
0ebcb55582 Fix outdated DeepgramSageMakerTTSService docstring example 2026-03-11 10:11:26 -04:00
Paul Kompfner
264ce681f7 Fix outdated DeepgramSageMakerSTTService docstring example 2026-03-11 10:10:15 -04:00
Paul Kompfner
916936d3ee Fix outdated Sarvam TTS docstring examples 2026-03-11 10:07:07 -04:00
Paul Kompfner
087abc9bb9 Fix outdated CambTTSService docstring example 2026-03-11 10:03:21 -04:00
Aleix Conchillo Flaqué
7e88b13421 Merge pull request #3983 from pipecat-ai/changelog-0.0.105
Release 0.0.105 - Changelog Update
2026-03-10 17:59:02 -07:00
aconchillo
610dc25fb1 Update changelog for version 0.0.105 2026-03-10 17:58:32 -07:00
Aleix Conchillo Flaqué
327bcfa8d2 Merge pull request #3982 from pipecat-ai/aleix/fix-examples
Fix Groq, Google, and Nvidia examples
2026-03-10 17:37:26 -07:00
Aleix Conchillo Flaqué
4c19337d89 Fix examples: Groq model, Google settings class, Nvidia system instruction 2026-03-10 15:29:52 -07:00
dhruvladia-sarvam
dc0386937a Initial 2026-03-11 02:27:57 +05:30
Aleix Conchillo Flaqué
a4310d4335 Merge pull request #3980 from pipecat-ai/aleix/move-google-vertex-openai
Move Google Vertex and OpenAI LLM modules to subpackages
2026-03-10 13:37:02 -07:00
Aleix Conchillo Flaqué
23218aaed7 Add changelog for #3980 2026-03-10 13:04:16 -07:00
Aleix Conchillo Flaqué
7be2c43e1d Update imports to use new google.gemini_live.vertex path 2026-03-10 13:00:31 -07:00
Aleix Conchillo Flaqué
ea09586db6 Add deprecation stub for google/gemini_live/llm_vertex.py 2026-03-10 13:00:02 -07:00
Aleix Conchillo Flaqué
d086b9f138 Move google/gemini_live/llm_vertex.py to google/gemini_live/vertex/llm.py 2026-03-10 12:59:36 -07:00
Aleix Conchillo Flaqué
b23652caa6 Update imports to use new google.vertex and google.openai paths 2026-03-10 12:58:04 -07:00
Aleix Conchillo Flaqué
4fa3890cec Add deprecation stub for google/llm_openai.py 2026-03-10 12:55:16 -07:00
Aleix Conchillo Flaqué
8ea006739c Move google/llm_openai.py to google/openai/llm.py 2026-03-10 12:54:37 -07:00
Aleix Conchillo Flaqué
b159d02b0c Add deprecation stub for google/llm_vertex.py 2026-03-10 12:54:05 -07:00
Aleix Conchillo Flaqué
0df421de9c Move google/llm_vertex.py to google/vertex/llm.py 2026-03-10 12:53:13 -07:00
Aleix Conchillo Flaqué
ed5b061716 Merge pull request #3979 from pipecat-ai/aleix/daily-optional-transcription-settings
Clean up start_transcription to use its settings parameter
2026-03-10 12:51:31 -07:00
kollaikal-rupesh
80bd935c19 Add ServiceSwitcherStrategyFailover for automatic failover on service errors (#3870)
* Add ServiceSwitcherStrategyFailover for automatic error-based service switching

Introduce a strategy hierarchy: ServiceSwitcherStrategy (base) →
ServiceSwitcherStrategyManual (handles ManuallySwitchServiceFrame) →
ServiceSwitcherStrategyFailover (adds error-based failover). ServiceSwitcher
now defaults to ServiceSwitcherStrategyManual with strategy_type optional.
Non-fatal ErrorFrames are forwarded to the strategy via handle_error().

* Move metadata request into _set_active_if_available

Requesting metadata is part of making a service active, so it belongs
alongside setting _active_service and firing on_service_switched. This
removes the duplicate queue_frame calls from ServiceSwitcher push_frame
and process_frame.
2026-03-10 15:37:30 -04:00
Mark Backman
43a9e9a1b5 Merge pull request #3899 from pipecat-ai/mb/tracing-service-settings-comment
Add defensive comment for given_fields() usage in tracing
2026-03-10 15:33:57 -04:00
Aleix Conchillo Flaqué
11a0c11050 Fix start_transcription ignoring its settings argument
DailyTransportClient.start_transcription() accepted a settings
parameter but always used self._params.transcription_settings
instead, silently discarding any custom settings passed by callers.
2026-03-10 12:08:53 -07:00
Aleix Conchillo Flaqué
4a2d57511d Make DailyParams.transcription_settings optional
Change transcription_settings to Optional[DailyTranscriptionSettings]
defaulting to None. The default settings are now applied at the call
site when transcription is started, and start_transcription receives
the serialized settings dict directly.
2026-03-10 11:55:38 -07:00
Aleix Conchillo Flaqué
743e2ac277 Merge pull request #3831 from pipecat-ai/aleix/custom-video-tracks
Replace VirtualCameraDevice with CustomVideoTrack + custom video track support
2026-03-10 11:44:29 -07:00
Aleix Conchillo Flaqué
86597cc9ec Add changelog entries for PR #3831 2026-03-10 11:32:16 -07:00
Aleix Conchillo Flaqué
14dd028b8f Add custom video track example with per-track params 2026-03-10 11:32:16 -07:00
Aleix Conchillo Flaqué
18e99123af Replace VirtualCameraDevice with CustomVideoTrack and add custom video track support
Use CustomVideoSource/CustomVideoTrack for the default camera output instead of
VirtualCameraDevice, mirroring how audio already uses CustomAudioSource/CustomAudioTrack.
Add support for custom video destinations (register_video_destination, add/remove
custom video tracks, routing in write_video_frame) so multiple video tracks can be
published simultaneously.
2026-03-10 11:32:16 -07:00
kompfner
6c4a46dc79 Merge pull request #3978 from pipecat-ai/pk/fix-inaccurate-comment
Fix an out-of-date comment for accuracy. In the OpenAI LLM service, w…
2026-03-10 14:29:39 -04:00
Mark Backman
9b26faff05 Merge pull request #3961 from ai-coustics/goekmengoergen/sys-663-re-enable-enhancement-level-feature-on-pipecat
Add enhancement_level support to `AICFilter`.
2026-03-10 14:24:15 -04:00
Paul Kompfner
3790640322 Fix an out-of-date comment for accuracy. In the OpenAI LLM service, we *don't* replace any context system messages with system instructions from the constructor. 2026-03-10 13:59:01 -04:00
Aleix Conchillo Flaqué
c25d5af8c8 Merge pull request #3970 from pipecat-ai/aleix/update-daily-python
Update daily-python to 0.24.0
2026-03-10 10:49:27 -07:00
Aleix Conchillo Flaqué
6e52623959 Merge pull request #3976 from pipecat-ai/aleix/fix-google-system-instruction-priority
Fix Google LLM system instruction priority
2026-03-10 10:48:28 -07:00
Mark Backman
912f1be31c Add system_instruction parameter to run_inference (#3968)
* Add system_instruction parameter to run_inference

Allow callers to provide a custom system instruction directly when calling
run_inference, without having to construct provider-specific context objects.

For OpenAI, the instruction is prepended as a system message (preserving
existing messages). For Anthropic, Google, and AWS Bedrock, it overrides the
single system field with a warning when an existing system instruction is
present in the context.

* Use system_instruction parameter in _generate_summary

Pass the summarization prompt via run_inference's system_instruction
parameter instead of embedding it as a system message in the context.

* Add changelog for #3968
2026-03-10 12:57:23 -04:00
Mark Backman
0817a57f4c Merge pull request #3974 from pipecat-ai/mb/azure-stt-region-optional
Make Azure STT region optional when private_endpoint is used
2026-03-10 12:31:39 -04:00
Aleix Conchillo Flaqué
db27aaa790 Add changelog for #3976 2026-03-10 09:26:26 -07:00
Aleix Conchillo Flaqué
153705f05b Fix Google LLM system instruction priority
Constructor/settings system_instruction now takes priority over the
context system message. Previously the context value would overwrite
the constructor value on every call. Warn when both are set.
2026-03-10 09:25:42 -07:00
Mark Backman
54c767cce3 Merge pull request #3960 from sysradium/fix-realtime-calls
Treat conversation_already_has_active_response as non-fatal in Realtime API
2026-03-10 11:40:34 -04:00
Mark Backman
2ce9179662 Merge pull request #3958 from pipecat-ai/mb/deepgram-tts-audio-context
Route Deepgram WebSocket TTS audio through audio context queue
2026-03-10 11:37:39 -04:00
Mark Backman
50cc01a578 Guard against None context ID in append_to_audio_context
After interruption, both _playing_context_id and _turn_context_id are
None. If a subclass calls append_to_audio_context(None, frame), the
recovery path matches (None == None) and creates a bogus audio context
that blocks the handler from ever processing the real context.

Early-return when context_id is falsy to prevent this.
2026-03-10 11:34:03 -04:00
Mark Backman
d5c0789ab5 Add changelog for #3958 2026-03-10 11:34:03 -04:00
Mark Backman
92b5185165 Route Deepgram WebSocket TTS audio through audio context queue
The Deepgram TTS service was bypassing pipecats audio context management
system, pushing audio frames directly via push_frame() instead of routing
them through append_to_audio_context(). This caused stale audio to leak
into the pipeline after interruptions and missed ordered playback
guarantees.

- Route audio frames through append_to_audio_context() with context
  availability checks to discard stale post-interruption frames
- Handle Flushed responses by appending TTSStoppedFrame and removing
  the audio context to signal completion
- Replace _handle_interruption override with on_audio_context_interrupted
  hook (the recommended pattern used by ElevenLabs and Cartesia)
- Remove redundant process_frame override that caused double-flush
  (base class already flushes via on_turn_context_completed)
- Remove redundant start_tts_usage_metrics call (base class handles
  aggregated usage metrics)
2026-03-10 11:34:03 -04:00
sysradium
ba0ebd5525 Treat conversation_already_has_active_response as non-fatal in Realtime API 2026-03-10 15:57:55 +01:00
Gökmen Görgen
3a6f848a5b update test description. 2026-03-10 14:54:49 +01:00
kompfner
c660152a84 Merge pull request #3966 from pipecat-ai/pk/add-some-more-missing-55-examples
Add missing 55-* update-settings examples for OpenPipe LLM and XTTS TTS
2026-03-10 09:45:58 -04:00
Gökmen Görgen
a96702acfc fix test. 2026-03-10 14:41:18 +01:00
Gökmen Görgen
780559dc32 address feedback. 2026-03-10 14:23:00 +01:00
Gökmen Görgen
8e1c8a38e4 don't change enhancement level if bypass toggled. 2026-03-10 14:18:45 +01:00
Gökmen Görgen
483f6689ed address feedback, use one logging. 2026-03-10 13:52:13 +01:00
Gökmen Görgen
bc11bf9673 remove _is_filter_enabled from AICFilter and refactor related logic and tests. 2026-03-10 13:48:32 +01:00
Gökmen Görgen
82b300298a add changelog. 2026-03-10 13:36:15 +01:00
Gökmen Görgen
0c87fcc48c re-add bypass parameter support to AICFilter and update related unit tests. 2026-03-10 13:36:15 +01:00
Gökmen Görgen
df64f3f943 add enhancement_level support to AICFilter.
# Conflicts:
#	src/pipecat/audio/filters/aic_filter.py
2026-03-10 13:36:15 +01:00
Mark Backman
db22bf0f75 Merge pull request #3973 from yuki901/fish-audio-s2-pro
Update Fish Audio default model from s1 to s2-pro
2026-03-10 07:57:27 -04:00
Mark Backman
edc65fc45e Add changelog for #3974 2026-03-10 07:48:02 -04:00
Mark Backman
233867fdfb Make region optional and validate Azure STT config
Make `region` optional so users can provide only `private_endpoint`.
Raise ValueError if neither is provided, and warn if both are given
(private_endpoint takes priority).
2026-03-10 07:47:05 -04:00
yukiobata1
ceb53e044b Add changelog for #3973 2026-03-10 19:29:47 +09:00
yukiobata1
c7ef23dd22 Update Fish Audio default model from s1 to s2-pro 2026-03-10 18:22:20 +09:00
Kinshuk Bairagi
9cc2644719 Improve system message extraction in traced_llm
Enhanced the logic for extracting the system message in the traced_llm decorator to support LLMContext via adapter and handle exceptions gracefully. This improves compatibility with different context types and ensures better tracing information.
2026-03-10 11:23:29 +05:30
Aleix Conchillo Flaqué
d79e35d84f Add changelog for #3970 2026-03-09 20:47:42 -07:00
Aleix Conchillo Flaqué
00eb190424 Update daily-python to 0.24.0 2026-03-09 20:47:13 -07:00
Mark Backman
0dc95692ba Merge pull request #3967 from pipecat-ai/mb/fix-azure-stt-private-endpoint 2026-03-09 21:57:10 -04:00
Mark Backman
07b901c2a5 Add changelog for #3967 2026-03-09 15:05:20 -04:00
Mark Backman
f533dc3203 Fix Azure STT SpeechConfig failing when private_endpoint is provided
SpeechConfig does not accept both `region` and `endpoint` simultaneously —
they are mutually exclusive. The previous code always passed both, which
raises ValueError when a user supplies a private_endpoint URL. Now we
conditionally pass either `endpoint` or `region`, never both.
2026-03-09 15:05:20 -04:00
Paul Kompfner
20c3f553b2 Add missing 55-* update-settings examples for OpenPipe LLM and XTTS TTS 2026-03-09 14:36:15 -04:00
kompfner
02791cd503 Merge pull request #3965 from pipecat-ai/pk/fix-integration-tests
Fix broken `test_unified_function_calling_anthropic` due to use of an…
2026-03-09 13:42:35 -04:00
kompfner
f2debd9b1d Merge pull request #3963 from pipecat-ai/pk/improve-claude-changelog-skill
Improve changelog skill: prioritize user-facing language and update e…
2026-03-09 13:00:32 -04:00
kompfner
c0c49d0ddc Merge pull request #3964 from pipecat-ai/pk/add-some-missing-55-examples
Add missing 55-* update-settings examples for Piper TTS, Kokoro TTS, …
2026-03-09 12:59:36 -04:00
Mark Backman
3d1f866e73 Merge pull request #3951 from pipecat-ai/mb/remove-unused-imports-2026-03-07
Remove unused imports, 2026-03-07
2026-03-09 12:49:08 -04:00
Mark Backman
786279f143 Remove unused imports, 2026-03-07 2026-03-09 12:44:47 -04:00
Paul Kompfner
9423d22051 Fix broken test_unified_function_calling_anthropic due to use of an unsupported/deprecated model.
Update the tests in test_integration_unified_function_calling.py to not specify particular models but instead just use service defaults (the tests shouldn't be model-dependent anyway)
2026-03-09 12:07:56 -04:00
Paul Kompfner
f1bb065823 Add missing 55-* update-settings examples for Piper TTS, Kokoro TTS, Whisper STT, and Whisper MLX STT
Also fix 13e-whisper-mlx.py to pass MLXModel.LARGE_V3_TURBO.value instead of the enum directly.
2026-03-09 11:54:25 -04:00
Filipi da Silva Fuchter
0c5e936aa5 Merge pull request #3936 from pipecat-ai/filipi/fix_push_aggregation
Fixed TTS context not being appended to the assistant message history
2026-03-09 11:14:38 -04:00
filipi87
f0c5925a79 Fixing Piper test. 2026-03-09 12:07:45 -03:00
Paul Kompfner
7f9169269c Improve changelog skill: prioritize user-facing language and update example changelog 2026-03-09 10:45:33 -04:00
Mark Backman
c16e534f73 Merge pull request #3952 from pipecat-ai/mb/settings-alias
Add Settings class attribute alias to all service classes
2026-03-09 10:45:10 -04:00
filipi87
8ec160f71e Making the changelog more user friendly. 2026-03-09 11:37:11 -03:00
Filipi da Silva Fuchter
1e615cd095 Merge pull request #3962 from pipecat-ai/filipi/smallwebrtc_queue
Queuing the messages received before the data channel is ready
2026-03-09 10:29:05 -04:00
filipi87
ba87d1609c Only marking self._is_yielding_frames_synchronously if receiving TTSAudioRawFrame 2026-03-09 11:24:36 -03:00
Mark Backman
f7dc13c0de Update COMMUNITY_INTEGRATIONS.md for Settings alias class 2026-03-09 10:24:24 -04:00
filipi87
c5ce667387 Retrieving the context_id from the TTSStartedFrame 2026-03-09 11:10:42 -03:00
filipi87
097f9c0896 Fixed to push LLMAssistantPushAggregationFrame when the base TTSService class is responsible for pushing the TTSStoppedFrame. 2026-03-09 11:04:09 -03:00
Filipi da Silva Fuchter
16336a3ea4 Merge pull request #3937 from pipecat-ai/filipi/fix_orphan_function_call
Fix context summarization leaving orphaned tool responses in kept context.
2026-03-09 09:19:17 -04:00
Mark Backman
9eaa99c8e2 Merge pull request #3957 from pipecat-ai/mb/user-turn-completion-system-instruction
Move turn completion instructions to system_instruction
2026-03-09 09:17:06 -04:00
filipi87
4557ef8c42 Renaming method to _get_earliest_function_call_not_resolved_in_range 2026-03-09 10:16:02 -03:00
filipi87
aa693bb5ee Adding changelog entry for the SmallWebRTCConnection fix. 2026-03-09 10:11:40 -03:00
filipi87
74a06a6968 Adding extra comment. 2026-03-09 10:06:38 -03:00
filipi87
322e317a00 Adding guardrails in case the data channel is never established. 2026-03-09 10:04:33 -03:00
filipi87
25165d6e2b Queuing the messages received before the data channel is ready to send them. 2026-03-09 09:47:45 -03:00
Mark Backman
8a02e6fbc5 Merge pull request #3959 from ajmeraharsh/fix/livekit-call-state-updated-args
fix(livekit): remove redundant self arg in on_call_state_updated
2026-03-09 08:38:12 -04:00
Mark Backman
d85ba75dda Merge pull request #3953 from pipecat-ai/mb/deepgram-flux-on-the-fly
Add on-the-fly Configure support for Deepgram Flux STT
2026-03-09 08:36:00 -04:00
ajmeraharsh
ae6f159b18 chore: add changelog entry for #3959 2026-03-09 09:15:03 +04:00
Aleix Conchillo Flaqué
30d0cccef0 Merge pull request #3947 from pipecat-ai/aleix/summary-applied-event
Expose on_summary_applied event on LLMAssistantAggregator
2026-03-08 19:05:50 -07:00
Aleix Conchillo Flaqué
3b947b7844 Add changelog for #3947 2026-03-08 19:02:51 -07:00
Aleix Conchillo Flaqué
1f8cc3d216 Expose on_summary_applied event on LLMAssistantAggregator
Forward the on_summary_applied event from the internal summarizer to
the aggregator so users can listen for it without accessing private
members. Update summarization examples to use the new public event.
2026-03-08 19:02:51 -07:00
ajmeraharsh
57c4d72bf0 fix(livekit): remove redundant self arg in on_call_state_updated event
_on_call_state_updated passes (self, state) to _call_event_handler,
but _run_handler already prepends self when invoking the handler.
This causes handlers to receive 3 positional arguments instead of 2,
making the on_call_state_updated event unusable.

This aligns with how _on_first_participant_joined correctly passes
only the data arg without self.
2026-03-09 02:51:35 +04:00
Mark Backman
64155e8f06 Add changelog for #3957 2026-03-08 10:44:45 -04:00
Mark Backman
efda57de5c Move turn completion instructions to system_instruction
Turn completion instructions were being injected as a system message in
the LLM context, which caused warning spam when system_instruction was
also set, did not persist across full context updates, and broke LLMs
that do not support consecutive system messages.

Instead, compose the turn completion instructions into the LLM service
system_instruction field. This is managed via _base_system_instruction
which stores the original value for restoration when turn completion is
disabled.
2026-03-08 10:41:40 -04:00
Mark Backman
764c3c4f32 Merge pull request #3938 from koriyoshi2041/fix/replace-bare-except-handlers
fix: replace bare except handlers with specific exception types
2026-03-08 09:04:49 -04:00
Mark Backman
a32e0be120 Merge pull request #3956 from radhikagpt1208/fix/turn-completion-mixin-state-reset
Fix turn completion mixin not resetting state when no `InterruptionFrame` is emitted
2026-03-08 08:54:34 -04:00
radhikagpt1208
b14c8e0e94 Fix turn completion mixin not resetting state after each LLM response 2026-03-08 08:46:45 -04:00
kigland
57f0b6d75b fix: address review feedback on exception handling
- mcp_service.py: remove unnecessary try/except around debug log,
  use len(available_tools.tools) to match actual iteration target
- bedrock_adapter.py, aws/llm.py: add AttributeError to except tuple
  to handle None content (previously caught by bare except)
2026-03-08 12:28:03 +08:00
Mark Backman
edd568b002 Merge pull request #3954 from pipecat-ai/mb/revert-quickstart-changes
Revert changes to quickstart
2026-03-07 15:49:30 -05:00
Mark Backman
807759b874 Revert changes to quickstart 2026-03-07 15:44:26 -05:00
Mark Backman
cd28c82de3 Update examples to use the class Settings alias 2026-03-07 09:15:24 -05:00
Mark Backman
4ebdacdea2 Add changelog for #3953 2026-03-07 08:48:11 -05:00
Mark Backman
c5da3cf2bd Add on-the-fly Configure support for Deepgram Flux STT
Wire up the existing settings update infrastructure to send a Configure
WebSocket message when keyterm, eot_threshold, eager_eot_threshold, or
eot_timeout_ms change mid-stream, avoiding a full reconnect.
2026-03-07 08:37:27 -05:00
Mark Backman
26631a9c31 Add Settings class attribute alias to all service classes
Add a `Settings` class-level alias on every STT, LLM, TTS, image,
vision, and video service class pointing to its settings dataclass.
This lets developers discover the right settings class via the service
class itself (e.g. `GoogleSTTService.Settings(...)`) without needing
to know or import the separate settings class name.
2026-03-07 08:17:40 -05:00
Mark Backman
fdf9fb6f02 Merge pull request #3946 from pipecat-ai/mb/tts-settings-review
Review TTS settings
2026-03-07 07:48:26 -05:00
Mark Backman
1cfaea2007 Address code review feedback 2026-03-07 07:42:42 -05:00
kompfner
dc97ffc909 Merge pull request #3943 from pipecat-ai/pk/llm-settings-updates
Minor findings from auditing LLM settings
2026-03-06 22:39:22 -05:00
Paul Kompfner
622d9279cb Use exact service class names in LLMSettings docstrings 2026-03-06 22:35:55 -05:00
Paul Kompfner
6088e6eb52 Make budget_tokens optional in AnthropicThinkingConfig
budget_tokens is required when type is "enabled" and rejected when type is "disabled" (this is validated by the server)
2026-03-06 22:32:14 -05:00
Paul Kompfner
256c8f87b4 Add missing step 3 comment to LLM service init methods
Adds the explicit "no params object" step 3 comment to all
LLM services that skip from step 2 to step 4 in their
settings initialization sequence, matching the pattern
established in services that do have a params object.
2026-03-06 22:32:14 -05:00
Mark Backman
f630d79900 Merge pull request #3944 from pipecat-ai/mb/gemini-live-settings-examples-fixes 2026-03-06 22:14:32 -05:00
Mark Backman
ec93cd1d51 Fix settings update handling in additional STT services 2026-03-06 21:52:45 -05:00
Mark Backman
9c42d27f4d Support runtime language updates in Azure STT
Extract recognizer setup/teardown into _connect/_disconnect so
_update_settings can reconnect when language changes at runtime.
2026-03-06 21:07:03 -05:00
Mark Backman
536f1e178a Fix race condition in Deepgram STT disconnect causing error flood
Clear self._connection before sending close stream so run_stt stops
sending audio immediately during the WebSocket close handshake.
2026-03-06 21:01:31 -05:00
Mark Backman
750b87dc24 Fix AWS examples, update to sonnet 4.6 2026-03-06 20:53:22 -05:00
Mark Backman
671e9a6846 TTS service and example updates 2026-03-06 20:53:22 -05:00
Mark Backman
2c85d2056c Examples fixes for Gemini Live 2026-03-06 18:42:22 -05:00
Mark Backman
a97a086dbd Fix GeminiLiveLLMService init referencing undefined _params variable
Replace references to undefined `_params` with `self._settings` for
language and VAD config. Add missing `system_instruction` to default
settings to satisfy validate_complete(). Remove redundant line that
read language from the deprecated `params` arg.
2026-03-06 18:41:54 -05:00
Mark Backman
4ed3480e4b Update TTSSettings docstrings with the corresponding class name(s) 2026-03-06 16:40:38 -05:00
Mark Backman
d59c0ea6c1 Merge pull request #3941 from pipecat-ai/mb/stt-settings-updates
STT services: settings and examples fixes
2026-03-06 15:21:30 -05:00
Mark Backman
7d41049b35 Review feedback, clarify corresponding class in STTSettings docstrings 2026-03-06 15:17:02 -05:00
Mark Backman
6431ad8e2a Fix service settings init ordering and example bugs
- Speechmatics: move config build after super().__init__ and settings
  delta so turn_detection_mode (e.g. ADAPTIVE) takes effect
- Google STT: fix example passing bare Language enum instead of list
- Google TTS: add missing explicit defaults for all custom settings fields
- Soniox: fix accidental tuple wrapping of STT service in example
- Speechmatics examples: fix system->user role in kick-off messages
- Deepgram Flux: move tag from settings to __init__ (billing metadata)
- ElevenLabs STT: default tag_audio_events to None (use API default)
- Fal STT: simplify language default handling
- Google TTS: rename GoogleStreamTTSSettings to GoogleTTSSettings
2026-03-06 15:17:01 -05:00
Mark Backman
c3794956ef Add deprecation version, fix foundational example double system message 2026-03-06 15:16:58 -05:00
Mark Backman
940da9eeeb Add vad_threshold to AssemblyAISTTSettings
Wire vad_threshold through Settings, default_settings, the deprecated
connection_params path, and _build_ws_url query params.
2026-03-06 15:16:10 -05:00
Mark Backman
696e431e96 Broaden Service Settings docs to cover all AI service types
Use "AI service" language instead of listing specific types, add
ServiceSettings as a fallback for direct AIService subclasses, and
clarify delta mode description with a concrete frame example.
2026-03-06 15:16:10 -05:00
kompfner
1a1c5668de Merge pull request #3942 from pipecat-ai/pk/aws-nova-sonic-audio-config
Add AudioConfig class to AWSNovaSonicLLMService for non-deprecated au…
2026-03-06 14:58:22 -05:00
Filipi da Silva Fuchter
4b9fc8a30c Merge pull request #3804 from pipecat-ai/filipi/concurrent_audio_contexts
Allowing concurrent audio contexts
2026-03-06 14:49:57 -05:00
Paul Kompfner
9b7a86bb12 Add AudioConfig class to AWSNovaSonicLLMService for non-deprecated audio configuration
The audio fields (sample rates, sample sizes, channel counts) on the deprecated `Params` class had no non-deprecated equivalent. This adds an `AudioConfig` class and `audio_config` init arg so users can specify audio configuration without relying on the deprecated `params` parameter.
2026-03-06 14:39:53 -05:00
filipi87
3000037dec Changelog entries for the TTS improvements and fixes. 2026-03-06 16:16:25 -03:00
filipi87
07abd3d60f Fixed BotStoppedSpeakingFrame emission: now emitted as soon as TTSStoppedFrame is received, with a fallback silence-based timeout increased to reduce false positives 2026-03-06 16:16:11 -03:00
filipi87
88ff7c451b Refactored all 25+ TTS service implementations to use the new push_start_frame=True pattern 2026-03-06 16:15:59 -03:00
filipi87
24430d8d45 Fixing Piper test. 2026-03-06 16:15:26 -03:00
filipi87
921e9e1fc9 Refactoring TTS services to allow concurrent audio contexts. 2026-03-06 16:15:10 -03:00
filipi87
c243850cf1 Removing observer from the inworld example. 2026-03-06 16:14:23 -03:00
kompfner
817f88e90b Merge pull request #3940 from pipecat-ai/pk/grok-realtime-settings-pattern
Adopt the `settings` pattern for Grok Realtime session properties
2026-03-06 14:09:25 -05:00
Aleix Conchillo Flaqué
e65ceb4edc Merge pull request #3931 from pipecat-ai/aleix/examples-always-use-user-role
Update foundational examples to use system_instruction
2026-03-06 10:41:33 -08:00
Aleix Conchillo Flaqué
593b75bc8b Update foundational examples to use "user" role
Use system_instruction on LLM service constructors instead of adding
system messages to LLMContext. Messages added to context now use
"user" role.
2026-03-06 09:53:33 -08:00
Paul Kompfner
f4c039048c Adopt the settings pattern for Grok Realtime session properties
Move `session_properties` into `GrokRealtimeLLMSettings`, making `settings` the canonical way to configure Grok Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=GrokRealtimeLLMSettings(session_properties=...)`.

`system_instruction` is synced bidirectionally between the top-level settings field and `session_properties.instructions`, with top-level taking precedence on conflict. (Unlike OpenAI Realtime, Grok's `SessionProperties` has no `model` field, so no model sync is needed.)
2026-03-06 12:53:26 -05:00
kompfner
d84a250b62 Merge pull request #3939 from pipecat-ai/pk/openai-realtime-settings-pattern
Adopt the `settings` pattern for OpenAI Realtime session properties
2026-03-06 12:39:08 -05:00
Paul Kompfner
2b8a6d9ca4 In OpenAI/Azure Realtime examples, migrate to settings=OpenAIRealtimeLLMSettings(...) pattern
Move `session_properties` and `system_instruction` into the `settings` arg, matching the canonical pattern used across the codebase.
2026-03-06 12:00:41 -05:00
mattie ruth backman
18494658c3 rename models_vX to models.py and models_deprecated.py 2026-03-06 11:49:59 -05:00
mattie ruth backman
da0975a4e0 Fix forward reference 2026-03-06 11:49:59 -05:00
mattie ruth backman
49fba5209c copilot feedback 2026-03-06 11:49:59 -05:00
mattie ruth backman
158424aa28 Convert RTVI framework into a structured package
Replace the monolithic rtvi.py with a proper package split by concern
protocol version:
  - models_v0.py: deprecated pre-1.0 Pydantic models
  - models_v1.py: current RTVI protocol v1 message models
  - frames.py: RTVI pipeline frame dataclasses
  - observer.py: RTVIObserver and RTVIObserverParams
  - processor.py: RTVIProcessor (now lean, imports from submodules)
  - __init__.py: re-exports full public API for backward compatability
2026-03-06 11:49:59 -05:00
Paul Kompfner
bd4229ea9d Adopt the settings pattern for OpenAI Realtime session properties
Move `session_properties` into `OpenAIRealtimeLLMSettings`, making `settings` the canonical way to configure OpenAI Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=OpenAIRealtimeLLMSettings(session_properties=...)`.

`model` and `system_instruction` are synced bidirectionally between the top-level settings fields and `session_properties.model`/`.instructions`, with top-level taking precedence on conflict.
2026-03-06 11:46:21 -05:00
kigland
848f35f5df fix: replace bare except handlers with specific exception types 2026-03-06 23:05:02 +08:00
kompfner
ac80b787bf Merge pull request #3877 from pipecat-ai/pk/service-init-cleanup
Add `settings` as canonical init arg for all AIService descendants, d…
2026-03-06 10:01:50 -05:00
Paul Kompfner
5b270fec8e In AWS Nova Sonic examples, migrate to newer pattern of passing in settings with voice and system_instruction, in favor of passing in voice_id as a direct init arg and the system instruction as the first message in the context 2026-03-06 09:57:57 -05:00
Paul Kompfner
a1641f3762 Add system_instruction to realtime service settings
Add `system_instruction=None` to `default_settings` for OpenAIRealtimeLLMService, GrokRealtimeLLMService, UltravoxRealtimeLLMService, AWSNovaSonicLLMService (Azure inherits from OpenAI), and OpenAIRealtimeBetaLLMService (Azure Beta inherits from OpenAI Beta).

Deprecate `system_instruction` init arg in AWSNovaSonicLLMService in favor of `settings=AWSNovaSonicLLMSettings(system_instruction=...)`. Use `self._settings.system_instruction` directly instead of storing a separate `self._system_instruction`.

Deprecation of `params` and `session_properties` in favor of `settings` for realtime services will be tackled in future work.
2026-03-06 09:57:34 -05:00
Paul Kompfner
78deaa735d Move system_instruction into LLMSettings
Add `system_instruction` field to `LLMSettings` so it is runtime-updatable via settings.
For Google (GoogleLLMService, GoogleVertexLLMService), deprecate the init-time arg since it was already shipped. For Anthropic, AWS Bedrock, and OpenAI, remove the init-time arg entirely since it was never shipped.

Still need to handle realtime services (OpenAI Realtime, Grok Realtime, Gemini Live).
2026-03-06 09:57:08 -05:00
filipi87
524b87f087 Adding changelog entry for the summarization fix. 2026-03-06 11:45:20 -03:00
filipi87
4ef3b52c72 Fix context summarization leaving orphaned tool responses in kept context. 2026-03-06 11:40:27 -03:00
Mark Backman
ee2895a783 Update COMMUNITY_INTEGRATIONS.md with full Service Settings guidance
Broaden the "Dynamic Settings Updates" section into "Service Settings"
covering the complete settings pattern: defining a Settings subclass,
wiring it into __init__ with defaults + apply_update, and distinguishing
init-only config from runtime-updatable fields.
2026-03-06 08:44:15 -05:00
Mark Backman
ab37185208 Update run_eval_pipeline with the latest settings, system_instruction patterns 2026-03-06 08:32:59 -05:00
Mark Backman
8a203dd98f Update more examples, misc services 2026-03-06 08:30:00 -05:00
Mark Backman
62554a2390 Update examples 2026-03-06 08:30:00 -05:00
Mark Backman
14c3a88f02 Fix tests 2026-03-06 08:29:14 -05:00
Mark Backman
939d753c2b Update LLMs 2026-03-06 08:29:14 -05:00
Mark Backman
a4375274b2 Add Settings subclasses to all services and auto-discovered init tests
- Add dedicated Settings subclasses to 20 LLM services that were
  borrowing parent Settings classes (e.g. AzureLLMSettings,
  GroqLLMSettings) so users don't need cross-module imports
- Fix field defaults to NOT_GIVEN in BaseWhisperSTTSettings,
  OpenAIRealtimeSTTSettings, and NvidiaSegmentedSTTSettings for
  delta-mode safety
- Fix incomplete default_settings in AWS, Cartesia, ElevenLabs,
  Fish, and Whisper services so validate_complete() passes
- Add auto-discovered tests that verify all Settings classes default
  to NOT_GIVEN (delta safety) and all services initialize with
  complete settings (store completeness)
2026-03-06 08:29:14 -05:00
Mark Backman
034e81ff18 Update STT service settings 2026-03-06 08:29:14 -05:00
Mark Backman
3cb792a801 Update TTS service settings 2026-03-06 08:29:14 -05:00
Mark Backman
1274bb2c55 Update deprecation version to 0.0.105 2026-03-06 08:29:14 -05:00
Mark Backman
f31bfcf4ec Clean up CartesiaTTSSettings: separate init-only vs runtime-updatable fields
Move output_container, output_encoding, output_sample_rate out of
CartesiaTTSSettings into plain instance attributes since they cannot
change at runtime without breaking the audio pipeline. Remove deprecated
speed/emotion fields and their dead references in _build_msg() and
run_tts(). Remove the from_mapping override that only existed to
destructure those now-removed output format fields.
2026-03-06 08:29:14 -05:00
Mark Backman
07f1d0cd96 Change _warn_deprecated_param to accept type references instead of strings
Update all ~192 call sites across 84 service files to pass class references
(e.g. `CartesiaTTSSettings`) instead of string names (`"CartesiaTTSSettings"`)
to `_warn_deprecated_param()`. This enables better IDE refactoring support.

Also fix `from_mapping` return type annotations in 5 settings subclasses to
use `typing.Self` instead of forward reference strings.
2026-03-06 08:29:14 -05:00
Mark Backman
bc2843e30a Fix deprecation version 2026-03-06 08:29:14 -05:00
Paul Kompfner
5dc312ce0c Add settings as canonical init arg for all AIService descendants, deprecate redundant model/voice/params args
ServiceSettings types were introduced for runtime updates via ServiceUpdateSettingsFrame, but there was tension between init-time and runtime APIs: overlapping-but-different InputParams vs ServiceSettings classes, and runtime-updatable fields like `model` and `voice` scattered as direct init args rather than living in a settings object. This unifies them so developers use the same settings type at both init and runtime, improving ergonomics and consistency.

Every concrete AIService subclass (LLM, TTS, STT, ImageGen, Vision, Video) now accepts a `settings` parameter for runtime-updatable config. Old init args (`model`, `voice_id`, `params`/`InputParams`) still work but emit DeprecationWarnings pointing to the new API. When both are provided, `settings` takes precedence. Leaf classes emit warnings; base classes do not, avoiding double warnings in inheritance chains.
2026-03-06 08:29:14 -05:00
Aleix Conchillo Flaqué
3199168d3e scripts(evals): use context.add_message() 2026-03-05 19:14:06 -08:00
Aleix Conchillo Flaqué
ea8f5f2e22 Merge pull request #3933 from pipecat-ai/aleix/misc-fixes
Fix Daily transport log level and eval script import
2026-03-05 18:48:14 -08:00
Aleix Conchillo Flaqué
1221e2dd76 Fix Daily transport log level and eval script import
Change participant_updated log from debug to trace (too noisy).
Fix deepgram LiveOptions import in eval script.
2026-03-05 16:37:02 -08:00
Aleix Conchillo Flaqué
5b598265c4 update uv.lock 2026-03-05 16:28:55 -08:00
Mark Backman
79131dd6c6 Merge pull request #3930 from dakshdua/main
Add `push_empty_transcripts` param to `BaseWhisperSTTService` to push received empty transcripts downstream
2026-03-05 19:25:15 -05:00
Aleix Conchillo Flaqué
5b808872d1 Merge pull request #3932 from pipecat-ai/aleix/system-instruction-conflict-warning
Warn when both system_instruction and context system message are set
2026-03-05 16:24:06 -08:00
Aleix Conchillo Flaqué
fda4cb6732 Add changelog for #3932 2026-03-05 16:16:41 -08:00
Daksh Dua
789ce2fd5e Add param to push empty transcripts 2026-03-05 16:16:24 -08:00
Aleix Conchillo Flaqué
f4b8245241 Warn when both system_instruction and context system message are set
system_instruction from the constructor always takes precedence. A
warning is now logged when the context also contains a system message
so users can spot the conflict.
2026-03-05 16:16:17 -08:00
Mark Backman
ca27e12c84 Merge pull request #3926 from pipecat-ai/mb/update-deps-2026-03-05
Update dependency version ranges for flexibility
2026-03-05 18:09:04 -05:00
Mark Backman
671ef5b6cc Merge pull request #3928 from zkleb-aai/simplify-assemblyai-examples
Update AssemblyAI turn detection example to use keyterms_prompt
2026-03-05 16:11:08 -05:00
zack
380726cfd3 Update AssemblyAI turn detection example to use keyterms_prompt
Change the commented example from prompt string format to keyterms_prompt
list format for better clarity and consistency with API best practices.
2026-03-05 15:47:54 -05:00
Mark Backman
f4dfeb0f8b Merge pull request #3927 from zkleb-aai/add-assemblyai-vad-threshold
feat(assemblyai): add vad_threshold parameter for U3 Pro
2026-03-05 15:36:23 -05:00
zack
11024ccc2c Add changelog entries for vad_threshold and parameter cleanup 2026-03-05 15:32:09 -05:00
zack
acfb07f859 feat(assemblyai): add vad_threshold parameter for U3 Pro
Add vad_threshold parameter to AssemblyAIConnectionParams to support
voice activity detection threshold configuration for the u3-rt-pro model.

This parameter allows users to align AssemblyAI's VAD threshold with
their external VAD systems (e.g., Silero VAD) to avoid the "dead zone"
where AssemblyAI transcribes speech that the external VAD hasn't
detected yet, which can delay interruption handling.

- Range: 0.0 to 1.0 (lower = more sensitive)
- Default: 0.3 (API default when not sent)
- Only applicable to u3-rt-pro model
- Automatically included in WebSocket query parameters

Recommended usage: Set vad_threshold to match your VAD's activation
threshold (e.g., both at 0.3) for optimal performance.
2026-03-05 15:27:13 -05:00
Mark Backman
06e49d597b Update dev dependencies 2026-03-05 15:23:07 -05:00
Mark Backman
60e9e26164 revert onnxruntime to onnxruntime~=1.23.2 to maintain Python 3.10 support 2026-03-05 15:13:28 -05:00
Mark Backman
3f97c91983 Update optional dependency version ranges and remove SDK dependencies
Widen version ranges for stable packages (anthropic, azure, deepgram,
groq, livekit, nvidia-riva-client, fastapi, ormsgpack, opentelemetry,
faster-whisper) and add upper bounds to previously uncapped packages
(hume, pyjwt, livekit-api, camb).

Replace CartesiaHttpTTSService's internal use of the Cartesia SDK with
direct aiohttp calls, accepting an optional aiohttp_session parameter.

Replace fal-client SDK calls in FalSTTService and FalImageGenService
with direct HTTP to bypass the SDK's aggressive retry/backoff logic
that caused significant latency regressions.
2026-03-05 15:06:54 -05:00
Mark Backman
05fa727c22 Update core dependency version ranges for flexibility
Widen version ranges for stable packages (aiofiles, docstring_parser,
onnxruntime) while adding upper bounds to previously uncapped packages
(transformers, numba, wait_for2). Bump soxr to 1.0.0 and pyloudnorm
to 0.2.0. Move silero extra to empty since onnxruntime is now a core dep.
2026-03-05 13:13:55 -05:00
Aleix Conchillo Flaqué
06be260e54 Merge pull request #3919 from pipecat-ai/aleix/daily-transport-event-logging
Add logging to Daily transport event handlers
2026-03-05 08:35:28 -08:00
Mark Backman
691d1d309e Merge pull request #3920 from pipecat-ai/mb/remove-hathora 2026-03-05 07:00:52 -05:00
Mark Backman
eeb8ed8588 Remove Hathora service integration
Hathora is shutting down on March 5, 2026. Remove the STT/TTS services,
examples, and related references.
2026-03-04 22:10:06 -05:00
Aleix Conchillo Flaqué
96062972db Add logging to Daily transport event handlers
Add appropriate log levels to dial-in/dial-out, participant, transcription,
and recording event handlers. Move transcription error log from client
callback to transport handler to keep logging consistent at the transport
level.
2026-03-04 13:30:43 -08:00
Mark Backman
68d7e98f95 Add defensive comment for given_fields() usage in tracing 2026-03-02 16:33:25 -05:00
598 changed files with 31662 additions and 14503 deletions

View File

@@ -32,6 +32,20 @@ Create changelog files for the important commits in this PR. The PR number is pr
6. Use ⚠️ emoji prefix for breaking changes.
7. **Write changes in user-facing terms first.** Lead with what users of the framework will notice: new APIs, changed behavior, new parameters, fixed bugs they might have hit, etc. Implementation details (internal refactoring, how something is wired up under the hood) can be included as secondary context after the user-facing description, but should never be the *only* content of a changelog entry when there is a user-visible effect.
**Good** (user-facing first, implementation detail as context):
```
- Turn completion instructions now persist correctly across full context updates when using `system_instruction`. Previously they were injected as a context system message, which caused warning spam and didn't survive context updates.
```
**Bad** (implementation detail only, no user-facing framing):
```
- Fixed turn completion instructions being injected as a context system message instead of using `system_instruction`.
```
Ask yourself: "If I'm a developer building on Pipecat, what would I notice changed?" Start there.
## Example
For PR #3519 with a new feature and a bug fix:
@@ -43,5 +57,5 @@ For PR #3519 with a new feature and a bug fix:
`changelog/3519.fixed.md`:
```
- Fixed an issue where something was not working correctly.
- Fixed an issue where something was not working correctly in some user-visible scenario. The root cause was an internal implementation detail.
```

View File

@@ -7,6 +7,581 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [0.0.107] - 2026-03-23
### Added
- Added `frame_order` parameter to `SyncParallelPipeline`. Set
`frame_order=FrameOrder.PIPELINE` to push synchronized output frames in
pipeline definition order (all frames from the first pipeline, then the
second, etc.) instead of the default arrival order.
(PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))
- Added `sync_with_audio` field to `OutputImageRawFrame`. When set to `True`,
the output transport queues image frames with audio so they are displayed
only after all preceding audio has been sent, enabling synchronized
audio/image playback.
(PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))
- Added `OpenAIResponsesLLMService`, a new LLM service that uses the OpenAI
Responses API. Supports streaming text, function calling, usage metrics, and
out-of-band inference. Works with the universal `LLMContext` and
`LLMContextAggregatorPair`. See
`examples/foundational/07-interruptible-openai-responses.py` and
`14-function-calling-openai-responses.py`.
(PR [#4074](https://github.com/pipecat-ai/pipecat/pull/4074))
- Added `audio_out_auto_silence` parameter to `TransportParams` (defaults to
`True`). When set to `False`, the transport waits for audio data instead of
inserting silence when the output queue is empty, which is useful for
scenarios that require uninterrupted audio playback without artificial gaps.
(PR [#4104](https://github.com/pipecat-ai/pipecat/pull/4104))
### Changed
- Renamed tracing span attributes to align with OpenTelemetry GenAI semantic
conventions: `gen_ai.system` to `gen_ai.provider.name`, `system` to
`gen_ai.system_instructions`, `gen_ai.usage.cache_read_input_tokens` to
`gen_ai.usage.cache_read.input_tokens`, and
`gen_ai.usage.cache_creation_input_tokens` to
`gen_ai.usage.cache_creation.input_tokens`.
(PR [#3449](https://github.com/pipecat-ai/pipecat/pull/3449))
- `DeepgramSageMakerTTSService` now correctly routes audio through the base
`TTSService` audio context queue. Audio frames are delivered via
`append_to_audio_context()` instead of being pushed directly, enabling proper
ordering, interruption handling, and start/stop frame lifecycle management.
Interruptions now trigger a `Clear` message to Deepgram (flushing its text
buffer) at the right time via `on_audio_context_interrupted`.
(PR [#4083](https://github.com/pipecat-ai/pipecat/pull/4083))
- `GradiumTTSService` now sends a per-context `setup` message with
`client_req_id` before the first text message for each TTS context, following
Gradium's multiplexing protocol. Previously, a single setup message was sent
at connection time without a `client_req_id`, which prevented Gradium from
associating requests with their sessions when using `close_ws_on_eos=False`.
(PR [#4091](https://github.com/pipecat-ai/pipecat/pull/4091))
### Fixed
- Fixed stale `system_instruction` in LLM tracing spans by reading from
`_settings.system_instruction` instead of the removed `_system_instruction`
attribute.
(PR [#3449](https://github.com/pipecat-ai/pipecat/pull/3449))
- Fixed `SyncParallelPipeline` breaking the Whisker debugger.
(PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))
- Fixed `SyncParallelPipeline` race condition where concurrent SystemFrame
processing (e.g. from RTVI) could corrupt sink queues and cause deadlocks.
SystemFrames now take a fast path that passes them through without draining
queued output.
(PR [#4029](https://github.com/pipecat-ai/pipecat/pull/4029))
- Fixed TTS frame ordering so that non-system frames always arrive in correct
order relative to the `TTSStartedFrame`/`TTSAudioRawFrame`/`TTSStoppedFrame`
sequence. Previously these frames could race ahead of or behind audio context
frames, producing out-of-order output downstream.
(PR [#4075](https://github.com/pipecat-ai/pipecat/pull/4075))
- Fixed `SarvamTTSService` audio and error frames now route through
`append_to_audio_context()` instead of `push_frame()`, ensuring correct
behavior with audio contexts and interruptions.
(PR [#4082](https://github.com/pipecat-ai/pipecat/pull/4082))
- Fixed audio frame ordering and interruption handling in Fish Audio, LMNT,
Neuphonic, and Rime NonJson TTS services. These services were bypassing the
base `TTSService` audio context serialization queue by pushing audio frames
directly, which could cause out-of-order frames and broken interruptions
during speech.
(PR [#4090](https://github.com/pipecat-ai/pipecat/pull/4090))
- Fixed Genesys AudioHook serializer to always include the `parameters` field in
protocol messages. The AudioHook protocol requires every message to carry a
`parameters` object (even if empty), but `_create_message` omitted it when no
parameters were provided. This caused clients that validate message structure
(including the Genesys reference implementation) to reject `pong` and
parameter-less `closed` responses, breaking server sequence tracking and
preventing `outputVariables` from reaching the Architect flow.
(PR [#4093](https://github.com/pipecat-ai/pipecat/pull/4093))
## [0.0.106] - 2026-03-18
### Added
- Added optional `service` field to `ServiceUpdateSettingsFrame` (and its
subclasses `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`,
`STTUpdateSettingsFrame`) to target a specific service instance. When
`service` is set, only the matching service applies the settings; others
forward the frame unchanged. This enables updating a single service when
multiple services of the same type exist in the pipeline.
(PR [#4004](https://github.com/pipecat-ai/pipecat/pull/4004))
- Added `sip_provider` and `room_geo` parameters to `configure()` in the Daily
runner. These convenience parameters let callers specify a SIP provider name
and geographic region directly without manually constructing
`DailyRoomProperties` and `DailyRoomSipParams`.
(PR [#4005](https://github.com/pipecat-ai/pipecat/pull/4005))
- Added `PerplexityLLMAdapter` that automatically transforms conversation
messages to satisfy Perplexity's stricter API constraints (strict role
alternation, no non-initial system messages, last message must be user/tool).
Previously, certain conversation histories could cause Perplexity API errors
that didn't occur with OpenAI (`PerplexityLLMService` subclasses
`OpenAILLMService` since Perplexity uses an OpenAI-compatible API).
(PR [#4009](https://github.com/pipecat-ai/pipecat/pull/4009))
- Added DTMF input event support to the Daily transport. Incoming DTMF tones
are now received via Daily's `on_dtmf_event` callback and pushed into the
pipeline as `InputDTMFFrame`, enabling bots to react to keypad presses from
phone callers.
(PR [#4047](https://github.com/pipecat-ai/pipecat/pull/4047))
- Added `WakePhraseUserTurnStartStrategy` for triggering user turns based on
wake phrases, with support for `single_activation` mode. Deprecates
`WakeCheckFilter`.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
- Added `default_user_turn_start_strategies()` and
`default_user_turn_stop_strategies()` helper functions for composing custom
strategy lists.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
### Changed
- Changed tool result JSON serialization to use `ensure_ascii=False`,
preserving UTF-8 characters instead of escaping them. This reduces context
size and token usage for non-English languages.
(PR [#3457](https://github.com/pipecat-ai/pipecat/pull/3457))
- `OpenAIRealtimeSTTService`'s `noise_reduction` parameter is now part of
`OpenAIRealtimeSTTSettings`, making it runtime-updatable via
`STTUpdateSettingsFrame`. The direct `noise_reduction` init argument is
deprecated as of 0.0.106.
(PR [#3991](https://github.com/pipecat-ai/pipecat/pull/3991))
- Updated `sarvamai` dependency from `0.1.26a2` (alpha) to `0.1.26` (stable
release).
(PR [#3997](https://github.com/pipecat-ai/pipecat/pull/3997))
- `SimliVideoService` now extends `AIService` instead of `FrameProcessor`,
aligning it with the HeyGen and Tavus video services. It supports
`SimliVideoService.Settings(...)` for configuration and uses
`start()`/`stop()`/`cancel()` lifecycle methods. Existing constructor usage
(`api_key`, `face_id`, etc.) remains unchanged.
(PR [#4001](https://github.com/pipecat-ai/pipecat/pull/4001))
- Update `pipecat-ai-small-webrtc-prebuilt` to `2.4.0`.
(PR [#4023](https://github.com/pipecat-ai/pipecat/pull/4023))
- Nova Sonic assistant text transcripts are now delivered in real-time using
speculative text events instead of delayed final text events. Previously,
assistant text only arrived after all audio had finished playing, causing
laggy transcripts in client UIs. Speculative text arrives before each audio
chunk, providing text synchronized with what the bot is saying. This also
simplifies the internal text handling by removing the interruption re-push
hack and assistant text buffer.
(PR [#4042](https://github.com/pipecat-ai/pipecat/pull/4042))
- Updated `daily-python` dependency to 0.25.0.
(PR [#4047](https://github.com/pipecat-ai/pipecat/pull/4047))
- Added `enable_dialout` parameter to `configure()` in `pipecat.runner.daily`
to support dial-out rooms. Also narrowed misleading `Optional` type hints and
deduplicated token expiry calculation.
(PR [#4048](https://github.com/pipecat-ai/pipecat/pull/4048))
- Extended `ProcessFrameResult` to stop strategies, allowing a stop strategy to
short-circuit evaluation of subsequent strategies by returning `STOP`.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
- `GradiumSTTService` now takes both an `encoding` and `sample_rate`
constructor argument which is assmebled in the class to form the
`input_format`. PCM accepts `8000`, `16000`, and `24000` Hz sample rates.
(PR [#4066](https://github.com/pipecat-ai/pipecat/pull/4066))
- Improved `GradiumSTTService` transcription accuracy by reworking how text
fragments are accumulated and finalized. Previously, trailing words could be
dropped when the server's `flushed` response arrived before all text tokens
were delivered. The service now uses a short aggregation delay after flush to
capture trailing tokens, producing complete utterances.
(PR [#4066](https://github.com/pipecat-ai/pipecat/pull/4066))
### Deprecated
- `SimliVideoService.InputParams` is deprecated. Use the direct constructor
parameters `max_session_length`, `max_idle_time`, and `enable_logging`
instead.
(PR [#4001](https://github.com/pipecat-ai/pipecat/pull/4001))
- Deprecated `LocalSmartTurnAnalyzerV2` and `LocalCoreMLSmartTurnAnalyzer`. Use
`LocalSmartTurnAnalyzerV3` instead. Instantiating these analyzers will now
emit a `DeprecationWarning`.
(PR [#4012](https://github.com/pipecat-ai/pipecat/pull/4012))
- Deprecated `WakeCheckFilter` in favor of `WakePhraseUserTurnStartStrategy`.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
### Fixed
- Fixed an issue where the default model for `OpenAILLMService` and
`AzureLLMService` was mistakenly reverted to `gpt-4o`. The defaults are now
restored to `gpt-4.1`.
(PR [#4000](https://github.com/pipecat-ai/pipecat/pull/4000))
- Fixed a race condition where `EndTaskFrame` could cause the pipeline to shut
down before in-flight frames (e.g. LLM function call responses) finished
processing. `EndTaskFrame` and `StopTaskFrame` now flow through the pipeline
as `ControlFrame`s, ensuring all pending work is flushed before shutdown
begins. `CancelTaskFrame` and `InterruptionTaskFrame` remain immediate
(`SystemFrame`).
(PR [#4006](https://github.com/pipecat-ai/pipecat/pull/4006))
- Fixed `ParallelPipeline` dropping or misordering frames during lifecycle
synchronization. Buffered frames are now flushed in the correct order
relative to synchronization frames (`StartFrame` goes first,
`EndFrame`/`CancelFrame` go after), and frames added to the buffer during
flush are also drained.
(PR [#4007](https://github.com/pipecat-ai/pipecat/pull/4007))
- Fixed `TTSService` potentially canceling in-flight audio during shutdown. The
stop sequence now waits for all queued audio contexts to finish processing
before canceling the stop frame task.
(PR [#4007](https://github.com/pipecat-ai/pipecat/pull/4007))
- Fixed `Language` enum values (e.g. `Language.ES`) not being converted to
service-specific codes when passed via
`settings=Service.Settings(language=Language.ES)` at init time. This caused
API errors (e.g. 400 from Rime) because the raw enum was sent instead of the
expected language code (e.g. `"spa"`). Runtime updates via
`UpdateSettingsFrame` were unaffected. The fix centralizes conversion in the
base `TTSService` and `STTService` classes so all services handle this
consistently.
(PR [#4024](https://github.com/pipecat-ai/pipecat/pull/4024))
- Fixed `DeepgramSTTService` ignoring the `base_url` scheme when using `ws://`
or `http://`. Previously these were silently overwritten with `wss://` /
`https://`, breaking air-gapped or private deployments that don't use TLS.
All scheme choices (`wss://`, `https://`, `ws://`, `http://`, or bare
hostname) are now respected.
(PR [#4026](https://github.com/pipecat-ai/pipecat/pull/4026))
- Fixed `LLMSwitcher.register_function()` and `register_direct_function()` not
accepting or forwarding the `timeout_secs` parameter.
(PR [#4037](https://github.com/pipecat-ai/pipecat/pull/4037))
- Fixed empty user transcriptions in Nova Sonic causing spurious interruptions.
Previously, an empty transcription could trigger an interruption of the
assistant's response even though the user hadn't actually spoken.
(PR [#4042](https://github.com/pipecat-ai/pipecat/pull/4042))
- Fixed `SonioxSTTService` and `OpenAIRealtimeSTTService` crash when language
parameters contain plain strings instead of `Language` enum values.
(PR [#4046](https://github.com/pipecat-ai/pipecat/pull/4046))
- Fixed premature user turn stops caused by late transcriptions arriving
between turns. A stale transcript from the previous turn could persist into
the next turn and trigger a stop before the current turn's real transcript
arrived. Stop strategies are now reset at both turn start and turn stop to
prevent state from leaking across turn boundaries.
(PR [#4057](https://github.com/pipecat-ai/pipecat/pull/4057))
- Fixed raw language strings like `"de-DE"` silently failing when passed to
TTS/STT services (e.g. ElevenLabs producing no audio). Raw strings now go
through the same `Language` enum resolution as enum values, so regional codes
like `"de-DE"` are properly converted to service-expected formats like
`"de"`. Unrecognized strings log a warning instead of failing silently.
(PR [#4058](https://github.com/pipecat-ai/pipecat/pull/4058))
- Fixed Deepgram STT list-type settings (`keyterm`, `keywords`, `search`,
`redact`, `replace`) being stringified instead of passed as lists to the SDK,
which caused them to be sent as literal strings (e.g. `"['pipecat']"`) in the
WebSocket query params.
(PR [#4063](https://github.com/pipecat-ai/pipecat/pull/4063))
- Fixed `MinWordsUserTurnStartStrategy` including text below the word threshold
in the output by resetting aggregation when the minimum word count is not
met.
(PR [#4064](https://github.com/pipecat-ai/pipecat/pull/4064))
- Fixed audio overlap and potential dropped TTS content when multiple assistant
turns occur in quick succession. `TTSService` now flushes remaining text
before pausing frame processing on `LLMFullResponseEndFrame`/`EndFrame`,
instead of pausing first.
(PR [#4071](https://github.com/pipecat-ai/pipecat/pull/4071))
### Security
- Bumped PyJWT minimum version from 2.10.1 to 2.12.0 in the `livekit` extra to
address CVE-2026-32597 (GHSA-752w-5fwx-jx9f), where PyJWT <= 2.11.0 accepted
unknown `crit` header extensions.
(PR [#4035](https://github.com/pipecat-ai/pipecat/pull/4035))
## [0.0.105] - 2026-03-10
### Added
- Added concurrent audio context support: `CartesiaTTSService` can now
synthesize the next sentence while the previous one is still playing, by
setting `pause_frame_processing=False` and routing each sentence through its
own audio context queue.
(PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
- Added custom video track support to Daily transport. Use
`video_out_destinations` in `DailyParams` to publish multiple video tracks
simultaneously, mirroring the existing `audio_out_destinations` feature.
(PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))
- Added `ServiceSwitcherStrategyFailover` that automatically switches to the
next service when the active service reports a non-fatal error. Recovery
policies can be implemented via the `on_service_switched` event handler.
(PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))
- Added optional `timeout_secs` parameter to `register_function()` and
`register_direct_function()` for per-tool function call timeout control,
overriding the global `function_call_timeout_secs` default.
(PR [#3915](https://github.com/pipecat-ai/pipecat/pull/3915))
- Added `cloud-audio-only` recording option to Daily transport's
`enable_recording` property.
(PR [#3916](https://github.com/pipecat-ai/pipecat/pull/3916))
- Wired up `system_instruction` in `BaseOpenAILLMService`,
`AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default
system prompt, matching the behavior of the Google services. This enables
sharing a single `LLMContext` across multiple LLM services, where each
service provides its own system instruction independently.
```python
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful assistant.",
)
context = LLMContext()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
context.add_message({"role": "user", "content": "Please introduce yourself."})
await task.queue_frames([LLMRunFrame()])
```
(PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))
- Added `vad_threshold` parameter to `AssemblyAIConnectionParams` for
configuring voice activity detection sensitivity in U3 Pro. Aligning this
with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone"
where AssemblyAI transcribes speech that VAD hasn't detected yet.
(PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))
- Added `push_empty_transcripts` parameter to `BaseWhisperSTTService` and
`OpenAISTTService` to allow empty transcripts to be pushed downstream as
`TranscriptionFrame` instead of discarding them (the default behavior). This
is intended for situations where VAD fires even though the user did not
speak. In these cases, it is useful to know that nothing was transcribed so
that the agent can resume speaking, instead of waiting longer for a
transcription.
(PR [#3930](https://github.com/pipecat-ai/pipecat/pull/3930))
- LLM services (`BaseOpenAILLMService`, `AnthropicLLMService`,
`AWSBedrockLLMService`) now log a warning when both `system_instruction` and
a system message in the context are set. The constructor's
`system_instruction` takes precedence.
(PR [#3932](https://github.com/pipecat-ai/pipecat/pull/3932))
- Runtime settings updates (via `STTUpdateSettingsFrame`) now work for AWS
Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and
Soniox STT services. Previously, changing settings at runtime only stored the
new values without reconnecting.
(PR [#3946](https://github.com/pipecat-ai/pipecat/pull/3946))
- Exposed `on_summary_applied` event on `LLMAssistantAggregator`, allowing
users to listen for context summarization events without accessing private
members.
(PR [#3947](https://github.com/pipecat-ai/pipecat/pull/3947))
- Deepgram Flux STT settings (`keyterm`, `eot_threshold`,
`eager_eot_threshold`, `eot_timeout_ms`) can now be updated mid-stream via
`STTUpdateSettingsFrame` without triggering a reconnect. The new values are
sent to Deepgram as a Configure WebSocket message on the existing connection.
(PR [#3953](https://github.com/pipecat-ai/pipecat/pull/3953))
- Added `system_instruction` parameter to `run_inference` across all LLM
services, allowing callers to override the system prompt for one-shot
inference calls. Used by `_generate_summary` to pass the summarization prompt
cleanly.
(PR [#3968](https://github.com/pipecat-ai/pipecat/pull/3968))
### Changed
- Audio context management (previously in `AudioContextTTSService`) is now
built into `TTSService`. All WebSocket providers (`cartesia`, `elevenlabs`,
`asyncai`, `inworld`, `rime`, `gradium`, `resembleai`) now inherit from
`WebsocketTTSService` directly. Word-timestamp baseline is set automatically
on the first audio chunk of each context instead of requiring each provider
to call `start_word_timestamps()` in their receive loop.
(PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
- Daily transport now uses `CustomVideoSource`/`CustomVideoTrack` instead of
`VirtualCameraDevice` for the default camera output, mirroring how audio
already works with `CustomAudioSource`/`CustomAudioTrack`.
(PR [#3831](https://github.com/pipecat-ai/pipecat/pull/3831))
- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions`
class was removed from the SDK and is now provided by pipecat directly;
import it from `pipecat.services.deepgram.stt` instead of `deepgram`.
(PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))
- `ServiceSwitcherStrategy` base class now provides a `handle_error()` hook for
subclasses to implement error-based switching. `ServiceSwitcher` defaults to
`ServiceSwitcherStrategyManual` and `strategy_type` is now optional.
(PR [#3861](https://github.com/pipecat-ai/pipecat/pull/3861))
- Support for Voice Focus 2.0 models.
- Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
- Cleaned unused `ParameterFixedError` exception handling in `AICFilter`
parameter setup.
(PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))
- `max_context_tokens` and `max_unsummarized_messages` in
`LLMAutoContextSummarizationConfig` (and deprecated
`LLMContextSummarizationConfig`) can now be set to `None` independently to
disable that summarization threshold. At least one must remain set.
(PR [#3914](https://github.com/pipecat-ai/pipecat/pull/3914))
- ⚠️ Removed `formatted_finals` and `word_finalization_max_wait_time` from
`AssemblyAIConnectionParams` as these were v2 API parameters not supported in
v3. Clarified that `format_turns` only applies to Universal-Streaming models;
U3 Pro has automatic formatting built-in.
(PR [#3927](https://github.com/pipecat-ai/pipecat/pull/3927))
- Changed `DeepgramTTSService` to send a Clear message on interruption instead
of disconnecting and reconnecting the WebSocket, allowing the connection to
persist throughout the session.
(PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))
- Re-added `enhancement_level` support to `AICFilter` with runtime
`FilterEnableFrame` control, applying `ProcessorParameter.Bypass` and
`ProcessorParameter.EnhancementLevel` together.
(PR [#3961](https://github.com/pipecat-ai/pipecat/pull/3961))
- Updated `daily-python` dependency from `~=0.23.0` to `~=0.24.0`.
(PR [#3970](https://github.com/pipecat-ai/pipecat/pull/3970))
- Updated `FishAudioTTSService` default model from `s1` to `s2-pro`, matching
Fish Audio's latest recommended model for improved quality and speed.
(PR [#3973](https://github.com/pipecat-ai/pipecat/pull/3973))
- `AzureSTTService` `region` parameter is now optional when `private_endpoint`
is provided. A `ValueError` is raised if neither is given, and a warning is
logged if both are provided (`private_endpoint` takes priority).
(PR [#3974](https://github.com/pipecat-ai/pipecat/pull/3974))
### Deprecated
- Deprecated `AudioContextTTSService` and `AudioContextWordTTSService`.
Subclass `WebsocketTTSService` directly instead; audio context management is
now part of the base `TTSService`.
- Deprecated `WordTTSService`, `WebsocketWordTTSService`, and
`InterruptibleWordTTSService`. Word timestamp logic is now always active in
`TTSService` and no longer needs to be opted into via a subclass.
(PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
- Deprecated `pipecat.services.google.llm_vertex`,
`pipecat.services.google.llm_openai`, and
`pipecat.services.google.gemini_live.llm_vertex` modules. Use
`pipecat.services.google.vertex.llm`, `pipecat.services.google.openai.llm`,
and `pipecat.services.google.gemini_live.vertex.llm` instead. The old import
paths still work but will emit a `DeprecationWarning`.
(PR [#3980](https://github.com/pipecat-ai/pipecat/pull/3980))
### Removed
- ⚠️ Removed `supports_word_timestamps` parameter from `TTSService.__init__()`.
Word timestamp logic is now always active. Remove this argument from any
custom subclass `super().__init__()` calls.
(PR [#3804](https://github.com/pipecat-ai/pipecat/pull/3804))
### Fixed
- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The
deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit
`KeepAlive` messages every 5 seconds, within the recommended 35 second
interval before Deepgram's 10-second inactivity timeout.
(PR [#3848](https://github.com/pipecat-ai/pipecat/pull/3848))
- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in
`AICFilter` caused by holding a `memoryview` on the mutable audio buffer
across async yield points.
(PR [#3889](https://github.com/pipecat-ai/pipecat/pull/3889))
- Fixed TTS context not being appended to the assistant message history when
using `TTSSpeakFrame` with `append_to_context=True` with some TTS providers.
(PR [#3936](https://github.com/pipecat-ai/pipecat/pull/3936))
- Fixed context summarization leaving orphaned tool responses in the kept
context when tool calls were moved to the summarized portion.
(PR [#3937](https://github.com/pipecat-ai/pipecat/pull/3937))
- Fixed turn completion state not resetting at end of LLM responses.
`LLMFullResponseEndFrame` is pushed (not received) by the LLM service, so the
mixin now handles it in `push_frame` instead of `process_frame`.
(PR [#3956](https://github.com/pipecat-ai/pipecat/pull/3956))
- Fixed turn completion instructions being injected as a context system message
instead of using `system_instruction`. This caused warning spam when
`system_instruction` was also set and didn't persist across full context
updates.
(PR [#3957](https://github.com/pipecat-ai/pipecat/pull/3957))
- Fixed `TTSService` audio context queue getting blocked when
`append_to_audio_context()` was called with a `None` context ID, which
prevented subsequent audio from being delivered.
(PR [#3958](https://github.com/pipecat-ai/pipecat/pull/3958))
- Fixed `on_call_state_updated` event handler in LiveKit transport receiving
incorrect number of arguments due to redundant `self` passed to
`_call_event_handler`.
(PR [#3959](https://github.com/pipecat-ai/pipecat/pull/3959))
- Fixed OpenAI Realtime, OpenAI Realtime Beta, and Grok realtime services
treating `conversation_already_has_active_response` as a fatal error. These
services now log it as a non-fatal debug event when a response is already in
progress.
(PR [#3960](https://github.com/pipecat-ai/pipecat/pull/3960))
- Fixed `SmallWebRTCConnection` silently discarding messages sent before the
data channel is open by queuing them and flushing once the channel is ready.
A bounded queue (`MAX_MESSAGE_QUEUE_SIZE = 50`) prevents unbounded memory
growth, and a 10-second timeout after connection clears the queue and falls
back to discard mode if the data channel never opens.
(PR [#3962](https://github.com/pipecat-ai/pipecat/pull/3962))
- Fixed `AzureSTTService` failing to initialize when `private_endpoint` is
provided. The Azure Speech SDK's `SpeechConfig` does not accept both `region`
and `endpoint` simultaneously, so they are now passed conditionally.
(PR [#3967](https://github.com/pipecat-ai/pipecat/pull/3967))
- Fixed `GoogleLLMService` ignoring the `system_instruction` set via
constructor or `GoogleLLMSettings` when a system message was also present in
the context. The settings value now correctly takes priority, and a warning
is logged when both are set.
(PR [#3976](https://github.com/pipecat-ai/pipecat/pull/3976))
### Other
- Updated foundational examples to use `system_instruction` on LLM services
instead of adding system messages to `LLMContext`.
(PR [#3918](https://github.com/pipecat-ai/pipecat/pull/3918))
- Updated AssemblyAI turn detection example to use `keyterms_prompt` list
format instead of `prompt` string for improved clarity.
(PR [#3929](https://github.com/pipecat-ai/pipecat/pull/3929))
- Updated foundational examples and eval scripts to use `"user"` role instead
of `"system"` when adding messages to `LLMContext`, since system prompts
should be set via `system_instruction` on the LLM service.
(PR [#3931](https://github.com/pipecat-ai/pipecat/pull/3931))
## [0.0.104] - 2026-03-02
### Added

View File

@@ -65,12 +65,25 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Websocket-based Services
**Base class:** `WebsocketSTTService`
**Use for:** Services where you manage the websocket connection directly. Combines `STTService` with `WebsocketService` for automatic reconnection and keepalive support.
**Examples:**
- [CartesiaSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/stt.py)
- [ElevenLabsRealtimeSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/stt.py)
#### SDK-based Streaming Services
**Base class:** `STTService`
**Use for:** Streaming services where the provider's Python SDK manages the connection internally.
**Examples:**
- [DeepgramSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/deepgram/stt.py)
- [SpeechmaticsSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/speechmatics/stt.py)
- [GoogleSTTService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/stt.py)
#### File-based Services
@@ -108,55 +121,59 @@ Once your PR is submitted, post in the `#community-integrations` Discord channel
#### Key requirements:
- **Frame sequence:** Output must follow this frame sequence pattern:
- `LLMFullResponseStartFrame` - Signals the start of an LLM response
- `LLMTextFrame` - Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` - Signals the end of an LLM response
- **`_process_context(self, context: LLMContext)`** — The main method that processes an LLM context and generates a response. Each LLM service overrides `process_frame` to extract context from `LLMContextFrame` and calls `_process_context`.
- **Context aggregation:** Implement context aggregation to collect user and assistant content:
- Aggregators come in pairs with a `user()` instance and `assistant()` instance
- Context must adhere to the `LLMContext` universal format
- Aggregators should handle adding messages, function calls, and images to the context
- **`adapter_class`** — Class attribute pointing to a `BaseLLMAdapter` subclass. Defaults to `OpenAILLMAdapter`. Non-OpenAI services must implement their own adapter (see `src/pipecat/adapters/base_llm_adapter.py`) with methods:
- `get_llm_invocation_params(context)` — Extract provider-specific params from universal context
- `to_provider_tools_format(tools_schema)` — Convert standard tools to provider format
- `get_messages_for_logging(context)` — Format messages for logging
- Reference adapters: `src/pipecat/adapters/services/` (anthropic, gemini, bedrock, etc.)
- **Frame sequence:** Output must follow this frame sequence pattern:
- `LLMFullResponseStartFrame` — Signals the start of an LLM response
- `LLMTextFrame` — Contains LLM content, typically streamed as tokens
- `LLMFullResponseEndFrame` — Signals the end of an LLM response
- **Thought frames (reasoning models):** If the model supports extended thinking / chain-of-thought, emit thought frames alongside the response:
- `LLMThoughtStartFrame` — Signals the start of a thought
- `LLMThoughtTextFrame` — Contains thought content, streamed as tokens
- `LLMThoughtEndFrame` — Signals the end of a thought
- **Context aggregation** is handled by the framework via `LLMContext` + `LLMContextAggregatorPair`. The LLM service just processes context it receives — no need to implement aggregators.
### TTS (Text-to-Speech) Services
#### AudioContextWordTTSService
#### WebsocketTTSService
**Use for:** Websocket-based services supporting word/timestamp alignment
**Use for:** Websocket-based streaming services (with or without word timestamps)
**Example:**
**Examples:**
- [CartesiaTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/cartesia/tts.py)
- [ElevenLabsTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
#### InterruptibleTTSService
**Use for:** Websocket-based services without word/timestamp alignment, requiring disconnection on interruption
**Use for:** Websocket-based services without word timestamps that reconnect on interruption (e.g. don't support a context ID or interruption message)
**Example:**
- [SarvamTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/sarvam/tts.py)
#### WordTTSService
**Use for:** HTTP-based services supporting word/timestamp alignment
**Example:**
- [ElevenLabsHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/elevenlabs/tts.py)
#### TTSService
**Use for:** HTTP-based services without word/timestamp alignment
**Use for:** HTTP-based services (word timestamps are supported in the base class)
**Example:**
**Examples:**
- [GoogleHttpTTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/google/tts.py)
- [OpenAITTSService](https://github.com/pipecat-ai/pipecat/blob/main/src/pipecat/services/openai/tts.py)
#### Key requirements:
- For websocket services, use asyncio WebSocket implementation (required for v13+ support)
- For websocket services, use asyncio WebSocket implementation
- Handle idle service timeouts with keepalives
- TTSServices push both audio (`TTSRawAudioFrame`) and text (`TTSTextFrame`) frames
- TTS services push both audio (`TTSAudioRawFrame`) and text (`TTSTextFrame`) frames
### Telephony Serializers
@@ -200,9 +217,9 @@ Vision services process images and provide analysis such as descriptions, object
#### Key requirements:
- Must implement `run_vision` method that takes an `LLMContext` and returns an `AsyncGenerator[Frame, None]`
- The method processes the latest image in the context and yields frames with analysis results
- Typically yields `TextFrame` objects containing descriptions or answers
- Must implement `run_vision` method that takes a `UserImageRawFrame` and returns an `AsyncGenerator[Frame, None]`
- The method processes the image frame and yields frames with analysis results
- Must yield the frame sequence: `VisionFullResponseStartFrame`, `VisionTextFrame`, `VisionFullResponseEndFrame`
## Implementation Guidelines
@@ -231,49 +248,105 @@ def can_generate_metrics(self) -> bool:
return True
```
### Dynamic Settings Updates
### Service Settings
STT, LLM, and TTS services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
Every AI service (STT, LLM, TTS, image generation, etc.) exposes a **Settings dataclass** that serves two roles:
Each service declares a settings dataclass that extends the appropriate base (`STTSettings`, `TTSSettings`, `LLMSettings`). Fields default to `NOT_GIVEN` so that update objects can represent sparse deltas:
1. **Store mode** — the service's `self._settings` holds the current value of every runtime-updatable field.
2. **Delta mode** — an update frame (e.g. `TTSUpdateSettingsFrame`) specifies only the fields that should change; unspecified fields remain `NOT_GIVEN`.
#### Defining your Settings class
Extend `STTSettings`, `TTSSettings`, `LLMSettings`, or `ImageGenSettings` (or, if your service directly subclasses `AIService`, `ServiceSettings`). The base classes already provide common fields (e.g. `model`, `voice`, `language`). You only need to add **service-specific knobs that should be runtime-updatable**:
```python
from dataclasses import dataclass, field
from pipecat.services.settings import STTSettings, NOT_GIVEN
from pipecat.services.settings import TTSSettings, NOT_GIVEN
@dataclass
class MySTTSettings(STTSettings):
"""Settings for my STT service.
class MyTTSSettings(TTSSettings):
"""Settings for MyTTS service.
Parameters:
region: Cloud region for the service.
speaking_rate: Speed multiplier (0.52.0).
"""
region: str = field(default_factory=lambda: NOT_GIVEN)
speaking_rate: float | None = field(default_factory=lambda: NOT_GIVEN)
```
The service stores its current settings in `self._settings` and declares the type with a class-level annotation for editor support:
**What goes in Settings vs. `__init__` params:**
| Belongs in Settings | Stays as `__init__` params |
| -------------------------------------------------------- | ----------------------------------------- |
| Model name, voice, language | API keys, auth tokens |
| Service-specific tuning knobs (rate, pitch, temperature) | Base URLs, endpoint overrides |
| Anything users may want to change mid-session | Audio encoding, sample format |
| | Connection parameters (timeouts, retries) |
The rule of thumb: if a caller might send an update frame to change it at runtime, it belongs in Settings. Everything else is init-only config stored as `self._xxx`.
#### Wiring settings into `__init__`
Accept an **optional** `settings` parameter. Build a `default_settings` object with all fields set to real values, then merge any caller overrides with `apply_update`.
Add a `Settings` **class attribute** that points to your settings dataclass. This lets callers access the settings class through the service itself (e.g. `MyTTSService.Settings(...)`) without a separate import:
```python
class MySTTService(STTService):
_settings: MySTTSettings
from typing import Optional
def __init__(self, *, model: str, language: str, region: str, **kwargs):
# An initial value should be provided for every settings field.
# This will be validated at service start.
# (If you track sample_rate, it can be a placeholder value like 0; see
# "Sample Rate Handling").
super().__init__(
settings=MySTTSettings(model=model, language=language, region=region), **kwargs
class MyTTSService(TTSService):
Settings = MyTTSSettings
_settings: Settings
def __init__(
self,
*,
api_key: str,
settings: Optional[Settings] = None,
**kwargs,
):
# 1. Defaults — every field has a real value (store mode).
default_settings = self.Settings(
model="my-model-v1",
voice="default-voice",
language="en",
speaking_rate=1.0,
)
# 2. Merge caller overrides (only given fields win).
if settings is not None:
default_settings.apply_update(settings)
# 3. Pass the fully-populated settings to the base class.
super().__init__(settings=default_settings, **kwargs)
# 4. Init-only config stored separately.
self._api_key = api_key
```
This pattern lets callers override only what they care about:
```python
# Uses all defaults
svc = MyTTSService(api_key="sk-xxx")
# Overrides just the voice — access Settings through the service class
svc = MyTTSService(
api_key="sk-xxx",
settings=MyTTSService.Settings(voice="custom-voice"),
)
```
#### Reacting to runtime changes
AI services support runtime configuration changes via `*UpdateSettingsFrame`s (e.g. `STTUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `LLMUpdateSettingsFrame`).
To react to runtime setting changes, override `_update_settings`. The base implementation applies the delta to `self._settings` and returns a `dict` mapping each changed field name to its **pre-update** value. Your override should call `super()` first, then act on the changed fields. A common implementation might look like:
```python
async def _update_settings(self, update: STTSettings) -> dict[str, Any]:
"""Apply a settings update, reconfiguring the recognizer if needed."""
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
"""Apply a settings update, reconfiguring the connection if needed."""
changed = await super()._update_settings(update)
if not changed:
@@ -292,7 +365,7 @@ Note that, in this example, the service requires a reconnect to apply the new la
If your service can't yet apply certain settings at runtime, call `self._warn_unhandled_updated_settings(changed)` with any unhandled field names so users get a clear log message:
```python
async def _update_settings(self, update: STTSettings) -> dict[str, Any]:
async def _update_settings(self, update: TTSSettings) -> dict[str, Any]:
changed = await super()._update_settings(update)
if not changed:
@@ -325,7 +398,7 @@ Note that `self.sample_rate` is a `@property` set in the TTSService base class,
Use Pipecat's tracing decorators:
- **STT:** `@traced_stt` - decorate a function that handles `transcript`, `is_final`, `language` as args
- **STT:** `@traced_stt` - decorate `_handle_transcription(self, transcript, is_final, language)` (the standard method name convention)
- **LLM:** `@traced_llm` - decorate the `_process_context()` method
- **TTS:** `@traced_tts` - decorate the `run_tts()` method
@@ -347,17 +420,15 @@ For REST-based communication, use aiohttp. Pipecat includes this as a required d
- Wrap API calls in appropriate try/catch blocks
- Handle rate limits and network failures gracefully
- Provide meaningful error messages
- When errors occur, raise exceptions AND push `ErrorFrame`s to notify the pipeline:
- When errors occur, raise exceptions AND push errors to notify the pipeline:
```python
from pipecat.frames.frames import ErrorFrame
try:
# Your API call
result = await self._make_api_call()
except Exception as e:
# Push error frame to pipeline
await self.push_error(ErrorFrame(error=f"{self} error: {e}"))
# Push error upstream to notify the pipeline
await self.push_error(f"{self} error: {e}", exception=e)
# Raise or handle as appropriate
raise
```

View File

@@ -65,6 +65,10 @@ claude plugin marketplace add pipecat-ai/skills
and install any of the available plugins.
### 🧩 Community Integrations
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
### 📺️ Pipecat TV Channel
Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.youtube.com/playlist?list=PLzU2zoMTQIHjqC3v4q2XVSR3hGSzwKFwH) channel.
@@ -81,19 +85,20 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
## 🧩 Available services
| Category | Services |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Category | Services |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [Novita](https://docs.pipecat.ai/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/server/services/tts/smallest), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/video/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp](https://docs.pipecat.ai/server/utilities/audio/krisp-filter), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/server/services/community-integrations) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)

View File

@@ -1 +0,0 @@
- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions` class was removed from the SDK and is now provided by pipecat directly; import it from `pipecat.services.deepgram.stt` instead of `deepgram`.

View File

@@ -1 +0,0 @@
- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit `KeepAlive` messages every 5 seconds, within the recommended 35 second interval before Deepgram's 10-second inactivity timeout.

View File

@@ -1,3 +0,0 @@
- Support for Voice Focus 2.0 models.
- Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
- Cleaned unused `ParameterFixedError` exception handling in `AICFilter` parameter setup.

View File

@@ -1 +0,0 @@
- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in `AICFilter` caused by holding a `memoryview` on the mutable audio buffer across async yield points.

View File

@@ -1 +0,0 @@
- `max_context_tokens` and `max_unsummarized_messages` in `LLMAutoContextSummarizationConfig` (and deprecated `LLMContextSummarizationConfig`) can now be set to `None` independently to disable that summarization threshold. At least one must remain set.

View File

@@ -1 +0,0 @@
- Added optional `timeout_secs` parameter to `register_function()` and `register_direct_function()` for per-tool function call timeout control, overriding the global `function_call_timeout_secs` default.

View File

@@ -1 +0,0 @@
- Added `cloud-audio-only` recording option to Daily transport's `enable_recording` property.

View File

@@ -1,15 +0,0 @@
- Wired up `system_instruction` in `BaseOpenAILLMService`, `AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default system prompt, matching the behavior of the Google services. This enables sharing a single `LLMContext` across multiple LLM services, where each service provides its own system instruction independently.
```python
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful assistant.",
)
context = LLMContext()
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
context.add_message({"role": "user", "content": "Please introduce yourself."})
await task.queue_frames([LLMRunFrame()])
```

View File

@@ -1 +0,0 @@
- Updated foundational examples to use `system_instruction` on LLM services instead of adding system messages to `LLMContext`.

1
changelog/3978.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `SarvamLLMService` with support for `sarvam-30b`, `sarvam-30b-16k`, `sarvam-105b` and `sarvam-105b-32k`

1
changelog/4013.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `on_turn_context_created(context_id)` hook to `TTSService`. Override this to perform provider-specific setup (e.g. eagerly opening a server-side context) before text starts flowing. Called each time a new turn context ID is created.

View File

@@ -0,0 +1 @@
- Added context prewarming path for `InworldTTSService` to improve first audio latency

View File

@@ -0,0 +1 @@
- Added `KrispVivaVadAnalyzer` for Voice Activity Detection using the Krisp VIVA SDK (requires `krisp_audio`).

View File

@@ -0,0 +1 @@
- Modeified `InworldTTSService` to close context at end of turn instead of relying on idle timeout

1
changelog/4031.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `XAIHttpTTSService` for text-to-speech using xAI's HTTP TTS API.

View File

@@ -0,0 +1 @@
- Added Gemini 3 support to the Gemini Live service.

View File

@@ -0,0 +1 @@
- `TTSService`: the default `stop_frame_timeout_s` (idle time before an automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has changed from `2.0` to `3.0` seconds.

1
changelog/4089.added.md Normal file
View File

@@ -0,0 +1 @@
- Added support for "developer" role messages in conversation context across all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock), "developer" messages are converted to "user" messages (use `system_instruction` to set the system instruction). For OpenAI services, "developer" messages pass through in conversation history. For the Responses API, they are kept as "developer" role (matching the existing "system" → "developer" conversion).

View File

@@ -0,0 +1 @@
- ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system message, matching all other adapters. Previously it searched for the first "system" message anywhere in the conversation history. A "system" message appearing later in the list will now be converted to "user" instead of being extracted as the system instruction.

View File

@@ -0,0 +1 @@
- Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring `settings.system_instruction`. The system instruction was being read from a deprecated constructor parameter instead of the settings object, causing it to be silently ignored.

1
changelog/4089.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when the only message in context was a system message. The lone system message is now converted to "user" role instead of being extracted, matching the existing Anthropic adapter behavior.

1
changelog/4092.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `SmallestTTSService`, a WebSocket-based TTS service integration with Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with configurable voice, language, speed, consistency, similarity, and enhancement settings.

View File

@@ -0,0 +1 @@
- Fixed `InworldTtsService` to fallback to full text when TTS timestamps are not received

1
changelog/4115.added.md Normal file
View File

@@ -0,0 +1 @@
- Added warnings in turn stop strategies when `VADParams.stop_secs` differs from the recommended default (0.2s) or when `stop_secs >= STT p99 latency`, which collapses the STT wait timeout to 0s and may cause delayed turn detection. The warnings guide developers to re-run the [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) with their VAD settings.

1
changelog/4117.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `domain` parameter to `AssemblyAISTTSettings` for specialized recognition modes such as Medical Mode (`domain="medical-v1"`).

1
changelog/4119.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `NovitaLLMService` for using Novita AI's LLM models via their OpenAI-compatible API.

1
changelog/4120.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer resources are properly released when no longer needed. Custom `VADAnalyzer` subclasses can override `cleanup()` to free any held resources.

1
changelog/4125.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed Gemini Live pipeline hanging indefinitely when an `EndFrame` was deferred while waiting for the bot to finish responding and `turn_complete` never arrived. As a possible root-cause fix, `turn_complete` messages are now handled even if they lack `usage_metadata`. As a fallback, the deferred `EndFrame` now has a 30-second safety timeout.

1
changelog/4126.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous contexts exceeded") caused by rapid user interruptions. When interruptions arrived before any TTS text was generated, phantom contexts were created on the ElevenLabs server that were never closed, eventually exceeding the 5-context limit.

1
changelog/4127.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed the final sentence being dropped from the conversation context when using RTVI text input with non-word-timestamp TTS services. The `LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`, causing the `LLMAssistantAggregator` to finalize the context before the final sentence arrived.

1
changelog/4128.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `on_end_of_turn` event handler to `AssemblyAISTTService`. This fires after the final transcript is pushed, providing a reliable hook for end-of-turn logic that doesn't race with `TranscriptionFrame`. Works in both Pipecat and AssemblyAI turn detection modes.

View File

@@ -0,0 +1 @@
- ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova Sonic) now prefer `system_instruction` from service settings over an initial system message in the LLM context, matching the behavior of non-realtime services. Previously, context-provided system instructions took precedence. A warning is now logged when both are set.

1
changelog/4135.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed audio crackling and popping in recordings when both user and bot are speaking. `AudioBufferProcessor` no longer injects silence into a track's buffer while that track is actively producing audio, preventing mid-utterance interruptions in the recorded output.

View File

@@ -0,0 +1 @@
- Bumped `nvidia-riva-client` minimum version to `>=2.25.1`.

View File

@@ -0,0 +1 @@
- Upgraded `protobuf` from 5.x to 6.x (`>=6.31.1,<7`).

View File

@@ -0,0 +1 @@
- Unrecognized language strings (e.g. Deepgram's `"multi"`) no longer produce a warning at startup. The log message has been downgraded to debug level since these are valid service-specific values that are passed through correctly.

View File

@@ -0,0 +1 @@
- `GrokLLMService` and `GrokRealtimeLLMService` now live in the `pipecat.services.xai` module alongside `XAIHttpTTSService`, since all three use the same xAI API. Update imports from `pipecat.services.grok.*` to `pipecat.services.xai.*` (e.g. `from pipecat.services.xai.llm import GrokLLMService`).

View File

@@ -0,0 +1 @@
- `pipecat.services.grok.llm`, `pipecat.services.grok.realtime.llm`, and `pipecat.services.grok.realtime.events` are deprecated. The old import paths still work but emit a `DeprecationWarning`; use `pipecat.services.xai.llm`, `pipecat.services.xai.realtime.llm`, and `pipecat.services.xai.realtime.events` instead.

1
changelog/4143.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `DeepgramFluxSageMakerSTTService` for running Deepgram Flux speech-to-text on AWS SageMaker endpoints. Use with `ExternalUserTurnStrategies` to take advantage of Flux's turn detection.

View File

@@ -0,0 +1 @@
- Fixed websocket TTS word timestamps so interrupted contexts cannot leak stale words or backward PTS values into later turns.

1
changelog/4145.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed a race condition in `InterruptibleTTSService` where, if `run_tts` had been invoked but `BotStartedSpeakingFrame` had not yet been received, a user interruption could allow stale audio to leak through.

View File

@@ -0,0 +1 @@
- ⚠️ `TTSService.add_word_timestamps()` no longer supports the `"Reset"` and `"TTSStoppedFrame"` sentinel strings. If you have a custom TTS service that called `await self.add_word_timestamps([("Reset", 0)])` or `await self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id)`, replace them with `await self.append_to_audio_context(ctx_id, TTSStoppedFrame(context_id=ctx_id))` and let `_handle_audio_context` manage the word-timestamp reset automatically.

1
changelog/4146.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed Gemini Live local VAD mode (`GeminiVADParams(disabled=True)` with external VAD) not working. The bot now correctly detects user speech and signals turn boundaries to the Gemini API.

1
changelog/4147.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed Gemini Live message handling to process all `server_content` fields independently. Gemini 3.x can bundle multiple fields (e.g. `model_turn` and `output_transcription`) on the same message, but the previous `elif` chain only processed the first match, silently dropping the rest.

1
changelog/4149.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `ServiceSwitcher` with `ServiceSwitcherStrategyFailover` incorrectly triggering failover when `ErrorFrame`s from other pipeline stages (e.g. TTS) propagated upstream through the switcher. Previously, any non-fatal error passing through would be misattributed to the active service and trigger an unwanted service switch. Now only errors originating from the switcher's own managed services trigger failover.

1
changelog/4151.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `LiveKitOutputTransport` not clearing the `rtc.AudioSource` internal buffer on interruption, causing the bot to continue speaking for several seconds after being interrupted.

1
changelog/4152.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed a crash in OpenAI LLM processing when the provider returns `chunk.choices[0].delta.audio = None`, which caused `'NoneType' object has no attribute 'get'` errors during audio transcript handling.

1
changelog/4153.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed error floods in `DeepgramSTTService` when the WebSocket connection drops. With Deepgram SDK 6.x, `send_media()` raises exceptions on a dead connection instead of silently failing, causing every queued audio frame to log an error. Now `send_media()` failures are caught gracefully — a single warning is logged and audio frames are skipped until the existing reconnection logic restores the connection.

View File

@@ -0,0 +1 @@
- Removed `SambaNovaSTTService`. SambaNova no longer offers speech-to-text audio models. Use another STT provider instead.

1
changelog/4156.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `Mem0MemoryService.get_memories()` convenience method for retrieving all stored memories outside the pipeline (e.g. to build a personalized greeting at connection time). This avoids the need to manually handle client type branching, filter construction, and async wrapping.

View File

@@ -0,0 +1 @@
- ⚠️ Bumped `mem0ai` dependency from `~=0.1.94` to `>=1.0.8,<2`. Users of the `mem0` extra will need to update their mem0ai package.

View File

@@ -0,0 +1 @@
- Fixed `Mem0MemoryService` failing to store messages when the context contained system or developer role messages. The Mem0 API only accepts user and assistant roles, so other roles are now filtered out before storing.

1
changelog/4156.fixed.md Normal file
View File

@@ -0,0 +1 @@
- `Mem0MemoryService` no longer blocks the event loop during memory storage and retrieval. All Mem0 API calls now run in a background thread, and message storage is fire-and-forget so it doesn't delay downstream processing.

1
changelog/4161.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Added missing `on_dtmf_event` callback to `LemonSliceTransportClient.setup()` `DailyCallbacks` construction, fixing a `ValidationError` at pipeline setup time.

View File

@@ -0,0 +1 @@
- Fixed an issue in `InworldTTSService` where, in cases of fast interruption, we would continue receiving audio from the previous context.

1
changelog/4167.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed a word timestamp interleaving issue in `InworldTTSService` when processing multiple sentences.

1
changelog/4172.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed duplicate `TTSStoppedFrame` being pushed in TTS services using `push_stop_frames=True`. When the stop-frame timeout fired, a second `TTSStoppedFrame` could be pushed after the normal one at context completion.

View File

@@ -0,0 +1 @@
- `RimeTTSService` now handles Rime's `done` WebSocket message to complete audio contexts immediately, eliminating the 3-second idle timeout that previously added latency at the end of each utterance.

1
changelog/4174.fixed.md Normal file
View File

@@ -0,0 +1 @@
- ⚠️ Fixed `DeepgramSTTService` compatibility with deepgram-sdk 6.1.0. The SDK now requires explicit message objects for `send_keep_alive()`, `send_close_stream()`, and `send_finalize()`. The minimum deepgram-sdk version is now 6.1.0.

1
changelog/4176.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed RTVI events not being delivered to clients when using WebSocket transports. `ProtobufFrameSerializer` now sets `ignore_rtvi_messages=False` by default.

View File

@@ -80,15 +80,9 @@ GOOGLE_TEST_CREDENTIALS=...
# Gradium
GRAPDIUM_API_KEY=...
# Grok
GROK_API_KEY=...
# Groq
GROQ_API_KEY=...
# Hathora
HATHORA_API_KEY=...
# Heygen
HEYGEN_API_KEY=...
HEYGEN_LIVE_AVATAR_API_KEY=...
@@ -130,6 +124,9 @@ MISTRAL_API_KEY=...
# Neuphonic
NEUPHONIC_API_KEY=...
# Novita
NOVITA_API_KEY=...
# NVIDIA
NVIDIA_API_KEY=...
@@ -179,6 +176,9 @@ SENTRY_DSN=...
SIMLI_API_KEY=...
SIMLI_FACE_ID=...
# Smallest
SMALLEST_API_KEY=...
# Smart turn
LOCAL_SMART_TURN_MODEL_PATH=...
FAL_SMART_TURN_API_KEY=...
@@ -212,3 +212,6 @@ WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
WHATSAPP_PHONE_NUMBER_ID=...
WHATSAPP_APP_SECRET=...
# xAI / Grok
XAI_API_KEY=...

View File

@@ -39,7 +39,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
tts = PiperHttpTTSService(
base_url=os.getenv("PIPER_BASE_URL"), aiohttp_session=session, sample_rate=24000
base_url=os.getenv("PIPER_BASE_URL"),
aiohttp_session=session,
sample_rate=24000,
)
task = PipelineTask(

View File

@@ -39,8 +39,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
tts = RimeHttpTTSService(
api_key=os.getenv("RIME_API_KEY", ""),
voice_id="rex",
aiohttp_session=session,
settings=RimeHttpTTSService.Settings(
voice="rex",
),
)
task = PipelineTask(

View File

@@ -37,7 +37,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
task = PipelineTask(

View File

@@ -29,7 +29,9 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
pipeline = Pipeline([tts, transport.output()])

View File

@@ -37,7 +37,9 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
runner = PipelineRunner()

View File

@@ -39,12 +39,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are an LLM in a WebRTC session, and this is a 'hello world' demo.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
task = PipelineTask(
@@ -56,7 +60,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
context = LLMContext()
context.add_message({"role": "system", "content": "Say hello to the world."})
context.add_message({"role": "developer", "content": "Say hello to the world."})
await task.queue_frames([LLMContextFrame(context), EndFrame()])
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

View File

@@ -45,7 +45,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Create an HTTP session
async with aiohttp.ClientSession() as session:
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)

View File

@@ -37,7 +37,9 @@ async def main():
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)

View File

@@ -67,12 +67,16 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -105,7 +109,9 @@ async def run_example(webrtc_connection: SmallWebRTCConnection):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -50,13 +50,16 @@ async def main():
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o",
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -89,7 +92,7 @@ async def main():
await transport.capture_participant_transcription(participant["id"])
# Kick off the conversation.
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -57,12 +57,16 @@ async def main():
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
context = LLMContext()

View File

@@ -16,11 +16,12 @@ from pipecat.frames.frames import (
Frame,
LLMContextFrame,
LLMFullResponseStartFrame,
OutputImageRawFrame,
TextFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.sync_parallel_pipeline import FrameOrder, SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.sentence import SentenceAggregator
@@ -30,6 +31,7 @@ from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.tts_service import TextAggregationMode
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -44,6 +46,18 @@ class MonthFrame(DataFrame):
return f"{self.name}(month: {self.month})"
class MarkImageForPlaybackSync(FrameProcessor):
"""Marks output image frames to be synchronized with audio playback."""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, OutputImageRawFrame):
frame.sync_with_audio = True
await self.push_frame(frame, direction)
class MonthPrepender(FrameProcessor):
def __init__(self):
super().__init__()
@@ -98,11 +112,19 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaHttpTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
# No need to aggregate by sentences (the default), as we already know we're getting full sentences
# (Otherwise the service will unnecessarily wait for follow-up input to confirm the sentence is complete,
# which, sadly, actually breaks the synchronization mechanism)
text_aggregation_mode=TextAggregationMode.TOKEN,
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
settings=FalImageGenService.Settings(
image_size="square_hd",
),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
@@ -115,17 +137,26 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# that, each pipeline runs concurrently and `SyncParallelPipeline` will
# wait for the input frame to be processed.
#
# We use `FrameOrder.PIPELINE` so that each synchronized batch of output
# frames is pushed in the order the pipelines are listed: image first,
# then audio. This ensures the transport receives the image before the
# audio frames it should accompany.
#
# Note that `SyncParallelPipeline` requires the last processor in each
# of the pipelines to be synchronous. In this case, we use
# `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
# `FalImageGenService` and `CartesiaHttpTTSService` which make HTTP
# requests and wait for the response.
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
SyncParallelPipeline( # Run pipelines in parallel aggregating the result
[
imagegen, # Generate image
MarkImageForPlaybackSync(), # Mark image as needing sync w/audio during playback
],
[month_prepender, tts], # Create "Month: sentence" and output audio
[imagegen], # Generate image
frame_order=FrameOrder.PIPELINE,
),
transport.output(), # Transport output
]
@@ -148,7 +179,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]:
messages = [
{
"role": "system",
"role": "user",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]

View File

@@ -1,198 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import asyncio
import os
import sys
import tkinter as tk
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.frames.frames import (
Frame,
LLMContextFrame,
OutputAudioRawFrame,
TextFrame,
TTSAudioRawFrame,
URLImageRawFrame,
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.sync_parallel_pipeline import SyncParallelPipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.sentence import SentenceAggregator
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.cartesia.tts import CartesiaHttpTTSService
from pipecat.services.fal.image import FalImageGenService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.local.tk import TkLocalTransport, TkTransportParams
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main():
async with aiohttp.ClientSession() as session:
tk_root = tk.Tk()
tk_root.title("Calendar")
runner = PipelineRunner()
async def get_month_data(month):
messages = [
{
"role": "system",
"content": f"Describe a nature photograph suitable for use in a calendar, for the month of {month}. Include only the image description with no preamble. Limit the description to one sentence, please.",
}
]
class ImageDescription(FrameProcessor):
def __init__(self):
super().__init__()
self.text = ""
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TextFrame):
self.text = frame.text
await self.push_frame(frame, direction)
class AudioGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.audio = bytearray()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, TTSAudioRawFrame):
self.audio.extend(frame.audio)
self.frame = OutputAudioRawFrame(
bytes(self.audio), frame.sample_rate, frame.num_channels
)
await self.push_frame(frame, direction)
class ImageGrabber(FrameProcessor):
def __init__(self):
super().__init__()
self.frame = None
async def process_frame(self, frame: Frame, direction: FrameDirection):
await super().process_frame(frame, direction)
if isinstance(frame, URLImageRawFrame):
self.frame = frame
await self.push_frame(frame, direction)
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
imagegen = FalImageGenService(
params=FalImageGenService.InputParams(image_size="square_hd"),
aiohttp_session=session,
key=os.getenv("FAL_KEY"),
)
sentence_aggregator = SentenceAggregator()
description = ImageDescription()
audio_grabber = AudioGrabber()
image_grabber = ImageGrabber()
# With `SyncParallelPipeline` we synchronize audio and images by
# pushing them basically in order (e.g. I1 A1 A1 A1 I2 A2 A2 A2 A2
# I3 A3). To do that, each pipeline runs concurrently and
# `SyncParallelPipeline` will wait for the input frame to be
# processed.
#
# Note that `SyncParallelPipeline` requires the last processor in
# each of the pipelines to be synchronous. In this case, we use
# `CartesiaHttpTTSService` and `FalImageGenService` which make HTTP
# requests and wait for the response.
pipeline = Pipeline(
[
llm, # LLM
sentence_aggregator, # Aggregates LLM output into full sentences
description, # Store sentence
SyncParallelPipeline(
[tts, audio_grabber], # Generate and store audio for the given sentence
[imagegen, image_grabber], # Generate and storeimage for the given sentence
),
]
)
task = PipelineTask(pipeline)
await task.queue_frame(LLMContextFrame(LLMContext(messages)))
await task.stop_when_done()
await runner.run(task)
return {
"month": month,
"text": description.text,
"image": image_grabber.frame,
"audio": audio_grabber.frame,
}
transport = TkLocalTransport(
tk_root,
TkTransportParams(
audio_out_enabled=True,
video_out_enabled=True,
video_out_width=1024,
video_out_height=1024,
),
)
pipeline = Pipeline([transport.output()])
task = PipelineTask(pipeline)
# We only specify a few months as we create tasks all at once and we
# might get rate limited otherwise.
months: list[str] = [
"January",
"February",
]
# We create one task per month. This will be executed concurrently.
month_tasks = [asyncio.create_task(get_month_data(month)) for month in months]
# Now we wait for each month task in the order they're completed. The
# benefit is we'll have as little delay as possible before the first
# month, and likely no delay between months, but the months won't
# display in order.
async def show_images(month_tasks):
for month_data_task in asyncio.as_completed(month_tasks):
data = await month_data_task
await task.queue_frames([data["image"], data["audio"]])
await runner.stop_when_done()
async def run_tk():
while not task.has_finished():
tk_root.update()
tk_root.update_idletasks()
await asyncio.sleep(0.1)
await asyncio.gather(runner.run(task), show_images(month_tasks), run_tk())
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -83,12 +83,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
ml = MetricsLogger()
@@ -125,7 +129,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -100,12 +100,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()

View File

@@ -6,6 +6,7 @@
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
@@ -52,60 +53,68 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
async with aiohttp.ClientSession() as session:
stt = CartesiaSTTService(api_key=os.getenv("CARTESIA_API_KEY"))
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
)
tts = CartesiaHttpTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
aiohttp_session=session,
settings=CartesiaHttpTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(task)
async def bot(runner_args: RunnerArguments):

View File

@@ -1,5 +1,5 @@
#
# Copyright (c) 20242026, Daily
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
@@ -21,9 +21,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.hathora.stt import HathoraSTTService
from pipecat.services.hathora.tts import HathoraTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -51,24 +51,24 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = HathoraSTTService(
model="nvidia-parakeet-tdt-0.6b-v3",
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
tts = HathoraTTSService(
model="hexgrad-kokoro-82m",
)
# See https://models.hathora.dev/model/qwen3-30b-a3b
llm = OpenAILLMService(
base_url="https://app-362f7ca1-6975-4e18-a605-ab202bf2c315.app.hathora.dev/v1",
api_key=os.getenv("HATHORA_API_KEY"),
model=None,
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
llm = OpenAIResponsesLLMService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAIResponsesLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
context_aggregator = LLMContextAggregatorPair(
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
@@ -77,11 +77,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
[
transport.input(), # Transport user input
stt,
context_aggregator.user(), # User responses
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
context_aggregator.assistant(), # Assistant spoken responses
assistant_aggregator, # Assistant spoken responses
]
)
@@ -98,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -24,7 +24,6 @@ from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.tts_service import TextAggregationMode
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -56,15 +55,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
# Alternatively, you can use TextAggregationMode.TOKEN to stream tokens instead of
# sentencesfor faster response times.
# text_aggregation_mode=TextAggregationMode.TOKEN,
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -98,7 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -21,7 +21,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -93,7 +92,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
# focus_speakers=["S1"],
@@ -104,32 +103,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
voice_id="sarah",
settings=SpeechmaticsTTSService.Settings(
voice="sarah",
),
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
settings=OpenAILLMService.Settings(
temperature=0.75,
system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. "
"Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to. "
),
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
@@ -160,7 +148,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
context.add_message({"role": "developer", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -22,7 +22,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.openai.base_llm import BaseOpenAILLMService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
@@ -76,7 +75,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async with aiohttp.ClientSession() as session:
stt = SpeechmaticsSTTService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
params=SpeechmaticsSTTService.InputParams(
settings=SpeechmaticsSTTService.Settings(
language=Language.EN,
speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
),
@@ -84,31 +83,21 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = SpeechmaticsTTSService(
api_key=os.getenv("SPEECHMATICS_API_KEY"),
voice_id="sarah",
settings=SpeechmaticsTTSService.Settings(
voice="sarah",
),
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
params=BaseOpenAILLMService.InputParams(temperature=0.75),
settings=OpenAILLMService.Settings(
temperature=0.75,
system_instruction="You are a helpful British assistant called Sarah in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Always include punctuation in your responses. Give very short replies - do not give longer replies unless strictly necessary. Respond to what the user said in a concise, funny, creative and helpful way. Use `<Sn/>` tags to identify different speakers - do not use tags in your replies. Do not respond to speakers within `<PASSIVE/>` tags unless explicitly asked to.",
),
)
messages = [
{
"role": "system",
"content": (
"You are a helpful British assistant called Sarah. "
"Your goal is to demonstrate your capabilities in a succinct way. "
"Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. "
"Always include punctuation in your responses. "
"Give very short replies - do not give longer replies unless strictly necessary. "
"Respond to what the user said in a concise, funny, creative and helpful way. "
"Use `<Sn/>` tags to identify different speakers - do not use tags in your replies."
),
},
]
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -139,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
messages.append({"role": "system", "content": "Say a short hello to the user."})
context.add_message({"role": "developer", "content": "Say a short hello to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -71,15 +71,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Be nice and helpful. Answer very briefly and without special characters like `#` or `*`. "
"Your response will be synthesized to voice and those characters will create unnatural sounds.",
"You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
MessagesPlaceholder("chat_history"),
("human", "{input}"),

View File

@@ -0,0 +1,151 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService, AWSBedrockLLMSettings
from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService
from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import ExternalUserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
# Initialize Deepgram Flux SageMaker STT Service
# This requires:
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram Flux model
stt = DeepgramFluxSageMakerSTTService(
endpoint_name=os.getenv("SAGEMAKER_STT_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
settings=DeepgramFluxSageMakerSTTService.Settings(
min_confidence=0.3,
),
)
# Initialize Deepgram SageMaker TTS Service
# This requires:
# - AWS credentials configured (via environment variables or AWS CLI)
# - A deployed SageMaker endpoint with Deepgram TTS model
tts = DeepgramSageMakerTTSService(
endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
settings=DeepgramSageMakerTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = AWSBedrockLLMService(
aws_region=os.getenv("AWS_REGION"),
settings=AWSBedrockLLMSettings(
model="us.amazon.nova-pro-v1:0",
temperature=0.8,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
# Use ExternalUserTurnStrategies since Flux handles turn detection natively
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=ExternalUserTurnStrategies(),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "user", "content": "Please introduce yourself to the user."})
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@stt.event_handler("on_update")
async def on_deepgram_flux_update(stt, transcript):
logger.debug(f"On deepgram flux update: {transcript}")
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -56,14 +56,23 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramFluxSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
params=DeepgramFluxSTTService.InputParams(min_confidence=0.3),
settings=DeepgramFluxSTTService.Settings(
min_confidence=0.3,
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -100,7 +109,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -59,18 +59,20 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramHttpTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
voice="aura-2-andromeda-en",
settings=DeepgramHttpTTSService.Settings(
voice="aura-2-andromeda-en",
),
aiohttp_session=session,
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
messages = []
context = LLMContext(messages)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
@@ -102,7 +104,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -22,7 +22,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.aws.llm import AWSBedrockLLMService
from pipecat.services.aws.llm import AWSBedrockLLMService, AWSBedrockLLMSettings
from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService
from pipecat.services.deepgram.sagemaker.tts import DeepgramSageMakerTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
@@ -69,14 +69,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = DeepgramSageMakerTTSService(
endpoint_name=os.getenv("SAGEMAKER_TTS_ENDPOINT_NAME"),
region=os.getenv("AWS_REGION"),
voice="aura-2-andromeda-en",
settings=DeepgramSageMakerTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = AWSBedrockLLMService(
aws_region=os.getenv("AWS_REGION"),
model="us.amazon.nova-pro-v1:0",
params=AWSBedrockLLMService.InputParams(temperature=0.8),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=AWSBedrockLLMSettings(
model="us.amazon.nova-pro-v1:0",
temperature=0.8,
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -110,7 +114,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -7,7 +7,6 @@
import os
from deepgram import LiveOptions
from dotenv import load_dotenv
from loguru import logger
@@ -56,14 +55,24 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
live_options=LiveOptions(vad_events=True, utterance_end_ms="1000"),
settings=DeepgramSTTService.Settings(
vad_events=True,
utterance_end_ms="1000",
),
)
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -97,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -55,11 +55,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = DeepgramTTSService(api_key=os.getenv("DEEPGRAM_API_KEY"), voice="aura-2-andromeda-en")
tts = DeepgramTTSService(
api_key=os.getenv("DEEPGRAM_API_KEY"),
settings=DeepgramTTSService.Settings(
voice="aura-2-andromeda-en",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -93,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -63,13 +63,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = ElevenLabsHttpTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
aiohttp_session=session,
settings=ElevenLabsHttpTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -104,7 +108,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "system", "content": "Please introduce yourself to the user."}
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])

View File

@@ -57,12 +57,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = ElevenLabsTTSService(
api_key=os.getenv("ELEVENLABS_API_KEY", ""),
voice_id=os.getenv("ELEVENLABS_VOICE_ID", ""),
settings=ElevenLabsTTSService.Settings(
voice=os.getenv("ELEVENLABS_VOICE_ID", ""),
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -96,7 +100,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -0,0 +1,128 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
import os
import aiohttp
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.xai.llm import GrokLLMService
from pipecat.services.xai.tts import XAIHttpTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
async with aiohttp.ClientSession() as session:
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
tts = XAIHttpTTSService(
api_key=os.getenv("XAI_API_KEY"),
aiohttp_session=session,
settings=XAIHttpTTSService.Settings(
voice="eve",
),
)
llm = GrokLLMService(
api_key=os.getenv("XAI_API_KEY"),
settings=GrokLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -65,8 +65,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=AzureLLMService.Settings(
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -100,7 +102,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -65,8 +65,10 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = AzureLLMService(
api_key=os.getenv("AZURE_CHATGPT_API_KEY"),
endpoint=os.getenv("AZURE_CHATGPT_ENDPOINT"),
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=AzureLLMService.Settings(
model=os.getenv("AZURE_CHATGPT_MODEL"),
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -100,7 +102,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -54,15 +54,24 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = OpenAISTTService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
settings=OpenAISTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="ballad",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
)
context = LLMContext()
@@ -97,7 +106,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -55,20 +55,25 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = OpenAIRealtimeSTTService(
api_key=os.getenv("OPENAI_API_KEY"),
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
language=Language.EN,
# Uses local VAD by default.
# To enable server-side VAD, set turn_detection=None or
# a dict with server_vad settings.
# turn_detection={"type": "server_vad", "threshold": 0.5},
settings=OpenAIRealtimeSTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
language=Language.EN,
),
)
tts = OpenAITTSService(api_key=os.getenv("OPENAI_API_KEY"), voice="ballad")
tts = OpenAITTSService(
api_key=os.getenv("OPENAI_API_KEY"),
settings=OpenAITTSService.Settings(
voice="ballad",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenAILLMService.Settings(
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
)
context = LLMContext()
@@ -103,7 +108,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -57,7 +57,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
voice_id="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
timestamp = int(time.time())
@@ -65,7 +67,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
api_key=os.getenv("OPENAI_API_KEY"),
openpipe_api_key=os.getenv("OPENPIPE_API_KEY"),
tags={"conversation_id": f"pipecat-{timestamp}"},
system_instruction="You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
settings=OpenPipeLLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
context = LLMContext()
@@ -99,7 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message({"role": "system", "content": "Please introduce yourself to the user."})
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

Some files were not shown because too many files have changed in this diff Show More