Add test exercising the step 3 ordering where stripping a trailing
assistant exposes a system message that then gets converted to user.
Move the reasoning about when a trailing system message can occur
into the docstring.
Perplexity allows multiple initial system messages, so don't merge them.
Instead, skip system-system pairs during the consecutive same-role merge
step. Broaden the trailing message fix to convert any trailing system
message to user (not just a lone system message), so contexts with only
system messages don't fail.
Perplexity's API is stricter than OpenAI about conversation history:
- Requires strict alternation between user/tool and assistant messages
- Disallows system messages except as the initial message
- Requires the last message to be user or tool
The new adapter transforms messages before sending to satisfy all three
constraints: merging consecutive initial system messages, converting
non-initial system to user, merging consecutive same-role messages, and
removing trailing assistant messages.
Also adds dual-system-instruction warnings to Cerebras, Fireworks,
Mistral, Perplexity, and SambaNova services (matching the existing
BaseOpenAILLMService pattern), and updates the warning text in
BaseOpenAILLMService to be more descriptive.
* Add ServiceSwitcherStrategyFailover for automatic error-based service switching
Introduce a strategy hierarchy: ServiceSwitcherStrategy (base) →
ServiceSwitcherStrategyManual (handles ManuallySwitchServiceFrame) →
ServiceSwitcherStrategyFailover (adds error-based failover). ServiceSwitcher
now defaults to ServiceSwitcherStrategyManual with strategy_type optional.
Non-fatal ErrorFrames are forwarded to the strategy via handle_error().
* Move metadata request into _set_active_if_available
Requesting metadata is part of making a service active, so it belongs
alongside setting _active_service and firing on_service_switched. This
removes the duplicate queue_frame calls from ServiceSwitcher push_frame
and process_frame.
* Add system_instruction parameter to run_inference
Allow callers to provide a custom system instruction directly when calling
run_inference, without having to construct provider-specific context objects.
For OpenAI, the instruction is prepended as a system message (preserving
existing messages). For Anthropic, Google, and AWS Bedrock, it overrides the
single system field with a warning when an existing system instruction is
present in the context.
* Use system_instruction parameter in _generate_summary
Pass the summarization prompt via run_inference's system_instruction
parameter instead of embedding it as a system message in the context.
* Add changelog for #3968
Update the tests in test_integration_unified_function_calling.py to not specify particular models but instead just use service defaults (the tests shouldn't be model-dependent anyway)
Turn completion instructions were being injected as a system message in
the LLM context, which caused warning spam when system_instruction was
also set, did not persist across full context updates, and broke LLMs
that do not support consecutive system messages.
Instead, compose the turn completion instructions into the LLM service
system_instruction field. This is managed via _base_system_instruction
which stores the original value for restoration when turn completion is
disabled.
Move `session_properties` into `GrokRealtimeLLMSettings`, making `settings` the canonical way to configure Grok Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=GrokRealtimeLLMSettings(session_properties=...)`.
`system_instruction` is synced bidirectionally between the top-level settings field and `session_properties.instructions`, with top-level taking precedence on conflict. (Unlike OpenAI Realtime, Grok's `SessionProperties` has no `model` field, so no model sync is needed.)
Move `session_properties` into `OpenAIRealtimeLLMSettings`, making `settings` the canonical way to configure OpenAI Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=OpenAIRealtimeLLMSettings(session_properties=...)`.
`model` and `system_instruction` are synced bidirectionally between the top-level settings fields and `session_properties.model`/`.instructions`, with top-level taking precedence on conflict.
- Add dedicated Settings subclasses to 20 LLM services that were
borrowing parent Settings classes (e.g. AzureLLMSettings,
GroqLLMSettings) so users don't need cross-module imports
- Fix field defaults to NOT_GIVEN in BaseWhisperSTTSettings,
OpenAIRealtimeSTTSettings, and NvidiaSegmentedSTTSettings for
delta-mode safety
- Fix incomplete default_settings in AWS, Cartesia, ElevenLabs,
Fish, and Whisper services so validate_complete() passes
- Add auto-discovered tests that verify all Settings classes default
to NOT_GIVEN (delta safety) and all services initialize with
complete settings (store completeness)
Allow either threshold to be set to None to cleanly disable that trigger,
instead of requiring users to set a very large number as a workaround.
At least one of the two must remain set (validated at construction time).
Replace the round-trip push_interruption_task_frame_and_wait() mechanism
with broadcast_interruption(), which pushes an InterruptionFrame both
upstream and downstream directly from the calling processor.
This eliminates race conditions (transcription arriving before the
InterruptionFrame comes back), swallowed-event timeouts (frame blocked
before reaching the sink), and the complexity of _wait_for_interruption
flag / queue bypass / frame.complete() obligations.
- Add broadcast_interruption() to FrameProcessor
- Deprecate push_interruption_task_frame_and_wait() (delegates to new method)
- Remove event field and complete() from InterruptionFrame/InterruptionTaskFrame
- Remove _wait_for_interruption flag and all special-case logic
- Remove frame.complete() calls in stt_mute_filter and llm_response_universal
- Update all 17 call sites to use broadcast_interruption()
- Update tests
Enables .model_dump() serialization for Pipecat Cloud collection.
All metrics now include start_time (Unix timestamp) for timeline
plotting alongside duration_secs.
Add per-service latency breakdown metrics alongside existing user-to-bot
latency measurement. When enable_metrics=True, the observer now emits an
on_latency_breakdown event with TTFB, text aggregation, and user turn
duration metrics collected between VADUserStoppedSpeakingFrame and
BotStartedSpeakingFrame.
- Add LatencyBreakdown dataclass with ttfb, text_aggregation,
user_turn_secs fields
- Accumulate MetricsFrame data during user→bot cycles
- Reset accumulators on InterruptionFrame to discard stale metrics
- Measure user_turn_secs from actual user silence (VAD timestamp -
stop_secs) to turn release (UserStoppedSpeakingFrame)
- Filter zero-value TTFB entries from startup metric resets
- Add frame deduplication using bounded deque + set pattern
- Update example 29 with latency breakdown display
Use start_offset_secs (offset from StartFrame) on ProcessorStartupTiming
instead of a wall-clock timestamp. Reports keep a single start_time
anchor for dashboard visualization. Remove _mono_to_wall conversion.
Switch ProcessorStartupTiming, StartupTimingReport, and
TransportTimingReport from dataclasses to Pydantic BaseModel. Add
start_time (Unix timestamp) fields and wall clock conversion for
monotonic observer timestamps.
Add BotConnectedFrame (SystemFrame) pushed by SFU transports (Daily,
LiveKit, HeyGen, Tavus) when the bot joins the room. Replace the
on_transport_readiness_measured event with on_transport_timing_report
which includes both bot_connected_secs and client_connected_secs.
Introduce ClientConnectedFrame (SystemFrame) pushed by all transports
when a client connects. StartupTimingObserver uses this to measure
transport readiness — the time from StartFrame to first client
connection — via a new on_transport_readiness_measured event.
Tracks how long each processor start method takes during pipeline
startup by measuring StartFrame arrive/leave deltas. Emits a timing
report via the on_startup_timing_report event and auto-logs a summary.
Internal pipeline processors are excluded from reports by default.
When filter_incomplete_user_turns is enabled and an LLMMessagesUpdateFrame
replaces the context via set_messages(), the turn completion instructions
system message was lost. This caused the LLM to stop emitting turn
completion markers. Re-inject the instructions after set_messages() to
fix this.
Allow pushing frames upstream through the pipeline by passing
FrameDirection.UPSTREAM. Downstream frames use the existing push queue,
while upstream frames are pushed directly from the pipeline sink.