Commit Graph

412 Commits

Author SHA1 Message Date
Mark Backman
c331c75d66 Add tests for send_media() exception handling in DeepgramSTTService 2026-03-26 09:20:58 -04:00
Mark Backman
7fef3b01eb Merge pull request #4142 from pipecat-ai/mb/grok-move-to-xai-module
Consolidate Grok services into xai module
2026-03-25 23:32:18 -04:00
filipi87
413dbaf974 Automated tests to validate the silence injection guards. 2026-03-25 16:05:58 -03:00
filipi87
da3f184316 Automated tests to validate the silence injection guards. 2026-03-25 15:38:21 -03:00
Mark Backman
4ee4002d5d Merge pull request #4137 from pipecat-ai/mb/language-string-log-level-debug
Downgrade unrecognized language string log from warning to debug
2026-03-25 12:26:46 -04:00
Mark Backman
1c99a537b2 Consolidate Grok services into xai module
Both GrokLLMService and XAIHttpTTSService use the same xAI API (api.x.ai),
so move Grok source files into the xai module. Leave deprecation shims in
the old grok/ paths for backward compatibility.
2026-03-25 12:07:40 -04:00
Nicholas Zhao
bbd14de9c5 Address PR review: rename to XAIHttpTTSService, add language map, clean up API
- Rename XAITTSService → XAIHttpTTSService and XAITTSSettings → XAIHttpTTSSettings
- Add language_to_xai_language() with explicit LANGUAGE_MAP using resolve_language()
- Remove deprecated InputParams, params, voice, language init params
- Remove XAI_DEFAULT_SAMPLE_RATE and XAI_PCM_CODEC constants; add encoding param
- Set sample_rate=None default (picked up from PipelineParams or user)
- Use Language.EN enum instead of string "en" for default language
- Add changelog/4031.added.md
- Add 07e-interruptible-xai.py foundational example
- Update 14g-function-calling-grok.py to use XAIHttpTTSService
- Register 07e in run-release-evals.py
2026-03-25 10:46:54 -04:00
Nicholas Zhao
02b97035f8 Add xAI TTS service 2026-03-25 10:45:15 -04:00
Mark Backman
f470ff193e Update language tests to expect debug instead of warning 2026-03-25 10:26:10 -04:00
Paul Kompfner
bb33045389 Add system instruction conflict resolution tests for realtime adapters
Test that OpenAI Realtime, Grok Realtime, and Nova Sonic adapters
prefer init-provided system_instruction over context-provided, warn
on conflicts, and don't warn for developer messages.
2026-03-24 17:30:35 -04:00
Paul Kompfner
4c121332cf Convert developer messages to user for Cerebras (and lay groundwork for other incompatible services)
OpenAI-compatible services that don't support the "developer" message
role can now set supports_developer_role = False on the service class.
BaseOpenAILLMService passes this as convert_developer_to_user to the
adapter, which converts developer messages to user messages before
sending them to the API. Applied to Cerebras and Perplexity.

Also removes the now-redundant developer→user conversion step from
PerplexityLLMAdapter (handled by the parent adapter via the flag).
2026-03-24 16:05:15 -04:00
Paul Kompfner
0530722c58 Convert developer messages to user in Perplexity adapter
Perplexity doesn't support the "developer" role. Developer messages are
now converted to "user" before other transformations are applied.
2026-03-24 16:05:15 -04:00
Paul Kompfner
0d1b834770 Add developer message support to realtime adapters
OpenAI Realtime, Grok Realtime, and AWS Nova Sonic adapters now convert
"developer" role messages to "user" (consistent with all other non-OpenAI
adapters). Previously these messages were silently dropped. Adds starter
unit tests for all three realtime adapters.
2026-03-24 16:05:15 -04:00
Paul Kompfner
2135557689 Simplify: don't promote developer messages to system instruction
Developer messages are now always converted to "user" in non-OpenAI
adapters, never promoted to the system instruction. This removes an
inconsistency where adding an unrelated message to context would change
whether a developer message got promoted.

Simplifications:
- Rename _extract_initial_system_or_developer → _extract_initial_system
- Return Optional[str] instead of Tuple (role is always "system")
- Drop initial_context_message_role from _resolve_system_instruction
- Drop system_role fields from all ConvertedMessages dataclasses
2026-03-24 16:02:42 -04:00
Paul Kompfner
a0393b9af6 Fix: warn on system_instruction conflict even with single system message
When the only message in context was a system message,
_extract_initial_system_or_developer would convert it to "user" (to
prevent empty history) without warning about the conflict with
system_instruction. Now warns inline before converting, with a message
explaining both the conflict and the user-role conversion.
2026-03-24 16:02:42 -04:00
Paul Kompfner
64ba013b68 Move OpenAI Responses adapter tests into test_get_llm_invocation_params.py
Consolidates all adapter get_llm_invocation_params tests in one file.
Adds new tests for developer message handling in the Responses adapter.
2026-03-24 16:02:42 -04:00
Paul Kompfner
7377d88cf5 Move system_instruction tests into test_get_llm_invocation_params.py 2026-03-24 16:02:42 -04:00
Paul Kompfner
d4dea30407 Centralize system message handling in adapters; add developer message support
Two goals:

1. Centralize system_instruction vs context system message resolution into
   the LLM adapters. This eliminates duplication between in-pipeline and
   out-of-band (run_inference) code paths across ~16 locations in service
   llm.py files.

2. Add support for "developer" role messages in conversation context, which
   is facilitated by the above centralization.

Shared helpers on BaseLLMAdapter:
- _extract_initial_system_or_developer: extracts/converts messages[0]
  based on role and whether system_instruction is provided
- _resolve_system_instruction: warns on conflicts between system_instruction
  and context system messages, returns the effective instruction

Developer message handling (new):
- Non-OpenAI adapters: an initial "developer" message is promoted to the
  system instruction when no system_instruction is provided; otherwise it
  is converted to "user". Subsequent "developer" messages are always
  converted to "user". No conflict warning is emitted for developer
  messages (unlike "system" messages).
- OpenAI adapter: "developer" messages pass through in conversation
  history without triggering conflict warnings.
- OpenAI Responses adapter: "developer" messages are kept as "developer"
  role (same as "system", which is also converted to "developer" for the
  Responses API).

Other behavior changes:
- Gemini: "initial" system message detection now checks messages[0] only
  (previously searched anywhere in the list)
- Bedrock: a lone system message is now converted to "user" instead of
  being extracted to an empty message list (matches existing Anthropic
  behavior)
2026-03-24 16:02:42 -04:00
Mark Backman
5d71de8aad Fix LLMFullResponseEndFrame racing ahead of final TTSTextFrame
Route LLMFullResponseEndFrame through the serialization queue instead
of pushing it directly downstream when push_text_frames is enabled.
This ensures the frame is emitted only after the audio context is
fully drained, preserving correct ordering relative to TTSTextFrames.

Previously, the final sentence TTSTextFrame would arrive at the
LLMAssistantAggregator after LLMFullResponseEndFrame, causing it to
be dropped from the conversation context (especially with RTVI text
input where no subsequent interruption would flush the orphaned text).
2026-03-24 15:09:42 -04:00
Filipi da Silva Fuchter
5ed183d215 Merge pull request #4022 from krispai/krisp-viva-vad-support
Draft Implementation for Krisp VIVA VAD.
2026-03-24 09:44:32 -04:00
Mark Backman
5c3d3aea2b Merge pull request #4115 from pipecat-ai/mb/user-turn-stop-warnings
Warn when VAD stop_secs misconfiguration may degrade turn detection
2026-03-24 09:32:20 -04:00
Alex-wuhu
8c6f4a8d7b Add Novita AI LLM service provider 2026-03-24 09:20:50 -04:00
Mark Backman
483b643b07 Warn when VAD stop_secs misconfiguration may degrade turn detection
Add warnings in SpeechTimeoutUserTurnStopStrategy and
TurnAnalyzerUserTurnStopStrategy when stop_secs differs from the
recommended default (0.2s) or when stop_secs >= STT p99 latency,
which collapses the STT wait timeout to 0s. Document the stop_secs=0.2
assumption in stt_latency.py.
2026-03-23 17:57:51 -04:00
Garegin Harutyunyan
f1f51de962 Merge branch 'main' into krisp-viva-vad-support 2026-03-23 18:35:58 +04:00
Garegin Harutyunyan
c32240e14b Fixed review comments. 2026-03-23 17:44:48 +04:00
Pablo Ois Lagarde
bc0e7130b8 fix: always include parameters field in Genesys AudioHook messages
The AudioHook protocol requires every message to carry a `parameters`
object. `_create_message` conditionally included it only when parameters
were truthy, so pong responses and closed responses without
outputVariables were sent without the field.

Clients that validate message structure (including the Genesys reference
implementation) rejected these messages, which broke server sequence
tracking and prevented outputVariables from reaching the Architect flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 16:37:53 -03:00
kompfner
488dc1d07e Merge pull request #4074 from pipecat-ai/pk/openai-responses-llm-service
feat: add OpenAI Responses API LLM service
2026-03-19 15:44:26 -04:00
Paul Kompfner
348df9d4ce fix: remove redundant instructions override in run_inference
The override would re-add `instructions` after the adapter had
intentionally converted it to a developer message for empty contexts.
Added a regression test.
2026-03-19 13:34:41 -04:00
Paul Kompfner
d702ebd6a2 Add frame_order parameter to SyncParallelPipeline
Adds a FrameOrder enum with ARRIVAL (default, existing behavior) and
PIPELINE (pushes frames in pipeline definition order). This lets callers
guarantee output ordering between parallel pipelines — e.g. ensuring
image frames precede audio frames — without needing a separate reordering
processor downstream.

Updates the 05-sync-speech-and-image example to use FrameOrder.PIPELINE,
removing the ImageBeforeAudioReorderer class entirely.
2026-03-19 09:43:51 -04:00
filipi87
5fd98e1391 Fixing TTS frame order. 2026-03-19 09:43:40 -03:00
Mark Backman
bad10177d4 Add WakePhraseUserTurnStartStrategy (#4064)
- Add WakePhraseUserTurnStartStrategy for gating interaction behind wake                                                                            
  phrase detection, with timeout and single_activation modes                                                                                        
- Add default_user_turn_start_strategies() and                                                                                                      
  default_user_turn_stop_strategies() helper functions                                                                                              
- Deprecate WakeCheckFilter in favor of the new strategy
- Extend ProcessFrameResult to stop strategies for short-circuit evaluation
- Fix MinWordsUserTurnStartStrategy including filtered text in output
2026-03-18 16:47:17 -04:00
Paul Kompfner
951bb0c1a7 feat: set store=False and add run_inference tests
Set store=False in Responses API calls since we send full conversation
history as input items and don't use previous_response_id.

Add 5 run_inference tests for OpenAIResponsesLLMService using real
LLMContext and adapter (only HTTP client mocked).
2026-03-18 14:47:12 -04:00
Paul Kompfner
c4f21ef76b test: add run_inference tests for OpenAIResponsesLLMService
Uses real LLMContext and adapter (only HTTP client is mocked) to test
basic inference, client exception propagation, system_instruction
override, empty context fallback, and max_tokens override.
2026-03-18 14:17:21 -04:00
Paul Kompfner
a7167ad121 test: add run_inference tests for OpenAIResponsesLLMService
Tests cover basic inference, client exception propagation,
system_instruction override, and max_tokens override.
2026-03-18 14:09:17 -04:00
Paul Kompfner
45186cc4ce feat: add OpenAI Responses API LLM service
Add OpenAIResponsesLLMService using the Responses API, with a dedicated
adapter that converts LLMContext messages to Responses API input items
(system→developer, tool_calls→function_call, tool→function_call_output,
multimodal content conversion, and tools schema flattening).

- New adapter: open_ai_responses_adapter.py
- New service: openai/responses/llm.py
- Examples: 07-interruptible and 14-function-calling variants
- 19 unit tests for adapter conversion logic
- Eval entries for both examples
2026-03-18 11:45:23 -04:00
Mark Backman
a32f558b07 Merge pull request #4026 from pipecat-ai/mb/fix-deepgram-base-url
Fix DeepgramSTTService base_url forcing HTTPS/WSS schemes
2026-03-17 16:39:24 -04:00
Mark Backman
10b3bff525 Merge pull request #4058 from pipecat-ai/mb/improve-stt-tts-language-code-robustness
fix: resolve raw language strings through Language enum for proper service conversion
2026-03-17 16:20:12 -04:00
Mark Backman
790a23d2e5 fix: resolve raw language strings through Language enum for proper service conversion
Raw strings like "de-DE" passed as the language parameter to TTS/STT services
were bypassing the Language enum resolution logic, causing silent failures
(e.g. ElevenLabs expects "de" not "de-DE"). Now raw strings are first converted
to Language enums so they go through the same resolve_language() path, with a
warning logged for unrecognized strings.
2026-03-17 12:00:28 -04:00
Mark Backman
5000b040dd Fix stale state in user turn stop strategies between turns
Reset stop strategies at turn start (not just turn stop) so that late
transcriptions arriving between turns do not leave stale _text that
causes premature stops on the next turn. Also cancel pending timeout
tasks in reset() for both SpeechTimeout and TurnAnalyzer strategies.
2026-03-17 11:31:08 -04:00
Mark Backman
79b7a0f969 Fix DeepgramSTTService base_url forcing HTTPS/WSS schemes
The base_url parameter previously forced wss:// and https:// schemes,
breaking air-gapped or private deployments that need ws:// or http://.
Extract URL derivation into _derive_deepgram_urls() helper that respects
the developers scheme choice while deriving the paired WebSocket and
HTTP URLs the Deepgram SDK requires.

Closes #4019
2026-03-13 13:53:06 -04:00
Garegin Harutyunyan
33f042b500 format fix. 2026-03-13 19:32:39 +04:00
Garegin Harutyunyan
0722784f3a tests for VAD. 2026-03-13 19:30:03 +04:00
Paul Kompfner
99f28120b7 Remove trailing system→user conversion for cross-call stability
Perplexity appears to have statefulness within a conversation, so
converting a system message to "user" in one call and then back to
"system" in the next (after more messages are appended) causes API
errors. Remove the trailing system→user conversion entirely — if the
context only has system messages, the API call will fail but the
mistake will be caught right away.
2026-03-12 16:07:39 -04:00
Paul Kompfner
e69f5a76e1 Add test for trailing assistant+system ordering, improve docstring
Add test exercising the step 3 ordering where stripping a trailing
assistant exposes a system message that then gets converted to user.
Move the reasoning about when a trailing system message can occur
into the docstring.
2026-03-12 15:24:17 -04:00
Paul Kompfner
7f98cc9921 Remove initial system message merging, handle trailing system messages
Perplexity allows multiple initial system messages, so don't merge them.
Instead, skip system-system pairs during the consecutive same-role merge
step. Broaden the trailing message fix to convert any trailing system
message to user (not just a lone system message), so contexts with only
system messages don't fail.
2026-03-12 15:14:56 -04:00
Paul Kompfner
0373f85b85 Add PerplexityLLMAdapter to enforce Perplexity's message ordering constraints
Perplexity's API is stricter than OpenAI about conversation history:
- Requires strict alternation between user/tool and assistant messages
- Disallows system messages except as the initial message
- Requires the last message to be user or tool

The new adapter transforms messages before sending to satisfy all three
constraints: merging consecutive initial system messages, converting
non-initial system to user, merging consecutive same-role messages, and
removing trailing assistant messages.

Also adds dual-system-instruction warnings to Cerebras, Fireworks,
Mistral, Perplexity, and SambaNova services (matching the existing
BaseOpenAILLMService pattern), and updates the warning text in
BaseOpenAILLMService to be more descriptive.
2026-03-12 14:56:30 -04:00
Aleix Conchillo Flaqué
a4310d4335 Merge pull request #3980 from pipecat-ai/aleix/move-google-vertex-openai
Move Google Vertex and OpenAI LLM modules to subpackages
2026-03-10 13:37:02 -07:00
Aleix Conchillo Flaqué
b23652caa6 Update imports to use new google.vertex and google.openai paths 2026-03-10 12:58:04 -07:00
kollaikal-rupesh
80bd935c19 Add ServiceSwitcherStrategyFailover for automatic failover on service errors (#3870)
* Add ServiceSwitcherStrategyFailover for automatic error-based service switching

Introduce a strategy hierarchy: ServiceSwitcherStrategy (base) →
ServiceSwitcherStrategyManual (handles ManuallySwitchServiceFrame) →
ServiceSwitcherStrategyFailover (adds error-based failover). ServiceSwitcher
now defaults to ServiceSwitcherStrategyManual with strategy_type optional.
Non-fatal ErrorFrames are forwarded to the strategy via handle_error().

* Move metadata request into _set_active_if_available

Requesting metadata is part of making a service active, so it belongs
alongside setting _active_service and firing on_service_switched. This
removes the duplicate queue_frame calls from ServiceSwitcher push_frame
and process_frame.
2026-03-10 15:37:30 -04:00
Mark Backman
9b26faff05 Merge pull request #3961 from ai-coustics/goekmengoergen/sys-663-re-enable-enhancement-level-feature-on-pipecat
Add enhancement_level support to `AICFilter`.
2026-03-10 14:24:15 -04:00