Commit Graph

4974 Commits

Author SHA1 Message Date
Mark Backman
a8eff6fbbf Downgrade unrecognized language string log from warning to debug
Service-specific language strings like Deepgram's "multi" are valid
pass-through values, not issues worth warning about.
2026-03-25 10:20:36 -04:00
kompfner
86e086c6b5 Merge pull request #4130 from pipecat-ai/pk/realtime-services-init-v-context-system-instructions-cleanup
Prefer init-provided system instructions in realtime services
2026-03-25 09:13:52 -04:00
Paul Kompfner
ac2b1ecd47 Prefer init-provided system instruction in Grok Realtime
Add system_instruction parameter to the Grok Realtime adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.

Update the Grok Realtime example to use settings.system_instruction
instead of session_properties.instructions.
2026-03-24 17:29:19 -04:00
Paul Kompfner
e7dd84b552 Prefer init-provided system instruction in OpenAI Realtime
Add system_instruction parameter to the OpenAI Realtime adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.
2026-03-24 17:21:53 -04:00
Paul Kompfner
39329aaddb Prefer init-provided system instruction in Nova Sonic
Add system_instruction parameter to the Nova Sonic adapter's
get_llm_invocation_params() and call _resolve_system_instruction() to
prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.

Remove the service-side fallback logic, as the adapter now handles
resolution.
2026-03-24 17:18:44 -04:00
Paul Kompfner
56a56a4174 Prefer init-provided system instruction in Gemini Live
Pass self._system_instruction_from_init to the adapter's
get_llm_invocation_params(), which calls _resolve_system_instruction()
to prefer init-provided over context-provided system instructions and
warn on conflicts. Previously context-provided took precedence.

Also fix the reconnect check to only reconnect when the resolved
system instruction actually differs from what the initial connection
used, avoiding unnecessary reconnects.
2026-03-24 17:06:56 -04:00
kompfner
b80328e038 Merge pull request #4125 from pipecat-ai/pk/gemini-live-endframe-deferral-issue
Gemini Live: fix EndFrame-deferral hang
2026-03-24 17:02:46 -04:00
Paul Kompfner
45926a7135 Update Together.ai default model to openai/gpt-oss-20b
The previous default (meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo) is
no longer available as a serverless Together.ai model and now requires a
custom deployment. The new default is openai/gpt-oss-20b, one of
Together's recommended models for small & fast use-cases.
2026-03-24 16:05:15 -04:00
Paul Kompfner
8c678c1c98 Set supports_developer_role = False for more OpenAI-compatible services
DeepSeek, Mistral, OLLama, Qwen, SambaNova, and Together don't support
the "developer" message role.
2026-03-24 16:05:15 -04:00
Paul Kompfner
4c121332cf Convert developer messages to user for Cerebras (and lay groundwork for other incompatible services)
OpenAI-compatible services that don't support the "developer" message
role can now set supports_developer_role = False on the service class.
BaseOpenAILLMService passes this as convert_developer_to_user to the
adapter, which converts developer messages to user messages before
sending them to the API. Applied to Cerebras and Perplexity.

Also removes the now-redundant developer→user conversion step from
PerplexityLLMAdapter (handled by the parent adapter via the flag).
2026-03-24 16:05:15 -04:00
Paul Kompfner
19bcc8620c Fix Gemini Live not honoring settings.system_instruction
_system_instruction_from_init was being set from the deprecated
`system_instruction` constructor parameter instead of
`self._settings.system_instruction`, so system instructions provided
via settings were silently ignored.
2026-03-24 16:05:15 -04:00
Paul Kompfner
0530722c58 Convert developer messages to user in Perplexity adapter
Perplexity doesn't support the "developer" role. Developer messages are
now converted to "user" before other transformations are applied.
2026-03-24 16:05:15 -04:00
Paul Kompfner
0d1b834770 Add developer message support to realtime adapters
OpenAI Realtime, Grok Realtime, and AWS Nova Sonic adapters now convert
"developer" role messages to "user" (consistent with all other non-OpenAI
adapters). Previously these messages were silently dropped. Adds starter
unit tests for all three realtime adapters.
2026-03-24 16:05:15 -04:00
Paul Kompfner
27fabfc1b3 Improve warning message wording and formatting 2026-03-24 16:02:42 -04:00
Paul Kompfner
2135557689 Simplify: don't promote developer messages to system instruction
Developer messages are now always converted to "user" in non-OpenAI
adapters, never promoted to the system instruction. This removes an
inconsistency where adding an unrelated message to context would change
whether a developer message got promoted.

Simplifications:
- Rename _extract_initial_system_or_developer → _extract_initial_system
- Return Optional[str] instead of Tuple (role is always "system")
- Drop initial_context_message_role from _resolve_system_instruction
- Drop system_role fields from all ConvertedMessages dataclasses
2026-03-24 16:02:42 -04:00
Paul Kompfner
a0393b9af6 Fix: warn on system_instruction conflict even with single system message
When the only message in context was a system message,
_extract_initial_system_or_developer would convert it to "user" (to
prevent empty history) without warning about the conflict with
system_instruction. Now warns inline before converting, with a message
explaining both the conflict and the user-role conversion.
2026-03-24 16:02:42 -04:00
Paul Kompfner
3bbec0a2c8 Broaden docstring: all non-OpenAI providers need non-empty messages 2026-03-24 16:02:42 -04:00
Paul Kompfner
e29a63e1ae Improve _extract_initial_system_or_developer docstring clarity 2026-03-24 16:02:42 -04:00
Paul Kompfner
45178972d7 Fix stale docstring in PerplexityLLMAdapter 2026-03-24 16:02:42 -04:00
Paul Kompfner
d4dea30407 Centralize system message handling in adapters; add developer message support
Two goals:

1. Centralize system_instruction vs context system message resolution into
   the LLM adapters. This eliminates duplication between in-pipeline and
   out-of-band (run_inference) code paths across ~16 locations in service
   llm.py files.

2. Add support for "developer" role messages in conversation context, which
   is facilitated by the above centralization.

Shared helpers on BaseLLMAdapter:
- _extract_initial_system_or_developer: extracts/converts messages[0]
  based on role and whether system_instruction is provided
- _resolve_system_instruction: warns on conflicts between system_instruction
  and context system messages, returns the effective instruction

Developer message handling (new):
- Non-OpenAI adapters: an initial "developer" message is promoted to the
  system instruction when no system_instruction is provided; otherwise it
  is converted to "user". Subsequent "developer" messages are always
  converted to "user". No conflict warning is emitted for developer
  messages (unlike "system" messages).
- OpenAI adapter: "developer" messages pass through in conversation
  history without triggering conflict warnings.
- OpenAI Responses adapter: "developer" messages are kept as "developer"
  role (same as "system", which is also converted to "developer" for the
  Responses API).

Other behavior changes:
- Gemini: "initial" system message detection now checks messages[0] only
  (previously searched anywhere in the list)
- Bedrock: a lone system message is now converted to "user" instead of
  being extracted to an empty message list (matches existing Anthropic
  behavior)
2026-03-24 16:02:42 -04:00
Mark Backman
b49bf1c83f Merge pull request #4127 from pipecat-ai/mb/tts-text-frame-ordering
Fix LLMFullResponseEndFrame racing ahead of final TTSTextFrame
2026-03-24 15:39:06 -04:00
Mark Backman
1b0f7ecb0e Merge pull request #4126 from pipecat-ai/mb/fix-tts-flush-phantom-contexts
Fix TTS flush creating phantom contexts on ElevenLabs
2026-03-24 15:33:58 -04:00
Mark Backman
5d71de8aad Fix LLMFullResponseEndFrame racing ahead of final TTSTextFrame
Route LLMFullResponseEndFrame through the serialization queue instead
of pushing it directly downstream when push_text_frames is enabled.
This ensures the frame is emitted only after the audio context is
fully drained, preserving correct ordering relative to TTSTextFrames.

Previously, the final sentence TTSTextFrame would arrive at the
LLMAssistantAggregator after LLMFullResponseEndFrame, causing it to
be dropped from the conversation context (especially with RTVI text
input where no subsequent interruption would flush the orphaned text).
2026-03-24 15:09:42 -04:00
Paul Kompfner
dc56cb2ccc Gemini Live: reset _bot_is_responding when releasing deferred EndFrame
Without this, the released EndFrame re-enters process_frame, sees
_bot_is_responding is still True, defers again, and loops indefinitely.
2026-03-24 15:01:07 -04:00
Paul Kompfner
063955b7eb Gemini Live: clean up EndFrame deferral state on disconnect
Cancel the deferral timeout task and clear the pending EndFrame during
disconnect, which could otherwise be left dangling after a
CancelFrame-triggered shutdown.
2026-03-24 14:30:14 -04:00
Mark Backman
35f52f70ab Fix TTS flush creating phantom contexts on providers like ElevenLabs
When an interruption arrives before any LLM text reaches run_tts, the
turn context ID exists but was never registered via create_audio_context.
Calling flush_audio for this unregistered context sends a message to the
provider (e.g. ElevenLabs) with a context_id it has never seen, which
implicitly creates a server-side context that is never closed. After
enough rapid interruptions these phantom contexts accumulate and exceed
the providers limit (ElevenLabs: 5 simultaneous contexts, 1008 policy
violation).

Guard the flush call with audio_context_available so it only fires when
the context was actually opened.

Fixes #4114
2026-03-24 13:42:01 -04:00
Paul Kompfner
4abd4d031d Gemini Live: add safety timeout to EndFrame deferral to prevent indefinite pipeline hang
When an EndFrame arrives while the bot is mid-response, it is deferred
until turn_complete is received. If turn_complete never arrives, the
EndFrame gets stuck forever and the pipeline hangs indefinitely.

Add a 30-second timeout: if turn_complete hasn't arrived by then, the
deferred EndFrame is released anyway with a warning log. The timeout
is cancelled if turn_complete arrives normally.
2026-03-24 12:50:07 -04:00
Paul Kompfner
7e42998e9e Gemini Live: fix potential EndFrame-deferral hang by handling turn_complete without usage_metadata
We observed a case where a deferred EndFrame was never released in
Gemini Live, causing the pipeline to hang indefinitely. The EndFrame
deferral mechanism waits for _handle_msg_turn_complete to set
_bot_is_responding back to False, but turn_complete messages were only
processed if they also contained usage_metadata. If Gemini ever sent
turn_complete without usage_metadata, the message would be silently
dropped and the deferred EndFrame would never be released.

Now turn_complete is always handled regardless of usage_metadata
presence, with usage_metadata processing only when available.

Note: we have not actually observed a turn_complete without
usage_metadata in practice, so this is a theoretical fix for the
EndFrame-deferral hang. The actual root cause of the observed hang
may lie elsewhere.
2026-03-24 12:32:14 -04:00
Filipi da Silva Fuchter
28eb4544d3 Merge pull request #4122 from pipecat-ai/filipi/inworld_follow_up
Invoking on_turn_context_created when we receive a TTSSpeakFrame.
2026-03-24 12:28:00 -04:00
Filipi da Silva Fuchter
b45dcb1ae0 Merge pull request #4028 from inworld-ai/ian/close-on-turn-complete
fix(inworld): close context at end of turn instead of relying on idle timeout
2026-03-24 12:07:51 -04:00
Mark Backman
6eb988b729 Merge pull request #4092 from harshitajain165/harshita/smallest-tts-only
Add Smallest AI TTS service integration
2026-03-24 11:54:34 -04:00
Mark Backman
f68b3222b3 Fix SmallestTTSService to use InterruptibleTTSService audio context system
- Route audio through audio contexts (append_to_audio_context) instead of
  pushing frames directly, enabling proper turn management and interruptions
- Add push_stop_frames and push_start_frame so the base class handles
  TTSStartedFrame/TTSStoppedFrame lifecycle
- Remove manual context_id tracking (self._context_id) in favor of
  get_active_audio_context_id()
- Don't call remove_audio_context on "complete" — Smallest sends one
  per request, not per turn; let the base class timeout handle cleanup
- Guard v2-only params (consistency, similarity, enhancement) so they
  aren't sent to lightning-v3.1
- Remove request_id from request payload (not a documented request field)
- Add flush_audio override to send flush to WebSocket
2026-03-24 11:46:28 -04:00
filipi87
05b9c514fb Invoking TTSSpeakFrame when we receive a TTSSpeakFrame. 2026-03-24 12:39:28 -03:00
Filipi da Silva Fuchter
03c0d7c345 Merge pull request #4013 from inworld-ai/ian/prewarm-context-inworld-v2
[inworld] Pre-open WebSocket TTS context on LLM response start
2026-03-24 11:37:28 -04:00
Filipi da Silva Fuchter
0783edb185 Merge pull request #4120 from pipecat-ai/filipi/krisp-viva-vad-support
Added cleanup() method to VADAnalyzer base class
2026-03-24 11:26:53 -04:00
Mark Backman
51d28b4a9f Code review fixes 2026-03-24 11:21:04 -04:00
kompfner
cf083b8411 Merge pull request #4078 from pipecat-ai/cb/gemini-updates
Updates for Gemini Live
2026-03-24 11:18:00 -04:00
Harshita Jain
099814d74a Add Smallest AI TTS service integration
Adds SmallestTTSService, a WebSocket-based TTS service using Smallest AI's
Lightning v3.1 model. Follows current Pipecat service conventions:

- SmallestTTSSettings dataclass with runtime-updatable settings (voice,
  language, speed, etc.)
- Reconnects on model change; keepalive every 30s to prevent idle timeout
- TTS settings default to None so the API applies its own defaults
- Model enum: SmallestTTSModel.LIGHTNING_V3_1

Includes a foundational example (07zl-interruptible-smallest.py) using
Deepgram STT + Smallest TTS + OpenAI LLM.

STT integration will follow in a separate PR once the hallucination/finalize
behaviour is resolved.

Made-with: Cursor
2026-03-24 11:11:10 -04:00
Mark Backman
dd45843c42 Merge pull request #4117 from m-ods/feat/assemblyai-domain-param
feat(assemblyai): add domain parameter for Medical Mode
2026-03-24 11:02:01 -04:00
Paul Kompfner
8109ab6135 Further tweaks and improvements to Gemini 3 support in Gemini Live
Gets Gemini 3 support to the point where it works with:
- The "legacy" pattern from the previous (removed) 26- example
- inference_on_context_initialization=True (the default)
- inference_on_context_initialization=False
2026-03-24 10:45:41 -04:00
filipi87
9df8985d60 Refactoring the way we automatically push TTSStoppedFrame. 2026-03-24 11:00:06 -03:00
filipi87
02cfb129d3 Invoke cleanup method on VAD analyzer. 2026-03-24 10:49:14 -03:00
Filipi da Silva Fuchter
5ed183d215 Merge pull request #4022 from krispai/krisp-viva-vad-support
Draft Implementation for Krisp VIVA VAD.
2026-03-24 09:44:32 -04:00
Mark Backman
5c3d3aea2b Merge pull request #4115 from pipecat-ai/mb/user-turn-stop-warnings
Warn when VAD stop_secs misconfiguration may degrade turn detection
2026-03-24 09:32:20 -04:00
Mark Backman
aa0b49d69f Code review fixes 2026-03-24 09:22:08 -04:00
Alex-wuhu
8c6f4a8d7b Add Novita AI LLM service provider 2026-03-24 09:20:50 -04:00
Mark Backman
1c8a8f51d4 Code review fixes 2026-03-24 08:46:03 -04:00
dhruvladia-sarvam
349b8645f3 Merge branch 'main' into feat/sarvam-llm-integration 2026-03-24 16:34:12 +05:30
dhruvladia-sarvam
696196e30c alignment with pr 4081 2026-03-24 16:29:58 +05:30
Martin Schweiger
f21b262969 feat(assemblyai): add domain parameter for Medical Mode
Add `domain` field to AssemblyAISTTSettings to support AssemblyAI's
streaming API `domain` query parameter, enabling specialized recognition
modes like Medical Mode (`medical-v1`).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 13:09:42 +08:00