Commit Graph

16 Commits

Author SHA1 Message Date
Mark Backman
644030584f Centralize OpenAI audio constants 2026-05-12 17:48:53 -04:00
Mark Backman
abd28e2ac1 Update OpenAI realtime transcription default 2026-05-12 15:20:57 -04:00
Cale Shapera
ec574edd53 Add Inworld Realtime Service (#4140)
* Add Inworld Realtime LLM service

Adds a WebSocket-based realtime service for Inworld's cascade
STT/LLM/TTS API with semantic VAD, function calling, and streaming
transcription support.

New files:
- src/pipecat/services/inworld/realtime/ (service, events)
- src/pipecat/adapters/services/inworld_realtime_adapter.py
- examples/foundational/19zb-inworld-realtime.py

Also includes:
- websockets dependency for inworld extra in pyproject.toml
- Adapter and settings tests matching OpenAI/Grok realtime patterns
- Fix for double-response when server-side VAD is enabled

* Prefer init-provided system instruction in Inworld Realtime

Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the
pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and
Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup
branch.

* Update changelog entry with PR number

* Fix changelog format to use bullet point

* Polish PR: default model, example cleanup, changelog update

- Change default model from gpt-4.1-nano to gpt-4.1-mini
- Add function calling demo to example
- Remove demo-testing artifact from system instruction
- Mention Router support in changelog

* Address PR review feedback for Inworld Realtime

- Move example to examples/realtime/realtime-inworld.py
- Change initial context role from "user" to "developer"
- Remove explicit sample rates from example; sync them in
  _ensure_audio_config so Inworld gets the transport's actual rates
- Add audio race condition guard in _handle_evt_audio_delta (matches
  OpenAI realtime pattern)
- Convert remaining "system"/"developer" messages to "user" in adapter
- Add clarifying comment for local-VAD vs server-VAD metrics paths

* Simplify example, add provider tracking, remove local VAD path

- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning
- Add pipecat-realtime session key prefix and provider_data metadata
  for Inworld traffic attribution
- Remove local VAD code path (Inworld only supports server-side VAD)
- Use typed InputAudioBufferAppendEvent for audio sends

* Default TTS model to inworld-tts-1.5-max

* Remove dead shimmed tools code, set STT/VAD defaults

- Remove non-functional AdapterType.SHIM custom tools code from adapter
- Default STT model to assemblyai/u3-rt-pro
- Default VAD eagerness to low
2026-04-09 13:04:17 -04:00
Mark Backman
41e46ee69e Remove deprecated vad_events and should_interrupt from DeepgramSTTService
Deepgram's built-in VAD events were deprecated in 0.0.99 in favor of
Silero VAD. This removes vad_events from settings and LiveOptions,
the should_interrupt parameter, the vad_enabled property,
_on_speech_started/_on_utterance_end handlers, and simplifies
_on_message and process_frame accordingly.
2026-04-02 22:05:49 -04:00
Mark Backman
7501effad5 Remove deprecated service module shims and old implementations
Delete deprecated import shims that only re-export from new locations:
- services/ai_services.py
- services/gemini_multimodal_live/
- services/aws_nova_sonic/
- services/openai_realtime/
- services/deepgram/{stt,tts}_sagemaker.py
- services/google/{llm_openai,llm_vertex,google}.py
- services/google/gemini_live/llm_vertex.py
- services/riva/
- services/nim/

Remove deprecated implementations replaced by newer services:
- services/openai_realtime_beta/ (use openai.realtime)
- services/google/openai/ (use google.llm)

Also removes associated examples and tests for deleted services.
2026-03-31 15:34:14 -04:00
Mark Backman
1c99a537b2 Consolidate Grok services into xai module
Both GrokLLMService and XAIHttpTTSService use the same xAI API (api.x.ai),
so move Grok source files into the xai module. Leave deprecation shims in
the old grok/ paths for backward compatibility.
2026-03-25 12:07:40 -04:00
Mark Backman
786279f143 Remove unused imports, 2026-03-07 2026-03-09 12:44:47 -04:00
Paul Kompfner
f4c039048c Adopt the settings pattern for Grok Realtime session properties
Move `session_properties` into `GrokRealtimeLLMSettings`, making `settings` the canonical way to configure Grok Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=GrokRealtimeLLMSettings(session_properties=...)`.

`system_instruction` is synced bidirectionally between the top-level settings field and `session_properties.instructions`, with top-level taking precedence on conflict. (Unlike OpenAI Realtime, Grok's `SessionProperties` has no `model` field, so no model sync is needed.)
2026-03-06 12:53:26 -05:00
Paul Kompfner
bd4229ea9d Adopt the settings pattern for OpenAI Realtime session properties
Move `session_properties` into `OpenAIRealtimeLLMSettings`, making `settings` the canonical way to configure OpenAI Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=OpenAIRealtimeLLMSettings(session_properties=...)`.

`model` and `system_instruction` are synced bidirectionally between the top-level settings fields and `session_properties.model`/`.instructions`, with top-level taking precedence on conflict.
2026-03-06 11:46:21 -05:00
Mark Backman
034e81ff18 Update STT service settings 2026-03-06 08:29:14 -05:00
filipi87
8b09f7bbb4 Upgrading Deepgram to version 6. 2026-03-02 11:22:33 -03:00
Paul Kompfner
faed775d90 Extract _DeepgramSTTSettingsBase with shared _merge_live_options_delta to deduplicate LiveOptions merge logic between __init__ and apply_update, and between the Deepgram STT and SageMaker variants; make top-level model/language take precedence over conflicting live_options values in updates; remove unnecessary Language enum-to-string conversion (Language is a StrEnum) 2026-02-26 11:02:44 -05:00
Paul Kompfner
8b6aa4b912 Unflatten LiveOptions back into a single live_options field on DeepgramSTTSettings and DeepgramSageMakerSTTSettings; add apply_update override with delta-merge semantics and from_mapping override for backward-compatible dict-style updates 2026-02-25 18:25:11 -05:00
Paul Kompfner
94a651cee2 Remove dead ServiceSettings.to_dict method 2026-02-17 15:15:18 -05:00
Paul Kompfner
3b1ba57452 Change apply_update / _update_settings return type from set[str] to dict[str, Any]. The dict maps each changed field name to its pre-update value, enabling services to do granular diffing of complex settings objects. Existing call-site patterns ("field" in changed, if changed, iteration) work unchanged; set-difference sites use changed.keys() - {...}. 2026-02-17 11:49:15 -05:00
Paul Kompfner
8a4ab611be Broad service settings refactor, with the primary aim of making service settings discoverable and strongly-typed. Service settings can be updated at runtime with *UpdateSettingsFrames.
Does not (yet) touch `InputParams`, to avoid scope creep and touching something currently part of the public API. But there is a lot of overlap between `*Settings` object fields and `InputParams` fields.

Other than discoverability/typing, these are some other improvements brought by this refactor:
- There is now a single code path (see `_update_settings_from_typed`) where services can respond to settings changes (by, say, reconnecting if needed), improving maintainability and guaranteeing one and only one reconnection no matter which settings changed
- `set_language`/`set_model`/`set_voice`—which we're assuming are usable as public methods, though *not* recommended over `*UpdateSettingsFrame`—all use the same code path as settings updates. They're also now all consistent in that, if a service needs to respond to a change (by, say, reconnecting if needed), any of these methods will kick off that process. Note that this is technically a behavior change.
- Several services now properly react to changed settings by reconnecting:
  - `AWSTranscribeSTTService`
  - `AzureSTTService`
  - `SonioxSTTService`
  - `GladiaSTTService`
  - `SpeechmaticsSTTService`
  - `AssemblyAISTTService`
  - `CartesiaSTTService`
  - `FishAudioTTSService` (would previously only reconnect when `model` changed)
  - `GoogleSTTService`
  - `SpeechmaticsSTTService` (which previously only handled *some* settings updates through a nonstandard public `update_params` method)
  - `GradiumSTTService`
  - `NvidiaSegmentedSTTService` (which previously only handled changes to language)
- Bookkeeping across various services has been reduced, mostly by deduping ivars; the `self._settings` ivar is treated as the source of truth

NOTE: I pretty much guarantee that there are services missed in this PR in terms of bringing to consistency with how updates are handled (like whether changes in certain fields trigger reconnects when they need to). We can squash remaining inconsistencies as we stumble onto them, service by service. The goal here is to get things *mostly* in order, and establish the infrastructure and patterns we'll need going forward.
2026-02-13 15:12:26 -05:00