pipecat

Author	SHA1	Message	Date
Cale Shapera	ec574edd53	Add Inworld Realtime Service (#4140) * Add Inworld Realtime LLM service Adds a WebSocket-based realtime service for Inworld's cascade STT/LLM/TTS API with semantic VAD, function calling, and streaming transcription support. New files: - src/pipecat/services/inworld/realtime/ (service, events) - src/pipecat/adapters/services/inworld_realtime_adapter.py - examples/foundational/19zb-inworld-realtime.py Also includes: - websockets dependency for inworld extra in pyproject.toml - Adapter and settings tests matching OpenAI/Grok realtime patterns - Fix for double-response when server-side VAD is enabled * Prefer init-provided system instruction in Inworld Realtime Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup branch. * Update changelog entry with PR number * Fix changelog format to use bullet point * Polish PR: default model, example cleanup, changelog update - Change default model from gpt-4.1-nano to gpt-4.1-mini - Add function calling demo to example - Remove demo-testing artifact from system instruction - Mention Router support in changelog * Address PR review feedback for Inworld Realtime - Move example to examples/realtime/realtime-inworld.py - Change initial context role from "user" to "developer" - Remove explicit sample rates from example; sync them in _ensure_audio_config so Inworld gets the transport's actual rates - Add audio race condition guard in _handle_evt_audio_delta (matches OpenAI realtime pattern) - Convert remaining "system"/"developer" messages to "user" in adapter - Add clarifying comment for local-VAD vs server-VAD metrics paths * Simplify example, add provider tracking, remove local VAD path - Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning - Add pipecat-realtime session key prefix and provider_data metadata for Inworld traffic attribution - Remove local VAD code path (Inworld only supports server-side VAD) - Use typed InputAudioBufferAppendEvent for audio sends * Default TTS model to inworld-tts-1.5-max * Remove dead shimmed tools code, set STT/VAD defaults - Remove non-functional AdapterType.SHIM custom tools code from adapter - Default STT model to assemblyai/u3-rt-pro - Default VAD eagerness to low	2026-04-09 13:04:17 -04:00
Mark Backman	41e46ee69e	Remove deprecated vad_events and should_interrupt from DeepgramSTTService Deepgram's built-in VAD events were deprecated in 0.0.99 in favor of Silero VAD. This removes vad_events from settings and LiveOptions, the should_interrupt parameter, the vad_enabled property, _on_speech_started/_on_utterance_end handlers, and simplifies _on_message and process_frame accordingly.	2026-04-02 22:05:49 -04:00
Mark Backman	7501effad5	Remove deprecated service module shims and old implementations Delete deprecated import shims that only re-export from new locations: - services/ai_services.py - services/gemini_multimodal_live/ - services/aws_nova_sonic/ - services/openai_realtime/ - services/deepgram/{stt,tts}_sagemaker.py - services/google/{llm_openai,llm_vertex,google}.py - services/google/gemini_live/llm_vertex.py - services/riva/ - services/nim/ Remove deprecated implementations replaced by newer services: - services/openai_realtime_beta/ (use openai.realtime) - services/google/openai/ (use google.llm) Also removes associated examples and tests for deleted services.	2026-03-31 15:34:14 -04:00
Mark Backman	1c99a537b2	Consolidate Grok services into xai module Both GrokLLMService and XAIHttpTTSService use the same xAI API (api.x.ai), so move Grok source files into the xai module. Leave deprecation shims in the old grok/ paths for backward compatibility.	2026-03-25 12:07:40 -04:00
Mark Backman	786279f143	Remove unused imports, 2026-03-07	2026-03-09 12:44:47 -04:00
Paul Kompfner	f4c039048c	Adopt the `settings` pattern for Grok Realtime session properties Move `session_properties` into `GrokRealtimeLLMSettings`, making `settings` the canonical way to configure Grok Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=GrokRealtimeLLMSettings(session_properties=...)`. `system_instruction` is synced bidirectionally between the top-level settings field and `session_properties.instructions`, with top-level taking precedence on conflict. (Unlike OpenAI Realtime, Grok's `SessionProperties` has no `model` field, so no model sync is needed.)	2026-03-06 12:53:26 -05:00
Paul Kompfner	bd4229ea9d	Adopt the `settings` pattern for OpenAI Realtime session properties Move `session_properties` into `OpenAIRealtimeLLMSettings`, making `settings` the canonical way to configure OpenAI Realtime — matching the pattern used across the rest of the codebase. The `session_properties` init arg is now deprecated in favor of `settings=OpenAIRealtimeLLMSettings(session_properties=...)`. `model` and `system_instruction` are synced bidirectionally between the top-level settings fields and `session_properties.model`/`.instructions`, with top-level taking precedence on conflict.	2026-03-06 11:46:21 -05:00
Mark Backman	034e81ff18	Update STT service settings	2026-03-06 08:29:14 -05:00
filipi87	8b09f7bbb4	Upgrading Deepgram to version 6.	2026-03-02 11:22:33 -03:00
Paul Kompfner	faed775d90	Extract `_DeepgramSTTSettingsBase` with shared `_merge_live_options_delta` to deduplicate LiveOptions merge logic between `__init__` and `apply_update`, and between the Deepgram STT and SageMaker variants; make top-level model/language take precedence over conflicting live_options values in updates; remove unnecessary Language enum-to-string conversion (Language is a StrEnum)	2026-02-26 11:02:44 -05:00
Paul Kompfner	8b6aa4b912	Unflatten `LiveOptions` back into a single `live_options` field on `DeepgramSTTSettings` and `DeepgramSageMakerSTTSettings`; add `apply_update` override with delta-merge semantics and `from_mapping` override for backward-compatible dict-style updates	2026-02-25 18:25:11 -05:00
Paul Kompfner	94a651cee2	Remove dead `ServiceSettings.to_dict` method	2026-02-17 15:15:18 -05:00
Paul Kompfner	3b1ba57452	Change `apply_update` / `_update_settings` return type from `set[str]` to `dict[str, Any]`. The dict maps each changed field name to its pre-update value, enabling services to do granular diffing of complex settings objects. Existing call-site patterns (`"field" in changed`, `if changed`, iteration) work unchanged; set-difference sites use `changed.keys() - {...}`.	2026-02-17 11:49:15 -05:00
Paul Kompfner	8a4ab611be	Broad service settings refactor, with the primary aim of making service settings discoverable and strongly-typed. Service settings can be updated at runtime with `UpdateSettingsFrame`s. Does not (yet) touch `InputParams`, to avoid scope creep and touching something currently part of the public API. But there is a lot of overlap between `Settings` object fields and `InputParams` fields. Other than discoverability/typing, these are some other improvements brought by this refactor: - There is now a single code path (see `_update_settings_from_typed`) where services can respond to settings changes (by, say, reconnecting if needed), improving maintainability and guaranteeing one and only one reconnection no matter which settings changed - `set_language`/`set_model`/`set_voice` (which we're assuming are usable as public methods, though not recommended over `UpdateSettingsFrame`) all use the same code path as settings updates. They're also now all consistent in that, if a service needs to respond to a change (by, say, reconnecting if needed), any of these methods will kick off that process. Note that this is technically a behavior change. - Several services now properly react to changed settings by reconnecting: - `AWSTranscribeSTTService` - `AzureSTTService` - `SonioxSTTService` - `GladiaSTTService` - `SpeechmaticsSTTService` - `AssemblyAISTTService` - `CartesiaSTTService` - `FishAudioTTSService` (would previously only reconnect when `model` changed) - `GoogleSTTService` - `SpeechmaticsSTTService` (which previously only handled some* settings updates through a nonstandard public `update_params` method) - `GradiumSTTService` - `NvidiaSegmentedSTTService` (which previously only handled changes to language) - Bookkeeping across various services has been reduced, mostly by deduping ivars; the `self._settings` ivar is treated as the source of truth NOTE: I pretty much guarantee that there are services missed in this PR in terms of bringing to consistency with how updates are handled (like whether changes in certain fields trigger reconnects when they need to). We can squash remaining inconsistencies as we stumble onto them, service by service. The goal here is to get things mostly in order, and establish the infrastructure and patterns we'll need going forward.	2026-02-13 15:12:26 -05:00

14 Commits
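
Commit 3b1ba57452 changes the return type of `apply_update` from `set[str]` to `dict[str, Any]`, mapping each changed field to its pre-update value. The following is a minimal sketch of that idea, not pipecat's actual code: `DemoSettings` is a hypothetical stand-in for a settings class, and treating `None` as "field not provided" is an assumption made only for this illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class DemoSettings:
    """Hypothetical settings object; NOT pipecat's real ServiceSettings."""

    model: Optional[str] = None
    language: Optional[str] = None
    voice: Optional[str] = None

    def apply_update(self, delta: "DemoSettings") -> dict[str, Any]:
        """Apply the non-None fields of `delta`.

        Returns a dict mapping each field that actually changed to the
        value it held before the update.
        """
        changed: dict[str, Any] = {}
        for f in fields(self):
            new_value = getattr(delta, f.name)
            if new_value is None:  # assumption: None means "not provided"
                continue
            old_value = getattr(self, f.name)
            if new_value != old_value:
                changed[f.name] = old_value
                setattr(self, f.name, new_value)
        return changed


settings = DemoSettings(model="model-a", language="en", voice="voice-1")
changed = settings.apply_update(DemoSettings(model="model-b", language="en"))

# Call-site patterns written for the old set[str] return still work on a dict:
needs_reconnect = "model" in changed           # membership test
anything_changed = bool(changed)               # truthiness
changed_names = list(changed)                  # iteration yields field names
other_changes = changed.keys() - {"model"}     # set difference needs .keys()
previous_model = changed["model"]              # new: the pre-update value
```

In this sketch only `model` differs from the current value, so `language` is not reported as changed even though the delta supplied it. Keeping the previous value lets a caller compare old and new state of a complex field rather than only learning that it changed.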