Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds
transparent session continuation that rotates sessions in the background
before the limit is reached, preserving conversation context with no
user-perceptible interruption.
Implementation follows the AWS reference architecture:
- Monitor loop detects when session age exceeds threshold
- On assistant AUDIO contentStart: start buffering user audio, create next
session (sessionStart + promptStart + system instruction)
- Track SPECULATIVE/FINAL text counts as completion signal
- On completion signal: send conversation history + audioInputStart +
buffered audio to next session, then promote immediately
- Close old session in background (non-blocking)
- Dead session detection: recreate next session if idle >30s
Key design decisions:
- Session continuation enabled by default (fundamental for long conversations)
- Conversation history tracked in real-time via _sc_conversation_history
(independent of pipeline context aggregator which updates asynchronously)
- Completion signal check in _handle_content_end_event (after history update)
to ensure latest text is included in handoff
- Rolling audio buffer (default 3s) captures user audio during transition
- transition_threshold_seconds capped at 420s (7min) for safety margin
- Unified event methods (_send_text_event, _send_client_event, etc.) accept
optional stream/prompt_name params, eliminating duplicate SC methods
Also adds:
- SessionContinuationParams config (enabled, threshold, buffer, timeout)
Remove `examples/` from the `pyrightconfig.json` ignore list and fix
the resulting type errors across all example files. Common fixes:
- Required API keys: `os.getenv("X")` -> `os.environ["X"]` so the
return type is `str` rather than `str | None`, and misconfiguration
fails fast.
- Narrow `LLMContextMessage` union members with `isinstance(..., dict)`
before dict-style access.
- `assert isinstance(params.llm, ...)` before calling service-specific
methods that aren't on the base `LLMService`.
- Guard optional frame fields (e.g. `LLMSearchResponseFrame.search_result`)
before use.
* Add Inworld Realtime LLM service
Adds a WebSocket-based realtime service for Inworld's cascade
STT/LLM/TTS API with semantic VAD, function calling, and streaming
transcription support.
New files:
- src/pipecat/services/inworld/realtime/ (service, events)
- src/pipecat/adapters/services/inworld_realtime_adapter.py
- examples/foundational/19zb-inworld-realtime.py
Also includes:
- websockets dependency for inworld extra in pyproject.toml
- Adapter and settings tests matching OpenAI/Grok realtime patterns
- Fix for double-response when server-side VAD is enabled
* Prefer init-provided system instruction in Inworld Realtime
Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the
pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and
Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup
branch.
* Update changelog entry with PR number
* Fix changelog format to use bullet point
* Polish PR: default model, example cleanup, changelog update
- Change default model from gpt-4.1-nano to gpt-4.1-mini
- Add function calling demo to example
- Remove demo-testing artifact from system instruction
- Mention Router support in changelog
* Address PR review feedback for Inworld Realtime
- Move example to examples/realtime/realtime-inworld.py
- Change initial context role from "user" to "developer"
- Remove explicit sample rates from example; sync them in
_ensure_audio_config so Inworld gets the transport's actual rates
- Add audio race condition guard in _handle_evt_audio_delta (matches
OpenAI realtime pattern)
- Convert remaining "system"/"developer" messages to "user" in adapter
- Add clarifying comment for local-VAD vs server-VAD metrics paths
* Simplify example, add provider tracking, remove local VAD path
- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning
- Add pipecat-realtime session key prefix and provider_data metadata
for Inworld traffic attribution
- Remove local VAD code path (Inworld only supports server-side VAD)
- Use typed InputAudioBufferAppendEvent for audio sends
* Default TTS model to inworld-tts-1.5-max
* Remove dead shimmed tools code, set STT/VAD defaults
- Remove non-functional AdapterType.SHIM custom tools code from adapter
- Default STT model to assemblyai/u3-rt-pro
- Default VAD eagerness to low
Example files like openai.py shadow installed packages when Python adds the
script directory to sys.path. Prepend the parent folder name to each example
file (e.g. openai.py -> function-calling-openai.py). Also split
thinking-and-mcp/ into separate mcp/ and thinking/ directories.
Move 304 examples from a flat numbered directory into 14 descriptive
subfolders: getting-started, services (speech + function-calling),
transcription, vision, realtime, persistent-context,
context-summarization, update-settings (stt/tts/llm), turn-management,
thinking-and-mcp, transports, video-avatar, video-processing, and
features.
Strip numbered prefixes from filenames (e.g. 07c-interruptible-deepgram.py
becomes services/speech/deepgram.py) since the folder context makes them
redundant. Keep numbered prefixes only in getting-started/ where ordering
matters.
Update eval script paths and README to match the new structure.