Pipecat Examples
This directory contains examples showing how to build voice and multimodal agents with Pipecat.
Setup
1. Follow the README steps to get your local environment configured.
   Run from root directory: Make sure you are running the steps from the root directory.
   Using local audio?: The LocalAudioTransport requires a system dependency for portaudio. Install the dependency to use the transport.
2. Copy the env.example file and add API keys for services you plan to use:
   cp env.example .env
   # Edit .env with your API keys
3. Run any example:
   uv run python getting-started/01-say-one-thing.py
4. Open the web interface at http://localhost:7860/client/ and click "Connect".
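Before running an example, it can help to confirm that the keys it needs are actually filled in. The following is a minimal sketch, not part of Pipecat: read_env_keys and missing_keys are hypothetical helpers, and the parser only handles plain KEY=value lines. DAILY_API_KEY is used as the sample key because it is one of the variables this README names; substitute the keys for the services you use.

```python
from pathlib import Path


def read_env_keys(path):
    """Return a dict of KEY -> value parsed from a simple .env file.

    Hypothetical helper: only plain KEY=value lines are understood.
    """
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path, required):
    """List the required keys that are absent or empty in the file."""
    values = read_env_keys(path)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        # DAILY_API_KEY is only an example; list the keys your services need.
        print("Missing keys:", missing_keys(env_path, ["DAILY_API_KEY"]))
    else:
        print("No .env file found; run: cp env.example .env")
```

Run it from the root directory, the same place the .env file lives. An empty list means every key you asked about has a value.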
Running examples with other transports
Most examples support running with other transports, like Twilio or Daily.
Daily
You need to create a Daily account at https://dashboard.daily.co/u/signup. Once signed up, you can create your own room from the dashboard and set the environment variables DAILY_ROOM_URL and DAILY_API_KEY. Alternatively, you can let the example create a room for you (this still requires the DAILY_API_KEY environment variable). Then, start any example with -t daily:
uv run getting-started/06-voice-agent.py -t daily
Twilio
It is also possible to run the example through a Twilio phone number. You will need to set up a few things:
1. Install and run ngrok:
   ngrok http 7860
2. Configure your Twilio phone number. One way is to set up a TwiML app and set the request URL to the ngrok URL from step 1. Then, set your phone number to use the new TwiML app.
Then, run the example with:
uv run getting-started/06-voice-agent.py -t twilio -x NGROK_HOST_NAME
Directory Structure
getting-started/
Progressive introduction to Pipecat, from minimal TTS to a full voice agent with function calling.
voice/
Full STT + LLM + TTS voice agent pipelines showcasing different speech service providers (Deepgram, ElevenLabs, Cartesia, etc.)
function-calling/
Function calling with different LLM providers (OpenAI, Anthropic, Google, etc.)
transcription/
Speech-to-text examples with various STT providers.
vision/
Image description and vision capabilities with different multimodal LLMs.
realtime/
Realtime and multimodal live APIs (OpenAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox, Grok).
persistent-context/
Maintaining conversation context across sessions with different providers.
context-summarization/
Summarizing conversation context to manage token limits.
update-settings/
Changing service settings at runtime, organized by service type.
turn-management/
Turn detection, interruption handling, and user input management.
thinking-and-mcp/
LLM thinking/reasoning modes and MCP (Model Context Protocol) tool server integration.
transports/
Transport layer examples (WebRTC, Daily, LiveKit).
video-avatar/
Video avatar integrations (Tavus, HeyGen, Simli, LemonSlice).
video-processing/
Video processing, mirroring, GStreamer, and custom video tracks.
audio/
Audio recording, background sounds, and sound effects.
observability/
Pipeline monitoring: observers, heartbeats, and Sentry metrics.
rag/
Retrieval-augmented generation, grounding, and long-term memory (Mem0, Gemini).
features/
Miscellaneous features: wake phrases, live translation, service switching, voice switching, and more.
Advanced Usage
Customizing Network Settings
uv run python <example-name> --host 0.0.0.0 --port 8080
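To illustrate how flags like these map to values, here is a minimal argparse sketch. It is not Pipecat's actual runner code: build_parser is a hypothetical function, the destination names transport and proxy are made up for illustration, and the "localhost" host default is an assumption. The 7860 port default follows the URL given in the Setup section.

```python
import argparse


def build_parser():
    """Hypothetical parser covering the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Illustrative example flags")
    # Assumed default; 0.0.0.0 binds to all network interfaces instead.
    parser.add_argument("--host", default="localhost", help="Address to bind to")
    # 7860 matches the web interface URL given in Setup.
    parser.add_argument("--port", type=int, default=7860, help="Port to listen on")
    # -t selects a transport such as daily or twilio (dest name is illustrative).
    parser.add_argument("-t", dest="transport", default=None, help="Transport name")
    # -x carries the public host name, e.g. the ngrok host (dest name is illustrative).
    parser.add_argument("-x", dest="proxy", default=None, help="Public proxy host name")
    return parser


# Parse an explicit argument list so the sketch behaves the same everywhere.
args = build_parser().parse_args(["--host", "0.0.0.0", "--port", "8080"])
print(f"Would serve on {args.host}:{args.port}")
```

Binding to 0.0.0.0 makes the server reachable from other machines on the network, which is useful for testing from a phone or another computer.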
Troubleshooting
- No audio/video: Check browser permissions for microphone and camera
- Connection errors: Verify API keys in .env file
- Port conflicts: Use --port to change the port
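For port conflicts, a quick way to find an open port is to try binding to it. This is a minimal sketch using only the Python standard library. port_is_free and first_free_port are hypothetical helpers, not part of Pipecat, and the check only covers the loopback address by default.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
    return True


def first_free_port(start=7860, attempts=20, host="127.0.0.1"):
    """Return the first bindable port at or after start, or None."""
    # Stay within the valid TCP port range.
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port, host):
            return port
    return None


if __name__ == "__main__":
    print("Suggested port:", first_free_port())
```

Pass the suggested value to --port when starting an example. Another process could still take the port between the check and the launch, so treat the result as a hint.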
For more examples, visit the pipecat-examples repository.