Files

Cale Shapera ec574edd53 Add Inworld Realtime Service (#4140)

* Add Inworld Realtime LLM service

Adds a WebSocket-based realtime service for Inworld's cascade
STT/LLM/TTS API with semantic VAD, function calling, and streaming
transcription support.

New files:
- src/pipecat/services/inworld/realtime/ (service, events)
- src/pipecat/adapters/services/inworld_realtime_adapter.py
- examples/foundational/19zb-inworld-realtime.py

Also includes:
- websockets dependency for inworld extra in pyproject.toml
- Adapter and settings tests matching OpenAI/Grok realtime patterns
- Fix for double-response when server-side VAD is enabled

* Prefer init-provided system instruction in Inworld Realtime

Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the
pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and
Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup
branch.

* Update changelog entry with PR number

* Fix changelog format to use bullet point

* Polish PR: default model, example cleanup, changelog update

- Change default model from gpt-4.1-nano to gpt-4.1-mini
- Add function calling demo to example
- Remove demo-testing artifact from system instruction
- Mention Router support in changelog

* Address PR review feedback for Inworld Realtime

- Move example to examples/realtime/realtime-inworld.py
- Change initial context role from "user" to "developer"
- Remove explicit sample rates from example; sync them in
  _ensure_audio_config so Inworld gets the transport's actual rates
- Add audio race condition guard in _handle_evt_audio_delta (matches
  OpenAI realtime pattern)
- Convert remaining "system"/"developer" messages to "user" in adapter
- Add clarifying comment for local-VAD vs server-VAD metrics paths

* Simplify example, add provider tracking, remove local VAD path

- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning
- Add pipecat-realtime session key prefix and provider_data metadata
  for Inworld traffic attribution
- Remove local VAD code path (Inworld only supports server-side VAD)
- Use typed InputAudioBufferAppendEvent for audio sends

* Default TTS model to inworld-tts-1.5-max

* Remove dead shimmed tools code, set STT/VAD defaults

- Remove non-functional AdapterType.SHIM custom tools code from adapter
- Default STT model to assemblyai/u3-rt-pro
- Default VAD eagerness to low

2026-04-09 13:04:17 -04:00

assets

Move foundational examples to examples/

2026-03-31 13:12:24 -04:00

audio

Fixing the background sound example.

2026-04-06 18:25:30 -03:00

context-summarization

Remove DeprecatedModuleProxy and service re-export shims

2026-04-03 13:43:02 -04:00

features

Remove deprecated text_aggregator and text_filter params from TTS

2026-04-01 17:03:05 -04:00

function-calling

Creating a new example for async stream using Google.

2026-04-09 09:50:00 -03:00

getting-started

Reorganize examples into topic-based subfolders

2026-03-31 13:12:24 -04:00

mcp

Update MCP examples

2026-04-02 18:15:56 -05:00

observability

Rename example files to prepend parent folder name, preventing package shadowing

2026-03-31 22:06:01 -04:00

persistent-context

Rename example files to prepend parent folder name, preventing package shadowing

2026-03-31 22:06:01 -04:00

rag

Rename example files to prepend parent folder name, preventing package shadowing

2026-03-31 22:06:01 -04:00

realtime

Add Inworld Realtime Service (#4140)

2026-04-09 13:04:17 -04:00

thinking

Rename example files to prepend parent folder name, preventing package shadowing

2026-03-31 22:06:01 -04:00

transcription

Remove DeprecatedModuleProxy and service re-export shims

2026-04-03 13:43:02 -04:00

transports

Rename example files to prepend parent folder name, preventing package shadowing

2026-03-31 22:06:01 -04:00

turn-management

Use Parameters instead of Attributes in docstrings to fix duplicate object warnings

2026-04-03 10:36:36 -04:00

update-settings

Remove unused imports across codebase

2026-04-02 22:21:16 -04:00

video-avatar

Update simli name to match others

2026-03-31 22:54:21 -04:00

video-processing

Remove unused imports across codebase

2026-04-02 22:21:16 -04:00

vision

Rename example files to prepend parent folder name, preventing package shadowing

2026-03-31 22:06:01 -04:00

voice

Add mistral voice example

2026-04-07 12:32:06 -04:00

README.md

Rename services/ to voice/ and function-calling/, flatten to top level

2026-03-31 15:20:03 -04:00

README.md

Pipecat Examples

This directory contains examples showing how to build voice and multimodal agents with Pipecat.

Setup

Follow the README steps to get your local environment configured.

Run from root directory: Make sure you run these steps from the root directory.

Using local audio?: The LocalAudioTransport requires the portaudio system dependency. Install it before using the transport.

Copy the env.example file and add API keys for the services you plan to use:
```
cp env.example .env
# Edit .env with your API keys
```

Run any example:

uv run python getting-started/01-say-one-thing.py

Open the web interface at http://localhost:7860/client/ and click "Connect".
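Before running an example, it can help to confirm that .env actually contains the keys you need. The following is a minimal sketch using only the standard library; the helper names `parse_env` and `missing_keys` are hypothetical and not part of Pipecat, and the key list is a placeholder for whichever services you use.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    # Placeholder list: replace with the keys your chosen services need.
    print(missing_keys(text, ["DAILY_API_KEY"]))
```

An empty list means every listed key has a value; any names printed still need to be filled in.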

Running examples with other transports

Most examples support running with other transports, like Twilio or Daily.

Daily

Create a Daily account at https://dashboard.daily.co/u/signup. Once signed up, you can create your own room from the dashboard and set the environment variables DAILY_ROOM_URL and DAILY_API_KEY. Alternatively, you can let the example create a room for you (this still requires the DAILY_API_KEY environment variable). Then, start any example with -t daily:

uv run getting-started/06-voice-agent.py -t daily
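The two Daily setups described above differ only in whether DAILY_ROOM_URL is set. A minimal sketch of that decision follows; the function name `daily_room_plan` and its return strings are hypothetical illustrations, not Pipecat's own logic.

```python
import os
from collections.abc import Mapping


def daily_room_plan(env: Mapping[str, str]) -> str:
    """Decide which Daily setup the current environment corresponds to."""
    # The API key is needed in both cases, per the text above.
    if not env.get("DAILY_API_KEY"):
        raise RuntimeError("DAILY_API_KEY is required for the Daily transport")
    # A room URL means a room was already created from the dashboard.
    if env.get("DAILY_ROOM_URL"):
        return "use-existing-room"
    # Otherwise the example is expected to create a room itself.
    return "create-room"


if __name__ == "__main__":
    try:
        print(daily_room_plan(os.environ))
    except RuntimeError as error:
        print(f"Not ready for -t daily: {error}")
```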

Twilio

You can also run the example through a Twilio phone number. You will need to set up a few things:

1. Install and run ngrok.

ngrok http 7860

2. Configure your Twilio phone number. One way is to set up a TwiML app and set its request URL to the ngrok URL from step (1). Then, set your phone number to use the new TwiML app.

Then, run the example with:

uv run getting-started/06-voice-agent.py -t twilio -x NGROK_HOST_NAME
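ngrok prints a full forwarding URL, while the NGROK_HOST_NAME placeholder suggests a bare host name. Assuming that is what -x expects (an assumption, not confirmed by this README), the host can be extracted with the standard library. The helper name `host_name` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def host_name(url: str) -> str:
    """Return the bare host name from a URL, with or without a scheme."""
    # urlsplit only recognizes the host if the string has a scheme or starts with //.
    parts = urlsplit(url if "://" in url else f"//{url}")
    if not parts.hostname:
        raise ValueError(f"No host name found in {url!r}")
    return parts.hostname


if __name__ == "__main__":
    # Hypothetical forwarding URL; substitute the one ngrok prints for you.
    print(host_name("https://abc123.example.test/"))
```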

Directory Structure

`getting-started/`

Progressive introduction to Pipecat, from minimal TTS to a full voice agent with function calling.

`voice/`

Full STT + LLM + TTS voice agent pipelines showcasing different speech service providers (Deepgram, ElevenLabs, Cartesia, etc.)

`function-calling/`

Function calling with different LLM providers (OpenAI, Anthropic, Google, etc.)

`transcription/`

Speech-to-text examples with various STT providers.

`vision/`

Image description and vision capabilities with different multimodal LLMs.

`realtime/`

Realtime and multimodal live APIs (OpenAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox, Grok).

`persistent-context/`

Maintaining conversation context across sessions with different providers.

`context-summarization/`

Summarizing conversation context to manage token limits.

`update-settings/`

Changing service settings at runtime, organized by service type:

- stt/: Speech-to-text settings
- tts/: Text-to-speech settings
- llm/: LLM settings

Advanced Usage

Customizing Network Settings

uv run python <example-name> --host 0.0.0.0 --port 8080

Troubleshooting

- No audio/video: Check browser permissions for microphone and camera.
- Connection errors: Verify the API keys in your .env file.
- Port conflicts: Use --port to change the port.
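If you suspect a port conflict, you can check whether a port is already taken before choosing a value for --port. This is a minimal standard-library sketch; the helper name `port_is_free` is hypothetical, and it only tests the given local address.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another socket already holds the port.
            return False
    return True


if __name__ == "__main__":
    for candidate in (7860, 8080):
        state = "free" if port_is_free(candidate) else "in use"
        print(f"Port {candidate} is {state}")
```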

For more examples, visit the pipecat-examples repository.
