Paul Kompfner b14a03d01f fix: extend cancel_on_interruption=False regression fix to remaining realtime services
Applies the same async-tool message routing introduced for AWSNovaSonicLLMService
and OpenAIRealtimeLLMService to additional realtime LLM services where the
flag's intent ("keep talking while the tool runs") is achievable:

- GrokRealtimeLLMService (xAI Realtime — also benefits the deprecated Grok
  alias since it re-exports the xAI module)
- AzureRealtimeLLMService picks up the fix transitively by inheriting from
  OpenAIRealtimeLLMService — no code change needed.

GrokRealtimeLLMService's _process_completed_function_calls now matches
the canonical pattern: skip LLMSpecificMessage, detect async-tool messages
via parse_message and route them — started skipped silently, intermediate
logged as an error and surfaced via push_error, final delivered through
the same channel as a synchronous result.

UltravoxRealtimeLLMService instead gets a one-time warning when async-tool
messages appear in the context. The Ultravox API freezes the conversation
during tool execution
(https://docs.ultravox.ai/tools/async-tools#custom-tool-timeouts), so the
flag's "keep talking while the tool runs" intent isn't achievable there —
applying the same code pattern would mislead users into expecting a UX
Ultravox can't deliver. Surfacing a clear warning is the right behavior
until Ultravox grows true async tool support.

Adds async-tool example files for Grok and Azure modeled on the existing
Nova Sonic / OpenAI Realtime ones (10s simulated network delay, weather
tool registered with cancel_on_interruption=False).

Two services remain excluded:

- GeminiLiveLLMService — the async-tool path needs deeper investigation.
- InworldRealtimeLLMService — appears to have a pre-existing problem
  with even simple synchronous tool calling on its Realtime API (the
  request reaches the server fine, but response generation fails with a
  generic server_error).
2026-05-08 15:43:53 -04:00

Pipecat Examples

This directory contains examples showing how to build voice and multimodal agents with Pipecat.

Setup

  1. Follow the README steps to get your local environment configured.

    Run from root directory: Make sure you are running the steps from the root directory.

    Using local audio?: The LocalAudioTransport requires a system dependency for portaudio. Install the dependency to use the transport.

  2. Copy the env.example file and add API keys for services you plan to use:

    cp env.example .env
    # Edit .env with your API keys
    
  3. Run any example:

    uv run python getting-started/01-say-one-thing.py
    
  4. Open the web interface at http://localhost:7860/client/ and click "Connect"
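
Before running an example (step 3), it can help to confirm that the keys it needs are actually present in `.env` (step 2). The following is a minimal sketch, not part of Pipecat: the helper names `parse_env_text` and `missing_keys` are hypothetical, and the key names in the usage section are placeholders for whatever services your chosen example uses.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(required, env_values):
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in required if not env_values.get(key)]


if __name__ == "__main__":
    # Placeholder key names (assumption): replace with the keys your example needs.
    required = ["OPENAI_API_KEY", "DEEPGRAM_API_KEY"]
    env_path = Path(".env")
    values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(required, values)
    if missing:
        print("Missing or empty in .env:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

This parser only handles plain `KEY=VALUE` lines; it does not expand variables or handle multi-line values.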

Running examples with other transports

Most examples support running with other transports, like Twilio or Daily.

Daily

Create a Daily account at https://dashboard.daily.co/u/signup. Once signed up, create a room from the dashboard and set the DAILY_ROOM_URL and DAILY_API_KEY environment variables. Alternatively, let the example create a room for you (this still requires DAILY_API_KEY). Then start any example with -t daily:

uv run getting-started/06-voice-agent.py -t daily
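
The two Daily setups described above (an existing room versus a room created by the example) differ only in which environment variables are set. The sketch below shows one way to check which case applies before launching. The function name `daily_mode` and its return strings are hypothetical and are not part of Pipecat; only the variable names DAILY_ROOM_URL and DAILY_API_KEY come from this README.

```python
import os


def daily_mode(env):
    """Classify the Daily configuration found in a mapping of env variables."""
    api_key = env.get("DAILY_API_KEY")
    room_url = env.get("DAILY_ROOM_URL")
    if not api_key:
        # DAILY_API_KEY is needed in both setups described above.
        return "missing-api-key"
    if room_url:
        return "use-existing-room"
    return "auto-create-room"


if __name__ == "__main__":
    print(daily_mode(os.environ))
```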

Twilio

It is also possible to run the example through a Twilio phone number. You will need to set up a few things:

  1. Install and run ngrok:

    ngrok http 7860

  2. Configure your Twilio phone number. One way is to set up a TwiML app and set its request URL to the ngrok URL from step 1. Then, set your phone number to use the new TwiML app.

Then, run the example with:

uv run getting-started/06-voice-agent.py -t twilio -x NGROK_HOST_NAME
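
ngrok prints a full forwarding URL, while the command above takes a host name. Assuming NGROK_HOST_NAME is meant to be the bare host without a scheme or path (an assumption; this README does not spell it out), the following sketch extracts it using only the standard library. The function name `host_name_from_url` and the sample URL are hypothetical.

```python
from urllib.parse import urlparse


def host_name_from_url(url):
    """Return the bare host name from a URL, with or without a scheme."""
    # urlparse only fills in the host when a scheme or leading // is present.
    candidate = url if "://" in url else f"//{url}"
    return urlparse(candidate).hostname or ""


if __name__ == "__main__":
    # Hypothetical forwarding URL, for illustration only.
    print(host_name_from_url("https://abc123.ngrok-free.app/"))
```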

Directory Structure

getting-started/

Progressive introduction to Pipecat, from minimal TTS to a full voice agent with function calling.

voice/

Full STT + LLM + TTS voice agent pipelines showcasing different speech service providers (Deepgram, ElevenLabs, Cartesia, etc.)

function-calling/

Function calling with different LLM providers (OpenAI, Anthropic, Google, etc.)

transcription/

Speech-to-text examples with various STT providers.

vision/

Image description and vision capabilities with different multimodal LLMs.

realtime/

Realtime and multimodal live APIs (OpenAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox, Grok).

persistent-context/

Maintaining conversation context across sessions with different providers.

context-summarization/

Summarizing conversation context to manage token limits.

update-settings/

Changing service settings at runtime, organized by service type:

  • stt/ — Speech-to-text settings
  • tts/ — Text-to-speech settings
  • llm/ — LLM settings

turn-management/

Turn detection, interruption handling, and user input management.

thinking-and-mcp/

LLM thinking/reasoning modes and MCP (Model Context Protocol) tool server integration.

transports/

Transport layer examples (WebRTC, Daily, LiveKit).

video-avatar/

Video avatar integrations (Tavus, HeyGen, Simli, LemonSlice).

video-processing/

Video processing, mirroring, GStreamer, and custom video tracks.

audio/

Audio recording, background sounds, and sound effects.

observability/

Pipeline monitoring: observers, heartbeats, and Sentry metrics.

rag/

Retrieval-augmented generation, grounding, and long-term memory (Mem0, Gemini).

features/

Miscellaneous features: wake phrases, live translation, service switching, voice switching, and more.

Advanced Usage

Customizing Network Settings

Pass --host and --port to change the address and port an example binds to:

uv run python <example-name> --host 0.0.0.0 --port 8080

Troubleshooting

  • No audio/video: Check browser permissions for microphone and camera
  • Connection errors: Verify API keys in .env file
  • Port conflicts: Use --port to change the port
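
To diagnose a port conflict before picking a value for --port, you can test whether a port is free by trying to bind to it. This is a minimal standard-library sketch; the helper names `port_is_free` and `first_free_port` are hypothetical, and 7860 is used only because it is the port shown in the setup steps above.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(host, start, attempts=20):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(host, port):
            return port
    return None


if __name__ == "__main__":
    port = first_free_port("127.0.0.1", 7860)
    if port is None:
        print("No free port found in the scanned range.")
    else:
        print(f"Try: --port {port}")
```

The check is a snapshot: another process could claim the port between the check and the moment the example starts.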

For more examples, visit the pipecat-examples repository.