Files

Paul Kompfner 47e2f7a037 realtime + local turn detection: drop the user-transcript wait

Add the configuration surface to drive a realtime service like Gemini
Live from local turn detection without paying user-transcript latency.
Cascaded pipelines wait for a transcript before ending the user's turn
because the downstream LLM needs the user's words recorded in context
— but that wait is pure latency in pipelines using local turn
detection to drive a realtime service, which consumes user audio
directly.

Set `wait_for_transcript_to_end_user_turn=False` on
`LLMUserAggregatorParams` to turn this on. With that single flag the
aggregator:

- drops `TranscriptionUserTurnStartStrategy` from the start strategies
  (so late-arriving realtime transcripts don't trigger new turns),
- sets `wait_for_transcript=False` on any stop strategy that supports
  it (so the turn ends on the audible end of the turn, without
  waiting for a transcript),
- fires `on_user_turn_stopped` on the audible end of the turn with
  empty `content` (since the transcript hasn't arrived), and
- defers the context flush until the transcript arrives or a backstop
  timer fires.

A new `on_user_turn_message_finalized` event fires when the user's
message has been written to context. In the default mode it
coincides with `on_user_turn_stopped`; in the delayed-transcript mode
it fires later. Consumers that want the populated transcript should
subscribe to `on_user_turn_message_finalized` — it's the event that
always carries the user message, regardless of mode.

Strategy mutations are logged: loudly when the user passed their own
strategies (we're overwriting parts of their config), quietly
otherwise. The strategy-level `wait_for_transcript` parameter on
`TurnAnalyzerUserTurnStopStrategy` and `SpeechTimeoutUserTurnStopStrategy`
remains exposed for advanced cases.

The example `realtime-gemini-live-local-vad.py` demonstrates the full
pattern.

2026-05-15 13:49:16 -04:00

assets

Move foundational examples to examples/

2026-03-31 13:12:24 -04:00

audio

Include examples in type checking

2026-04-21 15:43:31 -04:00

context-summarization

Include examples in type checking

2026-04-21 15:43:31 -04:00

features

Mitigate tool-call-related hallucination

2026-05-05 13:02:43 -04:00

function-calling

Update OpenAI realtime transcription default

2026-05-12 15:20:57 -04:00

getting-started

Include examples in type checking

2026-04-21 15:43:31 -04:00

mcp

Include examples in type checking

2026-04-21 15:43:31 -04:00

observability

Include examples in type checking

2026-04-21 15:43:31 -04:00

persistent-context

Include examples in type checking

2026-04-21 15:43:31 -04:00

rag

Include examples in type checking

2026-04-21 15:43:31 -04:00

realtime

realtime + local turn detection: drop the user-transcript wait

2026-05-15 13:49:16 -04:00

thinking

Include examples in type checking

2026-04-21 15:43:31 -04:00

transcription

Update OpenAI realtime transcription default

2026-05-12 15:20:57 -04:00

transports

Include examples in type checking

2026-04-21 15:43:31 -04:00

turn-management

Correct docstrings and comments regarding incomplete_long_timeout duration, 10 sec

2026-05-07 17:47:41 -07:00

update-settings

Include examples in type checking

2026-04-21 15:43:31 -04:00

video-avatar

Include examples in type checking

2026-04-21 15:43:31 -04:00

video-processing

Include examples in type checking

2026-04-21 15:43:31 -04:00

vision

Include examples in type checking

2026-04-21 15:43:31 -04:00

voice

Merge pull request #4450 from pipecat-ai/mb/gpt-realtime-whisper

2026-05-13 09:48:33 -04:00

README.md

Rename services/ to voice/ and function-calling/, flatten to top level

2026-03-31 15:20:03 -04:00

README.md

Pipecat Examples

This directory contains examples showing how to build voice and multimodal agents with Pipecat.

Setup

Follow the README steps to get your local environment configured.

Run from root directory: Make sure you are running the steps from the root directory.

Using local audio?: The LocalAudioTransport requires a system dependency for portaudio. Install the dependency to use the transport.
Copy the env.example file and add API keys for services you plan to use:
```
cp env.example .env
# Edit .env with your API keys
```

Run any example:

uv run python getting-started/01-say-one-thing.py

Open the web interface at http://localhost:7860/client/ and click "Connect"

Running examples with other transports

Most examples support running with other transports, like Twilio or Daily.

Daily

You need to create a Daily account at https://dashboard.daily.co/u/signup. Once signed up, you can create your own room from the dashboard and set the environment variables DAILY_ROOM_URL and DAILY_API_KEY. Alternatively, you can let the example create a room for you (still needs DAILY_API_KEY environment variable). Then, start any example with -t daily:

uv run getting-started/06-voice-agent.py -t daily

Twilio

It is also possible to run the example through a Twilio phone number. You will need to setup a few things:

Install and run ngrok.

ngrok http 7860

Configure your Twilio phone number. One way is to setup a TwiML app and set the request URL to the ngrok URL from step (1). Then, set your phone number to use the new TwiML app.

Then, run the example with:

uv run getting-started/06-voice-agent.py -t twilio -x NGROK_HOST_NAME

Directory Structure

`getting-started/`

Progressive introduction to Pipecat, from minimal TTS to a full voice agent with function calling.

`voice/`

Full STT + LLM + TTS voice agent pipelines showcasing different speech service providers (Deepgram, ElevenLabs, Cartesia, etc.)

`function-calling/`

Function calling with different LLM providers (OpenAI, Anthropic, Google, etc.)

`transcription/`

Speech-to-text examples with various STT providers.

`vision/`

Image description and vision capabilities with different multimodal LLMs.

`realtime/`

Realtime and multimodal live APIs (OpenAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox, Grok).

`persistent-context/`

Maintaining conversation context across sessions with different providers.

`context-summarization/`

Summarizing conversation context to manage token limits.

`update-settings/`

Changing service settings at runtime, organized by service type:

stt/ — Speech-to-text settings
tts/ — Text-to-speech settings
llm/ — LLM settings

Advanced Usage

Customizing Network Settings

uv run python <example-name> --host 0.0.0.0 --port 8080

Troubleshooting

No audio/video: Check browser permissions for microphone and camera
Connection errors: Verify API keys in .env file
Port conflicts: Use --port to change the port

For more examples, visit the pipecat-examples repository.

README.md

Pipecat Examples

Setup

Running examples with other transports

Daily

Twilio

Directory Structure

`getting-started/`

`voice/`

`function-calling/`

`transcription/`

`vision/`

`realtime/`

`persistent-context/`

`context-summarization/`

`update-settings/`

`turn-management/`

`thinking-and-mcp/`

`transports/`

`video-avatar/`

`video-processing/`

`audio/`

`observability/`

`rag/`

`features/`

Advanced Usage

Customizing Network Settings

Troubleshooting