This PR does not (yet) touch `InputParams`, to avoid scope creep and to avoid changing something that is currently part of the public API, even though there is a lot of overlap between `*Settings` object fields and `InputParams` fields.
Beyond discoverability and typing, this refactor brings the following improvements:
- There is now a single code path (see `_update_settings_from_typed`) where services can respond to settings changes (by, say, reconnecting if needed). This improves maintainability and guarantees one and only one reconnection, no matter which settings changed.
- `set_language`/`set_model`/`set_voice` (which we're assuming are usable as public methods, though they are *not* recommended over `*UpdateSettingsFrame`) all use the same code path as settings updates. They are also now consistent: if a service needs to respond to a change (by, say, reconnecting), any of these methods will kick off that process. Note that this is technically a behavior change.
- Several services now properly react to changed settings by reconnecting:
  - `AWSTranscribeSTTService`
  - `AzureSTTService`
  - `SonioxSTTService`
  - `GladiaSTTService`
  - `AssemblyAISTTService`
  - `CartesiaSTTService`
  - `FishAudioTTSService` (which previously only reconnected when `model` changed)
  - `GoogleSTTService`
  - `SpeechmaticsSTTService` (which previously only handled *some* settings updates, through a nonstandard public `update_params` method)
  - `GradiumSTTService`
  - `NvidiaSegmentedSTTService` (which previously only handled changes to language)
- Bookkeeping across various services has been reduced, mostly by deduplicating ivars; the `self._settings` ivar is treated as the source of truth.
NOTE: I pretty much guarantee that this PR misses some services when it comes to making update handling consistent (for example, whether changes to certain fields trigger reconnects when they need to). We can squash the remaining inconsistencies as we stumble onto them, service by service.
The goal here is to get things *mostly* in order, and establish the infrastructure and patterns we'll need going forward.
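The single-code-path idea can be sketched in plain Python. Everything below (`ExampleServiceSettings`, `ExampleService`, `update_settings`, and which fields force a reconnect) is hypothetical and only illustrates the pattern; it is not Pipecat's actual `_update_settings_from_typed` implementation. It shows the convenience setters routing through the same update path, `self._settings` acting as the only source of truth, and at most one reconnect per update.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class ExampleServiceSettings:
    """Hypothetical typed settings object; field names are illustrative."""
    model: str = "default-model"
    language: str = "en"
    voice: Optional[str] = None


class ExampleService:
    """Hypothetical service with one settings update path (not Pipecat's API)."""

    # Assumption: only these fields require re-establishing the connection.
    RECONNECT_FIELDS = {"model", "language"}

    def __init__(self, settings: ExampleServiceSettings):
        self._settings = settings  # single source of truth, no duplicate ivars
        self.reconnect_count = 0

    def update_settings(self, **changes) -> set:
        """Apply changes and return the names of fields that actually changed."""
        # replace() raises TypeError for unknown field names.
        new_settings = replace(self._settings, **changes)
        changed = {
            f.name
            for f in fields(new_settings)
            if getattr(new_settings, f.name) != getattr(self._settings, f.name)
        }
        self._settings = new_settings
        # Reconnect at most once, however many reconnect-worthy fields changed.
        if changed & self.RECONNECT_FIELDS:
            self._reconnect()
        return changed

    # Convenience setters share the exact same path as bulk updates.
    def set_model(self, model: str) -> set:
        return self.update_settings(model=model)

    def set_language(self, language: str) -> set:
        return self.update_settings(language=language)

    def set_voice(self, voice: str) -> set:
        return self.update_settings(voice=voice)

    def _reconnect(self) -> None:
        # A real service would close and reopen its connection here.
        self.reconnect_count += 1
```

For example, `update_settings(model="m2", language="fr")` triggers a single reconnect rather than two, while `set_voice("alice")` updates settings without reconnecting, because `voice` is not in the (assumed) reconnect set.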
Pipecat Foundational Examples
This directory contains examples showing how to build voice and multimodal agents with Pipecat. Each example demonstrates specific features, progressing from basic to advanced concepts.
Setup
- Follow the README steps to get your local environment configured.
  - Run from root directory: make sure you are running the steps from the root directory.
  - Using local audio? The `LocalAudioTransport` requires a system dependency for `portaudio`. Install the dependency to use the transport.
- Copy the `env.example` file to `.env` and add API keys for the services you plan to use:
  cp env.example .env
  # Edit .env with your API keys
- Navigate to the examples directory if you aren't already there:
  cd examples/foundational
- Run any example:
  uv run python 01-say-one-thing.py
- Open the web interface at http://localhost:7860/client/ and click "Connect".
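Missing or empty keys in `.env` are an easy mistake to make. The stdlib-only sketch below parses simple `KEY=VALUE` lines and reports which required keys are absent or empty. The function names and key names are illustrative, and this simplified parser deliberately ignores the quoting and escaping rules that dedicated `.env` loaders handle.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required: list) -> list:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

Usage, with hypothetical key names for whichever services an example uses: `missing_keys(open(".env").read(), ["OPENAI_API_KEY", "DEEPGRAM_API_KEY"])`.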
Running examples with other transports
Most examples support running with other transports, like Twilio or Daily.
Daily
You need to create a Daily account at https://dashboard.daily.co/u/signup. Once signed up, you can create your own room from the dashboard and set the environment variables `DAILY_ROOM_URL` and `DAILY_API_KEY`. Alternatively, you can let the example create a room for you (this still requires the `DAILY_API_KEY` environment variable). Then, start any example with `-t daily`:
uv run 07-interruptible.py -t daily
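The Daily rules above reduce to a small decision: an API key is always required, and a room URL is optional. The hypothetical pre-flight check below (`daily_room_plan` is illustrative, not part of Pipecat's example runner) encodes only what the prose states; the runner's actual logic may differ.

```python
import os
from typing import Mapping, Optional


def daily_room_plan(env: Optional[Mapping[str, str]] = None) -> str:
    """Describe how a Daily run would proceed, per the rules stated above."""
    env = os.environ if env is None else env
    if not env.get("DAILY_API_KEY"):
        # Both options (existing room or auto-created room) need the API key.
        raise RuntimeError("DAILY_API_KEY is not set")
    if env.get("DAILY_ROOM_URL"):
        return "join existing room: " + env["DAILY_ROOM_URL"]
    return "create a new room"
```

Calling `daily_room_plan()` with no argument checks the current process environment, which is handy right before launching an example with `-t daily`.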
Twilio
It is also possible to run the example through a Twilio phone number. You will need to set up a few things:
- Install and run ngrok:
  ngrok http 7860
- Configure your Twilio phone number. One way is to set up a TwiML app and set its request URL to the ngrok URL from the previous step. Then, set your phone number to use the new TwiML app.
Then, run the example with:
uv run 07-interruptible.py -t twilio -x NGROK_HOST_NAME
Examples by Feature
Basics
- 01-say-one-thing.py: Most basic bot that says one phrase and exits (Transport, TTS, Event handlers)
- 02-llm-say-one-thing.py: Bot generates a response with an LLM (LLM initialization)
- 03-still-frame.py: Displays a static image (Video transport, Image service)
- 04-transport.py: Different transport options (WebRTC, Daily, LiveKit)
Conversational AI
- 07-interruptible.py: Basic voice assistant bot (STT, TTS, LLM, Interruptible speech)
- 10-wake-phrase.py: Bot activated by wake phrase (WakeCheckFilter)
- 22-natural-conversation.py: Smart turn detection (Multiple LLMs, Turn management)
- 38-smart-turn-fal.py: ML-based turn detection (Fal service, Local models)
Common Utilities
- 17-detect-user-idle.py: Handle inactive users (UserIdleProcessor)
- 24-user-mute-strategy.py: Selectively mute user input (LLMUserAggregator user mute strategies)
- 28-transcription-processor.py: Record conversation text (TranscriptProcessor)
- 30-observer.py: Access frame data (Custom observers)
- 31-heartbeats.py: Detect idle pipelines (Pipeline monitoring)
- 34-audio-recording.py: Record conversation audio (Composite and track-level recording)
Advanced LLM Features
- 14-function-calling.py: Bot with tool usage (Function schemas, Tool registration)
- 20a-persistent-context-openai.py: Persistent conversation context (Memory management)
- 32-gemini-grounding-metadata.py: Web search capabilities (Google search integration)
- 33-gemini-rag.py: Retrieval-augmented generation (Data sources, Grounding)
- 37-mem0.py: Long-term agent memory (Mem0 service integration)
Media Handling
- 05-sync-speech-and-images.py: Synchronized narration with images (Custom processors, SyncParallelPipeline)
- 06a-image-sync.py: Dynamic image updates while speaking (Synchronized A/V pipelines)
- 09-mirror.py: Mirror user's audio and video (Custom frame processors)
- 11-sound-effects.py: Add sounds when bot speaks (Sound playback, Event synchronization)
- 23-bot-background-sound.py: Play background audio (SoundfileMixer)
Vision & Multimodal
- 12a-describe-video-gemini-flash.py: Bot describes user's video (Video input, Multimodal LLMs)
- 26c-gemini-live-video.py: Gemini with video input (Streaming video, Function calls)
Voice & Language
- 13-transcription.py: Speech transcription demo (STT providers, Real-time transcription)
- 15-switch-voices.py: Dynamic voice/language changing (ParallelPipelines, FunctionFilters)
- 25-google-audio-in.py: Gemini for speech recognition (Alternative transcription)
- 35-pattern-pair-voice-switching.py: Dynamic TTS voice switching (XML parsing, PatternPairAggregator)
- 36-user-email-gathering.py: Spelling mode for TTS (Confirmation patterns, XML tags)
Integration Examples
- 18-gstreamer-filesrc.py: GStreamer video streaming (Video processing)
- 19-openai-realtime-beta.py: OpenAI Speech-to-Speech (Direct S2S, Function calls)
- 21-tavus-layer-tavus-transport.py: Tavus digital twin (Avatar integration)
- 27-simli-layer.py: Simli avatar integration (Video synchronization)
Performance & Optimization
- 16-gpu-container-local-bot.py: GPU-accelerated local bot (Performance measurement)
Advanced Usage
Customizing Network Settings
To change the host and port the example server listens on, pass `--host` and `--port`:
uv run python <example-name> --host 0.0.0.0 --port 8080
Troubleshooting
- No audio/video: Check browser permissions for microphone and camera
- Connection errors: Verify the API keys in your `.env` file
- Port conflicts: Use `--port` to change the port
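If the default port (7860, from the web interface URL in Setup) is already taken, a stdlib sketch like the one below can find a free port to pass to `--port`. The helper names are hypothetical, and the check is inherently racy: another process can grab the port between the check and the moment the example server starts.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start: int = 7860, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Scan upward from `start` and return the first port that can be bound."""
    end = min(start + attempts, 65536)  # never probe past the valid port range
    for port in range(start, end):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port in range {start}-{end - 1}")
```

For example, `print(first_free_port())` prints a port number you could then use as `uv run python <example-name> --port <that-port>`.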
For more examples, visit the pipecat-examples repository.