Pipecat Foundational Examples
This directory contains examples showing how to build voice and multimodal agents with Pipecat. Each example demonstrates specific features, progressing from basic to advanced concepts.
Learning Paths
Depending on what you're trying to build, these learning paths will guide you through relevant examples:
- New to Pipecat: Start with examples 01, 02, 07
- Building conversational bots: 07, 10, 38
- Common add-on capabilities: 17, 24, 28, 34
- Adding visual capabilities: 03, 12, 26
- Advanced agent capabilities: 14, 20, 37
Quick Start
- Set up a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Create a .env file with your API keys.
- Run any example:
  python run.py 01-say-one-thing.py
- Open the web interface at http://localhost:7860 and click "Connect"
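Before running an example, it can help to confirm that the .env file from the steps above defines the keys that example needs. The following is a minimal sketch using only the standard library. The key names (`OPENAI_API_KEY`, `CARTESIA_API_KEY`) and the helper names `parse_env_text` and `missing_keys` are illustrative assumptions, not a list taken from this repository, and the parser only handles simple `KEY=value` lines.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skip blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    # Hypothetical key names: replace with whatever your chosen example needs.
    required = ["OPENAI_API_KEY", "CARTESIA_API_KEY"]
    env_path = Path(".env")
    env = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    for key in missing_keys(env, required):
        print(f"Missing or empty in .env: {key}")
```

Each example uses different services, so check the imports at the top of the example file to see which keys it expects.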
Examples by Feature
Basics
- 01-say-one-thing.py: Most basic bot that says one phrase and exits (Transport, TTS, Event handlers)
- 02-llm-say-one-thing.py: Bot generates a response with an LLM (LLM initialization)
- 03-still-frame.py: Displays a static image (Video transport, Image service)
- 04-transport.py: Different transport options (WebRTC, Daily, LiveKit)
Conversational AI
- 07-interruptible.py: Basic voice assistant bot (STT, TTS, LLM, Interruptible speech)
- 10-wake-phrase.py: Bot activated by wake phrase (WakeCheckFilter)
- 22-natural-conversation.py: Smart turn detection (Multiple LLMs, Turn management)
- 38-smart-turn-fal.py: ML-based turn detection (Fal service, Local models)
Common Utilities
- 17-detect-user-idle.py: Handle inactive users (UserIdleProcessor)
- 24-stt-mute-filter.py: Selectively mute user input (STTMuteFilter)
- 28-transcription-processor.py: Record conversation text (TranscriptProcessor)
- 30-observer.py: Access frame data (Custom observers)
- 31-heartbeats.py: Detect idle pipelines (Pipeline monitoring)
- 34-audio-recording.py: Record conversation audio (Composite and track-level recording)
Advanced LLM Features
- 14-function-calling.py: Bot with tool usage (Function schemas, Tool registration)
- 20a-persistent-context-openai.py: Persistent conversation context (Memory management)
- 32-gemini-grounding-metadata.py: Web search capabilities (Google search integration)
- 33-gemini-rag.py: Retrieval-augmented generation (Data sources, Grounding)
- 37-mem0.py: Long-term agent memory (Mem0 service integration)
Media Handling
- 05-sync-speech-and-images.py: Synchronized narration with images (Custom processors, SyncParallelPipeline)
- 06a-image-sync.py: Dynamic image updates while speaking (Synchronized A/V pipelines)
- 09-mirror.py: Mirror user's audio and video (Custom frame processors)
- 11-sound-effects.py: Add sounds when bot speaks (Sound playback, Event synchronization)
- 23-bot-background-sound.py: Play background audio (SoundfileMixer)
Vision & Multimodal
- 12a-describe-video-gemini-flash.py: Bot describes user's video (Video input, Multimodal LLMs)
- 26c-gemini-multimodal-live-video.py: Gemini with video input (Streaming video, Function calls)
Voice & Language
- 13-transcription.py: Speech transcription demo (STT providers, Real-time transcription)
- 15-switch-voices.py: Dynamic voice/language changing (ParallelPipelines, FunctionFilters)
- 25-google-audio-in.py: Gemini for speech recognition (Alternative transcription)
- 35-pattern-pair-voice-switching.py: Dynamic TTS voice switching (XML parsing, PatternPairAggregator)
- 36-user-email-gathering.py: Spelling mode for TTS (Confirmation patterns, XML tags)
Integration Examples
- 18-gstreamer-filesrc.py: GStreamer video streaming (Video processing)
- 19-openai-realtime-beta.py: OpenAI Speech-to-Speech (Direct S2S, Function calls)
- 21-tavus-layer.py: Tavus digital twin (Avatar integration)
- 27-simli-layer.py: Simli avatar integration (Video synchronization)
Performance & Optimization
- 16-gpu-container-local-bot.py: GPU-accelerated local bot (Performance measurement)
Advanced Usage
Customizing Network Settings
python run.py <example-name> --host 0.0.0.0 --port 8080
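The `--host` and `--port` flags above follow the usual command-line pattern. As an illustration only (this is not the actual code in `run.py`), a script could accept the same flags with `argparse`. The function name `parse_network_args` and the defaults (`localhost`, and `7860` to match the Quick Start URL) are assumptions.

```python
import argparse


def parse_network_args(argv=None):
    """Parse an example name plus optional --host/--port flags."""
    parser = argparse.ArgumentParser(description="Run an example (illustrative sketch)")
    parser.add_argument("example", help="Example file, e.g. 01-say-one-thing.py")
    # Assumed defaults, chosen to match http://localhost:7860 from the Quick Start.
    parser.add_argument("--host", default="localhost", help="Interface to bind to")
    parser.add_argument("--port", type=int, default=7860, help="TCP port to listen on")
    return parser.parse_args(argv)


# Equivalent to: python run.py 01-say-one-thing.py --host 0.0.0.0 --port 8080
args = parse_network_args(["01-say-one-thing.py", "--host", "0.0.0.0", "--port", "8080"])
print(f"Would serve {args.example} on http://{args.host}:{args.port}")
```

Binding to `0.0.0.0` listens on all network interfaces, so the server becomes reachable from other machines on the same network. Use it only when you need that.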
Troubleshooting
- No audio/video: Check browser permissions for microphone and camera
- Connection errors: Verify API keys in .env file
- Missing dependencies: Run pip install -r requirements.txt
- Port conflicts: Use --port to change the port
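For port conflicts, one way to pick a value for `--port` is to probe for a free port first. This is a rough standard-library sketch, not part of the examples themselves. The helper names `port_is_free` and `first_free_port` are hypothetical, and the starting port `7860` is taken from the Quick Start URL.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if (host, port) can be bound, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start: int = 7860, attempts: int = 20) -> int:
    """Scan upward from `start` and return the first port that can be bound."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port in {start}..{start + attempts - 1}")


print(f"Try: python run.py <example-name> --port {first_free_port()}")
```

The check is only a hint: another process could claim the port between the probe and the moment the example starts, in which case rerun with a different port.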
For more examples, visit our GitHub repository.