pipecat/examples/foundational/README.md
2026-03-02 17:52:39 -05:00


# Pipecat Foundational Examples
This directory contains examples showing how to build voice and multimodal agents with Pipecat. Each example demonstrates specific features, progressing from basic to advanced concepts.
## Setup
1. Follow the [README](https://github.com/pipecat-ai/pipecat/blob/main/README.md#%EF%B8%8F-contributing-to-the-framework) steps to get your local environment configured.
> **Run from root directory**: Run these setup steps from the repository's root directory.
> **Using local audio?**: The `LocalAudioTransport` requires a system dependency for `portaudio`. Install the dependency to use the transport.
2. Copy the [`env.example`](../../env.example) file and add API keys for services you plan to use:
```bash
cp env.example .env
# Edit .env with your API keys
```
3. Navigate to the examples directory if you aren't already there:
```bash
cd examples/foundational
```
4. Run any example:
```bash
uv run python 01-say-one-thing.py
```
5. Open the web interface at http://localhost:7860/client/ and click "Connect".
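
If an example fails at startup, a missing key in `.env` (step 2) is a common cause. The following is a minimal sketch, using only the standard library, that reports which keys are absent or empty. The helper names `load_env_keys` and `missing_keys` are hypothetical and not part of Pipecat, and the parser only handles plain `KEY=VALUE` lines, so it may not match how the examples themselves load the file.

```python
from pathlib import Path
from typing import Dict, List


def load_env_keys(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(required: List[str], path: str = ".env") -> List[str]:
    """Return the required keys that are absent or empty in the file."""
    values = load_env_keys(path)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # DAILY_API_KEY is mentioned in this README; replace the list with
    # the keys for the services your chosen example actually uses.
    print(missing_keys(["DAILY_API_KEY"]))
```

Run it from the directory that contains your `.env` file. An empty list means every key you listed has a value.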
## Running examples with other transports
Most examples support running with other transports, like Twilio or Daily.
### Daily
Create a Daily account at https://dashboard.daily.co/u/signup. Once signed up, you can create your own room from the dashboard and set the environment variables `DAILY_ROOM_URL` and `DAILY_API_KEY`. Alternatively, you can let the example create a room for you (this still requires the `DAILY_API_KEY` environment variable). Then, start any example with `-t daily`:
```bash
uv run 07-interruptible.py -t daily
```
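
The two Daily options described above depend only on which environment variables are set. As a rough illustration, the sketch below classifies an environment into one of the cases. `daily_setup` is a hypothetical helper written for this README and does not reflect how Pipecat itself reads these variables.

```python
import os
from typing import Mapping


def daily_setup(env: Mapping[str, str] = os.environ) -> str:
    """Describe which Daily setup the given environment corresponds to."""
    has_key = bool(env.get("DAILY_API_KEY"))
    has_room = bool(env.get("DAILY_ROOM_URL"))
    if not has_key:
        # Both options in this README need the API key.
        return "missing DAILY_API_KEY"
    if has_room:
        return "use existing room"
    return "example creates a room"


if __name__ == "__main__":
    print(daily_setup())
```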
### Twilio
You can also run an example through a Twilio phone number. You will need to set up a few things:
1. Install and run [ngrok](https://ngrok.com/download).
```bash
ngrok http 7860
```
2. Configure your Twilio phone number. One way is to set up a TwiML app and set the request URL to the ngrok URL from step (1). Then, set your phone number to use the new TwiML app.
Then, run the example with:
```bash
uv run 07-interruptible.py -t twilio -x NGROK_HOST_NAME
```
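
The `NGROK_HOST_NAME` placeholder suggests that `-x` takes a bare host name rather than a full URL. That reading is an assumption based on the placeholder's name. Under that assumption, this small sketch extracts the host name from the forwarding URL that ngrok prints. `ngrok_host_name` is a hypothetical helper, and the URL in the usage line is a made-up example.

```python
from urllib.parse import urlparse


def ngrok_host_name(url_or_host: str) -> str:
    """Return just the host name from an ngrok URL (or a bare host)."""
    candidate = url_or_host.strip()
    if "://" not in candidate:
        # urlparse only fills in hostname when a scheme is present.
        candidate = "https://" + candidate
    host = urlparse(candidate).hostname
    if not host:
        raise ValueError(f"could not find a host name in {url_or_host!r}")
    return host


if __name__ == "__main__":
    # Hypothetical forwarding URL; use the one ngrok shows in your terminal.
    print(ngrok_host_name("https://example-tunnel.ngrok.app/"))
```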
## Examples by Feature
### Basics
- **[01-say-one-thing.py](./01-say-one-thing.py)**: Most basic bot that says one phrase and exits (Transport, TTS, Event handlers)
- **[02-llm-say-one-thing.py](./02-llm-say-one-thing.py)**: Bot generates a response with an LLM (LLM initialization)
- **[03-still-frame.py](./03-still-frame.py)**: Displays a static image (Video transport, Image service)
- **[04-transport.py](./04-transport.py)**: Different transport options (WebRTC, Daily, LiveKit)
### Conversational AI
- **[07-interruptible.py](./07-interruptible.py)**: Basic voice assistant bot (STT, TTS, LLM, Interruptible speech)
- **[10-wake-phrase.py](./10-wake-phrase.py)**: Bot activated by wake phrase (WakeCheckFilter)
- **[22-natural-conversation.py](./22-natural-conversation.py)**: Smart turn detection (Multiple LLMs, Turn management)
- **[38-smart-turn-fal.py](./38-smart-turn-fal.py)**: ML-based turn detection (Fal service, Local models)
### Common Utilities
- **[17-detect-user-idle.py](./17-detect-user-idle.py)**: Handle inactive users (UserIdleProcessor)
- **[24-user-mute-strategy.py](./24-user-mute-strategy.py)**: Selectively mute user input (LLMUserAggregator user mute strategies)
- **[28-transcription-processor.py](./28-transcription-processor.py)**: Record conversation text (TranscriptProcessor)
- **[30-observer.py](./30-observer.py)**: Access frame data (Custom observers)
- **[31-heartbeats.py](./31-heartbeats.py)**: Detect idle pipelines (Pipeline monitoring)
- **[34-audio-recording.py](./34-audio-recording.py)**: Record conversation audio (Composite and track-level recording)
### Advanced LLM Features
- **[14-function-calling.py](./14-function-calling.py)**: Bot with tool usage (Function schemas, Tool registration)
- **[20a-persistent-context-openai.py](./20a-persistent-context-openai.py)**: Persistent conversation context (Memory management)
- **[32-gemini-grounding-metadata.py](./32-gemini-grounding-metadata.py)**: Web search capabilities (Google search integration)
- **[33-gemini-rag.py](./33-gemini-rag.py)**: Retrieval-augmented generation (Data sources, Grounding)
- **[37-mem0.py](./37-mem0.py)**: Long-term agent memory (Mem0 service integration)
### Media Handling
- **[05-sync-speech-and-images.py](./05-sync-speech-and-images.py)**: Synchronized narration with images (Custom processors, SyncParallelPipeline)
- **[06a-image-sync.py](./06a-image-sync.py)**: Dynamic image updates while speaking (Synchronized A/V pipelines)
- **[09-mirror.py](./09-mirror.py)**: Mirror user's audio and video (Custom frame processors)
- **[11-sound-effects.py](./11-sound-effects.py)**: Add sounds when bot speaks (Sound playback, Event synchronization)
- **[23-bot-background-sound.py](./23-bot-background-sound.py)**: Play background audio (SoundfileMixer)
### Vision & Multimodal
- **[12a-describe-video-gemini-flash.py](./12a-describe-video-gemini-flash.py)**: Bot describes user's video (Video input, Multimodal LLMs)
- **[26c-gemini-live-video.py](./26c-gemini-live-video.py)**: Gemini with video input (Streaming video, Function calls)
### Voice & Language
- **[13-transcription.py](./13-transcription.py)**: Speech transcription demo (STT providers, Real-time transcription)
- **[15-switch-voices.py](./15-switch-voices.py)**: Dynamic voice/language changing (ParallelPipelines, FunctionFilters)
- **[25-google-audio-in.py](./25-google-audio-in.py)**: Gemini for speech recognition (Alternative transcription)
- **[35-pattern-pair-voice-switching.py](./35-pattern-pair-voice-switching.py)**: Dynamic TTS voice switching (XML parsing, PatternPairAggregator)
- **[36-user-email-gathering.py](./36-user-email-gathering.py)**: Spelling mode for TTS (Confirmation patterns, XML tags)
### Integration Examples
- **[18-gstreamer-filesrc.py](./18-gstreamer-filesrc.py)**: GStreamer video streaming (Video processing)
- **[19-openai-realtime-beta.py](./19-openai-realtime-beta.py)**: OpenAI Speech-to-Speech (Direct S2S, Function calls)
- **[21-tavus-layer-tavus-transport.py](./21-tavus-layer-tavus-transport.py)**: Tavus digital twin (Avatar integration)
- **[27-simli-layer.py](./27-simli-layer.py)**: Simli avatar integration (Video synchronization)
- **[56-lemonslice-transport.py](./56-lemonslice-transport.py)**: LemonSlice avatar integration (A/V Synced Avatar integration)
### Performance & Optimization
- **[16-gpu-container-local-bot.py](./16-gpu-container-local-bot.py)**: GPU-accelerated local bot (Performance measurement)
## Advanced Usage
### Customizing Network Settings
```bash
uv run python <example-name> --host 0.0.0.0 --port 8080
```
### Troubleshooting
- **No audio/video**: Check browser permissions for microphone and camera
- **Connection errors**: Verify API keys in `.env` file
- **Port conflicts**: Use `--port` to change the port
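
To pick a value for `--port` when the default is taken, you can probe for a free port first. This is a minimal standard-library sketch. `port_is_free` and `first_free_port` are hypothetical helpers, and the starting port 7860 is taken from the web interface URL in the Setup section. A port that is free when probed can still be claimed by another process before the example starts.

```python
import socket
from typing import Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start: int = 7860, attempts: int = 20) -> Optional[int]:
    """Scan upward from start and return the first free port, if any."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    print(first_free_port())
```

Pass the printed number to the example with `--port`.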
For more examples, visit the [pipecat-examples repository](https://github.com/pipecat-ai/pipecat-examples).