From 4b364dda29423beb0dc731abe4598558c19e1c38 Mon Sep 17 00:00:00 2001
From: Mark Backman
Date: Thu, 24 Apr 2025 18:04:36 -0400
Subject: [PATCH] Update foundational README with ToC

---
 examples/foundational/README.md | 121 ++++++++++++++++++++++++--------
 1 file changed, 91 insertions(+), 30 deletions(-)

diff --git a/examples/foundational/README.md b/examples/foundational/README.md
index 622c06875..14323d189 100644
--- a/examples/foundational/README.md
+++ b/examples/foundational/README.md
@@ -1,61 +1,122 @@
 # Pipecat Foundational Examples
 
-This directory contains foundational examples showing how to use Pipecat to build voice and multimodal agents. Each example demonstrates specific features of the framework, building from basic to more complex concepts.
+This directory contains examples showing how to build voice and multimodal agents with Pipecat. Each example demonstrates specific features, progressing from basic to advanced concepts.
 
-## Prerequisites
+## Learning Paths
 
-1. If you haven't already, set up a virtual environment:
+Depending on what you're trying to build, these learning paths will guide you through relevant examples:
+
+- **New to Pipecat**: Start with examples [01](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/01-say-one-thing.py), [02](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/02-llm-say-one-thing.py), [07](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07-interruptible.py)
+- **Building conversational bots**: [07](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07-interruptible.py), [10](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/10-wake-phrase.py), [38](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/38-smart-turn-fal.py)
+- **Common add-on capabilities**: [17](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/17-detect-user-idle.py), [24](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/24-stt-mute-filter.py), [28](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/28-transcription-processor.py), [34](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/34-audio-recording.py)
+- **Adding visual capabilities**: [03](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/03-still-frame.py), [12](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/12a-describe-video-gemini-flash.py), [26](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/26c-gemini-multimodal-live-video.py)
+- **Advanced agent capabilities**: [14](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/14-function-calling.py), [20](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/20a-persistent-context-openai.py), [37](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/37-mem0.py)
+
+## Quick Start
+
+1. Set up a virtual environment:
 
    ```bash
    python -m venv venv
-   source venv/bin/activate
+   source venv/bin/activate # On Windows: venv\Scripts\activate
    ```
 
-2. Install Pipecat with the required dependencies:
+2. Install dependencies:
 
    ```bash
    pip install -r requirements.txt
    ```
 
-3. Set up an `.env` file with the API keys of services you'll run.
+3. Create a `.env` file with your API keys.
 
-## Running the examples
-
-Each example is a self-contained bot that runs using our built-in `run.py` FastAPI server. The server automatically starts a web interface ([Pipecat SmallWebRTC Prebuilt](https://pypi.org/project/pipecat-ai-small-webrtc-prebuilt/)) that allows you to interact with the bot via WebRTC in your browser.
-
-1. **Run a specific example**:
+4. Run any example:
 
    ```bash
-   python
+   python run.py 01-say-one-thing.py
    ```
 
-   For example:
+5. Open the web interface at http://localhost:7860 and click "Connect"
 
-   ```bash
-   python 07-interruptible.py
-   ```
+## Examples by Feature
 
-2. **Open the web app** at the URL displayed in the console:
+### Basics
 
-   ```
-   Open your browser to: http://localhost:7860
-   ```
+- **[01-say-one-thing.py](./01-say-one-thing.py)**: Most basic bot that says one phrase and exits (Transport, TTS, Event handlers)
+- **[02-llm-say-one-thing.py](./02-llm-say-one-thing.py)**: Bot generates a response with an LLM (LLM initialization)
+- **[03-still-frame.py](./03-still-frame.py)**: Displays a static image (Video transport, Image service)
+- **[04-transport.py](./04-transport.py)**: Different transport options (WebRTC, Daily, Livekit)
 
-3. **Start the example**:
+### Conversational AI
 
-   Click the "Connect" button in the web interface, grant camera/microphone permissions when prompted, and start interacting with the bot.
+- **[07-interruptible.py](./07-interruptible.py)**: Basic voice assistant bot (STT, TTS, LLM, Interruptible speech)
+- **[10-wake-phrase.py](./10-wake-phrase.py)**: Bot activated by wake phrase (WakeCheckFilter)
+- **[22-natural-conversation.py](./22-natural-conversation.py)**: Smart turn detection (Multiple LLMs, Turn management)
+- **[38-smart-turn-fal.py](./38-smart-turn-fal.py)**: ML-based turn detection (Fal service, Local models)
 
-## Troubleshooting
+### Common Utilities
 
-- **No audio or video**: Make sure your browser permissions for microphone and camera are granted
-- **Connection errors**: Check that your API keys are correctly set in the `.env` file
-- **Missing dependencies**: Ensure you've installed all required dependencies with `pip install -r requirements.txt`
-- **Port already in use**: Change the port with `--port ` if the default port is unavailable
+- **[17-detect-user-idle.py](./17-detect-user-idle.py)**: Handle inactive users (UserIdleProcessor)
+- **[24-stt-mute-filter.py](./24-stt-mute-filter.py)**: Selectively mute user input (STTMuteFilter)
+- **[28-transcription-processor.py](./28-transcription-processor.py)**: Record conversation text (TranscriptProcessor)
+- **[30-observer.py](./30-observer.py)**: Access frame data (Custom observers)
+- **[31-heartbeats.py](./31-heartbeats.py)**: Detect idle pipelines (Pipeline monitoring)
+- **[34-audio-recording.py](./34-audio-recording.py)**: Record conversation audio (Composite and track-level recording)
 
-### Customizing the network interface
+### Advanced LLM Features
 
-If you have conflict on a host or port, you can customize using:
+- **[14-function-calling.py](./14-function-calling.py)**: Bot with tool usage (Function schemas, Tool registration)
+- **[20a-persistent-context-openai.py](./20a-persistent-context-openai.py)**: Persistent conversation context (Memory management)
+- **[32-gemini-grounding-metadata.py](./32-gemini-grounding-metadata.py)**: Web search capabilities (Google search integration)
+- **[33-gemini-rag.py](./33-gemini-rag.py)**: Retrieval-augmented generation (Data sources, Grounding)
+- **[37-mem0.py](./37-mem0.py)**: Long-term agent memory (Mem0 service integration)
+
+### Media Handling
+
+- **[05-sync-speech-and-images.py](./05-sync-speech-and-images.py)**: Synchronized narration with images (Custom processors, SyncParallelPipeline)
+- **[06a-image-sync.py](./06a-image-sync.py)**: Dynamic image updates while speaking (Synchronized A/V pipelines)
+- **[09-mirror.py](./09-mirror.py)**: Mirror user's audio and video (Custom frame processors)
+- **[11-sound-effects.py](./11-sound-effects.py)**: Add sounds when bot speaks (Sound playback, Event synchronization)
+- **[23-bot-background-sound.py](./23-bot-background-sound.py)**: Play background audio (SoundfileMixer)
+
+### Vision & Multimodal
+
+- **[12a-describe-video-gemini-flash.py](./12a-describe-video-gemini-flash.py)**: Bot describes user's video (Video input, Multimodal LLMs)
+- **[26c-gemini-multimodal-live-video.py](./26c-gemini-multimodal-live-video.py)**: Gemini with video input (Streaming video, Function calls)
+
+### Voice & Language
+
+- **[13-transcription.py](./13-transcription.py)**: Speech transcription demo (STT providers, Real-time transcription)
+- **[15-switch-voices.py](./15-switch-voices.py)**: Dynamic voice/language changing (ParallelPipelines, FunctionFilters)
+- **[25-google-audio-in.py](./25-google-audio-in.py)**: Gemini for speech recognition (Alternative transcription)
+- **[35-pattern-pair-voice-switching.py](./35-pattern-pair-voice-switching.py)**: Dynamic TTS voice switching (XML parsing, PatternPairAggregator)
+- **[36-user-email-gathering.py](./36-user-email-gathering.py)**: Spelling mode for TTS (Confirmation patterns, XML tags)
+
+### Integration Examples
+
+- **[18-gstreamer-filesrc.py](./18-gstreamer-filesrc.py)**: GStreamer video streaming (Video processing)
+- **[19-openai-realtime-beta.py](./19-openai-realtime-beta.py)**: OpenAI Speech-to-Speech (Direct S2S, Function calls)
+- **[21-tavus-layer.py](./21-tavus-layer.py)**: Tavus digital twin (Avatar integration)
+- **[27-simli-layer.py](./27-simli-layer.py)**: Simli avatar integration (Video synchronization)
+
+### Performance & Optimization
+
+- **[16-gpu-container-local-bot.py](./16-gpu-container-local-bot.py)**: GPU-accelerated local bot (Performance measurement)
+
+### Utilities
+
+## Advanced Usage
+
+### Customizing Network Settings
 
 ```bash
-python --host 0.0.0.0 --port 8080
+python run.py --host 0.0.0.0 --port 8080
 ```
+
+### Troubleshooting
+
+- **No audio/video**: Check browser permissions for microphone and camera
+- **Connection errors**: Verify API keys in `.env` file
+- **Missing dependencies**: Run `pip install -r requirements.txt`
+- **Port conflicts**: Use `--port` to change the port
+
+For more examples, visit our [GitHub repository](https://github.com/pipecat-ai/pipecat/tree/main/examples).