# Pipecat Examples This directory contains examples showing how to build voice and multimodal agents with Pipecat. ## Setup 1. Follow the [README](https://github.com/pipecat-ai/pipecat/blob/main/README.md#%EF%B8%8F-contributing-to-the-framework) steps to get your local environment configured. > **Run from root directory**: Make sure you are running the steps from the root directory. > **Using local audio?**: The `LocalAudioTransport` requires a system dependency for `portaudio`. Install the dependency to use the transport. 2. Copy the [`env.example`](../env.example) file and add API keys for services you plan to use: ```bash cp env.example .env # Edit .env with your API keys ``` 3. Run any example: ```bash uv run python getting-started/01-say-one-thing.py ``` 4. Open the web interface at http://localhost:7860/client/ and click "Connect" ## Running examples with other transports Most examples support running with other transports, like Twilio or Daily. ### Daily You need to create a Daily account at https://dashboard.daily.co/u/signup. Once signed up, you can create your own room from the dashboard and set the environment variables `DAILY_ROOM_URL` and `DAILY_API_KEY`. Alternatively, you can let the example create a room for you (still needs `DAILY_API_KEY` environment variable). Then, start any example with `-t daily`: ```bash uv run getting-started/06-voice-agent.py -t daily ``` ### Twilio It is also possible to run the example through a Twilio phone number. You will need to setup a few things: 1. Install and run [ngrok](https://ngrok.com/download). ```bash ngrok http 7860 ``` 2. Configure your Twilio phone number. One way is to setup a TwiML app and set the request URL to the ngrok URL from step (1). Then, set your phone number to use the new TwiML app. Then, run the example with: ```bash uv run getting-started/06-voice-agent.py -t twilio -x NGROK_HOST_NAME ``` ## Directory Structure ### [`getting-started/`](./getting-started/) Progressive introduction to Pipecat, from minimal TTS to a full voice agent with function calling. ### [`voice/`](./voice/) Full STT + LLM + TTS voice agent pipelines showcasing different speech service providers (Deepgram, ElevenLabs, Cartesia, etc.) ### [`function-calling/`](./function-calling/) Function calling with different LLM providers (OpenAI, Anthropic, Google, etc.) ### [`transcription/`](./transcription/) Speech-to-text examples with various STT providers. ### [`vision/`](./vision/) Image description and vision capabilities with different multimodal LLMs. ### [`realtime/`](./realtime/) Realtime and multimodal live APIs (OpenAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox, Grok). ### [`persistent-context/`](./persistent-context/) Maintaining conversation context across sessions with different providers. ### [`context-summarization/`](./context-summarization/) Summarizing conversation context to manage token limits. ### [`update-settings/`](./update-settings/) Changing service settings at runtime, organized by service type: - **[`stt/`](./update-settings/stt/)** — Speech-to-text settings - **[`tts/`](./update-settings/tts/)** — Text-to-speech settings - **[`llm/`](./update-settings/llm/)** — LLM settings ### [`turn-management/`](./turn-management/) Turn detection, interruption handling, and user input management. ### [`thinking-and-mcp/`](./thinking-and-mcp/) LLM thinking/reasoning modes and MCP (Model Context Protocol) tool server integration. ### [`transports/`](./transports/) Transport layer examples (WebRTC, Daily, LiveKit). ### [`video-avatar/`](./video-avatar/) Video avatar integrations (Tavus, HeyGen, Simli, LemonSlice). ### [`video-processing/`](./video-processing/) Video processing, mirroring, GStreamer, and custom video tracks. ### [`audio/`](./audio/) Audio recording, background sounds, and sound effects. ### [`observability/`](./observability/) Pipeline monitoring: observers, heartbeats, and Sentry metrics. ### [`rag/`](./rag/) Retrieval-augmented generation, grounding, and long-term memory (Mem0, Gemini). ### [`features/`](./features/) Miscellaneous features: wake phrases, live translation, service switching, voice switching, and more. ## Advanced Usage ### Customizing Network Settings ```bash uv run python --host 0.0.0.0 --port 8080 ``` ### Troubleshooting - **No audio/video**: Check browser permissions for microphone and camera - **Connection errors**: Verify API keys in `.env` file - **Port conflicts**: Use `--port` to change the port For more examples, visit the [pipecat-examples repository](https://github.com/pipecat-ai/pipecat-examples).