🎙️ Pipecat: Real-Time Voice & Multimodal AI Agents
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.
Want to dive right in? Install Pipecat then try the quickstart.
🚀 What You Can Build
- Voice Assistants – natural, streaming conversations with AI
- AI Companions – coaches, meeting assistants, characters
- Multimodal Interfaces – voice, video, images, and more
- Interactive Storytelling – creative tools with generative media
- Business Agents – customer intake, support bots, guided flows
- Complex Dialog Systems – design logic with structured conversations
🧭 Looking to build structured conversations? Check out Pipecat Flows for managing complex conversational states and transitions.
🧠 Why Pipecat?
- Voice-first: Integrates speech recognition, text-to-speech, and conversation handling
- Pluggable: Supports many AI services and tools
- Composable Pipelines: Build complex behavior from modular components
- Real-Time: Ultra-low latency interaction with different transports (e.g. WebSockets or WebRTC)
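To make the "composable pipelines" idea concrete, here is a toy sketch in plain Python. It does not use Pipecat's classes or API. The names `source`, `fake_transcribe`, `fake_reply`, `compose`, and `run` are hypothetical stand-ins, chosen only to show how small stages can be chained so that each item streams through them in order.

```python
import asyncio
from typing import AsyncIterator, Callable, List

# A stage takes a stream of items and returns a transformed stream.
# This is an illustrative type, not a Pipecat type.
Stage = Callable[[AsyncIterator[str]], AsyncIterator[str]]


async def source(chunks: List[str]) -> AsyncIterator[str]:
    """Emit the input chunks one at a time (stands in for an audio transport)."""
    for chunk in chunks:
        yield chunk


async def fake_transcribe(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Pretend speech-to-text: normalise each chunk."""
    async for chunk in stream:
        yield chunk.strip().lower()


async def fake_reply(stream: AsyncIterator[str]) -> AsyncIterator[str]:
    """Pretend LLM: produce a response for each transcript."""
    async for text in stream:
        yield f"you said: {text}"


def compose(stream: AsyncIterator[str], stages: List[Stage]) -> AsyncIterator[str]:
    """Chain the stages so the output of one feeds the next."""
    for stage in stages:
        stream = stage(stream)
    return stream


async def run(chunks: List[str], stages: List[Stage]) -> List[str]:
    """Drive the composed stream to completion and collect its output."""
    return [item async for item in compose(source(chunks), stages)]


result = asyncio.run(run(["  Hello ", "WORLD"], [fake_transcribe, fake_reply]))
print(result)
```

Because every stage is an async generator, items are pulled through lazily, one at a time, rather than being batched between stages. Swapping or reordering stages only changes the list passed to `run`.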
🎬 See it in action
📱 Client SDKs
You can connect to Pipecat from any platform using our official SDKs:
| Platform | SDK Repo | Description |
|---|---|---|
| Web | pipecat-client-web | JavaScript and React client SDKs |
| iOS | pipecat-client-ios | Swift SDK for iOS |
| Android | pipecat-client-android | Kotlin SDK for Android |
| C++ | pipecat-client-cxx | C++ client SDK |
🧩 Available services
| Category | Services |
|---|---|
| Speech-to-Text | AssemblyAI, AWS, Azure, Cartesia, Deepgram, Fal Wizper, Gladia, Google, Groq (Whisper), OpenAI (Whisper), Parakeet (NVIDIA), SambaNova (Whisper), Speechmatics, Ultravox, Whisper |
| LLMs | Anthropic, AWS, Azure, Cerebras, DeepSeek, Fireworks AI, Gemini, Grok, Groq, NVIDIA NIM, Ollama, OpenAI, OpenRouter, Perplexity, Qwen, SambaNova, Together AI |
| Text-to-Speech | AWS, Azure, Cartesia, Deepgram, ElevenLabs, FastPitch (NVIDIA), Fish, Google, LMNT, MiniMax, Neuphonic, OpenAI, Piper, PlayHT, Rime, Sarvam, XTTS |
| Speech-to-Speech | AWS Nova Sonic, Gemini Multimodal Live, OpenAI Realtime |
| Transport | Daily (WebRTC), FastAPI Websocket, SmallWebRTCTransport, WebSocket Server, Local |
| Serializers | Plivo, Twilio, Telnyx |
| Video | Tavus, Simli |
| Memory | mem0 |
| Vision & Image | fal, Google Imagen, Moondream |
| Audio Processing | Silero VAD, Krisp, Koala, Noisereduce |
| Analytics & Metrics | OpenTelemetry, Sentry |
📚 View full services documentation →
⚡ Getting started
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready.
# Install the module
pip install pipecat-ai
# Set up your environment
cp dot-env.template .env
To keep things lightweight, only the core framework is included by default. If you need support for third-party AI services, you can add the necessary dependencies with:
pip install "pipecat-ai[option,...]"
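As a quick sanity check after installing, the sketch below uses only the Python standard library to report whether the `pipecat-ai` distribution is present and, if not, to print an install command in the extras format shown above. The helpers `installed_version` and `install_command` are hypothetical names introduced for illustration, and the extras in the example are taken from the development instructions in this README.

```python
from importlib import metadata
from typing import List, Optional


def installed_version(package: str = "pipecat-ai") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def install_command(extras: List[str], package: str = "pipecat-ai") -> str:
    """Build a pip command for the given extras (hypothetical helper)."""
    if not extras:
        return f"pip install {package}"
    joined = ",".join(extras)
    return f'pip install "{package}[{joined}]"'


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("pipecat-ai is not installed; try:")
        print(install_command(["daily", "openai", "silero"]))
    else:
        print(f"pipecat-ai {version} is installed")
```

This only checks the core package. It does not verify that the dependencies behind a given extra are importable.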
🧪 Code examples
- Foundational — small snippets that build on each other, introducing one or two concepts at a time
- Example apps — complete applications that you can use as starting points for development
🛠️ Hacking on the framework itself
1. Set up a virtual environment before following these instructions. From the root of the repo:

       python3 -m venv venv
       source venv/bin/activate

2. Install the development dependencies:

       pip install -r dev-requirements.txt

3. Install the git pre-commit hooks (these help ensure your code follows project rules):

       pre-commit install

4. Install the pipecat-ai package locally in editable mode:

       pip install -e .

   The -e or --editable option allows you to modify the code without reinstalling.

5. Include optional dependencies as needed. For example:

       pip install -e ".[daily,deepgram,cartesia,openai,silero]"

6. (Optional) If you want to use this package from another directory:

       pip install "path_to_this_repo[option,...]"
Running tests
Install the test dependencies:
pip install -r test-requirements.txt
From the root directory, run:
pytest
Setting up your editor
This project uses strict PEP 8 formatting via Ruff.
Emacs
You can use use-package to install the emacs-lazy-ruff package and configure the ruff arguments:
(use-package lazy-ruff
:ensure t
:hook ((python-mode . lazy-ruff-mode))
:config
(setq lazy-ruff-format-command "ruff format")
(setq lazy-ruff-check-command "ruff check --select I"))
Ruff was installed in the virtual environment set up earlier, so you can use pyvenv-auto to load that environment automatically inside Emacs.
(use-package pyvenv-auto
:ensure t
:defer t
:hook ((python-mode . pyvenv-auto-run)))
Visual Studio Code
Install the Ruff extension. Then edit the user settings (Ctrl-Shift-P, then Open User Settings (JSON)), set Ruff as the default Python formatter, and enable formatting on save:
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true
}
PyCharm
Ruff was installed in the virtual environment set up earlier. To enable autoformatting on save, go to File -> Settings -> Tools -> File Watchers and add a new watcher with the following settings:
- Name: Ruff formatter
- File type: Python
- Working directory: $ContentRoot$
- Arguments: format $FilePath$
- Program: $PyInterpreterDirectory$/ruff
🤝 Contributing
We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or adding new features, here's how you can help:
- Found a bug? Open an issue
- Have a feature idea? Start a discussion
- Want to contribute code? Check our CONTRIBUTING.md guide
- Documentation improvements? Docs PRs are always welcome
Before submitting a pull request, please check existing issues and PRs to avoid duplicates.
We aim to review all contributions promptly and provide constructive feedback to help get your changes merged.




