CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
Common Commands
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra krisp
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
uv run ruff format --check
# Update dependencies (after editing pyproject.toml)
uv lock && uv sync
Architecture
Frame-Based Pipeline Processing
All data flows as Frame objects through a pipeline of FrameProcessors:
Transport Input → Pipeline Source → [Processor1] → [Processor2] → ... → Pipeline Sink → Transport Output
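The flow above can be illustrated with a small, self-contained toy model. This is only a sketch: `ToyTextFrame`, `ToyProcessor`, `Uppercase`, `Collector`, and `link` are hypothetical names invented for this example and are not Pipecat's classes or API. It shows processors chained so that each one receives a frame and pushes its result to the next one.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTextFrame:
    """Toy frame carrying text (hypothetical, not Pipecat's frame class)."""

    text: str


class ToyProcessor:
    """Toy processing unit: receives a frame and pushes it to the next processor."""

    def __init__(self) -> None:
        self.next: Optional["ToyProcessor"] = None

    async def process_frame(self, frame: ToyTextFrame) -> None:
        # Default behavior: pass the frame through unchanged.
        await self.push_frame(frame)

    async def push_frame(self, frame: ToyTextFrame) -> None:
        if self.next is not None:
            await self.next.process_frame(frame)


class Uppercase(ToyProcessor):
    """Transforms the frame before pushing it on."""

    async def process_frame(self, frame: ToyTextFrame) -> None:
        await self.push_frame(ToyTextFrame(frame.text.upper()))


class Collector(ToyProcessor):
    """Stands in for the pipeline sink: records whatever reaches the end."""

    def __init__(self) -> None:
        super().__init__()
        self.received: List[str] = []

    async def process_frame(self, frame: ToyTextFrame) -> None:
        self.received.append(frame.text)


def link(processors: List[ToyProcessor]) -> ToyProcessor:
    """Chain processors so each pushes to the next; return the first one."""
    for current, following in zip(processors, processors[1:]):
        current.next = following
    return processors[0]


def run_demo() -> List[str]:
    sink = Collector()
    head = link([ToyProcessor(), Uppercase(), sink])
    asyncio.run(head.process_frame(ToyTextFrame("hello")))
    return sink.received
```

Calling `run_demo()` sends one frame through a pass-through processor, an uppercasing processor, and the collector at the end of the chain.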
Key components:
- Frames (src/pipecat/frames/frames.py): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- FrameProcessor (src/pipecat/processors/frame_processor.py): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- Pipeline (src/pipecat/pipeline/pipeline.py): Chains processors together.
- ParallelPipeline (src/pipecat/pipeline/parallel_pipeline.py): Runs multiple pipelines in parallel.
- Transports (src/pipecat/transports/): External I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via BaseTransport.
- Services (src/pipecat/services/): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: AIService, LLMService, STTService, TTSService, VisionService.
- Serializers (src/pipecat/serializers/): Convert frames to/from wire formats for WebSocket transports. FrameSerializer base class defines serialize() and deserialize(). Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- RTVI (src/pipecat/processors/frameworks/rtvi.py): Real-Time Voice Interface protocol bridging clients and the pipeline. RTVIProcessor handles incoming client messages (text input, audio, function call results). RTVIObserver converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
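To make the serializer role concrete, here is a minimal sketch of a serialize/deserialize round trip over a JSON wire format. It is a toy under stated assumptions: `ToyTextFrame` and `ToyJSONSerializer` are hypothetical names, the JSON message shape is invented for illustration, and the real FrameSerializer methods may be asynchronous or take different arguments.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTextFrame:
    """Toy frame carrying text (hypothetical, not Pipecat's frame class)."""

    text: str


class ToyJSONSerializer:
    """Toy serializer; the wire format here is an assumption for illustration."""

    def serialize(self, frame: ToyTextFrame) -> str:
        # Frame -> wire format (a JSON string in this sketch).
        return json.dumps({"type": "text", "text": frame.text})

    def deserialize(self, data: str) -> Optional[ToyTextFrame]:
        # Wire format -> frame. Unknown message types yield None.
        message = json.loads(data)
        if message.get("type") != "text":
            return None
        return ToyTextFrame(text=message["text"])
```

A provider-specific serializer would do the same two conversions, with the provider's message schema and audio encoding in place of this JSON shape.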
Important Patterns
- Context Aggregation: LLMContext accumulates messages for LLM calls; UserResponse aggregates user input.
- Turn Management: Turn management is done through LLMUserAggregator and LLMAssistantAggregator, created with LLMContextAggregatorPair.
- User turn strategies: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push UserStartedSpeakingFrame and UserStoppedSpeakingFrame respectively.
- Interruptions: Interruptions are usually triggered by a user turn start strategy (e.g. VADUserTurnStartStrategy), but they can be triggered by other processors as well, in which case the user turn start strategies don't need to trigger them. An InterruptionFrame carries an optional asyncio.Event that is set when the frame reaches the pipeline sink. If a processor stops an InterruptionFrame from propagating downstream (i.e., doesn't push it), it must call frame.complete() to avoid stalling push_interruption_task_frame_and_wait() callers.
- Uninterruptible Frames: These are frames that will not be removed from internal queues even if there's an interruption. For example, EndFrame and StopFrame.
- Events: Most classes in Pipecat have BaseObject as the very base class. BaseObject has support for events. Events can run in the background in an async task (default) or synchronously (sync=True) if we want immediate action. Synchronous event handlers need to execute quickly.
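The InterruptionFrame completion rule can be sketched with plain asyncio. This is a toy model, not Pipecat code: `ToyInterruptionFrame`, `swallowing_processor`, and `push_and_wait` are hypothetical names, and only the idea (an optional `asyncio.Event` that someone must set) comes from the description above. It shows why a processor that consumes the frame has to call `complete()`: otherwise the waiter never wakes up.

```python
import asyncio
from typing import Optional


class ToyInterruptionFrame:
    """Toy stand-in for a frame that carries an optional completion event."""

    def __init__(self, event: Optional[asyncio.Event] = None) -> None:
        self.event = event

    def complete(self) -> None:
        """Signal any waiter that the interruption has been fully handled."""
        if self.event is not None:
            self.event.set()


async def swallowing_processor(frame: ToyInterruptionFrame, call_complete: bool) -> None:
    """Consumes the frame instead of pushing it toward the sink."""
    if call_complete:
        # The frame will never reach the sink, so complete it here.
        frame.complete()


async def push_and_wait(call_complete: bool, timeout: float = 0.05) -> bool:
    """Push a toy interruption and wait for completion; False means it stalled."""
    event = asyncio.Event()
    frame = ToyInterruptionFrame(event)
    await swallowing_processor(frame, call_complete)
    try:
        await asyncio.wait_for(event.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

In this sketch the timeout only exists so the stalled case terminates; a caller without a timeout would wait indefinitely.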
Key Directories
| Directory | Purpose |
|---|---|
| src/pipecat/frames/ | Frame definitions (100+ types) |
| src/pipecat/processors/ | FrameProcessor base + aggregators, filters, audio |
| src/pipecat/pipeline/ | Pipeline orchestration |
| src/pipecat/services/ | AI service integrations (60+ providers) |
| src/pipecat/transports/ | Transport layer (Daily, LiveKit, WebSocket, Local) |
| src/pipecat/serializers/ | Frame serialization for WebSocket protocols |
| src/pipecat/audio/ | VAD, filters, mixers, turn detection, DTMF |
| src/pipecat/turns/ | User turn management |
Code Style
- Docstrings: Google-style. Classes describe purpose; __init__ has Args: section; dataclasses use Parameters: section.
- Linting: Ruff (line length 100). Pre-commit hooks enforce formatting.
- Type hints: Required for complex async code.
Docstring Example
class MyService(LLMService):
    """Description of what the service does.

    More detailed description.

    Event handlers available:

    - on_connected: Called when we are connected

    Example::

        @service.event_handler("on_connected")
        async def on_connected(service, frame):
            ...
    """

    def __init__(self, param1: str, **kwargs):
        """Initialize the service.

        Args:
            param1: Description of param1.
            **kwargs: Additional arguments passed to parent.
        """
        super().__init__(**kwargs)
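For dataclasses, the Code Style section asks for a Parameters: section instead of Args:. A minimal sketch of that convention follows; `MyServiceSettings` and its fields are hypothetical and exist only to show the docstring layout.

```python
from dataclasses import dataclass


@dataclass
class MyServiceSettings:
    """Settings for a hypothetical service.

    Parameters:
        model: Name of the model to use.
        temperature: Sampling temperature.
    """

    model: str
    temperature: float = 0.7
```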
Service Implementation
When adding a new service:
- Extend the appropriate base class (STTService, TTSService, LLMService, etc.)
- Implement required abstract methods
- Handle necessary frames
- By default, all frames should be pushed in the direction they came from
- Push ErrorFrame on failures
- Add metrics tracking via MetricsData if relevant
- Follow the pattern of existing services in src/pipecat/services/
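The frame-handling rules in this checklist can be sketched with a toy processor. Everything here is hypothetical and self-contained (`Direction`, `ToyFrame`, `ToyTextFrame`, `ToyErrorFrame`, `ToyShoutService`); it does not use Pipecat's base classes or signatures. It shows three behaviors: handling the frame type the service cares about, passing every other frame through in the direction it arrived from, and pushing an error frame on failure. Sending the error upstream is an assumption based on the note that upstream carries errors.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Direction(Enum):
    DOWNSTREAM = 1
    UPSTREAM = 2


@dataclass
class ToyFrame:
    """Toy base frame (hypothetical)."""


@dataclass
class ToyTextFrame(ToyFrame):
    text: str


@dataclass
class ToyErrorFrame(ToyFrame):
    error: str


class ToyShoutService:
    """Toy service: uppercases text frames, passes everything else through."""

    def __init__(self) -> None:
        # Records pushed frames so the behavior can be inspected.
        self.pushed: List[Tuple[ToyFrame, Direction]] = []

    async def push_frame(self, frame: ToyFrame, direction: Direction) -> None:
        self.pushed.append((frame, direction))

    async def process_frame(self, frame: ToyFrame, direction: Direction) -> None:
        if isinstance(frame, ToyTextFrame):
            try:
                result = ToyTextFrame(self._shout(frame.text))
                await self.push_frame(result, direction)
            except ValueError as exc:
                # On failure, report an error frame (upstream in this sketch).
                await self.push_frame(ToyErrorFrame(str(exc)), Direction.UPSTREAM)
        else:
            # Default: keep the frame moving in the direction it came from.
            await self.push_frame(frame, direction)

    def _shout(self, text: str) -> str:
        if not text:
            raise ValueError("empty text")
        return text.upper() + "!"
```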
Pull Requests
After creating a PR, use /changelog <pr_number> to generate the changelog file and /pr-description <pr_number> to update the PR description.