Paul Kompfner 4703df8686 fix: clear 8 more services from pyright ignore list
A fourth pass over low-error-count files. Drops 8 files (57 → 49) and
full-pyright errors from 525 → 496. Default pyright stays clean.

Optional access on transport/client receivers (4 files). Same fix
shape as #4359 — a receiver typed `X | None` accessed without a
guard. For "should never happen" cases (caller's lifecycle ensures
the field is non-None when the method runs), used `assert` rather
than silent early-return so an invariant violation surfaces loudly:

- `transports/whatsapp/client.py` (5 errors): `_validate_whatsapp_webhook_request`
  was typed `bytes` / `str` but called with `bytes | None` / `str | None`.
  Widened the helper signature and pushed the explicit None-check
  inside (matching its existing empty-string check). Also handled
  `pipecat_connection.get_answer()` returning `None` — would have
  crashed at `.get("sdp")` before.
- `transports/websocket/client.py` (5 errors): four are the deprecated
  `websockets.WebSocketClientProtocol` alias (same `# pyright: ignore[reportAttributeAccessIssue]`
  as the `services/websocket_service.py` fix from earlier in this PR).
  The fifth was `async for message in self._websocket` — traced the
  call chain and confirmed `_client_task` is created only after
  `self._websocket` is assigned and cancelled before it's cleared, so
  the field is never None when `_client_task_handler` runs. Used `assert`.
- `services/openai/stt.py` (4 errors): same pattern. `_receive_messages`
  is started by `_connect()` only when `self._websocket` is set, and
  the reconnect loop in `WebsocketService._receive_task_handler`
  re-establishes it before each retry. `assert` at entry. Plus L478/L483:
  the `try`/`except ModuleNotFoundError` import-guard makes
  `websocket_connect` and `State` `<type> | None`; `__init__` already
  raises `ImportError` if either is None, so an `assert` at the
  `_connect_websocket` use site is honest. Plus an L538 `Language | str`
  cast (same shape as last batch).
- `services/deepgram/flux/base.py` (2 errors): `event = data.get("event")`
  flowed into `_handle_turn_resumed(event: str)` as `Any | None`.
  Tightened with an `isinstance(event, str)` guard before the
  `FluxEventType(event)` lookup. The other error (`average_confidence > min_confidence`
  where `min_confidence: float | None`) was a latent crash on missing
  confidence data — restored the original `not min_confidence` (which
  treats both `None` and `0.0` as "no filter") and added an explicit
  drop-on-missing-confidence-data branch.
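
The three fix shapes in the list above can be sketched in isolation. This is a minimal illustration, not the code from this commit: `validate_webhook`, `Receiver`, and `keep_transcript` are hypothetical names, and the sketch assumes a transcript is kept when its confidence is above the threshold.

```python
from __future__ import annotations


def validate_webhook(body: bytes | None, signature: str | None) -> bool:
    """Hypothetical helper: widened to accept Optional inputs, None-check inside."""
    if body is None or signature is None:
        return False
    # Same treatment for empty values as for missing ones.
    if not body or not signature:
        return False
    return True


class Receiver:
    """Hypothetical client whose lifecycle sets `_socket` before `receive` runs."""

    def __init__(self) -> None:
        self._socket: list[str] | None = None

    def connect(self, messages: list[str]) -> None:
        self._socket = messages

    def receive(self) -> list[str]:
        # Invariant: connect() ran first. Fail loudly instead of returning early.
        assert self._socket is not None, "receive() called before connect()"
        return [message.upper() for message in self._socket]


def keep_transcript(average_confidence: float | None, min_confidence: float | None) -> bool:
    """Assumed semantics: keep when confidence is above the threshold."""
    # `not min_confidence` treats both None and 0.0 as "no filter".
    if not min_confidence:
        return True
    # A filter is active but there is no confidence data: drop.
    if average_confidence is None:
        return False
    return average_confidence > min_confidence
```

The `assert` narrows `X | None` to `X` for the type checker and raises `AssertionError` if the lifecycle invariant is ever broken, which is the "surfaces loudly" behavior described above.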

`gemini_live` Settings/InputParams (vertex). The deprecated `InputParams`
declares `modalities: GeminiModalities | None` and `media_resolution: GeminiMediaResolution | None`,
but their downstream usage at `services/google/gemini_live/llm.py:952,959`
calls `.value` on each — `None` would crash. Rather than touching the
deprecated input model, translate `None` to the canonical defaults
(`GeminiModalities.AUDIO`, `GeminiMediaResolution.UNSPECIFIED`) at the
assignment site in `vertex/llm.py`. Also fixed an unrelated annotation
bug: `_get_credentials` was annotated `-> str` but actually returns
`service_account.Credentials` (used correctly by the caller — only
the annotation was wrong).
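
A rough sketch of the None-to-default translation at the assignment site. The enums and dataclasses here are hypothetical stand-ins (the member values are invented for illustration and are not the real `GeminiModalities` / `GeminiMediaResolution` values); only the shape of the fix is taken from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Modalities(Enum):  # hypothetical stand-in for GeminiModalities
    AUDIO = "audio"
    TEXT = "text"


class MediaResolution(Enum):  # hypothetical stand-in for GeminiMediaResolution
    UNSPECIFIED = "unspecified"
    LOW = "low"


@dataclass
class LegacyParams:
    """Deprecated input model: both fields may be None and stay untouched."""

    modalities: Modalities | None = None
    media_resolution: MediaResolution | None = None


@dataclass
class Settings:
    """Downstream settings: non-Optional, so `.value` is always safe."""

    modalities: Modalities
    media_resolution: MediaResolution


def settings_from_params(params: LegacyParams) -> Settings:
    # Translate None to the defaults once, where the values are assigned.
    return Settings(
        modalities=(
            params.modalities if params.modalities is not None else Modalities.AUDIO
        ),
        media_resolution=(
            params.media_resolution
            if params.media_resolution is not None
            else MediaResolution.UNSPECIFIED
        ),
    )
```

Because the defaults are applied once at the boundary, code further down can call `.value` without its own None-checks.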

`moondream/vision.py` (3 errors). `frame.format` is `str | None` but
`Image.frombytes(mode, ...)` requires `str`; raise instead of crashing
on missing format. The other two errors are pyright thinking the
moondream2-custom `encode_image` and `query` methods are `Tensor`
(rather than callables) — those are provided by the model code via
`trust_remote_code=True` and aren't visible to pyright on the base
`AutoModelForCausalLM` type. Scoped `# pyright: ignore[reportCallIssue]`
on the two call sites.

`transports/base_output.py` (3 errors). Two are `self._mixer.mix(...)`
calls in `with_mixer`, a closure invoked only when `self._mixer` is
truthy at the call site — captured the mixer to a local variable
inside the closure with an `assert`, then used that. The third is the
PIL `frombytes(mode, ...)` shape: added a `frame.format is None`
early-return guard at the top of `resize_frame` so the main resize
logic reads cleanly.
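
The closure fix can be sketched as follows. `Mixer`, `Output`, and `make_handler` are hypothetical names, and the byte-reversing `mix` is a placeholder for real mixing; the point is only the capture-to-local-then-assert shape.

```python
from __future__ import annotations

from typing import Callable


class Mixer:
    """Hypothetical mixer; reversing the bytes stands in for real mixing."""

    def mix(self, audio: bytes) -> bytes:
        return audio[::-1]


class Output:
    def __init__(self, mixer: Mixer | None = None) -> None:
        self._mixer = mixer

    def make_handler(self) -> Callable[[bytes], bytes]:
        def with_mixer(audio: bytes) -> bytes:
            # The type checker does not carry the caller's truthiness check
            # into this closure, so capture the field and narrow it here.
            mixer = self._mixer
            assert mixer is not None, "with_mixer requires a mixer"
            return mixer.mix(audio)

        def without_mixer(audio: bytes) -> bytes:
            return audio

        # with_mixer is only selected when a mixer is present.
        return with_mixer if self._mixer else without_mixer
```

Narrowing a local variable also avoids re-reading `self._mixer` between the check and the call.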

`elevenlabs/tts.py` (4 errors). The payload-building dict at L1271
was typed `dict[str, str | dict[str, float | bool]]` — an aspirational
shape that matched only the first two assignments. Subsequent code
assigned `list[dict[...]]` (pronunciation locators) and bools, all
violating the annotation. Same pattern at L926 (the WebSocket-init
`msg`). Both widened to `dict[str, Any]`, which is the honest shape
for a JSON request payload and what similar code uses elsewhere.
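
A small sketch of why `dict[str, Any]` fits here. The keys in `build_payload` are hypothetical and do not reflect the actual ElevenLabs request schema; they only show values of mixed shape landing in one payload dict.

```python
from __future__ import annotations

import json
from typing import Any


def build_payload(
    text: str,
    locators: list[dict[str, str]] | None = None,
    flush: bool = False,
) -> dict[str, Any]:
    """Build a JSON-style request payload with heterogeneous values."""
    # Starts out matching a narrow annotation: str and dict[str, float | bool].
    payload: dict[str, Any] = {
        "text": text,
        "settings": {"speed": 1.0, "boost": True},
    }
    # Later assignments add a list of dicts and a bare bool, which a
    # dict[str, str | dict[str, float | bool]] annotation would reject.
    if locators:
        payload["locators"] = locators
    if flush:
        payload["flush"] = True
    return payload
```

For example, `json.dumps(build_payload("hi", flush=True))` serializes without any casts, since every value is already a plain JSON type.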

Files dropped from the ignore list (57 → 49):
services/deepgram/flux/base.py, services/elevenlabs/tts.py,
services/google/gemini_live/vertex/llm.py,
services/moondream/vision.py, services/openai/stt.py,
transports/base_output.py, transports/websocket/client.py,
transports/whatsapp/client.py.
2026-05-01 09:36:14 -04:00

pipecat

🎙️ Pipecat: Real-Time Voice & Multimodal AI Agents

Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.

Want to dive right in? Run pipecat init quickstart or follow the quickstart guide.

🚀 What You Can Build

  • Voice Assistants: natural, streaming conversations with AI
  • AI Companions: coaches, meeting assistants, characters
  • Multimodal Interfaces: voice, video, images, and more
  • Interactive Storytelling: creative tools with generative media
  • Business Agents: customer intake, support bots, guided flows
  • Complex Dialog Systems: design logic with structured conversations

🧠 Why Pipecat?

  • Voice-first: Integrates speech recognition, text-to-speech, and conversation handling
  • Pluggable: Supports many AI services and tools
  • Composable Pipelines: Build complex behavior from modular components
  • Real-Time: Ultra-low latency interaction with different transports (e.g. WebSockets or WebRTC)

🌐 Pipecat Ecosystem

🧩 Multi-agent systems

Need multiple AI agents working together? Pipecat Subagents lets you build distributed multi-agent systems where each agent runs its own pipeline and communicates through a shared message bus. Hand off conversations between specialists, dispatch background tasks, and scale agents across processes or machines.

📱 Client SDKs

Building client applications? You can connect to Pipecat from any platform using our official SDKs:

JavaScript | React | React Native | Swift | Kotlin | C++ | ESP32

🧭 Structured conversations

Looking to build structured conversations? Check out Pipecat Flows for managing complex conversational states and transitions.

🪄 Beautiful UIs

Want to build beautiful and engaging experiences? Check out the Voice UI Kit, a collection of components, hooks, and templates for building voice AI applications quickly.

🛠️ Create and deploy projects

Create a new project in under a minute with the Pipecat CLI. Then use the CLI to monitor and deploy your agent to production.

🔍 Debugging

Looking for help debugging your pipeline and processors? Check out Whisker, a real-time Pipecat debugger.

🖥️ Terminal

Love terminal applications? Check out Tail, a terminal dashboard for Pipecat.

🤖 Claude Code Skills

Use Pipecat Skills with Claude Code to scaffold projects, deploy to Pipecat Cloud, and more. Install the marketplace with:

claude plugin marketplace add pipecat-ai/skills

and install any of the available plugins.

🧩 Community Integrations

Build and share your own Pipecat service integrations! Browse existing community integrations or check out our guide to create your own.

📺 Pipecat TV Channel

Catch new features, interviews, and how-tos on our Pipecat TV channel.

🧩 Available services

Category | Services
Speech-to-Text | AssemblyAI, AWS, Azure, Cartesia, Deepgram, ElevenLabs, Fal Wizper, Gladia, Google, Gradium, Groq (Whisper), Mistral, NVIDIA, OpenAI (Whisper), Sarvam, Soniox, Speechmatics, Whisper, xAI
LLMs | Anthropic, AWS, Azure, Cerebras, DeepSeek, Fireworks AI, Gemini, Grok, Groq, Mistral, Nebius, Novita, NVIDIA NIM, Ollama, OpenAI, OpenAI Responses, OpenRouter, Perplexity, Qwen, SambaNova, Sarvam, Together AI
Text-to-Speech | Async, AWS, Azure, Camb AI, Cartesia, Deepgram, ElevenLabs, Fish, Google, Gradium, Groq, Hume, Inworld, Kokoro, LMNT, MiniMax, Mistral, Neuphonic, NVIDIA, OpenAI, Piper, Resemble, Rime, Sarvam, Smallest, Soniox, Speechmatics, xAI, XTTS
Speech-to-Speech | AWS Nova Sonic, Gemini Multimodal Live, Grok Voice Agent, OpenAI Realtime, Ultravox
Transport | Daily (WebRTC), FastAPI Websocket, LiveKit (WebRTC), SmallWebRTCTransport, WebSocket Server, WhatsApp, Local
Serializers | Exotel, Genesys, Plivo, Twilio, Telnyx, Vonage
Video | HeyGen, LemonSlice, Tavus, Simli
Memory | mem0
Vision & Image | fal, Google Imagen, Moondream
Audio Processing | Silero VAD, Krisp Viva, Koala, ai-coustics, RNNoise
Analytics & Metrics | OpenTelemetry, Sentry
Community | Browse community integrations →

📚 View full services documentation →

Getting started

You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you're ready.

  1. Install uv

    curl -LsSf https://astral.sh/uv/install.sh | sh
    

    Need help? Refer to the uv install documentation.

  2. Install the module

    # For new projects
    uv init my-pipecat-app
    cd my-pipecat-app
    uv add pipecat-ai
    
    # Or for existing projects
    uv add pipecat-ai
    
  3. Set up your environment

    cp env.example .env
    
  4. To keep things lightweight, only the core framework is included by default. If you need support for third-party AI services, you can add the necessary dependencies with:

    uv add "pipecat-ai[option,...]"
    

Using pip? You can still use pip install pipecat-ai and pip install "pipecat-ai[option,...]" to get set up.

🧪 Code examples

  • Foundational — small snippets that build on each other, introducing one or two concepts at a time
  • Example apps — complete applications that you can use as starting points for development

🛠️ Contributing to the framework

Prerequisites

Minimum Python Version: 3.11
Recommended Python Version: >= 3.12

Setup Steps

  1. Clone the repository and navigate to it:

    git clone https://github.com/pipecat-ai/pipecat.git
    cd pipecat
    
  2. Install development and testing dependencies:

    uv sync --group dev --all-extras \
      --no-extra gstreamer \
      --no-extra local
    
  3. Install the git pre-commit hooks:

    uv run pre-commit install
    

Note: Some extras (local, gstreamer) require system dependencies. See the documentation if you encounter build errors.

Claude Code Skills

Install development workflow skills for contributing to Pipecat with Claude Code:

claude plugin marketplace add pipecat-ai/pipecat
claude plugin install pipecat-dev@pipecat-dev-skills

Running tests

To run all tests, from the root directory:

uv run pytest

Run a specific test suite:

uv run pytest tests/test_name.py

🤝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or adding new features, here's how you can help:

  • Found a bug? Open an issue
  • Have a feature idea? Start a discussion
  • Want to contribute code? Check our CONTRIBUTING.md guide
  • Documentation improvements? Docs PRs are always welcome

Before submitting a pull request, please check existing issues and PRs to avoid duplicates.

We aim to review all contributions promptly and provide constructive feedback to help get your changes merged.

🛟 Getting help

➡️ Join our Discord

➡️ Read the docs

➡️ Reach us on X

Description: Open Source framework for voice and multimodal conversational AI
License: BSD-2-Clause
Languages: Python 100%