Cyril S. dfa4ac81fd Implement Sentry instrumentation for performance and error tracking (#470)

* feat: Add Sentry support in FrameProcessor

This update adds optional Sentry integration for performance tracking and error monitoring.

Key changes include:

- Add conditional Sentry import and initialization check
- Implement Sentry spans in FrameProcessorMetrics to measure TTFB (Time To First Byte) and processing time when Sentry is available
- Maintain existing metrics functionality with MetricsFrame regardless of Sentry availability

* feat: Enable metrics in DeepgramSTTService for Sentry

This commit enhances the DeepgramSTTService class to enable metrics generation for use with Sentry.

Key changes include:

1. Enable general metrics generation:
   - Implement `can_generate_metrics` method, returning True when VAD is enabled
   - This allows metrics to be collected and used by both Sentry and the metrics system in frame_processor.py

2. Integrate Sentry-compatible performance tracking:
   - Add start_ttfb_metrics and start_processing_metrics calls in the VAD speech detection handler
   - Implement stop_ttfb_metrics call when receiving transcripts
   - Add stop_processing_metrics for final transcripts

3. Enhance VAD support for metrics:
   - Add `vad_enabled` property to check VAD event availability
   - Implement VAD-based speech detection handler for precise metric timing

These changes enable detailed performance tracking via both Sentry and the general metrics system when VAD is active. This allows for better monitoring and analysis of the speech-to-text process, providing valuable insights through Sentry and any other metrics consumers in the pipeline.

* Update frame_processor.py

* Refactor to support flexible metrics implementation

- Modified the __init__ method to accept a metrics parameter that is either FrameProcessorMetrics or one of its subclasses
- Updated the metrics initialization to create an instance with the processor's name
- Moved all FrameProcessorMetrics-related logic to a new processors/metrics/base.py file

* Implement flexible metrics system with Sentry integration

1. Created a new metrics module in processors/metrics/

2. Implemented FrameProcessorMetrics base class in base.py

3. Implemented SentryMetrics class in sentry.py:
   - Inherits from FrameProcessorMetrics
   - Integrates with Sentry SDK for advanced metrics tracking
   - Implements Sentry-specific span creation and management for TTFB and processing metrics
   - Handles cases where Sentry is not available or initialized
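
As a rough illustration of the design described above, here is a minimal sketch of a metrics class whose subclass adds Sentry spans only when Sentry is installed and initialized. The class names `BaseMetrics` and `SentrySpanMetrics` are hypothetical stand-ins, not the actual Pipecat classes, and the methods are synchronous to keep the sketch short. It assumes a recent `sentry-sdk` that provides `is_initialized()` and falls back to "not available" otherwise.

```python
import time

try:
    import sentry_sdk
    # is_initialized() exists in recent sentry-sdk releases; treat older ones as unavailable.
    SENTRY_AVAILABLE = getattr(sentry_sdk, "is_initialized", lambda: False)()
except ImportError:
    sentry_sdk = None
    SENTRY_AVAILABLE = False


class BaseMetrics:
    """Hypothetical stand-in for a metrics base class: times TTFB locally."""

    def __init__(self, name):
        self.name = name
        self._ttfb_start = None
        self.last_ttfb = None

    def start_ttfb_metrics(self):
        self._ttfb_start = time.monotonic()

    def stop_ttfb_metrics(self):
        # Nothing to report if the timer was never started.
        if self._ttfb_start is None:
            return None
        self.last_ttfb = time.monotonic() - self._ttfb_start
        self._ttfb_start = None
        return self.last_ttfb


class SentrySpanMetrics(BaseMetrics):
    """Wraps the same interval in a Sentry span when Sentry is usable."""

    def __init__(self, name):
        super().__init__(name)
        self._span = None

    def start_ttfb_metrics(self):
        super().start_ttfb_metrics()
        if SENTRY_AVAILABLE:
            self._span = sentry_sdk.start_span(op="ttfb")

    def stop_ttfb_metrics(self):
        if self._span is not None:
            self._span.finish()
            self._span = None
        # The local measurement is kept regardless of Sentry availability.
        return super().stop_ttfb_metrics()
```

Because the Sentry calls are guarded, the subclass behaves exactly like the base class when Sentry is missing or uninitialized.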

2024-09-23 08:44:14 -07:00

.github/workflows

get the test infrastructure running again

2024-09-19 20:58:17 -04:00

docs

Delete CNAME

2024-05-13 14:42:46 -05:00

examples

Add voice_settings and optimize_streaming_latency to ElevenLabs

2024-09-22 13:58:50 -04:00

src/pipecat

Implement Sentry instrumentation for performance and error tracking (#470)

2024-09-23 08:44:14 -07:00

tests

get the test infrastructure running again

2024-09-19 20:58:17 -04:00

.dockerignore

Modularize tricky dependencies (#95)

2024-04-03 10:48:11 -05:00

.gitignore

pyproject: don't use local version for test pypi

2024-04-05 07:51:52 -07:00

CHANGELOG.md

Merge pull request #484 from pipecat-ai/mb/llm-input-params

2024-09-20 20:35:49 -04:00

CHANGELOG.md.template

add CHANGELOG.md

2024-05-14 13:45:01 -07:00

dev-requirements.txt

update pyproject.toml and remove requirements files

2024-08-16 09:28:46 -07:00

Dockerfile

Modularize tricky dependencies (#95)

2024-04-03 10:48:11 -05:00

dot-env.template

LMNT TTS

2024-08-22 00:47:41 +00:00

LICENSE

frames: generate protobuf pb2 file for pipecat package

2024-05-31 11:36:52 -07:00

pipecat.png

renamed image.png to pipecat.png

2024-05-12 17:44:10 -07:00

pyproject.toml

Merge pull request #469 from pipecat-ai/lewis/remove_torch_dependency

2024-09-23 09:59:40 -04:00

README.md

get the test infrastructure running again

2024-09-19 20:58:17 -04:00

test-requirements.txt

get the test infrastructure running again

2024-09-19 20:58:17 -04:00

README.md

Pipecat

Pipecat is a framework for building voice (and multimodal) conversational agents: things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, intake flows, and snarky social companions.

Take a look at some example apps:

Getting started with voice agents

You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.

# install the module
pip install pipecat-ai

# set up an .env file with API keys
cp dot-env.template .env
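
Once the keys are in place, it can help to fail fast when one is missing. Here is a minimal sketch, assuming the values from `.env` have already been exported into the process environment (by your shell or a loader of your choice). The helper `require_env` and the key names `DAILY_API_KEY` and `CARTESIA_API_KEY` are illustrative assumptions, not part of Pipecat.

```python
import os


def require_env(names, environ=None):
    """Return the requested variables, raising if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [n for n in names if not environ.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {n: environ[n] for n in names}


# Example usage (key names are assumptions for illustration):
# keys = require_env(["DAILY_API_KEY", "CARTESIA_API_KEY"])
```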

By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:

pip install "pipecat-ai[option,...]"

Your project may or may not need these, so they're made available as optional requirements. Here is a list:

AI services: anthropic, azure, deepgram, gladia, google, fal, lmnt, moondream, openai, openpipe, playht, silero, whisper, xtts
Transports: local, websocket, daily

Code examples

foundational — small snippets that build on each other, introducing one or two concepts at a time
example apps — complete applications that you can use as starting points for development

A simple voice agent running locally

Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use Daily for real-time media transport, and Cartesia for text-to-speech.

# app.py

import asyncio
import aiohttp

from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def main():
    async with aiohttp.ClientSession() as session:
        # Use Daily as a real-time media transport (WebRTC)
        transport = DailyTransport(
            room_url=...,
            token=...,
            bot_name="Bot Name",
            params=DailyParams(audio_out_enabled=True))

        # Use Cartesia for text-to-speech
        tts = CartesiaTTSService(
            api_key=...,
            voice_id=...
        )

        # Simple pipeline that converts text to speech and outputs the result
        pipeline = Pipeline([tts, transport.output()])

        # Create a runner that can run one or more pipeline tasks
        runner = PipelineRunner()

        # Create the task that runs the pipeline
        task = PipelineTask(pipeline)

        # Register an event handler to play audio when a
        # participant joins the transport WebRTC session
        @transport.event_handler("on_participant_joined")
        async def on_new_participant_joined(transport, participant):
            participant_name = participant["info"]["userName"] or ''
            # Queue a TextFrame that will get spoken by the TTS service (Cartesia)
            await task.queue_frames([TextFrame(f"Hello there, {participant_name}!"), EndFrame()])

        # Run the pipeline task
        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

Run it with:

python app.py

Daily provides a prebuilt WebRTC user interface. While the app is running, you can visit https://<yourdomain>.daily.co/<room_url> and listen to the bot say hello!
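
To picture how frames move through the pipeline in app.py, here is a toy model in plain asyncio. It is only a conceptual sketch: the `Toy*` classes and `run_toy_pipeline` are hypothetical and do not reflect how Pipecat is implemented. Each frame passes through the processors in order, and an end frame stops the run.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyTextFrame:
    text: str


class ToyEndFrame:
    """Signals that no more frames will follow."""


class ToyTTS:
    """Pretends to synthesize speech by tagging the text."""

    async def process(self, frame):
        if isinstance(frame, ToyTextFrame):
            return ToyTextFrame(f"<audio for: {frame.text}>")
        return frame


class ToyOutput:
    """Collects whatever reaches the end of the pipeline."""

    def __init__(self):
        self.played = []

    async def process(self, frame):
        if isinstance(frame, ToyTextFrame):
            self.played.append(frame.text)
        return frame


async def run_toy_pipeline(processors, frames):
    """Push each frame through every processor in order; stop at the end frame."""
    for frame in frames:
        for processor in processors:
            frame = await processor.process(frame)
        if isinstance(frame, ToyEndFrame):
            break


def demo():
    out = ToyOutput()
    frames = [ToyTextFrame("Hello there, Ada!"), ToyEndFrame(), ToyTextFrame("never spoken")]
    asyncio.run(run_toy_pipeline([ToyTTS(), out], frames))
    return out.played
```

In this toy, `demo()` collects only the greeting, because the frame queued after the end frame is never processed.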

WebRTC for production use

WebSockets are fine for server-to-server communication or for initial development. But for production use, you’ll need client-server audio to use a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see this post.)

One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.

What is VAD?

Voice Activity Detection (VAD) is very important for knowing when a user has finished speaking to your bot. If you are not using press-to-talk and want Pipecat to detect when the user has finished talking, VAD is an essential component for a natural-feeling conversation.

Pipecat makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.

pip install "pipecat-ai[silero]"

The first time you run your bot with Silero, startup may take a while as it downloads and caches the model in the background. You can check the progress of this in the console.
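
To make the idea of end-of-speech detection concrete, here is a deliberately naive energy-threshold sketch. It is not how WebRTC VAD or Silero VAD work internally; the function names, the threshold of 500, and the three-frame silence window are all assumptions chosen for illustration.

```python
import math


def frame_rms(samples):
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_end_of_speech(frames, threshold=500.0, silence_frames=3):
    """Return the index of the frame at which speech is judged to have ended, or None.

    Speech "ends" once at least one loud frame has been seen and is followed by
    `silence_frames` consecutive quiet frames.
    """
    speaking = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            speaking = True
            quiet_run = 0
        elif speaking:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i
    return None
```

A real VAD is far more robust to background noise, but the shape is the same: track when the user starts speaking, then wait for a sustained pause before treating the turn as finished.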

Hacking on the framework itself

Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:

python3 -m venv venv
source venv/bin/activate

From the root of this repo, run the following:

pip install -r dev-requirements.txt
python -m build

This builds the package. To use the package locally (e.g. to run sample files), run

pip install --editable ".[option,...]"

If you want to use this package from another directory, you can run:

pip install "path_to_this_repo[option,...]"

Running tests

From the root directory, run:

pytest --doctest-modules --ignore-glob="*to_be_updated*" --ignore-glob="*pipeline_source*" src tests
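
Because the command uses `--doctest-modules`, pytest also collects examples embedded in docstrings under `src`. Here is a minimal sketch of what such a doctest looks like; the function `clamp_volume` is hypothetical and not part of Pipecat.

```python
def clamp_volume(level):
    """Clamp a volume level to the range 0.0 to 1.0.

    >>> clamp_volume(1.5)
    1.0
    >>> clamp_volume(-0.2)
    0.0
    >>> clamp_volume(0.25)
    0.25
    """
    return max(0.0, min(1.0, float(level)))
```

Each `>>>` line is executed and its result compared with the line that follows, so the docstring doubles as a test.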

Setting up your editor

This project uses strict PEP 8 formatting.

Emacs

You can use use-package to install the py-autopep8 package and configure autopep8 arguments:

(use-package py-autopep8
  :ensure t
  :defer t
  :hook ((python-mode . py-autopep8-mode))
  :config
  (setq py-autopep8-options '("-a" "-a" "--max-line-length=100")))

autopep8 was installed in the venv environment described before, so you should be able to use pyvenv-auto to automatically load that environment inside Emacs.

(use-package pyvenv-auto
  :ensure t
  :defer t
  :hook ((python-mode . pyvenv-auto-run)))

Visual Studio Code

Install the autopep8 extension. Then edit the user settings (Ctrl-Shift-P Open User Settings (JSON)) and set it as the default Python formatter, enable formatting on save and configure autopep8 arguments:

"[python]": {
    "editor.defaultFormatter": "ms-python.autopep8",
    "editor.formatOnSave": true
},
"autopep8.args": [
    "-a",
    "-a",
    "--max-line-length=100"
],

Getting help

➡️ Join our Discord

➡️ Reach us on X
