diff --git a/.github/workflows/tests.yaml b/.github/workflows/tests.yaml
index b806efad4..bd6961dd5 100644
--- a/.github/workflows/tests.yaml
+++ b/.github/workflows/tests.yaml
@@ -1,4 +1,4 @@
-name: test
+name: tests
on:
workflow_dispatch:
@@ -49,4 +49,4 @@ jobs:
- name: Test with pytest
run: |
source .venv/bin/activate
- pytest --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests
+ pytest
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 000000000..6906b571b
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,7 @@
+repos:
+ - repo: local
+ hooks:
+ - id: ruff-format-hook
+ name: Check ruff formatting
+ entry: sh scripts/pre-commit.sh
+ language: system
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 80c7c0c7d..6811f08e9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,9 +9,134 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
+- Added a `metadata` field to `Frame` which makes it possible to pass custom
+ data to all frames.
+
+- Added `test/utils.py` inside of pipecat package.
+
+### Fixed
+
+- Fixed an issue where `ElevenLabsTTSService` messages would return a 1009
+ websocket error by increasing the max message size limit to 16MB.
+
+- Fixed a `DailyTransport` issue that would cause events to be triggered before
+ join finished.
+
+- Fixed a `PipelineTask` issue that was preventing processors to be cleaned up
+ after cancelling the task.
+
+- Fixed an issue where queuing a `CancelFrame` to a pipeline task would not
+ cause the task to finish. However, using `PipelineTask.cancel()` is still the
+ recommended way to cancel a task.
+
+### Other
+
+- Updated all examples to use `task.cancel()` instead of pushing an `EndFrame`
+ when a participant leaves/disconnects. If you push an `EndFrame` this will
+ cause the bot to run through everything that is internally queued (which could
+ take seconds). Instead, if a participant disconnects there is nothing else to
+ be sent and therefore we should stop immediately.
+
+## [0.0.54] - 2025-01-27
+
+### Added
+
+- In order to create tasks in Pipecat frame processors it is now recommended to
+ use `FrameProcessor.create_task()` (which uses the new
+ `utils.asyncio.create_task()`). It takes care of uncaught exceptions, task
+ cancellation handling and task management. To cancel or wait for a task there
+ is `FrameProcessor.cancel_task()` and `FrameProcessor.wait_for_task()`. All of
+ Pipecat processors have been updated accordingly. Also, when a pipeline runner
+ finishes, a warning about dangling tasks might appear, which indicates if any
+ of the created tasks was never cancelled or awaited for (using these new
+ functions).
+
+- It is now possible to specify the period of the `PipelineTask` heartbeat
+ frames with `heartbeats_period_secs`.
+
+- Added `DailyMeetingTokenProperties` and `DailyMeetingTokenParams` Pydantic models
+ for meeting token creation in `get_token` method of `DailyRESTHelper`.
+
+- Added `enable_recording` and `geo` parameters to `DailyRoomProperties`.
+
+- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings to a custom AWS bucket.
+
+### Changed
+
+- Enhanced `UserIdleProcessor` with retry functionality and control over idle
+ monitoring via new callback signature `(processor, retry_count) -> bool`.
+ Updated the `17-detect-user-idle.py` to show how to use the `retry_count`.
+
+- Add defensive error handling for `OpenAIRealtimeBetaLLMService`'s audio
+ truncation. Audio truncation errors during interruptions now log a warning
+ and allow the session to continue instead of throwing an exception.
+
+- Modified `TranscriptProcessor` to use TTS text frames for more accurate assistant
+ transcripts. Assistant messages are now aggregated based on bot speaking boundaries
+ rather than LLM context, providing better handling of interruptions and partial
+ utterances.
+
+- Updated foundational examples `28a-transcription-processor-openai.py`,
+ `28b-transcript-processor-anthropic.py`, and
+ `28c-transcription-processor-gemini.py` to use the updated
+ `TranscriptProcessor`.
+
+### Fixed
+
+- Fixed an `GeminiMultimodalLiveLLMService` issue that was preventing the user
+ to push initial LLM assistant messages (using `LLMMessagesAppendFrame`).
+
+- Added missing `FrameProcessor.cleanup()` calls to `Pipeline`,
+ `ParallelPipeline` and `UserIdleProcessor`.
+
+- Fixed a type error when using `voice_settings` in `ElevenLabsHttpTTSService`.
+
+- Fixed an issue where `OpenAIRealtimeBetaLLMService` function calling resulted
+ in an error.
+
+- Fixed an issue in `AudioBufferProcessor` where the last audio buffer was not
+ being processed, in cases where the `_user_audio_buffer` was smaller than the
+ buffer size.
+
+### Performance
+
+- Replaced audio resampling library `resampy` with `soxr`. Resampling a 2:21s
+ audio file from 24KHz to 16KHz took 1.41s with `resampy` and 0.031s with
+ `soxr` with similar audio quality.
+
+### Other
+
+- Added initial unit test infrastructure.
+
+## [0.0.53] - 2025-01-18
+
+### Added
+
+- Added `ElevenLabsHttpTTSService` which uses EleveLabs' HTTP API instead of the
+ websocket one.
+
+- Introduced pipeline frame observers. Observers can view all the frames that go
+ through the pipeline without the need to inject processors in the
+ pipeline. This can be useful, for example, to implement frame loggers or
+ debuggers among other things. The example
+ `examples/foundational/30-observer.py` shows how to add an observer to a
+ pipeline for debugging.
+
+- Introduced heartbeat frames. The pipeline task can now push periodic
+ heartbeats down the pipeline when `enable_heartbeats=True`. Heartbeats are
+ system frames that are supposed to make it all the way to the end of the
+ pipeline. When a heartbeat frame is received the traversing time (i.e. the
+ time it took to go through the whole pipeline) will be displayed (with TRACE
+ logging) otherwise a warning will be shown. The example
+ `examples/foundational/31-heartbeats.py` shows how to enable heartbeats and
+ forces warnings to be displayed.
+
+- Added `LLMTextFrame` and `TTSTextFrame` which should be pushed by LLM and TTS
+ services respectively instead of `TextFrame`s.
+
- Added `OpenRouter` for OpenRouter integration with an OpenAI-compatible
interface. Added foundational example `14m-function-calling-openrouter.py`.
-
+
- Added a new `WebsocketService` based class for TTS services, containing
base functions and retry logic.
@@ -48,6 +173,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed
+- Modified `UserIdleProcessor` to start monitoring only after first
+ conversation activity (`UserStartedSpeakingFrame` or
+ `BotStartedSpeakingFrame`) instead of immediately.
+
- Modified `OpenAIAssistantContextAggregator` to support controlled completions
and to emit context update callbacks via `FunctionCallResultProperties`.
@@ -71,6 +200,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Fixed
+- Fixed an issue where `DeepgramSTTService` was not generating metrics using
+ pipeline's VAD.
+
+- Fixed `UserIdleProcessor` not properly propagating `EndFrame`s through the
+ pipeline.
+
- Fixed an issue where websocket based TTS services could incorrectly terminate
their connection due to a retry counter not resetting.
@@ -87,6 +222,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Fixed an issue where setting the voice and model for `RimeHttpTTSService`
wasn't working.
+- Fixed an issue where `IdleFrameProcessor` and `UserIdleProcessor` were getting
+ initialized before the start of the pipeline.
+
## [0.0.52] - 2024-12-24
### Added
diff --git a/README.md b/README.md
index ae7a91538..5a915041d 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
-[](https://pypi.org/project/pipecat-ai) [](https://docs.pipecat.ai) [](https://discord.gg/pipecat)
+[](https://pypi.org/project/pipecat-ai)  [](https://docs.pipecat.ai) [](https://discord.gg/pipecat)
Pipecat is an open source Python framework for building voice and multimodal conversational agents. It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, letting you focus on creating engaging experiences.
@@ -53,7 +53,7 @@ To keep things lightweight, only the core framework is included by default. If y
pip install "pipecat-ai[option,...]"
```
-Available options include:
+### Available services
| Category | Services | Install Command Example |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- |
@@ -81,7 +81,7 @@ Here is a very basic Pipecat bot that greets a user when they join a real-time s
```python
import asyncio
-from pipecat.frames.frames import EndFrame, TextFrame
+from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
@@ -122,7 +122,7 @@ async def main():
# Register an event handler to exit the application when the user leaves.
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
- await task.queue_frame(EndFrame())
+ await task.cancel()
# Run the pipeline task
await runner.run(task)
@@ -160,15 +160,24 @@ From the root of this repo, run the following:
```shell
pip install -r dev-requirements.txt
-python -m build
```
-This builds the package. To use the package locally (e.g. to run sample files), run
+This will install the necessary development dependencies. Also, make sure you install the git pre-commit hooks:
+
+```shell
+pre-commit install
+```
+
+The hooks will just save you time when you submit a PR by making sure your code follows the project rules.
+
+To use the package locally (e.g. to run sample files), run:
```shell
pip install --editable ".[option,...]"
```
+The `--editable` option makes sure you don't have to run `pip install` again and you can just edit the project files locally.
+
If you want to use this package from another directory, you can run:
```shell
@@ -180,7 +189,7 @@ pip install "path_to_this_repo[option,...]"
From the root directory, run:
```shell
-pytest --doctest-modules --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests
+pytest
```
## Setting up your editor
diff --git a/dev-requirements.txt b/dev-requirements.txt
index 0ed6d9b05..e3f52f9cd 100644
--- a/dev-requirements.txt
+++ b/dev-requirements.txt
@@ -1,9 +1,11 @@
build~=1.2.2
-grpcio-tools~=1.68.1
+grpcio-tools~=1.69.0
pip-tools~=7.4.1
-pyright~=1.1.390
+pre-commit~=4.0.1
+pyright~=1.1.392
pytest~=8.3.4
-ruff~=0.8.3
-setuptools~=75.6.0
+pytest-asyncio~=0.25.2
+ruff~=0.9.1
+setuptools~=75.8.0
setuptools_scm~=8.1.0
python-dotenv~=1.0.1
diff --git a/examples/README.md b/examples/README.md
index 0bbdafe97..01150b450 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -39,10 +39,10 @@ Next, follow the steps in the README for each demo.
| [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
| [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, ElevenLabs, OpenAI, Moondream, Daily, Daily Prebuilt UI |
| [Patient intake](patient-intake) | A chatbot that can call functions in response to user input. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
-| [Dialin Chatbot](dialin-chatbot) | A chatbot that connects to an incoming phone call from Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
+| [Phone Chatbot](phone-chatbot) | A chatbot that connects to PSTN/SIP phone calls, powered by Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [Twilio Chatbot](twilio-chatbot) | A chatbot that connects to an incoming phone call from Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
| [studypal](studypal) | A chatbot to have a conversation about any article on the web | |
-| [WebSocket Chatbot Server](websocket-server) | A real-time websocket server that handles audio streaming and bot interactions with speech-to-text and text-to-speech capabilities | `python-websockets`, `openai`, `deepgram`, `silero-tts`, `numpy` |
+| [WebSocket Chatbot Server](websocket-server) | A real-time websocket server that handles audio streaming and bot interactions with speech-to-text and text-to-speech capabilities. | Cartesia, Deepgram, OpenAI, Websockets |
> [!IMPORTANT]
> These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.
diff --git a/examples/canonical-metrics/bot.py b/examples/canonical-metrics/bot.py
index c0a7fb6ec..945792b25 100644
--- a/examples/canonical-metrics/bot.py
+++ b/examples/canonical-metrics/bot.py
@@ -97,7 +97,7 @@ async def main():
call completion, CanonicalMetrics will send the audio buffer to Canonical for
analysis. Visit https://voice.canonical.chat to learn more.
"""
- audio_buffer_processor = AudioBufferProcessor()
+ audio_buffer_processor = AudioBufferProcessor(num_channels=2)
canonical = CanonicalMetricsService(
audio_buffer_processor=audio_buffer_processor,
aiohttp_session=session,
@@ -105,6 +105,7 @@ async def main():
call_id=str(uuid.uuid4()),
assistant="pipecat-chatbot",
assistant_speaks_first=True,
+ context=context,
)
pipeline = Pipeline(
[
@@ -129,11 +130,13 @@ async def main():
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
- await task.queue_frame(EndFrame())
+ await task.cancel()
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if state == "left":
+ # Here we don't want to cancel, we just want to finish sending
+ # whatever is queued, so we use an EndFrame().
await task.queue_frame(EndFrame())
runner = PipelineRunner()
diff --git a/examples/canonical-metrics/runner.py b/examples/canonical-metrics/runner.py
index 50743fd09..ad39a3ac4 100644
--- a/examples/canonical-metrics/runner.py
+++ b/examples/canonical-metrics/runner.py
@@ -53,4 +53,3 @@ async def configure(aiohttp_session: aiohttp.ClientSession):
token = await daily_rest_helper.get_token(url, expiry_time)
return (url, token)
- return (url, token)
diff --git a/examples/chatbot-audio-recording/bot.py b/examples/chatbot-audio-recording/bot.py
index 2614fcf6b..7cf750c62 100644
--- a/examples/chatbot-audio-recording/bot.py
+++ b/examples/chatbot-audio-recording/bot.py
@@ -18,7 +18,6 @@ from loguru import logger
from runner import configure
from pipecat.audio.vad.silero import SileroVADAnalyzer
-from pipecat.frames.frames import EndFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -139,7 +138,7 @@ async def main():
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
print(f"Participant left: {participant}")
- await task.queue_frame(EndFrame())
+ await task.cancel()
runner = PipelineRunner()
diff --git a/examples/deployment/flyio-example/bot.py b/examples/deployment/flyio-example/bot.py
index 57f64889e..b74e01231 100644
--- a/examples/deployment/flyio-example/bot.py
+++ b/examples/deployment/flyio-example/bot.py
@@ -79,11 +79,13 @@ async def main(room_url: str, token: str):
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
- await task.queue_frame(EndFrame())
+ await task.cancel()
@transport.event_handler("on_call_state_updated")
async def on_call_state_updated(transport, state):
if state == "left":
+ # Here we don't want to cancel, we just want to finish sending
+ # whatever is queued, so we use an EndFrame().
await task.queue_frame(EndFrame())
runner = PipelineRunner()
diff --git a/examples/deployment/modal-example/bot.py b/examples/deployment/modal-example/bot.py
index 39d7c9573..7855caeb8 100644
--- a/examples/deployment/modal-example/bot.py
+++ b/examples/deployment/modal-example/bot.py
@@ -5,6 +5,15 @@ import sys
from dotenv import load_dotenv
from loguru import logger
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.services.cartesia import CartesiaTTSService
+from pipecat.services.openai import OpenAILLMService
+from pipecat.transports.services.daily import DailyParams, DailyTransport
+
load_dotenv(override=True)
logger.remove(0)
@@ -12,16 +21,6 @@ logger.add(sys.stderr, level="DEBUG")
async def main(room_url: str, token: str):
- from pipecat.audio.vad.silero import SileroVADAnalyzer
- from pipecat.frames.frames import EndFrame
- from pipecat.pipeline.pipeline import Pipeline
- from pipecat.pipeline.runner import PipelineRunner
- from pipecat.pipeline.task import PipelineParams, PipelineTask
- from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
- from pipecat.services.cartesia import CartesiaTTSService
- from pipecat.services.openai import OpenAILLMService
- from pipecat.transports.services.daily import DailyParams, DailyTransport
-
transport = DailyTransport(
room_url,
token,
@@ -79,7 +78,7 @@ async def main(room_url: str, token: str):
@transport.event_handler("on_participant_left")
async def on_participant_left(transport, participant, reason):
- await task.queue_frame(EndFrame())
+ await task.cancel()
runner = PipelineRunner()
diff --git a/examples/dialin-chatbot/README.md b/examples/dialin-chatbot/README.md
deleted file mode 100644
index 3af299e8b..000000000
--- a/examples/dialin-chatbot/README.md
+++ /dev/null
@@ -1,85 +0,0 @@
-
-
+