Compare commits

..

3 Commits

Author SHA1 Message Date
Paul Kompfner
9a8cd5cee5 refactor(async-tool-messages): replace reminder grafting with caller-supplied template
Empirical testing showed the previous design — grafting a verbose
re-invocation reminder into the payload's `description` field for
started and intermediate messages — was actually making Nova Sonic
*worse*, not better: more spurious re-invocations of the same tool,
not fewer. Plausibly the long, instruction-shaped description text
reads as content the model has to respond to, where a terse status
update reads as ambient state.

Replace the reminder grafting with a caller-supplied `template`
keyword argument on `prepare_message_payload_for_realtime`. When
`None` (the default), the payload is serialized to its canonical
JSON form. When provided, `template.format(tool_call_id=…, status=…,
result=…, description=…)` is applied. The template is honored across
all kinds, so callers route per kind based on which wire channel
they're using.

Nova Sonic now defines its own bracketed plain-text template
(`_ASYNC_TOOL_RESULT_TEXT_TEMPLATE`) and applies it on the
cross-modal user-text channel (intermediate / final). The started
path stays on raw JSON (the formal AWS tool-result channel requires
valid JSON). A code comment at the template constant captures the
empirical finding for the next person — short framing yields much
better behavior, surprising as it sounds.

Tests updated for the new template behavior across all kinds. Also
reverts a stream-tool example sleep-duration tweak (20s → 10s) and
adds a commented-out alternative in the function-calling-openai-async-stream
example for parallel testing.
2026-05-06 16:50:56 -04:00
Paul Kompfner
1bb0dc1d4f refactor(async-tool-messages): unify on AsyncToolMessagePayload, fix JSON shape on the wire
Reshape the helper module so AsyncToolMessagePayload is the canonical
in-memory form and the on-the-wire JSON is always derived from it
(never stored). This eliminates a drift risk that came with caching
the JSON in raw_content, and it lets prepare_message_payload_for_realtime
edit the payload (graft the re-invocation reminder into 'description')
and then serialize cleanly — which fixes a 'Tool Response parsing error'
from AWS Nova Sonic that was caused by wrapping the JSON with extra
prose.

Other changes:

- Builders construct an AsyncToolMessagePayload internally and convert
  via shared private _payload_to_message and _payload_to_json helpers
  (centralizing field-omission rules, e.g. no 'result' on 'started').
- prepare_message_payload_for_realtime replaces format_text_for_provider,
  dispatching to per-kind helpers. Reminder is now appended after the
  canonical description so the model reads the protocol explanation
  first and the directive flows from it.
- Final-result payloads are pass-through; the task is done at that
  point and re-invocation is no longer a mistake.
- Stream-tool example: lengthen intermediate sleeps 10s → 20s for more
  interesting empirical testing.
2026-05-06 13:55:56 -04:00
Paul Kompfner
7d3726a74b feat: support new async-tool mechanism in AWS Nova Sonic
Add support to AWSNovaSonicLLMService for the new "async tool call"
mechanism activated by `cancel_on_interruption=False`, which includes:

- delivering results asynchronously
- delivering result streams
- cancelling running async tools

Note that the introduction of the new mechanism had actually caused a
regression in AWS Nova Sonic, which previously supported
`cancel_on_interruption=False` with the old mechanism (simply avoiding
discarding tool calls on interruptions).

Support for the other major realtime services (`GeminiLiveLLMService`,
`OpenAIRealtimeLLMService`) will follow in a separate PR — Gemini Live
in particular needs more work before it can support long-running tool
calls reliably.
2026-05-06 11:36:15 -04:00
195 changed files with 2491 additions and 20846 deletions

View File

@@ -1 +0,0 @@
../../.claude/skills/changelog

View File

@@ -1 +0,0 @@
../../.claude/skills/cleanup

View File

@@ -1 +0,0 @@
../../.claude/skills/code-review

View File

@@ -1 +0,0 @@
../../.claude/skills/docstring

View File

@@ -1 +0,0 @@
../../.claude/skills/pr-description

View File

@@ -1 +0,0 @@
../../.claude/skills/pr-submit

View File

@@ -1 +0,0 @@
../../.claude/skills/update-docs

View File

@@ -1,8 +1,3 @@
---
name: cleanup
description: Review, refactor, document, and validate code changes in the current branch
---
# Code Cleanup Skill
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.

View File

@@ -1,91 +0,0 @@
---
name: squash-commits
description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
---
Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.
## Instructions
### 1. Safety check
```bash
git status --short
```
If there are uncommitted changes, stop and tell the user to commit or stash them first.
### 2. Inspect the branch
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
```
List every file changed vs `main` and every commit on the branch (excluding merge commits from main).
### 3. Create a backup branch
```bash
git branch backup/<current-branch-name>
```
Tell the user the backup exists so they can recover if needed.
### 4. Soft-reset to main and unstage everything
```bash
git reset --soft main
git restore --staged .
```
All branch changes are now in the working tree, unstaged. No content has changed.
### 5. Plan the logical groups
Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:
- Core feature or fix (new source files + modified core files)
- Secondary features or fixes (each as its own commit if distinct)
- Refactoring or renames
- Tests
- Changelogs / docs
Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.
Present the proposed grouping to the user and ask for confirmation before committing.
### 6. Commit in logical groups
For each group, stage only the relevant files and commit with a clear message following the project's conventions:
```bash
git add <file1> <file2> ...
git commit -m "..."
```
Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).
### 7. Verify
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
git status --short
```
Confirm:
- Commit count is small and each message is meaningful
- The set of changed files vs `main` is identical to before
- Working tree is clean
### 8. Remind about force-push
The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.
## Rules
- Never change file contents. If you find yourself editing a file, stop.
- Never skip the backup branch step.
- Never force-push without explicit user instruction.
- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.

View File

@@ -42,7 +42,6 @@ jobs:
--extra langchain \
--extra livekit \
--extra piper \
--extra runner \
--extra sagemaker \
--extra tracing \
--extra websocket

View File

@@ -46,7 +46,6 @@ jobs:
--extra langchain \
--extra livekit \
--extra piper \
--extra runner \
--extra sagemaker \
--extra tracing \
--extra websocket

174
AGENTS.md
View File

@@ -1,174 +0,0 @@
# AGENTS.md
This file provides guidance to AI coding agents when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra local
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
uv run towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
uv run ruff format --check
# Update dependencies (after editing pyproject.toml)
uv lock && uv sync
```
## Architecture
### Frame-Based Pipeline Processing
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
[Processor1] → [Processor2] → ... → [ProcessorN]
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): Transports are frame processors used for external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
### Key Directories
| Directory | Purpose |
| -------------------------- | -------------------------------------------------- |
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Deprecations**: Use the `.. deprecated:: <version>` Sphinx directive in docstrings (never inline tags like `[DEPRECATED]`), and pair it with a runtime `warnings.warn(..., DeprecationWarning)` at the call site. See `CONTRIBUTING.md` for full conventions.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
- `@dataclass`: Frame types, context aggregator pairs, internal data containers
- `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
```python
class MyService(LLMService):
"""Description of what the service does.
More detailed description.
Event handlers available:
- on_connected: Called when we are connected
Example::
@service.event_handler("on_connected")
async def on_connected(service, frame):
...
"""
def __init__(self, param1: str, **kwargs):
"""Initialize the service.
Args:
param1: Description of param1.
**kwargs: Additional arguments passed to parent.
"""
super().__init__(**kwargs)
# Pydantic params class with a deprecated field
class MyParams(BaseModel):
"""Configuration parameters for MyService.
Parameters:
new_setting: Replacement for ``old_setting``.
old_setting: Legacy setting, no longer used.
.. deprecated:: 1.2.0
Use ``new_setting`` instead. Will be removed in 2.0.0.
"""
new_setting: str = "default"
old_setting: str | None = None
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.

View File

@@ -7,515 +7,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [1.2.1] - 2026-05-15
### Changed
- Changed the default WebSocket endpoints for `GradiumSTTService` and
`GradiumTTSService` to the region-neutral
`wss://api.gradium.ai/api/speech/asr` and
`wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes
traffic to the nearest endpoint. Override the url to pin to a specific
region.
(PR [#4500](https://github.com/pipecat-ai/pipecat/pull/4500))
### Fixed
- Fixed bot hangs when `filter_incomplete_user_turns` was enabled and the LLM
responded by calling a tool. The user turn never finalized, so the assistant
aggregator gated the tool-result context push and the LLM continuation never
ran. Tool calls now finalize the turn the moment they start, before the
function dispatches.
(PR [#4501](https://github.com/pipecat-ai/pipecat/pull/4501))
## [1.2.0] - 2026-05-14
### Added
- Added a `session_id` field to `RunnerArguments` so bots can log or trace a
per-session identifier in local development the same way they can in Pipecat
Cloud. The development runner now mints a UUID at every construction site,
and paths that already returned a `sessionId` to the caller (Daily `/start`,
dial-in webhook) share that same UUID with the runner args instead of
generating two. The SmallWebRTC `/api/offer` endpoint also accepts an
optional `session_id` query parameter so the `/sessions/{session_id}/...`
proxy can thread it through.
(PR [#4385](https://github.com/pipecat-ai/pipecat/pull/4385))
- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService`
for controlling Cartesia's server-side text buffering. When unset, Pipecat
picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE`
mode (custom buffering — avoids stacking client-side aggregation on top of
Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode
(Cartesia's managed buffering applies). Pass an explicit value (05000ms) to
override.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and
`DeepgramHttpTTSService` so callers can opt out of the Deepgram Model
Improvement Program. When set, the value is forwarded to Deepgram as a query
parameter on the speak request. Defaults to `None`, which preserves the
existing behavior. See https://dpgr.am/deepgram-mip for pricing implications
before enabling.
(PR [#4400](https://github.com/pipecat-ai/pipecat/pull/4400))
- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set
via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that
appends a developer-role message to the context whenever `LLMSetToolsFrame`
changes the set of advertised standard tools. Helps the LLM stay coherent
across mid-conversation tool changes, mitigating several flavors of
tool-call-related hallucination: calling tools that have been removed,
avoiding tools that have been re-added, and hallucinating output (made-up
answers or tool-call-shaped non-tool-calls) when tools are unavailable.
(PR [#4404](https://github.com/pipecat-ai/pipecat/pull/4404))
- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in
`pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the
inference-triggered event and suppresses `on_user_turn_stopped`, leaving
finalization to another strategy in the chain such as
`LLMTurnCompletionUserTurnStopStrategy`.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop`
a generic stop strategy that finalizes the user turn whenever a
`UserTurnInferenceCompletedFrame` arrives, regardless of which component
produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base;
future producers (Flux, custom end-of-turn classifiers, etc.) can use the
base directly or subclass it to add producer-specific setup.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added `on_user_turn_inference_triggered`, a new event on the user turn
controller, processor, aggregator and stop strategies that fires when a
strategy has enough signal to start LLM inference. By default it fires
together with `on_user_turn_stopped`; a gating strategy can fire only the
inference-triggered event and defer finalization to a peer.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added `FilterIncompleteUserTurnStrategies` in
`pipecat.turns.user_turn_strategies` — a `UserTurnStrategies` specialization
that wraps the detector chain with `deferred(...)` and appends
`LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case:
`user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass
`config=UserTurnCompletionConfig(...)` to customize timeouts and prompts.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`.
When installed, the strategy gates `on_user_turn_stopped` on a
`UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by
any component that can judge turn completeness — e.g. the
`UserTurnCompletionLLMServiceMixin` on `✓`). A `finalization_timeout`
provides a safety net if no completion frame ever arrives.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added first-class RTVI support for the UI Agent Protocol:
- Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server
messages, plus `ui-command` and `ui-task` server-to-client messages, with
paired `*Data` / `*Message` pydantic models.
- Adds built-in command payload models for `Toast`, `Navigate`, `ScrollTo`,
`Highlight`, `Focus`, `Click`, `SetInputValue`, and `SelectText`; matching
default handlers live in `@pipecat-ai/client-react`.
- Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`, `ui-snapshot`,
and `ui-cancel-task` messages.
- Adds five UI pipeline frames, mirroring the `client-message`
frame-and-event pattern: downstream code pushes `RTVIUICommandFrame` /
`RTVIUITaskFrame` for the observer to wrap into outbound `UICommandMessage` /
`UITaskMessage` envelopes, while the processor pushes inbound
`RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and `RTVIUICancelTaskFrame`
alongside `on_ui_message`.
- Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.
(PR [#4407](https://github.com/pipecat-ai/pipecat/pull/4407))
- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore
processor now resolve credentials via the standard boto3 provider chain (EC2
instance profiles, EKS pod roles / IRSA, ECS task roles, SSO,
`~/.aws/credentials`) when explicit credentials and `AWS_*` environment
variables are absent. Services running with IAM roles no longer need to
export static credentials.
(PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))
- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can
bias transcription for both file-based and realtime transcription.
(PR [#4426](https://github.com/pipecat-ai/pipecat/pull/4426))
- Added `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and
`DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum
silence duration before the watchdog sends a silence packet to prevent
dangling turns. The actual threshold is `max(chunk_duration * 2,
watchdog_min_timeout)`, so it also adapts automatically to the audio chunk
size in use.
(PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))
- Added `cancel_on_interruption=False` support for `GeminiLiveLLMService` on
models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini
2.x); the conversation now continues while the tool runs. On models that
don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time
warning explaining the limitation. (Note: an intermittent 1008 error can
occasionally fire on Gemini 2.5 during long-running tool calls; we
auto-reconnect.)
(PR [#4448](https://github.com/pipecat-ai/pipecat/pull/4448))
- Added `NvidiaSageMakerWebsocketSTTService` for streaming speech recognition
using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint.
Produces `InterimTranscriptionFrame` and `TranscriptionFrame` frames, is
VAD-aware, and automatically reconnects on error.
(PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))
- Added NVIDIA Magpie TTS services via AWS SageMaker:
`NvidiaSageMakerHTTPTTSService` (single HTTP invocation, streams raw PCM
back) and `NvidiaSageMakerWebsocketTTSService` (persistent HTTP/2 bidi-stream
with full interruption support via `InterruptibleTTSService`).
(PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))
- Added support for `reasoning` configuration on `OpenAIRealtimeLLMService`,
for use with reasoning-capable Realtime models such as `gpt-realtime-2`.
(PR [#4470](https://github.com/pipecat-ai/pipecat/pull/4470))
- Inworld TTS updates:
- Added `delivery_mode` setting (`STABLE`/`BALANCED`/`CREATIVE`) to
`InworldTTSService` and `InworldHttpTTSService`, enabling the
stability-vs-creativity tradeoff in `inworld-tts-2`.
- Added language support to `InworldTTSService` and
`InworldHttpTTSService`. The `language` setting is now forwarded to the API,
and a new `language_to_inworld_language()` helper normalizes Pipecat
`Language` enums to Inworld's BCP-47 locale tags.
(PR [#4473](https://github.com/pipecat-ai/pipecat/pull/4473))
### Changed
- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the
generally available `tts-rt-v1`.
(PR [#4386](https://github.com/pipecat-ai/pipecat/pull/4386))
- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16`
to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the
`use_normalized_timestamps` and `max_buffer_delay_ms` fields.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead
of the deprecated `use_original_timestamps` field. Word timestamps now
reflect what was actually spoken (post text-normalization and
pronunciation-dictionary substitution), matching the convention Pipecat uses
for ElevenLabs. This is a behavior change for `sonic-3` users, who were
previously receiving timestamps tied to the input transcript.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Broadened `tool_resources` to `app_resources` for easy access not just in
tool handlers but in other places like custom `FrameProcessor`s. Three
changes: a rename (`tool_resources``app_resources`), a new `app_resources`
property on `PipelineTask`, and a new `pipeline_task` property on
`FrameProcessor`. Tool handlers now read `params.app_resources`; custom
processors read `self.pipeline_task.app_resources`. The previous
`tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and
`FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit
`DeprecationWarning`s.
(PR [#4395](https://github.com/pipecat-ai/pipecat/pull/4395))
- Lowered the per-message log in
`SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App
messages can be high-frequency and were noisy at debug level; set the loguru
level to `TRACE` to see them again.
(PR [#4397](https://github.com/pipecat-ai/pipecat/pull/4397))
- Changed the default model for `GrokRealtimeLLMService` to
`grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The
previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is
being removed.
(PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))
- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to
`inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`,
`InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing
users can pin the prior model explicitly via the `model`/`tts_model`
argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid
model IDs.
(PR [#4422](https://github.com/pipecat-ai/pipecat/pull/4422))
- Changed the default model for `GrokLLMService` from `grok-3` to
`grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026.
(PR [#4429](https://github.com/pipecat-ai/pipecat/pull/4429))
- `DeepgramFluxSTT` watchdog silence threshold is now dynamic:
`max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms.
This prevents false silence injections when large audio chunks are sent at
lower frequency.
(PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))
- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the
turn is complete (on `on_turn_context_completed`) rather than waiting until
all audio has finished playing back. The `isFinal` message from ElevenLabs is
now used to signal `TTSStoppedFrame` and clean up the audio context,
improving turn transition timing.
(PR [#4433](https://github.com/pipecat-ai/pipecat/pull/4433))
- Updated `InworldHttpTTSService` and `InworldTTSService` to use PCM audio
encoding by default, which returns audio bytes without headers.
(PR [#4446](https://github.com/pipecat-ai/pipecat/pull/4446))
- Moved `create_task`, `cancel_task`, the `task_manager` property, and
`setup(task_manager)` up from `FrameProcessor` to `BaseObject`. Custom
`BaseObject` subclasses (turn strategies, controllers, etc.) now inherit
these methods directly instead of reimplementing the task manager wiring.
Owners propagate the task manager to their child `BaseObject`s via `await
child.setup(task_manager)`.
(PR [#4449](https://github.com/pipecat-ai/pipecat/pull/4449))
- Changed the default OpenAI Realtime input audio transcription model from
`gpt-4o-transcribe` to `gpt-realtime-whisper` for both
`OpenAIRealtimeSTTService` and `OpenAIRealtimeLLMService`. The new model does
not accept the `prompt` parameter; if a prompt is supplied alongside
`gpt-realtime-whisper`, it is dropped automatically and a warning is logged.
To keep using prompt hints, explicitly pin `model="gpt-4o-transcribe"` (or
`"gpt-4o-mini-transcribe"`).
(PR [#4450](https://github.com/pipecat-ai/pipecat/pull/4450))
- Updated the default model for `CartesiaTTSService` and
`CartesiaHttpTTSService` from `sonic-3` to `sonic-3.5`.
(PR [#4462](https://github.com/pipecat-ai/pipecat/pull/4462))
- Changed the default model for `OpenAIRealtimeLLMService` from
`gpt-realtime-1.5` to `gpt-realtime-2`.
(PR [#4472](https://github.com/pipecat-ai/pipecat/pull/4472))
### Deprecated
- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use
`user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add
`LLMTurnCompletionUserTurnStopStrategy` to a custom
`user_turn_strategies.stop`) instead. Setting the legacy flag still works for
one release: the aggregator emits a `DeprecationWarning` and rewires the
strategies as if you had passed `FilterIncompleteUserTurnStrategies`
directly.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the
`create_file_resampler()` / `create_stream_resampler()` factories).
Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class
will be removed in Pipecat 2.0 along with the default `resampy` and `numba`
dependencies.
(PR [#4428](https://github.com/pipecat-ai/pipecat/pull/4428))
### Fixed
- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as
`ErrorFrame`s. The latest API emits a `flush_done` per transcript when
server-side buffering is disabled; Pipecat now consumes them silently since
each turn already has its own `context_id`.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`,
`VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance
(e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both
the class and an instance.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200
response — one with the API's error text and a second, less informative
"Unknown error" frame from the outer exception handler. It now pushes a
single frame that includes the HTTP status code and returns cleanly.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally
for user turn stop strategies. It is now only imported when
`default_user_turn_stop_strategies()` is called. This improves startup time
and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning
when the default stop strategies are not used.
(PR [#4393](https://github.com/pipecat-ai/pipecat/pull/4393))
- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was
stored in `Settings` but never sent to xAI, so every session silently fell
back to xAI's server-side default. The model is now passed via the `?model=`
query parameter on the WebSocket URL as xAI's Voice Agent API requires.
(PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))
- Fixed `on_user_turn_stopped` firing prematurely when
`filter_incomplete_user_turns` was enabled. The event now fires only after
the LLM confirms the user turn is complete (`✓`); previously the smart-turn
detector's tentative stop was bubbling up before the LLM had a chance to veto
it, causing observers, transcript appenders and UI indicators to receive an
early — and sometimes duplicated — signal.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting
across two assistant messages in the LLM context and not surfacing in
`on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted
at the end of a TTS context now carries a PTS just past the last word so it
can't overtake clock-queued `TTSTextFrame`s in the transport's output, and
`LLMAssistantAggregator` now triggers
`on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the
frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting
transcripts).
(PR [#4414](https://github.com/pipecat-ai/pipecat/pull/4414))
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged
words (e.g. `bookLook`) when using Flash models. Flash often splits sentences
mid-stream into alignment chunks that begin with a real inter-word space, but
the previous fix unconditionally stripped that space from every chunk.
Leading spaces are now stripped only on the first alignment chunk of an
utterance, so subsequent chunks correctly flush partial words across
boundaries.
(PR [#4415](https://github.com/pipecat-ai/pipecat/pull/4415))
- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor
erroring out when only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`
was set in the environment. The half-populated kwargs are no longer forwarded
to aioboto3; partial env-var configurations now fall through to the boto3
credential chain like fully-unset configurations do.
(PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing
romanized/normalized text to the LLM context. With non-Latin input (e.g.,
Chinese), the assistant transcript was getting populated with pinyin (`Ni Hao
!` instead of `你好!`), which then degraded subsequent LLM turns. The services
now consume `alignment` by default and only switch to `normalizedAlignment` /
`normalized_alignment` when `pronunciation_dictionary_locators` is configured
(where `alignment` has overlapping restarts that produce duplicated/garbled
words, per #4316). Both fields are read with preferred-with-fallback
semantics since each is nullable per the API schema.
(PR [#4424](https://github.com/pipecat-ai/pipecat/pull/4424))
- Fixed a deadlock in `TTSService` that could permanently stall pipeline
processing when all three conditions occurred together:
`pause_frame_processing=True`, an interruption arrived before any TTS audio
was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`,
`FunctionCallResultFrame`) was in the processing queue at that moment. The
process task would block on `__process_event.wait()` indefinitely because
`BotStoppedSpeakingFrame` never arrives (no audio was played) and the
interruption handler did not resume processing. Affects services using
`pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and
ResembleAI.
(PR [#4431](https://github.com/pipecat-ai/pipecat/pull/4431))
- Fixed interruptions being delayed when a slow non-uninterruptible frame was
processing and an uninterruptible frame was waiting in the queue. The bot
would stall until the slow frame finished instead of cancelling it
immediately on interruption.
(PR [#4434](https://github.com/pipecat-ai/pipecat/pull/4434))
- Fixed `TTSService` dropping uninterruptible frames (e.g.
`FunctionCallResultFrame`) from its internal serialization queue when an
interruption occurs. Previously, the queue was recreated on every
interruption, silently discarding any queued frames. The queue is now reset
instead of recreated, preserving uninterruptible frames so they are always
delivered downstream.
(PR [#4435](https://github.com/pipecat-ai/pipecat/pull/4435))
- Fixed a race condition in the Daily transport that caused `AttributeError:
'NoneType' object has no attribute 'send_app_message'` when tearing down a
pipeline. Both `DailyInputTransport` and `DailyOutputTransport` share the
same `DailyTransportClient` and both call `cleanup()`, which was releasing
the underlying `CallClient` on the first call — leaving the second caller
with a `None` client.
(PR [#4440](https://github.com/pipecat-ai/pipecat/pull/4440))
- Restored `cancel_on_interruption=False` support for `AWSNovaSonicLLMService`
and `OpenAIRealtimeLLMService`. These services previously honored the flag by
simply not cancelling in-flight function calls on interruption; the
introduction of the new async-tool mechanism (which threads
started/intermediate/final messages through the LLM context) broke that path
because the realtime services didn't know how to interpret those messages.
Note that new-style streamed intermediate results
(`FunctionCallResultProperties(is_final=False)`) are not supported on these
realtime services. Similar fixes for other impacted realtime services are
forthcoming.
(PR [#4441](https://github.com/pipecat-ai/pipecat/pull/4441))
- Fixed two misspelled Gemini TTS voice names in
`GeminiTTSService.AVAILABLE_VOICES`.
(PR [#4443](https://github.com/pipecat-ai/pipecat/pull/4443))
- Extended the `cancel_on_interruption=False` regression fix to
`GrokRealtimeLLMService`, `AzureRealtimeLLMService`, and
`UltravoxRealtimeLLMService`. Grok and Azure use the same approach as in
#4441 (each service detects async-tool messages in the LLM context and routes
the final result to its formal tool-result channel; Azure inherits
transitively from `OpenAIRealtimeLLMService`). Ultravox needed a different
approach because its API freezes the conversation between
`client_tool_invocation` and the matching `client_tool_result` — for
async-registered functions it now ships a placeholder `client_tool_result`
immediately when the function is invoked (to unfreeze the conversation), then
injects the real result as user-side text once the tool finishes. Streamed
intermediate results (`FunctionCallResultProperties(is_final=False)`) are
still not supported on any of these realtime services. `GeminiLiveLLMService`
and `InworldRealtimeLLMService` are excluded for now: Gemini Live's
async-tool path needs deeper investigation, and Inworld tool calling needs to
be sorted out first.
(PR [#4447](https://github.com/pipecat-ai/pipecat/pull/4447))
- Fixed `OpenAIRealtimeLLMService` handling of multi-output-item responses
(observed with `gpt-realtime-2`). A single response can now contain more than
one audio item, and the first item's `audio.done` may arrive after the second
item's deltas have started. Deltas still arrive strictly in playback order,
so we continue to forward them as received (matching OpenAI's reference
implementation). The fix removes spurious warnings, ensures truncation always
targets the latest audio item, and emits a single bracketing
`TTSStartedFrame`/`TTSStoppedFrame` pair per assistant turn (the Stopped is
now pushed on `response.done`).
(PR [#4465](https://github.com/pipecat-ai/pipecat/pull/4465))
- Fixed missing `output` attribute on LLM OpenTelemetry spans when the LLM call
is interrupted mid-stream.
(PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
- Fixed incorrect `metrics.ttfb` on STT OpenTelemetry spans, and parented them
to the current turn span.
(PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
- Fixed incorrect `metrics.ttfb` on TTS OpenTelemetry spans for streaming
services.
(PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
- Extended the `cancel_on_interruption=False` regression fix to
`InworldRealtimeLLMService`. Uses the same approach as in #4441 (the service
detects async-tool messages in the LLM context and routes the final result to
its formal tool-result channel). Note: as of this writing, Inworld Realtime
doesn't appear to handle the resulting delayed tool result reliably — the
routing is best-effort and the service surfaces a one-time warning when
async-tool messages are seen. Streamed intermediate results
(`FunctionCallResultProperties(is_final=False)`) are still not supported on
this realtime service. (Inworld was excluded from #4447 pending resolution of
an unrelated tool-calling issue, which turned out to be an account-level
matter.)
(PR [#4474](https://github.com/pipecat-ai/pipecat/pull/4474))
- Fixed Cartesia TTS Korean word timestamps to use normal spacing rules,
preserving word boundaries and per-word timestamp alignment during downstream
aggregation.
(PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))
- Fixed Cartesia TTS Chinese and Japanese timestamp grouping to preserve
provider text spacing, avoiding artificial spaces when timestamp groups are
reassembled downstream.
(PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))
- Fixed `SonioxSTTService` final transcription frames missing detected language
metadata when Soniox returns token-level language annotations.
(PR [#4482](https://github.com/pipecat-ai/pipecat/pull/4482))
- Fixed Soniox final transcription language detection to use the most common
recognized token language, avoiding mislabeling an utterance when the last
token is tagged with a different language.
(PR [#4495](https://github.com/pipecat-ai/pipecat/pull/4495))
- Fixed dropped audio in streaming TTS services whose wire protocol doesn't
echo `context_id` back on incoming audio (Sarvam, Smallest, Soniox, Inworld,
and others). Previously, audio that arrived between contexts or at the very
start of a turn was tagged with `context_id=None` and silently dropped with
an "unable to append audio to context: no context ID provided" debug log.
`TTSService.get_active_audio_context_id()` now falls back to the
synthesis-side `_turn_context_id` when the playback cursor isn't set yet.
(PR [#4497](https://github.com/pipecat-ai/pipecat/pull/4497))
### Security
- Fixed a path traversal issue in the development runner's
`/files/{filename:path}` download endpoint. Previously, when the runner was
started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could
escape the configured folder because `%2F`-encoded separators bypassed
Starlette's path normalisation. The endpoint now resolves the joined path and
rejects any filename that escapes the allowed base with a 403, and also
returns 404 (instead of an implicit `null` 200) when `--folder` is unset.
(PR [#4417](https://github.com/pipecat-ai/pipecat/pull/4417))
## [1.1.0] - 2026-04-27
### Added

158
CLAUDE.md
View File

@@ -1 +1,157 @@
@AGENTS.md
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra local
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
uv run towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
uv run ruff format --check
# Update dependencies (after editing pyproject.toml)
uv lock && uv sync
```
## Architecture
### Frame-Based Pipeline Processing
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
[Processor1] → [Processor2] → ... → [ProcessorN]
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): Transports are frame processors used for external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
### Key Directories
| Directory | Purpose |
| -------------------------- | -------------------------------------------------- |
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
- `@dataclass`: Frame types, context aggregator pairs, internal data containers
- `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
```python
class MyService(LLMService):
"""Description of what the service does.
More detailed description.
Event handlers available:
- on_connected: Called when we are connected
Example::
@service.event_handler("on_connected")
async def on_connected(service, frame):
...
"""
def __init__(self, param1: str, **kwargs):
"""Initialize the service.
Args:
param1: Description of param1.
**kwargs: Additional arguments passed to parent.
"""
super().__init__(**kwargs)
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.

View File

@@ -92,10 +92,10 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Category | Services |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Inception](https://docs.pipecat.ai/api-reference/server/services/llm/inception), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |

View File

@@ -1 +0,0 @@
- Added `VonageVideoConnectorTransport`, a new transport integration for real-time Vonage WebRTC sessions using the Vonage Video Connector library.

View File

@@ -1 +0,0 @@
- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.

View File

@@ -1 +0,0 @@
- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.

View File

@@ -1 +0,0 @@
- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.

View File

@@ -1 +0,0 @@
- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.

View File

@@ -1 +0,0 @@
- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.

1
changelog/4385.added.md Normal file
View File

@@ -0,0 +1 @@
- Added a `session_id` field to `RunnerArguments` so bots can log or trace a per-session identifier in local development the same way they can in Pipecat Cloud. The development runner now mints a UUID at every construction site, and paths that already returned a `sessionId` to the caller (Daily `/start`, dial-in webhook) share that same UUID with the runner args instead of generating two. The SmallWebRTC `/api/offer` endpoint also accepts an optional `session_id` query parameter so the `/sessions/{session_id}/...` proxy can thread it through.

View File

@@ -0,0 +1 @@
- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the generally available `tts-rt-v1`.

1
changelog/4390.added.md Normal file
View File

@@ -0,0 +1 @@
- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService` for controlling Cartesia's server-side text buffering. When unset, Pipecat picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE` mode (custom buffering — avoids stacking client-side aggregation on top of Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode (Cartesia's managed buffering applies). Pass an explicit value (05000ms) to override.

View File

@@ -0,0 +1 @@
- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16` to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the `use_normalized_timestamps` and `max_buffer_delay_ms` fields.

View File

@@ -0,0 +1 @@
- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead of the deprecated `use_original_timestamps` field. Word timestamps now reflect what was actually spoken (post text-normalization and pronunciation-dictionary substitution), matching the convention Pipecat uses for ElevenLabs. This is a behavior change for `sonic-3` users, who were previously receiving timestamps tied to the input transcript.

View File

@@ -0,0 +1 @@
- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200 response — one with the API's error text and a second, less informative "Unknown error" frame from the outer exception handler. It now pushes a single frame that includes the HTTP status code and returns cleanly.

View File

@@ -0,0 +1 @@
- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`, `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both the class and an instance.

1
changelog/4390.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as `ErrorFrame`s. The latest API emits a `flush_done` per transcript when server-side buffering is disabled; Pipecat now consumes them silently since each turn already has its own `context_id`.

1
changelog/4393.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally for user turn stop strategies. It is now only imported when `default_user_turn_stop_strategies()` is called. This improves startup time and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning when the default stop strategies are not used.

View File

@@ -0,0 +1 @@
- Broadened `tool_resources` to `app_resources` for easy access not just in tool handlers but in other places like custom `FrameProcessor`s. Three changes: a rename (`tool_resources``app_resources`), a new `app_resources` property on `PipelineTask`, and a new `pipeline_task` property on `FrameProcessor`. Tool handlers now read `params.app_resources`; custom processors read `self.pipeline_task.app_resources`. The previous `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit `DeprecationWarning`s.

1
changelog/4400.added.md Normal file
View File

@@ -0,0 +1 @@
- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model Improvement Program. When set, the value is forwarded to Deepgram as a query parameter on the speak request. Defaults to `None`, which preserves the existing behavior. See https://dpgr.am/deepgram-mip for pricing implications before enabling.

View File

@@ -1 +0,0 @@
- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.

1
changelog/4425.added.md Normal file
View File

@@ -0,0 +1 @@
- Added support to `AWSNovaSonicLLMService` for the new "async tool call" mechanism activated by `cancel_on_interruption=False`, which includes delivering results asynchronously, delivering result streams, and cancelling running async tools. Support for the other major realtime services (`GeminiLiveLLMService`, `OpenAIRealtimeLLMService`) will be added in a follow-up PR.

1
changelog/4425.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed a regression in `AWSNovaSonicLLMService` where `cancel_on_interruption=False` (which previously worked under the old async-tool-call mechanism, by simply avoiding discarding tool calls on interruptions) stopped working after the introduction of the new "async tool call" mechanism.

View File

@@ -1 +0,0 @@
- Added `GET /status` endpoint to the development runner that reports which transports the running instance accepts (all by default, or the single transport passed via `-t`).

View File

@@ -1 +0,0 @@
- Added plain WebSocket transport support to the development runner. Bots can now accept connections from non-telephony WebSocket clients (e.g., browser apps using protobuf framing) via the `/ws-client` endpoint alongside other transports.

View File

@@ -1 +0,0 @@
- ⚠️ The development runner now supports all transports (WebRTC, Daily, telephony, plain WebSocket) simultaneously from a single server. The `/start` endpoint accepts a `"transport"` field to select the transport per-request; omitting `-t` at startup enables all transports instead of defaulting to WebRTC. The Daily browser-redirect route moved from `GET /` to `GET /daily`.

View File

@@ -1 +0,0 @@
- Fixed `ElevenLabsSTTService` crashing when `language` was passed as `None`. When `language` is not set, the service now lets ElevenLabs auto-detect the audio language.

View File

@@ -1 +0,0 @@
- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.

View File

@@ -1 +0,0 @@
- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.

View File

@@ -1 +0,0 @@
- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.

View File

@@ -1 +0,0 @@
- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.

View File

@@ -1 +0,0 @@
- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.

View File

@@ -1 +0,0 @@
- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.

View File

@@ -1 +0,0 @@
- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.

View File

@@ -1 +0,0 @@
- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.

View File

@@ -1 +0,0 @@
- Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.

View File

@@ -91,9 +91,6 @@ HEYGEN_LIVE_AVATAR_API_KEY=...
HUME_API_KEY=...
HUME_VOICE_ID=...
# Inception
INCEPTION_API_KEY=...
# Inworld
INWORLD_API_KEY=...
@@ -135,10 +132,6 @@ NOVITA_API_KEY=...
# NVIDIA
NVIDIA_API_KEY=...
# For a full example of how to deploy to SageMaker, see:
# https://github.com/pipecat-ai/pipecat-examples/tree/main/nvidia_sagemaker_example/deployment/aws-sagemaker-nvidia
SAGEMAKER_ASR_ENDPOINT_NAME=...
SAGEMAKER_MAGPIE_ENDPOINT_NAME=...
# OpenAI
OPENAI_API_KEY=...
@@ -214,11 +207,6 @@ TWILIO_AUTH_TOKEN=...
# Ultravox Realtime
ULTRAVOX_API_KEY=...
# Vonage
VONAGE_APPLICATION_ID=...
VONAGE_SESSION_ID=...
VONAGE_TOKEN=...
# WhatsApp
WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...

View File

@@ -1,232 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Manual validation harness for the ``add_tool_change_messages`` feature.
When tools change mid-conversation, LLMs can produce a few different
flavors of tool-call-related hallucination:
- **Forward hallucination** — calling a tool that has been removed.
- **Negative hallucination** — refusing to call a tool that has been
re-added (because recent context is full of "I can't" responses).
- **Hallucinated output when tools are unavailable** — making up an
answer rather than declining gracefully, or producing JSON that
*looks* like a tool call but is actually just an assistant text
response.
The ``add_tool_change_messages`` feature mitigates these by appending a
developer-role message to the conversation whenever ``LLMSetToolsFrame``
changes the set of advertised tools, so the LLM stays in sync with what's
actually available.
This harness exercises all of those flavors by flipping the advertised
tool set on a turn counter:
Phase 0 (turns 14): weather tool ACTIVE — confirm baseline.
Phase 1 (turns 58): tool REMOVED — keep asking for weather.
Phase 2 (turn 9+): tool RE-ADDED — does the LLM call it again?
Set ``ADD_TOOL_CHANGE_MESSAGES=0`` to disable the mitigation and see the
unmitigated behavior. The default is ON so a fresh run shows the feature
working.
Defaults to Llama 3.1 8B Instruct via a locally-running Ollama —
anecdotally one of the more hallucination-prone of the easily accessible
models. Pull the model once with ``ollama pull llama3.1:8b`` and make
sure ``ollama serve`` is running. Swap the LLM service to validate other
providers.
Run with::
uv run examples/features/features-add-tool-change-messages.py
ADD_TOOL_CHANGE_MESSAGES=0 uv run examples/features/features-add-tool-change-messages.py
"""
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, LLMSetToolsFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import NOT_GIVEN, LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.ollama.llm import OLLamaLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# Default ON so a fresh run shows the feature working. Set to "0" to A/B
# against the unmitigated behavior.
ADD_TOOL_CHANGE_MESSAGES = os.environ.get("ADD_TOOL_CHANGE_MESSAGES", "1") == "1"
async def fetch_weather_from_api(params: FunctionCallParams):
await params.result_callback({"conditions": "nice", "temperature": "75"})
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
weather_tools = ToolsSchema(standard_tools=[weather_function])
transport_params = {
"daily": lambda: DailyParams(audio_in_enabled=True, audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_in_enabled=True, audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_in_enabled=True, audio_out_enabled=True),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(
f"Starting add_tool_change_messages demo bot "
f"(ADD_TOOL_CHANGE_MESSAGES={ADD_TOOL_CHANGE_MESSAGES})"
)
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OLLamaLLMService(
settings=OLLamaLLMService.Settings(
# Llama 3.1 8B Instruct is anecdotally one of the more
# hallucination-prone of the easily accessible models — exactly
# what we want for this validation harness. Pull it with
# ``ollama pull llama3.1:8b`` and make sure ``ollama serve``
# is running.
model="llama3.1:8b",
system_instruction=(
"You are a helpful assistant in a voice conversation. Your responses "
"will be spoken aloud, so avoid emojis, bullet points, or other "
"formatting that can't be spoken. Respond briefly and naturally. "
"If the user asks for the current weather, use the `get_current_weather` "
"function if it's available. IMPORTANT: if you do not have access to the function, "
"say something along the lines of 'Sorry, I can't check the weather right now.'."
),
),
)
llm.register_function("get_current_weather", fetch_weather_from_api)
context = LLMContext(tools=weather_tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
add_tool_change_messages=ADD_TOOL_CHANGE_MESSAGES,
)
pipeline = Pipeline(
[
transport.input(),
stt,
user_aggregator,
llm,
tts,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
# Phase controller: roughly 4 turns per phase.
user_turn_count = 0
REMOVE_AT_TURN = 5 # tool gone for turn N onward
READD_AT_TURN = 9 # tool back for turn N onward
@user_aggregator.event_handler("on_user_turn_stopped")
async def on_user_turn_stopped(aggregator, strategy, message):
nonlocal user_turn_count
user_turn_count += 1
logger.info(f"=== User turn {user_turn_count} complete ===")
if user_turn_count == REMOVE_AT_TURN - 1:
logger.info(
"=== Phase 1: weather tool REMOVED. Keep asking about the weather "
"to exercise hallucination scenarios. ==="
)
await task.queue_frame(LLMSetToolsFrame(tools=NOT_GIVEN))
elif user_turn_count == READD_AT_TURN - 1:
logger.info(
"=== Phase 2: weather tool RE-ADDED. Ask for the weather again — "
"does the LLM call it, or keep refusing? (THIS IS THE TEST.) ==="
)
await task.queue_frame(LLMSetToolsFrame(tools=weather_tools))
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info("Client connected")
logger.info(
"=== Phase 0: weather tool ACTIVE. Ask for the weather a few times "
"to confirm it's working. ==="
)
context.add_message(
{
"role": "developer",
"content": (
"Please introduce yourself briefly to the user, then invite them "
"to ask about the weather."
),
}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info("Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -1,187 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Manual demonstration of the missing-handler (developer-error) recovery path.
When a tool is advertised to the LLM via ``tools``/``LLMContext`` but
the developer forgets to call ``llm.register_function(...)`` to wire up
its handler, the LLM happily emits a tool call and then... nothing
happens on the Pipecat side, leaving the conversation stuck.
Pipecat's recovery path (``LLMService._missing_function_call_handler``)
catches this case:
- Logs a ``logger.error`` distinguishing **developer error** (tool advertised
but no handler registered) from a hallucination (tool not advertised),
pointing at the missing ``register_function`` call.
- Returns a neutral terminal tool result
(``LLMService.MISSING_FUNCTION_CALL_MESSAGE_TEMPLATE``: "The function
`X` is not currently available.") so the call still terminates with a
normal tool result instead of leaving the conversation stuck.
This example is **deliberately broken**: the weather schema is in
``tools`` but ``register_function`` is *not* called. Ask the bot about
the weather and observe:
1. The LLM emits a tool call for ``get_current_weather``.
2. ``logger.error`` fires with "advertised … but has no registered handler
— did you forget to call register_function()?"
3. The terminal tool result is fed back to the LLM.
4. The LLM responds in voice based on that result (typically something
like "the weather function isn't available right now").
Uses the OpenAI LLM service with defaults. Swap to another provider to
validate this behavior elsewhere.
"""
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
weather_tools = ToolsSchema(standard_tools=[weather_function])
transport_params = {
"daily": lambda: DailyParams(audio_in_enabled=True, audio_out_enabled=True),
"twilio": lambda: FastAPIWebsocketParams(audio_in_enabled=True, audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_in_enabled=True, audio_out_enabled=True),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info("Starting missing-handler demo bot (no handler is registered on purpose)")
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction=(
"You are a helpful assistant in a voice conversation. Your responses "
"will be spoken aloud, so avoid emojis, bullet points, or other "
"formatting that can't be spoken. Respond briefly and naturally. "
"Always use the get_current_weather function to answer questions "
"about the current weather."
),
),
)
# *** DELIBERATELY OMITTED ***
# The whole point of this example is to demonstrate the missing-handler
# recovery path. Re-add this line to wire the tool up correctly:
#
# llm.register_function("get_current_weather", fetch_weather_from_api)
context = LLMContext(tools=weather_tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
stt,
user_aggregator,
llm,
tts,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info("Client connected")
logger.info(
"=== Ask for the weather. Watch for a logger.error about the missing "
"handler, and listen for the LLM's response based on the recovery "
"message. ==="
)
context.add_message(
{
"role": "developer",
"content": (
"Please introduce yourself briefly to the user, then invite "
"them to ask about the weather."
),
}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info("Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -74,6 +74,7 @@ async def track_current_location(params: FunctionCallParams):
# Second update: revised city estimate.
await asyncio.sleep(10)
# await asyncio.sleep(20)
gps = {"lat": 33.96003, "lng": -118.40639}
await params.result_callback(
{"gps": gps, "city": "Los Angeles"},
@@ -82,6 +83,7 @@ async def track_current_location(params: FunctionCallParams):
# Final result: confirmed city.
await asyncio.sleep(10)
# await asyncio.sleep(20)
gps = {"lat": 32.743569, "lng": -117.20466}
await params.result_callback({"gps": gps, "city": "San Diego"})

View File

@@ -29,7 +29,7 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAIRealtimeSTTService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -69,7 +69,13 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAIRealtimeSTTService(api_key=os.environ["OPENAI_API_KEY"])
stt = OpenAISTTService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAISTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related weather, such as temperature and conditions. And restaurant names.",
),
)
tts = OpenAITTSService(
api_key=os.environ["OPENAI_API_KEY"],

View File

@@ -25,7 +25,7 @@ from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAIRealtimeSTTService
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -63,14 +63,20 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAIRealtimeSTTService(api_key=os.environ["OPENAI_API_KEY"])
stt = OpenAISTTService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAISTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related weather, such as temperature and conditions. And restaurant names.",
),
)
tts = OpenAITTSService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAITTSService.Settings(
instructions="Please speak clearly and at a moderate pace.",
voice="ballad",
),
instructions="Please speak clearly and at a moderate pace.",
)
llm = OpenAILLMService(

View File

@@ -71,8 +71,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = QwenLLMService(
api_key=os.environ["QWEN_API_KEY"],
model="qwen2.5-72b-instruct",
settings=QwenLLMService.Settings(
model="qwen2.5-72b-instruct",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)

View File

@@ -4,20 +4,21 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example: async function call with the Azure Realtime LLM service.
"""Example: streaming async function call with the AWS Nova Sonic LLM service.
The ``get_current_weather`` tool is registered with
``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
While the call is in flight the conversation continues; the result arrives
later via the async-tool mechanism and is forwarded to Azure Realtime as a
``function_call_output`` so the model can integrate it naturally into its
next turn.
The ``track_current_location`` tool simulates a GPS tracker reporting the
device's position during a road trip from San Francisco to San Diego. It
sends two intermediate updates (via ``params.result_callback`` with
``is_final=False``) as the vehicle passes through cities along the way, then
delivers the final destination.
The placeholder is sent as a formal Nova Sonic ``toolResult``; each
intermediate result is forwarded as a cross-modal user-role text input event
so the model can fold each update into its next turn.
"""
import asyncio
import os
import random
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
@@ -25,7 +26,7 @@ from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.frames.frames import FunctionCallResultProperties, LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
@@ -36,14 +37,8 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.azure.realtime.llm import AzureRealtimeLLMService
from pipecat.services.aws.nova_sonic.llm import AWSNovaSonicLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.realtime.events import (
AudioConfiguration,
AudioInput,
InputAudioTranscription,
SessionProperties,
)
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -51,54 +46,39 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
# Simulate a long-running API call so we can demonstrate that the
# conversation continues while the tool is in flight.
await asyncio.sleep(10)
temperature = (
random.randint(60, 85)
if params.arguments["format"] == "fahrenheit"
else random.randint(15, 30)
)
async def track_current_location(params: FunctionCallParams):
"""Simulate a GPS tracker reporting position during a road trip."""
gps = {"lat": 37.7310, "lng": -122.4527}
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"location": params.arguments["location"],
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
{"gps": gps, "city": "San Francisco"},
properties=FunctionCallResultProperties(is_final=False),
)
await asyncio.sleep(10)
gps = {"lat": 33.96003, "lng": -118.40639}
await params.result_callback(
{"gps": gps, "city": "Los Angeles"},
properties=FunctionCallResultProperties(is_final=False),
)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
required=["location", "format"],
await asyncio.sleep(10)
gps = {"lat": 32.743569, "lng": -117.20466}
await params.result_callback({"gps": gps, "city": "San Diego"})
location_function = FunctionSchema(
name="track_current_location",
description=(
"Start tracking the user's current GPS location, reporting position "
"updates until the user reaches their destination. "
"Once this tracker is started, it doesn't need to be started again for subsequent updates; "
"just call this function once to kick it off and the updates will come in automatically."
),
properties={},
required=[],
)
tools = ToolsSchema(standard_tools=[weather_function])
system_instruction = (
"You are a friendly assistant. The user and you will engage in a spoken "
"dialog exchanging the transcripts of a natural real-time conversation. "
"Keep your responses short, generally two or three sentences for chatty "
"scenarios. When the user asks for the weather, call get_current_weather. "
"While you wait for the result, keep chatting with the user. When the "
"result arrives, share it with the user naturally."
)
tools = ToolsSchema(standard_tools=[location_function])
transport_params = {
@@ -120,24 +100,31 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
llm = AzureRealtimeLLMService(
api_key=os.environ["AZURE_REALTIME_API_KEY"],
base_url=os.environ["AZURE_REALTIME_BASE_URL"],
settings=AzureRealtimeLLMService.Settings(
system_instruction = (
"You are a friendly assistant. The user and you will engage in a spoken "
"dialog exchanging the transcripts of a natural real-time conversation. "
"Keep your responses short, generally two or three sentences for chatty "
"scenarios. You have access to a function that starts tracking the user's "
"location and provides regular updates on it. Narrate each position "
"update to the user as it arrives (city only, no coordinates). "
"When you receive the final location, tell the user the destination has "
"been reached."
)
llm = AWSNovaSonicLLMService(
secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
region=os.environ["AWS_REGION"],
session_token=os.getenv("AWS_SESSION_TOKEN"),
settings=AWSNovaSonicLLMService.Settings(
voice="tiffany",
system_instruction=system_instruction,
session_properties=SessionProperties(
audio=AudioConfiguration(
input=AudioInput(
transcription=InputAudioTranscription(model="whisper-1"),
)
),
),
),
)
llm.register_function(
"get_current_weather",
fetch_weather_from_api,
"track_current_location",
track_current_location,
cancel_on_interruption=False,
)

View File

@@ -7,11 +7,10 @@
"""Example: async function call with the AWS Nova Sonic LLM service.
The ``get_current_weather`` tool is registered with
``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
``cancel_on_interruption=False`` and simulates a slow API call (20s sleep).
While the call is in flight the conversation continues; the result arrives
later via the async-tool mechanism and is forwarded to Nova Sonic via the
formal toolResult channel so the model can integrate it naturally into its
next turn.
later via the async-tool mechanism and is forwarded to Nova Sonic so the
model can integrate it naturally into its next turn.
"""
import asyncio
@@ -48,7 +47,7 @@ load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
# Simulate a long-running API call so we can demonstrate that the
# conversation continues while the tool is in flight.
await asyncio.sleep(10)
await asyncio.sleep(20)
temperature = (
random.randint(60, 85)
if params.arguments["format"] == "fahrenheit"

View File

@@ -5,7 +5,6 @@
#
import asyncio
import os
import random
from datetime import datetime

View File

@@ -1,175 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example: async function call with the Gemini Live LLM service.
The ``get_current_weather`` tool is registered with
``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
While the call is in flight the conversation continues; the result arrives
later via the async-tool mechanism and is forwarded to Gemini Live as a
FunctionResponse so the model can integrate it naturally into its next turn.
"""
import asyncio
import os
import random
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
# Simulate a long-running API call so we can demonstrate that the
# conversation continues while the tool is in flight.
await asyncio.sleep(10)
temperature = (
random.randint(60, 85)
if params.arguments["format"] == "fahrenheit"
else random.randint(15, 30)
)
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"location": params.arguments["location"],
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
system_instruction = (
"You are a friendly assistant. The user and you will engage in a spoken "
"dialog exchanging the transcripts of a natural real-time conversation. "
"Keep your responses short, generally two or three sentences for chatty "
"scenarios. When the user asks for the weather, call get_current_weather. "
"While you wait for the result, keep chatting with the user. When the "
"result arrives, share it with the user naturally."
)
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
llm = GeminiLiveLLMService(
api_key=os.environ["GOOGLE_API_KEY"],
settings=GeminiLiveLLMService.Settings(
system_instruction=system_instruction,
),
tools=tools,
)
llm.register_function(
"get_current_weather",
fetch_weather_from_api,
cancel_on_interruption=False,
)
context = LLMContext()
# Server-side VAD is enabled by default; no local VAD is added.
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(),
user_aggregator,
llm,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -6,27 +6,22 @@
import os
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.inception.llm import InceptionLLMService
from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -36,13 +31,31 @@ load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
await params.result_callback({"conditions": "nice", "temperature": "75"})
temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
async def fetch_restaurant_recommendation(params: FunctionCallParams):
await params.result_callback({"name": "The Golden Dragon"})
system_instruction = """
You are a helpful assistant who can answer questions and use tools.
You have three tools available to you:
1. get_current_weather: Use this tool to get the current weather in a specific location.
2. get_restaurant_recommendation: Use this tool to get a restaurant recommendation in a specific location.
3. google_search: Use this tool to search the web for information.
"""
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
@@ -64,31 +77,6 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = InceptionLLMService(
api_key=os.environ["INCEPTION_API_KEY"],
settings=InceptionLLMService.Settings(
reasoning_effort="instant",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
# You can also register a function_name of None to get all functions
# sent to the same callback with an additional function_name parameter.
llm.register_function("get_current_weather", fetch_weather_from_api)
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
@llm.event_handler("on_function_calls_started")
async def on_function_calls_started(service, function_calls):
await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
@@ -105,7 +93,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
required=["location", "format"],
)
restaurant_function = FunctionSchema(
name="get_restaurant_recommendation",
description="Get a restaurant recommendation",
@@ -117,21 +104,39 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
},
required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
search_tool = {"google_search": {}}
# KNOWN ISSUE: If using GeminiVertexLiveLLMService, it appears
# you cannot use the "google_search" tool alongside other tools.
# See https://github.com/googleapis/python-genai/issues/941.
tools = ToolsSchema(
standard_tools=[weather_function, restaurant_function],
custom_tools={AdapterType.GEMINI: [search_tool]},
)
llm = GeminiLiveLLMService(
api_key=os.environ["GOOGLE_API_KEY"],
settings=GeminiLiveLLMService.Settings(
system_instruction=system_instruction,
),
tools=tools,
)
llm.register_function("get_current_weather", fetch_weather_from_api)
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
# You can provide the system instructions and tools in the context rather
# than as arguments to GeminiLiveLLMService, but note that doing so will
# trigger a (fast) reconnection when the GeminiLiveLLMService first
# receives the context (i.e. when we send the LLMRunFrame below).
context = LLMContext()
# Server-side VAD is enabled by default; no local VAD is added.
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(),
stt,
user_aggregator,
llm,
tts,
transport.output(),
assistant_aggregator,
]

View File

@@ -6,13 +6,10 @@
import os
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import AdapterType, ToolsSchema
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
@@ -26,7 +23,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -34,32 +30,6 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
temperature = 75 if params.arguments["format"] == "fahrenheit" else 24
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
async def fetch_restaurant_recommendation(params: FunctionCallParams):
await params.result_callback({"name": "The Golden Dragon"})
system_instruction = """
You are a helpful assistant who can answer questions and use tools.
You have three tools available to you:
1. get_current_weather: Use this tool to get the current weather in a specific location.
2. get_restaurant_recommendation: Use this tool to get a restaurant recommendation in a specific location.
3. google_search: Use this tool to search the web for information.
"""
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
@@ -81,55 +51,23 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the user's location.",
},
},
required=["location", "format"],
)
restaurant_function = FunctionSchema(
name="get_restaurant_recommendation",
description="Get a restaurant recommendation",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
},
required=["location"],
)
search_tool = {"google_search": {}}
# KNOWN ISSUE: If using GeminiVertexLiveLLMService, it appears
# you cannot use the "google_search" tool alongside other tools.
# See https://github.com/googleapis/python-genai/issues/941.
tools = ToolsSchema(
standard_tools=[weather_function, restaurant_function],
custom_tools={AdapterType.GEMINI: [search_tool]},
)
llm = GeminiLiveLLMService(
api_key=os.environ["GOOGLE_API_KEY"],
settings=GeminiLiveLLMService.Settings(
system_instruction=system_instruction,
voice="Aoede", # Puck, Charon, Kore, Fenrir, Aoede
# system_instruction="Talk like a pirate."
),
tools=tools,
# inference_on_context_initialization=False,
)
llm.register_function("get_current_weather", fetch_weather_from_api)
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
context = LLMContext()
context = LLMContext(
[
{
"role": "user",
"content": "Say hello. Then ask if I want to hear a joke.",
},
],
)
# Server-side VAD is enabled by default; no local VAD is added.
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
@@ -156,9 +94,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")

View File

@@ -1,179 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example: async function call with the Grok Realtime LLM service.
The ``get_current_weather`` tool is registered with
``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
While the call is in flight the conversation continues; the result arrives
later via the async-tool mechanism and is forwarded to Grok Realtime as a
``function_call_output`` so the model can integrate it naturally into its
next turn.
"""
import asyncio
import os
import random
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.xai.realtime.events import SessionProperties
from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
# Simulate a long-running API call so we can demonstrate that the
# conversation continues while the tool is in flight.
await asyncio.sleep(10)
temperature = (
random.randint(60, 85)
if params.arguments["format"] == "fahrenheit"
else random.randint(15, 30)
)
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"location": params.arguments["location"],
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
system_instruction = (
"You are a friendly assistant. The user and you will engage in a spoken "
"dialog exchanging the transcripts of a natural real-time conversation. "
"Keep your responses short, generally two or three sentences for chatty "
"scenarios. When the user asks for the weather, call get_current_weather. "
"While you wait for the result, keep chatting with the user. When the "
"result arrives, share it with the user naturally."
)
# Note: Grok has built-in server-side VAD, so we don't need local VAD.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
llm = GrokRealtimeLLMService(
api_key=os.environ["XAI_API_KEY"],
settings=GrokRealtimeLLMService.Settings(
system_instruction=system_instruction,
session_properties=SessionProperties(
voice="Ara",
),
),
)
llm.register_function(
"get_current_weather",
fetch_weather_from_api,
cancel_on_interruption=False,
)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
pipeline = Pipeline(
[
transport.input(),
user_aggregator,
llm,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -28,14 +28,10 @@ Usage:
"""
import os
import random
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.frames.frames import LLMRunFrame
from pipecat.observers.loggers.transcription_log_observer import (
TranscriptionLogObserver,
@@ -52,7 +48,6 @@ from pipecat.processors.aggregators.llm_response_universal import (
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -60,43 +55,6 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
temperature = (
random.randint(60, 85)
if params.arguments["format"] == "fahrenheit"
else random.randint(15, 30)
)
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"location": params.arguments["location"],
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
# --- Transport Configuration ---
# No local VAD needed — Inworld's server-side semantic VAD handles turn detection.
@@ -127,7 +85,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# See: https://docs.inworld.ai/router/introduction
llm = InworldRealtimeLLMService(
api_key=os.environ["INWORLD_API_KEY"],
llm_model="openai/gpt-4.1-mini",
llm_model="xai/grok-4-1-fast-non-reasoning",
voice="Sarah",
settings=InworldRealtimeLLMService.Settings(
system_instruction="""You are a helpful and friendly AI assistant powered by Inworld.
@@ -139,14 +97,9 @@ Always be helpful and proactive in offering assistance.""",
),
)
# Note: function calling requires a paid Inworld account and a
# function-calling-capable model
llm.register_function("get_current_weather", fetch_weather_from_api)
# Create context with initial message + tools
# Create context with initial message
context = LLMContext(
[{"role": "developer", "content": "Say hello and introduce yourself!"}],
tools,
)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)

View File

@@ -1,198 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example: async function call with the OpenAI Realtime LLM service.
The ``get_current_weather`` tool is registered with
``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
While the call is in flight the conversation continues; the result arrives
later via the async-tool mechanism and is forwarded to OpenAI Realtime as a
``function_call_output`` so the model can integrate it naturally into its
next turn.
"""
import asyncio
import os
import random
from datetime import datetime
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.realtime.events import (
AudioConfiguration,
AudioInput,
InputAudioNoiseReduction,
InputAudioTranscription,
SemanticTurnDetection,
SessionProperties,
)
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
# Simulate a long-running API call so we can demonstrate that the
# conversation continues while the tool is in flight.
await asyncio.sleep(10)
temperature = (
random.randint(60, 85)
if params.arguments["format"] == "fahrenheit"
else random.randint(15, 30)
)
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"location": params.arguments["location"],
"format": params.arguments["format"],
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
required=["location", "format"],
)
tools = ToolsSchema(standard_tools=[weather_function])
system_instruction = (
"You are a friendly assistant. The user and you will engage in a spoken "
"dialog exchanging the transcripts of a natural real-time conversation. "
"Keep your responses short, generally two or three sentences for chatty "
"scenarios. When the user asks for the weather, call get_current_weather. "
"While you wait for the result, keep chatting with the user. When the "
"result arrives, share it with the user naturally."
)
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
llm = OpenAIRealtimeLLMService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIRealtimeLLMService.Settings(
system_instruction=system_instruction,
session_properties=SessionProperties(
audio=AudioConfiguration(
input=AudioInput(
transcription=InputAudioTranscription(),
turn_detection=SemanticTurnDetection(),
noise_reduction=InputAudioNoiseReduction(type="near_field"),
)
),
),
),
)
llm.register_function(
"get_current_weather",
fetch_weather_from_api,
cancel_on_interruption=False,
)
context = LLMContext(tools=tools)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
user_aggregator,
llm,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -232,20 +232,6 @@ Remember, your responses should be short. Just one or two sentences, usually. Re
# [LLMUpdateSettingsFrame(settings=SessionProperties(tools=new_tools).model_dump())]
# )
# Reasoning effort can be changed at runtime too. Only
# reasoning-capable Realtime models (e.g. gpt-realtime-2) support this.
# await task.queue_frames(
# [
# LLMUpdateSettingsFrame(
# delta=OpenAIRealtimeLLMService.Settings(
# session_properties=SessionProperties(
# reasoning=Reasoning(effort="xhigh"),
# ),
# )
# )
# ]
# )
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")

View File

@@ -1,186 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example: async function call with the Ultravox Realtime LLM service.
The ``get_current_weather`` tool is registered with
``cancel_on_interruption=False`` and simulates a slow API call (10s sleep).
Ultravox's API freezes the conversation between ``client_tool_invocation``
and the matching ``client_tool_result``, so the service ships a placeholder
``client_tool_result`` immediately when an async-registered function is
invoked (to unfreeze the conversation). When the real tool finishes, the
actual result is injected as user-side text so the model picks it up.
"""
import asyncio
import datetime
import os
import random
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.ultravox.llm import OneShotInputParams, UltravoxRealtimeLLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import SpeechTimeoutUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies
load_dotenv(override=True)
async def fetch_weather_from_api(params: FunctionCallParams):
# Simulate a long-running API call so we can demonstrate that the
# conversation continues while the tool is in flight.
await asyncio.sleep(10)
temperature = (
random.randint(60, 85)
if params.arguments["format"] == "fahrenheit"
else random.randint(15, 30)
)
await params.result_callback(
{
"conditions": "nice",
"temperature": temperature,
"location": params.arguments["location"],
"format": params.arguments["format"],
"timestamp": datetime.datetime.now().strftime("%Y%m%d_%H%M%S"),
}
)
weather_function = FunctionSchema(
name="get_current_weather",
description="Get the current weather",
properties={
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
required=["location", "format"],
)
system_prompt = (
"You are a friendly assistant. The user and you will engage in a spoken "
"dialog exchanging the transcripts of a natural real-time conversation. "
"Keep your responses short, generally two or three sentences for chatty "
"scenarios. When the user asks for the weather, call get_current_weather. "
"While you wait for the result, keep chatting with the user. When the "
"result arrives, share it with the user naturally."
)
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
llm = UltravoxRealtimeLLMService(
params=OneShotInputParams(
api_key=os.environ["ULTRAVOX_API_KEY"],
system_prompt=system_prompt,
temperature=0.3,
max_duration=datetime.timedelta(minutes=3),
),
one_shot_selected_tools=ToolsSchema(standard_tools=[weather_function]),
)
llm.register_function(
"get_current_weather",
fetch_weather_from_api,
cancel_on_interruption=False,
)
context = LLMContext([])
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
user_turn_strategies=UserTurnStrategies(
stop=[SpeechTimeoutUserTurnStopStrategy()],
),
vad_analyzer=SileroVADAnalyzer(),
),
)
pipeline = Pipeline(
[
transport.input(),
user_aggregator,
llm,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -51,6 +51,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GradiumSTTService(
api_key=os.environ["GRADIUM_API_KEY"],
api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
settings=GradiumSTTService.Settings(
language=Language.EN,
delay_in_frames=8,

View File

@@ -49,7 +49,13 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAIRealtimeSTTService(api_key=os.environ["OPENAI_API_KEY"])
stt = OpenAIRealtimeSTTService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIRealtimeSTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
),
)
tl = TranscriptionLogger()
vad_processor = VADProcessor(vad_analyzer=SileroVADAnalyzer())

View File

@@ -1,134 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example of using OpenAI Realtime voice LLM service with Vonage Video Connector transport."""
import asyncio
import os
import sys
from collections.abc import Callable
from typing import Any
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.vonage import configure
from pipecat.services.openai.realtime.events import (
AudioConfiguration,
AudioInput,
InputAudioNoiseReduction,
InputAudioTranscription,
SemanticTurnDetection,
SessionProperties,
)
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
from pipecat.transports.vonage.video_connector import (
VonageVideoConnectorTransport,
VonageVideoConnectorTransportParams,
)
load_dotenv(override=True)
logger.remove(0)
logger.add(sys.stderr, level="DEBUG")
async def main() -> None:
"""Main entry point for the OpenAI Realtime vonage video connector example."""
(application_id, session_id, token) = await configure()
transport = VonageVideoConnectorTransport(
application_id,
session_id,
token,
VonageVideoConnectorTransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
publisher_name="Bot",
),
)
llm = OpenAIRealtimeLLMService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIRealtimeLLMService.Settings(
system_instruction="""You are a helpful and friendly AI.
Act like a human, but remember that you aren't a human and that you can't do human
things in the real world. Your voice and personality should be warm and engaging, with a lively and
playful tone.
If interacting in a non-English language, start by using the standard accent or dialect familiar to
the user. Talk quickly.
You are participating in a voice conversation. Keep your responses concise, short, and to the point
unless specifically asked to elaborate on a topic.
Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
session_properties=SessionProperties(
audio=AudioConfiguration(
input=AudioInput(
transcription=InputAudioTranscription(),
turn_detection=SemanticTurnDetection(),
noise_reduction=InputAudioNoiseReduction(type="near_field"),
)
),
),
),
)
context = LLMContext(
[{"role": "developer", "content": "Say hello!"}],
)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(),
user_aggregator,
llm,
transport.output(),
assistant_aggregator,
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
observers=[TranscriptionLogObserver()],
)
event_handler: Callable[[str], Callable[[Any], Any]] = transport.event_handler
@event_handler("on_client_connected")
async def on_client_connected(transport: VonageVideoConnectorTransport, client: object) -> None:
logger.info("Client connected")
await task.queue_frames([LLMRunFrame()])
runner = PipelineRunner()
await runner.run(task)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,201 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example 22: Filter Incomplete Turns
Demonstrates LLM-based turn completion detection to suppress bot responses when
the user was cut off mid-thought. The LLM outputs one of three markers:
- ✓ (complete): User finished their thought, respond normally
- ○ (incomplete short): User was cut off, wait ~5s then prompt
- ◐ (incomplete long): User needs time to think, wait ~10s then prompt
When incomplete is detected, the bot's response is suppressed. After the timeout
expires, the LLM is automatically prompted to re-engage the user.
"""
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
AssistantTurnStoppedMessage,
LLMContextAggregatorPair,
LLMUserAggregatorParams,
UserTurnStoppedMessage,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import FilterIncompleteUserTurnStrategies
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def get_weather(params: FunctionCallParams, location: str):
"""Return the current weather for a location.
A stub that always reports the same conditions — replace with a real
weather API in production.
Args:
location (str): The city and state or country, e.g. "Paris, France".
"""
await params.result_callback(
{
"location": location,
"temperature_celsius": 22,
"conditions": "partly cloudy",
}
)
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
llm = OpenAILLMService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction=(
"You are a helpful assistant in a voice conversation. Your "
"responses will be spoken aloud, so avoid emojis, bullet "
"points, or other formatting that can't be spoken. Respond to "
"what the user said in a creative, helpful, and brief way. "
"If the user asks about the weather, call the get_weather "
"tool and speak the result back naturally."
),
),
)
llm.register_direct_function(get_weather)
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
context = LLMContext(tools=ToolsSchema(standard_tools=[get_weather]))
# `FilterIncompleteUserTurnStrategies` pairs the default detector
# chain with `LLMTurnCompletionUserTurnStopStrategy`: detectors
# trigger LLM inference but the public `on_user_turn_stopped` event
# fires only when the LLM confirms ✓. The LLM marks each response
# with one of:
# ✓ = complete (respond normally)
# ○ = incomplete short (wait 5s, then prompt)
# ◐ = incomplete long (wait 10s, then prompt)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
vad_analyzer=SileroVADAnalyzer(),
user_turn_strategies=FilterIncompleteUserTurnStrategies(
# Optional: customize turn completion behavior
# config=UserTurnCompletionConfig(
# incomplete_short_timeout=5.0,
# incomplete_long_timeout=10.0,
# incomplete_short_prompt="Custom prompt...",
# incomplete_long_prompt="Custom prompt...",
# instructions="Custom turn completion instructions...",
# ),
),
),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt,
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
@user_aggregator.event_handler("on_user_turn_stopped")
async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
timestamp = f"[{message.timestamp}] " if message.timestamp else ""
line = f"{timestamp}user: {message.content}"
logger.info(f"Transcript: {line}")
@assistant_aggregator.event_handler("on_assistant_turn_stopped")
async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
timestamp = f"[{message.timestamp}] " if message.timestamp else ""
line = f"{timestamp}assistant: {message.content}"
logger.info(f"Transcript: {line}")
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -10,7 +10,7 @@ Demonstrates LLM-based turn completion detection to suppress bot responses when
the user was cut off mid-thought. The LLM outputs one of three markers:
- ✓ (complete): User finished their thought, respond normally
- ○ (incomplete short): User was cut off, wait ~5s then prompt
- ◐ (incomplete long): User needs time to think, wait ~10s then prompt
- ◐ (incomplete long): User needs time to think, wait ~15s then prompt
When incomplete is detected, the bot's response is suppressed. After the timeout
expires, the LLM is automatically prompted to re-engage the user.
@@ -41,7 +41,6 @@ from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import FilterIncompleteUserTurnStrategies
load_dotenv(override=True)
@@ -84,28 +83,23 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
)
context = LLMContext()
# `FilterIncompleteUserTurnStrategies` pairs the default detector
# chain with `LLMTurnCompletionUserTurnStopStrategy`: detectors
# trigger LLM inference but the public `on_user_turn_stopped` event
# fires only when the LLM confirms ✓. The LLM marks each response
# with one of:
# ✓ = complete (respond normally)
# ○ = incomplete short (wait 5s, then prompt)
# ◐ = incomplete long (wait 10s, then prompt)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(
vad_analyzer=SileroVADAnalyzer(),
user_turn_strategies=FilterIncompleteUserTurnStrategies(
# Optional: customize turn completion behavior
# config=UserTurnCompletionConfig(
# incomplete_short_timeout=5.0,
# incomplete_long_timeout=10.0,
# incomplete_short_prompt="Custom prompt...",
# incomplete_long_prompt="Custom prompt...",
# instructions="Custom turn completion instructions...",
# ),
),
# Enable turn completion filtering - the LLM will output:
# ✓ = complete (respond normally)
# ○ = incomplete short (wait 5s, then prompt)
# ◐ = incomplete long (wait 15s, then prompt)
filter_incomplete_user_turns=True,
# Optional: customize turn completion behavior
# turn_completion_config=TurnCompletionConfig(
# incomplete_short_timeout=5.0,
# incomplete_long_timeout=15.0,
# incomplete_short_prompt="Custom prompt...",
# incomplete_long_prompt="Custom prompt...",
# instructions="Custom turn completion instructions...",
# ),
),
)

View File

@@ -50,7 +50,10 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = GradiumSTTService(api_key=os.environ["GRADIUM_API_KEY"])
stt = GradiumSTTService(
api_key=os.environ["GRADIUM_API_KEY"],
api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
)
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],

View File

@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.services.soniox.tts import SonioxTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -53,7 +53,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = SonioxSTTService(api_key=os.environ["SONIOX_API_KEY"])
tts = SonioxTTSService(api_key=os.environ["SONIOX_API_KEY"])
tts = CartesiaTTSService(
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.environ["OPENAI_API_KEY"],
@@ -98,9 +103,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
await task.queue_frames([LLMRunFrame()])
await asyncio.sleep(10)
logger.info("Updating Soniox STT settings: language_hints=[es]")
logger.info("Updating Soniox STT settings: language=es")
await task.queue_frame(
STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language_hints=[Language.ES]))
STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language=Language.ES))
)
@transport.event_handler("on_client_disconnected")

View File

@@ -55,6 +55,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = GradiumTTSService(
api_key=os.environ["GRADIUM_API_KEY"],
settings=GradiumTTSService.Settings(voice="YTpq7expH9539ERJ"),
url="wss://us.api.gradium.ai/api/speech/tts",
)
llm = OpenAILLMService(

View File

@@ -54,6 +54,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = GradiumSTTService(
api_key=os.environ["GRADIUM_API_KEY"],
api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
settings=GradiumSTTService.Settings(
language=Language.EN,
),
@@ -61,6 +62,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
tts = GradiumTTSService(
api_key=os.environ["GRADIUM_API_KEY"],
url="wss://us.api.gradium.ai/api/speech/tts",
settings=GradiumTTSService.Settings(
voice="YTpq7expH9539ERJ",
),

View File

@@ -1,129 +0,0 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#
# For a full example of how to deploy to SageMaker, see:
# https://github.com/pipecat-ai/pipecat-examples/tree/main/nvidia_sagemaker_example/deployment/aws-sagemaker-nvidia
import os
from dotenv import load_dotenv
from loguru import logger
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.nvidia.llm import NvidiaLLMService
from pipecat.services.nvidia.sagemaker.stt import NvidiaSageMakerSTTService
from pipecat.services.nvidia.sagemaker.tts import NvidiaSageMakerTTSService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
),
}
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = NvidiaSageMakerSTTService(
endpoint_name=os.environ["SAGEMAKER_ASR_ENDPOINT_NAME"],
region=os.getenv("AWS_REGION", "us-west-2"),
)
llm = NvidiaLLMService(
api_key=os.environ["NVIDIA_API_KEY"],
settings=NvidiaLLMService.Settings(
model="meta/llama-3.3-70b-instruct",
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
tts = NvidiaSageMakerTTSService(
endpoint_name=os.environ["SAGEMAKER_MAGPIE_ENDPOINT_NAME"],
region=os.getenv("AWS_REGION", "us-west-2"),
)
context = LLMContext()
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
)
pipeline = Pipeline(
[
transport.input(), # Transport user input
stt, # STT
user_aggregator, # User responses
llm, # LLM
tts, # TTS
transport.output(), # Transport bot output
assistant_aggregator, # Assistant spoken responses
]
)
task = PipelineTask(
pipeline,
params=PipelineParams(
enable_metrics=True,
enable_usage_metrics=True,
),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
async def bot(runner_args: RunnerArguments):
"""Main bot entry point compatible with Pipecat Cloud."""
transport = await create_transport(runner_args, transport_params)
await run_bot(transport, runner_args)
if __name__ == "__main__":
from pipecat.runner.run import main
main()

View File

@@ -25,6 +25,7 @@ from pipecat.runner.utils import create_transport
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.openai.stt import OpenAIRealtimeSTTService
from pipecat.services.openai.tts import OpenAITTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
@@ -52,7 +53,14 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = OpenAIRealtimeSTTService(api_key=os.environ["OPENAI_API_KEY"])
stt = OpenAIRealtimeSTTService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAIRealtimeSTTService.Settings(
model="gpt-4o-transcribe",
prompt="Expect words related to dogs, such as breed names.",
language=Language.EN,
),
)
tts = OpenAITTSService(
api_key=os.environ["OPENAI_API_KEY"],
@@ -64,7 +72,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
llm = OpenAILLMService(
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
system_instruction="You are very knowledgable about dogs. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
),
)

View File

@@ -58,7 +58,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Add strict mode to enforce the language hints
language_hints=[Language.EN],
language_hints_strict=True,
enable_language_identification=True,
),
)

View File

@@ -77,7 +77,6 @@ groq = [ "groq>=0.23.0,<2" ]
gstreamer = [ "pygobject~=3.50.0" ]
heygen = [ "livekit>=1.0.13,<2", "pipecat-ai[websockets-base]" ]
hume = [ "hume>=0.11.2,<1" ]
inception = []
inworld = [ "pipecat-ai[websockets-base]" ]
koala = [ "pvkoala~=2.0.3" ]
kokoro = [ "kokoro-onnx>=0.5.0,<1", "requests>=2.32.5,<3" ]
@@ -104,7 +103,7 @@ piper = [ "piper-tts>=1.3.0,<2", "requests>=2.32.5,<3" ]
qwen = []
resembleai = [ "pipecat-ai[websockets-base]" ]
rime = [ "pipecat-ai[websockets-base]" ]
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.1"]
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-small-webrtc-prebuilt>=2.5.0"]
sagemaker = ["aws_sdk_sagemaker_runtime_http2; python_version>='3.12'"]
sambanova = []
sarvam = [ "sarvamai==0.1.28", "pipecat-ai[websockets-base]" ]
@@ -120,7 +119,6 @@ tavus = [ "pipecat-ai[daily]" ]
together = []
tracing = [ "opentelemetry-sdk>=1.33.0,<2", "opentelemetry-api>=1.33.0,<2", "opentelemetry-instrumentation>=0.54b0,<1" ]
ultravox = [ "pipecat-ai[websockets-base]" ]
vonage-video-connector = [ "vonage-video-connector~=0.2.3b0; python_full_version>='3.13' and python_full_version<'3.14' and platform_system=='Linux'" ]
webrtc = [ "aiortc>=1.14.0,<2", "opencv-python>=4.11.0.86,<5" ]
websocket = [ "pipecat-ai[websockets-base]", "fastapi>=0.115.6,<1" ]
websockets-base = [ "websockets>=13.1,<16.0" ]

View File

@@ -198,7 +198,6 @@ TESTS_FUNCTION_CALLING = [
("function-calling/function-calling-sarvam.py", EVAL_WEATHER),
("function-calling/function-calling-novita.py", EVAL_WEATHER),
("function-calling/function-calling-deepseek.py", EVAL_WEATHER),
("function-calling/function-calling-inception.py", EVAL_WEATHER),
# Video
("function-calling/function-calling-anthropic-video.py", EVAL_VISION_CAMERA),
("function-calling/function-calling-aws-video.py", EVAL_VISION_CAMERA),
@@ -224,11 +223,12 @@ TESTS_REALTIME = [
# ("realtime/realtime-azure.py", EVAL_WEATHER),
("realtime/realtime-openai-text.py", EVAL_WEATHER),
("realtime/realtime-openai-live-video.py", EVAL_VISION_CAMERA),
("realtime/realtime-gemini-live.py", EVAL_WEATHER),
("realtime/realtime-gemini-live.py", EVAL_SIMPLE_MATH),
("realtime/realtime-gemini-live-local-vad.py", EVAL_SIMPLE_MATH),
("realtime/realtime-gemini-live-function-calling.py", EVAL_WEATHER),
("realtime/realtime-gemini-live-video.py", EVAL_VISION_CAMERA),
("realtime/realtime-gemini-live-google-search.py", EVAL_ONLINE_SEARCH),
("realtime/realtime-gemini-live-vertex.py", EVAL_WEATHER),
("realtime/realtime-gemini-live-vertex-function-calling.py", EVAL_WEATHER),
("realtime/realtime-aws-nova-sonic.py", EVAL_SIMPLE_MATH),
("realtime/realtime-ultravox.py", EVAL_ORDER),
("realtime/realtime-grok.py", EVAL_WEATHER),
@@ -243,7 +243,6 @@ TESTS_VIDEO_AVATAR = [
TESTS_TURN_MANAGEMENT = [
("turn-management/turn-management-filter-incomplete-turns.py", EVAL_COMPLETE_TURN),
("turn-management/turn-management-filter-incomplete-turns-function-calling.py", EVAL_WEATHER),
]
TESTS_THINKING = [

View File

@@ -139,36 +139,6 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
return formatted_standard_tools + custom_gemini_tools
@staticmethod
def to_function_response_dict(content: Any) -> dict[str, Any]:
"""Convert a tool-result content value to Gemini's FunctionResponse.response shape.
Gemini's ``FunctionResponse.response`` field requires a dict, so
non-dict values (e.g. plain strings, JSON-encoded scalars, or
sentinel strings like ``"COMPLETED"`` used when a function returned
no value) are wrapped as ``{"value": <value>}``. JSON strings that
decode to a dict are passed through as-is.
Args:
content: The tool-result content. Typically the JSON-encoded
return value of a function, but can also be a plain string
(e.g. ``"COMPLETED"``) or already-parsed dict.
Returns:
A dict suitable for ``FunctionResponse.response``.
"""
if isinstance(content, dict):
return content
if not isinstance(content, str):
return {"value": content}
try:
decoded = json.loads(content)
except (json.JSONDecodeError, ValueError):
return {"value": content}
if isinstance(decoded, dict):
return decoded
return {"value": decoded}
def get_messages_for_logging(self, context: LLMContext) -> list[dict[str, Any]]:
"""Get messages from a universal LLM context in a format ready for logging about Gemini.
@@ -412,7 +382,16 @@ class GeminiLLMAdapter(BaseLLMAdapter[GeminiLLMInvocationParams]):
)
elif role == "tool":
role = "user"
response_dict = self.to_function_response_dict(msg["content"])
try:
response = json.loads(msg["content"])
if isinstance(response, dict):
response_dict = response
else:
response_dict = {"value": response}
except Exception as e:
# Response might not be JSON-deserializable.
# This occurs with a UserImageFrame, for example, where we get a plain "COMPLETED" string.
response_dict = {"value": msg["content"]}
# Get function name from mapping using tool_call_id, or fallback
tool_call_id = msg.get("tool_call_id")

View File

@@ -10,8 +10,6 @@ This module provides an audio resampler that uses the resampy library
for high-quality audio sample rate conversion.
"""
import warnings
import numpy as np
import resampy
@@ -23,11 +21,6 @@ class ResampyResampler(BaseAudioResampler):
This resampler uses the resampy library's Kaiser windowing filter
for high-quality audio resampling with good performance characteristics.
.. deprecated:: 1.2.0
ResampyResampler is deprecated and will be removed in Pipecat 2.0.
Use SOXRAudioResampler, create_file_resampler(), or create_stream_resampler()
instead.
"""
def __init__(self, **kwargs):
@@ -36,15 +29,7 @@ class ResampyResampler(BaseAudioResampler):
Args:
**kwargs: Additional keyword arguments (currently unused).
"""
with warnings.catch_warnings():
warnings.simplefilter("always")
warnings.warn(
"ResampyResampler is deprecated and will be removed in Pipecat 2.0. "
"Use SOXRAudioResampler, create_file_resampler(), or "
"create_stream_resampler() instead.",
DeprecationWarning,
stacklevel=2,
)
pass
async def resample(self, audio: bytes, in_rate: int, out_rate: int) -> bytes:
"""Resample audio data using resampy library.

View File

@@ -339,40 +339,6 @@ class LLMTextFrame(TextFrame):
self.includes_inter_frame_spaces = True
@dataclass
class LLMMarkerFrame(DataFrame):
"""Sideband marker emitted by an LLM service.
A marker is short, structured assistant output that should be
persisted in the conversation context but should not flow through
the standard text path (TTS, transcript). The assistant aggregator
writes the marker to the context so the LLM can self-condition on
prior markers on subsequent turns.
The primary use today is the ``filter_incomplete_user_turns``
protocol, where ``UserTurnCompletionLLMServiceMixin`` emits the
turn-completion markers ✓ / ○ / ◐ on every response. The frame is
intentionally generic so other components — STT services with
built-in turn signals, end-of-turn classifiers, custom annotations,
etc. — can use the same mechanism to inject sideband signals into
the assistant context.
Parameters:
marker: The marker payload (typically a short string such as a
single character).
append_to_context_immediately: If True, the marker is written
to the context as its own standalone assistant message as
soon as it's received. If False, the marker is appended to
the running assistant aggregation and flushed to the
context together with the following text as a single
message (e.g. for the ✓ case the context message ends up
as "✓ <response>").
"""
marker: str
append_to_context_immediately: bool = True
@dataclass
class AggregatedTextFrame(TextFrame):
"""Text frame representing an aggregation of TextFrames.
@@ -383,14 +349,10 @@ class AggregatedTextFrame(TextFrame):
Parameters:
aggregated_by: Method used to aggregate the text frames.
context_id: Unique identifier for the TTS context that generated this text.
raw_text: The full matched text including start/end pattern delimiters, set when
this frame was produced from a PatternMatch (e.g. a ``<code>...</code>`` block).
None for ordinary sentence aggregations.
"""
aggregated_by: AggregationType | str
context_id: str | None = None
raw_text: str | None = None
@dataclass
@@ -699,11 +661,6 @@ class FunctionCallResultProperties:
is_final: Whether this is the final result for the function call. When
``False`` the result is treated as an intermediate update. Defaults to ``True``.
Only meaningful for async function calls (``cancel_on_interruption=False``).
Note: realtime LLM services do not support streamed intermediate
results; they deliver only the final result to the provider. An
intermediate result reported to a realtime service is dropped
and an error is raised. Use a non-realtime LLM service if your
tool needs to stream intermediate results.
"""
run_llm: bool | None = None
@@ -1013,24 +970,6 @@ class UserSpeakingFrame(SystemFrame):
pass
@dataclass
class UserTurnInferenceCompletedFrame(SystemFrame):
"""Frame indicating that the user turn is semantically complete.
Emitted by any component that can judge conversational turn
completeness — for example an LLM with turn-completion markers, an
STT service with built-in turn detection, or a dedicated
end-of-turn classifier. Stop strategies that gate the
user-turn-stop event on an external completeness signal (e.g.
``LLMTurnCompletionUserTurnStopStrategy``) consume this frame to
finalize the turn. Producers should emit this frame only when they
judge the turn complete; an absence of this frame means the turn is
not yet considered complete.
"""
pass
@dataclass
class VADUserStartedSpeakingFrame(SystemFrame):
"""Frame emitted when VAD definitively detects user started speaking.

View File

@@ -303,7 +303,7 @@ class PipelineTask(BasePipelineTask):
# This task maneger will handle all the asyncio tasks created by this
# PipelineTask and its frame processors.
self._pipeline_task_manager = task_manager or TaskManager()
self._task_manager = task_manager or TaskManager()
# This queue is the queue used to push frames to the pipeline.
self._push_queue = asyncio.Queue()
@@ -386,7 +386,7 @@ class PipelineTask(BasePipelineTask):
# The task observer acts as a proxy to the provided observers. This way,
# we only need to pass a single observer (using the StartFrame) which
# then just acts as a proxy.
self._observer = TaskObserver(observers=observers)
self._observer = TaskObserver(observers=observers, task_manager=self._task_manager)
# These events can be used to check which frames make it to the source
# or sink processors. Instead of calling the event handlers for every
@@ -654,24 +654,32 @@ class PipelineTask(BasePipelineTask):
async def _create_tasks(self):
"""Create and start all pipeline processing tasks."""
self._process_push_task = self.create_task(self._process_push_queue())
self._process_push_task = self._task_manager.create_task(
self._process_push_queue(), f"{self}::_process_push_queue"
)
return self._process_push_task
def _maybe_start_heartbeat_tasks(self):
"""Start heartbeat tasks if heartbeats are enabled and not already running."""
if self._params.enable_heartbeats and self._heartbeat_push_task is None:
self._heartbeat_push_task = self.create_task(self._heartbeat_push_handler())
self._heartbeat_monitor_task = self.create_task(self._heartbeat_monitor_handler())
self._heartbeat_push_task = self._task_manager.create_task(
self._heartbeat_push_handler(), f"{self}::_heartbeat_push_handler"
)
self._heartbeat_monitor_task = self._task_manager.create_task(
self._heartbeat_monitor_handler(), f"{self}::_heartbeat_monitor_handler"
)
def _maybe_start_idle_task(self):
"""Start idle monitoring task if idle timeout is configured."""
if self._idle_timeout_secs:
self._idle_monitor_task = self.create_task(self._idle_monitor_handler())
self._idle_monitor_task = self._task_manager.create_task(
self._idle_monitor_handler(), f"{self}::_idle_monitor_handler"
)
async def _cancel_tasks(self):
"""Cancel all running pipeline tasks."""
if self._process_push_task:
await self.cancel_task(self._process_push_task)
await self._task_manager.cancel_task(self._process_push_task)
self._process_push_task = None
await self._maybe_cancel_heartbeat_tasks()
@@ -683,17 +691,17 @@ class PipelineTask(BasePipelineTask):
return
if self._heartbeat_push_task:
await self.cancel_task(self._heartbeat_push_task)
await self._task_manager.cancel_task(self._heartbeat_push_task)
self._heartbeat_push_task = None
if self._heartbeat_monitor_task:
await self.cancel_task(self._heartbeat_monitor_task)
await self._task_manager.cancel_task(self._heartbeat_monitor_task)
self._heartbeat_monitor_task = None
async def _maybe_cancel_idle_task(self):
"""Cancel idle monitoring task if it is running."""
if self._idle_monitor_task:
await self.cancel_task(self._idle_monitor_task)
await self._task_manager.cancel_task(self._idle_monitor_task)
self._idle_monitor_task = None
def _initial_metrics_frame(self) -> MetricsFrame:
@@ -751,14 +759,12 @@ class PipelineTask(BasePipelineTask):
async def _setup(self, params: PipelineTaskParams):
"""Set up the pipeline task and all processors."""
await super().setup(self._pipeline_task_manager)
mgr_params = TaskManagerParams(loop=params.loop)
self.task_manager.setup(mgr_params)
self._task_manager.setup(mgr_params)
setup = FrameProcessorSetup(
clock=self._clock,
task_manager=self.task_manager,
task_manager=self._task_manager,
observer=self._observer,
pipeline_task=self,
# Populate the deprecated `tool_resources` field for backwards
@@ -774,7 +780,6 @@ class PipelineTask(BasePipelineTask):
await self._load_setup_files()
# Start task observer.
await self._observer.setup(self.task_manager)
await self._observer.start()
async def _cleanup(self, cleanup_pipeline: bool):
@@ -1014,7 +1019,7 @@ class PipelineTask(BasePipelineTask):
def _print_dangling_tasks(self):
"""Log any dangling tasks that haven't been properly cleaned up."""
tasks = [t.get_name() for t in self.task_manager.current_tasks()]
tasks = [t.get_name() for t in self._task_manager.current_tasks()]
if tasks:
logger.warning(f"{self} dangling tasks detected: {tasks}")

View File

@@ -17,6 +17,7 @@ from typing import Any
from attr import dataclass
from pipecat.observers.base_observer import BaseObserver, FrameProcessed, FramePushed
from pipecat.utils.asyncio.task_manager import BaseTaskManager
@dataclass
@@ -61,16 +62,19 @@ class TaskObserver(BaseObserver):
self,
*,
observers: list[BaseObserver] | None = None,
task_manager: BaseTaskManager,
**kwargs,
):
"""Initialize the TaskObserver.
Args:
observers: List of observers to manage. Defaults to empty list.
task_manager: Task manager for creating and managing observer tasks.
**kwargs: Additional arguments passed to the base observer.
"""
super().__init__(**kwargs)
self._observers = observers or []
self._task_manager = task_manager
self._proxies: dict[BaseObserver, Proxy] | None = (
None # Becomes a dict after start() is called
)
@@ -102,7 +106,7 @@ class TaskObserver(BaseObserver):
# Remove the proxy so it doesn't get called anymore.
del self._proxies[observer]
# Cancel the proxy task right away.
await self.cancel_task(proxy.task)
await self._task_manager.cancel_task(proxy.task)
# Remove the observer from the list.
if observer in self._observers:
@@ -118,7 +122,7 @@ class TaskObserver(BaseObserver):
return
for proxy in self._proxies.values():
await self.cancel_task(proxy.task)
await self._task_manager.cancel_task(proxy.task)
async def cleanup(self):
"""Cleanup all proxy observers."""
@@ -153,8 +157,9 @@ class TaskObserver(BaseObserver):
def _create_proxy(self, observer: BaseObserver) -> Proxy:
"""Create a proxy for a single observer."""
queue = asyncio.Queue()
task = self.create_task(
self._proxy_task_handler(queue, observer), f"{observer}::_proxy_task_handler"
task = self._task_manager.create_task(
self._proxy_task_handler(queue, observer),
f"TaskObserver::{observer}::_proxy_task_handler",
)
proxy = Proxy(queue=queue, task=task, observer=observer)
return proxy

View File

@@ -22,9 +22,9 @@ progresses:
This module is the single source of truth for the on-the-wire payload shape:
- The aggregator uses the ``build_*_message`` functions when injecting messages.
- Realtime LLM services use ``parse_message`` to detect async-tool messages
while iterating the context, then read ``payload.result`` and deliver it via
their formal tool-result channel.
- Realtime LLM services use ``parse_message`` when scanning the context for
async-tool messages to forward to their providers, then
``prepare_message_payload_for_realtime`` to produce a wire-ready string.
Internally, ``AsyncToolMessagePayload`` is the canonical structured form;
the on-the-wire JSON string is always derived from it (never stored) so the
@@ -284,3 +284,87 @@ def parse_message(message: LLMStandardMessage) -> AsyncToolMessagePayload | None
description=description,
result=result,
)
# --- Realtime preparation ----------------------------------------------------
def prepare_message_payload_for_realtime(
payload: AsyncToolMessagePayload,
*,
template: str | None = None,
) -> str:
"""Prepare an async-tool message payload for sending to a realtime LLM service.
Realtime services that fully honor the async-tool mechanism send the
``started`` payload via the formal tool-result channel and the subsequent
``intermediate`` / ``final`` payloads as text injected mid-conversation;
this function returns the string to send in either case, and callers
route it to the appropriate channel.
Args:
payload: The parsed async-tool message payload.
template: Optional format string. If provided, the rendered output is
``template.format(tool_call_id=…, status=…, result=…, description=…)``.
If ``None``, the payload is serialized to its canonical JSON
form. Per-kind helpers ultimately decide what to do with the
template, so future per-kind tweaks (e.g. raising for a kind
that shouldn't accept templates) can be added without changing
this signature.
Returns:
The prepared string, ready to be sent to the realtime service.
"""
if payload.kind == "started":
return _prepare_started_message_payload_for_realtime(payload, template=template)
if payload.kind == "intermediate":
return _prepare_intermediate_result_message_payload_for_realtime(payload, template=template)
if payload.kind == "final":
return _prepare_final_result_message_payload_for_realtime(payload, template=template)
raise ValueError(f"Unknown async-tool message payload kind: {payload.kind!r}")
def _prepare_started_message_payload_for_realtime(
payload: AsyncToolMessagePayload,
*,
template: str | None = None,
) -> str:
if template is None:
return _payload_to_json(payload)
return _format_with_template(payload, template)
def _prepare_intermediate_result_message_payload_for_realtime(
payload: AsyncToolMessagePayload,
*,
template: str | None = None,
) -> str:
if template is None:
return _payload_to_json(payload)
return _format_with_template(payload, template)
def _prepare_final_result_message_payload_for_realtime(
payload: AsyncToolMessagePayload,
*,
template: str | None = None,
) -> str:
if template is None:
return _payload_to_json(payload)
return _format_with_template(payload, template)
def _format_with_template(payload: AsyncToolMessagePayload, template: str) -> str:
"""Render a payload via a caller-supplied template.
Available substitution keys: ``tool_call_id``, ``status``, ``result``,
``description``. Note that ``result`` is empty for ``started`` payloads
(no result has been produced yet); callers building templates intended
for ``started`` should not rely on it.
"""
return template.format(
tool_call_id=payload.tool_call_id,
status=payload.status,
result=payload.result or "",
description=payload.description,
)

View File

@@ -25,7 +25,6 @@ from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.vad_analyzer import VADAnalyzer
from pipecat.audio.vad.vad_controller import VADController
from pipecat.frames.frames import (
AggregatedTextFrame,
AssistantImageRawFrame,
BotStartedSpeakingFrame,
BotStoppedSpeakingFrame,
@@ -45,7 +44,6 @@ from pipecat.frames.frames import (
LLMContextSummaryRequestFrame,
LLMFullResponseEndFrame,
LLMFullResponseStartFrame,
LLMMarkerFrame,
LLMMessagesAppendFrame,
LLMMessagesTransformFrame,
LLMMessagesUpdateFrame,
@@ -55,6 +53,7 @@ from pipecat.frames.frames import (
LLMThoughtEndFrame,
LLMThoughtStartFrame,
LLMThoughtTextFrame,
LLMUpdateSettingsFrame,
StartFrame,
TextFrame,
TranscriptionFrame,
@@ -74,23 +73,20 @@ from pipecat.processors.aggregators.llm_context import (
LLMContextMessage,
LLMSpecificMessage,
NotGiven,
is_given,
)
from pipecat.processors.aggregators.llm_context_summarizer import (
LLMContextSummarizer,
SummaryAppliedEvent,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.settings import LLMSettings
from pipecat.turns.user_idle_controller import UserIdleController
from pipecat.turns.user_mute import BaseUserMuteStrategy
from pipecat.turns.user_start import BaseUserTurnStartStrategy, UserTurnStartedParams
from pipecat.turns.user_stop import BaseUserTurnStopStrategy, UserTurnStoppedParams
from pipecat.turns.user_turn_completion_mixin import UserTurnCompletionConfig
from pipecat.turns.user_turn_controller import UserTurnController
from pipecat.turns.user_turn_strategies import (
FilterIncompleteUserTurnStrategies,
UserTurnStrategies,
)
from pipecat.turns.user_turn_strategies import UserTurnStrategies
from pipecat.utils.context.llm_context_summarization import (
LLMAutoContextSummarizationConfig,
LLMContextSummarizationConfig,
@@ -104,21 +100,6 @@ class LLMUserAggregatorParams:
"""Parameters for configuring LLM user aggregation behavior.
Parameters:
add_tool_change_messages: When True, on each ``LLMSetToolsFrame`` the
aggregator computes the diff against the currently advertised tools
and appends a developer-role message to the context describing
additions/removals. Helps the LLM stay coherent across
mid-conversation tool changes, mitigating several flavors of
tool-call-related hallucination: calling tools that have been
removed, avoiding tools that have been re-added, and hallucinating
output (made-up answers or tool-call-shaped non-tool-calls) when
tools are unavailable. Only standard tools are diffed; custom
(LLM-specific) tools are ignored. When using
``LLMContextAggregatorPair``, prefer setting this via its
``add_tool_change_messages`` argument instead. Defaults to False.
audio_idle_timeout: Timeout in seconds to force speech stop when
no audio frames are received while in SPEAKING state (e.g. user mutes
mic mid-speech). Set to 0 to disable. Defaults to 1.0.
user_turn_strategies: User turn start and stop strategies.
user_mute_strategies: List of user mute strategies.
user_turn_stop_timeout: Time in seconds to wait before considering the
@@ -128,64 +109,27 @@ class LLMUserAggregatorParams:
has been idle (not speaking) for this duration. Set to 0 to disable
idle detection.
vad_analyzer: Voice Activity Detection analyzer instance.
filter_incomplete_user_turns: [DEPRECATED] Use
``user_turn_strategies=FilterIncompleteUserTurnStrategies()``
instead. When enabled, the LLM outputs a turn-completion
marker at the start of each response: ✓ (complete), ○
(incomplete short), or ◐ (incomplete long). Incomplete
responses are suppressed and timeouts trigger re-prompting.
.. deprecated:: 1.2.0
Use ``user_turn_strategies=FilterIncompleteUserTurnStrategies()``
instead. Will be removed in version 2.0.0.
user_turn_completion_config: [DEPRECATED] Configuration for turn
completion behavior including custom instructions, timeouts, and
prompts. Only used when filter_incomplete_user_turns is True
(deprecated path) — for the new strategy-based API, pass the config
directly to ``FilterIncompleteUserTurnStrategies(config=...)``.
.. deprecated:: 1.2.0
Pass the config directly to
``FilterIncompleteUserTurnStrategies(config=...)`` instead.
Will be removed in version 2.0.0.
audio_idle_timeout: Timeout in seconds to force speech stop when
no audio frames are received while in SPEAKING state (e.g. user mutes
mic mid-speech). Set to 0 to disable. Defaults to 1.0.
filter_incomplete_user_turns: Whether to filter out incomplete user turns.
When enabled, the LLM outputs a turn completion marker at the start of
each response: ✓ (complete), ○ (incomplete short), or ◐ (incomplete long).
Incomplete responses are suppressed and timeouts trigger re-prompting.
user_turn_completion_config: Configuration for turn completion behavior including
custom instructions, timeouts, and prompts. Only used when
filter_incomplete_user_turns is True.
"""
add_tool_change_messages: bool = False
audio_idle_timeout: float = 1.0
user_turn_strategies: UserTurnStrategies | None = None
user_mute_strategies: list[BaseUserMuteStrategy] = field(default_factory=list)
user_turn_stop_timeout: float = 5.0
user_idle_timeout: float = 0
vad_analyzer: VADAnalyzer | None = None
audio_idle_timeout: float = 1.0
filter_incomplete_user_turns: bool = False
user_turn_completion_config: UserTurnCompletionConfig | None = None
def __post_init__(self):
if self.filter_incomplete_user_turns:
warnings.warn(
"LLMUserAggregatorParams.filter_incomplete_user_turns is deprecated. "
"Use user_turn_strategies=FilterIncompleteUserTurnStrategies() instead.",
DeprecationWarning,
stacklevel=2,
)
if self.user_turn_completion_config:
warnings.warn(
"LLMUserAggregatorParams.user_turn_completion_config is deprecated. "
"Use user_turn_strategies=FilterIncompleteUserTurnStrategies() instead.",
DeprecationWarning,
stacklevel=2,
)
if self.user_turn_completion_config is not None:
warnings.warn(
"LLMUserAggregatorParams.user_turn_completion_config is deprecated. "
"Pass the config directly to "
"FilterIncompleteUserTurnStrategies(config=...) instead.",
DeprecationWarning,
stacklevel=2,
)
@dataclass
class LLMAssistantAggregatorParams:
@@ -200,32 +144,14 @@ class LLMAssistantAggregatorParams:
summarization. Controls trigger thresholds, message preservation, and
summarization prompts. If None, uses default
``LLMAutoContextSummarizationConfig`` values.
add_tool_change_messages: When True, on each ``LLMSetToolsFrame`` the
aggregator computes the diff against the currently advertised tools
and appends a developer-role message to the context describing
additions/removals. Helps the LLM stay coherent across
mid-conversation tool changes, mitigating several flavors of
tool-call-related hallucination: calling tools that have been
removed, avoiding tools that have been re-added, and hallucinating
output (made-up answers or tool-call-shaped non-tool-calls) when
tools are unavailable. Only standard tools are diffed; custom
(LLM-specific) tools are ignored. When using
``LLMContextAggregatorPair``, prefer setting this via its
``add_tool_change_messages`` argument instead. Defaults to False.
"""
enable_auto_context_summarization: bool = False
auto_context_summarization_config: LLMAutoContextSummarizationConfig | None = None
add_tool_change_messages: bool = False
# ---------------------------------------------------------------------------
# Deprecated field names — kept for backward compatibility.
# Use enable_auto_context_summarization and auto_context_summarization_config instead.
#
# .. deprecated:: 1.2.0
# Use ``enable_auto_context_summarization`` and
# ``auto_context_summarization_config`` instead. Will be removed in
# version 2.0.0.
# ---------------------------------------------------------------------------
enable_context_summarization: bool | None = None
context_summarization_config: LLMContextSummarizationConfig | None = None
@@ -323,87 +249,20 @@ class LLMContextAggregator(FrameProcessor):
common functionality for context-based conversation management.
"""
# Developer-role messages appended to the context when tools are added/
# removed via ``LLMSetToolsFrame`` (only when ``add_tool_change_messages``
# is enabled on the aggregator's params). ``{function_names}`` is
# substituted with a sorted, comma-separated, backtick-wrapped list.
TOOL_ACTIVATION_MESSAGE_TEMPLATE = (
"The following function(s) have just been added and may now be called: "
"{function_names}. Any previously available functions remain available."
)
TOOL_DEACTIVATION_MESSAGE_TEMPLATE = (
"The following function(s) have just been removed and should not be called: "
"{function_names}. Any previously available functions remain available. "
"The removed function(s) may become available again later, in which case "
"you will be informed."
)
def __init__(
self,
*,
context: LLMContext,
role: str,
add_tool_change_messages: bool = False,
**kwargs,
):
def __init__(self, *, context: LLMContext, role: str, **kwargs):
"""Initialize the context response aggregator.
Args:
context: The LLM context to use for conversation storage.
role: The role this aggregator represents (e.g. "user", "assistant").
add_tool_change_messages: See the field of the same name on the
aggregator-specific params dataclasses. Subclasses propagate
this from their ``params``.
**kwargs: Additional arguments passed to parent class.
"""
super().__init__(**kwargs)
self._context = context
self._role = role
self._add_tool_change_messages = add_tool_change_messages
self._aggregation: list[TextPartForConcatenation] = []
def _maybe_add_tool_change_messages(self, new_tools: ToolsSchema | NotGiven) -> None:
"""Append a developer message describing tool add/remove deltas.
No-op unless ``add_tool_change_messages`` was enabled on the aggregator,
and no-op when the diff against the currently advertised tools is empty.
Custom (LLM-specific) tools are ignored — only standard tools are diffed.
Both aggregators call this on every ``LLMSetToolsFrame`` they handle.
Whichever aggregator handles the frame first computes a real diff
against the shared context and adds the announcement; by the time
the other aggregator sees it (if at all), the context already
reflects the new tools, so its diff is empty and no duplicate
message is added. This is order-independent: it works whether the
frame flows downstream (user aggregator first) or upstream
(assistant aggregator first, and consumed without being forwarded).
"""
if not self._add_tool_change_messages:
return
def _names(tools: ToolsSchema | NotGiven) -> set[str]:
if not is_given(tools):
return set()
return {s.name for s in tools.standard_tools}
old_names = _names(self._context.tools)
new_names = _names(new_tools)
added = new_names - old_names
removed = old_names - new_names
if not added and not removed:
return
parts: list[str] = []
if added:
names = ", ".join(f"`{n}`" for n in sorted(added))
parts.append(self.TOOL_ACTIVATION_MESSAGE_TEMPLATE.format(function_names=names))
if removed:
names = ", ".join(f"`{n}`" for n in sorted(removed))
parts.append(self.TOOL_DEACTIVATION_MESSAGE_TEMPLATE.format(function_names=names))
self._context.add_message({"role": "developer", "content": " ".join(parts)})
@property
def messages(self) -> list[LLMContextMessage]:
"""Get messages from the LLM context.
@@ -576,46 +435,20 @@ class LLMUserAggregator(LLMContextAggregator):
params: Configuration parameters for aggregation behavior.
**kwargs: Additional arguments.
"""
params = params or LLMUserAggregatorParams()
super().__init__(
context=context,
role="user",
add_tool_change_messages=params.add_tool_change_messages,
**kwargs,
)
self._params = params
super().__init__(context=context, role="user", **kwargs)
self._params = params or LLMUserAggregatorParams()
self._register_event_handler("on_user_turn_started")
self._register_event_handler("on_user_turn_stopped")
self._register_event_handler("on_user_turn_stop_timeout")
self._register_event_handler("on_user_turn_idle")
self._register_event_handler("on_user_turn_inference_triggered")
self._register_event_handler("on_user_mute_started")
self._register_event_handler("on_user_mute_stopped")
user_turn_strategies = self._params.user_turn_strategies or UserTurnStrategies()
# Deprecated path: translate filter_incomplete_user_turns into
# the equivalent FilterIncompleteUserTurnStrategies wiring. The
# DeprecationWarning is emitted in LLMUserAggregatorParams.__post_init__.
if self._params.filter_incomplete_user_turns:
user_turn_strategies = FilterIncompleteUserTurnStrategies(
start=user_turn_strategies.start,
stop=user_turn_strategies.stop,
config=self._params.user_turn_completion_config,
)
self._params.user_turn_strategies = user_turn_strategies
self._user_is_muted = False
self._user_turn_start_timestamp = ""
# Full transcript across the user turn. Each
# `_on_user_turn_inference_triggered` push captures only the
# new segment since the previous push (push_aggregation resets
# `_aggregation` after writing to context); we accumulate those
# segments here so the eventual `on_user_turn_stopped` event
# surfaces the full turn transcript even when several
# inferences fire before finalization.
self._full_user_turn_aggregation: str | None = None
self._user_turn_controller = UserTurnController(
user_turn_strategies=user_turn_strategies,
@@ -626,9 +459,6 @@ class LLMUserAggregator(LLMContextAggregator):
self._user_turn_controller.add_event_handler(
"on_user_turn_started", self._on_user_turn_started
)
self._user_turn_controller.add_event_handler(
"on_user_turn_inference_triggered", self._on_user_turn_inference_triggered
)
self._user_turn_controller.add_event_handler(
"on_user_turn_stopped", self._on_user_turn_stopped
)
@@ -707,7 +537,6 @@ class LLMUserAggregator(LLMContextAggregator):
elif isinstance(frame, LLMMessagesTransformFrame):
await self._handle_llm_messages_transform(frame)
elif isinstance(frame, LLMSetToolsFrame):
self._maybe_add_tool_change_messages(frame.tools)
self.set_tools(frame.tools)
# Push the LLMSetToolsFrame as well, since speech-to-speech LLM
# services (like OpenAI Realtime) may need to know about tool
@@ -747,6 +576,21 @@ class LLMUserAggregator(LLMContextAggregator):
for s in self._params.user_mute_strategies:
await s.setup(self.task_manager)
# Enable incomplete turn filtering on the LLM if configured
if self._params.filter_incomplete_user_turns:
# Get config or use defaults
config = self._params.user_turn_completion_config or UserTurnCompletionConfig()
# Enable the feature on the LLM with config
await self.push_frame(
LLMUpdateSettingsFrame(
delta=LLMSettings(
filter_incomplete_user_turns=True,
user_turn_completion_config=config,
)
)
)
async def _stop(self, frame: EndFrame):
await self._maybe_emit_user_turn_stopped(on_session_end=True)
await self._cleanup()
@@ -886,7 +730,6 @@ class LLMUserAggregator(LLMContextAggregator):
logger.debug(f"{self}: User started speaking (strategy: {strategy})")
self._user_turn_start_timestamp = time_now_iso8601()
self._full_user_turn_aggregation = None
if params.enable_user_speaking_frames:
await self.broadcast_frame(UserStartedSpeakingFrame)
@@ -898,30 +741,6 @@ class LLMUserAggregator(LLMContextAggregator):
await self._call_event_handler("on_user_turn_started", strategy)
async def _on_user_turn_inference_triggered(
self,
controller: UserTurnController,
strategy: BaseUserTurnStopStrategy,
):
logger.debug(f"{self}: User turn inference triggered (strategy: {strategy})")
# Push aggregation now: this writes the user message segment to
# the context and emits LLMContextFrame, which kicks LLM
# inference. Concatenate the segment into
# `_full_user_turn_aggregation` so multiple inferences in the
# same turn don't lose earlier segments from the eventual
# `on_user_turn_stopped` event.
segment = await self.push_aggregation()
if segment:
if self._full_user_turn_aggregation:
self._full_user_turn_aggregation = (
f"{self._full_user_turn_aggregation} {segment}".strip()
)
else:
self._full_user_turn_aggregation = segment
await self._call_event_handler("on_user_turn_inference_triggered", strategy)
async def _on_user_turn_stopped(
self,
controller: UserTurnController,
@@ -956,29 +775,15 @@ class LLMUserAggregator(LLMContextAggregator):
):
"""Maybe emit user turn stopped event.
Earlier inference triggers in the same turn have already pushed
their segments to the context and accumulated them into
``self._full_user_turn_aggregation``. Any aggregation that
arrived after the last inference trigger is flushed here so
end-of-turn content is never lost from the public event.
Args:
strategy: The strategy that triggered the turn stop.
on_session_end: If True, only emit if there's unemitted content
(avoids duplicate events when session ends).
"""
segment = await self.push_aggregation()
full_aggregation = self._full_user_turn_aggregation
self._full_user_turn_aggregation = None
if segment and full_aggregation:
content = f"{full_aggregation} {segment}".strip()
else:
content = full_aggregation or segment
if not on_session_end or content:
aggregation = await self.push_aggregation()
if not on_session_end or aggregation:
message = UserTurnStoppedMessage(
content=content, timestamp=self._user_turn_start_timestamp
content=aggregation, timestamp=self._user_turn_start_timestamp
)
await self._call_event_handler("on_user_turn_stopped", strategy, message)
self._user_turn_start_timestamp = ""
@@ -1039,14 +844,8 @@ class LLMAssistantAggregator(LLMContextAggregator):
params: Configuration parameters for aggregation behavior.
**kwargs: Additional arguments.
"""
params = params or LLMAssistantAggregatorParams()
super().__init__(
context=context,
role="assistant",
add_tool_change_messages=params.add_tool_change_messages,
**kwargs,
)
self._params = params
super().__init__(context=context, role="assistant", **kwargs)
self._params = params or LLMAssistantAggregatorParams()
self._function_calls_in_progress: dict[str, FunctionCallInProgressFrame | None] = {}
self._function_calls_image_results: dict[str, UserImageRawFrame] = {}
@@ -1129,15 +928,13 @@ class LLMAssistantAggregator(LLMContextAggregator):
await self._handle_end_or_cancel(frame)
await self.push_frame(frame, direction)
elif isinstance(frame, LLMAssistantPushAggregationFrame):
await self._handle_push_aggregation()
await self.push_aggregation()
elif isinstance(frame, LLMFullResponseStartFrame):
await self._handle_llm_start(frame)
elif isinstance(frame, LLMFullResponseEndFrame):
await self._handle_llm_end(frame)
elif isinstance(frame, TextFrame):
await self._handle_text(frame)
elif isinstance(frame, LLMMarkerFrame):
await self._handle_marker_frame(frame)
elif isinstance(frame, LLMThoughtStartFrame):
await self._handle_thought_start(frame)
elif isinstance(frame, LLMThoughtTextFrame):
@@ -1153,7 +950,6 @@ class LLMAssistantAggregator(LLMContextAggregator):
elif isinstance(frame, LLMMessagesTransformFrame):
await self._handle_llm_messages_transform(frame)
elif isinstance(frame, LLMSetToolsFrame):
self._maybe_add_tool_change_messages(frame.tools)
self.set_tools(frame.tools)
elif isinstance(frame, LLMSetToolChoiceFrame):
self.set_tool_choice(frame.tool_choice)
@@ -1474,17 +1270,6 @@ class LLMAssistantAggregator(LLMContextAggregator):
async def _handle_llm_end(self, _: LLMFullResponseEndFrame):
await self._trigger_assistant_turn_stopped()
async def _handle_push_aggregation(self):
# LLMAssistantPushAggregationFrame is emitted by TTSService at the end
# of a TTSSpeakFrame-driven utterance (no surrounding LLM response
# cycle), so no LLMFullResponseStartFrame ever set the turn-start
# timestamp. Open a turn now so on_assistant_turn_stopped fires for the
# greeting text the same way it did before LLMAssistantPushAggregationFrame
# was introduced.
if not self._assistant_turn_start_timestamp:
await self._trigger_assistant_turn_started()
await self._trigger_assistant_turn_stopped()
async def _handle_text(self, frame: TextFrame):
# Skip TextFrame types not intended to build the assistant context
if isinstance(frame, (TranscriptionFrame, TranslationFrame, InterimTranscriptionFrame)):
@@ -1497,42 +1282,12 @@ class LLMAssistantAggregator(LLMContextAggregator):
if len(frame.text) == 0:
return
text = (
frame.raw_text
if isinstance(frame, AggregatedTextFrame) and frame.raw_text
else frame.text
)
self._aggregation.append(
TextPartForConcatenation(
text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
frame.text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
)
)
async def _handle_marker_frame(self, frame: LLMMarkerFrame):
if frame.append_to_context_immediately:
# Stand-alone marker: write it to the context now as its
# own assistant message. Used when the marker is the entire
# assistant turn — e.g. the ○ / ◐ incomplete-turn signals,
# where the spoken response is suppressed and the marker
# is the only artifact.
self._context.add_message({"role": "assistant", "content": frame.marker})
await self.push_context_frame()
timestamp_frame = LLMContextAssistantTimestampFrame(timestamp=time_now_iso8601())
await self.push_frame(timestamp_frame)
return
# Marker is part of an in-progress assistant response. Append
# it to the running aggregation so `push_aggregation` writes
# marker + text as a single context message — e.g. the ✓
# complete-turn signal that prefixes the spoken response,
# producing "✓ <response>" in context. Markers are stripped
# from the transcript via
# `_maybe_strip_turn_completion_markers` so consumers see
# clean text.
self._aggregation.append(
TextPartForConcatenation(frame.marker, includes_inter_part_spaces=False)
)
async def _handle_thought_start(self, frame: LLMThoughtStartFrame):
await self._reset_thought_aggregation()
self._thought_append_to_context = frame.append_to_context
@@ -1684,7 +1439,6 @@ class LLMContextAggregatorPair:
*,
user_params: LLMUserAggregatorParams | None = None,
assistant_params: LLMAssistantAggregatorParams | None = None,
add_tool_change_messages: bool | None = None,
):
"""Initialize the LLM context aggregator pair.
@@ -1692,22 +1446,9 @@ class LLMContextAggregatorPair:
context: The context to be managed by the aggregators.
user_params: Parameters for the user context aggregator.
assistant_params: Parameters for the assistant context aggregator.
add_tool_change_messages: When provided, sets the field of the
same name on both ``user_params`` and ``assistant_params``,
overriding any value already set on either. This is the
preferred way to enable tool-change announcements: it ensures
both aggregators participate, which makes the feature robust
regardless of which aggregator handles a given
``LLMSetToolsFrame``. The shared context guarantees the
announcement is added exactly once (the second aggregator's
diff is empty by the time it sees the frame). Leave as
``None`` to respect per-params settings.
"""
user_params = user_params or LLMUserAggregatorParams()
assistant_params = assistant_params or LLMAssistantAggregatorParams()
if add_tool_change_messages is not None:
user_params.add_tool_change_messages = add_tool_change_messages
assistant_params.add_tool_change_messages = add_tool_change_messages
self._user = LLMUserAggregator(context, params=user_params)
self._assistant = LLMAssistantAggregator(context, params=assistant_params)

View File

@@ -23,7 +23,6 @@ from pipecat.frames.frames import (
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.utils.text.base_text_aggregator import BaseTextAggregator
from pipecat.utils.text.pattern_pair_aggregator import PatternMatch
from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator
@@ -86,11 +85,7 @@ class LLMTextProcessor(FrameProcessor):
out_frame = AggregatedTextFrame(
text=aggregation.text,
aggregated_by=aggregation.type,
raw_text=aggregation.full_match
if isinstance(aggregation, PatternMatch)
else aggregation.text,
)
out_frame.append_to_context = True
out_frame.skip_tts = in_frame.skip_tts
await self.push_frame(out_frame)
@@ -101,9 +96,6 @@ class LLMTextProcessor(FrameProcessor):
out_frame = AggregatedTextFrame(
text=remaining.text,
aggregated_by=remaining.type,
raw_text=remaining.full_match
if isinstance(remaining, PatternMatch)
else remaining.text,
)
out_frame.skip_tts = skip_tts
await self.push_frame(out_frame)

View File

@@ -17,7 +17,7 @@ import asyncio
import dataclasses
import traceback
import warnings
from collections.abc import Awaitable, Callable
from collections.abc import Awaitable, Callable, Coroutine
from dataclasses import dataclass
from enum import Enum
from typing import (
@@ -217,6 +217,9 @@ class FrameProcessor(BaseObject):
# Clock
self._clock: BaseClock | None = None
# Task Manager
self._task_manager: BaseTaskManager | None = None
# Observer
self._observer: BaseObserver | None = None
@@ -365,6 +368,20 @@ class FrameProcessor(BaseObject):
"""
return self._report_only_initial_ttfb
@property
def task_manager(self) -> BaseTaskManager:
"""Get the task manager for this processor.
Returns:
The task manager instance.
Raises:
Exception: If the task manager is not initialized.
"""
if not self._task_manager:
raise Exception(f"{self} TaskManager is still not initialized.")
return self._task_manager
@property
def pipeline_task(self) -> PipelineTask | None:
"""Get the :class:`PipelineTask` this processor is running in.
@@ -494,14 +511,43 @@ class FrameProcessor(BaseObject):
await self.stop_processing_metrics()
await self.stop_text_aggregation_metrics()
def create_task(self, coroutine: Coroutine, name: str | None = None) -> asyncio.Task:
"""Create a new task managed by this processor.
Args:
coroutine: The coroutine to run in the task.
name: Optional name for the task.
Returns:
The created asyncio task.
"""
if name:
name = f"{self}::{name}"
else:
name = f"{self}::{coroutine.cr_code.co_name}"
return self.task_manager.create_task(coroutine, name)
async def cancel_task(self, task: asyncio.Task, timeout: float | None = 1.0):
"""Cancel a task managed by this processor.
A default timeout if 1 second is used in order to avoid potential
freezes caused by certain libraries that swallow
`asyncio.CancelledError`.
Args:
task: The task to cancel.
timeout: Optional timeout for task cancellation.
"""
await self.task_manager.cancel_task(task, timeout)
async def setup(self, setup: FrameProcessorSetup):
"""Set up the processor with required components.
Args:
setup: Configuration object containing setup parameters.
"""
await super().setup(setup.task_manager)
self._clock = setup.clock
self._task_manager = setup.task_manager
self._observer = setup.observer
self._pipeline_task = setup.pipeline_task
@@ -509,7 +555,7 @@ class FrameProcessor(BaseObject):
self.__create_input_task()
if self._metrics is not None:
await self._metrics.setup(self.task_manager)
await self._metrics.setup(self._task_manager)
async def cleanup(self):
"""Clean up processor resources."""
@@ -831,19 +877,14 @@ class FrameProcessor(BaseObject):
current_is_uninterruptible = isinstance(
self.__process_current_frame, UninterruptibleFrame
)
if current_is_uninterruptible:
# The frame currently being processed is uninterruptible, so we
# must not cancel it. Just flush non-uninterruptible frames from
# the queue; any uninterruptible ones will be kept and processed
# after the current frame finishes.
if current_is_uninterruptible or self.__process_queue.has_uninterruptible:
# We don't want to cancel an UninterruptibleFrame (either the
# one currently being processed or one waiting in the queue),
# so we simply cleanup the queue keeping only
# UninterruptibleFrames.
self.__reset_process_queue()
else:
# Cancel and re-create the process task. Previously this branch
# was skipped when the queue contained an uninterruptible frame,
# which caused slow non-uninterruptible frames to block
# interruptions. Uninterruptible queued frames are safe here
# because __create_process_task calls __reset_process_queue
# internally, which always preserves them.
# Cancel and re-create the process task.
await self.__cancel_process_task()
self.__create_process_task()
except Exception as e:

View File

@@ -10,11 +10,6 @@ from pipecat.processors.frameworks.rtvi.frames import (
RTVIClientMessageFrame,
RTVIServerMessageFrame,
RTVIServerResponseFrame,
RTVIUICancelTaskFrame,
RTVIUICommandFrame,
RTVIUIEventFrame,
RTVIUISnapshotFrame,
RTVIUITaskFrame,
)
from pipecat.processors.frameworks.rtvi.observer import (
RTVIFunctionCallReportLevel,
@@ -31,9 +26,4 @@ __all__ = [
"RTVIProcessor",
"RTVIServerMessageFrame",
"RTVIServerResponseFrame",
"RTVIUICancelTaskFrame",
"RTVIUICommandFrame",
"RTVIUIEventFrame",
"RTVIUISnapshotFrame",
"RTVIUITaskFrame",
]

View File

@@ -10,7 +10,6 @@ from dataclasses import dataclass
from typing import Any
from pipecat.frames.frames import SystemFrame
from pipecat.processors.frameworks.rtvi.models import UITaskData
@dataclass
@@ -28,132 +27,6 @@ class RTVIServerMessageFrame(SystemFrame):
return f"{self.name}(data: {self.data})"
@dataclass
class RTVIUICommandFrame(SystemFrame):
"""A frame for sending a UI command to the client.
Pipeline-side counterpart of the ``ui-command`` RTVI message.
The observer wraps the ``command`` + ``payload`` into a
``UICommandMessage`` envelope before pushing it to the transport,
so the wire shape is:
``{label, type: "ui-command", data: {command, payload}}``.
Parameters:
command: App-defined command (e.g. ``"toast"``,
``"navigate"``, or any app-specific command).
payload: App-defined payload. Pydantic command models
(``Toast``, ``Navigate``, ``ScrollTo``, ...) should be
converted to a plain dict via ``model_dump()`` before
being placed here; an arbitrary dict works as well.
"""
command: str = ""
payload: Any = None
def __str__(self):
"""String representation of the UI command frame."""
return f"{self.name}(command: {self.command})"
@dataclass
class RTVIUITaskFrame(SystemFrame):
"""A frame for sending a UI task lifecycle envelope to the client.
Pipeline-side counterpart of the ``ui-task`` RTVI message. The
observer wraps the ``data`` into a ``UITaskMessage`` envelope
before pushing it to the transport, so the wire shape is:
``{label, type: "ui-task", data: <one of the four kinds>}``.
Parameters:
data: One of the four task-lifecycle data models from
``rtvi.models`` (``UITaskGroupStartedData``,
``UITaskUpdateData``, ``UITaskCompletedData``, or
``UITaskGroupCompletedData``). The ``kind`` field on
each discriminates which lifecycle phase this is.
"""
data: UITaskData | None = None
def __str__(self):
"""String representation of the UI task frame."""
kind = getattr(self.data, "kind", "?")
return f"{self.name}(kind: {kind})"
@dataclass
class RTVIUIEventFrame(SystemFrame):
"""An inbound UI event from the client.
Pushed downstream by ``RTVIProcessor`` whenever a ``ui-event``
message arrives from the client, alongside firing the
``on_ui_message`` event handler. Mirrors the
frame-and-event pattern used by ``client-message``: pipeline
observers and processors that want to react to UI events at the
pipeline level can match on this frame; code that subscribes to
events instead (like the bridge in ``pipecat-ai-subagents``)
keeps using the event handler.
Parameters:
msg_id: The RTVI message id, as set by the client.
event: App-defined event (the ``data.event`` field).
payload: App-defined payload (the ``data.payload`` field).
"""
msg_id: str = ""
event: str = ""
payload: Any = None
def __str__(self):
"""String representation of the UI event frame."""
return f"{self.name}(event: {self.event})"
@dataclass
class RTVIUISnapshotFrame(SystemFrame):
"""An inbound accessibility-snapshot from the client.
Pushed downstream by ``RTVIProcessor`` whenever a ``ui-snapshot``
message arrives, alongside firing ``on_ui_message``. Carries
the serialized accessibility tree the client took of its DOM.
Parameters:
msg_id: The RTVI message id, as set by the client.
tree: The serialized accessibility tree.
"""
msg_id: str = ""
tree: Any = None
def __str__(self):
"""String representation of the UI snapshot frame."""
return f"{self.name}"
@dataclass
class RTVIUICancelTaskFrame(SystemFrame):
"""An inbound user-task-group cancellation request from the client.
Pushed downstream by ``RTVIProcessor`` whenever a
``ui-cancel-task`` message arrives, alongside firing
``on_ui_message``. The server-side framework should look up the
matching task group and cancel it (subject to whatever
cancellable policy the group was registered with).
Parameters:
msg_id: The RTVI message id, as set by the client.
task_id: The task group id the client wants cancelled.
reason: Optional human-readable reason.
"""
msg_id: str = ""
task_id: str = ""
reason: str | None = None
def __str__(self):
"""String representation of the UI cancel-task frame."""
return f"{self.name}(task_id: {self.task_id})"
@dataclass
class RTVIClientMessageFrame(SystemFrame):
"""A frame for sending messages from the client to the RTVI server.

View File

@@ -20,14 +20,14 @@ from typing import (
Literal,
)
from pydantic import BaseModel, ConfigDict
from pydantic import BaseModel
from pipecat.frames.frames import (
AggregationType,
)
# -- Constants --
PROTOCOL_VERSION = "1.3.0"
PROTOCOL_VERSION = "1.2.0"
MESSAGE_LABEL = "rtvi-ai"
MessageLiteral = Literal["rtvi-ai"]
@@ -549,474 +549,3 @@ class SystemLogMessage(BaseModel):
label: MessageLiteral = MESSAGE_LABEL
type: Literal["system-log"] = "system-log"
data: TextMessageData
# -- UI Agent Protocol -------------------------------------------------------
#
# A structured RTVI message vocabulary that lets server-side AI agents
# observe and drive a GUI app on the client side. The protocol covers
# five first-class RTVI message types:
#
# ui-event client-to-server event message
# ui-command server-to-client command message
# ui-snapshot client-to-server accessibility snapshot
# ui-cancel-task client-to-server cancellation request
# ui-task server-to-client task lifecycle envelope
#
# This section is data only (constants and payload models, no
# behavior). Higher-level frameworks like ``pipecat-ai-subagents``
# build the agent abstractions on top, and single-LLM Pipecat apps can
# target the same wire format directly via custom tools that emit
# typed RTVI messages with these types. The matching client-side
# implementation lives in ``@pipecat-ai/client-js`` and
# ``@pipecat-ai/client-react``.
# The wire-format ``type`` strings (``"ui-event"``, ``"ui-command"``,
# ``"ui-snapshot"``, ``"ui-cancel-task"``, ``"ui-task"``) are pinned
# as ``Literal[...]`` field defaults on the corresponding ``*Message``
# pydantic class below, matching the convention used for every other
# RTVI message type in this module.
# Each ``ui-task`` envelope carries a ``kind`` field that the client's
# task reducer dispatches on. The four kinds form the lifecycle of a
# user-facing task group:
#
# group_started → task_update* → task_completed × N → group_completed
#
# where N is the number of workers in the group. The kind strings are
# pinned as ``Literal[...]`` defaults on the matching ``UITask*Data``
# class below.
# -- UI envelope data classes --
class UIEventData(BaseModel):
"""Inner ``data`` for a ``ui-event`` message.
Parameters:
event: App-defined event.
payload: App-defined payload, schemaless by design.
"""
event: str
payload: Any | None = None
class UICommandData(BaseModel):
"""Inner ``data`` for a ``ui-command`` message.
Parameters:
command: App-defined command.
payload: App-defined payload (already a plain dict by the
time it lands on the wire). The standard command payload models
below produce the right shape via ``model_dump()``.
"""
command: str
payload: Any | None = None
class A11yNode(BaseModel):
"""One node in the UI accessibility snapshot tree.
Mirrors the client-side ``A11yNode`` wire shape. Extra fields are
allowed so clients can add platform-specific or future metadata
without breaking older servers.
Parameters:
ref: Stable client-assigned element reference.
role: ARIA-style role for the node.
name: Optional accessible name.
value: Optional current value for inputs/progress/etc.
state: Optional short state tags (e.g. ``"focused"``,
``"disabled"``, ``"offscreen"``).
level: Optional heading level.
colcount: Optional column count for grid-like containers.
rowcount: Optional row count for grid-like containers.
children: Optional child nodes.
"""
model_config = ConfigDict(extra="allow")
ref: str
role: str
name: str | None = None
value: str | None = None
state: list[str] | None = None
level: int | None = None
colcount: int | None = None
rowcount: int | None = None
children: list["A11yNode"] | None = None
class A11ySelection(BaseModel):
"""The user's current text selection in the UI snapshot.
Extra fields are allowed for forward compatibility with client
snapshot additions.
Parameters:
ref: Ref of the element that carries the selection.
text: Selected text.
start_offset: Optional selection start offset.
end_offset: Optional selection end offset.
"""
model_config = ConfigDict(extra="allow")
ref: str
text: str
start_offset: int | None = None
end_offset: int | None = None
class A11ySnapshot(BaseModel):
"""Client accessibility snapshot sent in a ``ui-snapshot`` message.
Mirrors the client-side ``A11ySnapshot`` wire shape. Extra fields
are allowed so clients can add compatible metadata over time.
Parameters:
root: Root accessibility node.
captured_at: Client-side epoch milliseconds when captured.
selection: Optional current text selection.
"""
model_config = ConfigDict(extra="allow")
root: A11yNode
captured_at: int
selection: A11ySelection | None = None
class UISnapshotData(BaseModel):
"""Inner ``data`` for a ``ui-snapshot`` message.
The accessibility snapshot tree mirrors the client-side
``A11ySnapshot`` wire shape and is kept forward-compatible by
allowing extra fields on the snapshot models.
Parameters:
tree: The serialized accessibility tree.
"""
tree: A11ySnapshot
class UICancelTaskData(BaseModel):
"""Inner ``data`` for a ``ui-cancel-task`` message.
Parameters:
task_id: The task group id the client wants cancelled.
reason: Optional human-readable reason.
"""
task_id: str
reason: str | None = None
class UITaskGroupStartedData(BaseModel):
"""``data`` for a ``ui-task`` envelope with kind ``group_started``.
Parameters:
kind: Always ``"group_started"``.
task_id: Shared task identifier for the group.
agents: Names of the agents the work was dispatched to.
label: Optional human-readable label for the group.
cancellable: Whether the client may request cancellation.
at: Epoch milliseconds when the group started.
"""
kind: Literal["group_started"] = "group_started"
task_id: str
agents: list[str] | None = None
label: str | None = None
cancellable: bool = True
at: int = 0
class UITaskUpdateData(BaseModel):
"""``data`` for a ``ui-task`` envelope with kind ``task_update``.
Parameters:
kind: Always ``"task_update"``.
task_id: The shared task identifier.
agent_name: The worker that produced the update.
data: The worker's update payload, forwarded verbatim.
at: Epoch milliseconds when the update was emitted.
"""
kind: Literal["task_update"] = "task_update"
task_id: str
agent_name: str
data: Any | None = None
at: int = 0
class UITaskCompletedData(BaseModel):
"""``data`` for a ``ui-task`` envelope with kind ``task_completed``.
Parameters:
kind: Always ``"task_completed"``.
task_id: The shared task identifier.
agent_name: The worker that produced the response.
status: Completion status string.
response: The worker's response payload.
at: Epoch milliseconds when the response was received.
"""
kind: Literal["task_completed"] = "task_completed"
task_id: str
agent_name: str
status: str
response: Any | None = None
at: int = 0
class UITaskGroupCompletedData(BaseModel):
"""``data`` for a ``ui-task`` envelope with kind ``group_completed``.
Parameters:
kind: Always ``"group_completed"``.
task_id: The shared task identifier.
at: Epoch milliseconds when the group completed.
"""
kind: Literal["group_completed"] = "group_completed"
task_id: str
at: int = 0
#: Discriminated union over the four task-lifecycle data shapes,
#: keyed by the ``kind`` field.
UITaskData = (
UITaskGroupStartedData | UITaskUpdateData | UITaskCompletedData | UITaskGroupCompletedData
)
# -- UI envelope message classes --
class UIEventMessage(BaseModel):
"""RTVI ``ui-event`` message (client → server)."""
label: MessageLiteral = MESSAGE_LABEL
type: Literal["ui-event"] = "ui-event"
id: str
data: UIEventData
class UICommandMessage(BaseModel):
"""RTVI ``ui-command`` message (server → client)."""
label: MessageLiteral = MESSAGE_LABEL
type: Literal["ui-command"] = "ui-command"
data: UICommandData
class UISnapshotMessage(BaseModel):
"""RTVI ``ui-snapshot`` message (client → server)."""
label: MessageLiteral = MESSAGE_LABEL
type: Literal["ui-snapshot"] = "ui-snapshot"
id: str
data: UISnapshotData
class UICancelTaskMessage(BaseModel):
"""RTVI ``ui-cancel-task`` message (client → server)."""
label: MessageLiteral = MESSAGE_LABEL
type: Literal["ui-cancel-task"] = "ui-cancel-task"
id: str
data: UICancelTaskData
class UITaskMessage(BaseModel):
"""RTVI ``ui-task`` message (server → client).
The ``data`` field is one of the four task-lifecycle
discriminated by the ``kind`` field.
"""
label: MessageLiteral = MESSAGE_LABEL
type: Literal["ui-task"] = "ui-task"
data: UITaskData
# -- UI command payloads --
#
# These models describe commands that have matching default React
# handlers in ``@pipecat-ai/client-react``'s ``standardHandlers``.
# Apps can use them as-is, override the client handler to customize
# rendering, or ignore them entirely and define their own command
# names.
#
# Server-side helpers that send commands accept these models directly.
# ``BaseModel.model_dump()`` converts them to the plain-dict shape
# that travels over the wire.
class Toast(BaseModel):
"""A transient notification surface shown on the client.
Parameters:
title: Required headline.
subtitle: Optional second line beneath the title.
description: Optional body text.
image_url: Optional leading image.
duration_ms: Optional dismiss timer. Client default applies
when None.
"""
title: str
subtitle: str | None = None
description: str | None = None
image_url: str | None = None
duration_ms: int | None = None
class Navigate(BaseModel):
"""Client-side navigation to a named view.
Parameters:
view: App-defined view name (route, screen id, tab key, etc.).
params: Optional view-specific parameters.
"""
view: str
params: dict | None = None
class ScrollTo(BaseModel):
"""Scroll a target element into view.
The client resolves the target by ``ref`` first (a snapshot ref
like ``"e42"`` assigned by the a11y walker), then falls back to
``target_id`` (``document.getElementById``). Supply whichever you
have; ``ref`` is the normal choice when acting on a node from
``<ui_state>``.
Parameters:
ref: Snapshot ref from ``<ui_state>``.
target_id: Element id registered on the client.
behavior: Optional scroll behavior hint. Typical values:
``"smooth"`` or ``"instant"``. Clients may ignore.
"""
ref: str | None = None
target_id: str | None = None
behavior: str | None = None
class Highlight(BaseModel):
"""Briefly emphasize a target element (flash, glow, pulse).
Parameters:
ref: Snapshot ref from ``<ui_state>``.
target_id: Element id registered on the client.
duration_ms: Optional highlight duration. Client default
applies when None.
"""
ref: str | None = None
target_id: str | None = None
duration_ms: int | None = None
class Focus(BaseModel):
"""Move input focus to a target element.
Parameters:
ref: Snapshot ref from ``<ui_state>``.
target_id: Element id registered on the client.
"""
ref: str | None = None
target_id: str | None = None
class Click(BaseModel):
"""Click an element on the client.
Closes the form-fill loop for non-text inputs (checkboxes, radios)
and exposes the rest of the action vocabulary (submit buttons,
links, app-specific clickable nodes). The standard handler
silently no-ops on ``disabled`` targets so the agent can't bypass
UI affordances the user is meant to control.
For native ``<select>``, prefer ``SetInputValue`` (clicking
options doesn't reliably change the selection); for custom
comboboxes (ARIA listbox + popup), apps wire their own command
matching the library's interaction model.
Parameters:
ref: Snapshot ref from ``<ui_state>``.
target_id: Element id registered on the client. Used as a
fallback when ``ref`` is not set or has gone stale.
"""
ref: str | None = None
target_id: str | None = None
class SetInputValue(BaseModel):
"""Write a value into a text input or textarea on the client.
Use this for form-filling: the agent has decided what should go
into a field (clarifying answer, tax form entry, etc.) and asks
the client to populate it. With ``replace=True`` (the default),
the existing value is overwritten; with ``replace=False`` the
value is appended.
The standard handler silently no-ops on ``disabled``, ``readonly``,
and ``<input type="hidden">`` targets so the agent can't write
into fields the user can't.
Parameters:
value: The text to write.
ref: Snapshot ref from ``<ui_state>``. Typically the ref of
an ``<input>`` or ``<textarea>``.
target_id: Element id registered on the client. Used as a
fallback when ``ref`` is not set or has gone stale.
replace: When True (the default), overwrite the current
value. When False, append to it.
"""
value: str = ""
ref: str | None = None
target_id: str | None = None
replace: bool = True
class SelectText(BaseModel):
"""Select text on the page so the user can see what the agent means.
Mirror of the ``selection`` field surfaced in the snapshot. Use
this to point the user's attention at a specific paragraph or
range after the agent has decided what it's referring to.
With ``start_offset`` and ``end_offset`` omitted, the entire
target's text content is selected (``Range.selectNodeContents``
for document elements; ``el.select()`` for ``<input>`` /
``<textarea>``).
Parameters:
ref: Snapshot ref from ``<ui_state>``. Typically the ref of
a paragraph or input element.
target_id: Element id registered on the client. Used as a
fallback when ``ref`` is not set or has gone stale.
start_offset: Character offset within the target's text
where the selection should start. For ``<input>`` and
``<textarea>`` this is the value offset; for document
elements it is computed against the concatenation of
descendant text nodes in document order.
end_offset: End character offset, exclusive. Same coordinate
system as ``start_offset``.
"""
ref: str | None = None
target_id: str | None = None
start_offset: int | None = None
end_offset: int | None = None

View File

@@ -58,8 +58,6 @@ from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi.frames import (
RTVIServerMessageFrame,
RTVIServerResponseFrame,
RTVIUICommandFrame,
RTVIUITaskFrame,
)
from pipecat.transports.base_output import BaseOutputTransport
from pipecat.utils.string import match_endofsentence
@@ -432,15 +430,6 @@ class RTVIObserver(BaseObserver):
elif isinstance(frame, RTVIServerMessageFrame):
message = RTVI.ServerMessage(data=frame.data)
await self.send_rtvi_message(message)
elif isinstance(frame, RTVIUICommandFrame):
message = RTVI.UICommandMessage(
data=RTVI.UICommandData(command=frame.command, payload=frame.payload)
)
await self.send_rtvi_message(message)
elif isinstance(frame, RTVIUITaskFrame):
if frame.data is not None:
message = RTVI.UITaskMessage(data=frame.data)
await self.send_rtvi_message(message)
elif isinstance(frame, RTVIServerResponseFrame):
if frame.error is not None:
await self._send_error_response(frame)
@@ -528,9 +517,6 @@ class RTVIObserver(BaseObserver):
text = await transform(text, agg_type)
isTTS = isinstance(frame, TTSTextFrame)
if agg_type is not AggregationType.WORD:
logger.trace(f"{self} Aggregated LLM text: {text}, {agg_type} spoken:{isTTS}")
if self._params.bot_output_enabled:
message = RTVI.BotOutputMessage(
data=RTVI.BotOutputMessageData(text=text, spoken=isTTS, aggregated_by=agg_type)

View File

@@ -32,12 +32,7 @@ from pipecat.frames.frames import (
SystemFrame,
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.processors.frameworks.rtvi.frames import (
RTVIClientMessageFrame,
RTVIUICancelTaskFrame,
RTVIUIEventFrame,
RTVIUISnapshotFrame,
)
from pipecat.processors.frameworks.rtvi.frames import RTVIClientMessageFrame
from pipecat.processors.frameworks.rtvi.observer import RTVIObserver, RTVIObserverParams
from pipecat.services.llm_service import (
FunctionCallParams, # TODO(aleix): we shouldn't import `services` from `processors`
@@ -81,7 +76,6 @@ class RTVIProcessor(FrameProcessor):
self._register_event_handler("on_bot_started")
self._register_event_handler("on_client_ready")
self._register_event_handler("on_client_message")
self._register_event_handler("on_ui_message")
self._input_transport = None
self._transport = transport
@@ -294,41 +288,6 @@ class RTVIProcessor(FrameProcessor):
case "client-message":
data = RTVI.RawClientMessageData.model_validate(message.data)
await self._handle_client_message(message.id, data)
case "ui-event":
event_data = RTVI.UIEventData.model_validate(message.data or {})
await self.push_frame(
RTVIUIEventFrame(
msg_id=message.id,
event=event_data.event,
payload=event_data.payload,
)
)
await self._call_event_handler(
"on_ui_message",
RTVI.UIEventMessage(id=message.id, data=event_data),
)
case "ui-snapshot":
snapshot_data = RTVI.UISnapshotData.model_validate(message.data or {})
await self.push_frame(
RTVIUISnapshotFrame(msg_id=message.id, tree=snapshot_data.tree)
)
await self._call_event_handler(
"on_ui_message",
RTVI.UISnapshotMessage(id=message.id, data=snapshot_data),
)
case "ui-cancel-task":
cancel_data = RTVI.UICancelTaskData.model_validate(message.data or {})
await self.push_frame(
RTVIUICancelTaskFrame(
msg_id=message.id,
task_id=cancel_data.task_id,
reason=cancel_data.reason,
)
)
await self._call_event_handler(
"on_ui_message",
RTVI.UICancelTaskMessage(id=message.id, data=cancel_data),
)
case "llm-function-call-result":
data = RTVI.LLMFunctionCallResultData.model_validate(message.data)
await self._handle_function_call_result(data)

View File

@@ -27,7 +27,7 @@ try:
gi.require_version("Gst", "1.0")
gi.require_version("GstApp", "1.0")
from gi.repository import Gst, GstApp # pyright: ignore[reportAttributeAccessIssue]
from gi.repository import Gst, GstApp
except ModuleNotFoundError as e:
logger.error(f"Exception: {e}")
logger.error(

View File

@@ -20,6 +20,7 @@ from pipecat.metrics.metrics import (
TTFBMetricsData,
TTSUsageMetricsData,
)
from pipecat.utils.asyncio.task_manager import BaseTaskManager
from pipecat.utils.base_object import BaseObject
@@ -39,12 +40,36 @@ class FrameProcessorMetrics(BaseObject):
processing times, and usage statistics.
"""
super().__init__()
self._task_manager = None
self._start_ttfb_time = 0
self._start_processing_time = 0
self._start_text_aggregation_time = 0
self._last_ttfb_time = 0
self._should_report_ttfb = True
async def setup(self, task_manager: BaseTaskManager):
"""Set up the metrics collector with a task manager.
Args:
task_manager: The task manager for handling async operations.
"""
self._task_manager = task_manager
async def cleanup(self):
"""Clean up metrics collection resources."""
await super().cleanup()
@property
def task_manager(self) -> BaseTaskManager:
"""Get the associated task manager.
Returns:
The task manager instance for async operations.
"""
if self._task_manager is None:
raise RuntimeError("task_manager not set; call setup() first")
return self._task_manager
@property
def ttfb(self) -> float | None:
"""Get the current TTFB value in seconds.

View File

@@ -19,10 +19,6 @@ All bots must implement a `bot(runner_args)` async function as the entry point.
The server automatically discovers and executes this function when connections
are established.
By default the runner starts a single FastAPI server that supports WebRTC, Daily,
and telephony transports simultaneously. Clients declare which transport they want
via the ``transport`` field in the ``/start`` request body (default: ``"webrtc"``).
Single transport example::
async def bot(runner_args: RunnerArguments):
@@ -59,38 +55,18 @@ Supported transports:
- WebRTC - Provides local WebRTC interface with prebuilt UI
- Telephony - Handles webhook and WebSocket connections for Twilio, Telnyx, Plivo, Exotel
The ``/start`` endpoint accepts::
{
"transport": "webrtc", // "webrtc" | "daily" | "twilio" | "telnyx" |
// "plivo" | "exotel" — default: "webrtc"
// WebRTC-specific
"enableDefaultIceServers": false,
"body": {...},
// Daily-specific
"createDailyRoom": true,
"dailyRoomProperties": {...},
"dailyMeetingTokenProperties": {...},
"body": {...}
}
To run locally:
- All transports (default): ``python bot.py``
- WebRTC only: ``python bot.py -t webrtc``
- ESP32: ``python bot.py -t webrtc --esp32 --host 192.168.1.100``
- Daily only: ``python bot.py -t daily``
- Daily (direct, testing only): ``python bot.py -d``
- Telephony: ``python bot.py -t twilio -x your_username.ngrok.io``
- Exotel: ``python bot.py -t exotel`` (no proxy needed, but ngrok connection to HTTP 7860 is required)
- WhatsApp: ``python bot.py --whatsapp``
- WebRTC: `python bot.py -t webrtc`
- ESP32: `python bot.py -t webrtc --esp32 --host 192.168.1.100`
- Daily (server): `python bot.py -t daily`
- Daily (direct, testing only): `python bot.py -d`
- Telephony: `python bot.py -t twilio -x your_username.ngrok.io`
- Exotel: `python bot.py -t exotel` (no proxy needed, but ngrok connection to HTTP 7860 is required)
"""
import argparse
import asyncio
import importlib.util
import mimetypes
import os
import sys
@@ -109,10 +85,8 @@ from pipecat.runner.types import (
DailyRunnerArguments,
RunnerArguments,
SmallWebRTCRunnerArguments,
VonageRunnerArguments,
WebSocketRunnerArguments,
)
from pipecat.runner.vonage import configure as configure_vonage
try:
import uvicorn
@@ -132,18 +106,6 @@ load_dotenv(override=True)
os.environ["ENV"] = "local"
TELEPHONY_TRANSPORTS = ["twilio", "telnyx", "plivo", "exotel"]
TRANSPORT_ROUTE_DEPENDENCIES = {
"daily": ("daily",),
"webrtc": ("aiortc",),
"telephony": ("fastapi", "websockets"),
"websocket": ("fastapi", "websockets"),
}
TRANSPORT_INSTALL_HINTS = {
"daily": "install pipecat-ai[daily]",
"webrtc": "install pipecat-ai[webrtc]",
"telephony": "install pipecat-ai[websocket]",
"websocket": "install pipecat-ai[websocket]",
}
# Mirror Pipecat Cloud's 4-hour max session limit so dev rooms get cleaned up.
PIPECAT_ROOM_EXP_HOURS = 4.0
@@ -169,120 +131,6 @@ Import this to add custom routes from other packages before calling
"""
def _is_module_available(module: str) -> bool:
"""Check whether a module can be imported without importing it.
Args:
module: Fully-qualified module name to check.
Returns:
``True`` if Python can resolve the module, ``False`` otherwise.
"""
try:
return importlib.util.find_spec(module) is not None
except (ImportError, ModuleNotFoundError, ValueError):
return False
def _transport_route_dependencies(transport: str) -> tuple[str, ...]:
"""Return module dependencies required for a transport route.
Args:
transport: Transport name from the runner request or CLI.
Returns:
Module names required to enable the transport route.
"""
if transport in TELEPHONY_TRANSPORTS:
return TRANSPORT_ROUTE_DEPENDENCIES["telephony"]
return TRANSPORT_ROUTE_DEPENDENCIES.get(transport, ())
def _transport_routes_enabled(transport: str) -> bool:
"""Return whether a transport route can run in this environment.
Args:
transport: Transport name from the runner request or CLI.
Returns:
``True`` if the requested transport is enabled.
"""
return all(_is_module_available(module) for module in _transport_route_dependencies(transport))
def _runner_url(args: argparse.Namespace) -> str:
"""Return the browser URL for the runner prebuilt client."""
return f"http://{args.host}:{args.port}"
def _transport_status_lists() -> tuple[list[str], list[str]]:
"""Return enabled and disabled transport labels for the startup banner."""
transports = ["daily", "webrtc", "telephony", "websocket"]
enabled = []
disabled = []
for label in transports:
if _transport_routes_enabled(label):
enabled.append(label)
else:
disabled.append(f"{label} ({TRANSPORT_INSTALL_HINTS[label]})")
return enabled, disabled
def _format_transport_status(labels: list[str]) -> str:
"""Format a startup banner transport status list."""
return ", ".join(labels) if labels else "none"
def _print_startup_message(args: argparse.Namespace):
"""Print connection information for the development runner."""
print()
if args.transport is None:
enabled, disabled = _transport_status_lists()
print("🚀 Bot ready!")
print(f" → Open: {_runner_url(args)}")
print(f" → Enabled transports: {_format_transport_status(enabled)}")
if disabled:
print(f" → Disabled transports: {_format_transport_status(disabled)}")
elif args.transport == "webrtc":
if args.esp32:
print("🚀 Bot ready! (ESP32 mode)")
elif args.whatsapp:
print("🚀 Bot ready! (WhatsApp)")
else:
print("🚀 Bot ready! (WebRTC)")
if _transport_routes_enabled("webrtc"):
print(f" → Open: {_runner_url(args)}")
else:
print(f" → WebRTC disabled ({TRANSPORT_INSTALL_HINTS['webrtc']})")
elif args.transport == "daily":
print("🚀 Bot ready! (Daily)")
if not _transport_routes_enabled("daily"):
print(f" → Daily disabled ({TRANSPORT_INSTALL_HINTS['daily']})")
else:
print(f" → Open: {_runner_url(args)}")
if args.dialin:
print(
f" → Daily dial-in webhook: "
f"http://{args.host}:{args.port}/daily-dialin-webhook"
)
print(" → Configure this URL in your Daily phone number settings")
elif args.transport in TELEPHONY_TRANSPORTS:
print(f"🚀 Bot ready! ({args.transport.capitalize()})")
if not _transport_routes_enabled(args.transport):
print(f" → Telephony disabled ({TRANSPORT_INSTALL_HINTS['telephony']})")
else:
print(f" → Open: {_runner_url(args)}")
if args.proxy:
print(f" → XML webhook: http://{args.host}:{args.port}/")
print(f" → WebSocket: ws://{args.host}:{args.port}/ws")
elif args.transport == "vonage":
print()
print("🚀 Bot ready!")
print()
def _get_bot_module():
"""Get the bot module from the calling script."""
import importlib.util
@@ -338,35 +186,8 @@ async def _run_telephony_bot(websocket: WebSocket, args: argparse.Namespace):
await bot_module.bot(runner_args)
async def _run_websocket_bot(websocket: WebSocket, args: argparse.Namespace):
"""Run a bot for plain WebSocket transport."""
bot_module = _get_bot_module()
runner_args = WebSocketRunnerArguments(
websocket=websocket,
transport_type="websocket",
session_id=str(uuid.uuid4()),
)
runner_args.cli_args = args
await bot_module.bot(runner_args)
def _setup_websocket_routes(app: FastAPI, args: argparse.Namespace):
"""Set up the plain WebSocket route at ``/ws-client``."""
if not _transport_routes_enabled("websocket"):
return
@app.websocket("/ws-client")
async def websocket_client_endpoint(websocket: WebSocket):
"""Handle plain WebSocket connections (non-telephony)."""
await websocket.accept()
logger.debug("Plain WebSocket connection accepted")
await _run_websocket_bot(websocket, args)
def _configure_server_app(args: argparse.Namespace):
"""Configure the module-level FastAPI app with routes for all transports."""
"""Configure the module-level FastAPI app with transport-specific routes."""
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
@@ -375,38 +196,34 @@ def _configure_server_app(args: argparse.Namespace):
allow_headers=["*"],
)
# Shared session store: session_id -> body data. Used by the WebRTC /start
# flow and the /sessions/{session_id}/... proxy routes.
active_sessions: dict[str, dict[str, Any]] = {}
_setup_frontend_routes(app)
_setup_webrtc_routes(app, args, active_sessions)
_setup_daily_routes(app, args)
_setup_telephony_routes(app, args)
_setup_websocket_routes(app, args)
_setup_unified_start_route(app, args, active_sessions)
if args.whatsapp:
_setup_whatsapp_routes(app, args)
# Set up transport-specific routes
if args.transport == "webrtc":
_setup_webrtc_routes(app, args)
if args.whatsapp:
_setup_whatsapp_routes(app, args)
elif args.transport == "daily":
_setup_daily_routes(app, args)
elif args.transport in TELEPHONY_TRANSPORTS:
_setup_telephony_routes(app, args)
else:
logger.warning(f"Unknown transport type: {args.transport}")
def _setup_unified_start_route(
app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
):
"""Register the unified POST /start and GET /status endpoints.
def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
"""Set up WebRTC-specific routes."""
try:
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
Handles WebRTC, Daily, and telephony transport start flows. Clients specify
which transport they want via the ``transport`` field in the request body.
When ``-t`` was passed on the command line, requests for any other transport
are rejected with HTTP 400.
"""
ALL_TRANSPORTS = ["webrtc", "daily", *TELEPHONY_TRANSPORTS, "websocket"]
@app.get("/status")
async def status():
"""Return the transports supported by this runner instance."""
transports = [args.transport] if args.transport is not None else ALL_TRANSPORTS
return {"status": "ready", "transports": transports}
from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
from pipecat.transports.smallwebrtc.request_handler import (
IceCandidate,
SmallWebRTCPatchRequest,
SmallWebRTCRequest,
SmallWebRTCRequestHandler,
)
except ImportError as e:
logger.error(f"WebRTC transport dependencies not installed: {e}")
return
class IceServer(TypedDict, total=False):
urls: str | list[str]
@@ -417,227 +234,32 @@ def _setup_unified_start_route(
class StartBotResult(TypedDict, total=False):
sessionId: str
iceConfig: IceConfig | None
dailyRoom: str | None
dailyToken: str | None
wsUrl: str | None
token: str | None
@app.post("/start")
async def start_agent(request: Request):
"""Start a bot session.
# In-memory store of active sessions: session_id -> session info
active_sessions: dict[str, dict[str, Any]] = {}
Accepts::
{
"transport": "webrtc", // "webrtc" | "daily" | "twilio" | "telnyx" |
// "plivo" | "exotel" — default: "webrtc"
// WebRTC-specific
"enableDefaultIceServers": false,
"body": {...},
// Daily-specific
"createDailyRoom": true,
"dailyRoomProperties": {...},
"dailyMeetingTokenProperties": {...},
"body": {...}
}
"""
try:
request_data = await request.json()
logger.debug(f"Received request: {request_data}")
except Exception as e:
logger.error(f"Failed to parse request body: {e}")
request_data = {}
# Determine transport: explicit field → legacy Daily hint → CLI default → webrtc
transport = request_data.get("transport")
if transport is None and request_data.get("createDailyRoom", False):
transport = "daily"
if transport is None:
transport = args.transport or "webrtc"
# Enforce restriction when -t was explicitly set on the command line
if args.transport is not None and transport != args.transport:
raise HTTPException(
status_code=400,
detail=(
f"Transport '{transport}' is not allowed. "
f"Server is configured for '{args.transport}' only (-t {args.transport})."
),
)
if not _transport_routes_enabled(transport):
raise HTTPException(
status_code=400,
detail=(
f"Transport '{transport}' is disabled in this runner environment. "
"Check the startup banner for enabled transports."
),
)
if transport == "webrtc":
# WebRTC: register the session; the bot starts when the WebRTC offer arrives.
session_id = str(uuid.uuid4())
active_sessions[session_id] = request_data.get("body", {})
result = StartBotResult(
sessionId=session_id,
)
if request_data.get("enableDefaultIceServers"):
result["iceConfig"] = IceConfig(
iceServers=[IceServer(urls=["stun:stun.l.google.com:19302"])]
)
return result
elif transport == "daily":
create_daily_room = request_data.get("createDailyRoom", False)
body = request_data.get("body", {})
daily_room_properties_dict = request_data.get("dailyRoomProperties", None)
daily_token_properties_dict = request_data.get("dailyMeetingTokenProperties", None)
bot_module = _get_bot_module()
existing_room_url = os.getenv("DAILY_ROOM_URL")
session_id = str(uuid.uuid4())
result: StartBotResult | None = None
if create_daily_room or existing_room_url:
from pipecat.runner.daily import configure
from pipecat.transports.daily.utils import (
DailyMeetingTokenProperties,
DailyRoomProperties,
)
async with aiohttp.ClientSession() as session:
room_properties = None
if daily_room_properties_dict:
daily_room_properties_dict.setdefault(
"exp", time.time() + PIPECAT_ROOM_EXP_HOURS * 3600
)
daily_room_properties_dict.setdefault("eject_at_room_exp", True)
try:
room_properties = DailyRoomProperties(**daily_room_properties_dict)
logger.debug(f"Using custom room properties: {room_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyRoomProperties: {e}")
token_properties = None
if daily_token_properties_dict:
try:
token_properties = DailyMeetingTokenProperties(
**daily_token_properties_dict
)
logger.debug(f"Using custom token properties: {token_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyMeetingTokenProperties: {e}")
room_url, token = await configure(
session,
room_exp_duration=PIPECAT_ROOM_EXP_HOURS,
room_properties=room_properties,
token_properties=token_properties,
)
runner_args = DailyRunnerArguments(
room_url=room_url, token=token, body=body, session_id=session_id
)
result = StartBotResult(
dailyRoom=room_url,
dailyToken=token,
sessionId=session_id,
)
else:
runner_args = RunnerArguments(body=body, session_id=session_id)
runner_args.cli_args = args
asyncio.create_task(bot_module.bot(runner_args))
return result
elif transport in TELEPHONY_TRANSPORTS:
# Telephony: the bot starts when the provider connects to /ws.
# Return the WebSocket URL so the caller knows where to point their provider.
scheme = "wss" if args.host != "localhost" else "ws"
return StartBotResult(
wsUrl=f"{scheme}://{args.host}:{args.port}/ws",
)
elif transport == "websocket":
# Plain WebSocket: the bot starts when the client connects to /ws-client.
scheme = "wss" if args.host != "localhost" else "ws"
session_id = str(uuid.uuid4())
return StartBotResult(
wsUrl=f"{scheme}://{args.host}:{args.port}/ws-client",
sessionId=session_id,
token="mock_token",
)
else:
raise HTTPException(
status_code=400,
detail=f"Unknown transport '{transport}'.",
)
def _resolve_download_path(folder: str, filename: str) -> Path:
"""Resolve a download path and ensure it stays within the downloads folder."""
allowed_base = Path(folder).resolve()
file_path = (allowed_base / filename).resolve()
if not file_path.is_relative_to(allowed_base):
raise HTTPException(status_code=403, detail="Access denied")
return file_path
def _setup_frontend_routes(app: FastAPI):
"""Mount the prebuilt frontend UI and root redirect for all transports."""
try:
from pipecat_ai_prebuilt.frontend import PipecatPrebuiltUI
except ImportError as e:
logger.error(f"Prebuilt frontend not available: {e}")
return
app.mount("/client", PipecatPrebuiltUI)
# Mount the frontend
app.mount("/client", SmallWebRTCPrebuiltUI)
@app.get("/", include_in_schema=False)
async def root_redirect():
"""Redirect root requests to client interface."""
return RedirectResponse(url="/client/")
def _setup_webrtc_routes(
app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
):
"""Set up WebRTC-specific routes."""
if not _transport_routes_enabled("webrtc"):
return
try:
from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
from pipecat.transports.smallwebrtc.request_handler import (
IceCandidate,
SmallWebRTCPatchRequest,
SmallWebRTCRequest,
SmallWebRTCRequestHandler,
)
except ImportError as e:
logger.warning(f"WebRTC routes disabled after dependency check passed: {e}")
return
@app.get("/files/{filename:path}")
async def download_file(filename: str):
"""Handle file downloads."""
if not args.folder:
logger.warning(f"Attempting to download {filename}, but downloads folder not setup.")
raise HTTPException(404)
logger.warning(f"Attempting to dowload {filename}, but downloads folder not setup.")
return
file_path = _resolve_download_path(args.folder, filename)
if not file_path.exists():
file_path = Path(args.folder) / filename
if not os.path.exists(file_path):
raise HTTPException(404)
media_type, _ = mimetypes.guess_type(file_path)
return FileResponse(path=file_path, media_type=media_type, filename=file_path.name)
return FileResponse(path=file_path, media_type=media_type, filename=filename)
# Initialize the SmallWebRTC request handler
small_webrtc_handler: SmallWebRTCRequestHandler = SmallWebRTCRequestHandler(
@@ -682,6 +304,29 @@ def _setup_webrtc_routes(
await small_webrtc_handler.handle_patch_request(request)
return {"status": "success"}
@app.post("/start")
async def rtvi_start(request: Request):
"""Mimic Pipecat Cloud's /start endpoint."""
# Parse the request body
try:
request_data = await request.json()
logger.debug(f"Received request: {request_data}")
except Exception as e:
logger.error(f"Failed to parse request body: {e}")
request_data = {}
# Store session info immediately in memory, replicate the behavior expected on Pipecat Cloud
session_id = str(uuid.uuid4())
active_sessions[session_id] = request_data.get("body", {})
result: StartBotResult = {"sessionId": session_id}
if request_data.get("enableDefaultIceServers"):
result["iceConfig"] = IceConfig(
iceServers=[IceServer(urls=["stun:stun.l.google.com:19302"])]
)
return result
@app.api_route(
"/sessions/{session_id}/{path:path}",
methods=["GET", "POST", "PUT", "PATCH", "DELETE"],
@@ -906,13 +551,13 @@ def _setup_whatsapp_routes(app: FastAPI, args: argparse.Namespace):
def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
"""Set up Daily-specific routes."""
if not _transport_routes_enabled("daily"):
return
@app.get("/daily")
@app.get("/")
async def create_room_and_start_agent():
"""Launch a Daily bot and redirect to room."""
logger.debug("Starting bot with Daily transport and redirecting to Daily room")
print("Starting bot with Daily transport and redirecting to Daily room")
import aiohttp
from pipecat.runner.daily import configure
@@ -928,6 +573,105 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
asyncio.create_task(bot_module.bot(runner_args))
return RedirectResponse(room_url)
@app.post("/start")
async def start_agent(request: Request):
"""Handler for /start endpoints.
Expects POST body like::
{
"createDailyRoom": true,
"dailyRoomProperties": { "start_video_off": true },
"dailyMeetingTokenProperties": { "is_owner": true, "user_name": "Bot" },
"body": { "custom_data": "value" }
}
"""
print("Starting bot with Daily transport")
# Parse the request body
try:
request_data = await request.json()
logger.debug(f"Received request: {request_data}")
except Exception as e:
logger.error(f"Failed to parse request body: {e}")
request_data = {}
create_daily_room = request_data.get("createDailyRoom", False)
body = request_data.get("body", {})
daily_room_properties_dict = request_data.get("dailyRoomProperties", None)
daily_token_properties_dict = request_data.get("dailyMeetingTokenProperties", None)
bot_module = _get_bot_module()
existing_room_url = os.getenv("DAILY_ROOM_URL")
session_id = str(uuid.uuid4())
result = None
# Configure room if:
# 1. Explicitly requested via createDailyRoom in payload
# 2. Using pre-configured room from DAILY_ROOM_URL env var
if create_daily_room or existing_room_url:
import aiohttp
from pipecat.runner.daily import configure
from pipecat.transports.daily.utils import (
DailyMeetingTokenProperties,
DailyRoomProperties,
)
async with aiohttp.ClientSession() as session:
# Parse dailyRoomProperties if provided
room_properties = None
if daily_room_properties_dict:
# Apply Pipecat Cloud's session policy if caller didn't override.
daily_room_properties_dict.setdefault(
"exp", time.time() + PIPECAT_ROOM_EXP_HOURS * 3600
)
daily_room_properties_dict.setdefault("eject_at_room_exp", True)
try:
room_properties = DailyRoomProperties(**daily_room_properties_dict)
logger.debug(f"Using custom room properties: {room_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyRoomProperties: {e}")
# Continue without custom properties
# Parse dailyMeetingTokenProperties if provided
token_properties = None
if daily_token_properties_dict:
try:
token_properties = DailyMeetingTokenProperties(
**daily_token_properties_dict
)
logger.debug(f"Using custom token properties: {token_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyMeetingTokenProperties: {e}")
# Continue without custom properties
room_url, token = await configure(
session,
room_exp_duration=PIPECAT_ROOM_EXP_HOURS,
room_properties=room_properties,
token_properties=token_properties,
)
runner_args = DailyRunnerArguments(
room_url=room_url, token=token, body=body, session_id=session_id
)
result = {
"dailyRoom": room_url,
"dailyToken": token,
"sessionId": session_id,
}
else:
runner_args = RunnerArguments(body=body, session_id=session_id)
# Update CLI args.
runner_args.cli_args = args
# Start the bot in the background
asyncio.create_task(bot_module.bot(runner_args))
return result
if args.dialin:
@app.post("/daily-dialin-webhook")
@@ -976,6 +720,8 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
detail="Missing required fields: From, To, callId, callDomain",
)
import aiohttp
from pipecat.runner.daily import configure
from pipecat.runner.types import DailyDialinRequest, DialinSettings
@@ -1044,54 +790,44 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
"""Set up telephony-specific routes.
The WebSocket endpoint (``/ws``) is always registered so providers can
connect directly. The XML webhook (``POST /``) is only registered when a
specific telephony transport is chosen via ``-t`` because the XML template
is provider-specific and requires a proxy hostname (``--proxy``).
"""
if not _transport_routes_enabled("telephony"):
return
if args.transport in TELEPHONY_TRANSPORTS:
# XML response templates (Exotel doesn't use XML webhooks)
XML_TEMPLATES = {
"twilio": f"""<?xml version="1.0" encoding="UTF-8"?>
"""Set up telephony-specific routes."""
# XML response templates (Exotel doesn't use XML webhooks)
XML_TEMPLATES = {
"twilio": f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="wss://{args.proxy}/ws"></Stream>
</Connect>
<Pause length="40"/>
</Response>""",
"telnyx": f"""<?xml version="1.0" encoding="UTF-8"?>
"telnyx": f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="wss://{args.proxy}/ws" bidirectionalMode="rtp"></Stream>
</Connect>
<Pause length="40"/>
</Response>""",
"plivo": f"""<?xml version="1.0" encoding="UTF-8"?>
"plivo": f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Stream bidirectional="true" keepCallAlive="true" contentType="audio/x-mulaw;rate=8000">wss://{args.proxy}/ws</Stream>
</Response>""",
}
}
@app.post("/")
async def start_call():
"""Handle telephony webhook and return XML response."""
if args.transport == "exotel":
# Exotel doesn't use POST webhooks - redirect to proper documentation
logger.debug("POST Exotel endpoint - not used")
return {
"error": "Exotel doesn't use POST webhooks",
"websocket_url": f"wss://{args.proxy}/ws",
"note": "Configure the WebSocket URL above in your Exotel App Bazaar Voicebot Applet",
}
else:
logger.debug(f"POST {args.transport.upper()} XML")
xml_content = XML_TEMPLATES.get(args.transport, "<Response></Response>")
return HTMLResponse(content=xml_content, media_type="application/xml")
@app.post("/")
async def start_call():
"""Handle telephony webhook and return XML response."""
if args.transport == "exotel":
# Exotel doesn't use POST webhooks - redirect to proper documentation
logger.debug("POST Exotel endpoint - not used")
return {
"error": "Exotel doesn't use POST webhooks",
"websocket_url": f"wss://{args.proxy}/ws",
"note": "Configure the WebSocket URL above in your Exotel App Bazaar Voicebot Applet",
}
else:
logger.debug(f"POST {args.transport.upper()} XML")
xml_content = XML_TEMPLATES.get(args.transport, "<Response></Response>")
return HTMLResponse(content=xml_content, media_type="application/xml")
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
@@ -1100,6 +836,11 @@ def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
logger.debug("WebSocket connection accepted")
await _run_telephony_bot(websocket, args)
@app.get("/")
async def start_agent():
"""Simple status endpoint for telephony transports."""
return {"status": f"Bot started with {args.transport}"}
async def _run_daily_direct(args: argparse.Namespace):
"""Run Daily bot with direct connection (no FastAPI server)."""
@@ -1131,25 +872,6 @@ async def _run_daily_direct(args: argparse.Namespace):
await bot_module.bot(runner_args)
async def _run_vonage():
"""Run Vonage bot (no FastAPI server)."""
logger.info("Running Vonage transport...")
application_id, session_id, token = await configure_vonage()
runner_args = VonageRunnerArguments(
application_id=application_id, vonage_session_id=session_id, token=token
)
runner_args.handle_sigint = True
# Get the bot module and run it directly
bot_module = _get_bot_module()
print(f"Joining Vonage session: {runner_args.vonage_session_id}")
print()
await bot_module.bot(runner_args)
def _validate_and_clean_proxy(proxy: str) -> str:
"""Validate and clean proxy hostname, removing protocol if present."""
if not proxy:
@@ -1189,27 +911,22 @@ def runner_port() -> int:
def main(parser: argparse.ArgumentParser | None = None):
"""Start the Pipecat development runner.
Parses command-line arguments and starts a FastAPI server that supports
WebRTC, Daily, and telephony transports simultaneously. Clients declare
which transport to use via the ``transport`` field in the ``/start`` body.
When ``-t`` is provided, the server restricts ``/start`` to that transport
only and displays transport-specific startup information.
Parses command-line arguments and starts a FastAPI server configured
for the specified transport type.
The runner discovers and runs any ``bot(runner_args)`` function found in the
calling module.
Command-line arguments:
- --host: Server host address (default: localhost)
- --host: Server host address (default: localhost) 879
- --port: Server port (default: 7860)
- -t/--transport: Restrict to a single transport and set as default for /start
(daily, webrtc, twilio, telnyx, plivo, exotel). Omit to support all transports.
- -t/--transport: Transport type (daily, webrtc, twilio, telnyx, plivo, exotel)
- -x/--proxy: Public proxy hostname for telephony webhooks
- -d/--direct: Connect directly to Daily room (automatically sets transport to daily)
- -f/--folder: Path to downloads folder
- --dialin: Enable Daily PSTN dial-in webhook handling
- --dialin: Enable Daily PSTN dial-in webhook handling (requires Daily transport)
- --esp32: Enable SDP munging for ESP32 compatibility (requires --host with IP address)
- --whatsapp: Ensure required WhatsApp environment variables are present
- --whatsapp: Ensure requried WhatsApp environment variables are present
- -v/--verbose: Increase logging verbosity
Args:
@@ -1229,12 +946,9 @@ def main(parser: argparse.ArgumentParser | None = None):
"-t",
"--transport",
type=str,
choices=["daily", "vonage", "webrtc", *TELEPHONY_TRANSPORTS],
default=None,
help=(
"Restrict the server to a single transport and set it as the default for /start. "
"Omit to support all transports simultaneously (default behaviour)."
),
choices=["daily", "webrtc", *TELEPHONY_TRANSPORTS],
default="webrtc",
help="Transport type",
)
parser.add_argument("-x", "--proxy", help="Public proxy host name")
parser.add_argument(
@@ -1252,7 +966,7 @@ def main(parser: argparse.ArgumentParser | None = None):
"--dialin",
action="store_true",
default=False,
help="Enable Daily PSTN dial-in webhook handling",
help="Enable Daily PSTN dial-in webhook handling (requires Daily transport)",
)
parser.add_argument(
"--esp32",
@@ -1264,7 +978,7 @@ def main(parser: argparse.ArgumentParser | None = None):
"--whatsapp",
action="store_true",
default=False,
help="Ensure required WhatsApp environment variables are present",
help="Ensure requried WhatsApp environment variables are present",
)
args = parser.parse_args()
@@ -1273,13 +987,12 @@ def main(parser: argparse.ArgumentParser | None = None):
if args.proxy:
args.proxy = _validate_and_clean_proxy(args.proxy)
# --direct implies Daily transport
if args.direct:
if args.transport is None or args.transport == "daily":
args.transport = "daily"
else:
logger.error("--direct flag only works with Daily transport (-t daily)")
return
# Auto-set transport to daily if --direct is used without explicit transport
if args.direct and args.transport == "webrtc": # webrtc is the default
args.transport = "daily"
elif args.direct and args.transport != "daily":
logger.error("--direct flag only works with Daily transport (-t daily)")
return
# Validate ESP32 requirements
if args.esp32 and args.host == "localhost":
@@ -1287,7 +1000,7 @@ def main(parser: argparse.ArgumentParser | None = None):
return
# Validate dial-in requirements
if args.dialin and args.transport is not None and args.transport != "daily":
if args.dialin and args.transport != "daily":
logger.error("--dialin flag only works with Daily transport (-t daily)")
return
@@ -1305,12 +1018,28 @@ def main(parser: argparse.ArgumentParser | None = None):
asyncio.run(_run_daily_direct(args))
return
# Print startup message
_print_startup_message(args)
if args.transport == "vonage":
asyncio.run(_run_vonage())
# Print startup message for server-based transports
if args.transport == "webrtc":
print()
if args.esp32:
print(f"🚀 Bot ready! (ESP32 mode)")
elif args.whatsapp:
print(f"🚀 Bot ready! (WhatsApp)")
else:
print(f"🚀 Bot ready!")
print(f" → Open http://{args.host}:{args.port}/client in your browser")
print()
elif args.transport == "daily":
print()
print(f"🚀 Bot ready!")
if args.dialin:
print(
f" → Daily dial-in webhook: http://{args.host}:{args.port}/daily-dialin-webhook"
)
print(f" → Configure this URL in your Daily phone number settings")
else:
print(f" → Open http://{args.host}:{args.port} in your browser to start a session")
print()
return
RUNNER_DOWNLOADS_FOLDER = args.folder
RUNNER_HOST = args.host

View File

@@ -99,35 +99,16 @@ class DailyRunnerArguments(RunnerArguments):
token: str | None = None
@dataclass
class VonageRunnerArguments(RunnerArguments):
"""Vonage transport session arguments for the runner.
Parameters:
application_id: Vonage application ID
vonage_session_id: Vonage session ID
token: Vonage Session Token
"""
application_id: str
vonage_session_id: str
token: str
@dataclass
class WebSocketRunnerArguments(RunnerArguments):
"""WebSocket transport session arguments for the runner.
Parameters:
websocket: WebSocket connection for audio streaming
transport_type: Transport type identifier. Set to ``"websocket"`` for plain
WebSocket connections; ``None`` triggers auto-detection from the first
telephony provider message.
body: Additional request data
"""
websocket: WebSocket
transport_type: str | None = None
@dataclass

View File

@@ -33,7 +33,7 @@ import json
import os
import re
from collections.abc import Callable
from typing import Any, cast
from typing import Any
from fastapi import WebSocket
from loguru import logger
@@ -42,10 +42,9 @@ from pipecat.runner.types import (
DailyRunnerArguments,
LiveKitRunnerArguments,
SmallWebRTCRunnerArguments,
VonageRunnerArguments,
WebSocketRunnerArguments,
)
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.base_transport import BaseTransport
def _detect_transport_type_from_message(message_data: dict) -> str:
@@ -272,14 +271,6 @@ def get_transport_client_id(transport: BaseTransport, client: Any) -> str:
except ImportError:
pass
try:
from pipecat.transports.vonage.video_connector import VonageVideoConnectorTransport
if isinstance(transport, VonageVideoConnectorTransport):
return client["streamId"]
except ImportError:
pass
logger.warning(f"Unable to get client id from unsupported transport {type(transport)}")
return ""
@@ -312,24 +303,6 @@ async def maybe_capture_participant_camera(
except ImportError:
pass
try:
from pipecat.transports.vonage.video_connector import (
SubscribeSettings,
VonageVideoConnectorTransport,
)
if isinstance(transport, VonageVideoConnectorTransport):
await transport.subscribe_to_stream(
client["streamId"],
SubscribeSettings(
subscribe_to_audio=True,
subscribe_to_video=True,
preferred_framerate=framerate if framerate != 0 else None,
),
)
except ImportError:
pass
async def maybe_capture_participant_screen(
transport: BaseTransport, client: Any, framerate: int = 0
@@ -536,35 +509,37 @@ async def create_transport(
"daily": lambda: DailyParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"webrtc": lambda: TransportParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
),
"twilio": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
# add_wav_header and serializer will be set automatically
),
"telnyx": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
# add_wav_header and serializer will be set automatically
),
"plivo": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
# add_wav_header and serializer will be set automatically
),
"exotel": lambda: FastAPIWebsocketParams(
audio_in_enabled=True,
audio_out_enabled=True,
vad_analyzer=SileroVADAnalyzer(),
# add_wav_header and serializer will be set automatically
),
"vonage": lambda: VonageVideoConnectorTransportParams(
audio_in_enabled=True,
audio_out_enabled=True
),
}
transport = await create_transport(runner_args, transport_params)
@@ -593,12 +568,6 @@ async def create_transport(
)
elif isinstance(runner_args, WebSocketRunnerArguments):
if runner_args.transport_type == "websocket":
params = _get_transport_params("websocket", transport_params)
from pipecat.transports.websocket.fastapi import FastAPIWebsocketTransport
return FastAPIWebsocketTransport(websocket=runner_args.websocket, params=params)
# Parse once to determine the provider and get data
transport_type, call_data = await parse_telephony_websocket(runner_args.websocket)
params = _get_transport_params(transport_type, transport_params)
@@ -618,31 +587,6 @@ async def create_transport(
runner_args.room_name,
params=params,
)
elif isinstance(runner_args, VonageRunnerArguments):
from pipecat.transports.vonage.video_connector import (
VonageVideoConnectorTransport,
VonageVideoConnectorTransportParams,
)
try:
params = cast(
VonageVideoConnectorTransportParams,
_get_transport_params("vonage", transport_params),
)
except ValueError:
webrtc_params: TransportParams = cast(
TransportParams, _get_transport_params("webrtc", transport_params)
)
params = VonageVideoConnectorTransportParams(
**webrtc_params.model_dump(),
video_in_auto_subscribe=True,
)
return VonageVideoConnectorTransport(
runner_args.application_id,
runner_args.vonage_session_id,
runner_args.token,
params=params,
)
else:
raise ValueError(f"Unsupported runner arguments type: {type(runner_args)}")

Some files were not shown because too many files have changed in this diff Show More