Compare commits: `pk/optiona` ... `main` (111 commits)

| SHA1 |
|---|
| 780c004168 |
| 28f9203401 |
| 77cc314a08 |
| 4a8d1d0b5e |
| 87f5d60693 |
| c699b31daa |
| ee674ffb01 |
| 86a5710801 |
| 4a96b2a9e6 |
| 105d6f27da |
| e0e3cd336a |
| 9586db5b50 |
| a890ab7b21 |
| c1bf7dbb4a |
| 709a0ce839 |
| be93350eae |
| 4a96ab7073 |
| c321f50e76 |
| bca337f97e |
| 5d9e8c5ac5 |
| 70773bce0a |
| 8bdb49bd1a |
| 81bb81c1d0 |
| e1bdee598c |
| 185a89bb3b |
| 6b9deefbe3 |
| deefc32faf |
| a5e6886b80 |
| d11a4ba0cd |
| 38407e091d |
| 82cd931efa |
| 33e5d1f89b |
| 861dd23873 |
| b825dd779e |
| 1487da53a9 |
| aff84a5d9e |
| c09f6d5adb |
| e2d249e5d9 |
| 956b39b0dc |
| e298491068 |
| 97b00042df |
| bc769eaa82 |
| ee5aa4dc71 |
| dd38fbc735 |
| a1c40df471 |
| c4ff9300c9 |
| cab4585cbb |
| 18368d047e |
| e3abb4b6d7 |
| 0fd971d59d |
| c61672194d |
| c51a817efa |
| d85eda6da8 |
| 71feb42711 |
| 6b93ca0cb6 |
| b6ecce754b |
| d39e6bf921 |
| 63064860ef |
| f5158d51e7 |
| 94dbd2fa68 |
| c6ea6c6522 |
| 58a22aeeb1 |
| 5403aa56e4 |
| 0e0d76d020 |
| b493ed8d3a |
| c3338667b1 |
| ea296babe9 |
| b13af2b053 |
| 7b6d878f07 |
| 8e405f15aa |
| 44a40e8eb2 |
| ea97cb1a78 |
| 22650b1b56 |
| b76831e677 |
| b57111743f |
| dcbb0070c9 |
| 73278d3309 |
| c8efe319b3 |
| 49bda11ae8 |
| 07640582ce |
| 078af6969a |
| 9f40ba21c2 |
| 82f0896d6a |
| 7e4cd23de4 |
| 97f50c8aa2 |
| 08680732f6 |
| 064b68aa01 |
| b0f8ea7e28 |
| ad50c8d5d5 |
| 39e7f9e354 |
| 7cc7968abb |
| 52d8008783 |
| a3ce963b54 |
| e70ee603b2 |
| 111e59a7b1 |
| 079282d140 |
| 0ccdd808e6 |
| 863a1bf177 |
| 58333b2705 |
| ecaff1d1eb |
| 9b55d4ddd4 |
| d6655e7a5e |
| 33b73df6ec |
| c9f0172e9f |
| 2638885c62 |
| cb426cbb14 |
| d39beff817 |
| 1eade184f1 |
| 3fa193b983 |
| 6feeee515f |
| 55fb4b0845 |

.claude/skills/squash-commits/SKILL.md (new file, +91)
@@ -0,0 +1,91 @@
---
name: squash-commits
description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
---

Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.

## Instructions

### 1. Safety check

```bash
git status --short
```

If there are uncommitted changes, stop and tell the user to commit or stash them first.
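
This check can also be scripted. The sketch below uses a hypothetical helper, `require_clean_tree`, which is not part of Git or of this skill. It relies on `git status --porcelain`, whose output is meant for scripts. It treats untracked files as dirty too, on the assumption that they would otherwise get mixed into the unstaged branch changes after step 4.

```bash
# Hypothetical helper (not part of Git): fail when the working tree is dirty.
require_clean_tree() {
  # --porcelain prints one line per modified, staged, or untracked path;
  # empty output means there is nothing uncommitted.
  if [ -n "$(git status --porcelain)" ]; then
    echo "Uncommitted changes found; commit or stash them first." >&2
    return 1
  fi
}
```

In a script, call it as `require_clean_tree || exit 1` before any other step.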

### 2. Inspect the branch

```bash
git log main..HEAD --oneline --no-merges
git diff main..HEAD --name-only
```

List every file changed vs `main` and every commit on the branch (excluding merge commits from main).
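
To see which merge commits the reorganization will drop, a sketch like the following can help. `show_dropped_merges` is a hypothetical helper name. `--merges` keeps only commits with more than one parent, which is the complement of `--no-merges`.

```bash
# Hypothetical helper: list merge commits on the branch (e.g. merges from main).
# The soft reset in step 4 discards these commits; whatever they merged in
# from main is already part of main itself.
show_dropped_merges() {
  local base="${1:-main}"   # base branch; "main" unless another is passed
  git log "$base"..HEAD --oneline --merges
}
```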

### 3. Create a backup branch

```bash
git branch backup/<current-branch-name>
```

Tell the user the backup exists so they can recover if needed.
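
The sketch below derives the branch name instead of requiring it to be typed. `make_backup` is a hypothetical helper, and `git branch --show-current` needs Git 2.22 or newer. Because `git branch` refuses to overwrite an existing branch, an older backup with the same name is never silently replaced.

```bash
# Hypothetical helper: create backup/<current-branch> pointing at HEAD.
make_backup() {
  local branch
  branch="$(git branch --show-current)"   # empty when HEAD is detached
  if [ -z "$branch" ]; then
    echo "Detached HEAD; check out the branch first." >&2
    return 1
  fi
  # Exits non-zero if backup/<branch> already exists, so nothing is overwritten.
  git branch "backup/$branch" || return 1
  echo "Backup created: backup/$branch"
}
```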

### 4. Soft-reset to main and unstage everything

```bash
git reset --soft main
git restore --staged .
```

All branch changes are now in the working tree, unstaged. No content has changed.
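
This step assumes `main` is already an ancestor of the branch, typically because main was merged in. If `main` has commits the branch lacks, `git reset --soft main` still leaves the branch's tree in the working tree. The unstaged changes would then also undo those newer main commits. A hedged pre-check to run before the reset uses `git merge-base --is-ancestor`; the helper name is hypothetical:

```bash
# Hypothetical pre-check for step 4: succeed only if <base> is an ancestor of HEAD.
check_base_is_ancestor() {
  local base="${1:-main}"
  # --is-ancestor exits 0 when $base is reachable from HEAD, 1 when it is not.
  if ! git merge-base --is-ancestor "$base" HEAD; then
    echo "$base has commits this branch lacks; merge $base into the branch first." >&2
    return 1
  fi
}
```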

### 5. Plan the logical groups

Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:

- Core feature or fix (new source files + modified core files)
- Secondary features or fixes (each as its own commit if distinct)
- Refactoring or renames
- Tests
- Changelogs / docs

Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.

Present the proposed grouping to the user and ask for confirmation before committing.
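
Listing the files each original commit touched can make a grouping easier to propose. The sketch below uses a hypothetical helper name. It skips merge commits and prints the oldest commit first.

```bash
# Hypothetical helper: print each non-merge branch commit, oldest first,
# followed by the files it changed.
files_per_commit() {
  local base="${1:-main}"
  git log "$base"..HEAD --no-merges --reverse --format='--- %h %s' --name-only
}
```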

### 6. Commit in logical groups

For each group, stage only the relevant files and commit with a clear message following the project's conventions:

```bash
git add <file1> <file2> ...
git commit -m "..."
```

Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).
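
The sketch below wraps the add-and-commit pair for one group in a hypothetical helper, `commit_group`. It prints `git diff --cached --stat` just before committing, so the session output records exactly what went into each commit. With Git 2.x, `git add <path>` also stages the deletion of a path that was removed on the branch.

```bash
# Hypothetical helper: stage the given paths and commit them as one group.
# Usage: commit_group "feat: add X" path/one path/two
commit_group() {
  local msg="$1"
  shift
  # "--" ends option parsing, so no path can be mistaken for a flag.
  git add -- "$@" || return 1
  git diff --cached --stat   # summary of exactly what this commit will contain
  git commit -q -m "$msg"
}
```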

### 7. Verify

```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
git status --short
```

Confirm:
- Commit count is small and each message is meaningful
- The set of changed files vs `main` is identical to before
- Working tree is clean
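
Comparing changed-file names catches missing files but not altered content. A stricter check compares the commit trees directly. If the new `HEAD` has the same tree hash as the backup branch, every file is byte-for-byte identical. The helper name below is hypothetical; `<rev>^{tree}` is standard Git revision syntax.

```bash
# Hypothetical helper: confirm HEAD's content is identical to the backup branch.
# Usage: verify_same_content backup/my-branch
verify_same_content() {
  local backup="$1"
  # Equal tree hashes mean identical file paths, modes, and contents.
  if [ "$(git rev-parse "$backup^{tree}")" = "$(git rev-parse 'HEAD^{tree}')" ]; then
    echo "OK: content identical to $backup"
  else
    echo "MISMATCH: content differs from $backup" >&2
    git diff --stat "$backup" HEAD >&2
    return 1
  fi
}
```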

### 8. Remind about force-push

The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.

## Rules

- Never change file contents. If you find yourself editing a file, stop.
- Never skip the backup branch step.
- Never force-push without explicit user instruction.
- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.

CHANGELOG.md (+509)
@@ -7,6 +7,515 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

<!-- towncrier release notes start -->

## [1.2.1] - 2026-05-15

### Changed

- Changed the default WebSocket endpoints for `GradiumSTTService` and
  `GradiumTTSService` to the region-neutral
  `wss://api.gradium.ai/api/speech/asr` and
  `wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes
  traffic to the nearest endpoint. Override the URL to pin a specific
  region.
  (PR [#4500](https://github.com/pipecat-ai/pipecat/pull/4500))

### Fixed

- Fixed the bot hanging when `filter_incomplete_user_turns` was enabled and the
  LLM responded by calling a tool. The user turn never finalized, so the
  assistant aggregator gated the tool-result context push and the LLM
  continuation never ran. Tool calls now finalize the turn the moment they
  start, before the function dispatches.
  (PR [#4501](https://github.com/pipecat-ai/pipecat/pull/4501))

## [1.2.0] - 2026-05-14

### Added

- Added a `session_id` field to `RunnerArguments` so bots can log or trace a
  per-session identifier in local development the same way they can in Pipecat
  Cloud. The development runner now mints a UUID at every construction site,
  and paths that already returned a `sessionId` to the caller (Daily `/start`,
  dial-in webhook) share that same UUID with the runner args instead of
  generating two. The SmallWebRTC `/api/offer` endpoint also accepts an
  optional `session_id` query parameter so the `/sessions/{session_id}/...`
  proxy can thread it through.
  (PR [#4385](https://github.com/pipecat-ai/pipecat/pull/4385))

- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService`
  for controlling Cartesia's server-side text buffering. When unset, Pipecat
  picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE`
  mode (custom buffering — avoids stacking client-side aggregation on top of
  Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode
  (Cartesia's managed buffering applies). Pass an explicit value (0–5000ms) to
  override.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and
  `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model
  Improvement Program. When set, the value is forwarded to Deepgram as a query
  parameter on the speak request. Defaults to `None`, which preserves the
  existing behavior. See https://dpgr.am/deepgram-mip for pricing implications
  before enabling.
  (PR [#4400](https://github.com/pipecat-ai/pipecat/pull/4400))

- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set
  via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that
  appends a developer-role message to the context whenever `LLMSetToolsFrame`
  changes the set of advertised standard tools. Helps the LLM stay coherent
  across mid-conversation tool changes, mitigating several flavors of
  tool-call-related hallucination: calling tools that have been removed,
  avoiding tools that have been re-added, and hallucinating output (made-up
  answers or tool-call-shaped non-tool-calls) when tools are unavailable.
  (PR [#4404](https://github.com/pipecat-ai/pipecat/pull/4404))

- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in
  `pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the
  inference-triggered event and suppresses `on_user_turn_stopped`, leaving
  finalization to another strategy in the chain such as
  `LLMTurnCompletionUserTurnStopStrategy`.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop` —
  a generic stop strategy that finalizes the user turn whenever a
  `UserTurnInferenceCompletedFrame` arrives, regardless of which component
  produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base;
  future producers (Flux, custom end-of-turn classifiers, etc.) can use the
  base directly or subclass it to add producer-specific setup.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `on_user_turn_inference_triggered`, a new event on the user turn
  controller, processor, aggregator and stop strategies that fires when a
  strategy has enough signal to start LLM inference. By default it fires
  together with `on_user_turn_stopped`; a gating strategy can fire only the
  inference-triggered event and defer finalization to a peer.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `FilterIncompleteUserTurnStrategies` in
  `pipecat.turns.user_turn_strategies` — a `UserTurnStrategies` specialization
  that wraps the detector chain with `deferred(...)` and appends
  `LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case:
  `user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass
  `config=UserTurnCompletionConfig(...)` to customize timeouts and prompts.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`.
  When installed, the strategy gates `on_user_turn_stopped` on a
  `UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by
  any component that can judge turn completeness — e.g. the
  `UserTurnCompletionLLMServiceMixin` when the LLM outputs `✓`). A
  `finalization_timeout` provides a safety net if no completion frame ever
  arrives.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Added first-class RTVI support for the UI Agent Protocol:
  - Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server
    messages, plus `ui-command` and `ui-task` server-to-client messages, with
    paired `*Data` / `*Message` pydantic models.
  - Adds built-in command payload models for `Toast`, `Navigate`, `ScrollTo`,
    `Highlight`, `Focus`, `Click`, `SetInputValue`, and `SelectText`; matching
    default handlers live in `@pipecat-ai/client-react`.
  - Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`, `ui-snapshot`,
    and `ui-cancel-task` messages.
  - Adds five UI pipeline frames, mirroring the `client-message`
    frame-and-event pattern: downstream code pushes `RTVIUICommandFrame` /
    `RTVIUITaskFrame` for the observer to wrap into outbound `UICommandMessage` /
    `UITaskMessage` envelopes, while the processor pushes inbound
    `RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and `RTVIUICancelTaskFrame`
    alongside `on_ui_message`.
  - Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.
  (PR [#4407](https://github.com/pipecat-ai/pipecat/pull/4407))

- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore
  processor now resolve credentials via the standard boto3 provider chain (EC2
  instance profiles, EKS pod roles / IRSA, ECS task roles, SSO,
  `~/.aws/credentials`) when explicit credentials and `AWS_*` environment
  variables are absent. Services running with IAM roles no longer need to
  export static credentials.
  (PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))

- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can
  bias transcription in both file-based and realtime modes.
  (PR [#4426](https://github.com/pipecat-ai/pipecat/pull/4426))

- Added a `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and
  `DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum
  silence duration before the watchdog sends a silence packet to prevent
  dangling turns. The actual threshold is
  `max(chunk_duration * 2, watchdog_min_timeout)`, so it also adapts
  automatically to the audio chunk size in use.
  (PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))

- Added `cancel_on_interruption=False` support for `GeminiLiveLLMService` on
  models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini
  2.x); the conversation now continues while the tool runs. On models that
  don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time
  warning explaining the limitation. (Note: an intermittent 1008 error can
  occur on Gemini 2.5 during long-running tool calls; the service reconnects
  automatically.)
  (PR [#4448](https://github.com/pipecat-ai/pipecat/pull/4448))

- Added `NvidiaSageMakerWebsocketSTTService` for streaming speech recognition
  using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint.
  Produces `InterimTranscriptionFrame` and `TranscriptionFrame` frames, is
  VAD-aware, and automatically reconnects on error.
  (PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))

- Added NVIDIA Magpie TTS services via AWS SageMaker:
  `NvidiaSageMakerHTTPTTSService` (single HTTP invocation, streams raw PCM
  back) and `NvidiaSageMakerWebsocketTTSService` (persistent HTTP/2 bidi-stream
  with full interruption support via `InterruptibleTTSService`).
  (PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))

- Added support for `reasoning` configuration on `OpenAIRealtimeLLMService`,
  for use with reasoning-capable Realtime models such as `gpt-realtime-2`.
  (PR [#4470](https://github.com/pipecat-ai/pipecat/pull/4470))

- Inworld TTS updates:
  - Added `delivery_mode` setting (`STABLE`/`BALANCED`/`CREATIVE`) to
    `InworldTTSService` and `InworldHttpTTSService`, enabling the
    stability-vs-creativity tradeoff in `inworld-tts-2`.
  - Added language support to `InworldTTSService` and
    `InworldHttpTTSService`. The `language` setting is now forwarded to the API,
    and a new `language_to_inworld_language()` helper normalizes Pipecat
    `Language` enums to Inworld's BCP-47 locale tags.
  (PR [#4473](https://github.com/pipecat-ai/pipecat/pull/4473))

### Changed

- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the
  generally available `tts-rt-v1`.
  (PR [#4386](https://github.com/pipecat-ai/pipecat/pull/4386))

- Bumped the default `cartesia_version` for `CartesiaTTSService` from
  `2025-04-16` to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking
  the `use_normalized_timestamps` and `max_buffer_delay_ms` fields.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead
  of the deprecated `use_original_timestamps` field. Word timestamps now
  reflect what was actually spoken (post text-normalization and
  pronunciation-dictionary substitution), matching the convention Pipecat uses
  for ElevenLabs. This is a behavior change for `sonic-3` users, who were
  previously receiving timestamps tied to the input transcript.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Broadened `tool_resources` into `app_resources` so it can be accessed easily
  not only in tool handlers but also elsewhere, such as in custom
  `FrameProcessor`s. This involves three changes: a rename (`tool_resources` →
  `app_resources`), a new `app_resources` property on `PipelineTask`, and a new
  `pipeline_task` property on `FrameProcessor`. Tool handlers now read
  `params.app_resources`; custom processors read
  `self.pipeline_task.app_resources`. The previous `tool_resources` aliases (on
  `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working
  but are deprecated as of 1.2.0 and emit `DeprecationWarning`s.
  (PR [#4395](https://github.com/pipecat-ai/pipecat/pull/4395))

- Lowered the per-message log in
  `SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App
  messages can be high-frequency and were noisy at debug level; set the loguru
  level to `TRACE` to see them again.
  (PR [#4397](https://github.com/pipecat-ai/pipecat/pull/4397))

- Changed the default model for `GrokRealtimeLLMService` to
  `grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The
  previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is
  being removed.
  (PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))

- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to
  `inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`,
  `InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing
  users can pin the prior model explicitly via the `model`/`tts_model`
  argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid
  model IDs.
  (PR [#4422](https://github.com/pipecat-ai/pipecat/pull/4422))

- Changed the default model for `GrokLLMService` from `grok-3` to
  `grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026.
  (PR [#4429](https://github.com/pipecat-ai/pipecat/pull/4429))

- `DeepgramFluxSTT` watchdog silence threshold is now dynamic:
  `max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms.
  This prevents false silence injections when large audio chunks are sent at
  lower frequency.
  (PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))

- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the
  turn is complete (on `on_turn_context_completed`) rather than waiting until
  all audio has finished playing back. The `isFinal` message from ElevenLabs is
  now used to signal `TTSStoppedFrame` and clean up the audio context,
  improving turn transition timing.
  (PR [#4433](https://github.com/pipecat-ai/pipecat/pull/4433))

- Updated `InworldHttpTTSService` and `InworldTTSService` to use PCM audio
  encoding by default, which returns audio bytes without headers.
  (PR [#4446](https://github.com/pipecat-ai/pipecat/pull/4446))

- Moved `create_task`, `cancel_task`, the `task_manager` property, and
  `setup(task_manager)` up from `FrameProcessor` to `BaseObject`. Custom
  `BaseObject` subclasses (turn strategies, controllers, etc.) now inherit
  these methods directly instead of reimplementing the task manager wiring.
  Owners propagate the task manager to their child `BaseObject`s via
  `await child.setup(task_manager)`.
  (PR [#4449](https://github.com/pipecat-ai/pipecat/pull/4449))

- Changed the default OpenAI Realtime input audio transcription model from
  `gpt-4o-transcribe` to `gpt-realtime-whisper` for both
  `OpenAIRealtimeSTTService` and `OpenAIRealtimeLLMService`. The new model does
  not accept the `prompt` parameter; if a prompt is supplied alongside
  `gpt-realtime-whisper`, it is dropped automatically and a warning is logged.
  To keep using prompt hints, explicitly pin `model="gpt-4o-transcribe"` (or
  `"gpt-4o-mini-transcribe"`).
  (PR [#4450](https://github.com/pipecat-ai/pipecat/pull/4450))

- Updated the default model for `CartesiaTTSService` and
  `CartesiaHttpTTSService` from `sonic-3` to `sonic-3.5`.
  (PR [#4462](https://github.com/pipecat-ai/pipecat/pull/4462))

- Changed the default model for `OpenAIRealtimeLLMService` from
  `gpt-realtime-1.5` to `gpt-realtime-2`.
  (PR [#4472](https://github.com/pipecat-ai/pipecat/pull/4472))

### Deprecated

- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use
  `user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add
  `LLMTurnCompletionUserTurnStopStrategy` to a custom
  `user_turn_strategies.stop`) instead. Setting the legacy flag still works for
  one release: the aggregator emits a `DeprecationWarning` and rewires the
  strategies as if you had passed `FilterIncompleteUserTurnStrategies`
  directly.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the
  `create_file_resampler()` / `create_stream_resampler()` factories).
  Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class
  will be removed in Pipecat 2.0 along with the default `resampy` and `numba`
  dependencies.
  (PR [#4428](https://github.com/pipecat-ai/pipecat/pull/4428))

### Fixed

- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as
  `ErrorFrame`s. The latest API emits a `flush_done` per transcript when
  server-side buffering is disabled; Pipecat now consumes them silently since
  each turn already has its own `context_id`.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`,
  `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance
  (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both
  the class and an instance.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200
  response — one with the API's error text and a second, less informative
  "Unknown error" frame from the outer exception handler. It now pushes a
  single frame that includes the HTTP status code and returns cleanly.
  (PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))

- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally
  for user turn stop strategies. It is now only imported when
  `default_user_turn_stop_strategies()` is called. This improves startup time
  and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning
  when the default stop strategies are not used.
  (PR [#4393](https://github.com/pipecat-ai/pipecat/pull/4393))

- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was
  stored in `Settings` but never sent to xAI, so every session silently fell
  back to xAI's server-side default. The model is now passed via the `?model=`
  query parameter on the WebSocket URL as xAI's Voice Agent API requires.
  (PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))

- Fixed `on_user_turn_stopped` firing prematurely when
  `filter_incomplete_user_turns` was enabled. The event now fires only after
  the LLM confirms the user turn is complete (`✓`); previously the smart-turn
  detector's tentative stop was bubbling up before the LLM had a chance to veto
  it, causing observers, transcript appenders and UI indicators to receive an
  early — and sometimes duplicated — signal.
  (PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))

- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting
  across two assistant messages in the LLM context and not surfacing in
  `on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted
  at the end of a TTS context now carries a PTS just past the last word so it
  can't overtake clock-queued `TTSTextFrame`s in the transport's output, and
  `LLMAssistantAggregator` now triggers
  `on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the
  frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting
  transcripts).
  (PR [#4414](https://github.com/pipecat-ai/pipecat/pull/4414))

- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged
  words (e.g. `bookLook`) when using Flash models. Flash often splits sentences
  mid-stream into alignment chunks that begin with a real inter-word space, but
  the previous fix unconditionally stripped that space from every chunk.
  Leading spaces are now stripped only on the first alignment chunk of an
  utterance, so subsequent chunks correctly flush partial words across
  boundaries.
  (PR [#4415](https://github.com/pipecat-ai/pipecat/pull/4415))

- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor
  erroring out when only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`
  was set in the environment. The half-populated kwargs are no longer forwarded
  to aioboto3; partial env-var configurations now fall through to the boto3
  credential chain like fully-unset configurations do.
  (PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))

- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing
  romanized/normalized text to the LLM context. With non-Latin input (e.g.,
  Chinese), the assistant transcript was getting populated with pinyin
  (`Ni Hao !` instead of `你好!`), which then degraded subsequent LLM turns. The
  services now consume `alignment` by default and only switch to
  `normalizedAlignment` / `normalized_alignment` when
  `pronunciation_dictionary_locators` is configured (where `alignment` has
  overlapping restarts that produce duplicated/garbled words, per #4316). Both
  fields are read with preferred-with-fallback semantics since each is nullable
  per the API schema.
  (PR [#4424](https://github.com/pipecat-ai/pipecat/pull/4424))

- Fixed a deadlock in `TTSService` that could permanently stall pipeline
  processing when all three conditions occurred together:
  `pause_frame_processing=True`, an interruption arrived before any TTS audio
  was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`,
  `FunctionCallResultFrame`) was in the processing queue at that moment. The
  process task would block on `__process_event.wait()` indefinitely because
  `BotStoppedSpeakingFrame` never arrives (no audio was played) and the
  interruption handler did not resume processing. Affects services using
  `pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and
  ResembleAI.
  (PR [#4431](https://github.com/pipecat-ai/pipecat/pull/4431))

- Fixed interruptions being delayed when a slow interruptible frame was being
  processed while an uninterruptible frame waited in the queue. The bot would
  stall until the slow frame finished instead of cancelling it immediately on
  interruption.
  (PR [#4434](https://github.com/pipecat-ai/pipecat/pull/4434))
|
||||
|
||||
- Fixed `TTSService` dropping uninterruptible frames (e.g.
|
||||
`FunctionCallResultFrame`) from its internal serialization queue when an
|
||||
interruption occurs. Previously, the queue was recreated on every
|
||||
interruption, silently discarding any queued frames. The queue is now reset
|
||||
instead of recreated, preserving uninterruptible frames so they are always
|
||||
delivered downstream.
|
||||
(PR [#4435](https://github.com/pipecat-ai/pipecat/pull/4435))
|
||||
|
||||
- Fixed a race condition in the Daily transport that caused `AttributeError:
|
||||
'NoneType' object has no attribute 'send_app_message'` when tearing down a
|
||||
pipeline. Both `DailyInputTransport` and `DailyOutputTransport` share the
|
||||
same `DailyTransportClient` and both call `cleanup()`, which was releasing
|
||||
the underlying `CallClient` on the first call — leaving the second caller
|
||||
with a `None` client.
|
||||
(PR [#4440](https://github.com/pipecat-ai/pipecat/pull/4440))
|
||||
|
||||
- Restored `cancel_on_interruption=False` support for `AWSNovaSonicLLMService`
|
||||
and `OpenAIRealtimeLLMService`. These services previously honored the flag by
|
||||
simply not cancelling in-flight function calls on interruption; the
|
||||
introduction of the new async-tool mechanism (which threads
|
||||
started/intermediate/final messages through the LLM context) broke that path
|
||||
because the realtime services didn't know how to interpret those messages.
|
||||
Note that new-style streamed intermediate results
|
||||
(`FunctionCallResultProperties(is_final=False)`) are not supported on these
|
||||
realtime services. Similar fixes for other impacted realtime services are
|
||||
forthcoming.
|
||||
(PR [#4441](https://github.com/pipecat-ai/pipecat/pull/4441))
|
||||
|
||||
- Fixed two misspelled Gemini TTS voice names in
|
||||
`GeminiTTSService.AVAILABLE_VOICES`.
|
||||
(PR [#4443](https://github.com/pipecat-ai/pipecat/pull/4443))
|
||||
|
||||
- Extended the `cancel_on_interruption=False` regression fix to
  `GrokRealtimeLLMService`, `AzureRealtimeLLMService`, and
  `UltravoxRealtimeLLMService`. Grok and Azure use the same approach as in
  #4441 (each service detects async-tool messages in the LLM context and routes
  the final result to its formal tool-result channel; Azure inherits
  transitively from `OpenAIRealtimeLLMService`). Ultravox needed a different
  approach because its API freezes the conversation between
  `client_tool_invocation` and the matching `client_tool_result`: for
  async-registered functions it now ships a placeholder `client_tool_result`
  immediately when the function is invoked (to unfreeze the conversation), then
  injects the real result as user-side text once the tool finishes. Streamed
  intermediate results (`FunctionCallResultProperties(is_final=False)`) are
  still not supported on any of these realtime services. `GeminiLiveLLMService`
  and `InworldRealtimeLLMService` are excluded for now: Gemini Live's
  async-tool path needs deeper investigation, and Inworld tool calling needs to
  be sorted out first.
  (PR [#4447](https://github.com/pipecat-ai/pipecat/pull/4447))
- Fixed `OpenAIRealtimeLLMService` handling of multi-output-item responses
  (observed with `gpt-realtime-2`). A single response can now contain more than
  one audio item, and the first item's `audio.done` may arrive after the second
  item's deltas have started. Deltas still arrive strictly in playback order,
  so we continue to forward them as received (matching OpenAI's reference
  implementation). The fix removes spurious warnings, ensures truncation always
  targets the latest audio item, and emits a single bracketing
  `TTSStartedFrame`/`TTSStoppedFrame` pair per assistant turn (the
  `TTSStoppedFrame` is now pushed on `response.done`).
  (PR [#4465](https://github.com/pipecat-ai/pipecat/pull/4465))
- Fixed missing `output` attribute on LLM OpenTelemetry spans when the LLM call
  is interrupted mid-stream.
  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
- Fixed incorrect `metrics.ttfb` on STT OpenTelemetry spans, and parented them
  to the current turn span.
  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
- Fixed incorrect `metrics.ttfb` on TTS OpenTelemetry spans for streaming
  services.
  (PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
- Extended the `cancel_on_interruption=False` regression fix to
  `InworldRealtimeLLMService`. Uses the same approach as in #4441 (the service
  detects async-tool messages in the LLM context and routes the final result to
  its formal tool-result channel). Note: as of this writing, Inworld Realtime
  doesn't appear to handle the resulting delayed tool result reliably; the
  routing is best-effort and the service surfaces a one-time warning when
  async-tool messages are seen. Streamed intermediate results
  (`FunctionCallResultProperties(is_final=False)`) are still not supported on
  this realtime service. (Inworld was excluded from #4447 pending resolution of
  an unrelated tool-calling issue, which turned out to be an account-level
  matter.)
  (PR [#4474](https://github.com/pipecat-ai/pipecat/pull/4474))
- Fixed Cartesia TTS Korean word timestamps to use normal spacing rules,
  preserving word boundaries and per-word timestamp alignment during downstream
  aggregation.
  (PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))
- Fixed Cartesia TTS Chinese and Japanese timestamp grouping to preserve
  provider text spacing, avoiding artificial spaces when timestamp groups are
  reassembled downstream.
  (PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))
- Fixed `SonioxSTTService` final transcription frames missing detected language
  metadata when Soniox returns token-level language annotations.
  (PR [#4482](https://github.com/pipecat-ai/pipecat/pull/4482))
- Fixed Soniox final transcription language detection to use the most common
  recognized token language, avoiding mislabeling an utterance when the last
  token is tagged with a different language.
  (PR [#4495](https://github.com/pipecat-ai/pipecat/pull/4495))
- Fixed dropped audio in streaming TTS services whose wire protocol doesn't
  echo `context_id` back on incoming audio (Sarvam, Smallest, Soniox, Inworld,
  and others). Previously, audio that arrived between contexts or at the very
  start of a turn was tagged with `context_id=None` and silently dropped with
  an "unable to append audio to context: no context ID provided" debug log.
  `TTSService.get_active_audio_context_id()` now falls back to the
  synthesis-side `_turn_context_id` when the playback cursor isn't set yet.
  (PR [#4497](https://github.com/pipecat-ai/pipecat/pull/4497))
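The Soniox language entry (PR #4495) describes a majority vote over token-level language tags instead of trusting the last token. A minimal sketch of that idea in plain Python, assuming tokens are dicts with an optional `language` key (a hypothetical shape, not Soniox's wire format or Pipecat's internal token type):

```python
from collections import Counter
from typing import Optional


def dominant_language(tokens: list[dict]) -> Optional[str]:
    """Return the most common token language, or None if no token is tagged.

    Assumption: each token is a dict that may carry a "language" key;
    untagged tokens are ignored.
    """
    counts = Counter(t["language"] for t in tokens if t.get("language"))
    if not counts:
        return None
    # most_common(1) returns [(language, count)]; on a tie, the language
    # encountered first wins because Counter keeps insertion order.
    return counts.most_common(1)[0][0]


tokens = [
    {"text": "Hola", "language": "es"},
    {"text": " amigo", "language": "es"},
    {"text": " OK", "language": "en"},  # last token tagged differently
]
print(dominant_language(tokens))  # es
```

A last-token rule would have labeled this utterance `en`; counting every tagged token keeps the label stable when a single borrowed word appears at the end.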
### Security
- Fixed a path traversal issue in the development runner's
  `/files/{filename:path}` download endpoint. Previously, when the runner was
  started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could
  escape the configured folder because `%2F`-encoded separators bypassed
  Starlette's path normalisation. The endpoint now resolves the joined path and
  rejects any filename that escapes the allowed base with a 403, and also
  returns 404 (instead of an implicit `null` 200) when `--folder` is unset.
  (PR [#4417](https://github.com/pipecat-ai/pipecat/pull/4417))
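The Security fix comes down to resolving the joined path first and only then checking containment. A minimal sketch of that check with `pathlib`; the helper name `safe_resolve` and the choice to raise `PermissionError` are illustrative assumptions (the runner itself answers with HTTP 403/404), and explicitly unquoting the filename is included here only to show that encoded separators are caught too:

```python
from pathlib import Path
from urllib.parse import unquote


def safe_resolve(base_dir: str, filename: str) -> Path:
    """Resolve `filename` under `base_dir`, refusing paths that escape it.

    Hypothetical helper: a web endpoint would map the exception to a 403.
    """
    base = Path(base_dir).resolve()
    # Decode %2F and similar sequences so encoded separators are normalized
    # like literal ones before the containment check.
    candidate = (base / unquote(filename)).resolve()
    # is_relative_to (Python 3.9+) compares resolved path components, so
    # "../" segments and absolute filenames pointing outside `base` both fail.
    if not candidate.is_relative_to(base):
        raise PermissionError(f"{filename!r} escapes {base}")
    return candidate
```

Checking the *resolved* path, rather than searching the raw string for `..`, is what closes the gap: whatever encoding trick produced the path, only its final location matters.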
## [1.1.0] - 2026-04-27
### Added
@@ -92,10 +92,10 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Category | Services |
| ------------------- | -------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Inception](https://docs.pipecat.ai/api-reference/server/services/llm/inception), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox) |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |
1
changelog/4052.added.md
Normal file
@@ -0,0 +1 @@
- Added `VonageVideoConnectorTransport`, a new transport integration for real-time Vonage WebRTC sessions using the Vonage Video Connector library.
1
changelog/4306.fixed.md
Normal file
@@ -0,0 +1 @@
- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.
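The Azure TTS fix works because the completion signal travels through the same FIFO queue as the word boundaries, so it cannot overtake them. A minimal asyncio sketch of that ordering guarantee; the `STREAM_DONE` sentinel and the string "frames" are hypothetical stand-ins, not Pipecat's actual frame types:

```python
import asyncio

# Hypothetical end-of-stream marker placed on the same queue as the words.
STREAM_DONE = object()


async def drain_words(queue: asyncio.Queue) -> list[str]:
    """Consume word events in order until the completion sentinel arrives."""
    emitted = []
    while True:
        item = await queue.get()
        if item is STREAM_DONE:
            # Every word queued before the sentinel has already been handled,
            # so signalling "stopped" here can never precede the last word.
            emitted.append("<stopped>")
            return emitted
        emitted.append(item)


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    for word in ["Hello", "there", "friend"]:
        queue.put_nowait(word)
    # Completion is enqueued instead of being signalled directly.
    queue.put_nowait(STREAM_DONE)
    return await drain_words(queue)


print(asyncio.run(main()))  # ['Hello', 'there', 'friend', '<stopped>']
```

Signalling completion on a separate channel (for example an `asyncio.Event`) would reintroduce the race, since nothing would order it relative to words still waiting in the queue.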
1
changelog/4380.fixed.2.md
Normal file
@@ -0,0 +1 @@
- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.
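Emitting equal-PTS frames in insertion order is the stable-priority-queue problem. One common technique, sketched here with `heapq` and a monotonically increasing sequence number as a tie-breaker; this illustrates the idea and is not `BaseOutputTransport`'s actual implementation:

```python
import heapq
import itertools


class StablePTSQueue:
    """Min-heap keyed on presentation timestamp, FIFO among equal timestamps."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, object]] = []
        # The counter breaks ties so equal-PTS items keep insertion order and
        # the payloads themselves never have to be comparable.
        self._seq = itertools.count()

    def push(self, pts: int, frame: object) -> None:
        heapq.heappush(self._heap, (pts, next(self._seq), frame))

    def pop(self) -> object:
        _pts, _seq, frame = heapq.heappop(self._heap)
        return frame

    def __len__(self) -> int:
        return len(self._heap)


q = StablePTSQueue()
q.push(100, "audio-1")
q.push(100, "text-1")
q.push(50, "audio-0")
q.push(100, "audio-2")
print([q.pop() for _ in range(len(q))])
# ['audio-0', 'audio-1', 'text-1', 'audio-2']
```

Without the sequence number, `heapq` would fall back to comparing the frames themselves on a PTS tie, which either raises `TypeError` or yields an order unrelated to arrival.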
1
changelog/4380.fixed.3.md
Normal file
@@ -0,0 +1 @@
- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.
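Stripping markup before word attribution can be done with a regular expression over tag-like spans. A minimal sketch, assuming tags look like `<name ...>`, `</name>` or `<name .../>`; the regex is illustrative, is not Pipecat's actual implementation, and does not cover every SSML edge case (for example, adjacent words separated only by a tag will be joined):

```python
import re

# Matches opening, closing and self-closing tags such as <spell>, </spell>,
# <emotion value="calm"/> or <break time="1s"/>. Requiring a letter after
# "<" leaves plain comparisons like "3 < 4" alone.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def strip_tags(text: str) -> str:
    """Remove tag markup and collapse any whitespace runs it leaves behind."""
    without_tags = TAG_RE.sub("", text)
    return " ".join(without_tags.split())


print(strip_tags('Your code is <spell>A1B2</spell>.<break time="1s"/> Thanks!'))
# Your code is A1B2. Thanks!
```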
1
changelog/4380.fixed.4.md
Normal file
@@ -0,0 +1 @@
- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. The fix also handles words that straddle the boundary between two sentences by splitting them and attributing each part to its correct source frame.
1
changelog/4380.fixed.md
Normal file
@@ -0,0 +1 @@
- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.
@@ -1 +0,0 @@
- Added a `session_id` field to `RunnerArguments` so bots can log or trace a per-session identifier in local development the same way they can in Pipecat Cloud. The development runner now mints a UUID at every construction site, and paths that already returned a `sessionId` to the caller (Daily `/start`, dial-in webhook) share that same UUID with the runner args instead of generating two. The SmallWebRTC `/api/offer` endpoint also accepts an optional `session_id` query parameter so the `/sessions/{session_id}/...` proxy can thread it through.
@@ -1 +0,0 @@
- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the generally available `tts-rt-v1`.
@@ -1 +0,0 @@
- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService` for controlling Cartesia's server-side text buffering. When unset, Pipecat picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE` mode (custom buffering — avoids stacking client-side aggregation on top of Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode (Cartesia's managed buffering applies). Pass an explicit value (0–5000ms) to override.
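The defaulting rule for `max_buffer_delay_ms` can be written as a small pure function. This sketch mirrors the rule as the entry describes it; the `AggregationMode` enum and `resolve_max_buffer_delay_ms` are hypothetical stand-ins for Pipecat's own identifiers, `None` means "omit the field so Cartesia's managed buffering applies", and rejecting out-of-range values is an assumption of the sketch rather than documented service behavior:

```python
from enum import Enum
from typing import Optional


class AggregationMode(Enum):
    # Hypothetical stand-in for the service's text_aggregation_mode values.
    SENTENCE = "sentence"
    TOKEN = "token"


def resolve_max_buffer_delay_ms(
    mode: AggregationMode, explicit: Optional[int] = None
) -> Optional[int]:
    """Pick the value to send to Cartesia, or None to omit the field."""
    if explicit is not None:
        # Explicit values win. Range validation here is an assumption of this
        # sketch; the real service may handle out-of-range values differently.
        if not 0 <= explicit <= 5000:
            raise ValueError("max_buffer_delay_ms must be between 0 and 5000")
        return explicit
    if mode is AggregationMode.SENTENCE:
        # Text is already aggregated client-side; skip server buffering.
        return 0
    # TOKEN mode: leave unset so Cartesia's managed buffering applies.
    return None


print(resolve_max_buffer_delay_ms(AggregationMode.SENTENCE))  # 0
```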
@@ -1 +0,0 @@
- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16` to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the `use_normalized_timestamps` and `max_buffer_delay_ms` fields.
@@ -1 +0,0 @@
- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead of the deprecated `use_original_timestamps` field. Word timestamps now reflect what was actually spoken (post text-normalization and pronunciation-dictionary substitution), matching the convention Pipecat uses for ElevenLabs. This is a behavior change for `sonic-3` users, who were previously receiving timestamps tied to the input transcript.
@@ -1 +0,0 @@
- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200 response — one with the API's error text and a second, less informative "Unknown error" frame from the outer exception handler. It now pushes a single frame that includes the HTTP status code and returns cleanly.
@@ -1 +0,0 @@
- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`, `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both the class and an instance.
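The `TypeError` in the Cartesia tag-helper entry is ordinary Python method binding: a plain function defined on a class receives the instance as its first argument when it is called through an instance. A minimal reproduction with hypothetical `BrokenTags`/`FixedTags` classes (the `<spell>` output format is only illustrative):

```python
class BrokenTags:
    def SPELL(text: str) -> str:  # no self and no @staticmethod
        return f"<spell>{text}</spell>"


class FixedTags:
    @staticmethod
    def SPELL(text: str) -> str:
        return f"<spell>{text}</spell>"


# Calling through the class works either way: no binding happens.
print(BrokenTags.SPELL("hi"))  # <spell>hi</spell>

try:
    # Through an instance, Python passes the instance as the first argument,
    # so SPELL receives two positional arguments and raises TypeError.
    BrokenTags().SPELL("hi")
except TypeError as exc:
    print("TypeError:", exc)

# @staticmethod disables binding, so both call styles behave the same.
print(FixedTags.SPELL("hi"), FixedTags().SPELL("hi"))
```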
@@ -1 +0,0 @@
- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as `ErrorFrame`s. The latest API emits a `flush_done` per transcript when server-side buffering is disabled; Pipecat now consumes them silently since each turn already has its own `context_id`.
@@ -1 +0,0 @@
- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally for user turn stop strategies. It is now only imported when `default_user_turn_stop_strategies()` is called. This improves startup time and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning when the default stop strategies are not used.
@@ -1 +0,0 @@
- Broadened `tool_resources` to `app_resources` for easy access not just in tool handlers but in other places like custom `FrameProcessor`s. Three changes: a rename (`tool_resources` → `app_resources`), a new `app_resources` property on `PipelineTask`, and a new `pipeline_task` property on `FrameProcessor`. Tool handlers now read `params.app_resources`; custom processors read `self.pipeline_task.app_resources`. The previous `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit `DeprecationWarning`s.
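Keeping `tool_resources` working while steering callers toward `app_resources` follows the standard deprecated-alias pattern. A minimal sketch with a hypothetical `Params` class (not Pipecat's actual `FunctionCallParams`), showing the alias returning the same object while emitting a `DeprecationWarning`:

```python
import warnings
from typing import Any


class Params:
    """Hypothetical container exposing a new name plus a deprecated alias."""

    def __init__(self, app_resources: Any = None) -> None:
        self.app_resources = app_resources

    @property
    def tool_resources(self) -> Any:
        # The old name still works but warns; stacklevel=2 attributes the
        # warning to the caller's line rather than to this property.
        warnings.warn(
            "`tool_resources` is deprecated, use `app_resources` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.app_resources


params = Params(app_resources={"db": "sqlite:///app.db"})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert params.tool_resources is params.app_resources
print(caught[0].category.__name__)  # DeprecationWarning
```

Because `DeprecationWarning` is hidden by default outside `__main__` and test runners, the `simplefilter("always")` call is what makes the warning visible in this demonstration.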
@@ -1 +0,0 @@
- Lowered the per-message log in `SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App messages can be high-frequency and were noisy at debug level; set the loguru level to `TRACE` to see them again.
@@ -1 +0,0 @@
- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model Improvement Program. When set, the value is forwarded to Deepgram as a query parameter on the speak request. Defaults to `None`, which preserves the existing behavior. See https://dpgr.am/deepgram-mip for pricing implications before enabling.
@@ -1 +0,0 @@
- Changed the default model for `GrokRealtimeLLMService` to `grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is being removed.
@@ -1 +0,0 @@
- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was stored in `Settings` but never sent to xAI, so every session silently fell back to xAI's server-side default. The model is now passed via the `?model=` query parameter on the WebSocket URL as xAI's Voice Agent API requires.
@@ -1 +0,0 @@
- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that appends a developer-role message to the context whenever `LLMSetToolsFrame` changes the set of advertised standard tools. Helps the LLM stay coherent across mid-conversation tool changes, mitigating several flavors of tool-call-related hallucination: calling tools that have been removed, avoiding tools that have been re-added, and hallucinating output (made-up answers or tool-call-shaped non-tool-calls) when tools are unavailable.
@@ -1 +0,0 @@
- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`. When installed, the strategy gates `on_user_turn_stopped` on a `UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by any component that can judge turn completeness, e.g. the `UserTurnCompletionLLMServiceMixin` when the LLM marks the turn complete with `✓`). A `finalization_timeout` provides a safety net if no completion frame ever arrives.
@@ -1 +0,0 @@
- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in `pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the inference-triggered event and suppresses `on_user_turn_stopped`, leaving finalization to another strategy in the chain such as `LLMTurnCompletionUserTurnStopStrategy`.
@@ -1 +0,0 @@
- Added `FilterIncompleteUserTurnStrategies` in `pipecat.turns.user_turn_strategies` — a `UserTurnStrategies` specialization that wraps the detector chain with `deferred(...)` and appends `LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case: `user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass `config=UserTurnCompletionConfig(...)` to customize timeouts and prompts.
@@ -1 +0,0 @@
- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop` — a generic stop strategy that finalizes the user turn whenever a `UserTurnInferenceCompletedFrame` arrives, regardless of which component produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base; future producers (Flux, custom end-of-turn classifiers, etc.) can use the base directly or subclass it to add producer-specific setup.
@@ -1 +0,0 @@
- Added `on_user_turn_inference_triggered`, a new event on the user turn controller, processor, aggregator and stop strategies that fires when a strategy has enough signal to start LLM inference. By default it fires together with `on_user_turn_stopped`; a gating strategy can fire only the inference-triggered event and defer finalization to a peer.
@@ -1 +0,0 @@
- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use `user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add `LLMTurnCompletionUserTurnStopStrategy` to a custom `user_turn_strategies.stop`) instead. Setting the legacy flag still works for one release: the aggregator emits a `DeprecationWarning` and rewires the strategies as if you had passed `FilterIncompleteUserTurnStrategies` directly.
@@ -1 +0,0 @@
- Fixed `on_user_turn_stopped` firing prematurely when `filter_incomplete_user_turns` was enabled. The event now fires only after the LLM confirms the user turn is complete (`✓`); previously the smart-turn detector's tentative stop was bubbling up before the LLM had a chance to veto it, causing observers, transcript appenders and UI indicators to receive an early — and sometimes duplicated — signal.
@@ -1,6 +0,0 @@
- Added first-class RTVI support for the UI Agent Protocol:
- Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server messages, plus `ui-command` and `ui-task` server-to-client messages, with paired `*Data` / `*Message` pydantic models.
- Adds built-in command payload models for `Toast`, `Navigate`, `ScrollTo`, `Highlight`, `Focus`, `Click`, `SetInputValue`, and `SelectText`; matching default handlers live in `@pipecat-ai/client-react`.
- Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`, `ui-snapshot`, and `ui-cancel-task` messages.
- Adds five UI pipeline frames, mirroring the `client-message` frame-and-event pattern: downstream code pushes `RTVIUICommandFrame` / `RTVIUITaskFrame` for the observer to wrap into outbound `UICommandMessage` / `UITaskMessage` envelopes, while the processor pushes inbound `RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and `RTVIUICancelTaskFrame` alongside `on_ui_message`.
- Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.
@@ -1 +0,0 @@
- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting across two assistant messages in the LLM context and not surfacing in `on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted at the end of a TTS context now carries a PTS just past the last word so it can't overtake clock-queued `TTSTextFrame`s in the transport's output, and `LLMAssistantAggregator` now triggers `on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting transcripts).
@@ -1 +0,0 @@
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged words (e.g. `bookLook`) when using Flash models. Flash often splits sentences mid-stream into alignment chunks that begin with a real inter-word space, but the previous fix unconditionally stripped that space from every chunk. Leading spaces are now stripped only on the first alignment chunk of an utterance, so subsequent chunks correctly flush partial words across boundaries.
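The difference between the old and new ElevenLabs behavior comes down to which alignment chunks have their leading whitespace stripped. A simplified sketch that treats each chunk as a plain string; real alignment chunks also carry per-character timing, which is omitted here, and the function names are hypothetical:

```python
def join_chunks_old(chunks: list[str]) -> str:
    # Previous behavior: strip leading whitespace from every chunk.
    return "".join(chunk.lstrip() for chunk in chunks)


def join_chunks_fixed(chunks: list[str]) -> str:
    # Fixed behavior: strip only the first chunk of the utterance, so a real
    # inter-word space at the start of a later chunk is preserved.
    return "".join(
        chunk.lstrip() if i == 0 else chunk for i, chunk in enumerate(chunks)
    )


chunks = [" Look at this book", " Look again"]  # mid-sentence split
print(join_chunks_old(chunks))    # Look at this bookLook again
print(join_chunks_fixed(chunks))  # Look at this book Look again
```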
@@ -1 +0,0 @@
- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor now resolve credentials via the standard boto3 provider chain (EC2 instance profiles, EKS pod roles / IRSA, ECS task roles, SSO, `~/.aws/credentials`) when explicit credentials and `AWS_*` environment variables are absent. Services running with IAM roles no longer need to export static credentials.
@@ -1 +0,0 @@
- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor erroring out when only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` was set in the environment. The half-populated kwargs are no longer forwarded to aioboto3; partial env-var configurations now fall through to the boto3 credential chain like fully-unset configurations do.
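The AWS fix forwards explicit credentials only when they form a complete pair, and otherwise lets boto3 run its default provider chain. A sketch of that decision as a pure function; `credential_kwargs` is a hypothetical name, it makes no AWS calls, and it only builds the `aws_access_key_id`/`aws_secret_access_key` keyword arguments that a boto3 or aioboto3 session accepts:

```python
import os
from typing import Mapping, Optional


def credential_kwargs(
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Return explicit credential kwargs, or {} to defer to the boto3 chain."""
    access_key = access_key or env.get("AWS_ACCESS_KEY_ID")
    secret_key = secret_key or env.get("AWS_SECRET_ACCESS_KEY")
    if not (access_key and secret_key):
        # Missing or half-populated: pass nothing, so boto3 falls back to
        # instance profiles, IRSA, ECS task roles, SSO, ~/.aws/credentials...
        return {}
    return {"aws_access_key_id": access_key, "aws_secret_access_key": secret_key}


print(credential_kwargs(env={"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE"}))  # {}
```

Returning an empty dict, rather than a dict with one key set to `None`, is the important design choice: boto3 treats an explicitly passed credential as authoritative, whereas an omitted one lets the provider chain run.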
@@ -1 +0,0 @@
- Fixed a path traversal issue in the development runner's `/files/{filename:path}` download endpoint. Previously, when the runner was started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could escape the configured folder because `%2F`-encoded separators bypassed Starlette's path normalisation. The endpoint now resolves the joined path and rejects any filename that escapes the allowed base with a 403, and also returns 404 (instead of an implicit `null` 200) when `--folder` is unset.
|
||||
@@ -1 +0,0 @@
|
||||
- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to `inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`, `InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing users can pin the prior model explicitly via the `model`/`tts_model` argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid model IDs.
|
||||
1
changelog/4423.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing romanized/normalized text to the LLM context. With non-Latin input (e.g., Chinese), the assistant transcript was getting populated with pinyin (`Ni Hao !` instead of `你好!`), which then degraded subsequent LLM turns. The services now consume `alignment` by default and only switch to `normalizedAlignment` / `normalized_alignment` when `pronunciation_dictionary_locators` is configured (where `alignment` has overlapping restarts that produce duplicated/garbled words, per #4316). Both fields are read with preferred-with-fallback semantics since each is nullable per the API schema.
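This sketch shows the "preferred-with-fallback" field selection described above. It assumes a decoded JSON message as a `dict`. The payload contents are illustrative, and `pick_alignment` is a hypothetical helper, not the services' actual method. The WebSocket key `normalizedAlignment` is the default; the HTTP service would pass `normalized_alignment`.

```python
from typing import Any, Dict, Optional


def pick_alignment(
    message: Dict[str, Any],
    use_normalized: bool,
    normalized_key: str = "normalizedAlignment",
) -> Optional[Dict[str, Any]]:
    """Return the preferred alignment payload, falling back if it is null."""
    preferred, fallback = (
        (normalized_key, "alignment") if use_normalized else ("alignment", normalized_key)
    )
    value = message.get(preferred)
    # Both fields are nullable, so fall back only on None, not on empty payloads.
    return value if value is not None else message.get(fallback)


locators: list = []  # hypothetical: no pronunciation_dictionary_locators configured
msg = {"alignment": {"chars": ["你", "好", "!"]}, "normalizedAlignment": {"chars": ["N", "i"]}}
print(pick_alignment(msg, use_normalized=bool(locators)))  # {'chars': ['你', '好', '!']}
```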
|
||||
@@ -1 +0,0 @@
|
||||
- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can bias transcription for both file-based and realtime transcription.
|
||||
@@ -1 +0,0 @@
|
||||
- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the `create_file_resampler()` / `create_stream_resampler()` factories). Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class will be removed in Pipecat 2.0 along with the default `resampy` and `numba` dependencies.
|
||||
@@ -1 +0,0 @@
|
||||
- Changed the default model for `GrokLLMService` from `grok-3` to `grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and `DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum silence duration before the watchdog sends a silence packet to prevent dangling turns. The actual threshold is `max(chunk_duration * 2, watchdog_min_timeout)`, so it also adapts automatically to the audio chunk size in use.
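The threshold formula above can be worked through with concrete chunk sizes. This sketch assumes raw 16-bit mono PCM at 16 kHz when converting bytes to seconds. `chunk_duration_secs` and `watchdog_threshold` are illustrative helpers, not part of the Deepgram services' API.

```python
def chunk_duration_secs(num_bytes: int, sample_rate: int, channels: int = 1, sample_width: int = 2) -> float:
    """Duration of a raw PCM chunk; assumes 16-bit (2-byte) samples by default."""
    return num_bytes / (sample_rate * channels * sample_width)


def watchdog_threshold(chunk_duration: float, watchdog_min_timeout: float = 0.5) -> float:
    """Silence threshold: max(chunk_duration * 2, watchdog_min_timeout)."""
    return max(chunk_duration * 2, watchdog_min_timeout)


# 20 ms chunks at 16 kHz (640 bytes): the 0.5 s floor wins.
print(watchdog_threshold(chunk_duration_secs(640, 16000)))    # 0.5
# 400 ms chunks at 16 kHz (12800 bytes): twice the chunk duration wins.
print(watchdog_threshold(chunk_duration_secs(12800, 16000)))  # 0.8
```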
|
||||
@@ -1 +0,0 @@
|
||||
- `DeepgramFluxSTT` watchdog silence threshold is now dynamic: `max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms. This prevents false silence injections when large audio chunks are sent at lower frequency.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed a deadlock in `TTSService` that could permanently stall pipeline processing when all three conditions occurred together: `pause_frame_processing=True`, an interruption arrived before any TTS audio was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`, `FunctionCallResultFrame`) was in the processing queue at that moment. The process task would block on `__process_event.wait()` indefinitely because `BotStoppedSpeakingFrame` never arrives (no audio was played) and the interruption handler did not resume processing. Affects services using `pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and ResembleAI.
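This is a simplified model of the deadlock and its fix, built on `asyncio.Event`. `PausableProcessor` is a hypothetical class, not Pipecat's `TTSService`. It only shows why the interruption path must resume processing itself when no audio was ever played.

```python
import asyncio


class PausableProcessor:
    """Hypothetical, simplified model of pause_frame_processing."""

    def __init__(self) -> None:
        self._resume = asyncio.Event()
        self._resume.set()  # processing starts unpaused

    def pause(self) -> None:
        # Paused while the bot speaks; normally resumed by BotStoppedSpeakingFrame.
        self._resume.clear()

    def on_bot_stopped_speaking(self) -> None:
        self._resume.set()

    def on_interruption(self) -> None:
        # If the interruption lands before any audio played, BotStoppedSpeakingFrame
        # never arrives, so the interruption handler must resume processing too.
        self._resume.set()

    async def process_uninterruptible(self, frame: str) -> str:
        await self._resume.wait()
        return frame


async def demo() -> str:
    processor = PausableProcessor()
    processor.pause()
    task = asyncio.create_task(processor.process_uninterruptible("FunctionCallResultFrame"))
    await asyncio.sleep(0)  # let the task start and block on the event
    processor.on_interruption()
    # Without the set() in on_interruption, this would time out.
    return await asyncio.wait_for(task, timeout=1.0)


print(asyncio.run(demo()))  # FunctionCallResultFrame
```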
|
||||
@@ -1 +0,0 @@
|
||||
- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the turn is complete (on `on_turn_context_completed`) rather than waiting until all audio has finished playing back. The `isFinal` message from ElevenLabs is now used to signal `TTSStoppedFrame` and clean up the audio context, improving turn transition timing.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed interruptions being delayed when a slow interruptible frame was being processed while an uninterruptible frame waited in the queue. The bot would stall until the slow frame finished instead of cancelling it immediately on interruption.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `TTSService` dropping uninterruptible frames (e.g. `FunctionCallResultFrame`) from its internal serialization queue when an interruption occurs. Previously, the queue was recreated on every interruption, silently discarding any queued frames. The queue is now reset instead of recreated, preserving uninterruptible frames so they are always delivered downstream.
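The difference between "reset" and "recreate" can be sketched with an `asyncio.Queue` that is drained in place. Only uninterruptible frames are put back, in their original order, and consumers keep awaiting the same queue object. The `Frame` dataclass and `reset_queue` are hypothetical stand-ins, not Pipecat's frame classes or internals.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    name: str
    uninterruptible: bool = False


def reset_queue(queue: "asyncio.Queue[Frame]") -> None:
    """Drain the queue in place, re-enqueueing only uninterruptible frames."""
    kept: List[Frame] = []
    while not queue.empty():
        frame = queue.get_nowait()
        queue.task_done()  # keep join() accounting balanced for drained items
        if frame.uninterruptible:
            kept.append(frame)
    for frame in kept:
        queue.put_nowait(frame)


async def demo() -> List[str]:
    queue: "asyncio.Queue[Frame]" = asyncio.Queue()
    for frame in (
        Frame("TTSSpeakFrame"),
        Frame("FunctionCallResultFrame", uninterruptible=True),
        Frame("TTSTextFrame"),
    ):
        queue.put_nowait(frame)
    reset_queue(queue)  # same object, so a consumer awaiting queue.get() keeps working
    return [queue.get_nowait().name for _ in range(queue.qsize())]


print(asyncio.run(demo()))  # ['FunctionCallResultFrame']
```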
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed a race condition in the Daily transport that caused `AttributeError: 'NoneType' object has no attribute 'send_app_message'` when tearing down a pipeline. Both `DailyInputTransport` and `DailyOutputTransport` share the same `DailyTransportClient` and both call `cleanup()`, which was releasing the underlying `CallClient` on the first call — leaving the second caller with a `None` client.
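The entry above does not say how the fix is implemented. The sketch below shows one common technique for this kind of shared-cleanup race: reference counting, so only the last caller releases the shared client. It is not necessarily what `DailyTransportClient` does. `SharedTransportClient` is a hypothetical stand-in.

```python
from typing import Optional


class SharedTransportClient:
    """Hypothetical stand-in for a client shared by an input and an output transport."""

    def __init__(self) -> None:
        self.call_client: Optional[object] = object()  # stands in for the real CallClient
        self._users = 0

    def acquire(self) -> None:
        self._users += 1

    def cleanup(self) -> None:
        # Release the underlying client only when the last user cleans up.
        self._users = max(0, self._users - 1)
        if self._users == 0:
            self.call_client = None


client = SharedTransportClient()
client.acquire()  # input transport
client.acquire()  # output transport
client.cleanup()  # first cleanup: the other transport still holds the client
print(client.call_client is not None)  # True
client.cleanup()  # last cleanup releases it
print(client.call_client is None)      # True
```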
|
||||
@@ -1 +0,0 @@
|
||||
- Restored `cancel_on_interruption=False` support for `AWSNovaSonicLLMService` and `OpenAIRealtimeLLMService`. These services previously honored the flag by simply not cancelling in-flight function calls on interruption; the introduction of the new async-tool mechanism (which threads started/intermediate/final messages through the LLM context) broke that path because the realtime services didn't know how to interpret those messages. Note that new-style streamed intermediate results (`FunctionCallResultProperties(is_final=False)`) are not supported on these realtime services. Similar fixes for other impacted realtime services are forthcoming.
|
||||
1
changelog/4442.added.2.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `GET /status` endpoint to the development runner that reports which transports the running instance accepts (all by default, or the single transport passed via `-t`).
|
||||
1
changelog/4442.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added plain WebSocket transport support to the development runner. Bots can now accept connections from non-telephony WebSocket clients (e.g., browser apps using protobuf framing) via the `/ws-client` endpoint alongside other transports.
|
||||
1
changelog/4442.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- ⚠️ The development runner now supports all transports (WebRTC, Daily, telephony, plain WebSocket) simultaneously from a single server. The `/start` endpoint accepts a `"transport"` field to select the transport per-request; omitting `-t` at startup enables all transports instead of defaulting to WebRTC. The Daily browser-redirect route moved from `GET /` to `GET /daily`.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed two misspelled Gemini TTS voice names in `GeminiTTSService.AVAILABLE_VOICES`.
|
||||
@@ -1 +0,0 @@
|
||||
- Updated `InworldHttpTTSService` and `InworldTTSService` to use PCM audio encoding by default, which returns audio bytes without headers.
|
||||
@@ -1 +0,0 @@
|
||||
- Extended the `cancel_on_interruption=False` regression fix to `GrokRealtimeLLMService`, `AzureRealtimeLLMService`, and `UltravoxRealtimeLLMService`. Grok and Azure use the same approach as in #4441 (each service detects async-tool messages in the LLM context and routes the final result to its formal tool-result channel; Azure inherits transitively from `OpenAIRealtimeLLMService`). Ultravox needed a different approach because its API freezes the conversation between `client_tool_invocation` and the matching `client_tool_result` — for async-registered functions it now ships a placeholder `client_tool_result` immediately when the function is invoked (to unfreeze the conversation), then injects the real result as user-side text once the tool finishes. Streamed intermediate results (`FunctionCallResultProperties(is_final=False)`) are still not supported on any of these realtime services. `GeminiLiveLLMService` and `InworldRealtimeLLMService` are excluded for now: Gemini Live's async-tool path needs deeper investigation, and Inworld appears to have a pre-existing problem with even simple tool calling on its Realtime API.
|
||||
@@ -1 +0,0 @@
|
||||
- Added `cancel_on_interruption=False` support for `GeminiLiveLLMService` on models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini 2.x); the conversation now continues while the tool runs. On models that don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time warning explaining the limitation. (Note: Gemini 2.5 can intermittently return a 1008 error during long-running tool calls; the service reconnects automatically.)
|
||||
@@ -1 +0,0 @@
|
||||
- Moved `create_task`, `cancel_task`, the `task_manager` property, and `setup(task_manager)` up from `FrameProcessor` to `BaseObject`. Custom `BaseObject` subclasses (turn strategies, controllers, etc.) now inherit these methods directly instead of reimplementing the task manager wiring. Owners propagate the task manager to their child `BaseObject`s via `await child.setup(task_manager)`.
|
||||
@@ -1 +0,0 @@
|
||||
- Changed the default OpenAI Realtime input audio transcription model from `gpt-4o-transcribe` to `gpt-realtime-whisper` for both `OpenAIRealtimeSTTService` and `OpenAIRealtimeLLMService`. The new model does not accept the `prompt` parameter; if a prompt is supplied alongside `gpt-realtime-whisper`, it is dropped automatically and a warning is logged. To keep using prompt hints, explicitly pin `model="gpt-4o-transcribe"` (or `"gpt-4o-mini-transcribe"`).
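The drop-and-warn behavior above can be sketched as a small config builder. `transcription_config` and `MODELS_WITHOUT_PROMPT` are hypothetical names, and the set of prompt-less models is an assumption based only on this entry. The `model`/`prompt` keys mirror the fields of OpenAI's input audio transcription config.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)

# Assumption: models known to reject the `prompt` parameter.
MODELS_WITHOUT_PROMPT = {"gpt-realtime-whisper"}


def transcription_config(model: str = "gpt-realtime-whisper", prompt: Optional[str] = None) -> Dict[str, str]:
    """Build a transcription config, dropping `prompt` for models that reject it."""
    config = {"model": model}
    if prompt:
        if model in MODELS_WITHOUT_PROMPT:
            logger.warning("Model %s does not accept 'prompt'; dropping it.", model)
        else:
            config["prompt"] = prompt
    return config


print(transcription_config(prompt="Pipecat, Daily"))                       # prompt dropped
print(transcription_config("gpt-4o-transcribe", prompt="Pipecat, Daily"))  # prompt kept
```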
|
||||
@@ -1 +0,0 @@
|
||||
- Updated the default model for `CartesiaTTSService` and `CartesiaHttpTTSService` from `sonic-3` to `sonic-3.5`.
|
||||
@@ -1 +0,0 @@
|
||||
- Added NVIDIA Magpie TTS services via AWS SageMaker: `NvidiaSageMakerHTTPTTSService` (single HTTP invocation, streams raw PCM back) and `NvidiaSageMakerWebsocketTTSService` (persistent HTTP/2 bidi-stream with full interruption support via `InterruptibleTTSService`).
|
||||
@@ -1 +0,0 @@
|
||||
- Added `NvidiaSageMakerWebsocketSTTService` for streaming speech recognition using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint. Produces `InterimTranscriptionFrame` and `TranscriptionFrame` frames, is VAD-aware, and automatically reconnects on error.
|
||||
@@ -1 +0,0 @@
|
||||
- Fixed `OpenAIRealtimeLLMService` handling of multi-output-item responses (observed with `gpt-realtime-2`). A single response can now contain more than one audio item, and the first item's `audio.done` may arrive after the second item's deltas have started. Deltas still arrive strictly in playback order, so we continue to forward them as received (matching OpenAI's reference implementation). The fix removes spurious warnings, ensures truncation always targets the latest audio item, and emits a single bracketing `TTSStartedFrame`/`TTSStoppedFrame` pair per assistant turn (the Stopped is now pushed on `response.done`).
|
||||
@@ -1 +0,0 @@
|
||||
- Added support for `reasoning` configuration on `OpenAIRealtimeLLMService`, for use with reasoning-capable Realtime models such as `gpt-realtime-2`.
|
||||
@@ -1 +0,0 @@
|
||||
- Changed the default model for `OpenAIRealtimeLLMService` from `gpt-realtime-1.5` to `gpt-realtime-2`.
|
||||
1
changelog/4507.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed `ElevenLabsSTTService` crashing when `language` was passed as `None`. When `language` is not set, the service now lets ElevenLabs auto-detect the audio language.
|
||||
1
changelog/4514.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed handling of websocket STT connection setup failures: services now clear stale websocket state and emit non-fatal error frames, so `ServiceSwitcher` failover can keep agents running.
|
||||
1
changelog/4521.added.md
Normal file
@@ -0,0 +1 @@
|
||||
- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.
|
||||
1
changelog/4521.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.
|
||||
1
changelog/4521.removed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.
|
||||
1
changelog/4522.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and xAI STT so turn stop timing uses measured values instead of the conservative fallback.
|
||||
1
changelog/4524.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.
|
||||
1
changelog/4524.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.
|
||||
1
changelog/4527.fixed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.
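The gating rule above (keepalives only target a context once its context-init has been sent) can be sketched as a small tracker. `KeepaliveGate` and its methods are hypothetical and do not reflect `ElevenLabsTTSService` internals.

```python
from typing import Optional, Set


class KeepaliveGate:
    """Hypothetical tracker: only contexts whose init message was sent get keepalives."""

    def __init__(self) -> None:
        self._initialized: Set[str] = set()

    def mark_context_init_sent(self, context_id: str) -> None:
        # Call right after the first message (carrying voice_settings) is sent.
        self._initialized.add(context_id)

    def keepalive_target(self, current_context_id: Optional[str]) -> Optional[str]:
        # None means "skip this keepalive tick" instead of racing the init message.
        if current_context_id in self._initialized:
            return current_context_id
        return None


gate = KeepaliveGate()
print(gate.keepalive_target("turn-2"))  # None: init not sent yet, so no keepalive
gate.mark_context_init_sent("turn-2")
print(gate.keepalive_target("turn-2"))  # turn-2
```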
|
||||
1
changelog/4531.changed.md
Normal file
@@ -0,0 +1 @@
|
||||
- Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.
|
||||
@@ -91,6 +91,9 @@ HEYGEN_LIVE_AVATAR_API_KEY=...
|
||||
HUME_API_KEY=...
|
||||
HUME_VOICE_ID=...
|
||||
|
||||
# Inception
|
||||
INCEPTION_API_KEY=...
|
||||
|
||||
# Inworld
|
||||
INWORLD_API_KEY=...
|
||||
|
||||
@@ -211,6 +214,11 @@ TWILIO_AUTH_TOKEN=...
|
||||
# Ultravox Realtime
|
||||
ULTRAVOX_API_KEY=...
|
||||
|
||||
# Vonage
|
||||
VONAGE_APPLICATION_ID=...
|
||||
VONAGE_SESSION_ID=...
|
||||
VONAGE_TOKEN=...
|
||||
|
||||
# WhatsApp
|
||||
WHATSAPP_TOKEN=...
|
||||
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
|
||||
|
||||
177
examples/function-calling/function-calling-inception.py
Normal file
@@ -0,0 +1,177 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
|
||||
import os
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from loguru import logger
|
||||
|
||||
from pipecat.adapters.schemas.function_schema import FunctionSchema
|
||||
from pipecat.adapters.schemas.tools_schema import ToolsSchema
|
||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||
from pipecat.frames.frames import LLMRunFrame, TTSSpeakFrame
|
||||
from pipecat.pipeline.pipeline import Pipeline
|
||||
from pipecat.pipeline.runner import PipelineRunner
|
||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||
from pipecat.processors.aggregators.llm_context import LLMContext
|
||||
from pipecat.processors.aggregators.llm_response_universal import (
|
||||
LLMContextAggregatorPair,
|
||||
LLMUserAggregatorParams,
|
||||
)
|
||||
from pipecat.runner.types import RunnerArguments
|
||||
from pipecat.runner.utils import create_transport
|
||||
from pipecat.services.cartesia.tts import CartesiaTTSService
|
||||
from pipecat.services.deepgram.stt import DeepgramSTTService
|
||||
from pipecat.services.inception.llm import InceptionLLMService
|
||||
from pipecat.services.llm_service import FunctionCallParams
|
||||
from pipecat.transports.base_transport import BaseTransport, TransportParams
|
||||
from pipecat.transports.daily.transport import DailyParams
|
||||
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
|
||||
|
||||
load_dotenv(override=True)
|
||||
|
||||
|
||||
async def fetch_weather_from_api(params: FunctionCallParams):
|
||||
await params.result_callback({"conditions": "nice", "temperature": "75"})
|
||||
|
||||
|
||||
async def fetch_restaurant_recommendation(params: FunctionCallParams):
|
||||
await params.result_callback({"name": "The Golden Dragon"})
|
||||
|
||||
|
||||
# We use lambdas to defer transport parameter creation until the transport
|
||||
# type is selected at runtime.
|
||||
transport_params = {
|
||||
"daily": lambda: DailyParams(
|
||||
audio_in_enabled=True,
|
||||
audio_out_enabled=True,
|
||||
),
|
||||
"twilio": lambda: FastAPIWebsocketParams(
|
||||
audio_in_enabled=True,
|
||||
audio_out_enabled=True,
|
||||
),
|
||||
"webrtc": lambda: TransportParams(
|
||||
audio_in_enabled=True,
|
||||
audio_out_enabled=True,
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
logger.info("Starting bot")
|
||||
|
||||
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
|
||||
|
||||
tts = CartesiaTTSService(
|
||||
api_key=os.environ["CARTESIA_API_KEY"],
|
||||
settings=CartesiaTTSService.Settings(
|
||||
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
|
||||
),
|
||||
)
|
||||
|
||||
llm = InceptionLLMService(
|
||||
api_key=os.environ["INCEPTION_API_KEY"],
|
||||
settings=InceptionLLMService.Settings(
|
||||
reasoning_effort="instant",
|
||||
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
|
||||
),
|
||||
)
|
||||
# You can also register a function_name of None to get all functions
|
||||
# sent to the same callback with an additional function_name parameter.
|
||||
llm.register_function("get_current_weather", fetch_weather_from_api)
|
||||
llm.register_function("get_restaurant_recommendation", fetch_restaurant_recommendation)
|
||||
|
||||
@llm.event_handler("on_function_calls_started")
|
||||
async def on_function_calls_started(service, function_calls):
|
||||
await tts.queue_frame(TTSSpeakFrame("Let me check on that."))
|
||||
|
||||
weather_function = FunctionSchema(
|
||||
name="get_current_weather",
|
||||
description="Get the current weather",
|
||||
properties={
|
||||
"location": {
|
||||
"type": "string",
|
||||
"description": "The city and state, e.g. San Francisco, CA",
|
||||
},
|
||||
"format": {
|
||||
"type": "string",
|
||||
"enum": ["celsius", "fahrenheit"],
|
||||
"description": "The temperature unit to use. Infer this from the user's location.",
|
||||
},
|
||||
},
|
||||
required=["location", "format"],
|
||||
)
|
||||
|
||||
restaurant_function = FunctionSchema(
|
||||
name="get_restaurant_recommendation",
|
||||
description="Get a restaurant recommendation",
|
||||
properties={
|
||||
"location": {
|
||||
"type": "string",
|
||||
"description": "The city and state, e.g. San Francisco, CA",
|
||||
},
|
||||
},
|
||||
required=["location"],
|
||||
)
|
||||
tools = ToolsSchema(standard_tools=[weather_function, restaurant_function])
|
||||
|
||||
context = LLMContext(tools=tools)
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
|
||||
)
|
||||
|
||||
pipeline = Pipeline(
|
||||
[
|
||||
transport.input(),
|
||||
stt,
|
||||
user_aggregator,
|
||||
llm,
|
||||
tts,
|
||||
transport.output(),
|
||||
assistant_aggregator,
|
||||
]
|
||||
)
|
||||
|
||||
task = PipelineTask(
|
||||
pipeline,
|
||||
params=PipelineParams(
|
||||
enable_metrics=True,
|
||||
enable_usage_metrics=True,
|
||||
),
|
||||
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
|
||||
)
|
||||
|
||||
@transport.event_handler("on_client_connected")
|
||||
async def on_client_connected(transport, client):
|
||||
logger.info("Client connected")
|
||||
# Kick off the conversation.
|
||||
context.add_message(
|
||||
{"role": "developer", "content": "Please introduce yourself to the user."}
|
||||
)
|
||||
await task.queue_frames([LLMRunFrame()])
|
||||
|
||||
@transport.event_handler("on_client_disconnected")
|
||||
async def on_client_disconnected(transport, client):
|
||||
logger.info("Client disconnected")
|
||||
await task.cancel()
|
||||
|
||||
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
|
||||
|
||||
await runner.run(task)
|
||||
|
||||
|
||||
async def bot(runner_args: RunnerArguments):
|
||||
"""Main bot entry point compatible with Pipecat Cloud."""
|
||||
transport = await create_transport(runner_args, transport_params)
|
||||
await run_bot(transport, runner_args)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
from pipecat.runner.run import main
|
||||
|
||||
main()
|
||||
@@ -68,9 +68,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
tts = OpenAITTSService(
|
||||
api_key=os.environ["OPENAI_API_KEY"],
|
||||
settings=OpenAITTSService.Settings(
|
||||
instructions="Please speak clearly and at a moderate pace.",
|
||||
voice="ballad",
|
||||
),
|
||||
instructions="Please speak clearly and at a moderate pace.",
|
||||
)
|
||||
|
||||
llm = OpenAILLMService(
|
||||
|
||||
@@ -71,8 +71,8 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
|
||||
llm = QwenLLMService(
|
||||
api_key=os.environ["QWEN_API_KEY"],
|
||||
model="qwen2.5-72b-instruct",
|
||||
settings=QwenLLMService.Settings(
|
||||
model="qwen2.5-72b-instruct",
|
||||
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
|
||||
),
|
||||
)
|
||||
|
||||
@@ -28,10 +28,14 @@ Usage:
|
||||
"""
|
||||
|
||||
import os
|
||||
import random
|
||||
from datetime import datetime
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from loguru import logger
|
||||
|
||||
from pipecat.adapters.schemas.function_schema import FunctionSchema
|
||||
from pipecat.adapters.schemas.tools_schema import ToolsSchema
|
||||
from pipecat.frames.frames import LLMRunFrame
|
||||
from pipecat.observers.loggers.transcription_log_observer import (
|
||||
TranscriptionLogObserver,
|
||||
@@ -48,6 +52,7 @@ from pipecat.processors.aggregators.llm_response_universal import (
|
||||
from pipecat.runner.types import RunnerArguments
|
||||
from pipecat.runner.utils import create_transport
|
||||
from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService
|
||||
from pipecat.services.llm_service import FunctionCallParams
|
||||
from pipecat.transports.base_transport import BaseTransport, TransportParams
|
||||
from pipecat.transports.daily.transport import DailyParams
|
||||
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
|
||||
@@ -55,6 +60,43 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
|
||||
load_dotenv(override=True)
|
||||
|
||||
|
||||
async def fetch_weather_from_api(params: FunctionCallParams):
|
||||
temperature = (
|
||||
random.randint(60, 85)
|
||||
if params.arguments["format"] == "fahrenheit"
|
||||
else random.randint(15, 30)
|
||||
)
|
||||
await params.result_callback(
|
||||
{
|
||||
"conditions": "nice",
|
||||
"temperature": temperature,
|
||||
"location": params.arguments["location"],
|
||||
"format": params.arguments["format"],
|
||||
"timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
weather_function = FunctionSchema(
|
||||
name="get_current_weather",
|
||||
description="Get the current weather",
|
||||
properties={
|
||||
"location": {
|
||||
"type": "string",
|
||||
"description": "The city and state, e.g. San Francisco, CA",
|
||||
},
|
||||
"format": {
|
||||
"type": "string",
|
||||
"enum": ["celsius", "fahrenheit"],
|
||||
"description": "The temperature unit to use. Infer this from the user's location.",
|
||||
},
|
||||
},
|
||||
required=["location", "format"],
|
||||
)
|
||||
|
||||
tools = ToolsSchema(standard_tools=[weather_function])
|
||||
|
||||
|
||||
# --- Transport Configuration ---
|
||||
|
||||
# No local VAD needed — Inworld's server-side semantic VAD handles turn detection.
|
||||
@@ -85,7 +127,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
# See: https://docs.inworld.ai/router/introduction
|
||||
llm = InworldRealtimeLLMService(
|
||||
api_key=os.environ["INWORLD_API_KEY"],
|
||||
llm_model="xai/grok-4-1-fast-non-reasoning",
|
||||
llm_model="openai/gpt-4.1-mini",
|
||||
voice="Sarah",
|
||||
settings=InworldRealtimeLLMService.Settings(
|
||||
system_instruction="""You are a helpful and friendly AI assistant powered by Inworld.
|
||||
@@ -97,9 +139,14 @@ Always be helpful and proactive in offering assistance.""",
|
||||
),
|
||||
)
|
||||
|
||||
# Create context with initial message
|
||||
# Note: function calling requires a paid Inworld account and a
|
||||
# function-calling-capable model
|
||||
llm.register_function("get_current_weather", fetch_weather_from_api)
|
||||
|
||||
# Create context with initial message + tools
|
||||
context = LLMContext(
|
||||
[{"role": "developer", "content": "Say hello and introduce yourself!"}],
|
||||
tools,
|
||||
)
|
||||
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(context)
|
||||
|
||||
@@ -51,7 +51,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
|
||||
|
||||
stt = GradiumSTTService(
|
||||
api_key=os.environ["GRADIUM_API_KEY"],
|
||||
api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
|
||||
settings=GradiumSTTService.Settings(
|
||||
language=Language.EN,
|
||||
delay_in_frames=8,
|
||||
|
||||
134
examples/transports/transports-vonage.py
Normal file
@@ -0,0 +1,134 @@
|
||||
#
|
||||
# Copyright (c) 2024-2026, Daily
|
||||
#
|
||||
# SPDX-License-Identifier: BSD 2-Clause License
|
||||
#
|
||||
|
||||
"""Example of using OpenAI Realtime voice LLM service with Vonage Video Connector transport."""
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
import sys
|
||||
from collections.abc import Callable
|
||||
from typing import Any
|
||||
|
||||
from dotenv import load_dotenv
|
||||
from loguru import logger
|
||||
|
||||
from pipecat.audio.vad.silero import SileroVADAnalyzer
|
||||
from pipecat.frames.frames import LLMRunFrame
|
||||
from pipecat.observers.loggers.transcription_log_observer import TranscriptionLogObserver
|
||||
from pipecat.pipeline.pipeline import Pipeline
|
||||
from pipecat.pipeline.runner import PipelineRunner
|
||||
from pipecat.pipeline.task import PipelineParams, PipelineTask
|
||||
from pipecat.processors.aggregators.llm_context import LLMContext
|
||||
from pipecat.processors.aggregators.llm_response_universal import (
|
||||
LLMContextAggregatorPair,
|
||||
LLMUserAggregatorParams,
|
||||
)
|
||||
from pipecat.runner.vonage import configure
|
||||
from pipecat.services.openai.realtime.events import (
|
||||
AudioConfiguration,
|
||||
AudioInput,
|
||||
InputAudioNoiseReduction,
|
||||
InputAudioTranscription,
|
||||
SemanticTurnDetection,
|
||||
SessionProperties,
|
||||
)
|
||||
from pipecat.services.openai.realtime.llm import OpenAIRealtimeLLMService
|
||||
from pipecat.transports.vonage.video_connector import (
|
||||
VonageVideoConnectorTransport,
|
||||
VonageVideoConnectorTransportParams,
|
||||
)
|
||||
|
||||
load_dotenv(override=True)
|
||||
|
||||
logger.remove(0)
|
||||
logger.add(sys.stderr, level="DEBUG")
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
"""Main entry point for the OpenAI Realtime Vonage Video Connector example."""
|
||||
(application_id, session_id, token) = await configure()
|
||||
|
||||
transport = VonageVideoConnectorTransport(
|
||||
application_id,
|
||||
session_id,
|
||||
token,
|
||||
VonageVideoConnectorTransportParams(
|
||||
audio_in_enabled=True,
|
||||
audio_out_enabled=True,
|
||||
publisher_name="Bot",
|
||||
),
|
||||
)
|
||||
|
||||
llm = OpenAIRealtimeLLMService(
|
||||
api_key=os.environ["OPENAI_API_KEY"],
|
||||
settings=OpenAIRealtimeLLMService.Settings(
|
||||
system_instruction="""You are a helpful and friendly AI.
|
||||
|
||||
Act like a human, but remember that you aren't a human and that you can't do human
|
||||
things in the real world. Your voice and personality should be warm and engaging, with a lively and
|
||||
playful tone.
|
||||
|
||||
If interacting in a non-English language, start by using the standard accent or dialect familiar to
|
||||
the user. Talk quickly.
|
||||
|
||||
You are participating in a voice conversation. Keep your responses concise, short, and to the point
|
||||
unless specifically asked to elaborate on a topic.
|
||||
|
||||
Remember, your responses should be short. Just one or two sentences, usually. Respond in English.""",
|
||||
session_properties=SessionProperties(
|
||||
audio=AudioConfiguration(
|
||||
input=AudioInput(
|
||||
transcription=InputAudioTranscription(),
|
||||
turn_detection=SemanticTurnDetection(),
|
||||
noise_reduction=InputAudioNoiseReduction(type="near_field"),
|
||||
)
|
||||
),
|
||||
),
|
||||
),
|
||||
)
|
||||
|
||||
context = LLMContext(
|
||||
[{"role": "developer", "content": "Say hello!"}],
|
||||
)
|
||||
|
||||
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
|
||||
context,
|
||||
user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
|
||||
)
|
||||
|
||||
pipeline = Pipeline(
|
||||
[
|
||||
transport.input(),
|
||||
user_aggregator,
|
||||
llm,
|
||||
transport.output(),
|
||||
assistant_aggregator,
|
||||
]
|
||||
)
|
||||
|
||||
task = PipelineTask(
|
||||
pipeline,
|
||||
params=PipelineParams(
|
||||
enable_metrics=True,
|
||||
enable_usage_metrics=True,
|
||||
),
|
||||
observers=[TranscriptionLogObserver()],
|
||||
)
|
||||
|
||||
event_handler: Callable[[str], Callable[[Any], Any]] = transport.event_handler
|
||||
|
||||
@event_handler("on_client_connected")
|
||||
async def on_client_connected(transport: VonageVideoConnectorTransport, client: object) -> None:
|
||||
logger.info("Client connected")
|
||||
await task.queue_frames([LLMRunFrame()])
|
||||
|
||||
runner = PipelineRunner()
|
||||
|
||||
await runner.run(task)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@@ -0,0 +1,201 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

"""Example 22: Filter Incomplete Turns

Demonstrates LLM-based turn completion detection to suppress bot responses when
the user was cut off mid-thought. The LLM outputs one of three markers:
- ✓ (complete): User finished their thought, respond normally
- ○ (incomplete short): User was cut off, wait ~5s then prompt
- ◐ (incomplete long): User needs time to think, wait ~10s then prompt

When incomplete is detected, the bot's response is suppressed. After the timeout
expires, the LLM is automatically prompted to re-engage the user.
"""

import os

from dotenv import load_dotenv
from loguru import logger

from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    AssistantTurnStoppedMessage,
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
    UserTurnStoppedMessage,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_turn_strategies import FilterIncompleteUserTurnStrategies

load_dotenv(override=True)


# We use lambdas to defer transport parameter creation until the transport
# type is selected at runtime.
transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
    "twilio": lambda: FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
    ),
}


async def get_weather(params: FunctionCallParams, location: str):
    """Return the current weather for a location.

    A stub that always reports the same conditions — replace with a real
    weather API in production.

    Args:
        location (str): The city and state or country, e.g. "Paris, France".
    """
    await params.result_callback(
        {
            "location": location,
            "temperature_celsius": 22,
            "conditions": "partly cloudy",
        }
    )


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        settings=OpenAILLMService.Settings(
            system_instruction=(
                "You are a helpful assistant in a voice conversation. Your "
                "responses will be spoken aloud, so avoid emojis, bullet "
                "points, or other formatting that can't be spoken. Respond to "
                "what the user said in a creative, helpful, and brief way. "
                "If the user asks about the weather, call the get_weather "
                "tool and speak the result back naturally."
            ),
        ),
    )
    llm.register_direct_function(get_weather)

    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        settings=CartesiaTTSService.Settings(
            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
        ),
    )

    context = LLMContext(tools=ToolsSchema(standard_tools=[get_weather]))
    # `FilterIncompleteUserTurnStrategies` pairs the default detector
    # chain with `LLMTurnCompletionUserTurnStopStrategy`: detectors
    # trigger LLM inference but the public `on_user_turn_stopped` event
    # fires only when the LLM confirms ✓. The LLM marks each response
    # with one of:
    #   ✓ = complete (respond normally)
    #   ○ = incomplete short (wait 5s, then prompt)
    #   ◐ = incomplete long (wait 10s, then prompt)
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            vad_analyzer=SileroVADAnalyzer(),
            user_turn_strategies=FilterIncompleteUserTurnStrategies(
                # Optional: customize turn completion behavior
                # config=UserTurnCompletionConfig(
                #     incomplete_short_timeout=5.0,
                #     incomplete_long_timeout=10.0,
                #     incomplete_short_prompt="Custom prompt...",
                #     incomplete_long_prompt="Custom prompt...",
                #     instructions="Custom turn completion instructions...",
                # ),
            ),
        ),
    )

    pipeline = Pipeline(
        [
            transport.input(),  # Transport user input
            stt,
            user_aggregator,  # User responses
            llm,  # LLM
            tts,  # TTS
            transport.output(),  # Transport bot output
            assistant_aggregator,  # Assistant spoken responses
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info(f"Client connected")
        # Kick off the conversation.
        context.add_message(
            {"role": "developer", "content": "Please introduce yourself to the user."}
        )
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info(f"Client disconnected")
        await task.cancel()

    @user_aggregator.event_handler("on_user_turn_stopped")
    async def on_user_turn_stopped(aggregator, strategy, message: UserTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}user: {message.content}"
        logger.info(f"Transcript: {line}")

    @assistant_aggregator.event_handler("on_assistant_turn_stopped")
    async def on_assistant_turn_stopped(aggregator, message: AssistantTurnStoppedMessage):
        timestamp = f"[{message.timestamp}] " if message.timestamp else ""
        line = f"{timestamp}assistant: {message.content}"
        logger.info(f"Transcript: {line}")

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
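The docstring and the `FilterIncompleteUserTurnStrategies` comment describe a three-way decision keyed on the marker that opens each LLM response. The sketch below isolates that decision so it can be read and tested on its own. It is not Pipecat's `LLMTurnCompletionUserTurnStopStrategy`: `TurnDecision`, `classify_turn`, and the choice to treat an unmarked response as complete are hypothetical, and the 5 s and 10 s waits are the defaults the docstring mentions.

```python
from dataclasses import dataclass
from typing import Optional

# Default waits described in the example's docstring (~5 s and ~10 s).
INCOMPLETE_SHORT_TIMEOUT = 5.0
INCOMPLETE_LONG_TIMEOUT = 10.0


@dataclass(frozen=True)
class TurnDecision:
    """Hypothetical result type: what to do with one marked LLM response."""

    complete: bool  # True: speak the response now
    speak_text: str  # response with the marker removed ("" when suppressed)
    reprompt_after: Optional[float]  # seconds to wait before re-engaging, or None


def classify_turn(llm_output: str) -> TurnDecision:
    """Map the leading turn-completion marker to an action (illustrative only)."""
    text = llm_output.lstrip()
    if text.startswith("○"):  # incomplete short: suppress, prompt after ~5 s
        return TurnDecision(False, "", INCOMPLETE_SHORT_TIMEOUT)
    if text.startswith("◐"):  # incomplete long: suppress, prompt after ~10 s
        return TurnDecision(False, "", INCOMPLETE_LONG_TIMEOUT)
    if text.startswith("✓"):  # complete: strip the marker and respond
        text = text[len("✓"):]
    # Assumption: a response with no marker at all is treated as complete.
    return TurnDecision(True, text.lstrip(), None)
```

In the example itself the waits and the re-engagement prompts are configurable through the commented-out `UserTurnCompletionConfig` fields (`incomplete_short_timeout`, `incomplete_long_prompt`, and so on); the sketch only reports how long to wait and leaves scheduling the prompt to the caller.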
@@ -50,10 +50,7 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info(f"Starting bot")

    stt = GradiumSTTService(
        api_key=os.environ["GRADIUM_API_KEY"],
        api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
    )
    stt = GradiumSTTService(api_key=os.environ["GRADIUM_API_KEY"])

    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],

@@ -22,9 +22,9 @@ from pipecat.processors.aggregators.llm_response_universal import (
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.services.soniox.tts import SonioxTTSService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
@@ -53,12 +53,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = SonioxSTTService(api_key=os.environ["SONIOX_API_KEY"])

    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        settings=CartesiaTTSService.Settings(
            voice="71a7ad14-091c-4e8e-a314-022ece01c121",  # British Reading Lady
        ),
    )
    tts = SonioxTTSService(api_key=os.environ["SONIOX_API_KEY"])

    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
@@ -103,9 +98,9 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
        await task.queue_frames([LLMRunFrame()])

        await asyncio.sleep(10)
        logger.info("Updating Soniox STT settings: language=es")
        logger.info("Updating Soniox STT settings: language_hints=[es]")
        await task.queue_frame(
            STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language=Language.ES))
            STTUpdateSettingsFrame(delta=SonioxSTTService.Settings(language_hints=[Language.ES]))
        )

    @transport.event_handler("on_client_disconnected")

@@ -55,7 +55,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    tts = GradiumTTSService(
        api_key=os.environ["GRADIUM_API_KEY"],
        settings=GradiumTTSService.Settings(voice="YTpq7expH9539ERJ"),
        url="wss://us.api.gradium.ai/api/speech/tts",
    )

    llm = OpenAILLMService(

@@ -54,7 +54,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    stt = GradiumSTTService(
        api_key=os.environ["GRADIUM_API_KEY"],
        api_endpoint_base_url="wss://us.api.gradium.ai/api/speech/asr",
        settings=GradiumSTTService.Settings(
            language=Language.EN,
        ),
@@ -62,7 +61,6 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):

    tts = GradiumTTSService(
        api_key=os.environ["GRADIUM_API_KEY"],
        url="wss://us.api.gradium.ai/api/speech/tts",
        settings=GradiumTTSService.Settings(
            voice="YTpq7expH9539ERJ",
        ),

@@ -58,6 +58,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
            # Add strict mode to enforce the language hints
            language_hints=[Language.EN],
            language_hints_strict=True,
            enable_language_identification=True,
        ),
    )


@@ -77,6 +77,7 @@ groq = [ "groq>=0.23.0,<2" ]
gstreamer = [ "pygobject~=3.50.0" ]
heygen = [ "livekit>=1.0.13,<2", "pipecat-ai[websockets-base]" ]
hume = [ "hume>=0.11.2,<1" ]
inception = []
inworld = [ "pipecat-ai[websockets-base]" ]
koala = [ "pvkoala~=2.0.3" ]
kokoro = [ "kokoro-onnx>=0.5.0,<1", "requests>=2.32.5,<3" ]
@@ -103,7 +104,7 @@ piper = [ "piper-tts>=1.3.0,<2", "requests>=2.32.5,<3" ]
qwen = []
resembleai = [ "pipecat-ai[websockets-base]" ]
rime = [ "pipecat-ai[websockets-base]" ]
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-small-webrtc-prebuilt>=2.5.0"]
runner = [ "python-dotenv>=1.0.0,<2.0.0", "uvicorn>=0.32.0,<1.0.0", "fastapi>=0.115.6,<1", "pipecat-ai-prebuilt>=1.0.1"]
sagemaker = ["aws_sdk_sagemaker_runtime_http2; python_version>='3.12'"]
sambanova = []
sarvam = [ "sarvamai==0.1.28", "pipecat-ai[websockets-base]" ]
@@ -119,6 +120,7 @@ tavus = [ "pipecat-ai[daily]" ]
together = []
tracing = [ "opentelemetry-sdk>=1.33.0,<2", "opentelemetry-api>=1.33.0,<2", "opentelemetry-instrumentation>=0.54b0,<1" ]
ultravox = [ "pipecat-ai[websockets-base]" ]
vonage-video-connector = [ "vonage-video-connector~=0.2.3b0; python_full_version>='3.13' and python_full_version<'3.14' and platform_system=='Linux'" ]
webrtc = [ "aiortc>=1.14.0,<2", "opencv-python>=4.11.0.86,<5" ]
websocket = [ "pipecat-ai[websockets-base]", "fastapi>=0.115.6,<1" ]
websockets-base = [ "websockets>=13.1,<16.0" ]

@@ -198,6 +198,7 @@ TESTS_FUNCTION_CALLING = [
    ("function-calling/function-calling-sarvam.py", EVAL_WEATHER),
    ("function-calling/function-calling-novita.py", EVAL_WEATHER),
    ("function-calling/function-calling-deepseek.py", EVAL_WEATHER),
    ("function-calling/function-calling-inception.py", EVAL_WEATHER),
    # Video
    ("function-calling/function-calling-anthropic-video.py", EVAL_VISION_CAMERA),
    ("function-calling/function-calling-aws-video.py", EVAL_VISION_CAMERA),
@@ -242,6 +243,7 @@ TESTS_VIDEO_AVATAR = [

TESTS_TURN_MANAGEMENT = [
    ("turn-management/turn-management-filter-incomplete-turns.py", EVAL_COMPLETE_TURN),
    ("turn-management/turn-management-filter-incomplete-turns-function-calling.py", EVAL_WEATHER),
]

TESTS_THINKING = [

@@ -383,10 +383,14 @@ class AggregatedTextFrame(TextFrame):
    Parameters:
        aggregated_by: Method used to aggregate the text frames.
        context_id: Unique identifier for the TTS context that generated this text.
        raw_text: The full matched text including start/end pattern delimiters, set when
            this frame was produced from a PatternMatch (e.g. a ``<code>...</code>`` block).
            For other aggregations it is None, or equal to ``text`` when the frame comes
            from ``LLMTextProcessor``.
    """

    aggregated_by: AggregationType | str
    context_id: str | None = None
    raw_text: str | None = None


@dataclass

@@ -25,6 +25,7 @@ from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.vad_analyzer import VADAnalyzer
from pipecat.audio.vad.vad_controller import VADController
from pipecat.frames.frames import (
    AggregatedTextFrame,
    AssistantImageRawFrame,
    BotStartedSpeakingFrame,
    BotStoppedSpeakingFrame,
@@ -1496,9 +1497,14 @@ class LLMAssistantAggregator(LLMContextAggregator):
        if len(frame.text) == 0:
            return

        text = (
            frame.raw_text
            if isinstance(frame, AggregatedTextFrame) and frame.raw_text
            else frame.text
        )
        self._aggregation.append(
            TextPartForConcatenation(
                frame.text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
                text, includes_inter_part_spaces=frame.includes_inter_frame_spaces
            )
        )


@@ -23,6 +23,7 @@ from pipecat.frames.frames import (
)
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.utils.text.base_text_aggregator import BaseTextAggregator
from pipecat.utils.text.pattern_pair_aggregator import PatternMatch
from pipecat.utils.text.simple_text_aggregator import SimpleTextAggregator


@@ -85,7 +86,11 @@ class LLMTextProcessor(FrameProcessor):
            out_frame = AggregatedTextFrame(
                text=aggregation.text,
                aggregated_by=aggregation.type,
                raw_text=aggregation.full_match
                if isinstance(aggregation, PatternMatch)
                else aggregation.text,
            )
            out_frame.append_to_context = True
            out_frame.skip_tts = in_frame.skip_tts
            await self.push_frame(out_frame)

@@ -96,6 +101,9 @@ class LLMTextProcessor(FrameProcessor):
            out_frame = AggregatedTextFrame(
                text=remaining.text,
                aggregated_by=remaining.type,
                raw_text=remaining.full_match
                if isinstance(remaining, PatternMatch)
                else remaining.text,
            )
            out_frame.skip_tts = skip_tts
            await self.push_frame(out_frame)

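Taken together, the `raw_text` hunks mean the spoken text and the stored text can differ: `LLMTextProcessor` sets `raw_text` to `PatternMatch.full_match` (delimiters included), and `LLMAssistantAggregator` now appends `raw_text` to the context when it is non-empty, falling back to `text` otherwise. A minimal sketch of that selection rule follows; `TextFrame`, `AggregatedTextFrame`, and `text_for_context` here are stand-ins written for illustration, not Pipecat's real classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextFrame:
    """Stand-in for a plain text frame (not Pipecat's class)."""

    text: str


@dataclass
class AggregatedTextFrame(TextFrame):
    """Stand-in for an aggregated frame that may carry the delimited text."""

    raw_text: Optional[str] = None


def text_for_context(frame: TextFrame) -> str:
    """Prefer raw_text when present and non-empty, otherwise use text."""
    if isinstance(frame, AggregatedTextFrame) and frame.raw_text:
        return frame.raw_text
    return frame.text
```

For a pattern-matched code block, TTS receives `text` (for example `print(1)`) while the LLM context keeps `<code>print(1)</code>`, so the model still sees the markup it produced on the next turn.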
@@ -528,6 +528,9 @@ class RTVIObserver(BaseObserver):
                text = await transform(text, agg_type)

        isTTS = isinstance(frame, TTSTextFrame)
        if agg_type is not AggregationType.WORD:
            logger.trace(f"{self} Aggregated LLM text: {text}, {agg_type} spoken:{isTTS}")

        if self._params.bot_output_enabled:
            message = RTVI.BotOutputMessage(
                data=RTVI.BotOutputMessageData(text=text, spoken=isTTS, aggregated_by=agg_type)

@@ -19,6 +19,10 @@ All bots must implement a `bot(runner_args)` async function as the entry point.
The server automatically discovers and executes this function when connections
are established.

By default the runner starts a single FastAPI server that supports WebRTC, Daily,
telephony, and plain WebSocket transports simultaneously. Clients declare which
transport they want via the ``transport`` field in the ``/start`` request body
(default: ``"webrtc"``).

Single transport example::

    async def bot(runner_args: RunnerArguments):
@@ -55,18 +59,38 @@ Supported transports:
- WebRTC - Provides local WebRTC interface with prebuilt UI
- Telephony - Handles webhook and WebSocket connections for Twilio, Telnyx, Plivo, Exotel

The ``/start`` endpoint accepts::

    {
        "transport": "webrtc",  // "webrtc" | "daily" | "twilio" | "telnyx" |
                                // "plivo" | "exotel" | "websocket" (default: "webrtc")

        // WebRTC-specific
        "enableDefaultIceServers": false,
        "body": {...},

        // Daily-specific
        "createDailyRoom": true,
        "dailyRoomProperties": {...},
        "dailyMeetingTokenProperties": {...},
        "body": {...}
    }

To run locally:

- WebRTC: `python bot.py -t webrtc`
- ESP32: `python bot.py -t webrtc --esp32 --host 192.168.1.100`
- Daily (server): `python bot.py -t daily`
- Daily (direct, testing only): `python bot.py -d`
- Telephony: `python bot.py -t twilio -x your_username.ngrok.io`
- Exotel: `python bot.py -t exotel` (no proxy needed, but ngrok connection to HTTP 7860 is required)
- All transports (default): ``python bot.py``
- WebRTC only: ``python bot.py -t webrtc``
- ESP32: ``python bot.py -t webrtc --esp32 --host 192.168.1.100``
- Daily only: ``python bot.py -t daily``
- Daily (direct, testing only): ``python bot.py -d``
- Telephony: ``python bot.py -t twilio -x your_username.ngrok.io``
- Exotel: ``python bot.py -t exotel`` (no proxy needed, but ngrok connection to HTTP 7860 is required)
- WhatsApp: ``python bot.py --whatsapp``
"""

import argparse
import asyncio
import importlib.util
import mimetypes
import os
import sys
@@ -85,8 +109,10 @@ from pipecat.runner.types import (
    DailyRunnerArguments,
    RunnerArguments,
    SmallWebRTCRunnerArguments,
    VonageRunnerArguments,
    WebSocketRunnerArguments,
)
from pipecat.runner.vonage import configure as configure_vonage

try:
    import uvicorn
@@ -106,6 +132,18 @@ load_dotenv(override=True)
os.environ["ENV"] = "local"

TELEPHONY_TRANSPORTS = ["twilio", "telnyx", "plivo", "exotel"]
TRANSPORT_ROUTE_DEPENDENCIES = {
    "daily": ("daily",),
    "webrtc": ("aiortc",),
    "telephony": ("fastapi", "websockets"),
    "websocket": ("fastapi", "websockets"),
}
TRANSPORT_INSTALL_HINTS = {
    "daily": "install pipecat-ai[daily]",
    "webrtc": "install pipecat-ai[webrtc]",
    "telephony": "install pipecat-ai[websocket]",
    "websocket": "install pipecat-ai[websocket]",
}

# Mirror Pipecat Cloud's 4-hour max session limit so dev rooms get cleaned up.
PIPECAT_ROOM_EXP_HOURS = 4.0
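`PIPECAT_ROOM_EXP_HOURS` is applied in `/start` when the client sends `dailyRoomProperties`: the runner fills in `exp` (a Unix timestamp in seconds, now + 4 * 3600 = now + 14400) and `eject_at_room_exp` with `dict.setdefault`, so any value the client supplied wins. A minimal sketch of that defaulting step; `apply_room_defaults` is a hypothetical helper, not part of the runner.

```python
import time
from typing import Any, Dict, Optional

PIPECAT_ROOM_EXP_HOURS = 4.0  # same value as the runner constant


def apply_room_defaults(props: Dict[str, Any], now: Optional[float] = None) -> Dict[str, Any]:
    """Hypothetical helper: add expiry defaults without overriding client values."""
    if now is None:
        now = time.time()
    # Daily's `exp` is a Unix timestamp in seconds: 4 h * 3600 s/h = 14400 s from now.
    props.setdefault("exp", now + PIPECAT_ROOM_EXP_HOURS * 3600)
    props.setdefault("eject_at_room_exp", True)
    return props
```

Using `setdefault` rather than assignment keeps the development runner's cleanup policy as a fallback only; a client that needs a shorter or longer room lifetime can still pass its own `exp`.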
@@ -131,6 +169,120 @@ Import this to add custom routes from other packages before calling
"""


def _is_module_available(module: str) -> bool:
    """Check whether a module can be imported without importing it.

    Args:
        module: Fully-qualified module name to check.

    Returns:
        ``True`` if Python can resolve the module, ``False`` otherwise.
    """
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ModuleNotFoundError, ValueError):
        return False


def _transport_route_dependencies(transport: str) -> tuple[str, ...]:
    """Return module dependencies required for a transport route.

    Args:
        transport: Transport name from the runner request or CLI.

    Returns:
        Module names required to enable the transport route.
    """
    if transport in TELEPHONY_TRANSPORTS:
        return TRANSPORT_ROUTE_DEPENDENCIES["telephony"]
    return TRANSPORT_ROUTE_DEPENDENCIES.get(transport, ())


def _transport_routes_enabled(transport: str) -> bool:
    """Return whether a transport route can run in this environment.

    Args:
        transport: Transport name from the runner request or CLI.

    Returns:
        ``True`` if the requested transport is enabled.
    """
    return all(_is_module_available(module) for module in _transport_route_dependencies(transport))


def _runner_url(args: argparse.Namespace) -> str:
    """Return the browser URL for the runner prebuilt client."""
    return f"http://{args.host}:{args.port}"


def _transport_status_lists() -> tuple[list[str], list[str]]:
    """Return enabled and disabled transport labels for the startup banner."""
    transports = ["daily", "webrtc", "telephony", "websocket"]
    enabled = []
    disabled = []

    for label in transports:
        if _transport_routes_enabled(label):
            enabled.append(label)
        else:
            disabled.append(f"{label} ({TRANSPORT_INSTALL_HINTS[label]})")

    return enabled, disabled


def _format_transport_status(labels: list[str]) -> str:
    """Format a startup banner transport status list."""
    return ", ".join(labels) if labels else "none"


def _print_startup_message(args: argparse.Namespace):
    """Print connection information for the development runner."""
    print()
    if args.transport is None:
        enabled, disabled = _transport_status_lists()
        print("🚀 Bot ready!")
        print(f" → Open: {_runner_url(args)}")
        print(f" → Enabled transports: {_format_transport_status(enabled)}")
        if disabled:
            print(f" → Disabled transports: {_format_transport_status(disabled)}")
    elif args.transport == "webrtc":
        if args.esp32:
            print("🚀 Bot ready! (ESP32 mode)")
        elif args.whatsapp:
            print("🚀 Bot ready! (WhatsApp)")
        else:
            print("🚀 Bot ready! (WebRTC)")
        if _transport_routes_enabled("webrtc"):
            print(f" → Open: {_runner_url(args)}")
        else:
            print(f" → WebRTC disabled ({TRANSPORT_INSTALL_HINTS['webrtc']})")
    elif args.transport == "daily":
        print("🚀 Bot ready! (Daily)")
        if not _transport_routes_enabled("daily"):
            print(f" → Daily disabled ({TRANSPORT_INSTALL_HINTS['daily']})")
        else:
            print(f" → Open: {_runner_url(args)}")
            if args.dialin:
                print(
                    f" → Daily dial-in webhook: "
                    f"http://{args.host}:{args.port}/daily-dialin-webhook"
                )
                print(" → Configure this URL in your Daily phone number settings")
    elif args.transport in TELEPHONY_TRANSPORTS:
        print(f"🚀 Bot ready! ({args.transport.capitalize()})")
        if not _transport_routes_enabled(args.transport):
            print(f" → Telephony disabled ({TRANSPORT_INSTALL_HINTS['telephony']})")
        else:
            print(f" → Open: {_runner_url(args)}")
            if args.proxy:
                print(f" → XML webhook: http://{args.host}:{args.port}/")
                print(f" → WebSocket: ws://{args.host}:{args.port}/ws")
    elif args.transport == "vonage":
        print()
        print("🚀 Bot ready!")
    print()


def _get_bot_module():
    """Get the bot module from the calling script."""
    import importlib.util
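The route-gating helpers above ask the import system whether each dependency can be located rather than importing it. A standalone sketch of the same pattern, with made-up route names so it runs without Pipecat installed, shows two consequences: a dotted name whose parent package is missing is reported as unavailable (the `ModuleNotFoundError` that `find_spec` raises while importing the parent is caught), and a route with no listed dependencies is always treated as enabled.

```python
import importlib.util
from typing import Dict, Tuple

# Made-up routes for illustration; the runner keys this table by transport name.
ROUTE_DEPENDENCIES: Dict[str, Tuple[str, ...]] = {
    "json-route": ("json",),
    "broken-route": ("json", "surely_missing_module_xyz"),
}


def is_module_available(module: str) -> bool:
    """Locate `module` without executing it (a dotted name still imports its parent)."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # ImportError covers ModuleNotFoundError from a missing parent package;
        # ValueError is raised when a module in sys.modules has no __spec__.
        return False


def route_enabled(route: str) -> bool:
    """A route is enabled only if every listed dependency can be located."""
    return all(is_module_available(m) for m in ROUTE_DEPENDENCIES.get(route, ()))
```

The second point matters for the runner: `_transport_routes_enabled` returns `True` for a transport name missing from `TRANSPORT_ROUTE_DEPENDENCIES` (since `all()` of an empty sequence is `True`), so `/start` relies on its final `else` branch to reject unknown transports.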
@@ -186,8 +338,35 @@ async def _run_telephony_bot(websocket: WebSocket, args: argparse.Namespace):
    await bot_module.bot(runner_args)


async def _run_websocket_bot(websocket: WebSocket, args: argparse.Namespace):
    """Run a bot for plain WebSocket transport."""
    bot_module = _get_bot_module()

    runner_args = WebSocketRunnerArguments(
        websocket=websocket,
        transport_type="websocket",
        session_id=str(uuid.uuid4()),
    )
    runner_args.cli_args = args

    await bot_module.bot(runner_args)


def _setup_websocket_routes(app: FastAPI, args: argparse.Namespace):
    """Set up the plain WebSocket route at ``/ws-client``."""
    if not _transport_routes_enabled("websocket"):
        return

    @app.websocket("/ws-client")
    async def websocket_client_endpoint(websocket: WebSocket):
        """Handle plain WebSocket connections (non-telephony)."""
        await websocket.accept()
        logger.debug("Plain WebSocket connection accepted")
        await _run_websocket_bot(websocket, args)


def _configure_server_app(args: argparse.Namespace):
    """Configure the module-level FastAPI app with transport-specific routes."""
    """Configure the module-level FastAPI app with routes for all transports."""
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
@@ -196,17 +375,207 @@ def _configure_server_app(args: argparse.Namespace):
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
# Set up transport-specific routes
|
||||
if args.transport == "webrtc":
|
||||
_setup_webrtc_routes(app, args)
|
||||
if args.whatsapp:
|
||||
_setup_whatsapp_routes(app, args)
|
||||
elif args.transport == "daily":
|
||||
_setup_daily_routes(app, args)
|
||||
elif args.transport in TELEPHONY_TRANSPORTS:
|
||||
_setup_telephony_routes(app, args)
|
||||
else:
|
||||
logger.warning(f"Unknown transport type: {args.transport}")
|
||||
# Shared session store: session_id -> body data. Used by the WebRTC /start
|
||||
# flow and the /sessions/{session_id}/... proxy routes.
|
||||
active_sessions: dict[str, dict[str, Any]] = {}
|
||||
|
||||
_setup_frontend_routes(app)
|
||||
_setup_webrtc_routes(app, args, active_sessions)
|
||||
_setup_daily_routes(app, args)
|
||||
_setup_telephony_routes(app, args)
|
||||
_setup_websocket_routes(app, args)
|
||||
_setup_unified_start_route(app, args, active_sessions)
|
||||
|
||||
if args.whatsapp:
|
||||
_setup_whatsapp_routes(app, args)
|
||||
|
||||
|
||||
def _setup_unified_start_route(
|
||||
app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
|
||||
):
|
||||
"""Register the unified POST /start and GET /status endpoints.
|
||||
|
||||
Handles WebRTC, Daily, and telephony transport start flows. Clients specify
|
||||
which transport they want via the ``transport`` field in the request body.
|
||||
When ``-t`` was passed on the command line, requests for any other transport
|
||||
are rejected with HTTP 400.
|
||||
"""
|
||||
ALL_TRANSPORTS = ["webrtc", "daily", *TELEPHONY_TRANSPORTS, "websocket"]
|
||||
|
||||
@app.get("/status")
|
||||
async def status():
|
||||
"""Return the transports supported by this runner instance."""
|
||||
transports = [args.transport] if args.transport is not None else ALL_TRANSPORTS
|
||||
return {"status": "ready", "transports": transports}
|
||||
|
||||
class IceServer(TypedDict, total=False):
|
||||
urls: str | list[str]
|
||||
|
||||
class IceConfig(TypedDict):
|
||||
iceServers: list[IceServer]
|
||||
|
||||
class StartBotResult(TypedDict, total=False):
|
||||
sessionId: str
|
||||
iceConfig: IceConfig | None
|
||||
dailyRoom: str | None
|
||||
dailyToken: str | None
|
||||
wsUrl: str | None
|
||||
token: str | None
|
||||
|
||||
@app.post("/start")
|
||||
async def start_agent(request: Request):
|
||||
"""Start a bot session.
|
||||
|
||||
Accepts::
|
||||
|
||||
{
|
||||
"transport": "webrtc", // "webrtc" | "daily" | "twilio" | "telnyx" |
|
||||
// "plivo" | "exotel" — default: "webrtc"
|
||||
|
||||
// WebRTC-specific
|
||||
"enableDefaultIceServers": false,
|
||||
"body": {...},
|
||||
|
||||
// Daily-specific
|
||||
"createDailyRoom": true,
|
||||
"dailyRoomProperties": {...},
|
||||
"dailyMeetingTokenProperties": {...},
|
||||
"body": {...}
|
||||
}
|
||||
"""
|
||||
try:
|
||||
request_data = await request.json()
|
||||
logger.debug(f"Received request: {request_data}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to parse request body: {e}")
|
||||
request_data = {}
|
||||
|
||||
# Determine transport: explicit field → legacy Daily hint → CLI default → webrtc
|
||||
transport = request_data.get("transport")
|
||||
if transport is None and request_data.get("createDailyRoom", False):
|
||||
transport = "daily"
|
||||
if transport is None:
|
||||
transport = args.transport or "webrtc"
|
||||
|
||||
# Enforce restriction when -t was explicitly set on the command line
|
||||
if args.transport is not None and transport != args.transport:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=(
|
||||
f"Transport '{transport}' is not allowed. "
|
||||
f"Server is configured for '{args.transport}' only (-t {args.transport})."
|
||||
),
|
||||
)
|
||||
|
||||
if not _transport_routes_enabled(transport):
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=(
|
||||
f"Transport '{transport}' is disabled in this runner environment. "
|
||||
"Check the startup banner for enabled transports."
|
||||
),
|
||||
)
|
||||
|
||||
if transport == "webrtc":
|
||||
# WebRTC: register the session; the bot starts when the WebRTC offer arrives.
|
||||
session_id = str(uuid.uuid4())
|
||||
active_sessions[session_id] = request_data.get("body", {})
|
||||
|
||||
result = StartBotResult(
|
||||
sessionId=session_id,
|
||||
)
|
||||
if request_data.get("enableDefaultIceServers"):
|
||||
result["iceConfig"] = IceConfig(
|
||||
iceServers=[IceServer(urls=["stun:stun.l.google.com:19302"])]
|
||||
)
|
||||
return result
|
||||
|
||||
elif transport == "daily":
|
||||
create_daily_room = request_data.get("createDailyRoom", False)
body = request_data.get("body", {})
daily_room_properties_dict = request_data.get("dailyRoomProperties", None)
daily_token_properties_dict = request_data.get("dailyMeetingTokenProperties", None)

bot_module = _get_bot_module()

existing_room_url = os.getenv("DAILY_ROOM_URL")
session_id = str(uuid.uuid4())
result: StartBotResult | None = None

if create_daily_room or existing_room_url:
from pipecat.runner.daily import configure
from pipecat.transports.daily.utils import (
DailyMeetingTokenProperties,
DailyRoomProperties,
)

async with aiohttp.ClientSession() as session:
room_properties = None
if daily_room_properties_dict:
daily_room_properties_dict.setdefault(
"exp", time.time() + PIPECAT_ROOM_EXP_HOURS * 3600
)
daily_room_properties_dict.setdefault("eject_at_room_exp", True)
try:
room_properties = DailyRoomProperties(**daily_room_properties_dict)
logger.debug(f"Using custom room properties: {room_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyRoomProperties: {e}")

token_properties = None
if daily_token_properties_dict:
try:
token_properties = DailyMeetingTokenProperties(
**daily_token_properties_dict
)
logger.debug(f"Using custom token properties: {token_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyMeetingTokenProperties: {e}")

room_url, token = await configure(
session,
room_exp_duration=PIPECAT_ROOM_EXP_HOURS,
room_properties=room_properties,
token_properties=token_properties,
)
runner_args = DailyRunnerArguments(
room_url=room_url, token=token, body=body, session_id=session_id
)
result = StartBotResult(
dailyRoom=room_url,
dailyToken=token,
sessionId=session_id,
)
else:
runner_args = RunnerArguments(body=body, session_id=session_id)

runner_args.cli_args = args
asyncio.create_task(bot_module.bot(runner_args))
return result

elif transport in TELEPHONY_TRANSPORTS:
# Telephony: the bot starts when the provider connects to /ws.
# Return the WebSocket URL so the caller knows where to point their provider.
scheme = "wss" if args.host != "localhost" else "ws"
return StartBotResult(
wsUrl=f"{scheme}://{args.host}:{args.port}/ws",
)

elif transport == "websocket":
# Plain WebSocket: the bot starts when the client connects to /ws-client.
scheme = "wss" if args.host != "localhost" else "ws"
session_id = str(uuid.uuid4())
return StartBotResult(
wsUrl=f"{scheme}://{args.host}:{args.port}/ws-client",
sessionId=session_id,
token="mock_token",
)

else:
raise HTTPException(
status_code=400,
detail=f"Unknown transport '{transport}'.",
)

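Both WebSocket branches of the `/start` dispatch pick the URL scheme from the configured host: any host other than the literal string `localhost` is assumed to be served over TLS and gets `wss`. A minimal sketch of that rule as a standalone function (`build_ws_url` is a hypothetical helper, not a Pipecat API):

```python
def build_ws_url(host: str, port: int, path: str) -> str:
    """Build a WebSocket URL using the runner's scheme rule (hypothetical helper)."""
    # Same rule as the runner: only the literal host "localhost" gets plain ws.
    scheme = "wss" if host != "localhost" else "ws"
    # Accept either "ws" or "/ws" as the path.
    if not path.startswith("/"):
        path = "/" + path
    return f"{scheme}://{host}:{port}{path}"


print(build_ws_url("localhost", 7860, "/ws"))  # ws://localhost:7860/ws
print(build_ws_url("bot.example.com", 443, "ws-client"))  # wss://bot.example.com:443/ws-client
```

Because the check compares against the string `localhost` only, a loopback address such as `127.0.0.1` would still yield a `wss://` URL under this rule.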
def _resolve_download_path(folder: str, filename: str) -> Path:
@@ -220,11 +589,30 @@ def _resolve_download_path(folder: str, filename: str) -> Path:
return file_path


def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
"""Set up WebRTC-specific routes."""
def _setup_frontend_routes(app: FastAPI):
"""Mount the prebuilt frontend UI and root redirect for all transports."""
try:
from pipecat_ai_small_webrtc_prebuilt.frontend import SmallWebRTCPrebuiltUI
from pipecat_ai_prebuilt.frontend import PipecatPrebuiltUI
except ImportError as e:
logger.error(f"Prebuilt frontend not available: {e}")
return

app.mount("/client", PipecatPrebuiltUI)

@app.get("/", include_in_schema=False)
async def root_redirect():
"""Redirect root requests to client interface."""
return RedirectResponse(url="/client/")


def _setup_webrtc_routes(
app: FastAPI, args: argparse.Namespace, active_sessions: dict[str, dict[str, Any]]
):
"""Set up WebRTC-specific routes."""
if not _transport_routes_enabled("webrtc"):
return

try:
from pipecat.transports.smallwebrtc.connection import SmallWebRTCConnection
from pipecat.transports.smallwebrtc.request_handler import (
IceCandidate,
@@ -233,30 +621,9 @@ def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
SmallWebRTCRequestHandler,
)
except ImportError as e:
logger.error(f"WebRTC transport dependencies not installed: {e}")
logger.warning(f"WebRTC routes disabled after dependency check passed: {e}")
return

class IceServer(TypedDict, total=False):
urls: str | list[str]

class IceConfig(TypedDict):
iceServers: list[IceServer]

class StartBotResult(TypedDict, total=False):
sessionId: str
iceConfig: IceConfig | None

# In-memory store of active sessions: session_id -> session info
active_sessions: dict[str, dict[str, Any]] = {}

# Mount the frontend
app.mount("/client", SmallWebRTCPrebuiltUI)

@app.get("/", include_in_schema=False)
async def root_redirect():
"""Redirect root requests to client interface."""
return RedirectResponse(url="/client/")

@app.get("/files/{filename:path}")
async def download_file(filename: str):
"""Handle file downloads."""
@@ -315,29 +682,6 @@ def _setup_webrtc_routes(app: FastAPI, args: argparse.Namespace):
await small_webrtc_handler.handle_patch_request(request)
return {"status": "success"}

@app.post("/start")
async def rtvi_start(request: Request):
"""Mimic Pipecat Cloud's /start endpoint."""
# Parse the request body
try:
request_data = await request.json()
logger.debug(f"Received request: {request_data}")
except Exception as e:
logger.error(f"Failed to parse request body: {e}")
request_data = {}

# Store session info immediately in memory, replicate the behavior expected on Pipecat Cloud
session_id = str(uuid.uuid4())
active_sessions[session_id] = request_data.get("body", {})

result: StartBotResult = {"sessionId": session_id}
if request_data.get("enableDefaultIceServers"):
result["iceConfig"] = IceConfig(
iceServers=[IceServer(urls=["stun:stun.l.google.com:19302"])]
)

return result

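Stripped of FastAPI, the WebRTC `/start` handler above reduces to building a `StartBotResult` and attaching a default ICE configuration only when the client sets `enableDefaultIceServers`. The sketch reuses the TypedDict shapes from the diff; `build_start_result` is a hypothetical name, and `from __future__ import annotations` is added so the `X | Y` annotations are not evaluated at class creation.

```python
from __future__ import annotations

import uuid
from typing import TypedDict


class IceServer(TypedDict, total=False):
    urls: str | list[str]


class IceConfig(TypedDict):
    iceServers: list[IceServer]


class StartBotResult(TypedDict, total=False):
    sessionId: str
    iceConfig: IceConfig | None


def build_start_result(request_data: dict) -> StartBotResult:
    """Build a /start response; add Google's public STUN server only on request."""
    result: StartBotResult = {"sessionId": str(uuid.uuid4())}
    if request_data.get("enableDefaultIceServers"):
        result["iceConfig"] = IceConfig(
            iceServers=[IceServer(urls=["stun:stun.l.google.com:19302"])]
        )
    return result
```

Because `StartBotResult` is declared with `total=False`, omitting `iceConfig` entirely (rather than sending `null`) is a valid shape when ICE servers are not requested.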
@app.api_route(
"/sessions/{session_id}/{path:path}",
methods=["GET", "POST", "PUT", "PATCH", "DELETE"],
@@ -562,13 +906,13 @@ def _setup_whatsapp_routes(app: FastAPI, args: argparse.Namespace):

def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
"""Set up Daily-specific routes."""
if not _transport_routes_enabled("daily"):
return

@app.get("/")
@app.get("/daily")
async def create_room_and_start_agent():
"""Launch a Daily bot and redirect to room."""
print("Starting bot with Daily transport and redirecting to Daily room")

import aiohttp
logger.debug("Starting bot with Daily transport and redirecting to Daily room")

from pipecat.runner.daily import configure

@@ -584,105 +928,6 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
asyncio.create_task(bot_module.bot(runner_args))
return RedirectResponse(room_url)

@app.post("/start")
async def start_agent(request: Request):
"""Handler for /start endpoints.

Expects POST body like::
{
"createDailyRoom": true,
"dailyRoomProperties": { "start_video_off": true },
"dailyMeetingTokenProperties": { "is_owner": true, "user_name": "Bot" },
"body": { "custom_data": "value" }
}
"""
print("Starting bot with Daily transport")

# Parse the request body
try:
request_data = await request.json()
logger.debug(f"Received request: {request_data}")
except Exception as e:
logger.error(f"Failed to parse request body: {e}")
request_data = {}

create_daily_room = request_data.get("createDailyRoom", False)
body = request_data.get("body", {})
daily_room_properties_dict = request_data.get("dailyRoomProperties", None)
daily_token_properties_dict = request_data.get("dailyMeetingTokenProperties", None)

bot_module = _get_bot_module()

existing_room_url = os.getenv("DAILY_ROOM_URL")

session_id = str(uuid.uuid4())
result = None

# Configure room if:
# 1. Explicitly requested via createDailyRoom in payload
# 2. Using pre-configured room from DAILY_ROOM_URL env var
if create_daily_room or existing_room_url:
import aiohttp

from pipecat.runner.daily import configure
from pipecat.transports.daily.utils import (
DailyMeetingTokenProperties,
DailyRoomProperties,
)

async with aiohttp.ClientSession() as session:
# Parse dailyRoomProperties if provided
room_properties = None
if daily_room_properties_dict:
# Apply Pipecat Cloud's session policy if caller didn't override.
daily_room_properties_dict.setdefault(
"exp", time.time() + PIPECAT_ROOM_EXP_HOURS * 3600
)
daily_room_properties_dict.setdefault("eject_at_room_exp", True)
try:
room_properties = DailyRoomProperties(**daily_room_properties_dict)
logger.debug(f"Using custom room properties: {room_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyRoomProperties: {e}")
# Continue without custom properties

# Parse dailyMeetingTokenProperties if provided
token_properties = None
if daily_token_properties_dict:
try:
token_properties = DailyMeetingTokenProperties(
**daily_token_properties_dict
)
logger.debug(f"Using custom token properties: {token_properties}")
except Exception as e:
logger.error(f"Failed to parse dailyMeetingTokenProperties: {e}")
# Continue without custom properties

room_url, token = await configure(
session,
room_exp_duration=PIPECAT_ROOM_EXP_HOURS,
room_properties=room_properties,
token_properties=token_properties,
)
runner_args = DailyRunnerArguments(
room_url=room_url, token=token, body=body, session_id=session_id
)
result = {
"dailyRoom": room_url,
"dailyToken": token,
"sessionId": session_id,
}
else:
runner_args = RunnerArguments(body=body, session_id=session_id)

# Update CLI args.
runner_args.cli_args = args

# Start the bot in the background
asyncio.create_task(bot_module.bot(runner_args))

return result

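Both versions of the Daily `/start` handler apply Pipecat Cloud's session policy with `dict.setdefault`, so a caller-supplied `exp` or `eject_at_room_exp` always wins over the defaults. A minimal sketch of that merge in isolation; the diff does not show the value of `PIPECAT_ROOM_EXP_HOURS`, so the constant below is an assumed placeholder, and `apply_room_defaults` is a hypothetical name.

```python
import time

PIPECAT_ROOM_EXP_HOURS = 2.0  # assumed placeholder; the real value is not in the diff


def apply_room_defaults(props: dict, exp_hours: float = PIPECAT_ROOM_EXP_HOURS) -> dict:
    """Fill in room expiry defaults without overriding caller-supplied values."""
    # setdefault writes a key only when it is missing, so an explicit "exp" wins.
    props.setdefault("exp", time.time() + exp_hours * 3600)
    # Eject participants when the room expires unless the caller said otherwise.
    props.setdefault("eject_at_room_exp", True)
    return props
```

Like the handler, the sketch mutates the incoming dict in place; copy it first if the original request payload must stay untouched.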
if args.dialin:

@app.post("/daily-dialin-webhook")
@@ -731,8 +976,6 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):
detail="Missing required fields: From, To, callId, callDomain",
)

import aiohttp

from pipecat.runner.daily import configure
from pipecat.runner.types import DailyDialinRequest, DialinSettings

@@ -801,44 +1044,54 @@ def _setup_daily_routes(app: FastAPI, args: argparse.Namespace):


def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
"""Set up telephony-specific routes."""
# XML response templates (Exotel doesn't use XML webhooks)
XML_TEMPLATES = {
"twilio": f"""<?xml version="1.0" encoding="UTF-8"?>
"""Set up telephony-specific routes.

The WebSocket endpoint (``/ws``) is always registered so providers can
connect directly. The XML webhook (``POST /``) is only registered when a
specific telephony transport is chosen via ``-t`` because the XML template
is provider-specific and requires a proxy hostname (``--proxy``).
"""
if not _transport_routes_enabled("telephony"):
return

if args.transport in TELEPHONY_TRANSPORTS:
# XML response templates (Exotel doesn't use XML webhooks)
XML_TEMPLATES = {
"twilio": f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="wss://{args.proxy}/ws"></Stream>
</Connect>
<Pause length="40"/>
</Response>""",
"telnyx": f"""<?xml version="1.0" encoding="UTF-8"?>
"telnyx": f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Connect>
<Stream url="wss://{args.proxy}/ws" bidirectionalMode="rtp"></Stream>
</Connect>
<Pause length="40"/>
</Response>""",
"plivo": f"""<?xml version="1.0" encoding="UTF-8"?>
"plivo": f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Stream bidirectional="true" keepCallAlive="true" contentType="audio/x-mulaw;rate=8000">wss://{args.proxy}/ws</Stream>
</Response>""",
}
}

@app.post("/")
async def start_call():
"""Handle telephony webhook and return XML response."""
if args.transport == "exotel":
# Exotel doesn't use POST webhooks - redirect to proper documentation
logger.debug("POST Exotel endpoint - not used")
return {
"error": "Exotel doesn't use POST webhooks",
"websocket_url": f"wss://{args.proxy}/ws",
"note": "Configure the WebSocket URL above in your Exotel App Bazaar Voicebot Applet",
}
else:
logger.debug(f"POST {args.transport.upper()} XML")
xml_content = XML_TEMPLATES.get(args.transport, "<Response></Response>")
return HTMLResponse(content=xml_content, media_type="application/xml")
@app.post("/")
async def start_call():
"""Handle telephony webhook and return XML response."""
if args.transport == "exotel":
# Exotel doesn't use POST webhooks - redirect to proper documentation
logger.debug("POST Exotel endpoint - not used")
return {
"error": "Exotel doesn't use POST webhooks",
"websocket_url": f"wss://{args.proxy}/ws",
"note": "Configure the WebSocket URL above in your Exotel App Bazaar Voicebot Applet",
}
else:
logger.debug(f"POST {args.transport.upper()} XML")
xml_content = XML_TEMPLATES.get(args.transport, "<Response></Response>")
return HTMLResponse(content=xml_content, media_type="application/xml")

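The Twilio template above returns TwiML whose `<Connect><Stream>` points the call's media at `wss://<proxy>/ws`, followed by a 40-second `<Pause>`. The sketch below renders the same document for a given proxy hostname and checks it with the standard-library XML parser; `twilio_stream_xml` and the `abc123.ngrok.io` hostname are illustrative only.

```python
import xml.etree.ElementTree as ET


def twilio_stream_xml(proxy: str) -> str:
    """Render TwiML that streams the call's audio to wss://<proxy>/ws (hypothetical helper)."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="wss://{proxy}/ws"></Stream>
  </Connect>
  <Pause length="40"/>
</Response>"""


# Parse as bytes so the encoding declaration is accepted by any XML parser.
root = ET.fromstring(twilio_stream_xml("abc123.ngrok.io").encode("utf-8"))
print(root.find("./Connect/Stream").get("url"))  # wss://abc123.ngrok.io/ws
```

Like the templates in the diff, it interpolates `proxy` without XML escaping, which assumes `_validate_and_clean_proxy` has already reduced it to a bare hostname.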
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
@@ -847,11 +1100,6 @@ def _setup_telephony_routes(app: FastAPI, args: argparse.Namespace):
logger.debug("WebSocket connection accepted")
await _run_telephony_bot(websocket, args)

@app.get("/")
async def start_agent():
"""Simple status endpoint for telephony transports."""
return {"status": f"Bot started with {args.transport}"}


async def _run_daily_direct(args: argparse.Namespace):
"""Run Daily bot with direct connection (no FastAPI server)."""
@@ -883,6 +1131,25 @@ async def _run_daily_direct(args: argparse.Namespace):
await bot_module.bot(runner_args)


async def _run_vonage():
"""Run Vonage bot (no FastAPI server)."""
logger.info("Running Vonage transport...")

application_id, session_id, token = await configure_vonage()
runner_args = VonageRunnerArguments(
application_id=application_id, vonage_session_id=session_id, token=token
)
runner_args.handle_sigint = True

# Get the bot module and run it directly
bot_module = _get_bot_module()

print(f"Joining Vonage session: {runner_args.vonage_session_id}")
print()

await bot_module.bot(runner_args)


def _validate_and_clean_proxy(proxy: str) -> str:
"""Validate and clean proxy hostname, removing protocol if present."""
if not proxy:
@@ -922,22 +1189,27 @@ def runner_port() -> int:
def main(parser: argparse.ArgumentParser | None = None):
"""Start the Pipecat development runner.

Parses command-line arguments and starts a FastAPI server configured
for the specified transport type.
Parses command-line arguments and starts a FastAPI server that supports
WebRTC, Daily, and telephony transports simultaneously. Clients declare
which transport to use via the ``transport`` field in the ``/start`` body.

When ``-t`` is provided, the server restricts ``/start`` to that transport
only and displays transport-specific startup information.

The runner discovers and runs any ``bot(runner_args)`` function found in the
calling module.

Command-line arguments:
- --host: Server host address (default: localhost)
- --host: Server host address (default: localhost)
- --port: Server port (default: 7860)
- -t/--transport: Transport type (daily, webrtc, twilio, telnyx, plivo, exotel)
- -t/--transport: Restrict to a single transport and set as default for /start
(daily, webrtc, twilio, telnyx, plivo, exotel). Omit to support all transports.
- -x/--proxy: Public proxy hostname for telephony webhooks
- -d/--direct: Connect directly to Daily room (automatically sets transport to daily)
- -f/--folder: Path to downloads folder
- --dialin: Enable Daily PSTN dial-in webhook handling (requires Daily transport)
- --dialin: Enable Daily PSTN dial-in webhook handling
- --esp32: Enable SDP munging for ESP32 compatibility (requires --host with IP address)
- --whatsapp: Ensure requried WhatsApp environment variables are present
- --whatsapp: Ensure required WhatsApp environment variables are present
- -v/--verbose: Increase logging verbosity

Args:
@@ -957,9 +1229,12 @@ def main(parser: argparse.ArgumentParser | None = None):
"-t",
"--transport",
type=str,
choices=["daily", "webrtc", *TELEPHONY_TRANSPORTS],
default="webrtc",
help="Transport type",
choices=["daily", "vonage", "webrtc", *TELEPHONY_TRANSPORTS],
default=None,
help=(
"Restrict the server to a single transport and set it as the default for /start. "
"Omit to support all transports simultaneously (default behaviour)."
),
)
parser.add_argument("-x", "--proxy", help="Public proxy host name")
parser.add_argument(
@@ -977,7 +1252,7 @@ def main(parser: argparse.ArgumentParser | None = None):
"--dialin",
action="store_true",
default=False,
help="Enable Daily PSTN dial-in webhook handling (requires Daily transport)",
help="Enable Daily PSTN dial-in webhook handling",
)
parser.add_argument(
"--esp32",
@@ -989,7 +1264,7 @@ def main(parser: argparse.ArgumentParser | None = None):
"--whatsapp",
action="store_true",
default=False,
help="Ensure requried WhatsApp environment variables are present",
help="Ensure required WhatsApp environment variables are present",
)

args = parser.parse_args()
@@ -998,12 +1273,13 @@ def main(parser: argparse.ArgumentParser | None = None):
if args.proxy:
args.proxy = _validate_and_clean_proxy(args.proxy)

# Auto-set transport to daily if --direct is used without explicit transport
if args.direct and args.transport == "webrtc": # webrtc is the default
args.transport = "daily"
elif args.direct and args.transport != "daily":
logger.error("--direct flag only works with Daily transport (-t daily)")
return
# --direct implies Daily transport
if args.direct:
if args.transport is None or args.transport == "daily":
args.transport = "daily"
else:
logger.error("--direct flag only works with Daily transport (-t daily)")
return

# Validate ESP32 requirements
if args.esp32 and args.host == "localhost":
@@ -1011,7 +1287,7 @@ def main(parser: argparse.ArgumentParser | None = None):
return

# Validate dial-in requirements
if args.dialin and args.transport != "daily":
if args.dialin and args.transport is not None and args.transport != "daily":
logger.error("--dialin flag only works with Daily transport (-t daily)")
return

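The new `main()` treats `transport=None` as "serve every transport", which changes how `--direct` and `--dialin` are validated: `--direct` now upgrades an unset transport to `daily`, and `--dialin` only conflicts with an explicit non-Daily transport. The rules can be pulled out into a pure function for testing; `resolve_transport` is hypothetical, and it raises `ValueError` where the real code logs an error and returns.

```python
from __future__ import annotations


def resolve_transport(transport: str | None, direct: bool, dialin: bool) -> str | None:
    """Apply the runner's --direct/--dialin rules (hypothetical helper)."""
    # --direct implies Daily: accept an unset transport or an explicit "daily".
    if direct:
        if transport is None or transport == "daily":
            transport = "daily"
        else:
            raise ValueError("--direct flag only works with Daily transport (-t daily)")
    # --dialin only conflicts with an explicit non-Daily transport; None means "all".
    if dialin and transport is not None and transport != "daily":
        raise ValueError("--dialin flag only works with Daily transport (-t daily)")
    return transport
```

For example, `resolve_transport(None, direct=False, dialin=True)` returns `None`, so dial-in stays available while every transport is served.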
@@ -1029,28 +1305,12 @@ def main(parser: argparse.ArgumentParser | None = None):
asyncio.run(_run_daily_direct(args))
return

# Print startup message for server-based transports
if args.transport == "webrtc":
print()
if args.esp32:
print(f"🚀 Bot ready! (ESP32 mode)")
elif args.whatsapp:
print(f"🚀 Bot ready! (WhatsApp)")
else:
print(f"🚀 Bot ready!")
print(f" → Open http://{args.host}:{args.port}/client in your browser")
print()
elif args.transport == "daily":
print()
print(f"🚀 Bot ready!")
if args.dialin:
print(
f" → Daily dial-in webhook: http://{args.host}:{args.port}/daily-dialin-webhook"
)
print(f" → Configure this URL in your Daily phone number settings")
else:
print(f" → Open http://{args.host}:{args.port} in your browser to start a session")
# Print startup message
_print_startup_message(args)
if args.transport == "vonage":
asyncio.run(_run_vonage())
print()
return

RUNNER_DOWNLOADS_FOLDER = args.folder
RUNNER_HOST = args.host

@@ -99,16 +99,35 @@ class DailyRunnerArguments(RunnerArguments):
token: str | None = None


@dataclass
class VonageRunnerArguments(RunnerArguments):
"""Vonage transport session arguments for the runner.

Parameters:
application_id: Vonage application ID
vonage_session_id: Vonage session ID
token: Vonage session token
"""

application_id: str
vonage_session_id: str
token: str


@dataclass
class WebSocketRunnerArguments(RunnerArguments):
"""WebSocket transport session arguments for the runner.

Parameters:
websocket: WebSocket connection for audio streaming
transport_type: Transport type identifier. Set to ``"websocket"`` for plain
WebSocket connections; ``None`` triggers auto-detection from the first
telephony provider message.
body: Additional request data
"""

websocket: WebSocket
transport_type: str | None = None

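`WebSocketRunnerArguments.transport_type` acts as a switch: `"websocket"` selects the plain FastAPI WebSocket transport, while `None` defers to detecting the telephony provider from the first message (the `create_transport` hunk further down shows this branch). The dispatch can be sketched with a stand-in dataclass and an injected detector; `WebSocketArgs`, `pick_transport`, and the `detect` callable are hypothetical and do not reproduce Pipecat's `parse_telephony_websocket`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class WebSocketArgs:
    """Stand-in for WebSocketRunnerArguments; `websocket` can be any object here."""

    websocket: Any
    transport_type: str | None = None
    body: Any = field(default_factory=dict)


def pick_transport(args: WebSocketArgs, detect: Callable[[Any], str]) -> str:
    """Return "websocket" for plain clients, otherwise ask the detector."""
    if args.transport_type == "websocket":
        return "websocket"
    # Any other value (including None) falls through to provider detection.
    return detect(args.websocket)
```

Injecting `detect` keeps the sketch testable without a live socket; the real code instead awaits the provider's first message on the WebSocket.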
@dataclass

@@ -33,7 +33,7 @@ import json
import os
import re
from collections.abc import Callable
from typing import Any
from typing import Any, cast

from fastapi import WebSocket
from loguru import logger
@@ -42,9 +42,10 @@ from pipecat.runner.types import (
DailyRunnerArguments,
LiveKitRunnerArguments,
SmallWebRTCRunnerArguments,
VonageRunnerArguments,
WebSocketRunnerArguments,
)
from pipecat.transports.base_transport import BaseTransport
from pipecat.transports.base_transport import BaseTransport, TransportParams


def _detect_transport_type_from_message(message_data: dict) -> str:
@@ -271,6 +272,14 @@ def get_transport_client_id(transport: BaseTransport, client: Any) -> str:
except ImportError:
pass

try:
from pipecat.transports.vonage.video_connector import VonageVideoConnectorTransport

if isinstance(transport, VonageVideoConnectorTransport):
return client["streamId"]
except ImportError:
pass

logger.warning(f"Unable to get client id from unsupported transport {type(transport)}")
return ""

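`get_transport_client_id` probes each transport inside its own `try`/`except ImportError`, so the Vonage check only runs when the optional Vonage dependency is installed, and an unrecognised transport falls through to an empty string. The same idea can be expressed as a table-driven loop. This is a sketch, not Pipecat's code: the module paths are placeholders, with `collections.OrderedDict` standing in for an installed transport class and `not_installed_vendor` for a missing optional package.

```python
from __future__ import annotations

import importlib
from typing import Any, Callable

# (module path, class name, id extractor), tried in order; entries are placeholders.
CANDIDATES: list[tuple[str, str, Callable[[Any], str]]] = [
    ("collections", "OrderedDict", lambda client: client["id"]),
    ("not_installed_vendor.transport", "VendorTransport", lambda client: client["streamId"]),
]


def get_client_id(transport: Any, client: Any) -> str:
    """Probe each optional transport class, skipping modules that cannot be imported."""
    for module_path, class_name, extract in CANDIDATES:
        try:
            cls = getattr(importlib.import_module(module_path), class_name)
        except ImportError:
            continue  # optional dependency not installed
        if isinstance(transport, cls):
            return extract(client)
    return ""
```

`ModuleNotFoundError` is a subclass of `ImportError`, so a missing package is skipped rather than crashing the lookup.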
@@ -303,6 +312,24 @@ async def maybe_capture_participant_camera(
except ImportError:
pass

try:
from pipecat.transports.vonage.video_connector import (
SubscribeSettings,
VonageVideoConnectorTransport,
)

if isinstance(transport, VonageVideoConnectorTransport):
await transport.subscribe_to_stream(
client["streamId"],
SubscribeSettings(
subscribe_to_audio=True,
subscribe_to_video=True,
preferred_framerate=framerate if framerate != 0 else None,
),
)
except ImportError:
pass


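In the Vonage branch of `maybe_capture_participant_camera`, a `framerate` of `0` is translated to `None` before it reaches `SubscribeSettings`. Since `0` is the default `framerate` in these helpers (as the `maybe_capture_participant_screen` signature shows), this presumably lets the connector keep its own default rather than requesting zero frames per second. A minimal sketch of that mapping with a hypothetical stand-in for `SubscribeSettings`:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SubscribeOptions:
    """Hypothetical stand-in for Vonage's SubscribeSettings."""

    subscribe_to_audio: bool
    subscribe_to_video: bool
    preferred_framerate: int | None = None


def camera_subscription(framerate: int = 0) -> SubscribeOptions:
    """Build subscription options, treating framerate=0 as "no preference"."""
    return SubscribeOptions(
        subscribe_to_audio=True,
        subscribe_to_video=True,
        preferred_framerate=framerate if framerate != 0 else None,
    )
```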
async def maybe_capture_participant_screen(
transport: BaseTransport, client: Any, framerate: int = 0
@@ -534,6 +561,10 @@ async def create_transport(
audio_out_enabled=True,
# add_wav_header and serializer will be set automatically
),
"vonage": lambda: VonageVideoConnectorTransportParams(
audio_in_enabled=True,
audio_out_enabled=True
),
}

transport = await create_transport(runner_args, transport_params)
@@ -562,6 +593,12 @@ async def create_transport(
)

elif isinstance(runner_args, WebSocketRunnerArguments):
if runner_args.transport_type == "websocket":
params = _get_transport_params("websocket", transport_params)
from pipecat.transports.websocket.fastapi import FastAPIWebsocketTransport

return FastAPIWebsocketTransport(websocket=runner_args.websocket, params=params)

# Parse once to determine the provider and get data
transport_type, call_data = await parse_telephony_websocket(runner_args.websocket)
params = _get_transport_params(transport_type, transport_params)
@@ -581,6 +618,31 @@ async def create_transport(
runner_args.room_name,
params=params,
)
elif isinstance(runner_args, VonageRunnerArguments):
from pipecat.transports.vonage.video_connector import (
VonageVideoConnectorTransport,
VonageVideoConnectorTransportParams,
)

try:
params = cast(
VonageVideoConnectorTransportParams,
_get_transport_params("vonage", transport_params),
)
except ValueError:
webrtc_params: TransportParams = cast(
TransportParams, _get_transport_params("webrtc", transport_params)
)
params = VonageVideoConnectorTransportParams(
**webrtc_params.model_dump(),
video_in_auto_subscribe=True,
)

return VonageVideoConnectorTransport(
runner_args.application_id,
runner_args.vonage_session_id,
runner_args.token,
params=params,
)
else:
raise ValueError(f"Unsupported runner arguments type: {type(runner_args)}")

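For Vonage, `create_transport` first asks for `"vonage"` params and, when the bot only defines `"webrtc"` params, rebuilds them as `VonageVideoConnectorTransportParams` with `video_in_auto_subscribe=True`. The sketch shows that fallback using plain dataclasses in place of the real Pydantic models (so `dataclasses.asdict` stands in for `model_dump`); all class and function names are hypothetical, and it assumes, as the `except ValueError` in the diff suggests, that a missing entry raises `ValueError`.

```python
from dataclasses import asdict, dataclass


@dataclass
class BaseParams:
    audio_in_enabled: bool = False
    audio_out_enabled: bool = False


@dataclass
class VonageParams(BaseParams):
    video_in_auto_subscribe: bool = False


def get_params(name: str, factories: dict) -> object:
    """Look up a params factory by transport name; raise ValueError if absent."""
    if name not in factories:
        raise ValueError(f"No params for transport {name!r}")
    return factories[name]()


def vonage_params(factories: dict) -> VonageParams:
    """Prefer Vonage-specific params; otherwise upgrade the WebRTC ones."""
    try:
        return get_params("vonage", factories)
    except ValueError:
        base = get_params("webrtc", factories)
        # Copy every generic field, then turn on automatic video subscription.
        return VonageParams(**asdict(base), video_in_auto_subscribe=True)
```

The fallback lets an existing WebRTC bot run on Vonage without declaring a separate params entry.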
52
src/pipecat/runner/vonage.py
Normal file
@@ -0,0 +1,52 @@
#
# Copyright (c) 2024-2026, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

"""Vonage session configuration utilities.

This module extracts the necessary parameters to connect to a Vonage Video session.

Required environment variables:

- VONAGE_APPLICATION_ID - Vonage application ID
- VONAGE_SESSION_ID - Vonage session ID
- VONAGE_TOKEN - Vonage token

Example:
from pipecat.runner.vonage import configure

application_id, session_id, token = await configure()
"""

import os


async def configure() -> tuple[str, str, str]:
"""Configure Vonage application ID, session ID and token from environment.

Returns:
Tuple containing the server application_id, session_id and token.

Raises:
Exception: If required Vonage configuration is not provided.
"""
application_id = os.getenv("VONAGE_APPLICATION_ID")
session_id = os.getenv("VONAGE_SESSION_ID")
token = os.getenv("VONAGE_TOKEN")

if not application_id:
raise Exception(
"No Vonage application ID specified. Set VONAGE_APPLICATION_ID in your environment."
)

if not session_id:
raise Exception(
"No Vonage session ID specified. Set VONAGE_SESSION_ID in your environment."
)

if not token:
raise Exception("No Vonage token specified. Set VONAGE_TOKEN in your environment.")

return (application_id, session_id, token)
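`configure()` reads three environment variables and raises on the first one that is missing. A variant that reports every missing variable in one error, and accepts an injected mapping so it can be tested without touching the real environment, might look like this. `read_vonage_env` is a hypothetical helper; it raises `RuntimeError` rather than a bare `Exception`, and it is synchronous because nothing in it awaits.

```python
from __future__ import annotations

import os

REQUIRED_VARS = ("VONAGE_APPLICATION_ID", "VONAGE_SESSION_ID", "VONAGE_TOKEN")


def read_vonage_env(env: dict[str, str] | None = None) -> tuple[str, str, str]:
    """Return (application_id, session_id, token) or fail listing every missing variable."""
    source = os.environ if env is None else env
    values = [source.get(name) for name in REQUIRED_VARS]
    # Empty strings count as missing, matching the `if not ...` checks in configure().
    missing = [name for name, value in zip(REQUIRED_VARS, values) if not value]
    if missing:
        raise RuntimeError(f"Missing Vonage configuration: {', '.join(missing)}")
    application_id, session_id, token = values
    return application_id, session_id, token
```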
@@ -586,9 +586,9 @@ class AssemblyAISTTService(WebsocketSTTService):
await self._call_event_handler("on_connected")
logger.debug(f"{self} Connected to AssemblyAI WebSocket")
except Exception as e:
self._websocket = None
self._connected = False
await self.push_error(error_msg=f"Unable to connect to AssemblyAI: {e}", exception=e)
raise

async def _disconnect_websocket(self):
"""Close the websocket connection to AssemblyAI."""

Some files were not shown because too many files have changed in this diff