Compare commits

..

175 Commits

Author SHA1 Message Date
Mark Backman
9054912dfb Update to add_workers 2026-05-21 23:20:40 -04:00
Mark Backman
0b9500aae4 Match shopping-list client styling to the other UI demos
Restyle from a bespoke dark theme to the light theme the other UI demos
share: the canonical :root tokens (--border, --muted, --highlight), the
#fafafa/#18181b body, the sticky white header with the light/red Connect
button, the fixed bottom-right #status toast, and the amber
ui-highlight-pulse keyframe. index.html drops the custom topbar wrapper for
the standard <header> plus a standalone #status element.
2026-05-21 23:20:40 -04:00
Mark Backman
10b8feb9ea Add shopping-list UIWorker example (bridge-free voice + UI)
Demonstrates the 'every input acts, may speak' pattern without bridging: a
standard voice pipeline (STT → LLM → TTS) whose LLM only converses, plus a
separate UIWorker that does all the list work. The voice pipeline's user
aggregator fires on_user_turn_stopped each turn and dispatches the transcript
to the UIWorker as a respond job (a bus message); the UIWorker reads the
auto-injected <ui_state> snapshot and drives the list silently via add_item /
set_checked / remove_item commands (plus the standard highlight). Items are
checkboxes whose label and checked state the snapshot exposes.

Includes a vanilla-JS client following the existing UI-demo client style.
2026-05-21 23:20:40 -04:00
Mark Backman
1c94feaaff Inject <ui_state> via the LLM's on_before_process_frame hook
Move <ui_state> snapshot injection out of respond_with_llm into a
cross-cutting on_before_process_frame handler on the UIWorker's LLM, so it
appends the current snapshot to the context the request is built from, just
before each inference. Injection is gated to the user-turn-initiating
inference so a tool-calling turn never stacks duplicate <ui_state> blocks;
respond_with_llm no longer injects manually.

Also drop the bridged parameter from UIWorker: there is no viable way to
bridge a UIWorker between workers — a shared, teed context would be polluted
by the injection, and per-worker turn detection off teed frames isn't
supported. Other workers keep their PipelineWorker bridging.
2026-05-21 23:20:40 -04:00
Mark Backman
950fc10f05 Add document-review UIWorker example
Synthesis example: a ReplyToolMixin UIWorker adds a start_review tool that fans
out to clarity/tone peers via start_user_job_group, translates each reviewer
response into an add_note command in on_job_response, handles a client
note_click event via @on_ui_event, and keeps history across turns.
2026-05-21 23:20:40 -04:00
Mark Backman
07725429b2 Add async-tasks UIWorker example
A UIWorker with a custom reply tool fans research out to three BaseWorker peers
via start_user_job_group; their progress streams to the client as ui-task cards
and the user can cancel a group mid-flight.
2026-05-21 23:20:40 -04:00
Mark Backman
6b0e204d66 Add form-fill UIWorker example
A ReplyToolMixin UIWorker that fills inputs (fills) and toggles checkboxes /
presses submit (click) by voice — the state-changing half of the standard
action set.
2026-05-21 23:20:40 -04:00
Mark Backman
f826da9ac9 Add deixis UIWorker example
A ReplyToolMixin UIWorker that grounds in the user's text selection (the
<selection> block in the snapshot) and points back via select_text — both
directions of deictic reference.
2026-05-21 23:20:40 -04:00
Mark Backman
81b956d963 Add pointing UIWorker example
The voice LLM delegates to a ReplyToolMixin UIWorker that scrolls offscreen
items into view and highlights the phones it names — exercising the scroll_to /
highlight UI commands and the [offscreen] state tag.
2026-05-21 23:20:40 -04:00
Mark Backman
2254a8d0a2 Add hello-snapshot UIWorker example
Smallest UIWorker demo: a voice LLM in the main pipeline delegates
screen-relevant utterances to a UIWorker via a respond job; the UIWorker
auto-injects the current <ui_state> and answers grounded in what's on screen.
Includes a vanilla-JS client that streams accessibility snapshots over RTVI.
2026-05-21 23:20:40 -04:00
Mark Backman
f1f5a986e8 Add UIWorker
UIWorker is an LLMContextWorker that observes and drives a client GUI over the
RTVI UI channel: it stores accessibility snapshots, auto-injects <ui_state> at
the start of each respond job, dispatches client events to @on_ui_event
handlers, sends UI commands back to the client, and surfaces fan-out work as
cancellable task cards via user_job_group(). The optional ReplyToolMixin exposes
a bundled reply tool.

The prompt_guide parameter auto-appends the UI wire-format guide to the LLM's
system instruction (default UI_STATE_PROMPT_GUIDE; override with a string or
disable with None), so the LLM can parse the injected <ui_state> / <ui_event>
messages without the app concatenating the guide by hand.
2026-05-21 23:20:40 -04:00
Mark Backman
02667a7255 Add native RTVI⇄bus UI bridge to PipelineWorker
When RTVI is enabled, PipelineWorker now republishes inbound ui-event /
ui-snapshot / ui-cancel-task messages onto the bus as a broadcast
BusUIEventMessage, and translates outbound BusUICommandMessage / BusUITask*
carriers into the matching RTVI frames. This lets a UIWorker on the bus observe
and drive the client UI with no decorator or manual wiring; when no UIWorker is
present the events are simply unconsumed.

The BusUI* carriers live in the bus layer so both pipeline and workers can
reference them without an import cycle.
2026-05-21 23:03:37 -04:00
Mark Backman
ee3d1128ec Add LLMService.append_system_instruction()
Composes durable text onto a user-provided system instruction (alongside the
turn-completion and async-tool-cancellation addons) so it is prepended on every
inference and survives context-message resets. The user's base prompt is now
snapshotted once and the effective instruction is always rebuilt from it,
replacing the prior lazy capture/restore logic with a single invariant.
2026-05-21 23:03:37 -04:00
Aleix Conchillo Flaqué
e8ec7c585f Rename PipelineRunner.add_worker() to variadic add_workers(*workers)
Lets callers register multiple workers in a single call instead of
awaiting add_worker() repeatedly. Updates all examples, docs, tests,
and proxy worker docstrings to use the new API.
2026-05-21 19:46:53 -07:00
Aleix Conchillo Flaqué
f91179a640 Forward active from PipelineWorker through to BaseWorker
PipelineWorker.__init__ was only forwarding `name` to BaseWorker, so
the `active` flag (the other BaseWorker constructor arg) wasn't
reachable from PipelineWorker callers. Add `active: bool = True` to
the signature and pass it through.
2026-05-21 19:07:13 -07:00
Aleix Conchillo Flaqué
e85f3fe606 update uv.lock 2026-05-21 19:07:13 -07:00
Aleix Conchillo Flaqué
d07ba562eb Separate bus messages from pipeline frames
BusMessage was a mixin tacked onto DataFrame / SystemFrame so the bus
could reuse the frame priority machinery. That made every bus message
also a Frame, which is misleading — bus messages travel on the bus, not
through pipelines. If a worker actually needs to ship a frame, it wraps
it in BusFrameMessage.

BusMessage is now a plain dataclass base carrying source/target.
BusDataMessage and BusSystemMessage are empty subclasses that exist
only as priority markers. The bus router and the priority queue check
``isinstance(item, BusSystemMessage)`` directly instead of
``isinstance(item, SystemFrame)``.

The serializer test that round-tripped DataFrame.name (a non-init
field) is rewritten against a local _MessageWithNonInit(BusDataMessage)
subclass so the serializer's init=False path stays covered.
2026-05-21 19:07:13 -07:00
Aleix Conchillo Flaqué
b03247f360 Rename BaseTask → BaseWorker and reserve "task" for asyncio
Replaces every "task" identifier that referred to the BaseTask
abstraction with "worker". Asyncio task plumbing (asyncio.Task,
BaseTaskManager, TaskManager, create_task, cancel_task, etc.) stays
untouched. Highlights:

- Classes: BaseTask → BaseWorker, PipelineTask → PipelineWorker,
  LLMTask → LLMWorker, LLMContextTask → LLMContextWorker, TaskBus →
  WorkerBus, TaskRegistry → WorkerRegistry, TaskActivationArgs →
  WorkerActivationArgs, TaskReadyData → WorkerReadyData,
  TaskRegistryEntry → WorkerRegistryEntry, TaskObserver →
  WorkerObserver, all Bus*TaskMessage → Bus*WorkerMessage,
  BusAddTaskMessage.task field → worker, BusWorkerRegistryMessage.tasks
  field → workers.
- Methods/decorators: activate_task → activate_worker, deactivate_task
  → deactivate_worker, add_task → add_worker, watch_task →
  watch_worker, @task_ready → @worker_ready, setup_pipeline_task hook
  → setup_pipeline_worker.
- Params/fields: FrameProcessorSetup.pipeline_task and
  FunctionCallParams.pipeline_task → pipeline_worker. Parameter names
  like task_name → worker_name; spawn/run accept worker:.
- Files: pipeline/base_task.py → base_worker.py, pipeline/task.py →
  worker.py (plus a re-export shim at pipeline/task.py),
  task_observer.py → worker_observer.py, task_ready_decorator.py →
  worker_ready_decorator.py, pipecat.tasks → pipecat.workers,
  llm_task.py → llm_worker.py, llm_context_task.py →
  llm_context_worker.py, examples/multi-task → examples/multi-worker.

Back-compat:
- PipelineTask kept as a deprecated subclass of PipelineWorker that
  warns on construction.
- pipecat.pipeline.task re-exports PipelineWorker/PipelineTask/etc. so
  existing user imports keep working.
- FrameProcessor.pipeline_task kept as a deprecated property that
  forwards to pipeline_worker.

Local variables in examples that hold a worker (task = PipelineTask(...))
are renamed to worker = PipelineWorker(...). Asyncio-task locals
(runner_task, etc.) are preserved.
2026-05-21 19:07:13 -07:00
Aleix Conchillo Flaqué
b9aed0d673 Rename BaseTask.send_error to send_bus_error_message
Symmetric with send_bus_message; "send_bus_error" on its own reads
ambiguously (sounds like an error about the bus, à la SIGBUS) and the
underlying types are BusTaskErrorMessage / BusTaskLocalErrorMessage,
so keeping "_message" in the name matches what's actually sent.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
d8947c68a9 Rename BaseTask.send_message to send_bus_message
Mirrors on_bus_message and makes it explicit that the call goes out on
the task bus, not on a transport (transports have their own
send_message for client/peer messaging).
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
373894fc65 Fold BaseTask.handoff_to into activate_task(deactivate_self=...)
BaseTask.handoff_to was just deactivate_self + activate_task. Remove
it and add a deactivate_self flag on activate_task instead, so there's
one entry point for activating another task.

LLMTask now overrides activate_task (mirroring its end() override) to
keep the messages / result_callback hooks that finish an in-progress
tool call before the target is activated. All multi-task examples and
unit tests switch to the new call.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
e8bbb5ee09 Add setup_pipeline_runner hook to PIPECAT_SETUP_FILES
PipelineRunner now picks up an async setup_pipeline_runner(runner) hook
from the same PIPECAT_SETUP_FILES env var that PipelineTask already uses
for setup_pipeline_task. Previously the runner used a separate
PIPECAT_RUNNER_SETUP_FILES variable and a setup_runner function — both
are removed.

A new _setup_files module hosts the loader for both hooks and caches
each setup file's module so a single file defining both hooks (e.g. a
debugger that registers a runner-level task in one hook and a per-task
observer in the other) sees its module-level state preserved across
invocations.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
a2e58044f2 update pyproject.toml and uv.lock 2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
8867426a97 Document sensor-controller example in the multi-task README
Add a Local-section entry with the running instructions, example
questions, and architecture diagram for the new sensor-controller
example.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
d984393213 Make local-handoff builder functions public
Rename ``_build_greeter`` / ``_build_support`` to ``build_greeter`` /
``build_support`` to match the convention used by other multi-task
examples (e.g. ``build_sensor_controller``). They're public factories
the example exposes; the leading underscore was misleading.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
959fb831f1 Drop redundant name strings from create_task calls in bus + proxy
``BaseObject.create_task`` already auto-names the task based on the
coroutine; the explicit ``f"{self}::..."`` strings duplicated that
default and made the call sites noisier. Remove them.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
410190dabb Add sensor-controller multi-task example
A voice agent talking to a worker that owns a simulated temperature
sensor. Demonstrates two ``PipelineTask`` instances side by side
communicating purely via ``BusJobRequestMessage`` /
``BusJobResponseMessage`` — the worker is a plain ``PipelineTask``
(no ``LLMTask`` subclassing, not bridged) whose pipeline runs both an
autonomous sensor tick loop and its own tool-calling LLM:

    SensorReader -> SensorStats -> user_agg -> llm -> assistant_agg

The voice agent's LLM has a single tool, ``ask_controller(question)``,
that forwards the user's request verbatim to the worker and speaks
back the controller's reply. The worker LLM has direct tools to read
the current temperature, inspect rolling stats, set the target, or
change the response rate; the sensor simulation drifts toward the
target with a first-order lag plus Gaussian noise.

Job responses are paired with completed LLM turns via the assistant
aggregator's ``on_assistant_turn_stopped`` event, skipping empty
turn-stopped events that fire between a tool call and its result.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
f22350ce2f Use symmetric spawn-then-run() pattern in multi-task examples
Switch every example to ``await runner.spawn(task)`` followed by
``await runner.run()`` (no task argument), and ``await runner.cancel()``
on client-disconnected instead of ``await task.cancel()``. This makes
the main pipeline task look the same as the worker / proxy tasks
spawned alongside it, and lets ``runner.cancel()`` drive a uniform
shutdown across every root task on the bus.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
5f1b91bb89 Clarify PipelineRunner.run() docstring for the no-task form
Spell out that spawned tasks finishing on their own does not unblock
``runner.run()`` when called without a ``task`` argument. The form is
for hosts (e.g. FastAPI servers) that have no single "main" pipeline
and want to stay up across many spawned sessions; callers who want
the runner to finish when a specific pipeline finishes should pass
that pipeline as ``task``.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
cd22742e10 Default WebSocketProxyClientTask to active=False
``on_activated`` on this task opens the upstream WebSocket connection,
which is almost always something the caller wants to trigger
explicitly (e.g. on local-client-connected). With the BaseTask default
of ``active=True`` the connection was opened twice: once when the task
auto-activated at start, and once again when the caller's
``activate_task("proxy")`` re-fired ``on_activated``. The result on
the remote side was two ``PipelineRunner`` instances per session
instead of one.

Default to ``active=False`` so the activation is a deliberate signal;
pass ``active=True`` explicitly to restore the eager-connect behavior.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
9ecb00d097 Skip pgmq/redis lazy-import tests when their extras are not installed
``test_pgmq_bus_lazy_import`` and ``test_redis_bus_lazy_import``
import ``pipecat.bus.network.pgmq`` / ``redis`` directly, which raises
when the optional ``pgmq`` / ``redis`` packages are missing. Gate each
test with ``@unittest.skipUnless`` on a top-level probe of the
underlying package so they're skipped (not errored) in environments
without the extras. ``test_unknown_attribute_raises`` is unaffected.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
79ae9740cc Skip pgmq/redis bus tests when their extras are not installed
The PGMQ and Redis bus modules raise an ``Exception`` at import time
when the optional ``pgmq`` / ``redis`` packages are missing, which broke
``pytest`` collection in environments without those extras (e.g. CI
that uses ``--no-extra gstreamer --no-extra local``). Wrap the imports
in ``try/except`` and ``raise unittest.SkipTest`` so the whole test
module is skipped cleanly instead of failing collection.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
df704a34f1 Move _wait_tasks_ready into the job-group internals section in BaseTask
``_wait_tasks_ready`` is only called from
``create_job_group_and_request_job``, so it belongs with the other
job-group internals (``_create_job_group``, ``_send_job_request``,
``_task_timeout``, ...) rather than next to the task-readiness
helpers (``_register_ready``, ``_on_watched_task_ready``).
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
7dc2b41412 Drive task end/cancel shutdown from BaseTask by default
``BaseTask._handle_task_end`` and ``_handle_task_cancel`` now call
``stop()`` after propagating to children, so bus-only subclasses
(``WebSocketProxyClientTask``, ``WebSocketProxyServerTask``, custom
worker tasks like ``CodeWorker``) don't need to override these
handlers just to set ``_finished_event``.

Children-propagation is extracted into ``_propagate_end_to_children``
and ``_propagate_cancel_to_children`` so ``PipelineTask`` can call
them directly without invoking ``stop()`` prematurely — the pipeline
still drives its own shutdown through the ``EndFrame`` / ``CancelFrame``
path, which triggers ``on_pipeline_finished`` and ``stop()`` after the
pipeline drains.

Drop the now-redundant overrides from the WebSocket proxy tasks.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
4d9e258e55 Collapse _pipeline_finished_event into BaseTask._finished_event
PipelineTask had its own ``_pipeline_finished_event`` that signalled
"pipeline run has truly finished" — the same role ``BaseTask._finished_event``
plays for bus-only tasks. They were two events with the same intent.

Set ``_finished_event`` directly when the pipeline-end frame propagates
through the sink, drop the now-redundant field, and drop the
``clear()`` after wait so the event stays set for the lifetime of the
task. As a side-effect, ``await pipeline_task.wait()`` from outside now
resolves at the moment the pipeline finishes, matching the semantics
of bus-only tasks.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
de1bd7cb7e code-assistant: work around CancelledError swallow in ClaudeSDKClient
claude_agent_sdk's _AsyncioTaskHandle.wait() uses
`with suppress(asyncio.CancelledError)` to silence the inner read
task's expected cancellation, but it also swallows the outer task's
cancellation if it lands on the same await — causing cancel_task to
time out.

Bypass `async with ClaudeSDKClient` and drive connect/disconnect
ourselves so disconnect() runs in a finally where the outer
CancelledError has already been raised and suspended by Python's
exception machinery, out of reach of the SDK's suppress.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
a5bb9f65de Fix LLMTask._finish_function_call to bypass deferral
self.queue_frame would defer the LLMMessagesAppendFrame because
_finish_function_call always runs inside a tool call. The subsequent
_flush_pipeline() then returned before the goodbye/handoff LLM output
was actually delivered. Use super().queue_frame to push the frame
straight into the pipeline, matching the pattern used in
_flush_pipeline().
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
402cf8dade Port multi-task unit tests from pipecat-subagents
Brings over 215 tests across 15 files covering the new
multi-task framework: BaseTask / PipelineTask bus lifecycle,
job RPC and job groups, the bus message hierarchy and serializers,
TaskBus + AsyncQueueBus + RedisBus + PgmqBus (with direct and
isolated backends), TaskRegistry, the BusBridgeProcessor, the
WebSocket proxy tasks, the LLMTask deferral logic, and the
PipelineRunner spawn-and-attach flow.
2026-05-21 10:13:21 -07:00
Jon Taylor
d757d8d06d Split PgmqBus into orchestrator + pluggable backends
Move the wire-side of PGMQ operations into a new
``pipecat.bus.network.pgmq_backends`` module with a ``PgmqBackend``
Protocol, a ``DirectPgmqBackend`` (peers discovered by queue prefix),
and an ``IsolatedPgmqBackend`` (SECURITY DEFINER ``public.bus_*``
wrappers over an asyncpg pool). ``PgmqBus`` now delegates join,
publish, read, archive, and leave to the configured backend.

Construct ``PgmqBus`` with either ``pgmq=PGMQueue`` (uses
``DirectPgmqBackend``) or ``backend=PgmqBackend`` (any backend); the
two are mutually exclusive.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
a63abc41b6 Add README and env.example for multi-task examples
Adapts the pipecat-subagents `examples/README.md` to the new
layout (`multi-task/` umbrella, `local-handoff/`, `distributed-handoff/`,
`remote-proxy-assistant/`, `parallel-debate/`, `code-assistant/`),
updates the agent→task / job-RPC vocabulary, drops the
single-agent and llm-and-flows examples (gone in the port), and
adds a new section for the PGMQ handoff transport.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
4c5fb85856 Add pgmq and redis extras for the distributed bus implementations
`pipecat.bus.network.pgmq` and `pipecat.bus.network.redis` need
optional dependencies. Adding `pgmq` and `redis` extras so users
can `pip install pipecat-ai[pgmq]` / `pip install pipecat-ai[redis]`
to opt in.
2026-05-21 10:13:18 -07:00
Aleix Conchillo Flaqué
4fbeb5fbcb Add remote-proxy-assistant example
Demonstrates the WebSocket proxy tasks: a local `main.py` voice
bot uses `WebSocketProxyClientTask` to forward bus messages
(including `BusFrameMessage`s) to a remote `assistant.py`
FastAPI server. Each incoming connection spawns a
`WebSocketProxyServerTask` plus an `LLMTask` assistant on a
per-session `PipelineRunner`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
4509caa724 Add distributed-handoff examples (redis and pgmq)
Two transports of the same shape: a main task that hosts the
voice pipeline plus a network-backed `TaskBus` (`RedisBus` or
`PgmqBus`), and a standalone `llm.py` worker process for the
greeter / support LLM. Workers connect to the same bus channel,
register on the shared `TaskRegistry`, and the main task waits
on `runner.registry.watch("greeter", ...)` before sending the
welcome activation so it doesn't fire before the worker is up.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
0f7211d072 Add parallel-debate example
A voice moderator that fans out a debate topic to three worker
tasks (advocate, critic, analyst) via `task.job_group(...)`,
then synthesizes their replies. Workers are `LLMContextTask`s
that keep their own conversation context across rounds and use
the assistant-aggregator's `on_assistant_turn_stopped` event
to ship the completed turn back as a job response.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
7c4294b7f6 Add local-handoff-two-agents-tts example
Variant of the local handoff example with per-task TTS voices.
Each child task wraps the LLM with its own `CartesiaTTSService`
in a custom pipeline override, so the main task has no TTS and
audio comes from whichever child is active over the bus.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
6964686808 Add code-assistant example
Voice code assistant that dispatches questions to a Claude Agent
SDK worker. The main task runs the voice pipeline (STT + LLM + TTS)
and an `ask_code` direct function. `CodeWorker` is a bus-only
`BaseTask` spawned on the runner: it accepts `@job`-style
requests through the bus, queues them onto an asyncio queue, and
runs them sequentially through a persistent Claude SDK session so
follow-ups share context. The example shows the job-RPC surface
(`task.job("code_worker", ...)`), bus-only tasks (no pipeline),
and the `pipeline_task` field on `FunctionCallParams`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
f364c088cf Add local-handoff-two-agents example
Two LLM tasks (greeter and support) handing off to each other over
the local `AsyncQueueBus`. The main task owns the transport
pipeline (STT, TTS, transport I/O) and the child tasks each run
their own LLM behind a `BusBridgeProcessor`. Each child uses
`bridged=()` so `PipelineTask` auto-wraps its pipeline with
the bus edge processors, and `transfer_to_agent` / `end_conversation`
tools demonstrate `handoff_to(...)` and `end(...)`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
42204c4d0f Fix pyright errors in new bus/task/proxy code
- `TaskBus._router_task`: cast the narrowed `SystemFrame` back
  to `BusMessage` for the subscriber callback.
- `bus.network.__init__`: expose `PgmqBus` / `RedisBus` to
  the type-checker via a TYPE_CHECKING block so `__all__` is
  satisfied; runtime path still goes through `__getattr__`.
- `RedisBus`: subscribe through a local before assigning
  `self._pubsub`, and `assert self._pubsub is not None` in
  the reader loop.
- `BaseTask.on_job_error` accepts
  `BusJobResponseMessage | BusJobResponseUrgentMessage` to match
  what is dispatched.
- `JobGroupContext.__aexit__` / `JobContext.__aexit__`: assert
  `self._group is not None` before `wait()`.
- `@task_ready` collector: type handlers dict as `dict[str, Callable]`
  so the `.__name__` read on a duplicate handler typechecks.
- WebSocket proxy client/server: assert the socket is set in
  `_receive_loop`, and decode `str` payloads to bytes before
  handing them to the serializer.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
5f86e39038 Port WebSocket proxies to the new BaseTask API
`WebSocketProxyServerAgent` / `WebSocketProxyClientAgent` are
renamed to `WebSocketProxyServerTask` / `WebSocketProxyClientTask`
and updated for the post-refactor surface:

- Drop `bus=` from the constructor; the bus arrives via
  `BaseTask.attach` from the runner.
- Constructor params `agent_name` / `remote_agent_name` /
  `local_agent_name` → `task_name` / `remote_task_name` /
  `local_task_name` (matching `BusBridgeProcessor`).
- Move setup logic from the now-removed `on_ready` hook into
  `start()`; replace `_stop()` overrides with `stop()`.
- Add `_handle_task_end` / `_handle_task_cancel` overrides that
  set `_finished_event` so `PipelineRunner._cancel_spawned_tasks`
  can drive these bus-only tasks to a clean exit.
- Update the registry-message field reference
  (`agents=`/`message.agents` → `tasks=`/`message.tasks`)
  and `TaskReadyData.task_name` access.
- Tighten the server's `_send_ws` exception handling to only
  catch `WebSocketDisconnect`.
- Update install hints (`pipecat-ai[websockets-base]` for the
  client, `starlette` for the server) and refresh docstrings/
  examples to use `runner.spawn(...)`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
86f8f137a8 Sweep agent->task and task->job in docstrings and identifiers
Cleans up leftover "agent" terminology in module/class/method
docstrings across `pipecat.bus`, `pipecat.registry`,
`pipecat.pipeline`, and `pipecat.tasks.llm`, and renames
job-RPC phrasing ("task request", "task identifier",
"task group execution") to use "job" consistently.

API-visible changes:

- `BusBridgeProcessor(agent_name=, target_agent=)` → `task_name=` /
  `target_task=`.
- `@task_ready` decorator's internal marker
  `fn.agent_ready_name` → `fn.task_ready_name`.
- `@tool` decorator's internal marker
  `fn.is_agent_tool` → `fn.is_llm_tool`.
- `PIPECAT_SUBAGENTS_SETUP_FILES` env var →
  `PIPECAT_RUNNER_SETUP_FILES`.
- pgmq/redis bus install hints point at `pipecat-ai\[extra\]`
  rather than the old `pipecat-ai-subagents\[extra\]` package.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
e546471bef Rename _handle_task_* job dispatchers to _handle_job_*
Fixes a name collision where `_handle_task_cancel` was defined
twice — once for `BusCancelTaskMessage` (task lifecycle) and
once for `BusJobCancelMessage` (job RPC) — the second silently
shadowing the first. Job-side dispatchers are now consistently
named `_handle_job_*` and the internal helpers
`_run_task_handler` / `_send_task_request` become
`_run_job_handler` / `_send_job_request`. Task-lifecycle
handlers (`_handle_task_end`, `_handle_task_cancel`,
`_handle_task_activate`, `_handle_task_deactivate`,
`_handle_task_error`) keep their names.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
6d87765648 LLMTask/LLMContextTask: fix LLMService type 2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
922293ae76 Spawn the main task before setup so attach happens uniformly
`PipelineRunner.run(task)` now calls `spawn(task)` first (which
runs `task.attach()`) and lets `_setup_session` start every
registered entry — main and pre-spawned — through the same path,
instead of relying on `spawn`'s post-running fast-path to start
the main task after setup. The two-branch wait stays for the
`task is None` case but reads the runner_task directly off the
freshly-spawned entry.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
eb4f0ac1ae Add changelog for #4493 2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
7506af5861 Replace set_registry with attach(*, registry, bus) on BaseTask
`BaseTask` no longer takes `bus=` in its constructor. Instead
the runner now hands both the registry and the bus to a task via
`task.attach(registry=..., bus=...)` (called from
`PipelineRunner.spawn()`), and `bus` / `registry` are
properties that raise if accessed before attach. `PipelineTask`,
`LLMTask`, and `LLMContextTask` lose their `bus=` parameters
to match, and `_BusEdgeProcessor` now stores only a task
reference and reads `task.bus` lazily so bridged pipelines work
even though the bus isn't known at construction time.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
ef806163b2 Tighten the pipeline_task contract for processors and tools
`FrameProcessorSetup.pipeline_task` is now mandatory and
`FrameProcessor.pipeline_task` raises if accessed before setup
instead of returning `None`. `FunctionCallParams` gains a
required `pipeline_task` field and `LLMService._run_function_call`
populates it (plus reads `app_resources` directly off the
pipeline task). Tests that build a processor or
`FunctionCallParams` outside a real pipeline stub it with a
`SimpleNamespace`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
7d28c46a5d Add tasks package with LLMTask, LLMContextTask, and proxy stubs
Adds `pipecat.tasks.llm` with `LLMTask` (LLM pipeline + `@tool`
collection + tool-call deferral via `PipelineFlushFrame`),
`LLMContextTask` (LLM + `LLMContextAggregatorPair`), and the
`@tool` decorator. Also includes `pipecat.tasks.proxy.websocket`
client/server stubs that need a follow-up port to the new
`BaseTask` lifecycle.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
befaa9ff27 Rewrite PipelineRunner around bus + spawn
`PipelineRunner` now owns the shared `TaskBus` and
`TaskRegistry` and runs all tasks (the main one plus any
spawned ones) through a unified `_start_task` / `_run_task`
background-task path. Adds `spawn(task)` for fire-and-forget
task registration, threads `end()` / `cancel()` through
`BusEndTaskMessage` / `BusCancelTaskMessage` to all root
tasks, and broadcasts/handles `BusTaskRegistryMessage` for
remote-runner discovery. The runner now wires its own
`TaskManager` via `super().setup(...)` so internal
`create_task` calls go through `BaseObject`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
b5c757ab85 Make PipelineTask inherit BaseTask and support bridged pipelines
`PipelineTask` now extends `BaseTask` so every pipeline task is
also a bus participant. Adds optional `bus`, `bridged`, and
`exclude_frames` parameters: when `bridged` is set, the user's
pipeline is wrapped with `_BusEdgeProcessor` source/sink edges so
frames are mirrored onto the bus. Bridges pipeline lifecycle
events to `start()`/`stop()`, overrides `_handle_task_end` /
`_handle_task_cancel` to drive the pipeline shutdown, subscribes
to the bus in setup, and exposes the `bridged` property to the
registry. Moves `PipelineTaskParams` here and updates the
matching test import.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
6a738bd3a0 Replace BasePipelineTask with BaseTask
Drops the old abstract `BasePipelineTask` and replaces it with
`BaseTask` — the common base for any runtime task. `BaseTask`
subscribes to a `TaskBus`, participates in the shared
`TaskRegistry`, handles activation / deactivation, end / cancel,
and the full `@job` RPC surface (request_job, job, job_group,
send_job_response / update / stream_*, etc.). It ships a default
`run()` for bus-only tasks; subclasses with their own runtime
(e.g. `PipelineTask`) override it.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
c0b2a8c572 Add job context and decorators
Adds `JobContext` / `JobGroupContext` async context managers,
the `JobGroup` / `JobGroupEvent` / `JobGroupResponse` /
`JobGroupError` types, the `@job` decorator (with collector),
and the `@task_ready` decorator (with collector). These power
the bus-driven job RPC between tasks.
2026-05-21 10:12:51 -07:00
Jon Taylor
7e2055b7d0 Add PgmqBus for distributed agents
Adds ``pipecat.bus.network.pgmq.PgmqBus``, a PGMQ-backed
:class:`TaskBus` adapter that implements pub/sub fan-out over
PGMQ's point-to-point queue semantics. Each bus instance owns its
own queue, broadcasts on publish to peers discovered by channel
prefix, and long-polls its queue to dispatch received messages
to local subscribers.

Requires the optional ``pgmq`` extra
(``pip install pipecat-ai[pgmq]``).
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
5d94506265 Add task bus package
Introduces `TaskBus`, the in-process `AsyncQueueBus`, the bus
message hierarchy (lifecycle, jobs, frames, registry), a
priority-aware bus queue, the `BusSubscriber` mixin, and the
`BusBridgeProcessor` / internal `_BusEdgeProcessor` used to
exchange frames between a local pipeline and the bus.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
30df8e4ca5 Add task registry package
Introduces `TaskRegistry` and the supporting `TaskReadyData`,
`TaskErrorData`, and `TaskRegistryEntry` dataclasses used to track
local and remote tasks discovered through the bus.
2026-05-21 10:12:51 -07:00
Mark Backman
780c004168 Merge pull request #4423 from joycech333/feat/inception-llm-service
feat: add Inception LLM service with Mercury 2 support
2026-05-21 12:02:27 -04:00
Mark Backman
28f9203401 Code review fixes 2026-05-21 11:45:17 -04:00
joycech333
77cc314a08 feat: add Inception LLM service with Mercury-2 support
Adds InceptionLLMService, an OpenAI-compatible service for Inception's
Mercury-2 diffusion-based reasoning model. Supports reasoning_effort
(instant/low/medium/high) and realtime mode for reduced TTFT.
2026-05-21 11:23:23 -04:00
Mark Backman
4a8d1d0b5e Merge pull request #4532 from pipecat-ai/mb/cleanup-logging-after-smart-text-handling
Clean up smart text logging
2026-05-21 08:35:46 -04:00
Mark Backman
87f5d60693 Merge pull request #4531 from pipecat-ai/mb/pipecat-prebuilt-1.0.1
chore: bump pipecat-ai-prebuilt to 1.0.1
2026-05-21 08:35:31 -04:00
Mark Backman
c699b31daa Merge pull request #4534 from pipecat-ai/mb/changelog-4521
Add changelog for #4521
2026-05-21 08:35:15 -04:00
Mark Backman
ee674ffb01 Add changelog for #4521 2026-05-20 17:57:43 -04:00
mihafabcic-soniox
86a5710801 Add max_endpoint_delay_ms and clean up Sonoix STT settings (#4521) 2026-05-20 17:54:48 -04:00
Mark Backman
4a96b2a9e6 Clean up smart text logging 2026-05-20 15:38:59 -04:00
Mark Backman
105d6f27da Merge pull request #4514 from pipecat-ai/mb/websocket-stt-service-exception-handling
Align websocket STT connection failures
2026-05-20 15:15:35 -04:00
Filipi da Silva Fuchter
e0e3cd336a Merge pull request #4529 from pipecat-ai/filipi/squash_skill
New skill to squash commits.
2026-05-20 16:06:23 -03:00
Mark Backman
9586db5b50 Preserve websocket reconnect failure retries 2026-05-20 14:45:29 -04:00
Mark Backman
a890ab7b21 Add changelog for PR #4531 2026-05-20 12:18:03 -04:00
Mark Backman
c1bf7dbb4a chore: bump pipecat-ai-prebuilt to 1.0.1 2026-05-20 12:15:09 -04:00
Mark Backman
709a0ce839 Merge pull request #4527 from pipecat-ai/mb/fix-elevenlabs-keepalive-1008
Fix ElevenLabs keepalive racing context-init (1008 disconnects)
2026-05-20 11:21:17 -04:00
Mark Backman
be93350eae Merge pull request #4522 from pipecat-ai/mb/stt-latency-smallest
Add P99 latency for Smallest AI, Mistral, XAI STT
2026-05-20 11:21:00 -04:00
Mark Backman
4a96ab7073 Merge pull request #4524 from pipecat-ai/mb/fix-runner-imports
Improve runner optional transport handling
2026-05-20 11:16:16 -04:00
filipi87
c321f50e76 New skill to squash commits. 2026-05-20 10:29:03 -03:00
Filipi da Silva Fuchter
bca337f97e Merge pull request #4380 from pipecat-ai/filipi/smart_text
Smart Text Handling
2026-05-20 10:18:30 -03:00
filipi87
5d9e8c5ac5 Removing debug log. 2026-05-20 10:13:46 -03:00
Mark Backman
70773bce0a Add changelog for PR #4527 2026-05-20 09:08:47 -04:00
filipi87
8bdb49bd1a chore: add changelogs for word-timestamp and frame-ordering fixes 2026-05-20 10:03:30 -03:00
filipi87
81bb81c1d0 test: add automated tests for word tracking, frame sequencing, and Cartesia TTS
Adds tests for AggregatedFrameSequencer, WordCompletionTracker, and
word_timestamp_utils (including CJK language scenarios). Updates existing
Cartesia TTS and TTS frame ordering tests to cover the new behaviours.
2026-05-20 10:03:26 -03:00
filipi87
e1bdee598c fix: preserve raw_text through TTS pipeline for correct LLM context attribution
TTSTextFrame entries were losing their original text structure when word
timestamps were enabled. AggregatedTextFrame now carries a raw_text field with
the original LLM-produced text (including pattern delimiters such as
<card>...</card>). The assistant context receives properly-tagged content
rather than the cleaned words returned by the TTS provider. Also handles words
that straddle two sentence boundaries by splitting and attributing each part
to its correct source frame.
2026-05-20 10:03:21 -03:00
filipi87
185a89bb3b fix: strip Cartesia SSML tags from word timestamp entries
SSML markup (e.g. <spell>, <emotion>, <break>) was leaking into word entries
returned by the Cartesia word-timestamps API. Tags are now stripped before
processing so word-to-text attribution remains accurate when SSML is present
in the TTS input.
2026-05-20 10:03:15 -03:00
filipi87
6b9deefbe3 fix: preserve frame insertion order in BaseOutputTransport for equal PTS values
Frames sharing the same presentation timestamp were being reordered by the
priority queue. Adds a monotonic counter as a tiebreaker so frames with equal
PTS are always emitted in insertion order, preventing subtle audio/text
sequencing bugs.
2026-05-20 10:03:08 -03:00
filipi87
deefc32faf fix: hold skipped TTS frames in position until preceding spoken frames complete
Skipped frames (e.g. code blocks filtered via skip_aggregator_types) were
emitted to the assistant context immediately instead of waiting for preceding
spoken frames to finish. Introduces AggregatedFrameSequencer to hold each
frame's slot and flush only after all earlier spoken sentences are complete,
keeping context ordering correct.
2026-05-20 10:03:03 -03:00
Mark Backman
a5e6886b80 Fix ElevenLabs keepalive racing context-init (1008 disconnects)
The keepalive could fire for a new turn's context before that context's
voice_settings context-init was sent, making the keepalive the context's
first message (no voice_settings) and causing ElevenLabs to reject the
later init with a 1008 policy violation. The keepalive now only targets a
context once its context-init has been sent (tracked in _context_init_sent).
2026-05-20 08:59:01 -04:00
Mark Backman
d11a4ba0cd Use shared telephony route availability checks 2026-05-20 08:57:48 -04:00
Mark Backman
38407e091d Add p99 values for Mistral and XAI 2026-05-19 22:51:33 -04:00
Mark Backman
82cd931efa Merge pull request #4306 from YFortin/fix/azure-tts-last-word-race
fix(azure-tts): Route completion through word boundary queue to prevent last word from being missed
2026-05-19 22:27:50 -04:00
Mark Backman
33e5d1f89b Add changelog for PR #4522 2026-05-19 18:33:58 -04:00
Mark Backman
861dd23873 Add changelog for runner updates 2026-05-19 17:31:07 -04:00
Mark Backman
b825dd779e Clarify runner startup banner 2026-05-19 17:31:07 -04:00
Mark Backman
1487da53a9 Improve runner optional transport handling 2026-05-19 17:03:16 -04:00
Mark Backman
aff84a5d9e Add P99 latency for Smallest AI STT 2026-05-19 11:05:15 -04:00
Mark Backman
c09f6d5adb Merge pull request #4052 from Vonage/vonage_video_connector_transport
Vonage WebRTC Transport Integration
2026-05-19 10:56:20 -04:00
asilvestre
e2d249e5d9 adding uv.lock 2026-05-19 16:33:38 +02:00
asilvestre
956b39b0dc remove extraenous await in cleanup 2026-05-19 16:33:04 +02:00
Mark Backman
e298491068 Add changelog for websocket STT failure handling 2026-05-18 12:41:56 -04:00
Mark Backman
97b00042df Align websocket STT connection failures 2026-05-18 12:35:01 -04:00
asilvestre
bc769eaa82 Changing the example to use OpenAI 2026-05-18 14:40:56 +02:00
asilvestre
ee5aa4dc71 SubscribeSettings to be pydantic and comment fixes 2026-05-18 14:40:56 +02:00
asilvestre
dd38fbc735 add documentation entry 2026-05-18 14:40:56 +02:00
asilvestre
a1c40df471 add documentation entry 2026-05-18 14:40:56 +02:00
asilvestre
c4ff9300c9 fix linting and typechecking 2026-05-18 14:40:56 +02:00
asilvestre
cab4585cbb added changelog 2026-05-18 14:40:56 +02:00
Antoni Silvestre
18368d047e Linting and changes to adapt to v1.0 2026-05-18 14:40:56 +02:00
asilvestre
e3abb4b6d7 apply suggestions in PR 2026-05-18 14:40:56 +02:00
Antoni Silvestre
0fd971d59d Update src/pipecat/runner/types.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-05-18 14:40:56 +02:00
asilvestre
c61672194d Vonage Video Connector Transport 2026-05-18 14:40:49 +02:00
Filipi da Silva Fuchter
c51a817efa Merge pull request #4442 from pipecat-ai/filipi/runner_all_transports
Unified start route to make all transports available
2026-05-18 09:27:44 -03:00
Bismeet singh
d85eda6da8 Merge pull request #4507 from BismeetSingh/fix/elevenlabs-stt-service-crash-language
Fix/elevenlabs stt service crash language
2026-05-17 10:17:07 -04:00
Aleix Conchillo Flaqué
71feb42711 Merge pull request #4503 from pipecat-ai/changelog-1.2.1
Release 1.2.1 - Changelog Update
2026-05-15 15:19:55 -07:00
aconchillo
6b93ca0cb6 Update changelog for version 1.2.1 2026-05-15 22:18:46 +00:00
Aleix Conchillo Flaqué
b6ecce754b Merge pull request #4501 from pipecat-ai/aleix/fix-filter-incomplete-tool-calls
Fix filter-incomplete + function-calling deadlock
2026-05-15 15:11:45 -07:00
Aleix Conchillo Flaqué
d39e6bf921 Add changelog for #4501 2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué
63064860ef Move OpenAITTSService instructions into Settings in the example
Mirrors the deprecation in ``OpenAITTSService.__init__``: ``instructions``
is now a Settings field. The constructor still accepts it for backward
compatibility but the canonical path is through ``Settings``.
2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué
f5158d51e7 Add filter-incomplete + function-calling turn-management example
A copy of ``turn-management-filter-incomplete-turns.py`` extended with
a ``get_weather(location)`` direct function. Exercises the path where
the LLM responds to a complete user turn by calling a tool — used to
reproduce (and now verify the fix for) the ``_user_speaking`` gating
bug between filter-incomplete and function calls.
2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué
94dbd2fa68 Broadcast UserTurnInferenceCompletedFrame on tool calls in filter-incomplete
With ``filter_incomplete_user_turns`` enabled, an LLM that responded to
a user turn by calling a tool (without first emitting a ✓ marker)
never finalized the user turn. ``UserStoppedSpeakingFrame`` stayed
deferred, the assistant aggregator kept ``_user_speaking=True``, and
when ``FunctionCallResultFrame`` arrived its ``not self._user_speaking``
gate dropped the context push — the LLM continuation never ran and
the call hung silently.

Broadcast ``UserTurnInferenceCompletedFrame`` on
``FunctionCallsStartedFrame`` (i.e. the moment the LLM commits to a
tool call, before the function dispatches), gated by a new
``_turn_completion_broadcasted`` flag so the ✓ path and the tool-call
path don't both fire. The flag resets in ``_turn_reset`` alongside
the other per-turn state.

Emitting on the start frame rather than ``LLMFullResponseEndFrame``
also shrinks the race window — ``UserStoppedSpeakingFrame`` (a
``SystemFrame``) has the maximum possible head start over the
``FunctionCallResultFrame`` (``DataFrame``) that follows.
2026-05-15 14:50:35 -07:00
Mark Backman
c6ea6c6522 Merge pull request #4500 from pipecat-ai/mb/update-gradium-endpoints
Update Gradium STT/TTS endpoints to region-neutral URLs
2026-05-15 15:59:14 -04:00
Mark Backman
58a22aeeb1 Add changelog for #4500 2026-05-15 15:19:39 -04:00
Mark Backman
5403aa56e4 Remove Gradium endpoint overrides from voice example
Drop the explicit US-region URLs so the example picks up the new
region-neutral defaults in GradiumSTTService and GradiumTTSService.
2026-05-15 15:17:12 -04:00
Mark Backman
0e0d76d020 Update Gradium endpoints to region-neutral URLs
Drop the EU-region default from the STT/TTS WebSocket URLs in favor of
the generic api.gradium.ai endpoint, and remove the explicit overrides
from the examples so they pick up the new defaults.
2026-05-15 15:02:05 -04:00
filipi87
b493ed8d3a Removing the websocket transport from elevenlabs example. 2026-05-15 10:11:38 -03:00
filipi87
c3338667b1 Mounting the prebuilt frontend UI and root redirect for all transports. 2026-05-15 10:06:47 -03:00
Aleix Conchillo Flaqué
ea296babe9 Merge pull request #4498 from pipecat-ai/changelog-1.2.0
Release 1.2.0 - Changelog Update
2026-05-14 14:47:47 -07:00
aconchillo
b13af2b053 Update changelog for version 1.2.0 2026-05-14 21:45:36 +00:00
Aleix Conchillo Flaqué
7b6d878f07 update uv.lock 2026-05-14 14:41:38 -07:00
Aleix Conchillo Flaqué
8e405f15aa changelog: fix 4446.change.md file name 2026-05-14 14:38:54 -07:00
Aleix Conchillo Flaqué
44a40e8eb2 Merge pull request #4497 from pipecat-ai/aleix/fix-tts-context-id-fallback
Fall back to _turn_context_id in get_active_audio_context_id
2026-05-14 13:34:34 -07:00
Aleix Conchillo Flaqué
ea97cb1a78 Add changelog for #4497 2026-05-14 13:22:50 -07:00
Aleix Conchillo Flaqué
22650b1b56 Move QwenLLMService model into Settings in the qwen example
Mirrors the deprecation in ``QwenLLMService.__init__``: ``model`` should
be passed via ``settings=QwenLLMService.Settings(model=...)`` instead of
as a direct constructor arg.
2026-05-14 13:22:07 -07:00
Aleix Conchillo Flaqué
b76831e677 Fall back to _turn_context_id in get_active_audio_context_id
TTS services whose wire protocol does not echo the context_id back on
incoming audio (Sarvam, Smallest, Soniox, Inworld, ...) call
``get_active_audio_context_id()`` to tag each chunk. That accessor
returned only ``_playing_context_id`` — the playback-side cursor set
asynchronously by ``_audio_context_task_handler`` when it pops a context
off the serialization queue.

Result: incoming audio that arrived in the gap between contexts or at
the very start of a turn (before the playback loop popped) had
``context_id=None`` and was dropped with
``unable to append audio to context: no context ID provided``.

Fall back to ``_turn_context_id`` (the synthesis-side cursor, set as
soon as the turn's context is created) so the gap is covered without
prematurely nulling the playback cursor.
2026-05-14 13:22:00 -07:00
Mark Backman
b57111743f Merge pull request #4495 from pipecat-ai/mb/soniox-stt-lang-counter 2026-05-14 15:57:31 -04:00
Mark Backman
dcbb0070c9 Add changelog for Soniox language selection 2026-05-14 15:42:43 -04:00
Mark Backman
73278d3309 Use majority language for Soniox transcripts 2026-05-14 15:18:43 -04:00
filipi87
c8efe319b3 Adding the changelog for the changes. 2026-05-14 11:10:33 -03:00
Mark Backman
49bda11ae8 Merge pull request #4482 from pipecat-ai/mb/soniox-stt-token-language
Propagate Soniox token language
2026-05-13 16:28:56 -04:00
Aleix Conchillo Flaqué
07640582ce Merge pull request #4467 from pipecat-ai/aleix/fix-tts-ttfb-tracing
Fix metrics.ttfb and partial output on TTS/STT/LLM OpenTelemetry spans
2026-05-13 13:10:52 -07:00
Mark Backman
078af6969a Merge pull request #4473 from timofey-TK/inworld-tts-v2
Add support for Inworld TTS v2 fields
2026-05-13 15:32:16 -04:00
Mark Backman
9f40ba21c2 Add changelog for Soniox language fix 2026-05-13 15:26:10 -04:00
Mark Backman
82f0896d6a Propagate Soniox token language 2026-05-13 15:23:22 -04:00
kompfner
7e4cd23de4 Merge pull request #4474 from pipecat-ai/pk/inworld-realtime-tools
Extend cancel_on_interruption=False to Inworld Realtime (best-effort + warning)
2026-05-13 15:12:34 -04:00
TimTk
97f50c8aa2 Address review: use resolve_language, narrow delivery_mode type, update changelog
- Replace custom LANGUAGE_MAP fallback in language_to_inworld_language with
  resolve_language(language, LANGUAGE_MAP, use_base_code=False) to match the
  pattern used by other services and restore the unverified-language warning
- Tighten delivery_mode type from str to Literal["STABLE", "BALANCED", "CREATIVE"]
- Update changelog entry to mention delivery_mode and language normalization
2026-05-13 21:43:02 +03:00
Mark Backman
08680732f6 Merge pull request #4475 from pipecat-ai/mb/cartesia-korean-fix
Fix Cartesia CJK timestamp spacing
2026-05-13 13:20:42 -04:00
Mark Backman
064b68aa01 Fix Cartesia CJK timestamp spacing 2026-05-13 13:13:40 -04:00
Filipi da Silva Fuchter
b0f8ea7e28 Merge pull request #4477 from pipecat-ai/filipi/nvidia_sagemaker_follow_up
NVidia TTS Sagemaker: Buffering audio to avoid glitches.
2026-05-13 14:06:44 -03:00
filipi87
ad50c8d5d5 Buffering audio to avoid glitches. 2026-05-13 14:01:03 -03:00
Timofey
39e7f9e354 Fix Inworld TTS v2 request fields 2026-05-13 11:17:31 +03:00
Aleix Conchillo Flaqué
7cc7968abb Fix pyright errors in service_decorators.py 2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
52d8008783 Add LLM interruption changelog entry for #4467 2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
a3ce963b54 Capture partial LLM output on interruption
traced_llm only attached the aggregated ``output`` attribute to the
span after the wrapped function returned successfully. When the LLM
call was cancelled mid-stream (e.g. interruption during generation),
the accumulated text was discarded — the span had no ``output``.

Moved the attribute assignment into the ``finally`` block alongside
the existing TTFB write so the partial text we already captured via
the patched ``push_frame`` lands on the span regardless of whether
``f`` returned normally, raised, or was cancelled.
2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
e70ee603b2 Add STT changelog entry for #4467 2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
111e59a7b1 Apply the same span-scope fix to traced_stt
@traced_stt had the same root issue as @traced_tts: the span lifetime
was tied to a per-transcript handler call, which doesn't match the
operation we want to trace. Now uses the __set_name__ pattern to
install:

- A push_frame wrapper that drives one STT span per finalized
  TranscriptionFrame. The span is anchored at speech start
  (VADUserStartedSpeakingFrame.timestamp - start_secs) but lazy-opened
  on the first TranscriptionFrame. Opening earlier (on VAD or
  UserStartedSpeakingFrame) races with TurnTraceObserver._handle_turn_started,
  which runs as a background task via _call_event_handler (sync=False),
  so the span would end up parented to the previous turn. Deferring
  the open to the first TranscriptionFrame avoids that race because
  STT only emits transcripts well after the turn observer has set
  the current turn's context.

- A stop_ttfb_metrics wrapper that closes the span on the TTFB-timeout
  path (called with end_time != None from stt_service.py:566). The
  span is marked stt.timed_out=True and its end_time is pinned to
  the timeout's end_time (= _last_transcript_time) so the duration
  reflects when STT actually stopped responding, not when the timeout
  fired.

Span lifecycle:
- Open: lazy on first TranscriptionFrame of a segment.
- Close (success): finalized=True attaches metrics.ttfb and closes
  the span. Multiple finalized transcripts in a single turn produce
  multiple spans.
- Close (timeout): stop_ttfb_metrics(end_time=...) closes with
  stt.timed_out=True.
- Close (orphan): UserStoppedSpeakingFrame closes any still-open
  span with stt.incomplete=True (covers turns where no finalized
  transcript and no timeout fired).

No changes required outside service_decorators.py — stt_service.py
and every per-service file are untouched.
2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
079282d140 Add changelog for #4467 2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
0ccdd808e6 Fix traced_tts so metrics.ttfb reflects the real TTFB
Previously @traced_tts scoped the span to the lifetime of run_tts(). For
streaming TTS services run_tts() returns as soon as the synthesis request
is sent, long before audio chunks arrive, so:

- The span duration measured the WebSocket-send time, not synthesis time.
- The first synthesis recorded the WS-send duration as metrics.ttfb (via
  the in-progress fallback in FrameProcessorMetrics.ttfb).
- Subsequent syntheses recorded the previous call's TTFB on the current
  span (off-by-one).

The decorator now uses a __set_name__ descriptor to wrap the owning
class's setup() at class definition time. setup() installs per-instance
patches on create_audio_context, append_to_audio_context,
remove_audio_context, on_audio_context_completed, and
reset_active_audio_context. These patches own the span lifetime:

- create_audio_context: open span, set baseline attributes.
- append_to_audio_context: record metrics.ttfb on the first
  TTSAudioRawFrame (when stop_ttfb_metrics has produced a real value),
  end span on appended TTSStoppedFrame.
- on_audio_context_completed: end span on natural completion (handles
  services that auto-push TTSStoppedFrame via push_frame, bypassing
  append_to_audio_context).
- remove_audio_context: safety net for explicit removal paths.
- reset_active_audio_context: interruption hook (always reached from
  _handle_interruption); marks the span tts.interrupted=true only when
  nothing else has closed it.

The run_tts wrapper now only attaches per-call attributes (text,
metrics.character_count) to the already-open span. No changes required
in tts_service.py or in any of the per-service files.
2026-05-12 20:10:43 -07:00
Paul Kompfner
863a1bf177 Add changelog for #4474 2026-05-12 16:04:12 -04:00
Paul Kompfner
58333b2705 Extend cancel_on_interruption=False to InworldRealtimeLLMService (best-effort)
Same async-tool routing approach as #4441: detect async-tool messages in
the LLM context, deliver the final result via the formal tool-result
channel.

Caveat: as of this writing, Inworld Realtime doesn't appear to handle
the resulting delayed tool result reliably, so the routing is
best-effort and the service emits a one-time warning when async-tool
messages are seen. Streamed intermediate results remain unsupported.

Also adds function calling to the realtime-inworld.py example, and
softens the Inworld mention in the #4447 changelog now that the
exclusion is being closed.
2026-05-12 16:03:34 -04:00
TimTk
ecaff1d1eb Fix changelog fragment number 2026-05-12 22:21:59 +03:00
TimTk
9b55d4ddd4 Add support for Inworld TTS v2 fields 2026-05-12 22:13:09 +03:00
filipi87
d6655e7a5e Fixing ruff format. 2026-05-12 10:40:09 -03:00
filipi87
33b73df6ec Changing the websocket route to return the same data as PCC. 2026-05-12 10:38:15 -03:00
filipi87
c9f0172e9f Example supporting plain websocket. 2026-05-08 09:46:18 -03:00
filipi87
2638885c62 Adding support for the plain websocket transport. 2026-05-08 09:37:07 -03:00
filipi87
cb426cbb14 Fixing format. 2026-05-07 16:04:43 -03:00
filipi87
d39beff817 Fixing format. 2026-05-07 16:01:54 -03:00
filipi87
1eade184f1 Creating a status endpoint to return the available transports. 2026-05-07 15:53:15 -03:00
filipi87
3fa193b983 Unified start route to make all transports available. 2026-05-07 15:34:32 -03:00
Yan Fortin
6feeee515f chore: rename changelog fragment to match PR #4306 2026-04-14 18:49:35 -04:00
Yan Fortin
55fb4b0845 fix(azure-tts): route completion through word boundary queue to prevent last word from being missed
The Azure TTS _handle_completed callback was putting the audio stream
completion signal (None) directly into _audio_queue while the last word
was still pending in _word_boundary_queue. This caused a race condition
where run_tts could exit and TTSStoppedFrame could be emitted before the
word processor task had a chance to process and emit the final word's
TTSTextFrame.

The fix routes the completion signal through _word_boundary_queue as a
None sentinel. The word processor task now recognizes this sentinel and
only signals _audio_queue after all pending words have been drained.
This guarantees the last word's TTSTextFrame is always emitted before
TTSStoppedFrame.

The cancellation/interruption path (_handle_canceled) is unchanged and
still signals _audio_queue directly, which is correct since word ordering
does not matter when speech is interrupted.
2026-04-14 18:48:40 -04:00
615 changed files with 49964 additions and 6121 deletions

View File

@@ -0,0 +1,91 @@
---
name: squash-commits
description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
---
Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.
## Instructions
### 1. Safety check
```bash
git status --short
```
If there are uncommitted changes, stop and tell the user to commit or stash them first.
### 2. Inspect the branch
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
```
List every file changed vs `main` and every commit on the branch (excluding merge commits from main).
### 3. Create a backup branch
```bash
git branch backup/<current-branch-name>
```
Tell the user the backup exists so they can recover if needed.
### 4. Soft-reset to main and unstage everything
```bash
git reset --soft main
git restore --staged .
```
All branch changes are now in the working tree, unstaged. No content has changed.
### 5. Plan the logical groups
Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:
- Core feature or fix (new source files + modified core files)
- Secondary features or fixes (each as its own commit if distinct)
- Refactoring or renames
- Tests
- Changelogs / docs
Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.
Present the proposed grouping to the user and ask for confirmation before committing.
### 6. Commit in logical groups
For each group, stage only the relevant files and commit with a clear message following the project's conventions:
```bash
git add <file1> <file2> ...
git commit -m "..."
```
Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).
### 7. Verify
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
git status --short
```
Confirm:
- Commit count is small and each message is meaningful
- The set of changed files vs `main` is identical to before
- Working tree is clean
### 8. Remind about force-push
The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.
## Rules
- Never change file contents. If you find yourself editing a file, stop.
- Never skip the backup branch step.
- Never force-push without explicit user instruction.
- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.

View File

@@ -41,7 +41,9 @@ jobs:
--extra google \
--extra langchain \
--extra livekit \
--extra pgmq \
--extra piper \
--extra redis \
--extra runner \
--extra sagemaker \
--extra tracing \

View File

@@ -45,7 +45,9 @@ jobs:
--extra google \
--extra langchain \
--extra livekit \
--extra pgmq \
--extra piper \
--extra redis \
--extra runner \
--extra sagemaker \
--extra tracing \

View File

@@ -7,6 +7,515 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- towncrier release notes start -->
## [1.2.1] - 2026-05-15
### Changed
- Changed the default WebSocket endpoints for `GradiumSTTService` and
`GradiumTTSService` to the region-neutral
`wss://api.gradium.ai/api/speech/asr` and
`wss://api.gradium.ai/api/speech/tts`. Gradium now automatically routes
traffic to the nearest endpoint. Override the url to pin to a specific
region.
(PR [#4500](https://github.com/pipecat-ai/pipecat/pull/4500))
### Fixed
- Fixed bot hangs when `filter_incomplete_user_turns` was enabled and the LLM
responded by calling a tool. The user turn never finalized, so the assistant
aggregator gated the tool-result context push and the LLM continuation never
ran. Tool calls now finalize the turn the moment they start, before the
function dispatches.
(PR [#4501](https://github.com/pipecat-ai/pipecat/pull/4501))
## [1.2.0] - 2026-05-14
### Added
- Added a `session_id` field to `RunnerArguments` so bots can log or trace a
per-session identifier in local development the same way they can in Pipecat
Cloud. The development runner now mints a UUID at every construction site,
and paths that already returned a `sessionId` to the caller (Daily `/start`,
dial-in webhook) share that same UUID with the runner args instead of
generating two. The SmallWebRTC `/api/offer` endpoint also accepts an
optional `session_id` query parameter so the `/sessions/{session_id}/...`
proxy can thread it through.
(PR [#4385](https://github.com/pipecat-ai/pipecat/pull/4385))
- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService`
for controlling Cartesia's server-side text buffering. When unset, Pipecat
picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE`
mode (custom buffering — avoids stacking client-side aggregation on top of
Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode
(Cartesia's managed buffering applies). Pass an explicit value (05000ms) to
override.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and
`DeepgramHttpTTSService` so callers can opt out of the Deepgram Model
Improvement Program. When set, the value is forwarded to Deepgram as a query
parameter on the speak request. Defaults to `None`, which preserves the
existing behavior. See https://dpgr.am/deepgram-mip for pricing implications
before enabling.
(PR [#4400](https://github.com/pipecat-ai/pipecat/pull/4400))
- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set
via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that
appends a developer-role message to the context whenever `LLMSetToolsFrame`
changes the set of advertised standard tools. Helps the LLM stay coherent
across mid-conversation tool changes, mitigating several flavors of
tool-call-related hallucination: calling tools that have been removed,
avoiding tools that have been re-added, and hallucinating output (made-up
answers or tool-call-shaped non-tool-calls) when tools are unavailable.
(PR [#4404](https://github.com/pipecat-ai/pipecat/pull/4404))
- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in
`pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the
inference-triggered event and suppresses `on_user_turn_stopped`, leaving
finalization to another strategy in the chain such as
`LLMTurnCompletionUserTurnStopStrategy`.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop`
a generic stop strategy that finalizes the user turn whenever a
`UserTurnInferenceCompletedFrame` arrives, regardless of which component
produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base;
future producers (Flux, custom end-of-turn classifiers, etc.) can use the
base directly or subclass it to add producer-specific setup.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added `on_user_turn_inference_triggered`, a new event on the user turn
controller, processor, aggregator and stop strategies that fires when a
strategy has enough signal to start LLM inference. By default it fires
together with `on_user_turn_stopped`; a gating strategy can fire only the
inference-triggered event and defer finalization to a peer.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added `FilterIncompleteUserTurnStrategies` in
`pipecat.turns.user_turn_strategies` — a `UserTurnStrategies` specialization
that wraps the detector chain with `deferred(...)` and appends
`LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case:
`user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass
`config=UserTurnCompletionConfig(...)` to customize timeouts and prompts.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`.
When installed, the strategy gates `on_user_turn_stopped` on a
`UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by
any component that can judge turn completeness — e.g. the
`UserTurnCompletionLLMServiceMixin` on `✓`). A `finalization_timeout`
provides a safety net if no completion frame ever arrives.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Added first-class RTVI support for the UI Agent Protocol:
- Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server
messages, plus `ui-command` and `ui-task` server-to-client messages, with
paired `*Data` / `*Message` pydantic models.
- Adds built-in command payload models for `Toast`, `Navigate`, `ScrollTo`,
`Highlight`, `Focus`, `Click`, `SetInputValue`, and `SelectText`; matching
default handlers live in `@pipecat-ai/client-react`.
- Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`, `ui-snapshot`,
and `ui-cancel-task` messages.
- Adds five UI pipeline frames, mirroring the `client-message`
frame-and-event pattern: downstream code pushes `RTVIUICommandFrame` /
`RTVIUITaskFrame` for the observer to wrap into outbound `UICommandMessage` /
`UITaskMessage` envelopes, while the processor pushes inbound
`RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and `RTVIUICancelTaskFrame`
alongside `on_ui_message`.
- Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.
(PR [#4407](https://github.com/pipecat-ai/pipecat/pull/4407))
- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore
processor now resolve credentials via the standard boto3 provider chain (EC2
instance profiles, EKS pod roles / IRSA, ECS task roles, SSO,
`~/.aws/credentials`) when explicit credentials and `AWS_*` environment
variables are absent. Services running with IAM roles no longer need to
export static credentials.
(PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))
- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can
bias transcription for both file-based and realtime transcription.
(PR [#4426](https://github.com/pipecat-ai/pipecat/pull/4426))
- Added `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and
`DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum
silence duration before the watchdog sends a silence packet to prevent
dangling turns. The actual threshold is `max(chunk_duration * 2,
watchdog_min_timeout)`, so it also adapts automatically to the audio chunk
size in use.
(PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))
- Added `cancel_on_interruption=False` support for `GeminiLiveLLMService` on
models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini
2.x); the conversation now continues while the tool runs. On models that
don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time
warning explaining the limitation. (Note: an intermittent 1008 error can
occasionally fire on Gemini 2.5 during long-running tool calls; we
auto-reconnect.)
(PR [#4448](https://github.com/pipecat-ai/pipecat/pull/4448))
- Added `NvidiaSageMakerWebsocketSTTService` for streaming speech recognition
using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint.
Produces `InterimTranscriptionFrame` and `TranscriptionFrame` frames, is
VAD-aware, and automatically reconnects on error.
(PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))
- Added NVIDIA Magpie TTS services via AWS SageMaker:
`NvidiaSageMakerHTTPTTSService` (single HTTP invocation, streams raw PCM
back) and `NvidiaSageMakerWebsocketTTSService` (persistent HTTP/2 bidi-stream
with full interruption support via `InterruptibleTTSService`).
(PR [#4464](https://github.com/pipecat-ai/pipecat/pull/4464))
- Added support for `reasoning` configuration on `OpenAIRealtimeLLMService`,
for use with reasoning-capable Realtime models such as `gpt-realtime-2`.
(PR [#4470](https://github.com/pipecat-ai/pipecat/pull/4470))
- Inworld TTS updates:
- Added `delivery_mode` setting (`STABLE`/`BALANCED`/`CREATIVE`) to
`InworldTTSService` and `InworldHttpTTSService`, enabling the
stability-vs-creativity tradeoff in `inworld-tts-2`.
- Added language support to `InworldTTSService` and
`InworldHttpTTSService`. The `language` setting is now forwarded to the API,
and a new `language_to_inworld_language()` helper normalizes Pipecat
`Language` enums to Inworld's BCP-47 locale tags.
(PR [#4473](https://github.com/pipecat-ai/pipecat/pull/4473))
### Changed
- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the
generally available `tts-rt-v1`.
(PR [#4386](https://github.com/pipecat-ai/pipecat/pull/4386))
- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16`
to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the
`use_normalized_timestamps` and `max_buffer_delay_ms` fields.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead
of the deprecated `use_original_timestamps` field. Word timestamps now
reflect what was actually spoken (post text-normalization and
pronunciation-dictionary substitution), matching the convention Pipecat uses
for ElevenLabs. This is a behavior change for `sonic-3` users, who were
previously receiving timestamps tied to the input transcript.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Broadened `tool_resources` to `app_resources` for easy access not just in
tool handlers but in other places like custom `FrameProcessor`s. Three
changes: a rename (`tool_resources``app_resources`), a new `app_resources`
property on `PipelineTask`, and a new `pipeline_task` property on
`FrameProcessor`. Tool handlers now read `params.app_resources`; custom
processors read `self.pipeline_task.app_resources`. The previous
`tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and
`FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit
`DeprecationWarning`s.
(PR [#4395](https://github.com/pipecat-ai/pipecat/pull/4395))
- Lowered the per-message log in
`SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App
messages can be high-frequency and were noisy at debug level; set the loguru
level to `TRACE` to see them again.
(PR [#4397](https://github.com/pipecat-ai/pipecat/pull/4397))
- Changed the default model for `GrokRealtimeLLMService` to
`grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The
previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is
being removed.
(PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))
- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to
`inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`,
`InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing
users can pin the prior model explicitly via the `model`/`tts_model`
argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid
model IDs.
(PR [#4422](https://github.com/pipecat-ai/pipecat/pull/4422))
- Changed the default model for `GrokLLMService` from `grok-3` to
`grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026.
(PR [#4429](https://github.com/pipecat-ai/pipecat/pull/4429))
- `DeepgramFluxSTT` watchdog silence threshold is now dynamic:
`max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms.
This prevents false silence injections when large audio chunks are sent at
lower frequency.
(PR [#4430](https://github.com/pipecat-ai/pipecat/pull/4430))
- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the
turn is complete (on `on_turn_context_completed`) rather than waiting until
all audio has finished playing back. The `isFinal` message from ElevenLabs is
now used to signal `TTSStoppedFrame` and clean up the audio context,
improving turn transition timing.
(PR [#4433](https://github.com/pipecat-ai/pipecat/pull/4433))
- Updated `InworldHttpTTSService` and `InworldTTSService` to use PCM audio
encoding by default, which returns audio bytes without headers.
(PR [#4446](https://github.com/pipecat-ai/pipecat/pull/4446))
- Moved `create_task`, `cancel_task`, the `task_manager` property, and
`setup(task_manager)` up from `FrameProcessor` to `BaseObject`. Custom
`BaseObject` subclasses (turn strategies, controllers, etc.) now inherit
these methods directly instead of reimplementing the task manager wiring.
Owners propagate the task manager to their child `BaseObject`s via `await
child.setup(task_manager)`.
(PR [#4449](https://github.com/pipecat-ai/pipecat/pull/4449))
- Changed the default OpenAI Realtime input audio transcription model from
`gpt-4o-transcribe` to `gpt-realtime-whisper` for both
`OpenAIRealtimeSTTService` and `OpenAIRealtimeLLMService`. The new model does
not accept the `prompt` parameter; if a prompt is supplied alongside
`gpt-realtime-whisper`, it is dropped automatically and a warning is logged.
To keep using prompt hints, explicitly pin `model="gpt-4o-transcribe"` (or
`"gpt-4o-mini-transcribe"`).
(PR [#4450](https://github.com/pipecat-ai/pipecat/pull/4450))
- Updated the default model for `CartesiaTTSService` and
`CartesiaHttpTTSService` from `sonic-3` to `sonic-3.5`.
(PR [#4462](https://github.com/pipecat-ai/pipecat/pull/4462))
- Changed the default model for `OpenAIRealtimeLLMService` from
`gpt-realtime-1.5` to `gpt-realtime-2`.
(PR [#4472](https://github.com/pipecat-ai/pipecat/pull/4472))
### Deprecated
- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use
`user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add
`LLMTurnCompletionUserTurnStopStrategy` to a custom
`user_turn_strategies.stop`) instead. Setting the legacy flag still works for
one release: the aggregator emits a `DeprecationWarning` and rewires the
strategies as if you had passed `FilterIncompleteUserTurnStrategies`
directly.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the
`create_file_resampler()` / `create_stream_resampler()` factories).
Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class
will be removed in Pipecat 2.0 along with the default `resampy` and `numba`
dependencies.
(PR [#4428](https://github.com/pipecat-ai/pipecat/pull/4428))
### Fixed
- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as
`ErrorFrame`s. The latest API emits a `flush_done` per transcript when
server-side buffering is disabled; Pipecat now consumes them silently since
each turn already has its own `context_id`.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`,
`VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance
(e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both
the class and an instance.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200
response — one with the API's error text and a second, less informative
"Unknown error" frame from the outer exception handler. It now pushes a
single frame that includes the HTTP status code and returns cleanly.
(PR [#4390](https://github.com/pipecat-ai/pipecat/pull/4390))
- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally
for user turn stop strategies. It is now only imported when
`default_user_turn_stop_strategies()` is called. This improves startup time
and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning
when the default stop strategies are not used.
(PR [#4393](https://github.com/pipecat-ai/pipecat/pull/4393))
- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was
stored in `Settings` but never sent to xAI, so every session silently fell
back to xAI's server-side default. The model is now passed via the `?model=`
query parameter on the WebSocket URL as xAI's Voice Agent API requires.
(PR [#4401](https://github.com/pipecat-ai/pipecat/pull/4401))
- Fixed `on_user_turn_stopped` firing prematurely when
`filter_incomplete_user_turns` was enabled. The event now fires only after
the LLM confirms the user turn is complete (`✓`); previously the smart-turn
detector's tentative stop was bubbling up before the LLM had a chance to veto
it, causing observers, transcript appenders and UI indicators to receive an
early — and sometimes duplicated — signal.
(PR [#4405](https://github.com/pipecat-ai/pipecat/pull/4405))
- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting
across two assistant messages in the LLM context and not surfacing in
`on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted
at the end of a TTS context now carries a PTS just past the last word so it
can't overtake clock-queued `TTSTextFrame`s in the transport's output, and
`LLMAssistantAggregator` now triggers
`on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the
frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting
transcripts).
(PR [#4414](https://github.com/pipecat-ai/pipecat/pull/4414))
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged
words (e.g. `bookLook`) when using Flash models. Flash often splits sentences
mid-stream into alignment chunks that begin with a real inter-word space, but
the previous fix unconditionally stripped that space from every chunk.
Leading spaces are now stripped only on the first alignment chunk of an
utterance, so subsequent chunks correctly flush partial words across
boundaries.
(PR [#4415](https://github.com/pipecat-ai/pipecat/pull/4415))
- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor
erroring out when only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`
was set in the environment. The half-populated kwargs are no longer forwarded
to aioboto3; partial env-var configurations now fall through to the boto3
credential chain like fully-unset configurations do.
(PR [#4416](https://github.com/pipecat-ai/pipecat/pull/4416))
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing
romanized/normalized text to the LLM context. With non-Latin input (e.g.,
Chinese), the assistant transcript was getting populated with pinyin (`Ni Hao
!` instead of `你好!`), which then degraded subsequent LLM turns. The services
now consume `alignment` by default and only switch to `normalizedAlignment` /
`normalized_alignment` when `pronunciation_dictionary_locators` is configured
(where `alignment` has overlapping restarts that produce duplicated/garbled
words, per #4316). Both fields are read with preferred-with-fallback
semantics since each is nullable per the API schema.
(PR [#4424](https://github.com/pipecat-ai/pipecat/pull/4424))
- Fixed a deadlock in `TTSService` that could permanently stall pipeline
processing when all three conditions occurred together:
`pause_frame_processing=True`, an interruption arrived before any TTS audio
was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`,
`FunctionCallResultFrame`) was in the processing queue at that moment. The
process task would block on `__process_event.wait()` indefinitely because
`BotStoppedSpeakingFrame` never arrives (no audio was played) and the
interruption handler did not resume processing. Affects services using
`pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and
ResembleAI.
(PR [#4431](https://github.com/pipecat-ai/pipecat/pull/4431))
- Fixed interruptions being delayed when a slow non-uninterruptible frame was
processing and an uninterruptible frame was waiting in the queue. The bot
would stall until the slow frame finished instead of cancelling it
immediately on interruption.
(PR [#4434](https://github.com/pipecat-ai/pipecat/pull/4434))
- Fixed `TTSService` dropping uninterruptible frames (e.g.
`FunctionCallResultFrame`) from its internal serialization queue when an
interruption occurs. Previously, the queue was recreated on every
interruption, silently discarding any queued frames. The queue is now reset
instead of recreated, preserving uninterruptible frames so they are always
delivered downstream.
(PR [#4435](https://github.com/pipecat-ai/pipecat/pull/4435))
- Fixed a race condition in the Daily transport that caused `AttributeError:
'NoneType' object has no attribute 'send_app_message'` when tearing down a
pipeline. Both `DailyInputTransport` and `DailyOutputTransport` share the
same `DailyTransportClient` and both call `cleanup()`, which was releasing
the underlying `CallClient` on the first call — leaving the second caller
with a `None` client.
(PR [#4440](https://github.com/pipecat-ai/pipecat/pull/4440))
- Restored `cancel_on_interruption=False` support for `AWSNovaSonicLLMService`
and `OpenAIRealtimeLLMService`. These services previously honored the flag by
simply not cancelling in-flight function calls on interruption; the
introduction of the new async-tool mechanism (which threads
started/intermediate/final messages through the LLM context) broke that path
because the realtime services didn't know how to interpret those messages.
Note that new-style streamed intermediate results
(`FunctionCallResultProperties(is_final=False)`) are not supported on these
realtime services. Similar fixes for other impacted realtime services are
forthcoming.
(PR [#4441](https://github.com/pipecat-ai/pipecat/pull/4441))
- Fixed two misspelled Gemini TTS voice names in
`GeminiTTSService.AVAILABLE_VOICES`.
(PR [#4443](https://github.com/pipecat-ai/pipecat/pull/4443))
- Extended the `cancel_on_interruption=False` regression fix to
`GrokRealtimeLLMService`, `AzureRealtimeLLMService`, and
`UltravoxRealtimeLLMService`. Grok and Azure use the same approach as in
#4441 (each service detects async-tool messages in the LLM context and routes
the final result to its formal tool-result channel; Azure inherits
transitively from `OpenAIRealtimeLLMService`). Ultravox needed a different
approach because its API freezes the conversation between
`client_tool_invocation` and the matching `client_tool_result` — for
async-registered functions it now ships a placeholder `client_tool_result`
immediately when the function is invoked (to unfreeze the conversation), then
injects the real result as user-side text once the tool finishes. Streamed
intermediate results (`FunctionCallResultProperties(is_final=False)`) are
still not supported on any of these realtime services. `GeminiLiveLLMService`
and `InworldRealtimeLLMService` are excluded for now: Gemini Live's
async-tool path needs deeper investigation, and Inworld tool calling needs to
be sorted out first.
(PR [#4447](https://github.com/pipecat-ai/pipecat/pull/4447))
- Fixed `OpenAIRealtimeLLMService` handling of multi-output-item responses
(observed with `gpt-realtime-2`). A single response can now contain more than
one audio item, and the first item's `audio.done` may arrive after the second
item's deltas have started. Deltas still arrive strictly in playback order,
so we continue to forward them as received (matching OpenAI's reference
implementation). The fix removes spurious warnings, ensures truncation always
targets the latest audio item, and emits a single bracketing
`TTSStartedFrame`/`TTSStoppedFrame` pair per assistant turn (the Stopped is
now pushed on `response.done`).
(PR [#4465](https://github.com/pipecat-ai/pipecat/pull/4465))
- Fixed missing `output` attribute on LLM OpenTelemetry spans when the LLM call
is interrupted mid-stream.
(PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
- Fixed incorrect `metrics.ttfb` on STT OpenTelemetry spans, and parented them
to the current turn span.
(PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
- Fixed incorrect `metrics.ttfb` on TTS OpenTelemetry spans for streaming
services.
(PR [#4467](https://github.com/pipecat-ai/pipecat/pull/4467))
- Extended the `cancel_on_interruption=False` regression fix to
`InworldRealtimeLLMService`. Uses the same approach as in #4441 (the service
detects async-tool messages in the LLM context and routes the final result to
its formal tool-result channel). Note: as of this writing, Inworld Realtime
doesn't appear to handle the resulting delayed tool result reliably — the
routing is best-effort and the service surfaces a one-time warning when
async-tool messages are seen. Streamed intermediate results
(`FunctionCallResultProperties(is_final=False)`) are still not supported on
this realtime service. (Inworld was excluded from #4447 pending resolution of
an unrelated tool-calling issue, which turned out to be an account-level
matter.)
(PR [#4474](https://github.com/pipecat-ai/pipecat/pull/4474))
- Fixed Cartesia TTS Korean word timestamps to use normal spacing rules,
preserving word boundaries and per-word timestamp alignment during downstream
aggregation.
(PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))
- Fixed Cartesia TTS Chinese and Japanese timestamp grouping to preserve
provider text spacing, avoiding artificial spaces when timestamp groups are
reassembled downstream.
(PR [#4475](https://github.com/pipecat-ai/pipecat/pull/4475))
- Fixed `SonioxSTTService` final transcription frames missing detected language
metadata when Soniox returns token-level language annotations.
(PR [#4482](https://github.com/pipecat-ai/pipecat/pull/4482))
- Fixed Soniox final transcription language detection to use the most common
recognized token language, avoiding mislabeling an utterance when the last
token is tagged with a different language.
(PR [#4495](https://github.com/pipecat-ai/pipecat/pull/4495))
- Fixed dropped audio in streaming TTS services whose wire protocol doesn't
echo `context_id` back on incoming audio (Sarvam, Smallest, Soniox, Inworld,
and others). Previously, audio that arrived between contexts or at the very
start of a turn was tagged with `context_id=None` and silently dropped with
an "unable to append audio to context: no context ID provided" debug log.
`TTSService.get_active_audio_context_id()` now falls back to the
synthesis-side `_turn_context_id` when the playback cursor isn't set yet.
(PR [#4497](https://github.com/pipecat-ai/pipecat/pull/4497))
### Security
- Fixed a path traversal issue in the development runner's
`/files/{filename:path}` download endpoint. Previously, when the runner was
started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could
escape the configured folder because `%2F`-encoded separators bypassed
Starlette's path normalisation. The endpoint now resolves the joined path and
rejects any filename that escapes the allowed base with a 403, and also
returns 404 (instead of an implicit `null` 200) when `--folder` is unset.
(PR [#4417](https://github.com/pipecat-ai/pipecat/pull/4417))
## [1.1.0] - 2026-04-27
### Added

View File

@@ -92,10 +92,10 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Category | Services |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Inception](https://docs.pipecat.ai/api-reference/server/services/llm/inception), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |

1
changelog/4052.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `VonageVideoConnectorTransport`, a new transport integration for real-time Vonage WebRTC sessions using the Vonage Video Connector library.

1
changelog/4306.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.

View File

@@ -0,0 +1 @@
- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.

View File

@@ -0,0 +1 @@
- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.

View File

@@ -0,0 +1 @@
- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.

1
changelog/4380.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.

View File

@@ -1 +0,0 @@
- Added a `session_id` field to `RunnerArguments` so bots can log or trace a per-session identifier in local development the same way they can in Pipecat Cloud. The development runner now mints a UUID at every construction site, and paths that already returned a `sessionId` to the caller (Daily `/start`, dial-in webhook) share that same UUID with the runner args instead of generating two. The SmallWebRTC `/api/offer` endpoint also accepts an optional `session_id` query parameter so the `/sessions/{session_id}/...` proxy can thread it through.

View File

@@ -1 +0,0 @@
- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the generally available `tts-rt-v1`.

View File

@@ -1 +0,0 @@
- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService` for controlling Cartesia's server-side text buffering. When unset, Pipecat picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE` mode (custom buffering — avoids stacking client-side aggregation on top of Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode (Cartesia's managed buffering applies). Pass an explicit value (05000ms) to override.

View File

@@ -1 +0,0 @@
- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16` to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the `use_normalized_timestamps` and `max_buffer_delay_ms` fields.

View File

@@ -1 +0,0 @@
- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead of the deprecated `use_original_timestamps` field. Word timestamps now reflect what was actually spoken (post text-normalization and pronunciation-dictionary substitution), matching the convention Pipecat uses for ElevenLabs. This is a behavior change for `sonic-3` users, who were previously receiving timestamps tied to the input transcript.

View File

@@ -1 +0,0 @@
- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200 response — one with the API's error text and a second, less informative "Unknown error" frame from the outer exception handler. It now pushes a single frame that includes the HTTP status code and returns cleanly.

View File

@@ -1 +0,0 @@
- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`, `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both the class and an instance.

View File

@@ -1 +0,0 @@
- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as `ErrorFrame`s. The latest API emits a `flush_done` per transcript when server-side buffering is disabled; Pipecat now consumes them silently since each turn already has its own `context_id`.

View File

@@ -1 +0,0 @@
- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally for user turn stop strategies. It is now only imported when `default_user_turn_stop_strategies()` is called. This improves startup time and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning when the default stop strategies are not used.

View File

@@ -1 +0,0 @@
- Broadened `tool_resources` to `app_resources` for easy access not just in tool handlers but in other places like custom `FrameProcessor`s. Three changes: a rename (`tool_resources``app_resources`), a new `app_resources` property on `PipelineTask`, and a new `pipeline_task` property on `FrameProcessor`. Tool handlers now read `params.app_resources`; custom processors read `self.pipeline_task.app_resources`. The previous `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit `DeprecationWarning`s.

View File

@@ -1 +0,0 @@
- Lowered the per-message log in `SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App messages can be high-frequency and were noisy at debug level; set the loguru level to `TRACE` to see them again.

View File

@@ -1 +0,0 @@
- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model Improvement Program. When set, the value is forwarded to Deepgram as a query parameter on the speak request. Defaults to `None`, which preserves the existing behavior. See https://dpgr.am/deepgram-mip for pricing implications before enabling.

View File

@@ -1 +0,0 @@
- Changed the default model for `GrokRealtimeLLMService` to `grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is being removed.

View File

@@ -1 +0,0 @@
- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was stored in `Settings` but never sent to xAI, so every session silently fell back to xAI's server-side default. The model is now passed via the `?model=` query parameter on the WebSocket URL as xAI's Voice Agent API requires.

View File

@@ -1 +0,0 @@
- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that appends a developer-role message to the context whenever `LLMSetToolsFrame` changes the set of advertised standard tools. Helps the LLM stay coherent across mid-conversation tool changes, mitigating several flavors of tool-call-related hallucination: calling tools that have been removed, avoiding tools that have been re-added, and hallucinating output (made-up answers or tool-call-shaped non-tool-calls) when tools are unavailable.

View File

@@ -1 +0,0 @@
- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`. When installed, the strategy gates `on_user_turn_stopped` on a `UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by any component that can judge turn completeness — e.g. the `UserTurnCompletionLLMServiceMixin` on `✓`). A `finalization_timeout` provides a safety net if no completion frame ever arrives.

View File

@@ -1 +0,0 @@
- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in `pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the inference-triggered event and suppresses `on_user_turn_stopped`, leaving finalization to another strategy in the chain such as `LLMTurnCompletionUserTurnStopStrategy`.

View File

@@ -1 +0,0 @@
- Added `FilterIncompleteUserTurnStrategies` in `pipecat.turns.user_turn_strategies` — a `UserTurnStrategies` specialization that wraps the detector chain with `deferred(...)` and appends `LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case: `user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass `config=UserTurnCompletionConfig(...)` to customize timeouts and prompts.

View File

@@ -1 +0,0 @@
- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop` — a generic stop strategy that finalizes the user turn whenever a `UserTurnInferenceCompletedFrame` arrives, regardless of which component produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base; future producers (Flux, custom end-of-turn classifiers, etc.) can use the base directly or subclass it to add producer-specific setup.

View File

@@ -1 +0,0 @@
- Added `on_user_turn_inference_triggered`, a new event on the user turn controller, processor, aggregator and stop strategies that fires when a strategy has enough signal to start LLM inference. By default it fires together with `on_user_turn_stopped`; a gating strategy can fire only the inference-triggered event and defer finalization to a peer.

View File

@@ -1 +0,0 @@
- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use `user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add `LLMTurnCompletionUserTurnStopStrategy` to a custom `user_turn_strategies.stop`) instead. Setting the legacy flag still works for one release: the aggregator emits a `DeprecationWarning` and rewires the strategies as if you had passed `FilterIncompleteUserTurnStrategies` directly.

View File

@@ -1 +0,0 @@
- Fixed `on_user_turn_stopped` firing prematurely when `filter_incomplete_user_turns` was enabled. The event now fires only after the LLM confirms the user turn is complete (`✓`); previously the smart-turn detector's tentative stop was bubbling up before the LLM had a chance to veto it, causing observers, transcript appenders and UI indicators to receive an early — and sometimes duplicated — signal.

View File

@@ -1,6 +0,0 @@
- Added first-class RTVI support for the UI Agent Protocol:
- Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server messages, plus `ui-command` and `ui-task` server-to-client messages, with paired `*Data` / `*Message` pydantic models.
- Adds built-in command payload models for `Toast`, `Navigate`, `ScrollTo`, `Highlight`, `Focus`, `Click`, `SetInputValue`, and `SelectText`; matching default handlers live in `@pipecat-ai/client-react`.
- Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`, `ui-snapshot`, and `ui-cancel-task` messages.
- Adds five UI pipeline frames, mirroring the `client-message` frame-and-event pattern: downstream code pushes `RTVIUICommandFrame` / `RTVIUITaskFrame` for the observer to wrap into outbound `UICommandMessage` / `UITaskMessage` envelopes, while the processor pushes inbound `RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and `RTVIUICancelTaskFrame` alongside `on_ui_message`.
- Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.

View File

@@ -1 +0,0 @@
- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting across two assistant messages in the LLM context and not surfacing in `on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted at the end of a TTS context now carries a PTS just past the last word so it can't overtake clock-queued `TTSTextFrame`s in the transport's output, and `LLMAssistantAggregator` now triggers `on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting transcripts).

View File

@@ -1 +0,0 @@
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged words (e.g. `bookLook`) when using Flash models. Flash often splits sentences mid-stream into alignment chunks that begin with a real inter-word space, but the previous fix unconditionally stripped that space from every chunk. Leading spaces are now stripped only on the first alignment chunk of an utterance, so subsequent chunks correctly flush partial words across boundaries.

View File

@@ -1 +0,0 @@
- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor now resolve credentials via the standard boto3 provider chain (EC2 instance profiles, EKS pod roles / IRSA, ECS task roles, SSO, `~/.aws/credentials`) when explicit credentials and `AWS_*` environment variables are absent. Services running with IAM roles no longer need to export static credentials.

View File

@@ -1 +0,0 @@
- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor erroring out when only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` was set in the environment. The half-populated kwargs are no longer forwarded to aioboto3; partial env-var configurations now fall through to the boto3 credential chain like fully-unset configurations do.

View File

@@ -1 +0,0 @@
- Fixed a path traversal issue in the development runner's `/files/{filename:path}` download endpoint. Previously, when the runner was started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could escape the configured folder because `%2F`-encoded separators bypassed Starlette's path normalisation. The endpoint now resolves the joined path and rejects any filename that escapes the allowed base with a 403, and also returns 404 (instead of an implicit `null` 200) when `--folder` is unset.

View File

@@ -1 +0,0 @@
- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to `inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`, `InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing users can pin the prior model explicitly via the `model`/`tts_model` argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid model IDs.

1
changelog/4423.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.

View File

@@ -1 +0,0 @@
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing romanized/normalized text to the LLM context. With non-Latin input (e.g., Chinese), the assistant transcript was getting populated with pinyin (`Ni Hao !` instead of `你好!`), which then degraded subsequent LLM turns. The services now consume `alignment` by default and only switch to `normalizedAlignment` / `normalized_alignment` when `pronunciation_dictionary_locators` is configured (where `alignment` has overlapping restarts that produce duplicated/garbled words, per #4316). Both fields are read with preferred-with-fallback semantics since each is nullable per the API schema.

View File

@@ -1 +0,0 @@
- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can bias transcription for both file-based and realtime transcription.

View File

@@ -1 +0,0 @@
- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the `create_file_resampler()` / `create_stream_resampler()` factories). Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class will be removed in Pipecat 2.0 along with the default `resampy` and `numba` dependencies.

View File

@@ -1 +0,0 @@
- Changed the default model for `GrokLLMService` from `grok-3` to `grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026.

View File

@@ -1 +0,0 @@
- Added `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and `DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum silence duration before the watchdog sends a silence packet to prevent dangling turns. The actual threshold is `max(chunk_duration * 2, watchdog_min_timeout)`, so it also adapts automatically to the audio chunk size in use.

View File

@@ -1 +0,0 @@
- `DeepgramFluxSTT` watchdog silence threshold is now dynamic: `max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms. This prevents false silence injections when large audio chunks are sent at lower frequency.

View File

@@ -1 +0,0 @@
- Fixed a deadlock in `TTSService` that could permanently stall pipeline processing when all three conditions occurred together: `pause_frame_processing=True`, an interruption arrived before any TTS audio was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`, `FunctionCallResultFrame`) was in the processing queue at that moment. The process task would block on `__process_event.wait()` indefinitely because `BotStoppedSpeakingFrame` never arrives (no audio was played) and the interruption handler did not resume processing. Affects services using `pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and ResembleAI.

View File

@@ -1 +0,0 @@
- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the turn is complete (on `on_turn_context_completed`) rather than waiting until all audio has finished playing back. The `isFinal` message from ElevenLabs is now used to signal `TTSStoppedFrame` and clean up the audio context, improving turn transition timing.

View File

@@ -1 +0,0 @@
- Fixed interruptions being delayed when a slow non-uninterruptible frame was processing and an uninterruptible frame was waiting in the queue. The bot would stall until the slow frame finished instead of cancelling it immediately on interruption.

View File

@@ -1 +0,0 @@
- Fixed `TTSService` dropping uninterruptible frames (e.g. `FunctionCallResultFrame`) from its internal serialization queue when an interruption occurs. Previously, the queue was recreated on every interruption, silently discarding any queued frames. The queue is now reset instead of recreated, preserving uninterruptible frames so they are always delivered downstream.

View File

@@ -1 +0,0 @@
- Fixed a race condition in the Daily transport that caused `AttributeError: 'NoneType' object has no attribute 'send_app_message'` when tearing down a pipeline. Both `DailyInputTransport` and `DailyOutputTransport` share the same `DailyTransportClient` and both call `cleanup()`, which was releasing the underlying `CallClient` on the first call — leaving the second caller with a `None` client.

View File

@@ -1 +0,0 @@
- Restored `cancel_on_interruption=False` support for `AWSNovaSonicLLMService` and `OpenAIRealtimeLLMService`. These services previously honored the flag by simply not cancelling in-flight function calls on interruption; the introduction of the new async-tool mechanism (which threads started/intermediate/final messages through the LLM context) broke that path because the realtime services didn't know how to interpret those messages. Note that new-style streamed intermediate results (`FunctionCallResultProperties(is_final=False)`) are not supported on these realtime services. Similar fixes for other impacted realtime services are forthcoming.

View File

@@ -0,0 +1 @@
- Added `GET /status` endpoint to the development runner that reports which transports the running instance accepts (all by default, or the single transport passed via `-t`).

1
changelog/4442.added.md Normal file
View File

@@ -0,0 +1 @@
- Added plain WebSocket transport support to the development runner. Bots can now accept connections from non-telephony WebSocket clients (e.g., browser apps using protobuf framing) via the `/ws-client` endpoint alongside other transports.

View File

@@ -0,0 +1 @@
- ⚠️ The development runner now supports all transports (WebRTC, Daily, telephony, plain WebSocket) simultaneously from a single server. The `/start` endpoint accepts a `"transport"` field to select the transport per-request; omitting `-t` at startup enables all transports instead of defaulting to WebRTC. The Daily browser-redirect route moved from `GET /` to `GET /daily`.

View File

@@ -1 +0,0 @@
- Fixed two misspelled Gemini TTS voice names in `GeminiTTSService.AVAILABLE_VOICES`.

View File

@@ -1 +0,0 @@
- Updated `InworldHttpTTSService` and `InworldTTSService` to use PCM audio encoding by default, which returns audio bytes without headers.

View File

@@ -1 +0,0 @@
- Extended the `cancel_on_interruption=False` regression fix to `GrokRealtimeLLMService`, `AzureRealtimeLLMService`, and `UltravoxRealtimeLLMService`. Grok and Azure use the same approach as in #4441 (each service detects async-tool messages in the LLM context and routes the final result to its formal tool-result channel; Azure inherits transitively from `OpenAIRealtimeLLMService`). Ultravox needed a different approach because its API freezes the conversation between `client_tool_invocation` and the matching `client_tool_result` — for async-registered functions it now ships a placeholder `client_tool_result` immediately when the function is invoked (to unfreeze the conversation), then injects the real result as user-side text once the tool finishes. Streamed intermediate results (`FunctionCallResultProperties(is_final=False)`) are still not supported on any of these realtime services. `GeminiLiveLLMService` and `InworldRealtimeLLMService` are excluded for now: Gemini Live's async-tool path needs deeper investigation, and Inworld appears to have a pre-existing problem with even simple tool calling on its Realtime API.

View File

@@ -1 +0,0 @@
- Added `cancel_on_interruption=False` support for `GeminiLiveLLMService` on models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini 2.x); the conversation now continues while the tool runs. On models that don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time warning explaining the limitation. (Note: an intermittent 1008 error can occasionally fire on Gemini 2.5 during long-running tool calls; we auto-reconnect.)

View File

@@ -1 +0,0 @@
- Moved `create_task`, `cancel_task`, the `task_manager` property, and `setup(task_manager)` up from `FrameProcessor` to `BaseObject`. Custom `BaseObject` subclasses (turn strategies, controllers, etc.) now inherit these methods directly instead of reimplementing the task manager wiring. Owners propagate the task manager to their child `BaseObject`s via `await child.setup(task_manager)`.

View File

@@ -1 +0,0 @@
- Changed the default OpenAI Realtime input audio transcription model from `gpt-4o-transcribe` to `gpt-realtime-whisper` for both `OpenAIRealtimeSTTService` and `OpenAIRealtimeLLMService`. The new model does not accept the `prompt` parameter; if a prompt is supplied alongside `gpt-realtime-whisper`, it is dropped automatically and a warning is logged. To keep using prompt hints, explicitly pin `model="gpt-4o-transcribe"` (or `"gpt-4o-mini-transcribe"`).

View File

@@ -1 +0,0 @@
- Updated the default model for `CartesiaTTSService` and `CartesiaHttpTTSService` from `sonic-3` to `sonic-3.5`.

View File

@@ -1 +0,0 @@
- Added NVIDIA Magpie TTS services via AWS SageMaker: `NvidiaSageMakerHTTPTTSService` (single HTTP invocation, streams raw PCM back) and `NvidiaSageMakerWebsocketTTSService` (persistent HTTP/2 bidi-stream with full interruption support via `InterruptibleTTSService`).

View File

@@ -1 +0,0 @@
- Added `NvidiaSageMakerWebsocketSTTService` for streaming speech recognition using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint. Produces `InterimTranscriptionFrame` and `TranscriptionFrame` frames, is VAD-aware, and automatically reconnects on error.

View File

@@ -1 +0,0 @@
- Fixed `OpenAIRealtimeLLMService` handling of multi-output-item responses (observed with `gpt-realtime-2`). A single response can now contain more than one audio item, and the first item's `audio.done` may arrive after the second item's deltas have started. Deltas still arrive strictly in playback order, so we continue to forward them as received (matching OpenAI's reference implementation). The fix removes spurious warnings, ensures truncation always targets the latest audio item, and emits a single bracketing `TTSStartedFrame`/`TTSStoppedFrame` pair per assistant turn (the Stopped is now pushed on `response.done`).

View File

@@ -1 +0,0 @@
- Added support for `reasoning` configuration on `OpenAIRealtimeLLMService`, for use with reasoning-capable Realtime models such as `gpt-realtime-2`.

View File

@@ -1 +0,0 @@
- Changed the default model for `OpenAIRealtimeLLMService` from `gpt-realtime-1.5` to `gpt-realtime-2`.

View File

@@ -1 +0,0 @@
- Added `wait_for_transcript_to_end_user_turn` on `LLMUserAggregatorParams` for pipelines where local turn detection drives a realtime service like Gemini Live. Set it to False to avoid unnecessary latency from transcript delay — the realtime service consumes user audio directly, so we don't need user transcripts in context before it can respond. The option makes it so that (1) turn strategies do not consider user transcripts, letting the user turn end sooner, and (2) user transcripts are then handled by the aggregator: a simple timer gives it time to gather those transcripts after the user turn ends, and once gathered, the aggregator emits a new `on_user_turn_message_finalized` event with the new user context message. The new event also fires in the default mode (coinciding with `on_user_turn_stopped`), so consumers that want the populated user transcript can subscribe to it uniformly. See `examples/realtime/realtime-gemini-live-local-vad.py` for the full pattern.

1
changelog/4493.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `pipecat.workers`, a worker-based agent framework folded in from the standalone `pipecat-subagents` package. Workers inherit from `BaseWorker`, share a `WorkerBus`, register in a `WorkerRegistry`, and exchange typed work via `@job` handlers. `LLMWorker` and `LLMContextWorker` provide ready-made LLM-driven workers. `PipelineRunner.spawn(worker)` registers fire-and-forget workers alongside the main pipeline worker.

View File

@@ -0,0 +1 @@
- ⚠️ `FrameProcessorSetup.pipeline_worker` and `FunctionCallParams.pipeline_worker` are now mandatory fields, and `FrameProcessor.pipeline_worker` raises if read before `setup()` instead of returning `None`. Real-world code (frame processors set up by `PipelineWorker`, tool handlers invoked by `LLMService`) is unaffected; only callers that construct these dataclasses by hand (typically tests) now have to supply a `pipeline_worker` reference.

View File

@@ -0,0 +1 @@
- `PipelineWorker` now inherits from `BaseWorker`, so every pipeline worker is also a bus participant. It accepts a new optional `bridged=()` parameter that auto-wraps the pipeline with bus edge processors, letting the worker exchange frames with other bridged workers over the shared `WorkerBus`. The bus is supplied by `PipelineRunner` via `worker.attach(registry=..., bus=...)` instead of through the constructor.

1
changelog/4507.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `ElevenLabsSTTService` crashing when `language` was passed as `None`. When `language` is not set, the service now lets ElevenLabs auto-detect the audio language.

1
changelog/4514.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.

1
changelog/4521.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.

View File

@@ -0,0 +1 @@
- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.

View File

@@ -0,0 +1 @@
- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.

View File

@@ -0,0 +1 @@
- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.

View File

@@ -0,0 +1 @@
- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.

1
changelog/4524.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.

1
changelog/4527.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.

View File

@@ -0,0 +1 @@
- Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.

View File

@@ -0,0 +1 @@
- Added `LLMService.append_system_instruction(...)`, which composes durable text onto a user-provided system instruction (alongside the turn-completion and async-tool-cancellation instructions) so it is prepended on every inference and survives context-message resets.

3
changelog/xxxx.added.md Normal file
View File

@@ -0,0 +1,3 @@
- Added `pipecat.workers.ui.UIWorker`, an `LLMContextWorker` that observes and drives a client GUI over the RTVI UI channel: it stores live accessibility snapshots, auto-injects `<ui_state>` into the LLM context before every inference (via the LLM's `on_before_process_frame` hook), dispatches client events to `@on_ui_event` handlers, and sends UI commands (`scroll_to`, `highlight`, `select_text`, `click`, `set_input_value`) back to the client. The optional `ReplyToolMixin` exposes a bundled `reply` tool, and `user_job_group(...)` surfaces fan-out work to the client as cancellable task cards. A native RTVI⇄bus UI bridge is built into `PipelineWorker` (active whenever RTVI is enabled), so no decorator or manual wiring is needed: inbound UI messages are broadcast on the bus as `BusUIEventMessage`, and outbound `BusUICommandMessage` / `BusUITask*` carriers are translated into RTVI frames for the client.
- `UIWorker` auto-injects the UI wire-format guide (`UI_STATE_PROMPT_GUIDE`) into its LLM's system instruction by default, via a `prompt_guide` parameter — pass your own string to override the guide, or `None` to disable. Apps no longer need to concatenate `UI_STATE_PROMPT_GUIDE` into the LLM's `system_instruction` by hand.

View File

@@ -91,6 +91,9 @@ HEYGEN_LIVE_AVATAR_API_KEY=...
HUME_API_KEY=...
HUME_VOICE_ID=...
# Inception
INCEPTION_API_KEY=...
# Inworld
INWORLD_API_KEY=...
@@ -211,6 +214,11 @@ TWILIO_AUTH_TOKEN=...
# Ultravox Realtime
ULTRAVOX_API_KEY=...
# Vonage
VONAGE_APPLICATION_ID=...
VONAGE_SESSION_ID=...
VONAGE_TOKEN=...
# WhatsApp
WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...

View File

@@ -16,7 +16,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, MixerEnableFrame, MixerUpdateSettingsFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -105,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -120,27 +120,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Listening for background sound for a bit...")
await asyncio.sleep(5.0)
logger.info(f"Reducing volume...")
await task.queue_frame(MixerUpdateSettingsFrame({"volume": 0.5}))
await worker.queue_frame(MixerUpdateSettingsFrame({"volume": 0.5}))
await asyncio.sleep(5.0)
logger.info(f"Disabling background sound for a bit...")
await task.queue_frame(MixerEnableFrame(False))
await worker.queue_frame(MixerEnableFrame(False))
await asyncio.sleep(5.0)
logger.info(f"Re-enabling background sound and starting bot...")
await task.queue_frame(MixerEnableFrame(True))
await worker.queue_frame(MixerEnableFrame(True))
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -54,7 +54,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -146,7 +146,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -161,12 +161,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Start recording audio
await audiobuffer.start_recording()
# Start conversation - empty prompt to let LLM follow system instructions
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
# Handler for merged audio
@audiobuffer.event_handler("on_audio_data")
@@ -191,7 +191,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
await save_audio_file(bot_audio, bot_filename, sample_rate, 1)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -20,7 +20,7 @@ from pipecat.frames.frames import (
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.worker import PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -144,7 +144,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@@ -153,17 +153,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frame(TTSSpeakFrame("Hi, I'm listening!"))
await worker.queue_frame(TTSSpeakFrame("Hi, I'm listening!"))
await transport.send_audio(sounds["ding1.wav"])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -26,7 +26,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_context_summarizer import SummaryAppliedEvent
from pipecat.processors.aggregators.llm_response_universal import (
@@ -198,7 +198,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -214,16 +214,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info("Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -24,7 +24,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_context_summarizer import SummaryAppliedEvent
from pipecat.processors.aggregators.llm_response_universal import (
@@ -159,7 +159,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -175,16 +175,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info("Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -26,7 +26,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, LLMSummarizeContextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -133,7 +133,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -149,16 +149,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info("Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -24,7 +24,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_context_summarizer import SummaryAppliedEvent
from pipecat.processors.aggregators.llm_response_universal import (
@@ -159,7 +159,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -175,16 +175,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info("Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -56,7 +56,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, LLMSetToolsFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import NOT_GIVEN, LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -163,7 +163,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
@@ -185,13 +185,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
"=== Phase 1: weather tool REMOVED. Keep asking about the weather "
"to exercise hallucination scenarios. ==="
)
await task.queue_frame(LLMSetToolsFrame(tools=NOT_GIVEN))
await worker.queue_frame(LLMSetToolsFrame(tools=NOT_GIVEN))
elif user_turn_count == READD_AT_TURN - 1:
logger.info(
"=== Phase 2: weather tool RE-ADDED. Ask for the weather again — "
"does the LLM call it, or keep refusing? (THIS IS THE TEST.) ==="
)
await task.queue_frame(LLMSetToolsFrame(tools=weather_tools))
await worker.queue_frame(LLMSetToolsFrame(tools=weather_tools))
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
@@ -209,15 +209,15 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
),
}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info("Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -4,27 +4,27 @@
# SPDX-License-Identifier: BSD 2-Clause License
#
"""Example demonstrating ``PipelineTask(app_resources=...)``.
"""Example demonstrating ``PipelineWorker(app_resources=...)``.
``app_resources`` is an application-defined bag of anything your
application code may want to share across a session: database handles,
HTTP clients, feature flags, per-user state, observability clients,
in-memory caches — whatever fits your app. Pipecat passes it through
untouched and exposes it as ``task.app_resources``, so any code with a
handle on the task can read or mutate it.
untouched and exposes it as ``worker.app_resources``, so any code with a
handle on the worker can read or mutate it.
Two of the convenience aliases exercised below:
- Tool handlers read it from ``FunctionCallParams.app_resources``.
- Custom ``FrameProcessor`` subclasses read it from
``self.pipeline_task.app_resources``.
``self.pipeline_worker.app_resources``.
This example uses two small loggers as stand-ins for that "shared thing":
``ToolCallLogger`` (written from tool handlers) and
``TranscriptionLogger`` (written from a custom ``FrameProcessor`` that
sits in the pipeline). A real app might just as easily pass a Postgres
pool, a Redis client, a Stripe SDK instance, or any combination thereof.
The mechanics shown here — construct once, hand to the task, read it
The mechanics shown here — construct once, hand to the worker, read it
from each site, inspect it after the session — are the same regardless
of what you put in.
@@ -50,7 +50,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import Frame, LLMRunFrame, TranscriptionFrame, TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -131,7 +131,7 @@ class AppResources:
get autocomplete and refactor safety:
- In tools: ``cast(AppResources, params.app_resources)``.
- In custom processors: ``cast(AppResources, self.pipeline_task.app_resources)``.
- In custom processors: ``cast(AppResources, self.pipeline_worker.app_resources)``.
"""
tool_call_logger: ToolCallLogger
@@ -155,8 +155,8 @@ class TranscriptionLoggingProcessor(FrameProcessor):
Demonstrates the second read site for ``app_resources``: any custom
``FrameProcessor`` can reach the same bag every tool handler sees by
going through ``self.pipeline_task.app_resources``. ``pipeline_task``
is ``None`` until the task sets the processor up, so we guard against
going through ``self.pipeline_worker.app_resources``. ``pipeline_worker``
is ``None`` until the worker sets the processor up, so we guard against
that case.
"""
@@ -164,8 +164,8 @@ class TranscriptionLoggingProcessor(FrameProcessor):
"""Forward all frames; log final user transcriptions on the way through."""
await super().process_frame(frame, direction)
if isinstance(frame, TranscriptionFrame) and self.pipeline_task is not None:
resources = cast(AppResources, self.pipeline_task.app_resources)
if isinstance(frame, TranscriptionFrame) and self.pipeline_worker is not None:
resources = cast(AppResources, self.pipeline_worker.app_resources)
resources.transcription_logger.log_transcription(frame.text)
await self.push_frame(frame, direction)
@@ -282,7 +282,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
transcription_logger=transcription_logger,
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -299,16 +299,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
# The session has ended; read whatever state the handlers built up.
logger.info(f"Tool calls logged during session:\n{tool_call_logger.dump()}")

View File

@@ -14,7 +14,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import DataFrame, LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -97,7 +97,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -124,16 +124,18 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
{"role": "developer", "content": "Please introduce yourself to the user."}
)
# Custom frames are pushed in order so they can be used for synchronization purposes.
await task.queue_frames([CustomBeforeProcessFrame(), LLMRunFrame(), CustomAfterPushFrame()])
await worker.queue_frames(
[CustomBeforeProcessFrame(), LLMRunFrame(), CustomAfterPushFrame()]
)
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -15,7 +15,7 @@ from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -130,7 +130,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -149,16 +149,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
groq_context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -21,7 +21,7 @@ from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -141,7 +141,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -160,16 +160,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
evaluator_context.add_message(
{"role": "developer", "content": "Ready to evaluate user messages."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info("Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -17,7 +17,7 @@ from pipecat.frames.frames import (
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -128,7 +128,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -144,16 +144,16 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -14,7 +14,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -95,7 +95,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -112,7 +112,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
# Handle "latency-ping" messages. The client will send app messages that look like
# this:
@@ -128,13 +128,13 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.debug(f"Received latency ping app message: {message}")
ts = message["latency-ping"]["ts"]
# Send immediately
await task.queue_frame(
await worker.queue_frame(
DailyOutputTransportMessageUrgentFrame(
message={"latency-pong-msg-handler": {"ts": ts}}, participant_id=sender
)
)
# And push to the pipeline for the Daily transport.output to send
await task.queue_frame(
await worker.queue_frame(
DailyOutputTransportMessageFrame(
message={"latency-pong-pipeline-delivery": {"ts": ts}},
participant_id=sender,
@@ -146,11 +146,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -14,7 +14,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -99,7 +99,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -111,7 +111,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
logger.info(f"Client connected")
await task.queue_frames(
await worker.queue_frames(
[
TTSSpeakFrame(
text="Hello, welcome to live translation. Everything you say will be automatically translated to Spanish. Let's begin!",
@@ -123,11 +123,11 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

Some files were not shown because too many files have changed in this diff Show More