Restyle from a bespoke dark theme to the light theme the other UI demos
share: the canonical :root tokens (--border, --muted, --highlight), the
#fafafa/#18181b body, the sticky white header with the light/red Connect
button, the fixed bottom-right #status toast, and the amber
ui-highlight-pulse keyframe. index.html drops the custom topbar wrapper for
the standard <header> plus a standalone #status element.
Demonstrates the 'every input acts, may speak' pattern without bridging: a
standard voice pipeline (STT → LLM → TTS) whose LLM only converses, plus a
separate UIWorker that does all the list work. The voice pipeline's user
aggregator fires on_user_turn_stopped each turn and dispatches the transcript
to the UIWorker as a respond job (a bus message); the UIWorker reads the
auto-injected <ui_state> snapshot and drives the list silently via add_item /
set_checked / remove_item commands (plus the standard highlight). Items are
checkboxes whose label and checked state the snapshot exposes.
Includes a vanilla-JS client following the existing UI-demo client style.
Move <ui_state> snapshot injection out of respond_with_llm into a
cross-cutting on_before_process_frame handler on the UIWorker's LLM, so it
appends the current snapshot to the context the request is built from, just
before each inference. Injection is gated to the user-turn-initiating
inference so a tool-calling turn never stacks duplicate <ui_state> blocks;
respond_with_llm no longer injects manually.
Also drop the bridged parameter from UIWorker: there is no viable way to
bridge a UIWorker between workers — a shared, teed context would be polluted
by the injection, and per-worker turn detection off teed frames isn't
supported. Other workers keep their PipelineWorker bridging.
Synthesis example: a ReplyToolMixin UIWorker adds a start_review tool that fans
out to clarity/tone peers via start_user_job_group, translates each reviewer
response into an add_note command in on_job_response, handles a client
note_click event via @on_ui_event, and keeps history across turns.
A UIWorker with a custom reply tool fans research out to three BaseWorker peers
via start_user_job_group; their progress streams to the client as ui-task cards
and the user can cancel a group mid-flight.
A ReplyToolMixin UIWorker that fills inputs (fills) and toggles checkboxes /
presses submit (click) by voice — the state-changing half of the standard
action set.
A ReplyToolMixin UIWorker that grounds in the user's text selection (the
<selection> block in the snapshot) and points back via select_text — both
directions of deictic reference.
The voice LLM delegates to a ReplyToolMixin UIWorker that scrolls offscreen
items into view and highlights the phones it names — exercising the scroll_to /
highlight UI commands and the [offscreen] state tag.
Smallest UIWorker demo: a voice LLM in the main pipeline delegates
screen-relevant utterances to a UIWorker via a respond job; the UIWorker
auto-injects the current <ui_state> and answers grounded in what's on screen.
Includes a vanilla-JS client that streams accessibility snapshots over RTVI.
UIWorker is an LLMContextWorker that observes and drives a client GUI over the
RTVI UI channel: it stores accessibility snapshots, auto-injects <ui_state> at
the start of each respond job, dispatches client events to @on_ui_event
handlers, sends UI commands back to the client, and surfaces fan-out work as
cancellable task cards via user_job_group(). The optional ReplyToolMixin exposes
a bundled reply tool.
The prompt_guide parameter auto-appends the UI wire-format guide to the LLM's
system instruction (default UI_STATE_PROMPT_GUIDE; override with a string or
disable with None), so the LLM can parse the injected <ui_state> / <ui_event>
messages without the app concatenating the guide by hand.
When RTVI is enabled, PipelineWorker now republishes inbound ui-event /
ui-snapshot / ui-cancel-task messages onto the bus as a broadcast
BusUIEventMessage, and translates outbound BusUICommandMessage / BusUITask*
carriers into the matching RTVI frames. This lets a UIWorker on the bus observe
and drive the client UI with no decorator or manual wiring; when no UIWorker is
present the events are simply unconsumed.
The BusUI* carriers live in the bus layer so both pipeline and workers can
reference them without an import cycle.
Composes durable text onto a user-provided system instruction (alongside the
turn-completion and async-tool-cancellation addons) so it is prepended on every
inference and survives context-message resets. The user's base prompt is now
snapshotted once and the effective instruction is always rebuilt from it,
replacing the prior lazy capture/restore logic with a single invariant.
Lets callers register multiple workers in a single call instead of
awaiting add_worker() repeatedly. Updates all examples, docs, tests,
and proxy worker docstrings to use the new API.
PipelineWorker.__init__ was only forwarding `name` to BaseWorker, so
the `active` flag (the other BaseWorker constructor arg) wasn't
reachable from PipelineWorker callers. Add `active: bool = True` to
the signature and pass it through.
BusMessage was a mixin tacked onto DataFrame / SystemFrame so the bus
could reuse the frame priority machinery. That made every bus message
also a Frame, which is misleading — bus messages travel on the bus, not
through pipelines. If a worker actually needs to ship a frame, it wraps
it in BusFrameMessage.
BusMessage is now a plain dataclass base carrying source/target.
BusDataMessage and BusSystemMessage are empty subclasses that exist
only as priority markers. The bus router and the priority queue check
``isinstance(item, BusSystemMessage)`` directly instead of
``isinstance(item, SystemFrame)``.
The serializer test that round-tripped DataFrame.name (a non-init
field) is rewritten against a local _MessageWithNonInit(BusDataMessage)
subclass so the serializer's init=False path stays covered.
Symmetric with send_bus_message; "send_bus_error" on its own reads
ambiguously (sounds like an error about the bus, à la SIGBUS) and the
underlying types are BusTaskErrorMessage / BusTaskLocalErrorMessage,
so keeping "_message" in the name matches what's actually sent.
Mirrors on_bus_message and makes it explicit that the call goes out on
the task bus, not on a transport (transports have their own
send_message for client/peer messaging).
BaseTask.handoff_to was just deactivate_self + activate_task. Remove
it and add a deactivate_self flag on activate_task instead, so there's
one entry point for activating another task.
LLMTask now overrides activate_task (mirroring its end() override) to
keep the messages / result_callback hooks that finish an in-progress
tool call before the target is activated. All multi-task examples and
unit tests switch to the new call.
PipelineRunner now picks up an async setup_pipeline_runner(runner) hook
from the same PIPECAT_SETUP_FILES env var that PipelineTask already uses
for setup_pipeline_task. Previously the runner used a separate
PIPECAT_RUNNER_SETUP_FILES variable and a setup_runner function — both
are removed.
A new _setup_files module hosts the loader for both hooks and caches
each setup file's module so a single file defining both hooks (e.g. a
debugger that registers a runner-level task in one hook and a per-task
observer in the other) sees its module-level state preserved across
invocations.
Rename ``_build_greeter`` / ``_build_support`` to ``build_greeter`` /
``build_support`` to match the convention used by other multi-task
examples (e.g. ``build_sensor_controller``). They're public factories
the example exposes; the leading underscore was misleading.
``BaseObject.create_task`` already auto-names the task based on the
coroutine; the explicit ``f"{self}::..."`` strings duplicated that
default and made the call sites noisier. Remove them.
A voice agent talking to a worker that owns a simulated temperature
sensor. Demonstrates two ``PipelineTask`` instances side by side
communicating purely via ``BusJobRequestMessage`` /
``BusJobResponseMessage`` — the worker is a plain ``PipelineTask``
(no ``LLMTask`` subclassing, not bridged) whose pipeline runs both an
autonomous sensor tick loop and its own tool-calling LLM:
SensorReader -> SensorStats -> user_agg -> llm -> assistant_agg
The voice agent's LLM has a single tool, ``ask_controller(question)``,
that forwards the user's request verbatim to the worker and speaks
back the controller's reply. The worker LLM has direct tools to read
the current temperature, inspect rolling stats, set the target, or
change the response rate; the sensor simulation drifts toward the
target with a first-order lag plus Gaussian noise.
Job responses are paired with completed LLM turns via the assistant
aggregator's ``on_assistant_turn_stopped`` event, skipping empty
turn-stopped events that fire between a tool call and its result.
Switch every example to ``await runner.spawn(task)`` followed by
``await runner.run()`` (no task argument), and ``await runner.cancel()``
on client-disconnected instead of ``await task.cancel()``. This makes
the main pipeline task look the same as the worker / proxy tasks
spawned alongside it, and lets ``runner.cancel()`` drive a uniform
shutdown across every root task on the bus.
Spell out that spawned tasks finishing on their own does not unblock
``runner.run()`` when called without a ``task`` argument. The form is
for hosts (e.g. FastAPI servers) that have no single "main" pipeline
and want to stay up across many spawned sessions; callers who want
the runner to finish when a specific pipeline finishes should pass
that pipeline as ``task``.
``on_activated`` on this task opens the upstream WebSocket connection,
which is almost always something the caller wants to trigger
explicitly (e.g. on local-client-connected). With the BaseTask default
of ``active=True`` the connection was opened twice: once when the task
auto-activated at start, and once again when the caller's
``activate_task("proxy")`` re-fired ``on_activated``. The result on
the remote side was two ``PipelineRunner`` instances per session
instead of one.
Default to ``active=False`` so the activation is a deliberate signal;
pass ``active=True`` explicitly to restore the eager-connect behavior.
``test_pgmq_bus_lazy_import`` and ``test_redis_bus_lazy_import``
import ``pipecat.bus.network.pgmq`` / ``redis`` directly, which raises
when the optional ``pgmq`` / ``redis`` packages are missing. Gate each
test with ``@unittest.skipUnless`` on a top-level probe of the
underlying package so they're skipped (not errored) in environments
without the extras. ``test_unknown_attribute_raises`` is unaffected.
The PGMQ and Redis bus modules raise an ``Exception`` at import time
when the optional ``pgmq`` / ``redis`` packages are missing, which broke
``pytest`` collection in environments without those extras (e.g. CI
that uses ``--no-extra gstreamer --no-extra local``). Wrap the imports
in ``try/except`` and ``raise unittest.SkipTest`` so the whole test
module is skipped cleanly instead of failing collection.
``_wait_tasks_ready`` is only called from
``create_job_group_and_request_job``, so it belongs with the other
job-group internals (``_create_job_group``, ``_send_job_request``,
``_task_timeout``, ...) rather than next to the task-readiness
helpers (``_register_ready``, ``_on_watched_task_ready``).
``BaseTask._handle_task_end`` and ``_handle_task_cancel`` now call
``stop()`` after propagating to children, so bus-only subclasses
(``WebSocketProxyClientTask``, ``WebSocketProxyServerTask``, custom
worker tasks like ``CodeWorker``) don't need to override these
handlers just to set ``_finished_event``.
Children-propagation is extracted into ``_propagate_end_to_children``
and ``_propagate_cancel_to_children`` so ``PipelineTask`` can call
them directly without invoking ``stop()`` prematurely — the pipeline
still drives its own shutdown through the ``EndFrame`` / ``CancelFrame``
path, which triggers ``on_pipeline_finished`` and ``stop()`` after the
pipeline drains.
Drop the now-redundant overrides from the WebSocket proxy tasks.
PipelineTask had its own ``_pipeline_finished_event`` that signalled
"pipeline run has truly finished" — the same role ``BaseTask._finished_event``
plays for bus-only tasks. They were two events with the same intent.
Set ``_finished_event`` directly when the pipeline-end frame propagates
through the sink, drop the now-redundant field, and drop the
``clear()`` after wait so the event stays set for the lifetime of the
task. As a side-effect, ``await pipeline_task.wait()`` from outside now
resolves at the moment the pipeline finishes, matching the semantics
of bus-only tasks.
claude_agent_sdk's _AsyncioTaskHandle.wait() uses
`with suppress(asyncio.CancelledError)` to silence the inner read
task's expected cancellation, but it also swallows the outer task's
cancellation if it lands on the same await — causing cancel_task to
time out.
Bypass `async with ClaudeSDKClient` and drive connect/disconnect
ourselves so disconnect() runs in a finally where the outer
CancelledError has already been raised and suspended by Python's
exception machinery, out of reach of the SDK's suppress.
self.queue_frame would defer the LLMMessagesAppendFrame because
_finish_function_call always runs inside a tool call. The subsequent
_flush_pipeline() then returned before the goodbye/handoff LLM output
was actually delivered. Use super().queue_frame to push the frame
straight into the pipeline, matching the pattern used in
_flush_pipeline().
Brings over 215 tests across 15 files covering the new
multi-task framework: BaseTask / PipelineTask bus lifecycle,
job RPC and job groups, the bus message hierarchy and serializers,
TaskBus + AsyncQueueBus + RedisBus + PgmqBus (with direct and
isolated backends), TaskRegistry, the BusBridgeProcessor, the
WebSocket proxy tasks, the LLMTask deferral logic, and the
PipelineRunner spawn-and-attach flow.
Move the wire-side of PGMQ operations into a new
``pipecat.bus.network.pgmq_backends`` module with a ``PgmqBackend``
Protocol, a ``DirectPgmqBackend`` (peers discovered by queue prefix),
and an ``IsolatedPgmqBackend`` (SECURITY DEFINER ``public.bus_*``
wrappers over an asyncpg pool). ``PgmqBus`` now delegates join,
publish, read, archive, and leave to the configured backend.
Construct ``PgmqBus`` with either ``pgmq=PGMQueue`` (uses
``DirectPgmqBackend``) or ``backend=PgmqBackend`` (any backend); the
two are mutually exclusive.
Adapts the pipecat-subagents `examples/README.md` to the new
layout (`multi-task/` umbrella, `local-handoff/`, `distributed-handoff/`,
`remote-proxy-assistant/`, `parallel-debate/`, `code-assistant/`),
updates the agent→task / job-RPC vocabulary, drops the
single-agent and llm-and-flows examples (gone in the port), and
adds a new section for the PGMQ handoff transport.
`pipecat.bus.network.pgmq` and `pipecat.bus.network.redis` need
optional dependencies. Adding `pgmq` and `redis` extras so users
can `pip install pipecat-ai[pgmq]` / `pip install pipecat-ai[redis]`
to opt in.
Demonstrates the WebSocket proxy tasks: a local `main.py` voice
bot uses `WebSocketProxyClientTask` to forward bus messages
(including `BusFrameMessage`s) to a remote `assistant.py`
FastAPI server. Each incoming connection spawns a
`WebSocketProxyServerTask` plus an `LLMTask` assistant on a
per-session `PipelineRunner`.
Two transports of the same shape: a main task that hosts the
voice pipeline plus a network-backed `TaskBus` (`RedisBus` or
`PgmqBus`), and a standalone `llm.py` worker process for the
greeter / support LLM. Workers connect to the same bus channel,
register on the shared `TaskRegistry`, and the main task waits
on `runner.registry.watch("greeter", ...)` before sending the
welcome activation so it doesn't fire before the worker is up.
A voice moderator that fans out a debate topic to three worker
tasks (advocate, critic, analyst) via `task.job_group(...)`,
then synthesizes their replies. Workers are `LLMContextTask`s
that keep their own conversation context across rounds and use
the assistant-aggregator's `on_assistant_turn_stopped` event
to ship the completed turn back as a job response.
Variant of the local handoff example with per-task TTS voices.
Each child task wraps the LLM with its own `CartesiaTTSService`
in a custom pipeline override, so the main task has no TTS and
audio comes from whichever child is active over the bus.
Voice code assistant that dispatches questions to a Claude Agent
SDK worker. The main task runs the voice pipeline (STT + LLM + TTS)
and an `ask_code` direct function. `CodeWorker` is a bus-only
`BaseTask` spawned on the runner: it accepts `@job`-style
requests through the bus, queues them onto an asyncio queue, and
runs them sequentially through a persistent Claude SDK session so
follow-ups share context. The example shows the job-RPC surface
(`task.job("code_worker", ...)`), bus-only tasks (no pipeline),
and the `pipeline_task` field on `FunctionCallParams`.
Two LLM tasks (greeter and support) handing off to each other over
the local `AsyncQueueBus`. The main task owns the transport
pipeline (STT, TTS, transport I/O) and the child tasks each run
their own LLM behind a `BusBridgeProcessor`. Each child uses
`bridged=()` so `PipelineTask` auto-wraps its pipeline with
the bus edge processors, and `transfer_to_agent` / `end_conversation`
tools demonstrate `handoff_to(...)` and `end(...)`.
- `TaskBus._router_task`: cast the narrowed `SystemFrame` back
to `BusMessage` for the subscriber callback.
- `bus.network.__init__`: expose `PgmqBus` / `RedisBus` to
the type-checker via a TYPE_CHECKING block so `__all__` is
satisfied; runtime path still goes through `__getattr__`.
- `RedisBus`: subscribe through a local before assigning
`self._pubsub`, and `assert self._pubsub is not None` in
the reader loop.
- `BaseTask.on_job_error` accepts
`BusJobResponseMessage | BusJobResponseUrgentMessage` to match
what is dispatched.
- `JobGroupContext.__aexit__` / `JobContext.__aexit__`: assert
`self._group is not None` before `wait()`.
- `@task_ready` collector: type handlers dict as `dict[str, Callable]`
so the `.__name__` read on a duplicate handler typechecks.
- WebSocket proxy client/server: assert the socket is set in
`_receive_loop`, and decode `str` payloads to bytes before
handing them to the serializer.
`WebSocketProxyServerAgent` / `WebSocketProxyClientAgent` are
renamed to `WebSocketProxyServerTask` / `WebSocketProxyClientTask`
and updated for the post-refactor surface:
- Drop `bus=` from the constructor; the bus arrives via
`BaseTask.attach` from the runner.
- Constructor params `agent_name` / `remote_agent_name` /
`local_agent_name` → `task_name` / `remote_task_name` /
`local_task_name` (matching `BusBridgeProcessor`).
- Move setup logic from the now-removed `on_ready` hook into
`start()`; replace `_stop()` overrides with `stop()`.
- Add `_handle_task_end` / `_handle_task_cancel` overrides that
set `_finished_event` so `PipelineRunner._cancel_spawned_tasks`
can drive these bus-only tasks to a clean exit.
- Update the registry-message field reference
(`agents=`/`message.agents` → `tasks=`/`message.tasks`)
and `TaskReadyData.task_name` access.
- Tighten the server's `_send_ws` exception handling to only
catch `WebSocketDisconnect`.
- Update install hints (`pipecat-ai[websockets-base]` for the
client, `starlette` for the server) and refresh docstrings/
examples to use `runner.spawn(...)`.
Cleans up leftover "agent" terminology in module/class/method
docstrings across `pipecat.bus`, `pipecat.registry`,
`pipecat.pipeline`, and `pipecat.tasks.llm`, and renames
job-RPC phrasing ("task request", "task identifier",
"task group execution") to use "job" consistently.
API-visible changes:
- `BusBridgeProcessor(agent_name=, target_agent=)` → `task_name=` /
`target_task=`.
- `@task_ready` decorator's internal marker
`fn.agent_ready_name` → `fn.task_ready_name`.
- `@tool` decorator's internal marker
`fn.is_agent_tool` → `fn.is_llm_tool`.
- `PIPECAT_SUBAGENTS_SETUP_FILES` env var →
`PIPECAT_RUNNER_SETUP_FILES`.
- pgmq/redis bus install hints point at `pipecat-ai\[extra\]`
rather than the old `pipecat-ai-subagents\[extra\]` package.
Fixes a name collision where `_handle_task_cancel` was defined
twice — once for `BusCancelTaskMessage` (task lifecycle) and
once for `BusJobCancelMessage` (job RPC) — the second silently
shadowing the first. Job-side dispatchers are now consistently
named `_handle_job_*` and the internal helpers
`_run_task_handler` / `_send_task_request` become
`_run_job_handler` / `_send_job_request`. Task-lifecycle
handlers (`_handle_task_end`, `_handle_task_cancel`,
`_handle_task_activate`, `_handle_task_deactivate`,
`_handle_task_error`) keep their names.
`PipelineRunner.run(task)` now calls `spawn(task)` first (which
runs `task.attach()`) and lets `_setup_session` start every
registered entry — main and pre-spawned — through the same path,
instead of relying on `spawn`'s post-running fast-path to start
the main task after setup. The two-branch wait stays for the
`task is None` case but reads the runner_task directly off the
freshly-spawned entry.
`BaseTask` no longer takes `bus=` in its constructor. Instead
the runner now hands both the registry and the bus to a task via
`task.attach(registry=..., bus=...)` (called from
`PipelineRunner.spawn()`), and `bus` / `registry` are
properties that raise if accessed before attach. `PipelineTask`,
`LLMTask`, and `LLMContextTask` lose their `bus=` parameters
to match, and `_BusEdgeProcessor` now stores only a task
reference and reads `task.bus` lazily so bridged pipelines work
even though the bus isn't known at construction time.
`FrameProcessorSetup.pipeline_task` is now mandatory and
`FrameProcessor.pipeline_task` raises if accessed before setup
instead of returning `None`. `FunctionCallParams` gains a
required `pipeline_task` field and `LLMService._run_function_call`
populates it (plus reads `app_resources` directly off the
pipeline task). Tests that build a processor or
`FunctionCallParams` outside a real pipeline stub it with a
`SimpleNamespace`.
Adds `pipecat.tasks.llm` with `LLMTask` (LLM pipeline + `@tool`
collection + tool-call deferral via `PipelineFlushFrame`),
`LLMContextTask` (LLM + `LLMContextAggregatorPair`), and the
`@tool` decorator. Also includes `pipecat.tasks.proxy.websocket`
client/server stubs that need a follow-up port to the new
`BaseTask` lifecycle.
`PipelineRunner` now owns the shared `TaskBus` and
`TaskRegistry` and runs all tasks (the main one plus any
spawned ones) through a unified `_start_task` / `_run_task`
background-task path. Adds `spawn(task)` for fire-and-forget
task registration, threads `end()` / `cancel()` through
`BusEndTaskMessage` / `BusCancelTaskMessage` to all root
tasks, and broadcasts/handles `BusTaskRegistryMessage` for
remote-runner discovery. The runner now wires its own
`TaskManager` via `super().setup(...)` so internal
`create_task` calls go through `BaseObject`.
`PipelineTask` now extends `BaseTask` so every pipeline task is
also a bus participant. Adds optional `bus`, `bridged`, and
`exclude_frames` parameters: when `bridged` is set, the user's
pipeline is wrapped with `_BusEdgeProcessor` source/sink edges so
frames are mirrored onto the bus. Bridges pipeline lifecycle
events to `start()`/`stop()`, overrides `_handle_task_end` /
`_handle_task_cancel` to drive the pipeline shutdown, subscribes
to the bus in setup, and exposes the `bridged` property to the
registry. Moves `PipelineTaskParams` here and updates the
matching test import.
Drops the old abstract `BasePipelineTask` and replaces it with
`BaseTask` — the common base for any runtime task. `BaseTask`
subscribes to a `TaskBus`, participates in the shared
`TaskRegistry`, handles activation / deactivation, end / cancel,
and the full `@job` RPC surface (request_job, job, job_group,
send_job_response / update / stream_*, etc.). It ships a default
`run()` for bus-only tasks; subclasses with their own runtime
(e.g. `PipelineTask`) override it.
Adds `JobContext` / `JobGroupContext` async context managers,
the `JobGroup` / `JobGroupEvent` / `JobGroupResponse` /
`JobGroupError` types, the `@job` decorator (with collector),
and the `@task_ready` decorator (with collector). These power
the bus-driven job RPC between tasks.
Adds ``pipecat.bus.network.pgmq.PgmqBus``, a PGMQ-backed
:class:`TaskBus` adapter that implements pub/sub fan-out over
PGMQ's point-to-point queue semantics. Each bus instance owns its
own queue, broadcasts on publish to peers discovered by channel
prefix, and long-polls its queue to dispatch received messages
to local subscribers.
Requires the optional ``pgmq`` extra
(``pip install pipecat-ai[pgmq]``).
Introduces `TaskBus`, the in-process `AsyncQueueBus`, the bus
message hierarchy (lifecycle, jobs, frames, registry), a
priority-aware bus queue, the `BusSubscriber` mixin, and the
`BusBridgeProcessor` / internal `_BusEdgeProcessor` used to
exchange frames between a local pipeline and the bus.
Introduces `TaskRegistry` and the supporting `TaskReadyData`,
`TaskErrorData`, and `TaskRegistryEntry` dataclasses used to track
local and remote tasks discovered through the bus.
Adds InceptionLLMService, an OpenAI-compatible service for Inception's
Mercury-2 diffusion-based reasoning model. Supports reasoning_effort
(instant/low/medium/high) and realtime mode for reduced TTFT.
Adds tests for AggregatedFrameSequencer, WordCompletionTracker, and
word_timestamp_utils (including CJK language scenarios). Updates existing
Cartesia TTS and TTS frame ordering tests to cover the new behaviours.
TTSTextFrame entries were losing their original text structure when word
timestamps were enabled. AggregatedTextFrame now carries a raw_text field with
the original LLM-produced text (including pattern delimiters such as
<card>...</card>). The assistant context receives properly-tagged content
rather than the cleaned words returned by the TTS provider. Also handles words
that straddle two sentence boundaries by splitting and attributing each part
to its correct source frame.
SSML markup (e.g. <spell>, <emotion>, <break>) was leaking into word entries
returned by the Cartesia word-timestamps API. Tags are now stripped before
processing so word-to-text attribution remains accurate when SSML is present
in the TTS input.
Frames sharing the same presentation timestamp were being reordered by the
priority queue. Adds a monotonic counter as a tiebreaker so frames with equal
PTS are always emitted in insertion order, preventing subtle audio/text
sequencing bugs.
Skipped frames (e.g. code blocks filtered via skip_aggregator_types) were
emitted to the assistant context immediately instead of waiting for preceding
spoken frames to finish. Introduces AggregatedFrameSequencer to hold each
frame's slot and flush only after all earlier spoken sentences are complete,
keeping context ordering correct.
The keepalive could fire for a new turn's context before that context's
voice_settings context-init was sent, making the keepalive the context's
first message (no voice_settings) and causing ElevenLabs to reject the
later init with a 1008 policy violation. The keepalive now only targets a
context once its context-init has been sent (tracked in _context_init_sent).
Mirrors the deprecation in ``OpenAITTSService.__init__``: ``instructions``
is now a Settings field. The constructor still accepts it for backward
compatibility but the canonical path is through ``Settings``.
A copy of ``turn-management-filter-incomplete-turns.py`` extended with
a ``get_weather(location)`` direct function. Exercises the path where
the LLM responds to a complete user turn by calling a tool — used to
reproduce (and now verify the fix for) the ``_user_speaking`` gating
bug between filter-incomplete and function calls.
With ``filter_incomplete_user_turns`` enabled, an LLM that responded to
a user turn by calling a tool (without first emitting a ✓ marker)
never finalized the user turn. ``UserStoppedSpeakingFrame`` stayed
deferred, the assistant aggregator kept ``_user_speaking=True``, and
when ``FunctionCallResultFrame`` arrived its ``not self._user_speaking``
gate dropped the context push — the LLM continuation never ran and
the call hung silently.
Broadcast ``UserTurnInferenceCompletedFrame`` on
``FunctionCallsStartedFrame`` (i.e. the moment the LLM commits to a
tool call, before the function dispatches), gated by a new
``_turn_completion_broadcasted`` flag so the ✓ path and the tool-call
path don't both fire. The flag resets in ``_turn_reset`` alongside
the other per-turn state.
Emitting on the start frame rather than ``LLMFullResponseEndFrame``
also shrinks the race window — ``UserStoppedSpeakingFrame`` (a
``SystemFrame``) has the maximum possible head start over the
``FunctionCallResultFrame`` (``DataFrame``) that follows.
Drop the EU-region default from the STT/TTS WebSocket URLs in favor of
the generic api.gradium.ai endpoint, and remove the explicit overrides
from the examples so they pick up the new defaults.
Mirrors the deprecation in ``QwenLLMService.__init__``: ``model`` should
be passed via ``settings=QwenLLMService.Settings(model=...)`` instead of
as a direct constructor arg.
TTS services whose wire protocol does not echo the context_id back on
incoming audio (Sarvam, Smallest, Soniox, Inworld, ...) call
``get_active_audio_context_id()`` to tag each chunk. That accessor
returned only ``_playing_context_id`` — the playback-side cursor set
asynchronously by ``_audio_context_task_handler`` when it pops a context
off the serialization queue.
Result: incoming audio that arrived in the gap between contexts or at
the very start of a turn (before the playback loop popped) had
``context_id=None`` and was dropped with
``unable to append audio to context: no context ID provided``.
Fall back to ``_turn_context_id`` (the synthesis-side cursor, set as
soon as the turn's context is created) so the gap is covered without
prematurely nulling the playback cursor.
- Replace custom LANGUAGE_MAP fallback in language_to_inworld_language with
resolve_language(language, LANGUAGE_MAP, use_base_code=False) to match the
pattern used by other services and restore the unverified-language warning
- Tighten delivery_mode type from str to Literal["STABLE", "BALANCED", "CREATIVE"]
- Update changelog entry to mention delivery_mode and language normalization
traced_llm only attached the aggregated ``output`` attribute to the
span after the wrapped function returned successfully. When the LLM
call was cancelled mid-stream (e.g. interruption during generation),
the accumulated text was discarded — the span had no ``output``.
Moved the attribute assignment into the ``finally`` block alongside
the existing TTFB write so the partial text we already captured via
the patched ``push_frame`` lands on the span regardless of whether
``f`` returned normally, raised, or was cancelled.
@traced_stt had the same root issue as @traced_tts: the span lifetime
was tied to a per-transcript handler call, which doesn't match the
operation we want to trace. Now uses the __set_name__ pattern to
install:
- A push_frame wrapper that drives one STT span per finalized
TranscriptionFrame. The span is anchored at speech start
(VADUserStartedSpeakingFrame.timestamp - start_secs) but lazy-opened
on the first TranscriptionFrame. Opening earlier (on VAD or
UserStartedSpeakingFrame) races with TurnTraceObserver._handle_turn_started,
which runs as a background task via _call_event_handler (sync=False),
so the span would end up parented to the previous turn. Deferring
the open to the first TranscriptionFrame avoids that race because
STT only emits transcripts well after the turn observer has set
the current turn's context.
- A stop_ttfb_metrics wrapper that closes the span on the TTFB-timeout
path (called with end_time != None from stt_service.py:566). The
span is marked stt.timed_out=True and its end_time is pinned to
the timeout's end_time (= _last_transcript_time) so the duration
reflects when STT actually stopped responding, not when the timeout
fired.
Span lifecycle:
- Open: lazy on first TranscriptionFrame of a segment.
- Close (success): finalized=True attaches metrics.ttfb and closes
the span. Multiple finalized transcripts in a single turn produce
multiple spans.
- Close (timeout): stop_ttfb_metrics(end_time=...) closes with
stt.timed_out=True.
- Close (orphan): UserStoppedSpeakingFrame closes any still-open
span with stt.incomplete=True (covers turns where no finalized
transcript and no timeout fired).
No changes required outside service_decorators.py — stt_service.py
and every per-service file are untouched.
Previously @traced_tts scoped the span to the lifetime of run_tts(). For
streaming TTS services run_tts() returns as soon as the synthesis request
is sent, long before audio chunks arrive, so:
- The span duration measured the WebSocket-send time, not synthesis time.
- The first synthesis recorded the WS-send duration as metrics.ttfb (via
the in-progress fallback in FrameProcessorMetrics.ttfb).
- Subsequent syntheses recorded the previous call's TTFB on the current
span (off-by-one).
The decorator now uses a __set_name__ descriptor to wrap the owning
class's setup() at class definition time. setup() installs per-instance
patches on create_audio_context, append_to_audio_context,
remove_audio_context, on_audio_context_completed, and
reset_active_audio_context. These patches own the span lifetime:
- create_audio_context: open span, set baseline attributes.
- append_to_audio_context: record metrics.ttfb on the first
TTSAudioRawFrame (when stop_ttfb_metrics has produced a real value),
end span on appended TTSStoppedFrame.
- on_audio_context_completed: end span on natural completion (handles
services that auto-push TTSStoppedFrame via push_frame, bypassing
append_to_audio_context).
- remove_audio_context: safety net for explicit removal paths.
- reset_active_audio_context: interruption hook (always reached from
_handle_interruption); marks the span tts.interrupted=true only when
nothing else has closed it.
The run_tts wrapper now only attaches per-call attributes (text,
metrics.character_count) to the already-open span. No changes required
in tts_service.py or in any of the per-service files.
Same async-tool routing approach as #4441: detect async-tool messages in
the LLM context, deliver the final result via the formal tool-result
channel.
Caveat: as of this writing, Inworld Realtime doesn't appear to handle
the resulting delayed tool result reliably, so the routing is
best-effort and the service emits a one-time warning when async-tool
messages are seen. Streamed intermediate results remain unsupported.
Also adds function calling to the realtime-inworld.py example, and
softens the Inworld mention in the #4447 changelog now that the
exclusion is being closed.
A single Realtime API response can now contain more than one audio item
(observed with gpt-realtime-2), and the first item's audio.done can
arrive after deltas from the second have started arriving. Deltas still
arrive strictly in playback order across items, so we keep forwarding
them as received — matching OpenAI's reference implementation.
Adjusted OpenAIRealtimeLLMService so a multi-item response is treated as
one continuous TTS turn:
- _handle_evt_audio_delta: on item switch, advance the tracked item in
place (reset total_size) without emitting another TTSStartedFrame.
Truncation now always targets the latest item.
- _handle_evt_audio_done: debug-trace only; no longer pushes
TTSStoppedFrame.
- _handle_evt_response_done: pushes a single TTSStoppedFrame per turn,
bookending the audio with the Started pushed on the first delta.
Added tests covering single-item, overlapping multi-item, non-overlapping
multi-item, and interrupt-during-multi-item (last-item-wins truncation).
`collections.abc.Coroutine` doesn't expose `cr_code`/`co_name`; only
native coroutine objects do. Use `getattr` chains so pyright is happy
and any non-native awaitable falls back to a generic task name instead
of crashing.
TaskObserver previously took a TaskManager in __init__ and reached into
it directly. Since BaseObject now provides task_manager / create_task /
cancel_task, drop the constructor argument and call
`observer.setup(task_manager)` from PipelineTask._setup() before
starting it.
PipelineTask owns its TaskManager but is itself a BaseObject, so it
inherits create_task/cancel_task. Replace the explicit
self._task_manager.create_task(coro, f"{self}::name") call sites with
self.create_task(coro, "name") for consistency with other BaseObject
subclasses.
PipelineTask owns its TaskManager (still constructed in __init__ since
TaskObserver needs it eagerly). Adding the explicit
`await super().setup(self._task_manager)` in `_setup()` formalizes the
BaseObject lifecycle so any future wiring added to BaseObject.setup is
picked up automatically.
PipelineTask owns its TaskManager but is itself a BaseObject, so it
inherits create_task/cancel_task. Replace the explicit
self._task_manager.create_task(coro, f"{self}::name") call sites with
self.create_task(coro, "name") for consistency with other BaseObject
subclasses.
Lift the task manager wiring (`_task_manager`, `task_manager` property,
`create_task`, `cancel_task`, and `setup(task_manager)`) up to
`BaseObject`. Owners propagate the task manager to their child
`BaseObject`s via `await child.setup(task_manager)`, matching the
existing convention.
Removes duplicated `_task_manager` / `task_manager` property / setup
implementations from `FrameProcessor`, `FrameProcessorMetrics`,
`UserIdleController`, `UserTurnController`,
`BaseUserTurnStartStrategy`, and `BaseUserTurnStopStrategy`.
GeminiLiveVertexLLMService overrides _supports_non_blocking_tools to
return False — Vertex AI's Gemini Live endpoint doesn't yet accept the
NON_BLOCKING behavior field on function declarations or the scheduling
field on FunctionResponse, and sending either breaks tool calling.
Effect: function declarations sent to Vertex no longer carry
NON_BLOCKING; FunctionResponses no longer carry scheduling: WHEN_IDLE.
Users registering a function with cancel_on_interruption=False against
Vertex get the same one-time logger.error + push_error the base class
surfaces on Gemini 3.x.
Mirrors the same change applied to AWSNovaSonicLLMService and
OpenAIRealtimeLLMService in #4441 / GrokRealtimeLLMService in #4447:
replaces the implicit "final happens last" pattern in
_process_completed_function_calls with an explicit
`if async_payload.kind == "final":` block, plus a trailing defensive
`continue` so async-tool messages with an unrecognized kind don't fall
through to the regular tool-result handling block.
Honors cancel_on_interruption=False on Gemini Live for models that support
Gemini's NON_BLOCKING tool mechanism (Gemini 2.x at the time of writing).
Function declarations registered via register_function(...,
cancel_on_interruption=False) are sent with behavior: NON_BLOCKING so the
conversation continues while the tool runs; the matching FunctionResponse
carries scheduling: WHEN_IDLE so the result lands at a graceful pause
rather than mid-sentence. Synchronous (default) tools stay BLOCKING —
applying NON_BLOCKING uniformly produced filler responses like "let me
look that up for you" on regular calls, since the model knew it would
have an opportunity to keep talking while waiting.
A new _supports_non_blocking_tools property gates the flow. On models
that don't support it (currently Gemini 3.x), the service falls back to
plain blocking behavior and surfaces a one-time error + ErrorFrame the
moment async-tool messages first appear in the context, explaining that
the flag's intent is not achievable.
Caveat (Gemini 2.5): an intermittent server-side 1008 "Operation is not
implemented" error can fire when realtime input arrives during a pending
tool call. We auto-reconnect, but the user may need to repeat what they
were saying. The proposed mitigation
(https://discuss.ai.google.dev/t/gemini-live-api-websocket-error-1008-operation-is-not-implemented-or-supported-or-enabled/114644/56)
of gating realtime input during pending tool calls is fundamentally
incompatible with NON_BLOCKING tool calling, so we don't apply it.
Lays groundwork for cancel_on_interruption=False support on Gemini Live by
restructuring _process_completed_function_calls to match the shape used by
AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441: a single-pass
forward iteration over raw context messages that detects async-tool
messages via async_tool_messages.parse_message and routes them — started
skipped silently, intermediate logged-as-error and surfaced via push_error,
final delivered via the formal FunctionResponse channel.
Replaces the prior two-pass structure that went through the adapter for
sync results — the service now uses a lightweight self._tool_call_id_to_name
map (populated when the model issues tool calls) for the name lookup the
adapter used to provide. Extracts a new GeminiLLMAdapter.to_function_response_dict
static method for the dict-coercion logic that wraps non-dict tool returns
as {value: <result>} for Gemini's FunctionResponse.response field; the
adapter's existing inline copy in _from_standard_message uses it too.
Example consolidation:
- Folds realtime-gemini-live-function-calling.py into the base
realtime-gemini-live.py example so the base exercises function calling
out of the box (matching realtime-openai.py and realtime-aws-nova-sonic.py).
- Renames realtime-gemini-live-vertex-function-calling.py to
realtime-gemini-live-vertex.py, mirroring the consolidation.
- Adds realtime-gemini-live-async-tool.py.
- Updates scripts/evals/run-release-evals.py for the renames.
This commit alone doesn't make cancel_on_interruption=False fully work on
Gemini Live — additional investigation is pending. This is foundational
work to be built on.
Renames _ASYNC_TOOL_PLACEHOLDER_RESULT to _ASYNC_TOOL_STARTED_RESULT to
match the kind names from async_tool_messages, and lifts the inline
"[Async tool result for tool_call_id=...] {result}" into a sibling
_ASYNC_TOOL_FINAL_RESULT_TEMPLATE constant for the same reason.
Replaces the prior "log a warning and skip" approach with actual handling
of async-tool messages on Ultravox.
The catch with Ultravox is that its API freezes the conversation between
client_tool_invocation and the matching client_tool_result — there's no
"keep talking while the tool runs" channel like NON_BLOCKING on Gemini
or function_call_output-without-blocking on OpenAI Realtime. So:
- When the model invokes an async-registered function (cancel_on_inter
ruption=False), the service immediately ships a placeholder
client_tool_result that tells the model "the actual result isn't
ready yet; a follow-up will arrive shortly; keep the conversation
going". This unfreezes the conversation. The placeholder is sent
from _handle_tool_invocation, since the started async-tool message
doesn't reach the context-frame path until later.
- When the real tool finishes, the final async-tool message lands in
the context. _handle_context now forward-iterates and routes
async-tool messages: started is a no-op (placeholder already sent),
intermediate is logged-as-error and dropped (matching the other
realtime services), and final is injected as user-side text via
user_text_message with bracketed framing — the only mechanism
Ultravox offers for adding non-tool input mid-conversation.
Hoists the registry-lookup helper to LLMService as
_function_is_async(name) so future services can use the same pattern
without re-implementing it.
Adds an async-tool example file for Ultravox modeled on the existing
ones for the other realtime services.
Mirrors the same change applied to AWSNovaSonicLLMService and
OpenAIRealtimeLLMService in #4441: replaces the implicit "final happens
last" pattern in _process_completed_function_calls with an explicit
`if async_payload.kind == "final":` block, plus a trailing defensive
`continue` so async-tool messages with an unrecognized kind don't fall
through to the regular tool-result handling block.
Applies the same async-tool message routing introduced for AWSNovaSonicLLMService
and OpenAIRealtimeLLMService to additional realtime LLM services where the
flag's intent ("keep talking while the tool runs") is achievable:
- GrokRealtimeLLMService (xAI Realtime — also benefits the deprecated Grok
alias since it re-exports the xAI module)
- AzureRealtimeLLMService picks up the fix transitively by inheriting from
OpenAIRealtimeLLMService — no code change needed.
GrokRealtimeLLMService's _process_completed_function_calls now matches
the canonical pattern: skip LLMSpecificMessage, detect async-tool messages
via parse_message and route them — started skipped silently, intermediate
logged as an error and surfaced via push_error, final delivered through
the same channel as a synchronous result.
UltravoxRealtimeLLMService instead gets a one-time warning when async-tool
messages appear in the context. The Ultravox API freezes the conversation
during tool execution
(https://docs.ultravox.ai/tools/async-tools#custom-tool-timeouts), so the
flag's "keep talking while the tool runs" intent isn't achievable there —
applying the same code pattern would mislead users into expecting a UX
Ultravox can't deliver. Surfacing a clear warning is the right behavior
until Ultravox grows true async tool support.
Adds async-tool example files for Grok and Azure modeled on the existing
Nova Sonic / OpenAI Realtime ones (10s simulated network delay, weather
tool registered with cancel_on_interruption=False).
Two services remain excluded:
- GeminiLiveLLMService — the async-tool path needs deeper investigation.
- InworldRealtimeLLMService — appears to have a pre-existing problem
with even simple synchronous tool calling on its Realtime API (the
request reaches the server fine, but response generation fails with a
generic server_error).
Replaces the implicit "final happens last" pattern in
_process_completed_function_calls with an explicit
`if async_payload.kind == "final":` block in both AWSNovaSonicLLMService
and OpenAIRealtimeLLMService. Adds a trailing defensive `continue` so
async-tool messages with an unrecognized kind don't fall through to the
regular tool-result handling block — clearer at the call site, and safer
against future additions to AsyncToolMessageKind.
Before the new async-tool mechanism landed, AWSNovaSonicLLMService and
OpenAIRealtimeLLMService honored cancel_on_interruption=False by simply
not cancelling in-flight function calls on interruption — the eventual
result then flowed through the same channel as any synchronous tool
result. The new mechanism (which appends started/intermediate/final
messages to the LLM context as the underlying task progresses) broke
that path: the realtime services didn't know how to interpret those
messages, and the eventual result was never delivered to the provider.
Restore the flag's behavior by teaching both services to detect
async-tool messages in the context and route them appropriately:
- started → skipped silently. The provider already issued the tool call
and natively awaits a result; nothing to send for the started marker.
- final → delivered via the formal tool-result channel. Same path as a
synchronous tool result, just delayed.
Streamed intermediate results (FunctionCallResultProperties(is_final=
False)) are not supported on these realtime services. An intermediate
result is logged as an error and surfaced via push_error, then dropped.
Use a non-realtime LLM service if a tool needs to stream intermediate
results. (Docstrings on register_function, register_direct_function, and
FunctionCallResultProperties.is_final updated to call this out.)
A new shared module pipecat.processors.aggregators.async_tool_messages
is the single source of truth for the on-the-wire payload shape: the
aggregator uses its build_*_message functions when injecting messages,
and the realtime services use parse_message when scanning the context.
Adds two example files exercising a network-delayed weather tool with
each service. The plain realtime-aws-nova-sonic.py example is also
reverted to a synchronous tool call now that the async variant lives in
its own file.
Similar fixes for other realtime services are forthcoming.
Aligns deprecation docstrings on LLMUserAggregatorParams and
LLMAssistantAggregatorParams with CONTRIBUTING.md conventions:
present-tense parameter descriptions plus a `.. deprecated:: 1.2.0`
directive noting replacement and 2.0.0 removal. Also adds a runtime
DeprecationWarning for `user_turn_completion_config`, which previously
had no warning despite being deprecated.
The old name overlapped semantically with `UserStoppedSpeakingFrame`:
both could be read as "the user's turn is done." They're at different
layers — `UserStoppedSpeakingFrame` is the acoustic stop signal,
while this frame is the post-judgment "inference about the turn is
now complete (turn is semantically final)" signal emitted by the LLM
mixin (on ✓), an end-of-turn classifier, or a custom producer.
The new name pairs naturally with the existing
`on_user_turn_inference_triggered` event vocabulary and removes the
ambiguity with `UserStoppedSpeakingFrame`.
Wrap the detector chain with `deferred(...)` and append the LLM
completion gate via a `UserTurnStrategies` specialization rather than
a free-standing helper, mirroring the existing
`ExternalUserTurnStrategies` pattern. The class lives next to other
strategy containers in `pipecat.turns.user_turn_strategies`, so users
discover it where they're already configuring `user_turn_strategies`.
The deprecated `filter_incomplete_user_turns` flag now rewires
through `FilterIncompleteUserTurnStrategies` under the hood, keeping
the migration path identical to before. `deferred(...)` stays public
as the explicit escape hatch for non-default compositions.
When a stop-strategy chain splits inference-triggered from
finalization (e.g. `LLMTurnCompletionUserTurnStopStrategy` gating a
deferred detector), more than one inference can fire inside a single
user turn — each adds the new transcription segment to the context.
Previously each inference overwrote `_pending_user_turn_aggregation`,
so the eventual `on_user_turn_stopped` event surfaced only the
segment from the last inference, dropping anything the user said
before it.
Concatenate each segment into `_full_user_turn_aggregation` instead
of overwriting, and combine that running buffer with any post-final-
inference segment when emitting the public event.
Add an `LLMMarkerFrame(DataFrame)` for sideband LLM markers that need
to be persisted to context but should not flow through the standard
text path (TTS, transcript). The frame carries an
`append_to_context_immediately` flag so the assistant aggregator can
either commit the marker as a stand-alone message (○ / ◐) or merge it
with the upcoming aggregation as a prefix on the response (✓).
`UserTurnCompletionLLMServiceMixin` now emits `LLMMarkerFrame` instead
of pushing the marker as `LLMTextFrame(skip_tts=True)`, which fixes
the case where an incomplete-turn marker (○ / ◐) was aggregated by
the assistant aggregator but never committed to the context because
the assistant turn lifecycle didn't run to completion (no spoken
response, no `LLMFullResponseEndFrame`-driven `push_aggregation`).
The frame is intentionally generic so other components — STT services
with built-in turn signals, end-of-turn classifiers, custom
annotations — can use the same mechanism to inject sideband signals
into the assistant context.
`LLMTurnCompletionUserTurnStopStrategy` previously bundled two
concerns: pushing `LLMUpdateSettingsFrame` on `StartFrame`, and
finalizing the turn on `UserTurnCompletedFrame`. The latter is
producer-agnostic — any component that emits `UserTurnCompletedFrame`
(STT with built-in turn detection, dedicated end-of-turn classifiers,
custom code) can drive finalization the same way.
Move the frame-handling half into a new
`ExternalUserTurnCompletionStopStrategy`. The LLM-specific subclass
now only adds the settings-frame push and inherits finalization. Mirrors
the existing `ExternalUserTurnStopStrategy` naming pattern.
Fixes a real bug: with `filter_incomplete_user_turns` enabled, the
smart-turn detector's tentative stop was firing `on_user_turn_stopped`
before the LLM had a chance to veto it. Observers, transcript
appenders and UI indicators received an early — and sometimes
duplicated — signal.
Decomposes the single stop concern into two events:
- `on_user_turn_inference_triggered` fires when a stop strategy has
enough signal to start LLM inference. The aggregator pushes the
context here, kicking off the LLM call.
- `on_user_turn_stopped` fires only when the user turn is semantically
final. Built-in strategies fire both events at the same call site,
preserving today's behavior for the common case.
Adds `LLMTurnCompletionUserTurnStopStrategy`, which gates
finalization on a `UserTurnCompletedFrame` (a fieldless system frame
emitted by any component judging turn completeness — currently the
`UserTurnCompletionLLMServiceMixin` on `✓`).
Adds `deferred(strategy)` / `DeferredUserTurnStopStrategy`, a thin
wrapper that forwards an inner strategy's events except
`on_user_turn_stopped`. Use this to install a stop strategy as an
inference trigger only, leaving finalization to a peer (e.g. the LLM
completion strategy).
Adds `llm_completion_user_turn_stop_strategies()` for the common
case:
UserTurnStrategies(
stop=llm_completion_user_turn_stop_strategies(),
)
Deprecates `LLMUserAggregatorParams.filter_incomplete_user_turns`.
The aggregator emits a `DeprecationWarning`, wraps existing stop
strategies with `deferred(...)`, and appends
`LLMTurnCompletionUserTurnStopStrategy` automatically.
Adds an explicit Code Style bullet for the `.. deprecated::` Sphinx
directive (forbidding inline `[DEPRECATED]` tags) and extends the
Docstring Example with a Pydantic params class showing the directive
inside a `Parameters:` block — the context CONTRIBUTING.md's existing
example didn't cover.
Replaces the inline `[DEPRECATED]` tag with a `.. deprecated:: 1.1.0`
directive per CONTRIBUTING.md docstring conventions, so the deprecation
shows up properly in the rendered docs.
When a non-uninterruptible frame was being processed slowly and an
uninterruptible frame was waiting in the queue, _start_interruption
skipped task cancellation. This caused interruptions to stall until
the slow frame finished, even though it had no reason to block them.
The fix: only skip cancellation when the *current* frame is
uninterruptible. Uninterruptible frames already in the queue are
preserved regardless, because __create_process_task calls
__reset_process_queue internally, which always retains them.
Fixes: https://github.com/pipecat-ai/pipecat/issues/4412
grok-3 is being retired from the xAI API on May 15, 2026. Switch the
default to grok-4.20-non-reasoning, which xAI recommends for non-reasoning
workloads and is appropriate for real-time voice AI.
PR #4344 unconditionally switched to normalizedAlignment to fix garbled
words with pronunciation dictionaries (#4316). But normalizedAlignment
returns the post-normalized form of what was spoken - including
romanization of non-Latin scripts (Chinese rendered as pinyin), which
ends up in the LLM context and degrades subsequent turns.
Gate the switch on pronunciation_dictionary_locators being configured.
Adds a _select_alignment helper with preferred-with-fallback (both
fields are nullable per the API schema), used by both the WebSocket
and HTTP services. Tests cover dictionary mode, default mode, fallback
when preferred is missing or null, and HTTP field-name variants.
``examples/function-calling/function-calling-missing-handler.py``
demonstrates the missing-handler path by deliberately advertising a
tool to the LLM without registering its handler — what happens when a
developer forgets to call ``register_function``. Exercises the new
``logger.error`` severity end-to-end without needing to coax the LLM
into hallucinating.
When tools change mid-conversation, LLMs can produce a few different
flavors of tool-call-related hallucination: calling tools that have
been removed, avoiding tools that have been re-added, or hallucinating
output (made-up answers or tool-call-shaped non-tool-calls) when tools
are unavailable.
This change introduces an opt-in ``add_tool_change_messages`` flag on
the LLM aggregators (preferred entry point: ``LLMContextAggregatorPair(
..., add_tool_change_messages=True)``) that appends a developer-role
message to the context whenever ``LLMSetToolsFrame`` changes the set
of advertised standard tools. Helps the LLM stay coherent across tool
changes by spelling out exactly what just became available or
unavailable. Both aggregators participate; whichever handles the
frame first wins, and the other (if any) sees an empty diff against
the shared context and stays silent — order-independent regardless of
whether the frame flows downstream or upstream.
Also tightens the existing missing-handler path (introduced in #4301):
- Reworded the terminal tool result to a neutral "The function
``X`` is not currently available." (overridable via
``LLMService.MISSING_FUNCTION_CALL_MESSAGE_TEMPLATE``). Previously
read "Error: function 'X' is not registered."
- Logs at the call site now distinguish developer error (tool
advertised but no handler registered → ``logger.error``) from
hallucination (tool not advertised → ``logger.warning``).
Includes a manual validation harness
(``examples/features/features-add-tool-change-messages.py``) that
exercises the new ``add_tool_change_messages`` mitigation by flipping
tool availability on a turn counter so its effect can be observed
end-to-end with the flag on vs. off.
Flip the default Inworld TTS model from inworld-tts-1.5-max to
inworld-tts-2 across:
- InworldHttpTTSService (HTTP)
- InworldTTSService (WebSocket)
- InworldRealtimeLLMService (cascade Realtime)
inworld-tts-1.5-max and inworld-tts-1.5-mini remain valid options;
existing users can pin the prior model explicitly via the model
setting. Docstring examples updated to reference the new default.
Polly TTS, Bedrock LLM, and AgentCore previously did
`arg or os.getenv("AWS_...")` and handed the result straight to
aioboto3. When only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`
was set, aioboto3 received a half-populated kwarg and errored instead of
falling through to the boto3 credential provider chain (instance
profiles, IRSA, ECS task roles, SSO, etc.).
Route credential resolution through the shared `resolve_credentials()`
helper introduced for AWS Transcribe so all four services follow the
same `explicit → env → boto3 chain` fallback. Add an
`AWSCredentials.to_boto_kwargs()` method to bridge the dataclass field
names (`access_key`, `secret_key`) to the aioboto3 kwargs
(`aws_access_key_id`, `aws_secret_access_key`).
No public API changes. Behaviour is identical for fully-explicit and
fully-env-var configurations; partial env vars now correctly trigger
the chain instead of erroring.
Resolve and contain the user-supplied filename before serving it from
the runner's /files endpoint. Also raise a 404 (instead of returning
None) when the downloads folder is unset, and use the resolved
basename for Content-Disposition.
AWS Transcribe STT previously only supported credentials via explicit
parameters or environment variables. Services running with IAM roles
(EKS pod roles, IRSA, ECS task roles, EC2 instance profiles) or SSO
couldn't use Transcribe without exporting static credentials.
Changes:
- Add resolve_credentials() to utils.py providing a standard fallback
chain: explicit params → environment variables → boto3 credential
provider chain (instance profiles, IRSA, pod roles, SSO, etc.)
- Add AWSCredentials dataclass for type-safe credential passing
- Update AWSTranscribeSTTService to use resolve_credentials() instead
of manual os.getenv() calls
- The boto3 fallback is only attempted when both access key and secret
key are unresolved, avoiding replacement of explicitly provided creds
- boto3 is imported lazily inside the function to avoid hard dependency
for services that don't need the fallback chain
- Add 7 unit tests covering the credential resolution chain
The Bedrock LLM and Polly TTS services already support the full
credential chain via aioboto3.Session() and are not modified.
Related to #4197
Two issues were causing TTSSpeakFrame(append_to_context=True) greetings to
silently lose their trailing words and never fire on_assistant_turn_stopped:
- LLMAssistantPushAggregationFrame was emitted without a PTS, so the
transport routed it through the audio (sync) queue while word-level
TTSTextFrames travel through the clock queue. The aggregation could reach
the assistant aggregator before the final words, leaving them orphaned
in the buffer. Stamp the frame with `_word_last_pts + 1` when there are
word timestamps so it can't overtake them.
- The aggregator's LLMAssistantPushAggregationFrame handler called
push_aggregation() directly, bypassing _trigger_assistant_turn_stopped.
For TTS-only flows there is no LLMFullResponseStartFrame, so the turn
start timestamp was never set and on_assistant_turn_stopped never fired.
Open a turn (if needed) and trigger stopped from the handler.
Fixes#4264.
The UI Agent Protocol lets server-side AI agents observe and drive
a GUI app on the client side through structured RTVI messages.
Five new top-level RTVI types in kebab-case, in line with the rest
of the protocol:
ui-event client → server (named event with payload)
ui-command server → client (named command with payload)
ui-snapshot client → server (accessibility tree of the page)
ui-cancel-task client → server (cancel an in-flight task group)
ui-task server → client (task lifecycle envelope)
Each ships paired ``*Data`` / ``*Message`` pydantic models in
``rtvi.models``, following the existing RTVI envelope convention
(``BotReady`` / ``BotReadyData``, ``Error`` / ``ErrorData``, etc.).
Built-in command payload models (``Toast``, ``Navigate``,
``ScrollTo``, ``Highlight``, ``Focus``, ``Click``, ``SetInputValue``,
``SelectText``) ship alongside; matching default React handlers
live in ``@pipecat-ai/client-react``.
Bumps the RTVI ``PROTOCOL_VERSION`` from ``1.2.0`` to ``1.3.0``.
Purely additive: only new top-level message types are introduced;
no existing wire shapes are changed. The major-version
compatibility check on ``client-ready`` still passes for older
1.x clients, so old clients continue to connect without warning;
they simply will not exercise the new types.
The ``RTVIProcessor`` registers a new ``on_ui_message`` event
handler that fires for inbound ``ui-event`` / ``ui-snapshot`` /
``ui-cancel-task`` with the parsed Message envelope, mirroring how
``on_client_message`` works for ``client-message``.
Five new pipeline frames let pipeline observers and processors see
UI traffic the same way they see other RTVI messages, mirroring
the frame-and-event pattern used by ``client-message``:
RTVIUICommandFrame(command_name, payload)
Pushed by downstream code (e.g. ``pipecat-ai-subagents``'s
bridge) to send a UI command to the client. Wrapped by the
observer into a ``UICommandMessage`` envelope.
RTVIUITaskFrame(data: UITaskData)
Same shape but for ``ui-task``; wrapped into ``UITaskMessage``.
``UITaskData`` is a discriminated union of the four lifecycle
kinds (group_started / task_update / task_completed /
group_completed).
RTVIUIEventFrame(msg_id, event_name, payload)
RTVIUISnapshotFrame(msg_id, tree)
RTVIUICancelTaskFrame(msg_id, task_id, reason)
Pushed by ``RTVIProcessor._handle_message`` whenever the
matching inbound message arrives, alongside firing
``on_ui_message``. Pipeline observers and processors can match
on the frame; subscribers like the subagents bridge keep using
the event handler.
The data layer is the canonical authority for the wire format:
higher-level frameworks like ``pipecat-ai-subagents`` build the
agent abstractions on top, and single-LLM Pipecat apps can target
the same wire format directly via custom tools that emit these
typed messages.
The pyright job in `format.yaml` previously installed only `--extra
daily --extra tracing`. That was sufficient when most optional-dep-
using files were in the pyright ignore list, but as this PR has
cleared dozens of files, those files now reference symbols from
optional-dep modules (`aiortc.RTCIceServer` via `IceServer`,
`google.genai.types.HttpOptions`, etc.). `reportMissingImports: false`
tolerates the failed imports themselves, but the imported names
become `Unknown` and using them as type expressions trips
`reportInvalidTypeForm` / `reportAttributeAccessIssue` — errors
that aren't gated by that flag.
Switch to `--all-extras --no-extra gstreamer --no-extra local`
(matching the dev setup in README.md), so pyright sees the same
dependency set the code is intended to be type-checked against and
the install-set scales naturally as more files leave the ignore list.
Also reconcile CLAUDE.md's setup command, which only excluded
`gstreamer`. README.md is canonical and additionally excludes
`local` (pyaudio requires `portaudio` native libs that aren't
installed by default on a clean Ubuntu CI runner).
A fourth pass over low-error-count files. Drops 8 files (57 → 49) and
full-pyright errors from 525 → 496. Default pyright stays clean.
Optional access on transport/client receivers (4 files). Same fix
shape as #4359 — a receiver typed `X | None` accessed without a
guard. For "should never happen" cases (caller's lifecycle ensures
the field is non-None when the method runs), used `assert` rather
than silent early-return so an invariant violation surfaces loudly:
- `transports/whatsapp/client.py` (5 errors): `_validate_whatsapp_webhook_request`
was typed `bytes` / `str` but called with `bytes | None` / `str | None`.
Widened the helper signature and pushed the explicit None-check
inside (matching its existing empty-string check). Also handled
`pipecat_connection.get_answer()` returning `None` — would have
crashed at `.get("sdp")` before.
- `transports/websocket/client.py` (5 errors): four are the deprecated
`websockets.WebSocketClientProtocol` alias (same `# pyright: ignore[reportAttributeAccessIssue]`
as the `services/websocket_service.py` fix from earlier in this PR).
The fifth was `async for message in self._websocket` — traced the
call chain and confirmed `_client_task` is created only after
`self._websocket` is assigned and cancelled before it's cleared, so
the field is never None when `_client_task_handler` runs. Used `assert`.
- `services/openai/stt.py` (4 errors): same pattern. `_receive_messages`
is started by `_connect()` only when `self._websocket` is set, and
the reconnect loop in `WebsocketService._receive_task_handler`
re-establishes it before each retry. `assert` at entry. Plus L478/L483:
the `try`/`except ModuleNotFoundError` import-guard makes
`websocket_connect` and `State` `<type> | None`; `__init__` already
raises `ImportError` if either is None, so an `assert` at the
`_connect_websocket` use site is honest. Plus an L538 `Language | str`
cast (same shape as last batch).
- `services/deepgram/flux/base.py` (2 errors): `event = data.get("event")`
flowed into `_handle_turn_resumed(event: str)` as `Any | None`.
Tightened with an `isinstance(event, str)` guard before the
`FluxEventType(event)` lookup. The other error (`average_confidence > min_confidence`
where `min_confidence: float | None`) was a latent crash on missing
confidence data — restored the original `not min_confidence` (which
treats both `None` and `0.0` as "no filter") and added an explicit
drop-on-missing-confidence-data branch.
`gemini_live` Settings/InputParams (vertex). The deprecated `InputParams`
declares `modalities: GeminiModalities | None` and `media_resolution: GeminiMediaResolution | None`,
but their downstream usage at `services/google/gemini_live/llm.py:952,959`
calls `.value` on each — `None` would crash. Rather than touching the
deprecated input model, translate `None` to the canonical defaults
(`GeminiModalities.AUDIO`, `GeminiMediaResolution.UNSPECIFIED`) at the
assignment site in `vertex/llm.py`. Also fixed an unrelated annotation
bug: `_get_credentials` was annotated `-> str` but actually returns
`service_account.Credentials` (used correctly by the caller — only
the annotation was wrong).
`moondream/vision.py` (3 errors). `frame.format` is `str | None` but
`Image.frombytes(mode, ...)` requires `str`; raise instead of crashing
on missing format. The other two errors are pyright thinking the
moondream2-custom `encode_image` and `query` methods are `Tensor`
(rather than callables) — those are provided by the model code via
`trust_remote_code=True` and aren't visible to pyright on the base
`AutoModelForCausalLM` type. Scoped `# pyright: ignore[reportCallIssue]`
on the two call sites.
`transports/base_output.py` (3 errors). Two are `self._mixer.mix(...)`
calls in `with_mixer`, a closure invoked only when `self._mixer` is
truthy at the call site — captured the mixer to a local variable
inside the closure with an `assert`, then used that. Third is the
PIL `frombytes(mode, ...)` shape — `frame.format is None` early-
return guard at the top of `resize_frame` so the main resize logic
reads cleanly.
`elevenlabs/tts.py` (4 errors). The payload-building dict at L1271
was typed `dict[str, str | dict[str, float | bool]]` — an aspirational
shape that matched only the first two assignments. Subsequent code
assigned `list[dict[...]]` (pronunciation locators) and bools, all
violating the annotation. Same pattern at L926 (the WebSocket-init
`msg`). Both widened to `dict[str, Any]`, which is the honest shape
for a JSON request payload and what similar code uses elsewhere.
Files dropped from the ignore list (57 → 49):
services/deepgram/flux/base.py, services/elevenlabs/tts.py,
services/google/gemini_live/vertex/llm.py,
services/moondream/vision.py, services/openai/stt.py,
transports/base_output.py, transports/websocket/client.py,
transports/whatsapp/client.py.
A third pass over low-error-count files in the ignore list. Drops 10
files (67 → 57) and full-pyright errors from 555 → 525. Default
pyright stays clean.
Optional access guards (4 files). The same fix shape as 9e9b1f39e:
a receiver typed `X | None` accessed without a guard, fixed with a
local-var capture or an early return.
- `mistral/stt.py`: `_connection.send_audio` could crash if
`_connect()` swallowed an exception and left `_connection` unset;
drop the audio chunk with a warning instead. `_receive_events`
iterating `_connection.events()` got the same defensive narrowing.
- `deepgram/flux/stt.py`: `_websocket_url` is set in `_connect`
before `_connect_websocket` is called, but pyright doesn't track
that across methods — assert at the use site. `websocket.response`
is `Response | None` in the websockets stubs even though it's
always populated post-handshake; guarded with a fallback.
- `audio/filters/rnnoise_filter.py`: the module-level import sets
`RNNoise` to `None` if `pyrnnoise` isn't installed; raise
`ImportError` explicitly instead of relying on the existing try-
block to catch the `None(...)` call. Also gated `filter()` with
`or self._rnnoise is None` so pyright sees the narrowing.
- `transports/smallwebrtc/request_handler.py`: `get_answer()`
legitimately returns `None`; raise instead of crashing on three
subscript accesses.
`TTSService` `audio-context` API tightening. Mirroring the
`append_to_audio_context` fix from the previous batch:
`remove_audio_context` was typed `str` but is called with `str | None`
from `get_active_audio_context_id()` results. Widened to `str | None`
and the `None` handling lives in the function body (early debug log
+ return) — matching `append_to_audio_context`'s shape.
`audio_context_available` keeps its narrow `str` signature; asking
"is `None` available?" isn't a meaningful question (`_audio_contexts`
is `dict[str, asyncio.Queue]`). The internal call site in
`on_turn_context_completed` narrows `_turn_context_id` explicitly
before passing it. Side effect: deepgram/tts.py's L307 error clears
without local changes.
`deepgram/tts.py` (4 errors → 0): the same `push_error(ErrorFrame(...))`
latent bug we fixed in resembleai earlier in this PR — `push_error`
takes a string; there's a separate `push_error_frame` for frames.
Two sites switched. The Optional `_websocket.response` access is
guarded the same way as deepgram/flux/stt.py. The `remove_audio_context`
error was cleared by the tightening above.
`aws/utils.py` (3 errors → 0): `AWSTranscribePresignedURL` declared
`session_token: str` but the dict source is `str | None` (AWS
supports long-term IAM creds without a session token). Same for
`vocabulary_name`/`vocabulary_filter_name` on `get_request_url`,
which were typed `str = ""` even though the body uses truthy checks
to skip them. Widened to `str | None = None` — matches actual
runtime semantics.
`audio/dtmf/utils.py` (2 errors → 0): `files("...").joinpath(...)`
returns a `Traversable`, but `aiofiles.open` wants a real path. For
regular pip installs this worked in practice (Traversable was a
`Path`), but it would fail for zipped distributions (zipapp,
zipimport) where the resource isn't on disk. Wrapped in
`importlib.resources.as_file(...)` — the canonical bridge that
extracts to a temp file when the resource isn't already on the
filesystem. Validated end-to-end: regular install still reads bytes;
ad-hoc zipapp test confirmed `as_file` extracts the resource and
returns a real Path.
`openai/image.py` (2 errors → 0): the `size` arg to
`images.generate` is `Literal[...] | None` in the SDK but our
settings field is `str | None`. Mirrored the `groq/tts.py`
hint-not-constraint pattern from the previous batch: defined a
module-level `OpenAIImageSize = Literal[...]` alias with a comment
attributing the upstream symbol and documenting the cast contract
(callers can pass any string; invalid values surface as an OpenAI
API error). Also guarded `image.data[0]` (response.data is
`list[Image] | None`).
`processors/frameworks/{langchain,strands_agents}.py` (4 + 4 → 0):
both processors do `messages[-1]["content"]` on a value typed
`LLMStandardMessage | LLMSpecificMessage` (the latter is a dataclass,
not a dict, so `__getitem__` errors). Historically these only
handled plain-text user messages, so the fix is two explicit guards
(skip if the last message isn't a dict; skip if `content` isn't a
string) plus a TODO noting that other shapes (multi-modal content,
provider-specific messages) aren't supported yet. langchain's
`__get_token_value` also got a small fix where `AIMessageChunk.content`
is `str | list[parts]` but the function declares `-> str`; stringify
the list case. strands_agents' surfaced two unrelated narrows: a
`graph_exit_node: str | None` arg gated by an `__init__`-time assert,
and `agent.stream_async` reached only when we're not in graph mode.
Files dropped from the ignore list (67 → 57):
audio/dtmf/utils.py, audio/filters/rnnoise_filter.py,
processors/frameworks/langchain.py,
processors/frameworks/strands_agents.py, services/aws/utils.py,
services/deepgram/flux/stt.py, services/deepgram/tts.py,
services/mistral/stt.py, services/openai/image.py,
transports/smallwebrtc/request_handler.py.
A second pass over the low-error-count files in the ignore list. Drops
10 files (77 → 67) and full-pyright errors from 580 → 555. Default
pyright stays clean.
Three coherent shapes plus a handful of one-offs:
`Language | str | None` → `Language | None` at STT frame boundaries.
`assert_given(self._settings.language)` returns `Language | str | None`
(strips `_NotGiven`, keeps the rest), but `TranscriptionFrame.language`
expects `Language | None`. In practice both `_settings.language` and
SDK-supplied codes resolve to a `Language` enum value, but technically
they could be raw strings — and `Language` is a StrEnum, so downstream
consumers (which mostly compare/serialize as strings) handle either.
Used `cast("Language | None", ...)` at each call site rather than a
runtime-validating helper, so an unrecognised code (e.g. one we
haven't added to the enum yet) still flows through unchanged. Cleared
azure/stt.py, aws/stt.py, gradium/stt.py; mistral/stt.py keeps the
cast at the SDK boundary (storing under `_detected_language: Language
| None`) but stays in the ignore list because of two unrelated
Optional-access errors.
aiobotocore `async with` stub gap. `aioboto3.Session().client(...)`
is an async context manager at runtime but its stubs don't advertise
`__aenter__`/`__aexit__` to pyright. Scoped
`# pyright: ignore[reportGeneralTypeIssues]` on the two affected
sites: aws/agent_core.py and aws/tts.py. aws/tts.py also had a latent
bug on the no-`AudioStream` path: the original code set
`audio_data = None` and then crashed in `resample(...)` and
`len(audio_data)` below; replaced with an early `return` after
logging — matches the convention elsewhere (OpenAI TTS, etc.) of not
recording usage metrics on the error path.
heygen `event_id: str | None` → `str` at transport→client boundary.
Three call sites in transports/heygen/transport.py passed `self._event_id`
(`str | None`) into client methods that take `str`. Added a guard at
each: `agent_speak_end` and `interrupt` only fire when `_event_id` is
set; `write_audio_frame` warn-and-drops when there's no active bot
event rather than sending a malformed message.
`OpenAIResponsesLLMInvocationParams` TypedDict.
`get_llm_invocation_params` always sets both `input` and `tools` in
the same dict literal, but the TypedDict was `total=False` so direct
subscript access (`invocation_params["input"]`) tripped
`reportTypedDictNotRequiredAccess` in services/openai/responses/llm.py.
Marked both keys `Required[...]`; `instructions` stays non-required
since it's only added when a system instruction is present.
Latent bug in heygen/api_interactive_avatar.py: the code accessed
`request_data.voice.voiceId` and `request_data.voice.elevenlabsSettings`,
but those names are Pydantic *aliases*; the actual attribute names
(used for attribute access) are `voice_id` and `elevenlabs_settings`.
Switched to the field names — those camelCase accesses would have
raised AttributeError at runtime if `voice` was set.
Other small fixes:
- assemblyai/stt.py: the deprecated `connection_params=` init path
was reading `formatted_finals` and `word_finalization_max_wait_time`
off `AssemblyAIConnectionParams`, but those fields were never on
the deprecated input model — they were added to Settings later.
Removed the reads (with a comment noting they're only available
via the canonical `settings=...` API); the deprecated input model
is unchanged.
- rtvi/processor.py: two `about: Mapping[str, Any] = None` parameter
signatures — declared `Mapping`, defaulted to `None`, and both
function bodies already handled the None case. Widened to
`Mapping[str, Any] | None = None`.
- aws/stt.py: `subprotocols=["mqtt"]` failed against websockets'
`Sequence[Subprotocol] | None` (Subprotocol is a NewType wrapper).
Wrapped: `subprotocols=[Subprotocol("mqtt")]`.
Files dropped from the ignore list (77 → 67):
processors/frameworks/rtvi/processor.py, services/assemblyai/stt.py,
services/aws/agent_core.py, services/aws/stt.py, services/aws/tts.py,
services/azure/stt.py, services/gradium/stt.py,
services/heygen/api_interactive_avatar.py,
services/openai/responses/llm.py, transports/heygen/transport.py.
Several adjacent fix shapes that together drop 19 files from the
pyrightconfig.json ignore list (96 → 77) and full-pyright errors from
605 → 580. Default pyright stays clean.
TTS voice/context_id None handling — most files in this batch had a
single error of the shape "value typed `T | None` passed where `T` is
required" coming out of `assert_given(self._settings.voice)` (which
strips `_NotGiven` but not `None`) or `get_active_audio_context_id()`.
Two patterns:
- For services where a missing voice means the request can't proceed
(hume, openai, xtts, groq, kokoro, piper), added an explicit None
check. Inside `run_tts` we yield an `ErrorFrame` and return — matching
each service's existing error-emission style (a few wrap `Exception`
broadly and were fine; openai/hume/xtts had narrower or no try blocks
so a bare `raise ValueError` would have escaped uncaught). Piper
validates in `__init__`, where failing fast at construction is the
right shape. OpenAI also gained a `voice not in VALID_VOICES` guard
with a clear message listing supported voices.
- For services where a missing audio context just means "skip this
message" (fish, lmnt, smallest, sarvam, neuphonic), widened
`TTSService.append_to_audio_context`'s `context_id` signature to
`str | None`. The function body already explicitly handled the None
case with a debug log + early return, so the prior `str` annotation
was a lie; making it honest cleared call sites without local guards.
inworld's `_close_context` got the same treatment.
google.genai imports — switched `from google import genai` to
`import google.genai as genai` in google/image.py and google/llm.py.
The dotted form sidesteps a PEP 420 namespace-package stub gap (the
`google` namespace stubs come from a different distribution and don't
declare `genai`), which means pyright now resolves `genai` to the
real module rather than `Unknown`. IDE autocomplete on `genai.<x>`
works for the first time. In image.py this surfaced three latent
bugs that the `Unknown` resolution had been hiding (model was
`str | _NotGiven | None` not narrowed before passing to the SDK; two
spots accessed `.image_bytes` on an `Image | None` without a guard) —
all fixed. llm.py's dotted import surfaced 8 errors (Content-list
typing nuances, internal `_api_client` access, a few small Optionals);
deferred to a future pass since they're outside this commit's scope,
so the file stays in the ignore list with the dotted import.
Latent bug fixes spotted along the way:
- resembleai/tts.py was calling `push_error(ErrorFrame(...))`, but
`push_error` takes a string — there's a separate `push_error_frame`
for the frame case. Switched to the right method.
- openai/base_llm.py: `max_completion_tokens` was the only sibling
field on `OpenAILLMSettings` missing `| None` in its type, which
caused the assignment in openai/llm.py from `params.max_completion_tokens`
(`int | None`) to fail. Added `| None` for consistency with
`max_tokens` etc.
- heygen/base_api.py: `livekit_url: str = None` and `ws_url: str = None`
declared `str` while defaulting to `None`. Removed the bogus
defaults — both fields are required at construction in every
in-tree call site, and the previous `str = None` was a Pydantic
footgun.
Other small ones: gladia/stt.py needed a None guard on `_session_url`
before `websocket_connect`; openrouter/llm.py's
`build_chat_completion_params` override widened to `dict[str, Any]`
diverging from the parent's `OpenAILLMInvocationParams` — restored
the parent's type; neuphonic/tts.py guarded the receive loop's
`async for message in self._websocket` with a local-variable narrowing
matching the pattern from 9e9b1f39e.
groq/tts.py: tightened `output_format`'s typing to
`Literal["flac","mp3","mulaw","ogg","wav"] | str = "wav"`. The literal
side gives IDE autocomplete hints for the currently-supported set;
the `| str` side keeps callers unblocked if groq adds a new format
before this list is updated. A `cast` at the API boundary satisfies
groq's stricter `Literal` parameter type. The literal alias mirrors
the inlined Literal on `groq.resources.audio.speech.AsyncSpeech.create`'s
`response_format` (the SDK doesn't export it as a named symbol).
websocket_service.py: scoped `# pyright: ignore[reportAttributeAccessIssue]`
on `websockets.WebSocketClientProtocol`. That alias is now a deprecated
re-export from the legacy submodule and pyright doesn't surface it
on the top-level `websockets` namespace; runtime is fine. Migrating
to `websockets.ClientConnection` is a separate piece of work
(transports/websocket/client.py uses the same alias four times) and
left for a future commit.
Files dropped from the ignore list: fish/tts.py, gladia/stt.py,
google/image.py, groq/tts.py, heygen/base_api.py, hume/tts.py,
inworld/tts.py, kokoro/tts.py, lmnt/tts.py, neuphonic/tts.py,
openai/llm.py, openai/tts.py, openrouter/llm.py, piper/tts.py,
resembleai/tts.py, sarvam/tts.py, smallest/tts.py,
websocket_service.py, xtts/tts.py.
Same approach as the previous round — apply boundary casts where the
code does dict-style mutation on TypedDict-typed values, narrow at
return sites, and document the LLMSpecificMessage limitation in
realtime adapters that pack history into a single text message.
aws_nova_sonic_adapter.py — pure typing + small narrowing fixes:
- Filter LLMSpecific items in `_from_universal_context_messages`
(documented).
- `_from_universal_context_message` now declared
`-> AWSNovaSonicConversationHistoryMessage | None` (it already had
paths returning None implicitly).
- `get_messages_for_logging` returns `dict[str, Any]` per element
via `dataclasses.asdict`, matching the declared return type.
- Use a local `role` variable so pyright keeps the narrowing across
the truthy-content guard.
grok_realtime_adapter.py / inworld_realtime_adapter.py — same shape
of fix as `open_ai_realtime_adapter.py` from the previous batch.
The two files are essentially copies of the OpenAI Realtime adapter,
so the same template applies: cast at the boundary, filter
LLMSpecificMessage with a documented note, replace the implicit-None
fallthrough with `raise ValueError`, and switch the `text_content +=`
pattern (which fails when one of the parts is None) to a
`text_parts.append(...)` + `" ".join(...)` pattern.
open_ai_adapter.py — pure typing. Cast at the
`OpenAILLMInvocationParams` return, narrow the system-instruction
warning's `initial_content` to `str | None`, and cast the custom-tools
list to `list[ChatCompletionToolParam]`.
open_ai_responses_adapter.py — pure typing. Same shape: narrow
`first_content` to `str | None` for the warning resolver, cast the
constructed dict literals at append sites where the target is
`ResponseInputItemParam`, and cast `get_messages_for_logging`'s
return to the declared `list[dict[str, Any]]`.
processors/aggregators/llm_context.py — pure typing. Cast the
deepcopied message in the redaction loop in `get_messages` to
`dict[str, Any]` and the create_image/audio_message return-dict
literals to `LLMContextMessage`.
Removes 6 newly-clean files from the pyright ignore list.
Net: -77 pyright errors (full-config: 680 -> 603).
Same shape of fix we applied to anthropic_adapter.py earlier — these
adapters do dict-style mutation on values typed as
ChatCompletionMessageParam (a union of TypedDicts) or against Optional
fields. Apply boundary casts (`cast(dict[str, Any], ...)` for the
mutation block, cast back to the TypedDict at return sites). Most
changes are pure typing (rename + cast); a handful in gemini and
openai_realtime are small defensive bug fixes for code paths that
were latently broken by Optional fields slipping through:
perplexity_adapter.py — pure typing. Cast the deepcopied messages to
`list[dict[str, Any]]` for the role-merging / system-conversion /
trailing-assistant-removal transformations and cast back to
ChatCompletionMessageParam at the return.
bedrock_adapter.py — pure typing. Cast the message to
`dict[str, Any]` at the top of `_from_standard_message` for the
tool-result / tool-use / image-content transformations. Cast the
constructed dict at the return site of `get_llm_invocation_params`.
gemini_adapter.py — typing + several None guards on Content.parts and
related Optional fields. Each guard turns a latent
`TypeError`/`AttributeError` (when the type-system-allowed None
showed up at runtime) into a defensive skip — the type annotations
say these can be None and we now handle that.
open_ai_realtime_adapter.py:
- Typing: cast the deepcopied messages, cast back where needed.
- LLMSpecificMessage handling: previously the function would crash on
the first `.get()` call if any LLMSpecificMessage was in the list.
Filter them out and document the limitation — this adapter's
pack-into-single-text-message strategy doesn't compose with opaque
per-provider payloads.
- Real bug fix: `events.ConversationItem` is a Pydantic BaseModel,
not a TypedDict. The bulk-packing path was constructing a raw dict
where a ConversationItem was expected. Replaced with proper
constructor calls (matches what the single-user-message path
already does).
- Real bug fix: `_from_universal_context_message` was declared
`-> events.ConversationItem` but on the unhandled-message
fallthrough it logged and returned None implicitly. Raise
ValueError so the violation is loud, not silent.
Removes 4 newly-clean files from the pyright ignore list:
adapters/services/{perplexity,bedrock,gemini,open_ai_realtime}_adapter.py.
Net: -95 pyright errors (full-config: 775 -> 680).
Six pyright errors followed the same pattern: a value flowed out of
`self._settings.X` (typed `T | _NotGiven`) into a context that wanted
the plain `T`. Wrap each with `assert_given(...)` so the sentinel
gets stripped at the boundary:
- aws/nova_sonic/llm.py: `_settings.model` (in InvokeModel...Input)
and `_settings.system_instruction` (passed to the adapter).
- deepgram/flux/base.py: iterating `_settings.keyterm`.
- google/stt.py: iterating `_settings.languages`.
- google/tts.py: iterating `_settings.speaker_configs`.
- openai/base_llm.py: `_settings.system_instruction` passed to the
adapter.
Also takes a deeper pass at the related Google STT issue: the override
of `language_to_service_language` had been broadened to take
`Language | list[Language]` and return `str | list[str]`, a Liskov
violation against the base's `Language -> str | None` contract.
External callers always pass a single Language, and the only consumer
of the list path was Google STT's own `_get_language_codes`. Restore
the override to a single-Language signature and let
`_get_language_codes` iterate. The override is also tightened to
return `str` (narrower than the base's `str | None`, which is
LSP-compatible) since it always falls back to `"en-US"` rather than
returning None.
Net: -7 pyright errors (full-config run: 782 -> 775).
These provider-specific helpers are all thin wrappers around
`resolve_language(...)`, which itself returns `str` — never `None`.
The `str | None` annotations were misleading and were producing
spurious pyright errors at the call sites that assigned the result
into a `str` field. Update each helper's signature to `str` and
rewrite the `Returns:` docstring to describe the actual fallback
behaviour (resolve to base or full code, with a warning).
Importantly, the per-class `language_to_service_language(...)`
methods on `STTService`/`TTSService` subclasses keep `str | None` as
their return type. That signature is an extension hook for future
and/or third-party subclasses that may genuinely not be able to
produce a code for some languages, even though all in-tree first-
party services currently return a string.
Also includes one small unrelated tightening in azure/stt.py: wrap
`self._settings.language` with `assert_given(...)` so the truthy
fallback to `language_to_azure_language(Language.EN_US)` doesn't
silently swallow a NotGiven sentinel.
Net: -3 pyright errors (full-config run: 785 -> 782).
Pyright flagged 19 sites where `await self._<connection>.send/recv/...`
was called on a receiver typed `X | None`. Each kind of call site
needed a slightly different fix to be both type-safe and behaviour-
preserving:
Streaming/user-facing paths (early return + warn — drop and warn is
the right runtime fail-safe when reconnect didn't succeed):
- cartesia/stt.py (run_stt)
- soniox/stt.py (_send_keepalive)
- elevenlabs/tts.py (run_tts — yields ErrorFrame and returns)
- deepgram/sagemaker/tts.py (run_tts)
- transports/lemonslice/transport.py (send_message)
- transports/tavus/transport.py (send_message)
"Should never happen" cases (early return with comment, no warn —
caller already gated on a separate `_is_*` check, so a warn would be
noise):
- deepgram/flux/stt.py (transport methods, gated by _transport_is_active)
- deepgram/flux/sagemaker/stt.py (same)
- stt_service.py (_send_keepalive, gated by _is_keepalive_ready)
- elevenlabs/stt.py (_send_keepalive, same)
- llm_service.py (_ws_recv — raises ConnectionError to match
_ensure_connected's contract)
- heygen/client.py (receive loop, gated by self._connected)
Just-assigned-above (use a local variable so pyright keeps the
narrowing across statements):
- lmnt/tts.py
- gradium/stt.py
- fish/tts.py
Other:
- transports/websocket/server.py — used the existing local `websocket`
parameter in scope instead of `self._websocket` for the close call.
- websocket_service.py — `send_with_retry` raises ConnectionError when
`self._websocket` is None inside the existing try-block, so the
broad `except Exception` triggers reconnect just as it would on a
real send failure (preserving the prior behaviour where None
silently fell through to the AttributeError-driven reconnect path).
Drops three now-clean files from the pyright ignore list: cartesia/stt.py,
elevenlabs/stt.py, and soniox/stt.py.
After making LLMService generic, an unparameterized subclass
(`class MyService(LLMService):` with no bracket — the third-party
provider pattern) saw `get_llm_adapter()` return `Unknown` rather
than `BaseLLMAdapter` as it did before the refactor.
Add `default=BaseLLMAdapter` (PEP 696) on the TypeVar — via
`typing_extensions.TypeVar` so older Python targets keep working —
so unparameterized callers get `LLMService[BaseLLMAdapter]` and
`get_llm_adapter()` returns `BaseLLMAdapter`, matching the
pre-refactor type precision.
Two internal fallouts of having a default (where the default makes
unannotated `LLMService` resolve invariantly to
`LLMService[BaseLLMAdapter]`):
- `FunctionCallParams.llm` is now `LLMService[Any]` so concrete
parameterizations like `LLMService[OpenAILLMAdapter]` can be
passed where the field is set.
- The explicit `LLMService.__init__(self, **kwargs)` in
`WebsocketLLMService.__init__` gets a `pyright: ignore[reportArgumentType]`
comment — pyright's invariance handling can't see through the
multi-inheritance + generic + default combination, but the
runtime call is correct (generics are erased).
Two follow-ups now that LLMService is generic over its adapter:
- Add an explicit backward-compat test verifying that an LLMService
subclass with no generic parameter (the third-party-provider
pattern) instantiates and returns a usable adapter. The existing
MockLLMService (declared without brackets) already exercised this
implicitly, but it's worth a named assertion.
- Drop the now-redundant `params: SomeLLMInvocationParams = ...`
variable annotations on `adapter.get_llm_invocation_params()`
results. Since `get_llm_adapter()` now returns the precise adapter
type, and `BaseLLMAdapter` is generic in its invocation-params
type, the call already infers the right TypedDict.
Previously, `LLMService.get_llm_adapter()` returned `BaseLLMAdapter`,
which forced every caller that wanted the precise adapter type to
write `adapter: SomeAdapter = self.get_llm_adapter()` and accept
pyright's complaint that the assignment doesn't match the declared
type. That pattern existed in 17 places across the LLM services.
Make `LLMService` generic over its adapter type — `LLMService(...,
Generic[TAdapter])` with `TAdapter = TypeVar("TAdapter",
bound=BaseLLMAdapter)` — so subclasses opt in via
`LLMService[XAdapter]` and callers get the precise type back from
`get_llm_adapter()` automatically.
Backward-compatible for third-party providers: code that says
`class MyService(LLMService):` (no bracket) still type-checks, with
TAdapter resolving to BaseLLMAdapter from the bound — identical to
the pre-refactor behavior. The `adapter_class` attribute keeps its
loose `type[BaseLLMAdapter] = OpenAILLMAdapter` typing so the default
remains usable; one localized cast in `__init__` bridges the loose
class attr to the precise instance attr.
In-tree subclasses opted in:
- AnthropicLLMService -> LLMService[AnthropicLLMAdapter]
- AWSBedrockLLMService -> LLMService[AWSBedrockLLMAdapter]
- AWSNovaSonicLLMService -> LLMService[AWSNovaSonicLLMAdapter]
- BaseOpenAILLMService -> LLMService[OpenAILLMAdapter] (propagates to
~15 OpenAI-compatible providers like Cerebras, Groq, Together)
- GeminiLiveLLMService -> LLMService[GeminiLLMAdapter]
- GoogleLLMService -> LLMService[GeminiLLMAdapter]
- GrokRealtimeLLMService -> LLMService[GrokRealtimeLLMAdapter]
- InworldRealtimeLLMService -> LLMService[InworldRealtimeLLMAdapter]
- OpenAIRealtimeLLMService -> LLMService[OpenAIRealtimeLLMAdapter]
- _BaseOpenAIResponsesLLMService -> LLMService[OpenAIResponsesLLMAdapter]
- WebsocketLLMService is also generic so the multi-inheritance case
(OpenAIResponsesLLMService) can keep both bases agreeing on TAdapter.
All 17 redundant `adapter: SomeAdapter = self.get_llm_adapter()`
annotations are now plain `adapter = self.get_llm_adapter()`.
Same pattern as the earlier get_setup_params fix: when context tools
are absent, the fallback `adapter.from_standard_tools(self._tools)`
can return the NotGiven sentinel, and `_send_prompt_start_event`
expects a list. Coerce via `or []` so the NotGiven case becomes an
empty list.
Three small changes that resolve pyright errors and sharpen the logic:
- Guard `self._context` with the codebase's "should never happen"
early-return pattern, so we don't blindly call `.get_messages()` on
None.
- Skip `LLMSpecificMessage` items in the iteration. They're opaque
provider-specific payloads with no `.get()`, and the surrounding
logic only applies to standard tool-result messages.
- Match `role == "tool"` explicitly. The previous truthy-only check
was working by accident — the `tool_call_id` filter further down
was effectively narrowing to tool messages, but the intent is
clearer when stated upfront.
reset_conversation is part of the public AWSNovaSonicLLMService API and
is also called internally from the receive-task error handler.
Previously it captured `self._context` (typed `LLMContext | None`) and
unconditionally passed it to `_handle_context`, which expects a real
context — silently doing the wrong thing if no initial context had
been received yet.
Treat that as developer error: log a warning and return early. Nothing
to preserve means nothing to reset.
The service implements the NovaSonicSessionSender protocol so the
session-continuation helper can target either the current or next
session. The protocol declares
`get_setup_params(self) -> tuple[str | None, list]`, but the
implementation was unannotated and could return NotGiven in the tools
position when from_standard_tools fell through to its NotGiven
sentinel. Add the matching return annotation and coerce the NotGiven
case to an empty list.
Same MessageParam content-typing issue as the consecutive-message merge
fix: pyright doesn't carry the str-to-list narrowing forward, and
Iterable has no `[-1]` access. Cast to `list[Any]` and document the
chain of assumptions (list, non-empty, dict-typed last item) and where
each is upheld upstream.
This brings anthropic_adapter.py to 0 pyright errors (down from 115).
The function takes an OpenAI ChatCompletionMessageParam (a union of
TypedDicts) and returns an Anthropic MessageParam (a different
TypedDict). It does the conversion via dict-level mutations that don't
type-check against either side's TypedDict schema. Work with the
deepcopied message as a plain dict and cast to MessageParam at the
return sites — matching the boundary-cast convention noted in
llm_context.py.
Drops anthropic_adapter.py from 20 to 2 pyright errors.
The fallback path in `_from_universal_context_message` returns
`message.message` from an `LLMSpecificMessage`, which is typed loosely
(`Any | dict`). The surrounding comment already documents the
assumption that the message is already in Anthropic format — make that
assumption explicit to pyright with a cast.
MessageParam types content as `str | Iterable[...]`, and Iterable has
no `.extend()`. After the str-to-list conversions, pyright re-reads
the TypedDict field as the original wide type rather than carrying the
narrowing forward. Cast to `list[Any]` to express the codebase's
existing str-or-list assumption.
Drops anthropic_adapter.py from 23 to 21 pyright errors.
Content items in MessageParam have a heterogeneous union type (Pydantic
ContentBlock variants and TypedDict *BlockParam variants), neither of
which supports the dict-style access and mutation this sanitizer does.
Treat the deepcopied message as a plain dict and guard each content
item with isinstance(item, dict) — matches the runtime shape produced
by _from_standard_message and avoids crashing if a non-dict ever flows
through the LLMSpecificMessage path.
Drops anthropic_adapter.py from 115 to 23 pyright errors.
xAI's Voice Agent API selects the model via the ?model= query
parameter on the WebSocket URL; it cannot be changed later via
session.update. The Grok Realtime service was setting the model in
Settings but never including it in the connection URL, so every
session silently fell back to the deprecated default
grok-voice-fast-1.0.
Append the model from Settings to the WebSocket URL on connect, and
default to the recommended grok-voice-think-fast-1.0.
Adds a `mip_opt_out` init parameter to both `DeepgramTTSService` (WebSocket)
and `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model
Improvement Program. When set, the value is forwarded as a query parameter
on the request, matching the pattern used by the Deepgram STT services.
Broaden `tool_resources` to `app_resources` for easy access not just in
tool handlers but in other places like custom `FrameProcessor`s.
Involves 3 changes:
- A rename: `tool_resources` -> `app_resources`
- A new property on `PipelineTask`: `app_resources`
- A new property on `FrameProcessor`: `pipeline_task`
Usage in tool handler:
async def get_weather(params: FunctionCallParams):
resources = cast(MyAppResources, params.app_resources)
...
Usage in custom `FrameProcessor`:
class MyProcessor(FrameProcessor):
async def process_frame(self, frame, direction):
await super().process_frame(frame, direction)
if self.pipeline_task is not None:
resources = cast(MyAppResources, self.pipeline_task.app_resources)
...
The previous `tool_resources` aliases (on `PipelineTask`,
`FunctionCallParams`, and `FrameProcessorSetup`) keep working but are
deprecated as of 1.2.0 and emit `DeprecationWarning`s.
The four krisp test files installed a process-wide mock of
importlib.metadata.version with `patch(...).start()` at module level and
never called .stop(). Once any of these files was collected, the mock
leaked across the rest of the test session, returning '0.0.0-dev' for
every version check. This corrupted unrelated tests that triggered
transformers' import-time dependency check (e.g. lazy imports of
LocalSmartTurnAnalyzerV3) — transformers saw tqdm=='0.0.0-dev' and
refused to load.
Wrap the pipecat imports in `with patch(...)` so the mock is active
during import (where pipecat's krisp version check needs it) and torn
down before any tests run.
Importing pipecat.turns.user_turn_strategies pulled in
LocalSmartTurnAnalyzerV3 → transformers → onnxruntime at module load
time. Since this module is imported by llm_response_universal (and
therefore most LLM services), any LLM service import paid the cost of
loading transformers and triggered its missing-backend warning in
environments without PyTorch/TF/Flax.
Move the LocalSmartTurnAnalyzerV3 import into
default_user_turn_stop_strategies() so it only loads when the default
smart-turn strategy is actually constructed.
Fixes#4392
The non-200 branch yielded an ErrorFrame and then raised, which the outer
except caught and yielded a second, less informative "Unknown error" frame.
Return after the yield and fold the status code into the message.
Pyright flagged the .post() call on a possibly-None _session. Raise a
clear RuntimeError if start() wasn't called instead of crashing on the
attribute access.
SPELL/EMOTION_TAG/PAUSE_TAG/VOLUME_TAG/SPEED_TAG are stateless and worked
only via class-level access. Decorating them lets instance access work too
and silences the missing-self lint warning.
- Bump default cartesia_version to 2026-03-01.
- Replace deprecated use_original_timestamps with use_normalized_timestamps
so word timestamps match what was actually spoken.
- Add max_buffer_delay_ms init arg; auto-derive 0 in SENTENCE mode to avoid
the doc-warned "middle ground" of client + server buffering, leave unset
in TOKEN mode for managed buffering.
- Silently consume flush_done messages now emitted per transcript when
server-side buffering is disabled.
Adds a `session_id: str | None` field to `RunnerArguments` so bots can
log/trace a per-session identifier in local development the same way
they can in Pipecat Cloud (where it is provided via the
`x-daily-session-id` header).
The local runner now mints a UUID at every `*RunnerArguments`
construction site. For paths that already returned a `sessionId` to the
caller (Daily `/start`, dial-in webhook), a single UUID is now generated
and shared between `runner_args.session_id` and the response body
instead of being thrown away. The SmallWebRTC `/api/offer` endpoint
accepts an optional `session_id` so the `/sessions/{session_id}/...`
proxy can thread it through.
This is the prerequisite step for collapsing pipecat-cloud's
`SessionArguments` / `*SessionArguments` hierarchy onto the upstream
runner types.
So the rendered changelog has the (PR [...]) line aligned as a list
continuation under its bullet. Verified with both short and wrapped
entries via `towncrier build --draft`.
Introduce SonioxTTSService, a WebSocket TTS provider that streams text and
receives audio over a persistent connection, multiplexing up to 5 concurrent
streams per socket via Soniox's `stream_id`. Also updates the README service
table and the Soniox voice example to use the new TTS end-to-end.
Replaces the hardcoded camera publishing send settings in
DailyTransport with a new DailyParams.camera_out_send_settings dict that
applications can pass through verbatim to the Daily client. This makes
the encoding/codec/bitrate configuration user-controllable instead of
being driven solely by the generic TransportParams fields.
As a consequence, TransportParams.video_out_bitrate is deprecated for
the Daily transport (now configured via camera_out_send_settings) and
its default is changed to None.
Adds a dedicated screen video track alongside the existing camera track
so applications can publish to Daily's built-in "screenVideo" destination
via video_out_destinations. The track is created at join time and wired
into the client settings (inputs and publishing) when "screenVideo" is
configured; write_video_frame routes frames to the appropriate track
based on the frame's transport_destination.
Bound methods are created fresh on each attribute access, so
'self._missing_function_call_handler is self._missing_function_call_handler'
is always False. Using 'is' meant the placeholder branch never fired and
both warnings logged when a function was missing at queue time.
Switch to == so equality compares the underlying function and instance.
Strengthen the missing-at-queue-time test to assert the second warning
does not fire.
Address review feedback: a function may be unregistered between when
run_function_calls queues it and when _run_function_call executes it.
Restore the live lookup, falling back to the missing-function handler
when the entry is gone, so the call still terminates with a normal
tool result. Factor the missing-handler item construction into a
helper since it's now built in two places.
Runner-created Daily rooms previously had no expiration when callers
posted partial `dailyRoomProperties` (e.g. `{"start_video_off": true}`).
The model-default `exp=None` and `eject_at_room_exp=False` meant Daily's
cron never cleaned them up, so rooms accumulated indefinitely.
Encode the policy in the runner: define `PIPECAT_CLOUD_ROOM_EXP_HOURS=4.0`,
inject `exp` and `eject_at_room_exp=True` into user-supplied properties via
`setdefault` (so explicit caller values still win), and pass
`room_exp_duration` to all four `configure()` call sites.
* VIVA SDK TT v3 support
* Format fix.
* Renamed the API naming, removed '3' from the name.
* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.
* TT demo tool
* Some improvements for demo scripts, audio recordin, etc.
* Enhance demo scripts with VAD selection and audio embedding features. Updated HTML report to include annotated audio players and improved response time metrics in summary formatting. Added README for setup and usage instructions.
* Refactor interrupt prediction demo to compare multiple interruption strategies (Krisp IP vs VAD). Updated README with usage instructions and output details. Enhanced audio processing with new helper functions for generating beeps and mixing audio.
* Refactor demo scripts to improve latency metrics by introducing total_delay property in TurnEvent. Update formatting in reports and visualizations to reflect accurate speech end times, including VAD wait times. Enhance HTML report with detailed latency information and adjust audio processing to account for VAD stop seconds.
* Add audio resampling functionality and update demo scripts for improved audio processing
- Introduced `resample_audio` function to handle audio resampling with linear interpolation.
- Updated `demo_audio_recorder.py` to utilize the new resampling feature, ensuring audio is saved at the requested sample rate.
- Modified `demo_interrupt_prediction.py` and `demo_turn_taking.py` to resample audio to 16 kHz for compatibility with Silero VAD.
- Adjusted imports in demo scripts to include the new resampling function.
- Enhanced error handling for sample rate discrepancies in audio recording.
* Enhance demo_interrupt_prediction.py with VAD type selection and improved processing logic
- Added support for selecting between "silero" and "krisp" VAD engines in the demo script.
- Introduced a new create_vad function to configure VAD analyzers based on the selected type.
- Updated audio processing logic to handle VAD type-specific resampling and state management.
- Modified the KrispVivaIPUserTurnStartStrategy to utilize a separate vad_flag for per-frame VAD input, improving interruption detection accuracy.
* Refactor audio processing scripts for improved readability and consistency
- Updated type hinting in `resample_audio` function to use `tuple` instead of `Tuple`.
- Simplified print statements in `demo_audio_recorder.py`, `demo_formatting.py`, and `demo_interrupt_prediction.py` for better readability.
- Adjusted argument formatting in `demo_audio_recorder.py` and `demo_formatting.py` for consistency.
- Cleaned up list comprehensions in `demo_formatting.py`, `demo_html_report.py`, and `demo_interrupt_prediction.py` for clarity.
- Enhanced error handling in `__init__.py` for the KrispVivaIPUserTurnStartStrategy import.
* Refactor VAD handling in KrispVivaIPUserTurnStartStrategy and update tests for clarity
- Simplified the argument formatting in the _handle_vad_started method for improved readability.
- Updated test assertions to reflect changes in VAD processing logic, ensuring that the vad_flag is correctly set to False during continuous state processing.
- Enhanced test cases to verify that the process method is called appropriately under different conditions.
* more format fixes.
* removed demo scripts.
* reverted wrongly removed file.
* Corrected the IP integration logic.
* style fix.
* Refactor audio processing and state management in KrispVivaIPUserTurnStartStrategy
- Removed the unused _vad_flag attribute to streamline state tracking.
- Updated the reset method to clear the audio buffer instead of resetting the vad_flag.
- Adjusted the process_frame method to use _speech_active for VAD input, enhancing clarity in the logic.
- Modified tests to reflect changes in state management and ensure proper functionality of the reset method and audio buffer handling.
* FIxed formatting
---------
Co-authored-by: Aram Poghosyan <apoghosyan@krisp.ai>
Pipecat 1.0.8 hard-required protobuf 6.x via the base `protobuf>=6.31.1,<7`
pin, blocking users whose dependency graph already constrains protobuf to
the 5.x line. The original bump (PR #4136) was only needed because
`nvidia-riva-client>=2.25.1` ships gencode compiled with protoc 6.31.1.
Changes:
- Widen base pin to `protobuf>=5.29.6,<7`.
- Regenerate `frames_pb2.py` with `grpcio-tools~=1.67.1` (protoc 5.x). Per
Google's cross-version runtime guarantee, 5.x gencode runs on both 5.x
and 6.x runtimes, so this single artifact serves all users.
- Loosen the dev pin `grpcio-tools` to `>=1.67.1,<2` so contributors can
install `pipecat[dev,nvidia]` without resolver conflict. Comment in
`frames.proto` documents the 1.67.x requirement for regeneration.
- Add an explicit `protobuf>=6.31.1,<7` to the `nvidia` extra. This
compensates for nvidia-riva-client's missing `protobuf` install
requirement (upstream packaging gap, see
https://github.com/nvidia-riva/python-clients/issues/172). When that
issue is resolved, the explicit protobuf entry in the `nvidia` extra
can be removed.
Verified: pipecat imports cleanly on both protobuf 5.29.6 and 6.33.6;
`tests/test_protobuf_serializer.py` passes; `import riva.client` succeeds
when `pipecat[nvidia]` is installed.
Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds
transparent session continuation that rotates sessions in the background
before the limit is reached, preserving conversation context with no
user-perceptible interruption.
Implementation follows the AWS reference architecture:
- Monitor loop detects when session age exceeds threshold
- On assistant AUDIO contentStart: start buffering user audio, create next
session (sessionStart + promptStart + system instruction)
- Track SPECULATIVE/FINAL text counts as completion signal
- On completion signal: send conversation history + audioInputStart +
buffered audio to next session, then promote immediately
- Close old session in background (non-blocking)
- Dead session detection: recreate next session if idle >30s
Key design decisions:
- Session continuation enabled by default (fundamental for long conversations)
- Conversation history tracked in real-time via _sc_conversation_history
(independent of pipeline context aggregator which updates asynchronously)
- Completion signal check in _handle_content_end_event (after history update)
to ensure latest text is included in handoff
- Rolling audio buffer (default 3s) captures user audio during transition
- transition_threshold_seconds capped at 420s (7min) for safety margin
- Unified event methods (_send_text_event, _send_client_event, etc.) accept
optional stream/prompt_name params, eliminating duplicate SC methods
Also adds:
- SessionContinuationParams config (enabled, threshold, buffer, timeout)
LLMContext's NotGiven, LLMContextToolChoice, and LLMStandardMessage are
currently aliased to their OpenAI equivalents, so passing values
between the two sides type-checks implicitly. That works today but
obscures the fact that these are meant to be conceptually distinct —
if LLMContext ever diverges from OpenAI's types, every implicit
crossing would silently break.
Introduce two module-private cast helpers in open_ai_adapter.py:
- _openai_from_llm_context_tool_choice(tool_choice)
- _openai_from_llm_standard_message(message)
Both are typed no-ops today (implemented with typing.cast) but each
carries a docstring explaining why the cast is present, and every
boundary crossing now routes through a named function. Future readers
(and future greps) can find the crossings; a later divergence becomes
a mechanical find-and-update rather than hunting through adapter code.
No behavior change, no pyright error delta.
After widening TTSSettings.voice to str | None | _NotGiven (so other
TTS services can opt into None as a valid "no voice" state), pyright
flagged Speechmatics' URL builder receiving str | None where it
required str.
Speechmatics has no "no voice" mode (the URL path includes the voice
name), so override the inherited field in SpeechmaticsTTSSettings to
str | _NotGiven. The call site stays as a plain assert_given(...)
without an extra None check.
Three LLM services initialize certain Settings fields with the SDK's
NOT_GIVEN (openai.NOT_GIVEN or anthropic.NOT_GIVEN) so the value
flows unmodified into SDK API calls. The inherited field types from
LLMSettings only admit pipecat's _NotGiven, so pyright flagged each
constructor call as a flavor mismatch.
Widen the field types in each service-specific Settings subclass so
they accept both pipecat's _NotGiven (for delta-mode defaults) and
the corresponding SDK NotGiven (for store-mode passthrough):
- OpenAILLMSettings: frequency_penalty, presence_penalty, seed,
temperature, top_p, max_tokens, max_completion_tokens.
- OpenAIResponsesLLMSettings: temperature, top_p,
max_completion_tokens.
- AnthropicLLMSettings: temperature, top_k, top_p, thinking.
Every overridden field is genuinely read from self._settings and
passed directly to the SDK, so none of the overrides are vestigial.
Clears 21 pyright errors and restores test_service_settings_complete
parity with the pre-NOT_GIVEN-swap state.
asyncai/tts and google/vertex/llm are now clean after the missing-None
sweep (both benefited from the TTSSettings.voice / LLMSettings
cascades).
- src/pipecat/services/asyncai/tts.py
- src/pipecat/services/google/vertex/llm.py
Service-specific Settings subclasses declared fields as T | _NotGiven
(no None), but the services routinely pass None to those fields during
init to mean "don't override — use the vendor's default". The field
type just didn't reflect that a None value is valid, so pyright
flagged every None at the call sites.
Change the declarations to T | None | _NotGiven, matching the pattern
already used by ServiceSettings.model and TTSSettings.language. No
constructor-call changes; the default_factory stays NOT_GIVEN.
Fields touched across 11 files:
- services/settings.py: TTSSettings.voice (base class; covers
asyncai, cartesia, elevenlabs, fish, hume, kokoro, lmnt, mistral,
neuphonic, piper, resembleai, rime, xtts TTS services).
- services/aws/llm.py: latency.
- services/aws/tts.py: engine, pitch, rate, volume, lexicon_names.
- services/azure/tts.py: emphasis, pitch, rate, role, style,
style_degree, volume.
- services/google/gemini_live/llm.py: vad.
- services/google/llm.py: thinking.
- services/google/stt.py: language_codes.
- services/inworld/tts.py: speaking_rate, temperature.
- services/openai/tts.py: instructions, speed.
- services/speechmatics/stt.py: 13 fields (domain, operating_point,
max_delay, end_of_utterance_*, punctuation_overrides, *_partials,
split_sentences, enable_diarization, speaker_*, max_speakers,
prefer_current_speaker, extra_params).
- services/ultravox/llm.py: output_medium.
Clears 94 pyright errors (1035 -> 941).
Three files no longer have pyright errors after the is_given /
assert_given sweep — remove them from the ignore list (which serves as
a live todo of files with remaining type errors).
- src/pipecat/processors/gstreamer/pipeline_source.py
- src/pipecat/services/camb/tts.py
- src/pipecat/services/speechmatics/tts.py
Apply assert_given across service modules to narrow reads from
store-mode settings fields (self._settings.X, default_settings.X),
where _NotGiven is declared in the field type but should never appear
at runtime (enforced by validate_complete()).
Two idioms used:
- Inline wrap for single uses:
func(assert_given(self._settings.enable_prompt_caching), ...)
- Extract-and-reuse when the same value is used multiple times:
thinking = assert_given(self._settings.thinking)
if thinking:
params["thinking"] = thinking.model_dump(...)
43 service files touched. Cleared ~172 pyright errors; remaining
_NotGiven-related errors are in adjacent categories (flavor mismatch
between openai/anthropic NotGiven and pipecat _NotGiven, settings
field types that should allow None but don't) that need different
fixes.
In store-mode settings objects, _NotGiven should never appear (the
invariant enforced by validate_complete). But the declared field types
still include _NotGiven because the same class doubles as delta mode,
so every field read is typed X | None | _NotGiven and pyright flags
operations that assume X | None.
assert_given is a one-line extractor that narrows away _NotGiven and
raises loudly if the invariant is violated — preferable to scattering
is_given guards that defend against something that can't occur in
practice.
resolved_model = assert_given(self._settings.model) # str | None
Replace direct identity checks against NOT_GIVEN with is_given() at
sites where pyright's inability to narrow on non-singleton sentinels
was causing type errors.
- adapters/services/anthropic_adapter.py: narrow converted.system for
_resolve_system_instruction.
- services/openai/llm.py: narrow params.service_tier using OpenAI's
is_given.
- services/sarvam/llm.py: narrow tools / tool_choice using OpenAI's
is_given (aliased as openai_is_given alongside the existing
settings.is_given import).
- services/sarvam/tts.py: narrow settings.voice using settings.is_given.
Pyright can't narrow identity checks against module-level NotGiven
sentinels (they aren't typed as singletons), which leaves many
NotGiven-bearing unions stuck as unnarrowed types throughout the
codebase. Introduce is_given TypeGuard helpers so narrowing works via
isinstance under the hood.
Each helper is co-located with the NotGiven flavor it guards:
- services/settings.py: upgrade the existing is_given to a TypeGuard.
- processors/aggregators/llm_context.py: add an is_given for
LLMContext's NotGiven. Treat LLMContext's re-exported types
(LLMStandardMessage, LLMContextToolChoice, NOT_GIVEN, NotGiven) as
LLMContext's own — independent definitions that happen to coincide
with OpenAI's as an implementation detail.
- adapters/services/anthropic_adapter.py: add is_given for anthropic's
NotGiven.
- adapters/services/open_ai_adapter.py: add is_given for openai's
NotGiven.
TypedDict types are not subtypes of dict[...] in the type system
(per PEP 589), so TypedDict-based invocation param classes could not
satisfy the TypeVar bound. Mapping[str, Any] accepts TypedDicts while
preserving the "string-keyed mapping" constraint.
The original contributor's PR (#4328) landed as #4355. Rename the fragment
so the rendered changelog links to the merged PR, and add the leading `- `
bullet prefix that towncrier expects.
Extends the reconnect re-seeding fix to work cleanly on Gemini Live 2.5,
which has stricter seed requirements than 3.x and a documented audio-input /
history-recall limitation. Both initial connection and reconnect now share a
single code path (`_create_initial_response(for_reconnect=...)`), with four
well-documented cases.
On Gemini 2.5 reconnect, `turn_complete=True` is now forced on the seed so
the model produces a recap-style response immediately instead of briefly
acting "forgetful" on the user's next utterance — the latter being
especially jarring mid-conversation. When a 2.5 seed doesn't already end
with a user turn (e.g. the bot had finished speaking before the disconnect),
a blank user turn is appended to satisfy the server's seed-shape
requirement. Gemini 3.x needs neither workaround.
Tkinter's `Label` only stores `PhotoImage` references at the C level, so
Python GC eats them unless something on the Python side keeps a
reference. The canonical fix is to stash the reference on the widget
itself: `label.image = photo`. Tkinter widgets are plain Python objects,
so the assignment works at runtime, but the stub declares no `image`
attribute (correctly — there isn't one; we're adding it).
Narrow the suppression to `# type: ignore[attr-defined]` on the one
line. The existing comment above the assignment already documents why.
Mistral imposes three conversation-history quirks on top of the
OpenAI-compatible wire format: tool messages must be followed by an
assistant message; non-initial system messages are rejected; trailing
assistant messages require `prefix=True`. These rules were applied
inline in `MistralLLMService.build_chat_completion_params`, which is the
wrong layer — every other provider with OpenAI-compatible-but-quirky
shape (Perplexity, etc.) owns its transformations in a
`BaseLLMAdapter` subclass that runs during `get_llm_invocation_params`.
Create `MistralLLMAdapter(OpenAILLMAdapter)` on the Perplexity template
and wire it in via the existing `adapter_class` dispatch. The service
now only handles Mistral-specific request-level mapping (`random_seed`
in place of `seed`), and the message shape concerns live with other
provider format logic.
No behavior change. The transform function casts to `list[dict[str,
Any]]` internally because mutating `role` and attaching Mistral's
non-standard `prefix` field both step outside OpenAI's TypedDict
contract; the cast at the return boundary encodes that we're emitting
Mistral's extended schema, not OpenAI's.
`inspect.getdoc()` returns `str | None`, but `docstring_parser.parse()`
requires `str`. Functions without a docstring produced `None`, which
the type checker correctly flagged.
Coerce to `""` at the call site. `docstring_parser.parse("")` returns
an empty docstring whose `.description` and `.params` are already
handled by the surrounding `or ""` fallbacks, so runtime behavior is
unchanged.
`ToolsSchema.__init__` declared `standard_tools: list[FunctionSchema |
DirectFunction]`. Callers (`BaseLLMAdapter`, `MCPService`) pass in
`list[FunctionSchema]`, which is not assignable to the union list
because `list` is invariant in its element type.
Widen the parameter to `Sequence[...]` (covariant) so `list[X]` and
`list[X | Y]` both fit. A narrower `list[FunctionSchema]` is still
accepted, and nothing in this class mutates the argument — the
constructor immediately copies it via `_map_standard_tools`.
Also correct the `custom_tools` property return type to include
`None`, matching the stored `_custom_tools` field.
This single edit clears the pyright errors for three ignore-list
entries: `tools_schema.py`, `base_llm_adapter.py`, and `mcp_service.py`.
Two services were reading `_settings.model` (typed `str | _NotGiven |
None` because NOT_GIVEN is the default) and coercing it with `or ""`
or similar. `_NotGiven.__bool__` returns False, so the runtime
behavior happened to work, but the type was a lie — pyright saw
`str | _NotGiven` flowing into APIs that required `str` or `str | None`.
- `AIService._sync_model_name_to_metrics`: use `isinstance(model, str)`
narrowing with an empty-string fallback. Equivalent runtime behavior,
honest type, no truthiness dependency on a sentinel.
- `SarvamLLMService.__init__`: validate the model is a real string
before handing it to `_validate_model(str)`. A non-string model at
this point is a configuration bug; raise `ValueError` so the error
is clear and survives `python -O` (unlike an assert).
Three spots had the same shape: a field starts None, a later method
populates it, a read site later reads it. Pyright can't track the
cross-method invariant. Rather than spray assertions at the read
sites, fix each site at the structural level:
- `FastAPIWebsocketInputTransport._monitor_websocket` now takes the
session timeout as an argument. The task-creation site already
guards on truthiness, so the call can pass the non-None value
directly and the method's signature tells the truth.
- `FrameProcessorMetrics.task_manager` raises `RuntimeError` instead
of asserting. Asserts are stripped under `python -O`; a real raise
keeps the runtime safety net and still narrows the type for pyright.
- `SOXRStreamAudioResampler._maybe_initialize_sox_stream` returns the
initialized stream. Callers use the return value and never touch
the Optional `_soxr_stream` attribute, so narrowing stays inside
the init method where the invariant is established.
`ImageGenService.run_image_gen` and `VisionService.run_vision` were
declared `async def ... -> AsyncGenerator[Frame, None]` with `pass`
bodies. Without a `yield` anywhere in the body, Python treats the
function as a coroutine returning an `AsyncGenerator`, not as an async
generator itself, so callers got a coroutine where they expected an
iterator.
Add `raise NotImplementedError; yield` so the body contains a yield
(making this a real async generator) while still raising cleanly if a
subclass ever calls `super().run_*` by mistake.
Deepgram STT, Gradium TTS, Smallest STT, and xAI STT/TTS had exactly
one pyright error each, all of them the AsyncGenerator return-type
mismatch resolved in 08fe9157c. Remove them from the ignore list.
AssemblyAI, Cartesia, Gradium, and Soniox STT services sent audio over
the WebSocket without catching transient send failures, so a single
network hiccup could propagate an exception up through process_frame
and end the pipeline. Other push-based STT services (Deepgram, xAI,
Azure, Smallest, etc.) already guard their sends.
Follow the deepgram/stt.py pattern: log a warning and continue. The
existing connection-state check at the top of each call handles
recovery on the next invocation.
The push-based STT/TTS implementations send audio/text over a socket and
receive results via a separate receive task, so there is nothing to
yield inline. They yield `None` by design. The previous declaration of
`AsyncGenerator[Frame, None]` disagreed with that, while the consumer
(`AIService.process_generator`) already accepted `Frame | None`. Widen
the producer side (abstract base and every subclass) so the type honestly
describes the contract.
Pure annotation change; no runtime behavior difference.
Previously, six modules (adapters, audio, processors, serializers,
services, transports) were ignored wholesale. Many files in those
modules already pass type checking, but we had no way to protect them
from regressions or make the remaining work visible.
Switch the include list to src/pipecat so any new module is checked by
default, and replace directory-level ignores with the 140 specific
files that still fail. This puts 189 previously-untyped files under
type checking immediately and turns the remaining work into a concrete,
shrinking TODO list.
Moves src/pipecat/serializers into pyright's include list. Narrows
self._params to each subclass's InputParams in exotel, vonage, plivo,
twilio, genesys, and telnyx. In protobuf.py, renames the reassigned
frame local to avoid clobbering its Frame type and silences two dynamic
attribute accesses on the generated frames_pb2 module.
Also aligns telnyx and plivo hangup validation with twilio: if
auto_hang_up=True (the default) but required credentials are missing,
__init__ now raises ValueError instead of silently logging a warning
at call-end time. Previously a misconfigured serializer would construct
fine and fail to hang up the call later, leaving a phantom billable
session.
Collapse the separate fallback timer into the existing user_speech_timeout
timer, restarted when a transcript arrives without a VAD stop. stt_timeout
has no meaning on the fallback path, so the stt wait is marked done
immediately. This drops the _fallback_timeout_task / _fallback_expired
bookkeeping and the branched trigger condition.
Adds XAITTSService in the existing xai/tts.py module, alongside the
existing XAIHttpTTSService. Connects to xAI's streaming endpoint at
wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta
chunks down on the same connection so audio starts flowing before the full
utterance is synthesized.
Extends InterruptibleTTSService since xAI's protocol is strictly sequential
per connection and exposes neither a cancel verb nor a context ID — the
only way to stop an in-flight utterance is to tear down the WebSocket,
which is exactly what InterruptibleTTSService does on interruption when
the bot is speaking.
Voice, language, codec, and sample_rate are passed as query-string params
at connect time; runtime setting changes reconnect the socket. Defaults to
raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream.
Splits the existing example into voice-xai.py (WebSocket) and
voice-xai-http.py (batch HTTP) so each variant has its own entry point.
Promotes the xai extra to depend on pipecat-ai[websockets-base] since the
new service imports the websockets library.
Remove `examples/` from the `pyrightconfig.json` ignore list and fix
the resulting type errors across all example files. Common fixes:
- Required API keys: `os.getenv("X")` -> `os.environ["X"]` so the
return type is `str` rather than `str | None`, and misconfiguration
fails fast.
- Narrow `LLMContextMessage` union members with `isinstance(..., dict)`
before dict-style access.
- `assert isinstance(params.llm, ...)` before calling service-specific
methods that aren't on the base `LLMService`.
- Guard optional frame fields (e.g. `LLMSearchResponseFrame.search_result`)
before use.
If the WebSocket handshake is cancelled or fails before `keepalive_task`
is assigned (e.g. an STTUpdateSettingsFrame triggers a reconnect during
initial connect), the `finally` block tried to cancel an unbound local.
Initialize `keepalive_task = None` before the try and guard the cancel.
New `XAISTTService` wraps xAI's real-time speech-to-text WebSocket
(`wss://api.x.ai/v1/stt`). It extends `WebsocketSTTService`, authenticates
with the `XAI_API_KEY` as a Bearer token on the WS handshake, and streams
raw audio (PCM/mu-law/A-law) with configurable interim results, endpointing,
language, multichannel, and diarization settings.
- `src/pipecat/services/xai/stt.py`: new service, settings dataclass, and
`language_to_xai_stt_language` helper.
- `src/pipecat/services/stt_latency.py`: `XAI_TTFS_P99` default.
- `pyproject.toml` / `uv.lock`: `xai` extra now pulls in `websockets-base`.
- `README.md`: link to xAI STT in the services table.
- `examples/voice/voice-xai.py`: swap DeepgramSTTService for XAISTTService so
the xAI voice example is fully xAI.
- `examples/transcription/transcription-xai.py`: new transcription-only
example using the new service.
SpeechTimeoutUserTurnStopStrategy previously collapsed two waits into
max(stt_timeout, user_speech_timeout), which over-waited for finalizing
STT services and could also end the turn early in a legacy code path.
Run them as independent timers instead:
- user_speech_timeout: policy floor, always runs to completion.
- stt_timeout: latency safety net, short-circuited by a finalized
transcript since STT has signaled it has nothing more to send.
The no-VAD fallback now waits only user_speech_timeout rather than
max(stt_timeout, user_speech_timeout); stt_timeout is defined relative
to VAD stop and has no meaning when no VAD event occurred. This
shortens the fallback wait for users who set stt_timeout greater than
user_speech_timeout.
* Fix Smallest AI TTS WebSocket endpoint URL to match API documentation
Update base URL from waves-api.smallest.ai to api.smallest.ai and
fix path prefix from /api/v1/ to /waves/v1/ per the v4.0.0 docs.
* Update keepalive using silent space message instead of unsupported flush
Pylance analyzes open files even when they're outside the `include`
set, producing noise in the editor. Adding these paths to `ignore`
suppresses diagnostics without affecting import resolution.
Some TTS providers (e.g. Inworld) return verbatim tokens where spaces and
punctuation are already embedded in the token text. When downstream consumers
join these tokens with an extra space they produce "hello , world" instead of
"hello, world".
Add an opt-in `includes_inter_frame_spaces: bool = False` parameter to
`add_word_timestamps` / `_add_word_timestamps`. The flag is threaded through
`_WordTimestampEntry` and stamped onto every emitted `TTSTextFrame`.
Defaults to `False` — no behaviour change for existing services.
`InworldTTSService` passes `includes_inter_frame_spaces=True` and stops
pre-processing tokens in `_calculate_word_times`, returning them verbatim.
Tests added to `test_tts_frame_ordering.py` covering both HTTP and WebSocket
delivery paths: verbatim text preservation, PTS ordering, text-before-audio
ordering, and the Inworld punctuation-token scenario.
Made-with: Cursor
The two logger.error lines in krisp_instance.py fired at module-load time
whenever anything transitively imported it (e.g. pipecat.turns.user_start
pulling in krisp_viva_ip_user_turn_start_strategy), producing noisy output
for users who never asked for Krisp. Drop the log calls and raise a more
informative ImportError that names the affected classes so direct
importers still get clear guidance.
- Fall back to Language.EN in _primary_detected_language when model is
flux-general-en, preserving prior behavior on the default model.
- Standardize example on DeepgramFluxSTTService.Settings and drop the
now-redundant DeepgramFluxSTTSettings import.
- Narrow the changed-behavior changelog to reflect that flux-general-en
frames still carry Language.EN.
Enables the flux-general-multi model with one or more language_hints.
Hints are sent as repeatable URL params at connect time and via a
Configure control message when updated mid-stream (detect-then-lock).
TranscriptionFrame.language now reflects the language Flux detected
for each turn via the TurnInfo `languages` field.
Add changelog entries for the pyright introduction and the
LiveKitRunnerArguments.token signature tightening. Restore the
indented multi-line format for the WhatsApp missing-env error,
now listing only the vars that are actually missing.
Make required parameters non-optional: LiveKitRunnerArguments.token,
_create_telephony_transport args. Use os.environ[] instead of
os.getenv() for required WhatsApp env vars. Guard spec/loader None
in module loading. Tighten sip_caller_phone guard in daily.py.
* VIVA SDK TT v3 support
* Format fix.
* Renamed the API naming, removed '3' from the name.
* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.
* Typo fix in voice-krisp-viva example to use KrispVivaFilter class
* style fix.
* test run error fixes.
* some test related changes.
* Fixed tests
* Stule fixes.
SentryMetrics.stop_ttfb_metrics and stop_processing_metrics called the
base FrameProcessorMetrics implementation but discarded its return
value (implicit `return None`). FrameProcessorMetrics.stop_ttfb_metrics
/ stop_processing_metrics build and return a MetricsFrame, which
FrameProcessor.stop_ttfb_metrics / stop_processing_metrics then pushes
downstream so observers (e.g. UserBotLatencyObserver,
MetricsLogObserver) can see TTFB / processing metrics.
Because SentryMetrics returned None, the FrameProcessor never pushed
the MetricsFrame, so any pipeline using metrics=SentryMetrics() on STT
/ LLM / TTS services silently lost all downstream TTFB and processing
MetricsFrames. The metrics were still calculated and logged
internally, and Sentry transactions still finished correctly, but
observers never saw them.
Forward the MetricsFrame returned by the base class so FrameProcessor
can push it into the pipeline.
Use Sequence[FrameProcessor] instead of list[FrameProcessor] in Pipeline,
ServiceSwitcher, and ServiceSwitcherStrategy parameters to accept subtype
lists. Add cast() in LLMSwitcher for narrowed return types. Guard against
None in task_observer._send_to_proxy and replace hasattr with truthiness
check in task._cleanup.
Widen base strategy process_frame return types to ProcessFrameResult |
None to match actual behavior (None treated as CONTINUE). Give
UserTurnCompletionLLMServiceMixin a FrameProcessor base class so pyright
can see create_task, cancel_task, process_frame, and push_frame.
Tighten LLMMessagesAppendFrame and LLMMessagesUpdateFrame message fields
from list[dict] to list[LLMContextMessage] to match actual usage. Add
type annotations on inline message lists in IVR navigator and voicemail
detector.
In token-streaming mode, _push_tts_frames previously stripped only
leading newlines and dropped any pure-whitespace frame. That silently
discarded meaningful inter-token whitespace (e.g. a standalone "\n"
token between "hello" and "world"), losing prosody cues and any
downstream sentence-boundary semantics.
Track whether a non-whitespace character has been sent in the current
context. While the flag is false, strip all leading whitespace; once
true, let whitespace tokens flow through. Reset the flag on
LLMFullResponseEndFrame/EndFrame and on interruption, and save/restore
it around TTSSpeakFrame since each utterance is its own context.
Sentence-aggregation mode preserves the existing behavior.
Group three co-assigned fields (_start_frame_id, _start_frame_arrival_ns,
_start_wall_clock) into a single _StartFrameInfo dataclass. This makes
the "always set together" invariant structural rather than implicit, and
fixes the incorrect str | None annotation on _start_frame_id (Frame.id
is int).
Add pyrightconfig.json with basic type checking for zero-error modules
(clocks, metrics, transcriptions, frames) and enforce via CI. The
include list will expand as modules are fixed.
* Improve HeyGen LiveAvatar plugin reliability and performance
- Add WebSocket ready gate: wait for session.state_updated connected
event before sending commands (prevents silently dropped messages)
- Add keep-alive mechanism: send session.keep_alive every 2.5 min to
prevent 5-minute inactivity timeout
- Optimize audio chunking: 600ms first chunk for faster initial
response, 1s subsequent chunks for efficient streaming
- Fix audio buffer flush: send remaining buffered audio on utterance
end instead of discarding it
- Fix WS state cleanup: properly reset connected/ready state when
WebSocket drops unexpectedly
- Add livekit_config passthrough in LiveAvatar session token creation
- Replace stray print() with logger.debug()
* Fix HeyGenOutputTransport.start() signature and use 400ms first chunk
- Update transport.py to match new client.start() signature (no
audio_chunk_size param)
- Change first chunk size from 600ms to 400ms per feedback
* Fix transport audio resampling and client.start() error propagation
- Add audio resampling in HeyGenOutputTransport.write_audio_frame() to
ensure audio is always 24kHz before sending to HeyGen (was sending
at pipeline sample rate, causing garbled audio)
- Raise exception on WS ready timeout instead of silently returning,
preventing transport from appearing ready when WS connection failed
* Fix session readiness gate to work with LITE mode
LITE mode does not send session.state_updated WS events. Instead,
use a dual-signal _session_ready event that fires on either:
- WS session.state_updated connected (FULL mode)
- LiveKit participant connected (LITE mode)
Also reorder start() to connect both WS and LiveKit before waiting,
since the WS events may depend on LiveKit being connected.
Verified with live sandbox session - all tests pass.
* Simplify session readiness to use only WS ready gate
Remove _session_ready dual-signal and use only _ws_ready, which fires
on the session.state_updated connected WS event. Increase timeout to
30s. LiveKit is connected before waiting so the WS event can arrive.
* Reduce WS ready gate timeout back to 10s
* Remove WS ready gate (session.state_updated not reliably received)
The session.state_updated connected event is not reliably received
via the websockets library. Remove the gate for now and assume the
session is ready after WS + LiveKit connect. Keep-alive, chunking,
buffer flush, state cleanup, and other improvements remain.
Mirrors the existing `from_string` classmethod and lets callers
turn a frame's `buttons` list back into a dial string like `"123#"`.
`__str__` and the Daily transport's native DTMF path reuse it.
The single-key `button` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` is kept as a first-class ergonomic shortcut
for the common single-keypress case, equivalent to
`buttons=[button]`. `buttons` takes precedence when both are set.
Replaces the string-based `tones` field with a type-safe
`buttons: list[KeypadEntry]` on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame`, matching the existing singular `button`
field on `InputDTMFFrame`. A `from_string` classmethod builds the
list from a dial string like `"123#"` (invalid characters raise
ValueError from the `KeypadEntry` constructor).
The base output audio fallback now iterates `frame.buttons`
directly, LiveKit sends `frame.buttons[0].value`, and the Daily
transport joins the button values into the single string Daily's
`send_dtmf` expects.
Introduces a new `tones` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` for sending multi-digit DTMF sequences and
deprecates the existing single-key `button` field. When only `button`
is set, it is used as a single-character `tones` string for backward
compatibility.
`DTMFFrame` is kept as an empty marker class so both input and output
DTMF frames can still be identified via isinstance. `InputDTMFFrame`
keeps its required `button` field (single keypress semantics).
The Daily-specific `DailyOutputDTMFFrame` and
`DailyOutputDTMFUrgentFrame` frames no longer need to override
`button` and simply add `session_id` and `digit_duration_ms`, which
are forwarded to Daily's `send_dtmf` as `sessionId` and
`digitDurationMs`.
The base output audio fallback now iterates `tones` and generates a
tone per character; LiveKit's native DTMF path sends `tones[0]` since
its API is single-tone.
Introduces Daily-specific DTMF output frames that carry explicit
`tones`, `session_id` and `digit_duration_ms` fields, forwarded to
Daily's `send_dtmf` as `tones`, `sessionId` and `digitDurationMs`.
The inherited `button` and `transport_destination` fields are
ignored for these frames in the Daily transport.
The Azure TTS _handle_completed callback was putting the audio stream
completion signal (None) directly into _audio_queue while the last word
was still pending in _word_boundary_queue. This caused a race condition
where run_tts could exit and TTSStoppedFrame could be emitted before the
word processor task had a chance to process and emit the final word's
TTSTextFrame.
The fix routes the completion signal through _word_boundary_queue as a
None sentinel. The word processor task now recognizes this sentinel and
only signals _audio_queue after all pending words have been drained.
This guarantees the last word's TTSTextFrame is always emitted before
TTSStoppedFrame.
The cancellation/interruption path (_handle_canceled) is unchanged and
still signals _audio_queue directly, which is correct since word ordering
does not matter when speech is interrupted.
When the LLM returned zero text tokens (e.g. it was interrupted before producing
tokens or about to push tokens), push_aggregation() returned an empty string and
on_assistant_turn_stopped was never emitted. This left consumers waiting for an
event that would never arrive.
Now on_assistant_turn_stopped always fires, with an empty content string when
the LLM produced no text tokens.
Fixes#4292
Only treat messages[0] as the initial system prompt when determining the
summarization range. Previously, the code scanned the entire context for
the first system-role message, which caused failures when the only system
message was a mid-conversation injection (e.g. "The user has been quiet").
In that case summary_start exceeded summary_end, producing an empty range
and "No messages to summarize" errors.
Fixes#4286
The enable_logging and enable_ssml_parsing URL params used truthy checks,
so False was treated the same as None (both skipped). Also, Python's
str(False) produces "False" but the API expects lowercase "false".
Additionally, add enable_logging support to ElevenLabsHttpTTSService
which was missing entirely.
When the STT p99 timeout fires without a transcript, the turn stop
strategy previously did nothing — falling through to the 5-second
user_turn_stop_timeout. Now, a _timeout_expired flag tracks when the
timeout has elapsed so that a late transcript triggers the turn stop
immediately instead of waiting for the fallback.
Previously settings updates were ignored with a TODO comment. Now when
model/language changes via STTUpdateSettingsFrame the service disconnects
and reconnects with the new query parameters.
Key changes:
- Implement _update_settings to disconnect/reconnect on changes
- Check `is not State.OPEN` in run_stt to catch CLOSING state
- Send `done` command before closing for clean session shutdown
- Capture websocket reference in _disconnect_websocket to prevent a
concurrent _connect from having its new connection nulled by a stale
finally block
The strategy schedules background tasks during setup. Fast-running
tests could observe state before those tasks had a chance to run;
yielding once via asyncio.sleep(0) ensures they do.
Enable callers to get a compact version of context messages suitable
for serialization, logging, and debugging tools. For standard
messages, known binary data (base64 images, audio) is fully elided.
For LLM-specific messages, long string values are recursively
truncated. Adapter get_messages_for_logging() methods now use this.
Example files can live under subdirectories (e.g. foundational/01.py),
so the recording path needs its parent directory created before the
audio file is written.
Replaces the per-frame asyncio.Event signaling with a monotonic
timestamp updated on each audio frame. The handler sleeps until the
next deadline (last_audio_time + timeout), recomputing on each wake-up
to account for audio arriving during sleep.
This avoids waking the handler on every audio frame (~50/s at 20ms
chunks), and guarantees detection latency is bounded by timeout rather
than 2 * timeout.
Also renames audio_starvation_timeout to audio_idle_timeout and
associated identifiers for consistency with existing pipecat naming
(user_idle_timeout, etc.).
These are TypedDicts (plain dicts at runtime), so no behavioral change
— just more descriptive type hints for readers. Use ToolParam instead
of FunctionToolParam for the Responses adapter to reflect that custom
non-function tools are supported. Use ChatCompletionToolParam instead
of Any for the completions adapter return type. Update tests to use
typed params in expected values.
During pipeline shutdown, proxy tasks must be cancelled before observer
resources are cleaned up. Previously, stop() was called inside
_cancel_tasks() and start() was called in _start_tasks(), which could
lead to proxy tasks still consuming frames after observer resources
were closed.
Now the lifecycle is explicit in _handle_start_frame: start() after all
observers are loaded, and stop() before cleanup() on shutdown.
Also fixes misleading variable name in TaskObserver.cleanup() where
iterating self._proxies yields observer keys, not Proxy values.
Fixes#4195
Move event.clear() from finally block to success path in
IdleFrameProcessor and UserIdleProcessor._idle_task_handler().
The finally block unconditionally cleared signals set during
async timeout callbacks, causing false-positive idle detection.
Closes#3402
* Add Inworld Realtime LLM service
Adds a WebSocket-based realtime service for Inworld's cascade
STT/LLM/TTS API with semantic VAD, function calling, and streaming
transcription support.
New files:
- src/pipecat/services/inworld/realtime/ (service, events)
- src/pipecat/adapters/services/inworld_realtime_adapter.py
- examples/foundational/19zb-inworld-realtime.py
Also includes:
- websockets dependency for inworld extra in pyproject.toml
- Adapter and settings tests matching OpenAI/Grok realtime patterns
- Fix for double-response when server-side VAD is enabled
* Prefer init-provided system instruction in Inworld Realtime
Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the
pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and
Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup
branch.
* Update changelog entry with PR number
* Fix changelog format to use bullet point
* Polish PR: default model, example cleanup, changelog update
- Change default model from gpt-4.1-nano to gpt-4.1-mini
- Add function calling demo to example
- Remove demo-testing artifact from system instruction
- Mention Router support in changelog
* Address PR review feedback for Inworld Realtime
- Move example to examples/realtime/realtime-inworld.py
- Change initial context role from "user" to "developer"
- Remove explicit sample rates from example; sync them in
_ensure_audio_config so Inworld gets the transport's actual rates
- Add audio race condition guard in _handle_evt_audio_delta (matches
OpenAI realtime pattern)
- Convert remaining "system"/"developer" messages to "user" in adapter
- Add clarifying comment for local-VAD vs server-VAD metrics paths
* Simplify example, add provider tracking, remove local VAD path
- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning
- Add pipecat-realtime session key prefix and provider_data metadata
for Inworld traffic attribution
- Remove local VAD code path (Inworld only supports server-side VAD)
- Use typed InputAudioBufferAppendEvent for audio sends
* Default TTS model to inworld-tts-1.5-max
* Remove dead shimmed tools code, set STT/VAD defaults
- Remove non-functional AdapterType.SHIM custom tools code from adapter
- Default STT model to assemblyai/u3-rt-pro
- Default VAD eagerness to low
Integrate with Mistral's Voxtral TTS API (voxtral-mini-tts-2603) using
HTTP streaming with Server-Sent Events. Converts base64-encoded float32
PCM chunks from the API to int16 for the Pipecat pipeline.
After a reconnect, _ready_for_realtime_input was never set back to True
because _create_initial_response (which sets the flag) is only called on
initial connection. This caused all audio/video/text to be silently
dropped after reconnecting, making the bot appear to hang.
Set the flag in _handle_session_ready when we detect a reconnect, either
via session_resumption_handle (server restores state) or via existing
context (rare case where connection drops before first resumption handle).
After a reconnect, _ready_for_realtime_input was never set back to True
because _create_initial_response (which sets the flag) is only called on
initial connection. This caused all audio/video/text to be silently
dropped after reconnecting, making the bot appear to hang.
Set the flag in _handle_session_ready when context already exists
(i.e. reconnect case) since we don't need to go through
_create_initial_response again.
Remove the deprecation proxy infrastructure that allowed old-style flat
imports (e.g. `from pipecat.services.openai import OpenAILLMService`).
Users must now import from specific submodules
(`from pipecat.services.openai.llm import OpenAILLMService`), which is
already the established pattern across all internal code and 179+ examples.
- Strip 32 proxy `__init__.py` files to empty
- Strip 3 non-proxy files with bare star imports (minimax, sambanova, sarvam)
- Strip google/gemini_live `__init__.py` re-exports
- Remove DeprecatedModuleProxy class and helpers from services/__init__.py
- Remove ruff per-file ignore for services/__init__.py
- Fix 2 examples using old-style imports
Patch Pydantic's DICT_TYPES check in conf.py to accept Union-wrapped
dict types, fixing the autodoc import failure for models using
ConfigDict(extra="allow").
Make -W (warnings as errors) opt-in via --strict flag instead of
default, and update README to reflect uv-based workflow and current
directory structure.
Add tests for LLMRunFrame, LLMMessagesAppendFrame, LLMMessagesUpdateFrame,
and LLMMessagesTransformFrame sent upstream to LLMAssistantAggregator,
mirroring the existing LLMUserAggregator downstream tests. Add
frames_to_send_direction param to run_test helper to support this.
The previous approach required the caller to directly grab a reference to the context object, grab a "snapshot" of its messages *at that point in time*, transform the messages, and then push an `LLMMessagesUpdateFrame` with the transformed messages. This approach can lead to problems: what if there had already been a change to the context queued in the pipeline? The transformed messages would simply overwrite it without consideration.
Napoleon's Attributes section creates class-level attribute docs that
duplicate the __init__ parameter docs when napoleon_include_init_with_doc
is enabled. Using Parameters avoids the duplication.
- Remove expect_stripped_words from LLMAssistantAggregatorParams and related warnings
- Remove old multi-parameter on_push_frame observer signature support in TaskObserver
- Remove deprecated context field from UserImageRequestFrame
- Remove deprecated LiveKitTransportMessageFrame and LiveKitTransportMessageUrgentFrame
- Remove deprecated pipecat.turns.mute shim module
Replace Markdown code blocks with RST syntax in genesys.py, fix
deprecated directive transitions in nvidia and summarization modules,
remove stray bullet prefix in whisper arg docs, restructure code block
in turn completion mixin, and add deepgram mock to Sphinx conf.
Remove stale riva mock imports from autodoc_mock_imports since the riva
service was removed and nvidia-riva-client is installed during doc builds.
Add pipecat.turns and pipecat.extensions to import_core_modules() and
add Turns to the index.rst toctree. Regenerate uv.lock to reflect the
riva extra removal from pyproject.toml.
Move the FastAPI instance to module level so other packages can import
it and register routes before main() is called. main() now configures
the existing app with transport-specific routes instead of creating a
new one.
Deepgram's built-in VAD events were deprecated in 0.0.99 in favor of
Silero VAD. This removes vad_events from settings and LiveOptions,
the should_interrupt parameter, the vad_enabled property,
_on_speech_started/_on_utterance_end handlers, and simplifies
_on_message and process_frame accordingly.
Remove the send_transcription_frames parameter from OpenAI Realtime LLM
(deprecated since 0.0.92). Also fix undefined _warn_deprecated_param
calls in both OpenAI and xAI realtime services, replacing them with the
existing _warn_init_param_moved_to_settings method.
UserBotLatencyLogObserver (deprecated 0.0.102) is replaced by
UserBotLatencyObserver. UserIdleProcessor (deprecated 0.0.100) is
replaced by LLMUserAggregator with user_idle_timeout.
A single service failing to reconnect should not kill the entire
pipeline. Non-fatal errors flow through the pipeline so application
code (e.g. ServiceSwitcher) can handle failover to a backup service.
When a WebSocket server accepts the handshake but immediately closes the
connection (e.g. invalid API key returning close code 1008), the existing
exponential backoff does not help because the handshake keeps succeeding.
This tracks how long each connection survives and emits a non-fatal
ErrorFrame after 3 consecutive sub-5s failures, allowing ServiceSwitcher
failover instead of killing the pipeline.
Fixes#3711
description: Review, refactor, document, and validate code changes in the current branch
---
# Code Cleanup Skill
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.
description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
---
Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.
## Instructions
### 1. Safety check
```bash
git status --short
```
If there are uncommitted changes, stop and tell the user to commit or stash them first.
### 2. Inspect the branch
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
```
List every file changed vs `main` and every commit on the branch (excluding merge commits from main).
### 3. Create a backup branch
```bash
git branch backup/<current-branch-name>
```
Tell the user the backup exists so they can recover if needed.
### 4. Soft-reset to main and unstage everything
```bash
git reset --soft main
git restore --staged .
```
All branch changes are now in the working tree, unstaged. No content has changed.
### 5. Plan the logical groups
Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:
- Core feature or fix (new source files + modified core files)
- Secondary features or fixes (each as its own commit if distinct)
- Refactoring or renames
- Tests
- Changelogs / docs
Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.
Present the proposed grouping to the user and ask for confirmation before committing.
### 6. Commit in logical groups
For each group, stage only the relevant files and commit with a clear message following the project's conventions:
```bash
git add <file1> <file2> ...
git commit -m "..."
```
Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).
### 7. Verify
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
git status --short
```
Confirm:
- Commit count is small and each message is meaningful
- The set of changed files vs `main` is identical to before
- Working tree is clean
### 8. Remind about force-push
The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.
## Rules
- Never change file contents. If you find yourself editing a file, stop.
- Never skip the backup branch step.
- Never force-push without explicit user instruction.
- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.
This file provides guidance to AI coding agents when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra local
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
uv run towncrier build --draft --version Unreleased
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
[Processor1] → [Processor2] → ... → [ProcessorN]
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): Transports are frame processors used for external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Deprecations**: Use the `.. deprecated:: <version>` Sphinx directive in docstrings (never inline tags like `[DEPRECATED]`), and pair it with a runtime `warnings.warn(..., DeprecationWarning)` at the call site. See `CONTRIBUTING.md` for full conventions.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
-`@dataclass`: Frame types, context aggregator pairs, internal data containers
-`BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
```python
classMyService(LLMService):
"""Description of what the service does.
More detailed description.
Event handlers available:
- on_connected: Called when we are connected
Example::
@service.event_handler("on_connected")
async def on_connected(service, frame):
...
"""
def__init__(self,param1:str,**kwargs):
"""Initialize the service.
Args:
param1: Description of param1.
**kwargs: Additional arguments passed to parent.
"""
super().__init__(**kwargs)
# Pydantic params class with a deprecated field
classMyParams(BaseModel):
"""Configuration parameters for MyService.
Parameters:
new_setting: Replacement for ``old_setting``.
old_setting: Legacy setting, no longer used.
.. deprecated:: 1.2.0
Use ``new_setting`` instead. Will be removed in 2.0.0.
"""
new_setting:str="default"
old_setting:str|None=None
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
uv run towncrier build --draft --version Unreleased
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
[Processor1] → [Processor2] → ... → [ProcessorN]
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): Transports are frame processors used for external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
-`@dataclass`: Frame types, context aggregator pairs, internal data containers
-`BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
```python
classMyService(LLMService):
"""Description of what the service does.
More detailed description.
Event handlers available:
- on_connected: Called when we are connected
Example::
@service.event_handler("on_connected")
async def on_connected(service, frame):
...
"""
def__init__(self,param1:str,**kwargs):
"""Initialize the service.
Args:
param1: Description of param1.
**kwargs: Additional arguments passed to parent.
"""
super().__init__(**kwargs)
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.
Need multiple AI agents working together? [Pipecat Subagents](https://github.com/pipecat-ai/pipecat-subagents) lets you build distributed multi-agent systems where each agent runs its own pipeline and communicates through a shared message bus. Hand off conversations between specialists, dispatch background tasks, and scale agents across processes or machines.
### 📱 Client SDKs
Building client applications? You can connect to Pipecat from any platform using our official SDKs:
@@ -67,7 +71,7 @@ and install any of the available plugins.
### 🧩 Community Integrations
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/api-reference/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
### 📺️ Pipecat TV Channel
@@ -79,28 +83,28 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
- ⚠️ Added WebSocket-based `OpenAIResponsesLLMService` as the new default for the OpenAI Responses API. It maintains a persistent connection to `wss://api.openai.com/v1/responses` and automatically uses `previous_response_id` to send only incremental context, falling back to full context on reconnection or cache miss. The previous HTTP-based implementation is now available as `OpenAIResponsesHttpLLMService`.
- ⚠️ Removed `OpenPipeLLMService` and the `openpipe` extra. OpenPipe was acquired by CoreWeave and the package is no longer maintained. If you were using `openpipe` as an LLM provider, switch to the underlying provider directly (e.g. `openai`). The OpenPipe interface can still be used with `OpenAILLMService` by specifying a `base_url`.
- ⚠️ Updated `langchain` extra to require langchain 1.x (from 0.3.x), langchain-community 0.4.x (from 0.3.x), and langchain-openai 1.x (from 0.3.x). If you pin these packages in your project, update your pins accordingly.
- Fixed `InworldHttpTTSService` streaming responses crashing with `UnicodeDecodeError` when multi-byte UTF-8 characters were split across chunk boundaries. This caused TTS audio to cut off mid-sentence intermittently.
- Fixed a crash (`JSONDecodeError`) when a user interruption occurs while the LLM is streaming function call arguments. Previously, the incomplete JSON arguments were passed directly to `json.loads()`, causing an unhandled exception. Affected services: OpenAI, Google (OpenAI-compatible), and SambaNova.
- ⚠️ Removed deprecated `on_pipeline_ended`, `on_pipeline_cancelled`, and `on_pipeline_stopped` events from `PipelineTask`. Use `on_pipeline_finished` instead.
- ⚠️ Removed deprecated `camera_in_enabled`, `camera_in_is_live`, `camera_in_width`, `camera_in_height`, `camera_out_enabled`, `camera_out_is_live`, `camera_out_width`, `camera_out_height`, and `camera_out_color` transport params. Use the `video_in_*` and `video_out_*` equivalents instead.
- ⚠️ Removed deprecated RTVI models, frames, and processor methods including `RTVIConfig`, `RTVIServiceConfig`, `RTVIServiceOptionConfig`, various `RTVI*Data` models, `RTVIActionFrame`, and `RTVIProcessor.handle_function_call`/`handle_function_call_start`. Use the updated RTVI processor API instead.
- ⚠️ Removed deprecated transport frames: `TransportMessageFrame`, `TransportMessageUrgentFrame`, `InputTransportMessageUrgentFrame`, `DailyTransportMessageFrame`, and `DailyTransportMessageUrgentFrame`. Use `OutputTransportMessageFrame`, `OutputTransportMessageUrgentFrame`, `InputTransportMessageFrame`, `DailyOutputTransportMessageFrame`, and `DailyOutputTransportMessageUrgentFrame` instead.
- ⚠️ Removed deprecated interruption frames: `StartInterruptionFrame` and `BotInterruptionFrame`. Use `InterruptionFrame` and `InterruptionTaskFrame` instead.
- ⚠️ Removed deprecated `pipecat.services.gemini_multimodal_live` package. Use `pipecat.services.google.gemini_live` instead. Note that class names no longer include "Multimodal" (e.g. `GeminiMultimodalLiveLLMService` → `GeminiLiveLLMService`).
- ⚠️ Removed deprecated `OpenAIRealtimeBetaLLMService` and `AzureRealtimeBetaLLMService`. Use `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService` from `pipecat.services.openai.realtime` and `pipecat.services.azure.realtime` instead.
- ⚠️ Removed deprecated `pipecat.services.deepgram.stt_sagemaker` and `pipecat.services.deepgram.tts_sagemaker` modules. Use `pipecat.services.deepgram.sagemaker.stt` and `pipecat.services.deepgram.sagemaker.tts` instead.
- ⚠️ Removed deprecated `GoogleLLMOpenAIBetaService` from `pipecat.services.google.openai`. Use `GoogleLLMService` from `pipecat.services.google.llm` instead.
- ⚠️ `BaseOpenAILLMService.get_chat_completions()` now accepts an `LLMContext` instead of `OpenAILLMInvocationParams`. If you override this method, update your signature accordingly.
- ⚠️ Removed deprecated service-specific context and aggregator machinery, which was superseded by the universal `LLMContext` system.
Service-specific classes removed: `AnthropicLLMContext`, `AnthropicContextAggregatorPair`, `AWSBedrockLLMContext`, `AWSBedrockContextAggregatorPair`, `OpenAIContextAggregatorPair`, and their user/assistant aggregators. Also removed `create_context_aggregator()` from `LLMService`, `OpenAILLMService`, `AnthropicLLMService`, and `AWSBedrockLLMService`.
- ⚠️ Removed deprecated frame types `LLMMessagesFrame` and `OpenAILLMContextAssistantTimestampFrame` from `pipecat.frames.frames`. Instead of `LLMMessagesFrame`, use `LLMContextFrame` with the new messages, or `LLMMessagesUpdateFrame` with `run_llm=True`.
- ⚠️ Removed `VisionImageFrameAggregator` (from `pipecat.processors.aggregators.vision_image_frame`). Vision/image handling is now built into `LLMContext` (from `pipecat.processors.aggregators.llm_context`). See the `12*` examples for the recommended replacement pattern.
- ⚠️ Removed `OpenAILLMContext`, `OpenAILLMContextFrame`, and `OpenAILLMContext.from_messages()`. Use `LLMContext` (from `pipecat.processors.aggregators.llm_context`) and `LLMContextFrame` (from `pipecat.frames.frames`) instead. All services now exclusively use the universal `LLMContext`.
From the developer's point of view, migrating will usually be a matter of going from this:
- Fixed `CartesiaTTSService` failing with "Context has closed" errors when switching voice, model, or language via `TTSUpdateSettingsFrame`. The service now automatically flushes the current audio context and opens a fresh one when these settings change.
- ⚠️ Removed deprecated service parameters and shims that have been replaced by the `settings=Service.Settings(...)` pattern or direct `__init__` parameters:
-`GoogleVertexLLMService`: `InputParams` class with `location`/`project_id` fields (use direct init params); `project_id` is now required, `location` defaults to `"us-east4"`
-`MiniMaxHttpTTSService`: `english_normalization` from `InputParams` (use `text_normalization`)
-`SimliVideoService`: `simli_config` init param (use `api_key`/`face_id`), `use_turn_server` init param; `api_key` and `face_id` are now required
-`AnthropicLLMService`: `enable_prompt_caching_beta` from `InputParams` (use `enable_prompt_caching`)
- ⚠️ `LLMService.function_call_timeout_secs` now defaults to `None` instead of `10.0`. Deferred function calls will run indefinitely unless a timeout is explicitly set at the service level or per-call. If you relied on the previous 10-second default, pass `function_call_timeout_secs=10.0` explicitly.
- ⚠️ Removed deprecated `pipecat.transports.services` and `pipecat.transports.network` module aliases. Update imports to use `pipecat.transports.daily.transport`, `pipecat.transports.livekit.transport`, `pipecat.transports.websocket.*`, `pipecat.transports.webrtc.*`, and `pipecat.transports.daily.utils` respectively.
- ⚠️ Removed deprecated `EmulateUserStartedSpeakingFrame` and `EmulateUserStoppedSpeakingFrame` frames, and the `emulated` field from `UserStartedSpeakingFrame` / `UserStoppedSpeakingFrame`.
- ⚠️ Removed deprecated `STTMuteFilter`, `STTMuteConfig`, and `STTMuteStrategy` from `pipecat.processors.filters.stt_mute_filter`. Use `pipecat.turns.user_mute` strategies with `LLMUserAggregator`'s `user_mute_strategies` parameter instead.
- ⚠️ Removed deprecated `allow_interruptions` parameter from `PipelineParams`, `StartFrame`, and `FrameProcessor`. Interruptions are now always allowed by default. Use `LLMUserAggregator`'s `user_turn_strategies` / `user_mute_strategies` parameters to control interruption behavior.
- ⚠️ Removed `ExternalUserTurnStrategies` and the automatic fallback to it in `LLMUserAggregator` when a `SpeechControlParamsFrame` was received from the transport.
- ⚠️ Removed `vad_analyzer` and `turn_analyzer` parameters from `TransportParams` and all transport input classes, along with all deprecated VAD/turn analysis logic in `BaseInputTransport`. VAD and turn detection are now handled entirely by `LLMUserAggregator`.
- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.
- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.
- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.
- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.
- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.
- Added `GET /status` endpoint to the development runner that reports which transports the running instance accepts (all by default, or the single transport passed via `-t`).
- Added plain WebSocket transport support to the development runner. Bots can now accept connections from non-telephony WebSocket clients (e.g., browser apps using protobuf framing) via the `/ws-client` endpoint alongside other transports.
- ⚠️ The development runner now supports all transports (WebRTC, Daily, telephony, plain WebSocket) simultaneously from a single server. The `/start` endpoint accepts a `"transport"` field to select the transport per-request; omitting `-t` at startup enables all transports instead of defaulting to WebRTC. The Daily browser-redirect route moved from `GET /` to `GET /daily`.
- Added `pipecat.workers`, a worker-based agent framework folded in from the standalone `pipecat-subagents` package. Workers inherit from `BaseWorker`, share a `WorkerBus`, register in a `WorkerRegistry`, and exchange typed work via `@job` handlers. `LLMWorker` and `LLMContextWorker` provide ready-made LLM-driven workers. `PipelineRunner.spawn(worker)` registers fire-and-forget workers alongside the main pipeline worker.
- ⚠️ `FrameProcessorSetup.pipeline_worker` and `FunctionCallParams.pipeline_worker` are now mandatory fields, and `FrameProcessor.pipeline_worker` raises if read before `setup()` instead of returning `None`. Real-world code (frame processors set up by `PipelineWorker`, tool handlers invoked by `LLMService`) is unaffected; only callers that construct these dataclasses by hand (typically tests) now have to supply a `pipeline_worker` reference.
-`PipelineWorker` now inherits from `BaseWorker`, so every pipeline worker is also a bus participant. It accepts a new optional `bridged=()` parameter that auto-wraps the pipeline with bus edge processors, letting the worker exchange frames with other bridged workers over the shared `WorkerBus`. The bus is supplied by `PipelineRunner` via `worker.attach(registry=..., bus=...)` instead of through the constructor.
- Fixed `ElevenLabsSTTService` crashing when `language` was passed as `None`. When `language` is not set, the service now lets ElevenLabs auto-detect the audio language.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.