Compare commits

...

793 Commits

Author SHA1 Message Date
Mark Backman
9054912dfb Update to add_workers 2026-05-21 23:20:40 -04:00
Mark Backman
0b9500aae4 Match shopping-list client styling to the other UI demos
Restyle from a bespoke dark theme to the light theme the other UI demos
share: the canonical :root tokens (--border, --muted, --highlight), the
#fafafa/#18181b body, the sticky white header with the light/red Connect
button, the fixed bottom-right #status toast, and the amber
ui-highlight-pulse keyframe. index.html drops the custom topbar wrapper for
the standard <header> plus a standalone #status element.
2026-05-21 23:20:40 -04:00
Mark Backman
10b8feb9ea Add shopping-list UIWorker example (bridge-free voice + UI)
Demonstrates the 'every input acts, may speak' pattern without bridging: a
standard voice pipeline (STT → LLM → TTS) whose LLM only converses, plus a
separate UIWorker that does all the list work. The voice pipeline's user
aggregator fires on_user_turn_stopped each turn and dispatches the transcript
to the UIWorker as a respond job (a bus message); the UIWorker reads the
auto-injected <ui_state> snapshot and drives the list silently via add_item /
set_checked / remove_item commands (plus the standard highlight). Items are
checkboxes whose label and checked state the snapshot exposes.

Includes a vanilla-JS client following the existing UI-demo client style.
2026-05-21 23:20:40 -04:00
Mark Backman
1c94feaaff Inject <ui_state> via the LLM's on_before_process_frame hook
Move <ui_state> snapshot injection out of respond_with_llm into a
cross-cutting on_before_process_frame handler on the UIWorker's LLM, so it
appends the current snapshot to the context the request is built from, just
before each inference. Injection is gated to the user-turn-initiating
inference so a tool-calling turn never stacks duplicate <ui_state> blocks;
respond_with_llm no longer injects manually.

Also drop the bridged parameter from UIWorker: there is no viable way to
bridge a UIWorker between workers — a shared, teed context would be polluted
by the injection, and per-worker turn detection off teed frames isn't
supported. Other workers keep their PipelineWorker bridging.
2026-05-21 23:20:40 -04:00
Mark Backman
950fc10f05 Add document-review UIWorker example
Synthesis example: a ReplyToolMixin UIWorker adds a start_review tool that fans
out to clarity/tone peers via start_user_job_group, translates each reviewer
response into an add_note command in on_job_response, handles a client
note_click event via @on_ui_event, and keeps history across turns.
2026-05-21 23:20:40 -04:00
Mark Backman
07725429b2 Add async-tasks UIWorker example
A UIWorker with a custom reply tool fans research out to three BaseWorker peers
via start_user_job_group; their progress streams to the client as ui-task cards
and the user can cancel a group mid-flight.
2026-05-21 23:20:40 -04:00
Mark Backman
6b0e204d66 Add form-fill UIWorker example
A ReplyToolMixin UIWorker that fills inputs (fills) and toggles checkboxes /
presses submit (click) by voice — the state-changing half of the standard
action set.
2026-05-21 23:20:40 -04:00
Mark Backman
f826da9ac9 Add deixis UIWorker example
A ReplyToolMixin UIWorker that grounds in the user's text selection (the
<selection> block in the snapshot) and points back via select_text — both
directions of deictic reference.
2026-05-21 23:20:40 -04:00
Mark Backman
81b956d963 Add pointing UIWorker example
The voice LLM delegates to a ReplyToolMixin UIWorker that scrolls offscreen
items into view and highlights the phones it names — exercising the scroll_to /
highlight UI commands and the [offscreen] state tag.
2026-05-21 23:20:40 -04:00
Mark Backman
2254a8d0a2 Add hello-snapshot UIWorker example
Smallest UIWorker demo: a voice LLM in the main pipeline delegates
screen-relevant utterances to a UIWorker via a respond job; the UIWorker
auto-injects the current <ui_state> and answers grounded in what's on screen.
Includes a vanilla-JS client that streams accessibility snapshots over RTVI.
2026-05-21 23:20:40 -04:00
Mark Backman
f1f5a986e8 Add UIWorker
UIWorker is an LLMContextWorker that observes and drives a client GUI over the
RTVI UI channel: it stores accessibility snapshots, auto-injects <ui_state> at
the start of each respond job, dispatches client events to @on_ui_event
handlers, sends UI commands back to the client, and surfaces fan-out work as
cancellable task cards via user_job_group(). The optional ReplyToolMixin exposes
a bundled reply tool.

The prompt_guide parameter auto-appends the UI wire-format guide to the LLM's
system instruction (default UI_STATE_PROMPT_GUIDE; override with a string or
disable with None), so the LLM can parse the injected <ui_state> / <ui_event>
messages without the app concatenating the guide by hand.
2026-05-21 23:20:40 -04:00
Mark Backman
02667a7255 Add native RTVI⇄bus UI bridge to PipelineWorker
When RTVI is enabled, PipelineWorker now republishes inbound ui-event /
ui-snapshot / ui-cancel-task messages onto the bus as a broadcast
BusUIEventMessage, and translates outbound BusUICommandMessage / BusUITask*
carriers into the matching RTVI frames. This lets a UIWorker on the bus observe
and drive the client UI with no decorator or manual wiring; when no UIWorker is
present the events are simply unconsumed.

The BusUI* carriers live in the bus layer so both pipeline and workers can
reference them without an import cycle.
2026-05-21 23:03:37 -04:00
Mark Backman
ee3d1128ec Add LLMService.append_system_instruction()
Composes durable text onto a user-provided system instruction (alongside the
turn-completion and async-tool-cancellation addons) so it is prepended on every
inference and survives context-message resets. The user's base prompt is now
snapshotted once and the effective instruction is always rebuilt from it,
replacing the prior lazy capture/restore logic with a single invariant.
2026-05-21 23:03:37 -04:00
Aleix Conchillo Flaqué
e8ec7c585f Rename PipelineRunner.add_worker() to variadic add_workers(*workers)
Lets callers register multiple workers in a single call instead of
awaiting add_worker() repeatedly. Updates all examples, docs, tests,
and proxy worker docstrings to use the new API.
2026-05-21 19:46:53 -07:00
Aleix Conchillo Flaqué
f91179a640 Forward active from PipelineWorker through to BaseWorker
PipelineWorker.__init__ was only forwarding `name` to BaseWorker, so
the `active` flag (the other BaseWorker constructor arg) wasn't
reachable from PipelineWorker callers. Add `active: bool = True` to
the signature and pass it through.
2026-05-21 19:07:13 -07:00
Aleix Conchillo Flaqué
e85f3fe606 update uv.lock 2026-05-21 19:07:13 -07:00
Aleix Conchillo Flaqué
d07ba562eb Separate bus messages from pipeline frames
BusMessage was a mixin tacked onto DataFrame / SystemFrame so the bus
could reuse the frame priority machinery. That made every bus message
also a Frame, which is misleading — bus messages travel on the bus, not
through pipelines. If a worker actually needs to ship a frame, it wraps
it in BusFrameMessage.

BusMessage is now a plain dataclass base carrying source/target.
BusDataMessage and BusSystemMessage are empty subclasses that exist
only as priority markers. The bus router and the priority queue check
``isinstance(item, BusSystemMessage)`` directly instead of
``isinstance(item, SystemFrame)``.

The serializer test that round-tripped DataFrame.name (a non-init
field) is rewritten against a local _MessageWithNonInit(BusDataMessage)
subclass so the serializer's init=False path stays covered.
2026-05-21 19:07:13 -07:00
Aleix Conchillo Flaqué
b03247f360 Rename BaseTask → BaseWorker and reserve "task" for asyncio
Replaces every "task" identifier that referred to the BaseTask
abstraction with "worker". Asyncio task plumbing (asyncio.Task,
BaseTaskManager, TaskManager, create_task, cancel_task, etc.) stays
untouched. Highlights:

- Classes: BaseTask → BaseWorker, PipelineTask → PipelineWorker,
  LLMTask → LLMWorker, LLMContextTask → LLMContextWorker, TaskBus →
  WorkerBus, TaskRegistry → WorkerRegistry, TaskActivationArgs →
  WorkerActivationArgs, TaskReadyData → WorkerReadyData,
  TaskRegistryEntry → WorkerRegistryEntry, TaskObserver →
  WorkerObserver, all Bus*TaskMessage → Bus*WorkerMessage,
  BusAddTaskMessage.task field → worker, BusWorkerRegistryMessage.tasks
  field → workers.
- Methods/decorators: activate_task → activate_worker, deactivate_task
  → deactivate_worker, add_task → add_worker, watch_task →
  watch_worker, @task_ready → @worker_ready, setup_pipeline_task hook
  → setup_pipeline_worker.
- Params/fields: FrameProcessorSetup.pipeline_task and
  FunctionCallParams.pipeline_task → pipeline_worker. Parameter names
  like task_name → worker_name; spawn/run accept worker:.
- Files: pipeline/base_task.py → base_worker.py, pipeline/task.py →
  worker.py (plus a re-export shim at pipeline/task.py),
  task_observer.py → worker_observer.py, task_ready_decorator.py →
  worker_ready_decorator.py, pipecat.tasks → pipecat.workers,
  llm_task.py → llm_worker.py, llm_context_task.py →
  llm_context_worker.py, examples/multi-task → examples/multi-worker.

Back-compat:
- PipelineTask kept as a deprecated subclass of PipelineWorker that
  warns on construction.
- pipecat.pipeline.task re-exports PipelineWorker/PipelineTask/etc. so
  existing user imports keep working.
- FrameProcessor.pipeline_task kept as a deprecated property that
  forwards to pipeline_worker.

Local variables in examples that hold a worker (task = PipelineTask(...))
are renamed to worker = PipelineWorker(...). Asyncio-task locals
(runner_task, etc.) are preserved.
2026-05-21 19:07:13 -07:00
Aleix Conchillo Flaqué
b9aed0d673 Rename BaseTask.send_error to send_bus_error_message
Symmetric with send_bus_message; "send_bus_error" on its own reads
ambiguously (sounds like an error about the bus, à la SIGBUS) and the
underlying types are BusTaskErrorMessage / BusTaskLocalErrorMessage,
so keeping "_message" in the name matches what's actually sent.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
d8947c68a9 Rename BaseTask.send_message to send_bus_message
Mirrors on_bus_message and makes it explicit that the call goes out on
the task bus, not on a transport (transports have their own
send_message for client/peer messaging).
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
373894fc65 Fold BaseTask.handoff_to into activate_task(deactivate_self=...)
BaseTask.handoff_to was just deactivate_self + activate_task. Remove
it and add a deactivate_self flag on activate_task instead, so there's
one entry point for activating another task.

LLMTask now overrides activate_task (mirroring its end() override) to
keep the messages / result_callback hooks that finish an in-progress
tool call before the target is activated. All multi-task examples and
unit tests switch to the new call.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
e8bbb5ee09 Add setup_pipeline_runner hook to PIPECAT_SETUP_FILES
PipelineRunner now picks up an async setup_pipeline_runner(runner) hook
from the same PIPECAT_SETUP_FILES env var that PipelineTask already uses
for setup_pipeline_task. Previously the runner used a separate
PIPECAT_RUNNER_SETUP_FILES variable and a setup_runner function — both
are removed.

A new _setup_files module hosts the loader for both hooks and caches
each setup file's module so a single file defining both hooks (e.g. a
debugger that registers a runner-level task in one hook and a per-task
observer in the other) sees its module-level state preserved across
invocations.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
a2e58044f2 update pyproject.toml and uv.lock 2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
8867426a97 Document sensor-controller example in the multi-task README
Add a Local-section entry with the running instructions, example
questions, and architecture diagram for the new sensor-controller
example.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
d984393213 Make local-handoff builder functions public
Rename ``_build_greeter`` / ``_build_support`` to ``build_greeter`` /
``build_support`` to match the convention used by other multi-task
examples (e.g. ``build_sensor_controller``). They're public factories
the example exposes; the leading underscore was misleading.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
959fb831f1 Drop redundant name strings from create_task calls in bus + proxy
``BaseObject.create_task`` already auto-names the task based on the
coroutine; the explicit ``f"{self}::..."`` strings duplicated that
default and made the call sites noisier. Remove them.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
410190dabb Add sensor-controller multi-task example
A voice agent talking to a worker that owns a simulated temperature
sensor. Demonstrates two ``PipelineTask`` instances side by side
communicating purely via ``BusJobRequestMessage`` /
``BusJobResponseMessage`` — the worker is a plain ``PipelineTask``
(no ``LLMTask`` subclassing, not bridged) whose pipeline runs both an
autonomous sensor tick loop and its own tool-calling LLM:

    SensorReader -> SensorStats -> user_agg -> llm -> assistant_agg

The voice agent's LLM has a single tool, ``ask_controller(question)``,
that forwards the user's request verbatim to the worker and speaks
back the controller's reply. The worker LLM has direct tools to read
the current temperature, inspect rolling stats, set the target, or
change the response rate; the sensor simulation drifts toward the
target with a first-order lag plus Gaussian noise.

Job responses are paired with completed LLM turns via the assistant
aggregator's ``on_assistant_turn_stopped`` event, skipping empty
turn-stopped events that fire between a tool call and its result.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
f22350ce2f Use symmetric spawn-then-run() pattern in multi-task examples
Switch every example to ``await runner.spawn(task)`` followed by
``await runner.run()`` (no task argument), and ``await runner.cancel()``
on client-disconnected instead of ``await task.cancel()``. This makes
the main pipeline task look the same as the worker / proxy tasks
spawned alongside it, and lets ``runner.cancel()`` drive a uniform
shutdown across every root task on the bus.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
5f1b91bb89 Clarify PipelineRunner.run() docstring for the no-task form
Spell out that spawned tasks finishing on their own does not unblock
``runner.run()`` when called without a ``task`` argument. The form is
for hosts (e.g. FastAPI servers) that have no single "main" pipeline
and want to stay up across many spawned sessions; callers who want
the runner to finish when a specific pipeline finishes should pass
that pipeline as ``task``.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
cd22742e10 Default WebSocketProxyClientTask to active=False
``on_activated`` on this task opens the upstream WebSocket connection,
which is almost always something the caller wants to trigger
explicitly (e.g. on local-client-connected). With the BaseTask default
of ``active=True`` the connection was opened twice: once when the task
auto-activated at start, and once again when the caller's
``activate_task("proxy")`` re-fired ``on_activated``. The result on
the remote side was two ``PipelineRunner`` instances per session
instead of one.

Default to ``active=False`` so the activation is a deliberate signal;
pass ``active=True`` explicitly to restore the eager-connect behavior.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
9ecb00d097 Skip pgmq/redis lazy-import tests when their extras are not installed
``test_pgmq_bus_lazy_import`` and ``test_redis_bus_lazy_import``
import ``pipecat.bus.network.pgmq`` / ``redis`` directly, which raises
when the optional ``pgmq`` / ``redis`` packages are missing. Gate each
test with ``@unittest.skipUnless`` on a top-level probe of the
underlying package so they're skipped (not errored) in environments
without the extras. ``test_unknown_attribute_raises`` is unaffected.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
79ae9740cc Skip pgmq/redis bus tests when their extras are not installed
The PGMQ and Redis bus modules raise an ``Exception`` at import time
when the optional ``pgmq`` / ``redis`` packages are missing, which broke
``pytest`` collection in environments without those extras (e.g. CI
that uses ``--no-extra gstreamer --no-extra local``). Wrap the imports
in ``try/except`` and ``raise unittest.SkipTest`` so the whole test
module is skipped cleanly instead of failing collection.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
df704a34f1 Move _wait_tasks_ready into the job-group internals section in BaseTask
``_wait_tasks_ready`` is only called from
``create_job_group_and_request_job``, so it belongs with the other
job-group internals (``_create_job_group``, ``_send_job_request``,
``_task_timeout``, ...) rather than next to the task-readiness
helpers (``_register_ready``, ``_on_watched_task_ready``).
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
7dc2b41412 Drive task end/cancel shutdown from BaseTask by default
``BaseTask._handle_task_end`` and ``_handle_task_cancel`` now call
``stop()`` after propagating to children, so bus-only subclasses
(``WebSocketProxyClientTask``, ``WebSocketProxyServerTask``, custom
worker tasks like ``CodeWorker``) don't need to override these
handlers just to set ``_finished_event``.

Children-propagation is extracted into ``_propagate_end_to_children``
and ``_propagate_cancel_to_children`` so ``PipelineTask`` can call
them directly without invoking ``stop()`` prematurely — the pipeline
still drives its own shutdown through the ``EndFrame`` / ``CancelFrame``
path, which triggers ``on_pipeline_finished`` and ``stop()`` after the
pipeline drains.

Drop the now-redundant overrides from the WebSocket proxy tasks.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
4d9e258e55 Collapse _pipeline_finished_event into BaseTask._finished_event
PipelineTask had its own ``_pipeline_finished_event`` that signalled
"pipeline run has truly finished" — the same role ``BaseTask._finished_event``
plays for bus-only tasks. They were two events with the same intent.

Set ``_finished_event`` directly when the pipeline-end frame propagates
through the sink, drop the now-redundant field, and drop the
``clear()`` after wait so the event stays set for the lifetime of the
task. As a side-effect, ``await pipeline_task.wait()`` from outside now
resolves at the moment the pipeline finishes, matching the semantics
of bus-only tasks.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
de1bd7cb7e code-assistant: work around CancelledError swallow in ClaudeSDKClient
claude_agent_sdk's _AsyncioTaskHandle.wait() uses
`with suppress(asyncio.CancelledError)` to silence the inner read
task's expected cancellation, but it also swallows the outer task's
cancellation if it lands on the same await — causing cancel_task to
time out.

Bypass `async with ClaudeSDKClient` and drive connect/disconnect
ourselves so disconnect() runs in a finally where the outer
CancelledError has already been raised and suspended by Python's
exception machinery, out of reach of the SDK's suppress.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
a5bb9f65de Fix LLMTask._finish_function_call to bypass deferral
self.queue_frame would defer the LLMMessagesAppendFrame because
_finish_function_call always runs inside a tool call. The subsequent
_flush_pipeline() then returned before the goodbye/handoff LLM output
was actually delivered. Use super().queue_frame to push the frame
straight into the pipeline, matching the pattern used in
_flush_pipeline().
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
402cf8dade Port multi-task unit tests from pipecat-subagents
Brings over 215 tests across 15 files covering the new
multi-task framework: BaseTask / PipelineTask bus lifecycle,
job RPC and job groups, the bus message hierarchy and serializers,
TaskBus + AsyncQueueBus + RedisBus + PgmqBus (with direct and
isolated backends), TaskRegistry, the BusBridgeProcessor, the
WebSocket proxy tasks, the LLMTask deferral logic, and the
PipelineRunner spawn-and-attach flow.
2026-05-21 10:13:21 -07:00
Jon Taylor
d757d8d06d Split PgmqBus into orchestrator + pluggable backends
Move the wire-side of PGMQ operations into a new
``pipecat.bus.network.pgmq_backends`` module with a ``PgmqBackend``
Protocol, a ``DirectPgmqBackend`` (peers discovered by queue prefix),
and an ``IsolatedPgmqBackend`` (SECURITY DEFINER ``public.bus_*``
wrappers over an asyncpg pool). ``PgmqBus`` now delegates join,
publish, read, archive, and leave to the configured backend.

Construct ``PgmqBus`` with either ``pgmq=PGMQueue`` (uses
``DirectPgmqBackend``) or ``backend=PgmqBackend`` (any backend); the
two are mutually exclusive.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
a63abc41b6 Add README and env.example for multi-task examples
Adapts the pipecat-subagents `examples/README.md` to the new
layout (`multi-task/` umbrella, `local-handoff/`, `distributed-handoff/`,
`remote-proxy-assistant/`, `parallel-debate/`, `code-assistant/`),
updates the agent→task / job-RPC vocabulary, drops the
single-agent and llm-and-flows examples (gone in the port), and
adds a new section for the PGMQ handoff transport.
2026-05-21 10:13:21 -07:00
Aleix Conchillo Flaqué
4c5fb85856 Add pgmq and redis extras for the distributed bus implementations
`pipecat.bus.network.pgmq` and `pipecat.bus.network.redis` need
optional dependencies. Adding `pgmq` and `redis` extras so users
can `pip install pipecat-ai[pgmq]` / `pip install pipecat-ai[redis]`
to opt in.
2026-05-21 10:13:18 -07:00
Aleix Conchillo Flaqué
4fbeb5fbcb Add remote-proxy-assistant example
Demonstrates the WebSocket proxy tasks: a local `main.py` voice
bot uses `WebSocketProxyClientTask` to forward bus messages
(including `BusFrameMessage`s) to a remote `assistant.py`
FastAPI server. Each incoming connection spawns a
`WebSocketProxyServerTask` plus an `LLMTask` assistant on a
per-session `PipelineRunner`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
4509caa724 Add distributed-handoff examples (redis and pgmq)
Two transports of the same shape: a main task that hosts the
voice pipeline plus a network-backed `TaskBus` (`RedisBus` or
`PgmqBus`), and a standalone `llm.py` worker process for the
greeter / support LLM. Workers connect to the same bus channel,
register on the shared `TaskRegistry`, and the main task waits
on `runner.registry.watch("greeter", ...)` before sending the
welcome activation so it doesn't fire before the worker is up.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
0f7211d072 Add parallel-debate example
A voice moderator that fans out a debate topic to three worker
tasks (advocate, critic, analyst) via `task.job_group(...)`,
then synthesizes their replies. Workers are `LLMContextTask`s
that keep their own conversation context across rounds and use
the assistant-aggregator's `on_assistant_turn_stopped` event
to ship the completed turn back as a job response.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
7c4294b7f6 Add local-handoff-two-agents-tts example
Variant of the local handoff example with per-task TTS voices.
Each child task wraps the LLM with its own `CartesiaTTSService`
in a custom pipeline override, so the main task has no TTS and
audio comes from whichever child is active over the bus.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
6964686808 Add code-assistant example
Voice code assistant that dispatches questions to a Claude Agent
SDK worker. The main task runs the voice pipeline (STT + LLM + TTS)
and an `ask_code` direct function. `CodeWorker` is a bus-only
`BaseTask` spawned on the runner: it accepts `@job`-style
requests through the bus, queues them onto an asyncio queue, and
runs them sequentially through a persistent Claude SDK session so
follow-ups share context. The example shows the job-RPC surface
(`task.job("code_worker", ...)`), bus-only tasks (no pipeline),
and the `pipeline_task` field on `FunctionCallParams`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
f364c088cf Add local-handoff-two-agents example
Two LLM tasks (greeter and support) handing off to each other over
the local `AsyncQueueBus`. The main task owns the transport
pipeline (STT, TTS, transport I/O) and the child tasks each run
their own LLM behind a `BusBridgeProcessor`. Each child uses
`bridged=()` so `PipelineTask` auto-wraps its pipeline with
the bus edge processors, and `transfer_to_agent` / `end_conversation`
tools demonstrate `handoff_to(...)` and `end(...)`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
42204c4d0f Fix pyright errors in new bus/task/proxy code
- `TaskBus._router_task`: cast the narrowed `SystemFrame` back
  to `BusMessage` for the subscriber callback.
- `bus.network.__init__`: expose `PgmqBus` / `RedisBus` to
  the type-checker via a TYPE_CHECKING block so `__all__` is
  satisfied; runtime path still goes through `__getattr__`.
- `RedisBus`: subscribe through a local before assigning
  `self._pubsub`, and `assert self._pubsub is not None` in
  the reader loop.
- `BaseTask.on_job_error` accepts
  `BusJobResponseMessage | BusJobResponseUrgentMessage` to match
  what is dispatched.
- `JobGroupContext.__aexit__` / `JobContext.__aexit__`: assert
  `self._group is not None` before `wait()`.
- `@task_ready` collector: type handlers dict as `dict[str, Callable]`
  so the `.__name__` read on a duplicate handler typechecks.
- WebSocket proxy client/server: assert the socket is set in
  `_receive_loop`, and decode `str` payloads to bytes before
  handing them to the serializer.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
5f86e39038 Port WebSocket proxies to the new BaseTask API
`WebSocketProxyServerAgent` / `WebSocketProxyClientAgent` are
renamed to `WebSocketProxyServerTask` / `WebSocketProxyClientTask`
and updated for the post-refactor surface:

- Drop `bus=` from the constructor; the bus arrives via
  `BaseTask.attach` from the runner.
- Constructor params `agent_name` / `remote_agent_name` /
  `local_agent_name` → `task_name` / `remote_task_name` /
  `local_task_name` (matching `BusBridgeProcessor`).
- Move setup logic from the now-removed `on_ready` hook into
  `start()`; replace `_stop()` overrides with `stop()`.
- Add `_handle_task_end` / `_handle_task_cancel` overrides that
  set `_finished_event` so `PipelineRunner._cancel_spawned_tasks`
  can drive these bus-only tasks to a clean exit.
- Update the registry-message field reference
  (`agents=`/`message.agents` → `tasks=`/`message.tasks`)
  and `TaskReadyData.task_name` access.
- Tighten the server's `_send_ws` exception handling to only
  catch `WebSocketDisconnect`.
- Update install hints (`pipecat-ai[websockets-base]` for the
  client, `starlette` for the server) and refresh docstrings/
  examples to use `runner.spawn(...)`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
86f8f137a8 Sweep agent->task and task->job in docstrings and identifiers
Cleans up leftover "agent" terminology in module/class/method
docstrings across `pipecat.bus`, `pipecat.registry`,
`pipecat.pipeline`, and `pipecat.tasks.llm`, and renames
job-RPC phrasing ("task request", "task identifier",
"task group execution") to use "job" consistently.

API-visible changes:

- `BusBridgeProcessor(agent_name=, target_agent=)` → `task_name=` /
  `target_task=`.
- `@task_ready` decorator's internal marker
  `fn.agent_ready_name` → `fn.task_ready_name`.
- `@tool` decorator's internal marker
  `fn.is_agent_tool` → `fn.is_llm_tool`.
- `PIPECAT_SUBAGENTS_SETUP_FILES` env var →
  `PIPECAT_RUNNER_SETUP_FILES`.
- pgmq/redis bus install hints point at `pipecat-ai\[extra\]`
  rather than the old `pipecat-ai-subagents\[extra\]` package.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
e546471bef Rename _handle_task_* job dispatchers to _handle_job_*
Fixes a name collision where `_handle_task_cancel` was defined
twice — once for `BusCancelTaskMessage` (task lifecycle) and
once for `BusJobCancelMessage` (job RPC) — the second silently
shadowing the first. Job-side dispatchers are now consistently
named `_handle_job_*` and the internal helpers
`_run_task_handler` / `_send_task_request` become
`_run_job_handler` / `_send_job_request`. Task-lifecycle
handlers (`_handle_task_end`, `_handle_task_cancel`,
`_handle_task_activate`, `_handle_task_deactivate`,
`_handle_task_error`) keep their names.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
6d87765648 LLMTask/LLMContextTask: fix LLMService type 2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
922293ae76 Spawn the main task before setup so attach happens uniformly
`PipelineRunner.run(task)` now calls `spawn(task)` first (which
runs `task.attach()`) and lets `_setup_session` start every
registered entry — main and pre-spawned — through the same path,
instead of relying on `spawn`'s post-running fast-path to start
the main task after setup. The two-branch wait stays for the
`task is None` case but reads the runner_task directly off the
freshly-spawned entry.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
eb4f0ac1ae Add changelog for #4493 2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
7506af5861 Replace set_registry with attach(*, registry, bus) on BaseTask
`BaseTask` no longer takes `bus=` in its constructor. Instead
the runner now hands both the registry and the bus to a task via
`task.attach(registry=..., bus=...)` (called from
`PipelineRunner.spawn()`), and `bus` / `registry` are
properties that raise if accessed before attach. `PipelineTask`,
`LLMTask`, and `LLMContextTask` lose their `bus=` parameters
to match, and `_BusEdgeProcessor` now stores only a task
reference and reads `task.bus` lazily so bridged pipelines work
even though the bus isn't known at construction time.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
ef806163b2 Tighten the pipeline_task contract for processors and tools
`FrameProcessorSetup.pipeline_task` is now mandatory and
`FrameProcessor.pipeline_task` raises if accessed before setup
instead of returning `None`. `FunctionCallParams` gains a
required `pipeline_task` field and `LLMService._run_function_call`
populates it (plus reads `app_resources` directly off the
pipeline task). Tests that build a processor or
`FunctionCallParams` outside a real pipeline stub it with a
`SimpleNamespace`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
7d28c46a5d Add tasks package with LLMTask, LLMContextTask, and proxy stubs
Adds `pipecat.tasks.llm` with `LLMTask` (LLM pipeline + `@tool`
collection + tool-call deferral via `PipelineFlushFrame`),
`LLMContextTask` (LLM + `LLMContextAggregatorPair`), and the
`@tool` decorator. Also includes `pipecat.tasks.proxy.websocket`
client/server stubs that need a follow-up port to the new
`BaseTask` lifecycle.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
befaa9ff27 Rewrite PipelineRunner around bus + spawn
`PipelineRunner` now owns the shared `TaskBus` and
`TaskRegistry` and runs all tasks (the main one plus any
spawned ones) through a unified `_start_task` / `_run_task`
background-task path. Adds `spawn(task)` for fire-and-forget
task registration, threads `end()` / `cancel()` through
`BusEndTaskMessage` / `BusCancelTaskMessage` to all root
tasks, and broadcasts/handles `BusTaskRegistryMessage` for
remote-runner discovery. The runner now wires its own
`TaskManager` via `super().setup(...)` so internal
`create_task` calls go through `BaseObject`.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
b5c757ab85 Make PipelineTask inherit BaseTask and support bridged pipelines
`PipelineTask` now extends `BaseTask` so every pipeline task is
also a bus participant. Adds optional `bus`, `bridged`, and
`exclude_frames` parameters: when `bridged` is set, the user's
pipeline is wrapped with `_BusEdgeProcessor` source/sink edges so
frames are mirrored onto the bus. Bridges pipeline lifecycle
events to `start()`/`stop()`, overrides `_handle_task_end` /
`_handle_task_cancel` to drive the pipeline shutdown, subscribes
to the bus in setup, and exposes the `bridged` property to the
registry. Moves `PipelineTaskParams` here and updates the
matching test import.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
6a738bd3a0 Replace BasePipelineTask with BaseTask
Drops the old abstract `BasePipelineTask` and replaces it with
`BaseTask` — the common base for any runtime task. `BaseTask`
subscribes to a `TaskBus`, participates in the shared
`TaskRegistry`, handles activation / deactivation, end / cancel,
and the full `@job` RPC surface (request_job, job, job_group,
send_job_response / update / stream_*, etc.). It ships a default
`run()` for bus-only tasks; subclasses with their own runtime
(e.g. `PipelineTask`) override it.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
c0b2a8c572 Add job context and decorators
Adds `JobContext` / `JobGroupContext` async context managers,
the `JobGroup` / `JobGroupEvent` / `JobGroupResponse` /
`JobGroupError` types, the `@job` decorator (with collector),
and the `@task_ready` decorator (with collector). These power
the bus-driven job RPC between tasks.
2026-05-21 10:12:51 -07:00
Jon Taylor
7e2055b7d0 Add PgmqBus for distributed agents
Adds ``pipecat.bus.network.pgmq.PgmqBus``, a PGMQ-backed
:class:`TaskBus` adapter that implements pub/sub fan-out over
PGMQ's point-to-point queue semantics. Each bus instance owns its
own queue, broadcasts on publish to peers discovered by channel
prefix, and long-polls its queue to dispatch received messages
to local subscribers.

Requires the optional ``pgmq`` extra
(``pip install pipecat-ai[pgmq]``).
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
5d94506265 Add task bus package
Introduces `TaskBus`, the in-process `AsyncQueueBus`, the bus
message hierarchy (lifecycle, jobs, frames, registry), a
priority-aware bus queue, the `BusSubscriber` mixin, and the
`BusBridgeProcessor` / internal `_BusEdgeProcessor` used to
exchange frames between a local pipeline and the bus.
2026-05-21 10:12:51 -07:00
Aleix Conchillo Flaqué
30df8e4ca5 Add task registry package
Introduces `TaskRegistry` and the supporting `TaskReadyData`,
`TaskErrorData`, and `TaskRegistryEntry` dataclasses used to track
local and remote tasks discovered through the bus.
2026-05-21 10:12:51 -07:00
Mark Backman
780c004168 Merge pull request #4423 from joycech333/feat/inception-llm-service
feat: add Inception LLM service with Mercury 2 support
2026-05-21 12:02:27 -04:00
Mark Backman
28f9203401 Code review fixes 2026-05-21 11:45:17 -04:00
joycech333
77cc314a08 feat: add Inception LLM service with Mercury-2 support
Adds InceptionLLMService, an OpenAI-compatible service for Inception's
Mercury-2 diffusion-based reasoning model. Supports reasoning_effort
(instant/low/medium/high) and realtime mode for reduced TTFT.
2026-05-21 11:23:23 -04:00
Mark Backman
4a8d1d0b5e Merge pull request #4532 from pipecat-ai/mb/cleanup-logging-after-smart-text-handling
Clean up smart text logging
2026-05-21 08:35:46 -04:00
Mark Backman
87f5d60693 Merge pull request #4531 from pipecat-ai/mb/pipecat-prebuilt-1.0.1
chore: bump pipecat-ai-prebuilt to 1.0.1
2026-05-21 08:35:31 -04:00
Mark Backman
c699b31daa Merge pull request #4534 from pipecat-ai/mb/changelog-4521
Add changelog for #4521
2026-05-21 08:35:15 -04:00
Mark Backman
ee674ffb01 Add changelog for #4521 2026-05-20 17:57:43 -04:00
mihafabcic-soniox
86a5710801 Add max_endpoint_delay_ms and clean up Sonoix STT settings (#4521) 2026-05-20 17:54:48 -04:00
Mark Backman
4a96b2a9e6 Clean up smart text logging 2026-05-20 15:38:59 -04:00
Mark Backman
105d6f27da Merge pull request #4514 from pipecat-ai/mb/websocket-stt-service-exception-handling
Align websocket STT connection failures
2026-05-20 15:15:35 -04:00
Filipi da Silva Fuchter
e0e3cd336a Merge pull request #4529 from pipecat-ai/filipi/squash_skill
New skill to squash commits.
2026-05-20 16:06:23 -03:00
Mark Backman
9586db5b50 Preserve websocket reconnect failure retries 2026-05-20 14:45:29 -04:00
Mark Backman
a890ab7b21 Add changelog for PR #4531 2026-05-20 12:18:03 -04:00
Mark Backman
c1bf7dbb4a chore: bump pipecat-ai-prebuilt to 1.0.1 2026-05-20 12:15:09 -04:00
Mark Backman
709a0ce839 Merge pull request #4527 from pipecat-ai/mb/fix-elevenlabs-keepalive-1008
Fix ElevenLabs keepalive racing context-init (1008 disconnects)
2026-05-20 11:21:17 -04:00
Mark Backman
be93350eae Merge pull request #4522 from pipecat-ai/mb/stt-latency-smallest
Add P99 latency for Smallest AI, Mistral, XAI STT
2026-05-20 11:21:00 -04:00
Mark Backman
4a96ab7073 Merge pull request #4524 from pipecat-ai/mb/fix-runner-imports
Improve runner optional transport handling
2026-05-20 11:16:16 -04:00
filipi87
c321f50e76 New skill to squash commits. 2026-05-20 10:29:03 -03:00
Filipi da Silva Fuchter
bca337f97e Merge pull request #4380 from pipecat-ai/filipi/smart_text
Smart Text Handling
2026-05-20 10:18:30 -03:00
filipi87
5d9e8c5ac5 Removing debug log. 2026-05-20 10:13:46 -03:00
Mark Backman
70773bce0a Add changelog for PR #4527 2026-05-20 09:08:47 -04:00
filipi87
8bdb49bd1a chore: add changelogs for word-timestamp and frame-ordering fixes 2026-05-20 10:03:30 -03:00
filipi87
81bb81c1d0 test: add automated tests for word tracking, frame sequencing, and Cartesia TTS
Adds tests for AggregatedFrameSequencer, WordCompletionTracker, and
word_timestamp_utils (including CJK language scenarios). Updates existing
Cartesia TTS and TTS frame ordering tests to cover the new behaviours.
2026-05-20 10:03:26 -03:00
filipi87
e1bdee598c fix: preserve raw_text through TTS pipeline for correct LLM context attribution
TTSTextFrame entries were losing their original text structure when word
timestamps were enabled. AggregatedTextFrame now carries a raw_text field with
the original LLM-produced text (including pattern delimiters such as
<card>...</card>). The assistant context receives properly-tagged content
rather than the cleaned words returned by the TTS provider. Also handles words
that straddle two sentence boundaries by splitting and attributing each part
to its correct source frame.
2026-05-20 10:03:21 -03:00
filipi87
185a89bb3b fix: strip Cartesia SSML tags from word timestamp entries
SSML markup (e.g. <spell>, <emotion>, <break>) was leaking into word entries
returned by the Cartesia word-timestamps API. Tags are now stripped before
processing so word-to-text attribution remains accurate when SSML is present
in the TTS input.
2026-05-20 10:03:15 -03:00
filipi87
6b9deefbe3 fix: preserve frame insertion order in BaseOutputTransport for equal PTS values
Frames sharing the same presentation timestamp were being reordered by the
priority queue. Adds a monotonic counter as a tiebreaker so frames with equal
PTS are always emitted in insertion order, preventing subtle audio/text
sequencing bugs.
2026-05-20 10:03:08 -03:00
filipi87
deefc32faf fix: hold skipped TTS frames in position until preceding spoken frames complete
Skipped frames (e.g. code blocks filtered via skip_aggregator_types) were
emitted to the assistant context immediately instead of waiting for preceding
spoken frames to finish. Introduces AggregatedFrameSequencer to hold each
frame's slot and flush only after all earlier spoken sentences are complete,
keeping context ordering correct.
2026-05-20 10:03:03 -03:00
Mark Backman
a5e6886b80 Fix ElevenLabs keepalive racing context-init (1008 disconnects)
The keepalive could fire for a new turn's context before that context's
voice_settings context-init was sent, making the keepalive the context's
first message (no voice_settings) and causing ElevenLabs to reject the
later init with a 1008 policy violation. The keepalive now only targets a
context once its context-init has been sent (tracked in _context_init_sent).
2026-05-20 08:59:01 -04:00
Mark Backman
d11a4ba0cd Use shared telephony route availability checks 2026-05-20 08:57:48 -04:00
Mark Backman
38407e091d Add p99 values for Mistral and XAI 2026-05-19 22:51:33 -04:00
Mark Backman
82cd931efa Merge pull request #4306 from YFortin/fix/azure-tts-last-word-race
fix(azure-tts): Route completion through word boundary queue to prevent last word from being missed
2026-05-19 22:27:50 -04:00
Mark Backman
33e5d1f89b Add changelog for PR #4522 2026-05-19 18:33:58 -04:00
Mark Backman
861dd23873 Add changelog for runner updates 2026-05-19 17:31:07 -04:00
Mark Backman
b825dd779e Clarify runner startup banner 2026-05-19 17:31:07 -04:00
Mark Backman
1487da53a9 Improve runner optional transport handling 2026-05-19 17:03:16 -04:00
Mark Backman
aff84a5d9e Add P99 latency for Smallest AI STT 2026-05-19 11:05:15 -04:00
Mark Backman
c09f6d5adb Merge pull request #4052 from Vonage/vonage_video_connector_transport
Vonage WebRTC Transport Integration
2026-05-19 10:56:20 -04:00
asilvestre
e2d249e5d9 adding uv.lock 2026-05-19 16:33:38 +02:00
asilvestre
956b39b0dc remove extraenous await in cleanup 2026-05-19 16:33:04 +02:00
Mark Backman
e298491068 Add changelog for websocket STT failure handling 2026-05-18 12:41:56 -04:00
Mark Backman
97b00042df Align websocket STT connection failures 2026-05-18 12:35:01 -04:00
asilvestre
bc769eaa82 Changing the example to use OpenAI 2026-05-18 14:40:56 +02:00
asilvestre
ee5aa4dc71 SubscribeSettings to be pydantic and comment fixes 2026-05-18 14:40:56 +02:00
asilvestre
dd38fbc735 add documentation entry 2026-05-18 14:40:56 +02:00
asilvestre
a1c40df471 add documentation entry 2026-05-18 14:40:56 +02:00
asilvestre
c4ff9300c9 fix linting and typechecking 2026-05-18 14:40:56 +02:00
asilvestre
cab4585cbb added changelog 2026-05-18 14:40:56 +02:00
Antoni Silvestre
18368d047e Linting and changes to adapt to v1.0 2026-05-18 14:40:56 +02:00
asilvestre
e3abb4b6d7 apply suggestions in PR 2026-05-18 14:40:56 +02:00
Antoni Silvestre
0fd971d59d Update src/pipecat/runner/types.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-05-18 14:40:56 +02:00
asilvestre
c61672194d Vonage Video Connector Transport 2026-05-18 14:40:49 +02:00
Filipi da Silva Fuchter
c51a817efa Merge pull request #4442 from pipecat-ai/filipi/runner_all_transports
Unified start route to make all transports available
2026-05-18 09:27:44 -03:00
Bismeet singh
d85eda6da8 Merge pull request #4507 from BismeetSingh/fix/elevenlabs-stt-service-crash-language
Fix/elevenlabs stt service crash language
2026-05-17 10:17:07 -04:00
Aleix Conchillo Flaqué
71feb42711 Merge pull request #4503 from pipecat-ai/changelog-1.2.1
Release 1.2.1 - Changelog Update
2026-05-15 15:19:55 -07:00
aconchillo
6b93ca0cb6 Update changelog for version 1.2.1 2026-05-15 22:18:46 +00:00
Aleix Conchillo Flaqué
b6ecce754b Merge pull request #4501 from pipecat-ai/aleix/fix-filter-incomplete-tool-calls
Fix filter-incomplete + function-calling deadlock
2026-05-15 15:11:45 -07:00
Aleix Conchillo Flaqué
d39e6bf921 Add changelog for #4501 2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué
63064860ef Move OpenAITTSService instructions into Settings in the example
Mirrors the deprecation in ``OpenAITTSService.__init__``: ``instructions``
is now a Settings field. The constructor still accepts it for backward
compatibility but the canonical path is through ``Settings``.
2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué
f5158d51e7 Add filter-incomplete + function-calling turn-management example
A copy of ``turn-management-filter-incomplete-turns.py`` extended with
a ``get_weather(location)`` direct function. Exercises the path where
the LLM responds to a complete user turn by calling a tool — used to
reproduce (and now verify the fix for) the ``_user_speaking`` gating
bug between filter-incomplete and function calls.
2026-05-15 14:54:51 -07:00
Aleix Conchillo Flaqué
94dbd2fa68 Broadcast UserTurnInferenceCompletedFrame on tool calls in filter-incomplete
With ``filter_incomplete_user_turns`` enabled, an LLM that responded to
a user turn by calling a tool (without first emitting a ✓ marker)
never finalized the user turn. ``UserStoppedSpeakingFrame`` stayed
deferred, the assistant aggregator kept ``_user_speaking=True``, and
when ``FunctionCallResultFrame`` arrived its ``not self._user_speaking``
gate dropped the context push — the LLM continuation never ran and
the call hung silently.

Broadcast ``UserTurnInferenceCompletedFrame`` on
``FunctionCallsStartedFrame`` (i.e. the moment the LLM commits to a
tool call, before the function dispatches), gated by a new
``_turn_completion_broadcasted`` flag so the ✓ path and the tool-call
path don't both fire. The flag resets in ``_turn_reset`` alongside
the other per-turn state.

Emitting on the start frame rather than ``LLMFullResponseEndFrame``
also shrinks the race window — ``UserStoppedSpeakingFrame`` (a
``SystemFrame``) has the maximum possible head start over the
``FunctionCallResultFrame`` (``DataFrame``) that follows.
2026-05-15 14:50:35 -07:00
Mark Backman
c6ea6c6522 Merge pull request #4500 from pipecat-ai/mb/update-gradium-endpoints
Update Gradium STT/TTS endpoints to region-neutral URLs
2026-05-15 15:59:14 -04:00
Mark Backman
58a22aeeb1 Add changelog for #4500 2026-05-15 15:19:39 -04:00
Mark Backman
5403aa56e4 Remove Gradium endpoint overrides from voice example
Drop the explicit US-region URLs so the example picks up the new
region-neutral defaults in GradiumSTTService and GradiumTTSService.
2026-05-15 15:17:12 -04:00
Mark Backman
0e0d76d020 Update Gradium endpoints to region-neutral URLs
Drop the EU-region default from the STT/TTS WebSocket URLs in favor of
the generic api.gradium.ai endpoint, and remove the explicit overrides
from the examples so they pick up the new defaults.
2026-05-15 15:02:05 -04:00
filipi87
b493ed8d3a Removing the websocket transport from elevenlabs example. 2026-05-15 10:11:38 -03:00
filipi87
c3338667b1 Mounting the prebuilt frontend UI and root redirect for all transports. 2026-05-15 10:06:47 -03:00
Aleix Conchillo Flaqué
ea296babe9 Merge pull request #4498 from pipecat-ai/changelog-1.2.0
Release 1.2.0 - Changelog Update
2026-05-14 14:47:47 -07:00
aconchillo
b13af2b053 Update changelog for version 1.2.0 2026-05-14 21:45:36 +00:00
Aleix Conchillo Flaqué
7b6d878f07 update uv.lock 2026-05-14 14:41:38 -07:00
Aleix Conchillo Flaqué
8e405f15aa changelog: fix 4446.change.md file name 2026-05-14 14:38:54 -07:00
Aleix Conchillo Flaqué
44a40e8eb2 Merge pull request #4497 from pipecat-ai/aleix/fix-tts-context-id-fallback
Fall back to _turn_context_id in get_active_audio_context_id
2026-05-14 13:34:34 -07:00
Aleix Conchillo Flaqué
ea97cb1a78 Add changelog for #4497 2026-05-14 13:22:50 -07:00
Aleix Conchillo Flaqué
22650b1b56 Move QwenLLMService model into Settings in the qwen example
Mirrors the deprecation in ``QwenLLMService.__init__``: ``model`` should
be passed via ``settings=QwenLLMService.Settings(model=...)`` instead of
as a direct constructor arg.
2026-05-14 13:22:07 -07:00
Aleix Conchillo Flaqué
b76831e677 Fall back to _turn_context_id in get_active_audio_context_id
TTS services whose wire protocol does not echo the context_id back on
incoming audio (Sarvam, Smallest, Soniox, Inworld, ...) call
``get_active_audio_context_id()`` to tag each chunk. That accessor
returned only ``_playing_context_id`` — the playback-side cursor set
asynchronously by ``_audio_context_task_handler`` when it pops a context
off the serialization queue.

Result: incoming audio that arrived in the gap between contexts or at
the very start of a turn (before the playback loop popped) had
``context_id=None`` and was dropped with
``unable to append audio to context: no context ID provided``.

Fall back to ``_turn_context_id`` (the synthesis-side cursor, set as
soon as the turn's context is created) so the gap is covered without
prematurely nulling the playback cursor.
2026-05-14 13:22:00 -07:00
Mark Backman
b57111743f Merge pull request #4495 from pipecat-ai/mb/soniox-stt-lang-counter 2026-05-14 15:57:31 -04:00
Mark Backman
dcbb0070c9 Add changelog for Soniox language selection 2026-05-14 15:42:43 -04:00
Mark Backman
73278d3309 Use majority language for Soniox transcripts 2026-05-14 15:18:43 -04:00
filipi87
c8efe319b3 Adding the changelog for the changes. 2026-05-14 11:10:33 -03:00
Mark Backman
49bda11ae8 Merge pull request #4482 from pipecat-ai/mb/soniox-stt-token-language
Propagate Soniox token language
2026-05-13 16:28:56 -04:00
Aleix Conchillo Flaqué
07640582ce Merge pull request #4467 from pipecat-ai/aleix/fix-tts-ttfb-tracing
Fix metrics.ttfb and partial output on TTS/STT/LLM OpenTelemetry spans
2026-05-13 13:10:52 -07:00
Mark Backman
078af6969a Merge pull request #4473 from timofey-TK/inworld-tts-v2
Add support for Inworld TTS v2 fields
2026-05-13 15:32:16 -04:00
Mark Backman
9f40ba21c2 Add changelog for Soniox language fix 2026-05-13 15:26:10 -04:00
Mark Backman
82f0896d6a Propagate Soniox token language 2026-05-13 15:23:22 -04:00
kompfner
7e4cd23de4 Merge pull request #4474 from pipecat-ai/pk/inworld-realtime-tools
Extend cancel_on_interruption=False to Inworld Realtime (best-effort + warning)
2026-05-13 15:12:34 -04:00
TimTk
97f50c8aa2 Address review: use resolve_language, narrow delivery_mode type, update changelog
- Replace custom LANGUAGE_MAP fallback in language_to_inworld_language with
  resolve_language(language, LANGUAGE_MAP, use_base_code=False) to match the
  pattern used by other services and restore the unverified-language warning
- Tighten delivery_mode type from str to Literal["STABLE", "BALANCED", "CREATIVE"]
- Update changelog entry to mention delivery_mode and language normalization
2026-05-13 21:43:02 +03:00
Mark Backman
08680732f6 Merge pull request #4475 from pipecat-ai/mb/cartesia-korean-fix
Fix Cartesia CJK timestamp spacing
2026-05-13 13:20:42 -04:00
Mark Backman
064b68aa01 Fix Cartesia CJK timestamp spacing 2026-05-13 13:13:40 -04:00
Filipi da Silva Fuchter
b0f8ea7e28 Merge pull request #4477 from pipecat-ai/filipi/nvidia_sagemaker_follow_up
NVidia TTS Sagemaker: Buffering audio to avoid glitches.
2026-05-13 14:06:44 -03:00
filipi87
ad50c8d5d5 Buffering audio to avoid glitches. 2026-05-13 14:01:03 -03:00
Mark Backman
5fef239b68 Merge pull request #4450 from pipecat-ai/mb/gpt-realtime-whisper
Default OpenAI Realtime transcription to gpt-realtime-whisper
2026-05-13 09:48:33 -04:00
Filipi da Silva Fuchter
9148e307cc Merge pull request #4464 from pipecat-ai/filipi/nvidia_sagemaker
NVidia sagemaker - TTS and STT services
2026-05-13 07:53:26 -03:00
Filipi da Silva Fuchter
703d23b658 Update examples/voice/voice-nvidia-sagemaker.py
Co-authored-by: Mark Backman <mark@daily.co>
2026-05-13 06:36:57 -04:00
Filipi da Silva Fuchter
227ba288da Update examples/voice/voice-nvidia-sagemaker.py
Co-authored-by: Mark Backman <mark@daily.co>
2026-05-13 06:36:45 -04:00
Timofey
39e7f9e354 Fix Inworld TTS v2 request fields 2026-05-13 11:17:31 +03:00
Aleix Conchillo Flaqué
7cc7968abb Fix pyright errors in service_decorators.py 2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
52d8008783 Add LLM interruption changelog entry for #4467 2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
a3ce963b54 Capture partial LLM output on interruption
traced_llm only attached the aggregated ``output`` attribute to the
span after the wrapped function returned successfully. When the LLM
call was cancelled mid-stream (e.g. interruption during generation),
the accumulated text was discarded — the span had no ``output``.

Moved the attribute assignment into the ``finally`` block alongside
the existing TTFB write so the partial text we already captured via
the patched ``push_frame`` lands on the span regardless of whether
``f`` returned normally, raised, or was cancelled.
2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
e70ee603b2 Add STT changelog entry for #4467 2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
111e59a7b1 Apply the same span-scope fix to traced_stt
@traced_stt had the same root issue as @traced_tts: the span lifetime
was tied to a per-transcript handler call, which doesn't match the
operation we want to trace. Now uses the __set_name__ pattern to
install:

- A push_frame wrapper that drives one STT span per finalized
  TranscriptionFrame. The span is anchored at speech start
  (VADUserStartedSpeakingFrame.timestamp - start_secs) but lazy-opened
  on the first TranscriptionFrame. Opening earlier (on VAD or
  UserStartedSpeakingFrame) races with TurnTraceObserver._handle_turn_started,
  which runs as a background task via _call_event_handler (sync=False),
  so the span would end up parented to the previous turn. Deferring
  the open to the first TranscriptionFrame avoids that race because
  STT only emits transcripts well after the turn observer has set
  the current turn's context.

- A stop_ttfb_metrics wrapper that closes the span on the TTFB-timeout
  path (called with end_time != None from stt_service.py:566). The
  span is marked stt.timed_out=True and its end_time is pinned to
  the timeout's end_time (= _last_transcript_time) so the duration
  reflects when STT actually stopped responding, not when the timeout
  fired.

Span lifecycle:
- Open: lazy on first TranscriptionFrame of a segment.
- Close (success): finalized=True attaches metrics.ttfb and closes
  the span. Multiple finalized transcripts in a single turn produce
  multiple spans.
- Close (timeout): stop_ttfb_metrics(end_time=...) closes with
  stt.timed_out=True.
- Close (orphan): UserStoppedSpeakingFrame closes any still-open
  span with stt.incomplete=True (covers turns where no finalized
  transcript and no timeout fired).

No changes required outside service_decorators.py — stt_service.py
and every per-service file are untouched.
2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
079282d140 Add changelog for #4467 2026-05-12 20:10:43 -07:00
Aleix Conchillo Flaqué
0ccdd808e6 Fix traced_tts so metrics.ttfb reflects the real TTFB
Previously @traced_tts scoped the span to the lifetime of run_tts(). For
streaming TTS services run_tts() returns as soon as the synthesis request
is sent, long before audio chunks arrive, so:

- The span duration measured the WebSocket-send time, not synthesis time.
- The first synthesis recorded the WS-send duration as metrics.ttfb (via
  the in-progress fallback in FrameProcessorMetrics.ttfb).
- Subsequent syntheses recorded the previous call's TTFB on the current
  span (off-by-one).

The decorator now uses a __set_name__ descriptor to wrap the owning
class's setup() at class definition time. setup() installs per-instance
patches on create_audio_context, append_to_audio_context,
remove_audio_context, on_audio_context_completed, and
reset_active_audio_context. These patches own the span lifetime:

- create_audio_context: open span, set baseline attributes.
- append_to_audio_context: record metrics.ttfb on the first
  TTSAudioRawFrame (when stop_ttfb_metrics has produced a real value),
  end span on appended TTSStoppedFrame.
- on_audio_context_completed: end span on natural completion (handles
  services that auto-push TTSStoppedFrame via push_frame, bypassing
  append_to_audio_context).
- remove_audio_context: safety net for explicit removal paths.
- reset_active_audio_context: interruption hook (always reached from
  _handle_interruption); marks the span tts.interrupted=true only when
  nothing else has closed it.

The run_tts wrapper now only attaches per-call attributes (text,
metrics.character_count) to the already-open span. No changes required
in tts_service.py or in any of the per-service files.
2026-05-12 20:10:43 -07:00
Mark Backman
3e8c5c08f4 Clarify realtime settings update condition 2026-05-12 17:48:53 -04:00
Mark Backman
644030584f Centralize OpenAI audio constants 2026-05-12 17:48:53 -04:00
filipi87
0740021ff4 Removing changelog for sanitize_text_for_tts 2026-05-12 18:29:35 -03:00
filipi87
68f265fa62 Fixing ruff format. 2026-05-12 18:28:14 -03:00
filipi87
b9f052079d Removing sanitize_text_for_tts 2026-05-12 18:22:15 -03:00
filipi87
130bb7371c Removing sanitize_text_for_tts 2026-05-12 18:21:47 -03:00
filipi87
5d61763987 Refactoring how we are reconnecting the STT. 2026-05-12 18:20:19 -03:00
filipi87
7984556692 Fixing typecheck. 2026-05-12 18:00:07 -03:00
filipi87
bea9e4b3ba New example voice-nvidia-sagemaker.py 2026-05-12 17:44:11 -03:00
Mark Backman
19df443500 Merge pull request #4471 from pipecat-ai/mb/fix-gstreamer-pyright-import 2026-05-12 16:34:48 -04:00
Mark Backman
07f241143b Merge pull request #4469 from pipecat-ai/mb/remove-vad-analyzer-runner-utils-docstring 2026-05-12 16:34:27 -04:00
Mark Backman
2fdb9bbf42 Merge pull request #4462 from pipecat-ai/mb/cartesia-sonic-3.5 2026-05-12 16:34:04 -04:00
filipi87
0146947b68 Addressing the comments left in the PR review. 2026-05-12 17:12:19 -03:00
Paul Kompfner
863a1bf177 Add changelog for #4474 2026-05-12 16:04:12 -04:00
Paul Kompfner
58333b2705 Extend cancel_on_interruption=False to InworldRealtimeLLMService (best-effort)
Same async-tool routing approach as #4441: detect async-tool messages in
the LLM context, deliver the final result via the formal tool-result
channel.

Caveat: as of this writing, Inworld Realtime doesn't appear to handle
the resulting delayed tool result reliably, so the routing is
best-effort and the service emits a one-time warning when async-tool
messages are seen. Streamed intermediate results remain unsupported.

Also adds function calling to the realtime-inworld.py example, and
softens the Inworld mention in the #4447 changelog now that the
exclusion is being closed.
2026-05-12 16:03:34 -04:00
TimTk
ecaff1d1eb Fix changelog fragment number 2026-05-12 22:21:59 +03:00
Mark Backman
e2bfa6352f Add changelog for #4450 2026-05-12 15:20:57 -04:00
Mark Backman
abd28e2ac1 Update OpenAI realtime transcription default 2026-05-12 15:20:57 -04:00
kompfner
88deebbf5f Merge pull request #4472 from pipecat-ai/pk/default-gpt-realtime-2
Switch OpenAIRealtimeLLMService default model to gpt-realtime-2
2026-05-12 15:17:12 -04:00
TimTk
9b55d4ddd4 Add support for Inworld TTS v2 fields 2026-05-12 22:13:09 +03:00
filipi87
c2bdc1aada Fixing metrics and adding extra guard after sanitization. 2026-05-12 16:11:01 -03:00
Paul Kompfner
fc0589e8f1 Switch OpenAIRealtimeLLMService default model to gpt-realtime-2 2026-05-12 14:57:59 -04:00
kompfner
67f8d34e9f Merge pull request #4470 from pipecat-ai/pk/gpt-realtime-2-reasoning-effort
Add reasoning support to OpenAIRealtimeLLMService for gpt-realtime-2
2026-05-12 14:43:39 -04:00
kompfner
d3b8710720 Merge pull request #4465 from pipecat-ai/pk/gpt-realtime-2
Handle gpt-realtime-2 multi-output-item audio responses
2026-05-12 14:30:15 -04:00
Mark Backman
86e2aa85d3 Fix GStreamer pipeline source pyright import 2026-05-12 14:16:36 -04:00
Paul Kompfner
b89500256d Drop debug logging added while investigating multi-output-item audio 2026-05-12 14:05:16 -04:00
Paul Kompfner
a52bdef32b Add reasoning support to OpenAIRealtimeLLMService for gpt-realtime-2 2026-05-12 13:55:19 -04:00
Mark Backman
afd9fc5fdf Remove vad_analyzer from create_transport docstring example 2026-05-12 13:50:17 -04:00
filipi87
7f98dba925 Changelog files for the new nvidia features. 2026-05-12 14:43:12 -03:00
filipi87
6a27ed35b1 Fixing the Bidi client to accept None. 2026-05-12 12:19:30 -03:00
filipi87
a34864d643 Fixed ruff, pyright, and test_service_init failures 2026-05-12 11:39:52 -03:00
Paul Kompfner
007fa3a3a8 Handle gpt-realtime-2 multi-output-item audio responses
A single Realtime API response can now contain more than one audio item
(observed with gpt-realtime-2), and the first item's audio.done can
arrive after deltas from the second have started arriving. Deltas still
arrive strictly in playback order across items, so we keep forwarding
them as received — matching OpenAI's reference implementation.

Adjusted OpenAIRealtimeLLMService so a multi-item response is treated as
one continuous TTS turn:

- _handle_evt_audio_delta: on item switch, advance the tracked item in
  place (reset total_size) without emitting another TTSStartedFrame.
  Truncation now always targets the latest item.
- _handle_evt_audio_done: debug-trace only; no longer pushes
  TTSStoppedFrame.
- _handle_evt_response_done: pushes a single TTSStoppedFrame per turn,
  bookending the audio with the Started pushed on the first delta.

Added tests covering single-item, overlapping multi-item, non-overlapping
multi-item, and interrupt-during-multi-item (last-item-wins truncation).
2026-05-12 10:34:50 -04:00
filipi87
5dd7413c00 Nvidia Sagemaker Nemotron ASR STT service 2026-05-12 11:16:00 -03:00
filipi87
8e0a338d96 Nvidia Sagemaker Magpie TTS service 2026-05-12 11:15:42 -03:00
filipi87
d6655e7a5e Fixing ruff format. 2026-05-12 10:40:09 -03:00
filipi87
33b73df6ec Changing the websocket route to return the same data as PCC. 2026-05-12 10:38:15 -03:00
Mark Backman
d65aee9181 Add changelog for #4462 2026-05-11 17:34:00 -04:00
Mark Backman
1755016679 Update default Cartesia TTS model to sonic-3.5 2026-05-11 17:33:40 -04:00
Mark Backman
b7f6298601 Merge pull request #4461 from pipecat-ai/mb/security-vuln-2025-05-11
Update uv.lock for urllib3 and langchain-core
2026-05-11 15:58:05 -04:00
Mark Backman
396873ac7e Merge pull request #4460 from pipecat-ai/mb/codex-skills
Add Codex skills and AGENTS.md
2026-05-11 15:57:49 -04:00
Mark Backman
5b33964a1b Update uv.lock for urllib3 and langchain-core 2026-05-11 15:51:01 -04:00
Mark Backman
8b37cd1d3a Add agent-neutral repository instructions 2026-05-11 15:43:43 -04:00
Mark Backman
7a2b667fa1 Add Codex skill symlinks 2026-05-11 15:27:49 -04:00
Mark Backman
ee8c607315 Merge pull request #4452 from pipecat-ai/mb/cleanup-frontmatter
Add cleanup skill frontmatter
2026-05-11 09:33:44 -04:00
Aleix Conchillo Flaqué
71578e7151 Merge pull request #4449 from pipecat-ai/aleix/base-object-task-manager
Move create_task and cancel_task from FrameProcessor to BaseObject
2026-05-10 20:36:54 -07:00
Aleix Conchillo Flaqué
77058b01c4 Add changelog for #4449 2026-05-10 20:34:52 -07:00
Aleix Conchillo Flaqué
4f85e7c089 Fix pyright cr_code access on Coroutine in BaseObject.create_task
`collections.abc.Coroutine` doesn't expose `cr_code`/`co_name`; only
native coroutine objects do. Use `getattr` chains so pyright is happy
and any non-native awaitable falls back to a generic task name instead
of crashing.
2026-05-10 20:34:52 -07:00
Aleix Conchillo Flaqué
15531c8112 Wire TaskObserver via setup() instead of constructor
TaskObserver previously took a TaskManager in __init__ and reached into
it directly. Since BaseObject now provides task_manager / create_task /
cancel_task, drop the constructor argument and call
`observer.setup(task_manager)` from PipelineTask._setup() before
starting it.
2026-05-10 20:34:52 -07:00
Mark Backman
b9e8f13105 Add cleanup skill frontmatter 2026-05-09 12:30:20 -07:00
Aleix Conchillo Flaqué
784667bad2 Use inherited create_task/cancel_task in PipelineTask
PipelineTask owns its TaskManager but is itself a BaseObject, so it
inherits create_task/cancel_task. Replace the explicit
self._task_manager.create_task(coro, f"{self}::name") call sites with
self.create_task(coro, "name") for consistency with other BaseObject
subclasses.
2026-05-08 15:03:44 -07:00
Aleix Conchillo Flaqué
33db71ec32 Call super().setup() in PipelineTask to honor BaseObject contract
PipelineTask owns its TaskManager (still constructed in __init__ since
TaskObserver needs it eagerly). Adding the explicit
`await super().setup(self._task_manager)` in `_setup()` formalizes the
BaseObject lifecycle so any future wiring added to BaseObject.setup is
picked up automatically.
2026-05-08 15:03:44 -07:00
Aleix Conchillo Flaqué
dc035df0aa Use inherited create_task/cancel_task in PipelineTask
PipelineTask owns its TaskManager but is itself a BaseObject, so it
inherits create_task/cancel_task. Replace the explicit
self._task_manager.create_task(coro, f"{self}::name") call sites with
self.create_task(coro, "name") for consistency with other BaseObject
subclasses.
2026-05-08 15:03:44 -07:00
Aleix Conchillo Flaqué
df1b071a13 Move create_task and cancel_task from FrameProcessor to BaseObject
Lift the task manager wiring (`_task_manager`, `task_manager` property,
`create_task`, `cancel_task`, and `setup(task_manager)`) up to
`BaseObject`. Owners propagate the task manager to their child
`BaseObject`s via `await child.setup(task_manager)`, matching the
existing convention.

Removes duplicated `_task_manager` / `task_manager` property / setup
implementations from `FrameProcessor`, `FrameProcessorMetrics`,
`UserIdleController`, `UserTurnController`,
`BaseUserTurnStartStrategy`, and `BaseUserTurnStopStrategy`.
2026-05-08 15:03:44 -07:00
kompfner
95bcebe774 Merge pull request #4448 from pipecat-ai/pk/gemini-live-async-tool-support
feat: support cancel_on_interruption=False on Gemini Live (Gemini 2.x)
2026-05-08 16:57:32 -04:00
Paul Kompfner
5509377344 fix(gemini-live-vertex): disable NON_BLOCKING tools
GeminiLiveVertexLLMService overrides _supports_non_blocking_tools to
return False — Vertex AI's Gemini Live endpoint doesn't yet accept the
NON_BLOCKING behavior field on function declarations or the scheduling
field on FunctionResponse, and sending either breaks tool calling.

Effect: function declarations sent to Vertex no longer carry
NON_BLOCKING; FunctionResponses no longer carry scheduling: WHEN_IDLE.
Users registering a function with cancel_on_interruption=False against
Vertex get the same one-time logger.error + push_error the base class
surfaces on Gemini 3.x.
2026-05-08 16:54:15 -04:00
Paul Kompfner
e21180b962 refactor(gemini-live): use inherited LLMService._function_is_async
The same registry-lookup helper was hoisted to LLMService in #4447, so
drop the local duplicate. Behavior unchanged.
2026-05-08 16:42:54 -04:00
Paul Kompfner
53922819ed refactor: explicit kind=='final' check in async-tool routing (Gemini Live)
Mirrors the same change applied to AWSNovaSonicLLMService and
OpenAIRealtimeLLMService in #4441 / GrokRealtimeLLMService in #4447:
replaces the implicit "final happens last" pattern in
_process_completed_function_calls with an explicit
`if async_payload.kind == "final":` block, plus a trailing defensive
`continue` so async-tool messages with an unrecognized kind don't fall
through to the regular tool-result handling block.
2026-05-08 16:42:54 -04:00
Paul Kompfner
6faeffb884 chore: add changelog entry for cancel_on_interruption=False on Gemini Live 2026-05-08 16:42:54 -04:00
Paul Kompfner
9086a46900 feat(gemini-live): support cancel_on_interruption=False on supported models
Honors cancel_on_interruption=False on Gemini Live for models that support
Gemini's NON_BLOCKING tool mechanism (Gemini 2.x at the time of writing).
Function declarations registered via register_function(...,
cancel_on_interruption=False) are sent with behavior: NON_BLOCKING so the
conversation continues while the tool runs; the matching FunctionResponse
carries scheduling: WHEN_IDLE so the result lands at a graceful pause
rather than mid-sentence. Synchronous (default) tools stay BLOCKING —
applying NON_BLOCKING uniformly produced filler responses like "let me
look that up for you" on regular calls, since the model knew it would
have an opportunity to keep talking while waiting.

A new _supports_non_blocking_tools property gates the flow. On models
that don't support it (currently Gemini 3.x), the service falls back to
plain blocking behavior and surfaces a one-time error + ErrorFrame the
moment async-tool messages first appear in the context, explaining that
the flag's intent is not achievable.

Caveat (Gemini 2.5): an intermittent server-side 1008 "Operation is not
implemented" error can fire when realtime input arrives during a pending
tool call. We auto-reconnect, but the user may need to repeat what they
were saying. The proposed mitigation
(https://discuss.ai.google.dev/t/gemini-live-api-websocket-error-1008-operation-is-not-implemented-or-supported-or-enabled/114644/56)
of gating realtime input during pending tool calls is fundamentally
incompatible with NON_BLOCKING tool calling, so we don't apply it.
2026-05-08 16:42:54 -04:00
Paul Kompfner
1a4a6f4edf refactor(gemini-live): bring tool-result handling in line with the canonical realtime pattern
Lays groundwork for cancel_on_interruption=False support on Gemini Live by
restructuring _process_completed_function_calls to match the shape used by
AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441: a single-pass
forward iteration over raw context messages that detects async-tool
messages via async_tool_messages.parse_message and routes them — started
skipped silently, intermediate logged-as-error and surfaced via push_error,
final delivered via the formal FunctionResponse channel.

Replaces the prior two-pass structure that went through the adapter for
sync results — the service now uses a lightweight self._tool_call_id_to_name
map (populated when the model issues tool calls) for the name lookup the
adapter used to provide. Extracts a new GeminiLLMAdapter.to_function_response_dict
static method for the dict-coercion logic that wraps non-dict tool returns
as {value: <result>} for Gemini's FunctionResponse.response field; the
adapter's existing inline copy in _from_standard_message uses it too.

Example consolidation:

- Folds realtime-gemini-live-function-calling.py into the base
  realtime-gemini-live.py example so the base exercises function calling
  out of the box (matching realtime-openai.py and realtime-aws-nova-sonic.py).
- Renames realtime-gemini-live-vertex-function-calling.py to
  realtime-gemini-live-vertex.py, mirroring the consolidation.
- Adds realtime-gemini-live-async-tool.py.
- Updates scripts/evals/run-release-evals.py for the renames.

This commit alone doesn't make cancel_on_interruption=False fully work on
Gemini Live — additional investigation is pending. This is foundational
work to be built on.
2026-05-08 16:42:54 -04:00
kompfner
ff80cde44e Merge pull request #4447 from pipecat-ai/pk/realtime-async-tool-support-followup
fix: extend cancel_on_interruption=False regression fix to remaining realtime services
2026-05-08 16:40:32 -04:00
Paul Kompfner
fb74f7714c refactor(ultravox): name async-tool result strings after the kinds they serve
Renames _ASYNC_TOOL_PLACEHOLDER_RESULT to _ASYNC_TOOL_STARTED_RESULT to
match the kind names from async_tool_messages, and lifts the inline
"[Async tool result for tool_call_id=...] {result}" into a sibling
_ASYNC_TOOL_FINAL_RESULT_TEMPLATE constant for the same reason.
2026-05-08 16:35:14 -04:00
Paul Kompfner
4864eddbc7 feat(ultravox): support cancel_on_interruption=False via placeholder + final-as-text
Replaces the prior "log a warning and skip" approach with actual handling
of async-tool messages on Ultravox.

The catch with Ultravox is that its API freezes the conversation between
client_tool_invocation and the matching client_tool_result — there's no
"keep talking while the tool runs" channel like NON_BLOCKING on Gemini
or function_call_output-without-blocking on OpenAI Realtime. So:

- When the model invokes an async-registered function (cancel_on_inter
  ruption=False), the service immediately ships a placeholder
  client_tool_result that tells the model "the actual result isn't
  ready yet; a follow-up will arrive shortly; keep the conversation
  going". This unfreezes the conversation. The placeholder is sent
  from _handle_tool_invocation, since the started async-tool message
  doesn't reach the context-frame path until later.
- When the real tool finishes, the final async-tool message lands in
  the context. _handle_context now forward-iterates and routes
  async-tool messages: started is a no-op (placeholder already sent),
  intermediate is logged-as-error and dropped (matching the other
  realtime services), and final is injected as user-side text via
  user_text_message with bracketed framing — the only mechanism
  Ultravox offers for adding non-tool input mid-conversation.

Hoists the registry-lookup helper to LLMService as
_function_is_async(name) so future services can use the same pattern
without re-implementing it.

Adds an async-tool example file for Ultravox modeled on the existing
ones for the other realtime services.
2026-05-08 16:20:40 -04:00
kompfner
d831930bd0 Merge pull request #4441 from pipecat-ai/pk/realtime-async-tool-support
fix: restore cancel_on_interruption=False support in AWS Nova Sonic and OpenAI Realtime
2026-05-08 15:53:20 -04:00
Paul Kompfner
2c65713c99 refactor: explicit kind=='final' check in async-tool routing (Grok)
Mirrors the same change applied to AWSNovaSonicLLMService and
OpenAIRealtimeLLMService in #4441: replaces the implicit "final happens
last" pattern in _process_completed_function_calls with an explicit
`if async_payload.kind == "final":` block, plus a trailing defensive
`continue` so async-tool messages with an unrecognized kind don't fall
through to the regular tool-result handling block.
2026-05-08 15:45:05 -04:00
Paul Kompfner
b14a03d01f fix: extend cancel_on_interruption=False regression fix to remaining realtime services
Applies the same async-tool message routing introduced for AWSNovaSonicLLMService
and OpenAIRealtimeLLMService to additional realtime LLM services where the
flag's intent ("keep talking while the tool runs") is achievable:

- GrokRealtimeLLMService (xAI Realtime — also benefits the deprecated Grok
  alias since it re-exports the xAI module)
- AzureRealtimeLLMService picks up the fix transitively by inheriting from
  OpenAIRealtimeLLMService — no code change needed.

GrokRealtimeLLMService's _process_completed_function_calls now matches
the canonical pattern: skip LLMSpecificMessage, detect async-tool messages
via parse_message and route them — started skipped silently, intermediate
logged as an error and surfaced via push_error, final delivered through
the same channel as a synchronous result.

UltravoxRealtimeLLMService instead gets a one-time warning when async-tool
messages appear in the context. The Ultravox API freezes the conversation
during tool execution
(https://docs.ultravox.ai/tools/async-tools#custom-tool-timeouts), so the
flag's "keep talking while the tool runs" intent isn't achievable there —
applying the same code pattern would mislead users into expecting a UX
Ultravox can't deliver. Surfacing a clear warning is the right behavior
until Ultravox grows true async tool support.

Adds async-tool example files for Grok and Azure modeled on the existing
Nova Sonic / OpenAI Realtime ones (10s simulated network delay, weather
tool registered with cancel_on_interruption=False).

Two services remain excluded:

- GeminiLiveLLMService — the async-tool path needs deeper investigation.
- InworldRealtimeLLMService — appears to have a pre-existing problem
  with even simple synchronous tool calling on its Realtime API (the
  request reaches the server fine, but response generation fails with a
  generic server_error).
2026-05-08 15:43:53 -04:00
Paul Kompfner
ad0f0a1294 refactor: explicit kind=='final' check in async-tool routing
Replaces the implicit "final happens last" pattern in
_process_completed_function_calls with an explicit
`if async_payload.kind == "final":` block in both AWSNovaSonicLLMService
and OpenAIRealtimeLLMService. Adds a trailing defensive `continue` so
async-tool messages with an unrecognized kind don't fall through to the
regular tool-result handling block — clearer at the call site, and safer
against future additions to AsyncToolMessageKind.
2026-05-08 15:43:37 -04:00
Paul Kompfner
72d0fb418a fix: restore cancel_on_interruption=False support in AWS Nova Sonic and OpenAI Realtime
Before the new async-tool mechanism landed, AWSNovaSonicLLMService and
OpenAIRealtimeLLMService honored cancel_on_interruption=False by simply
not cancelling in-flight function calls on interruption — the eventual
result then flowed through the same channel as any synchronous tool
result. The new mechanism (which appends started/intermediate/final
messages to the LLM context as the underlying task progresses) broke
that path: the realtime services didn't know how to interpret those
messages, and the eventual result was never delivered to the provider.

Restore the flag's behavior by teaching both services to detect
async-tool messages in the context and route them appropriately:

- started → skipped silently. The provider already issued the tool call
  and natively awaits a result; nothing to send for the started marker.
- final → delivered via the formal tool-result channel. Same path as a
  synchronous tool result, just delayed.

Streamed intermediate results (FunctionCallResultProperties(is_final=
False)) are not supported on these realtime services. An intermediate
result is logged as an error and surfaced via push_error, then dropped.
Use a non-realtime LLM service if a tool needs to stream intermediate
results. (Docstrings on register_function, register_direct_function, and
FunctionCallResultProperties.is_final updated to call this out.)

A new shared module pipecat.processors.aggregators.async_tool_messages
is the single source of truth for the on-the-wire payload shape: the
aggregator uses its build_*_message functions when injecting messages,
and the realtime services use parse_message when scanning the context.

Adds two example files exercising a network-delayed weather tool with
each service. The plain realtime-aws-nova-sonic.py example is also
reverted to a synchronous tool call now that the async variant lives in
its own file.

Similar fixes for other realtime services are forthcoming.
2026-05-08 09:33:06 -04:00
filipi87
c9f0172e9f Example supporting plain websocket. 2026-05-08 09:46:18 -03:00
filipi87
2638885c62 Adding support for the plain websocket transport. 2026-05-08 09:37:07 -03:00
Aleix Conchillo Flaqué
94a94ee28c Merge pull request #4405 from pipecat-ai/aleix/user-turn-inference-event
Split user-turn-stop into inference-triggered and finalized events
2026-05-07 17:51:57 -07:00
Mark Backman
c46ede8335 Use Sphinx .. deprecated:: directive for deprecated aggregator params
Aligns deprecation docstrings on LLMUserAggregatorParams and
LLMAssistantAggregatorParams with CONTRIBUTING.md conventions:
present-tense parameter descriptions plus a `.. deprecated:: 1.2.0`
directive noting replacement and 2.0.0 removal. Also adds a runtime
DeprecationWarning for `user_turn_completion_config`, which previously
had no warning despite being deprecated.
2026-05-07 17:49:00 -07:00
Mark Backman
457a68ce64 Correct docstrings and comments regarding incomplete_long_timeout duration, 10 sec 2026-05-07 17:47:41 -07:00
Aleix Conchillo Flaqué
b78cecf7b2 Rename UserTurnCompletedFrame to UserTurnInferenceCompletedFrame
The old name overlapped semantically with `UserStoppedSpeakingFrame`:
both could be read as "the user's turn is done." They're at different
layers — `UserStoppedSpeakingFrame` is the acoustic stop signal,
while this frame is the post-judgment "inference about the turn is
now complete (turn is semantically final)" signal emitted by the LLM
mixin (on ✓), an end-of-turn classifier, or a custom producer.

The new name pairs naturally with the existing
`on_user_turn_inference_triggered` event vocabulary and removes the
ambiguity with `UserStoppedSpeakingFrame`.
2026-05-07 17:47:41 -07:00
Aleix Conchillo Flaqué
952dddca8b Replace llm_completion_user_turn_stop_strategies() with FilterIncompleteUserTurnStrategies
Wrap the detector chain with `deferred(...)` and append the LLM
completion gate via a `UserTurnStrategies` specialization rather than
a free-standing helper, mirroring the existing
`ExternalUserTurnStrategies` pattern. The class lives next to other
strategy containers in `pipecat.turns.user_turn_strategies`, so users
discover it where they're already configuring `user_turn_strategies`.

The deprecated `filter_incomplete_user_turns` flag now rewires
through `FilterIncompleteUserTurnStrategies` under the hood, keeping
the migration path identical to before. `deferred(...)` stays public
as the explicit escape hatch for non-default compositions.
2026-05-07 17:47:39 -07:00
Aleix Conchillo Flaqué
e3e90d38aa Preserve full user transcript across multiple inferences in one turn
When a stop-strategy chain splits inference-triggered from
finalization (e.g. `LLMTurnCompletionUserTurnStopStrategy` gating a
deferred detector), more than one inference can fire inside a single
user turn — each adds the new transcription segment to the context.
Previously each inference overwrote `_pending_user_turn_aggregation`,
so the eventual `on_user_turn_stopped` event surfaced only the
segment from the last inference, dropping anything the user said
before it.

Concatenate each segment into `_full_user_turn_aggregation` instead
of overwriting, and combine that running buffer with any post-final-
inference segment when emitting the public event.
2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué
d1c8162b0c Route turn-completion markers through LLMMarkerFrame
Add an `LLMMarkerFrame(DataFrame)` for sideband LLM markers that need
to be persisted to context but should not flow through the standard
text path (TTS, transcript). The frame carries an
`append_to_context_immediately` flag so the assistant aggregator can
either commit the marker as a stand-alone message (○ / ◐) or merge it
with the upcoming aggregation as a prefix on the response (✓).

`UserTurnCompletionLLMServiceMixin` now emits `LLMMarkerFrame` instead
of pushing the marker as `LLMTextFrame(skip_tts=True)`, which fixes
the case where an incomplete-turn marker (○ / ◐) was aggregated by
the assistant aggregator but never committed to the context because
the assistant turn lifecycle didn't run to completion (no spoken
response, no `LLMFullResponseEndFrame`-driven `push_aggregation`).

The frame is intentionally generic so other components — STT services
with built-in turn signals, end-of-turn classifiers, custom
annotations — can use the same mechanism to inject sideband signals
into the assistant context.
2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué
1fa0310ea8 Add changelog for #4405 2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué
2281cd8359 Extract ExternalUserTurnCompletionStopStrategy as a reusable base
`LLMTurnCompletionUserTurnStopStrategy` previously bundled two
concerns: pushing `LLMUpdateSettingsFrame` on `StartFrame`, and
finalizing the turn on `UserTurnCompletedFrame`. The latter is
producer-agnostic — any component that emits `UserTurnCompletedFrame`
(STT with built-in turn detection, dedicated end-of-turn classifiers,
custom code) can drive finalization the same way.

Move the frame-handling half into a new
`ExternalUserTurnCompletionStopStrategy`. The LLM-specific subclass
now only adds the settings-frame push and inherits finalization. Mirrors
the existing `ExternalUserTurnStopStrategy` naming pattern.
2026-05-07 17:46:15 -07:00
Aleix Conchillo Flaqué
480eca42f5 Split user-turn-stop into inference-triggered and finalized events
Fixes a real bug: with `filter_incomplete_user_turns` enabled, the
smart-turn detector's tentative stop was firing `on_user_turn_stopped`
before the LLM had a chance to veto it. Observers, transcript
appenders and UI indicators received an early — and sometimes
duplicated — signal.

Decomposes the single stop concern into two events:
- `on_user_turn_inference_triggered` fires when a stop strategy has
  enough signal to start LLM inference. The aggregator pushes the
  context here, kicking off the LLM call.
- `on_user_turn_stopped` fires only when the user turn is semantically
  final. Built-in strategies fire both events at the same call site,
  preserving today's behavior for the common case.

Adds `LLMTurnCompletionUserTurnStopStrategy`, which gates
finalization on a `UserTurnCompletedFrame` (a fieldless system frame
emitted by any component judging turn completeness — currently the
`UserTurnCompletionLLMServiceMixin` on `✓`).

Adds `deferred(strategy)` / `DeferredUserTurnStopStrategy`, a thin
wrapper that forwards an inner strategy's events except
`on_user_turn_stopped`. Use this to install a stop strategy as an
inference trigger only, leaving finalization to a peer (e.g. the LLM
completion strategy).

Adds `llm_completion_user_turn_stop_strategies()` for the common
case:

    UserTurnStrategies(
        stop=llm_completion_user_turn_stop_strategies(),
    )

Deprecates `LLMUserAggregatorParams.filter_incomplete_user_turns`.
The aggregator emits a `DeprecationWarning`, wraps existing stop
strategies with `deferred(...)`, and appends
`LLMTurnCompletionUserTurnStopStrategy` automatically.
2026-05-07 17:46:09 -07:00
Mark Backman
1073510574 Merge pull request #4407 from pipecat-ai/mb/ui-agent-wire-format
feat(rtvi): add UI Agent Protocol as first-class RTVI message types
2026-05-07 20:03:41 -04:00
Mark Backman
47c05f3f30 Simplify changelog entry 2026-05-07 16:58:08 -07:00
Mark Backman
24904b89f5 Merge pull request #4443 from Anrahya/fix-gemini-tts-voice-names
fix: correct Gemini TTS voice names
2026-05-07 19:41:30 -04:00
orphis
c78977e4c7 chore: remove Gemini TTS voice name test 2026-05-08 05:03:15 +05:30
Mark Backman
f78b5f9240 Merge pull request #4446 from inworld-ai/ian/inworld-pcm
[inworld] default to using PCM encoding
2026-05-07 19:25:57 -04:00
Ian Lee
406f8b730b [inworld] default to using PCM encoding
* server returns audio bytes without headers
2026-05-07 16:05:34 -07:00
Mark Backman
7a2cec2e45 Merge pull request #4426 from marcelodiaz558/feature/elevenlabs_stt_keyterms
Add ElevenLabs STT keyterms support
2026-05-07 18:44:09 -04:00
Marcelo Díaz
edfcd6948b Add ElevenLabs STT keyterms support 2026-05-07 21:00:26 +00:00
kompfner
991ee9e0e6 Merge pull request #4404 from pipecat-ai/pk/mitigate-calls-to-missing-tools
Mitigate tool-call-related hallucination
2026-05-07 15:05:13 -04:00
filipi87
cb426cbb14 Fixing format. 2026-05-07 16:04:43 -03:00
filipi87
d39beff817 Fixing format. 2026-05-07 16:01:54 -03:00
filipi87
1eade184f1 Creating a status endpoint to return the available transports. 2026-05-07 15:53:15 -03:00
Mark Backman
a696729343 Merge pull request #4439 from pipecat-ai/mb/fix-deprecation-video-out-bitrate 2026-05-07 14:42:26 -04:00
orphis
ba705e9501 chore: add changelog for Gemini TTS voice fix 2026-05-08 00:11:19 +05:30
orphis
98c370457b fix: correct Gemini TTS voice names 2026-05-08 00:09:56 +05:30
filipi87
3fa193b983 Unified start route to make all transports available. 2026-05-07 15:34:32 -03:00
Filipi da Silva Fuchter
6189e920e1 Merge pull request #4433 from pipecat-ai/filipi/refactoring_elevenlabs
Refactoring ElevenLabs to send close_context as soon as the turn context is complete.
2026-05-07 13:10:36 -03:00
Filipi da Silva Fuchter
73625a273a Merge pull request #4440 from pipecat-ai/filipi/daily_send_message_issue
Fixing a race condition when cleaning up the daily transport.
2026-05-07 13:09:53 -03:00
filipi87
f91a55c97c Changelog entry for the fix. 2026-05-07 11:32:48 -03:00
filipi87
5f256e241c Fixing a race condition when cleaning up the daily transport. 2026-05-07 11:29:57 -03:00
Mark Backman
954f63dc7b Document deprecation docstring convention in CLAUDE.md.
Adds an explicit Code Style bullet for the `.. deprecated::` Sphinx
directive (forbidding inline `[DEPRECATED]` tags) and extends the
Docstring Example with a Pydantic params class showing the directive
inside a `Parameters:` block — the context CONTRIBUTING.md's existing
example didn't cover.
2026-05-07 10:03:43 -04:00
Mark Backman
6cc66a3df1 Update video_out_bitrate deprecation to use sphinx directive.
Replaces the inline `[DEPRECATED]` tag with a `.. deprecated:: 1.1.0`
directive per CONTRIBUTING.md docstring conventions, so the deprecation
shows up properly in the rendered docs.
2026-05-07 09:57:21 -04:00
filipi87
a445399337 Fixing a bug in the ElevenLabs TTS refactor where alignment state was reset too early mid-turn. 2026-05-07 10:10:54 -03:00
filipi87
5ed2057599 Merge branch 'main' into filipi/refactoring_elevenlabs 2026-05-07 09:32:53 -03:00
Filipi da Silva Fuchter
cacde00e26 Merge pull request #4435 from pipecat-ai/filipi/uninterruptible_frame
Refactoring TTSService to preserve uninterruptible frames.
2026-05-07 08:46:42 -03:00
Filipi da Silva Fuchter
b1b598f65e Merge pull request #4434 from pipecat-ai/filipi/fix_interruption_regression
Fix interruption blocked by slow non-uninterruptible frame in queue
2026-05-07 08:46:10 -03:00
filipi87
c48ee93892 Adding changelog entry for the fix. 2026-05-06 16:30:22 -03:00
filipi87
cf22dac171 Refactoring TTSService to preserve uninterruptible frames. 2026-05-06 16:26:45 -03:00
filipi87
36f6e22aee Adding changelog for the interruption fix. 2026-05-06 15:39:27 -03:00
filipi87
921a7a46cb Fix interruption blocked by slow non-uninterruptible frame in queue
When a non-uninterruptible frame was being processed slowly and an
uninterruptible frame was waiting in the queue, _start_interruption
skipped task cancellation. This caused interruptions to stall until
the slow frame finished, even though it had no reason to block them.

The fix: only skip cancellation when the *current* frame is
uninterruptible. Uninterruptible frames already in the queue are
preserved regardless, because __create_process_task calls
__reset_process_queue internally, which always retains them.

Fixes: https://github.com/pipecat-ai/pipecat/issues/4412
2026-05-06 15:35:43 -03:00
filipi87
fda18a9afa Adding changelog for the elevenlabs improvement. 2026-05-06 14:58:18 -03:00
filipi87
d146a7f8e0 Refactoring ElevenLabs to send close_context as soon as the turn context is complete. 2026-05-06 14:55:49 -03:00
Filipi da Silva Fuchter
90f0f7cd27 Merge pull request #4431 from pipecat-ai/filipi/tts_deadlock
Fixing TTSService deadlock.
2026-05-06 14:52:04 -03:00
Mark Backman
37376b3506 Merge pull request #4429 from pipecat-ai/mb/update-grok-default-llm-model
fix(xai): update default Grok model to grok-4.20-non-reasoning
2026-05-06 13:41:05 -04:00
Mark Backman
729418c2b7 Merge pull request #4428 from pipecat-ai/mb/deprecate-resampy
chore(audio): deprecate ResampyResampler
2026-05-06 13:40:51 -04:00
filipi87
4512038a17 Creating a changelog entry for the fix. 2026-05-06 13:36:20 -03:00
filipi87
a23baf9de6 Fixing TTSService deadlock. 2026-05-06 13:32:26 -03:00
Mark Backman
d18fe7c39c feat(rtvi): type UI accessibility snapshots 2026-05-06 11:29:19 -04:00
Mark Backman
41124dc494 refactor(rtvi): clarify UI message names 2026-05-06 11:08:25 -04:00
Filipi da Silva Fuchter
95db08646c Merge pull request #4430 from pipecat-ai/filipi/flux_audio
Implementing dynamic watchdog timeout for Deepgram Flux STT
2026-05-06 11:40:06 -03:00
filipi87
03e5ebb266 Improving watchdog_min_timeout description. 2026-05-06 11:37:18 -03:00
filipi87
5daf267c11 Adding changelogs. 2026-05-06 11:26:14 -03:00
filipi87
1cb77b422a Created a watchdog_min_timeout to allow to change the default value. 2026-05-06 11:22:37 -03:00
filipi87
0c779b4c3d Implementing dynamic watchdog timeout for Deepgram Flux STT 2026-05-06 11:01:58 -03:00
Mark Backman
138991418a docs(changelog): add 4429 entry for Grok default model update 2026-05-06 09:51:01 -04:00
Mark Backman
94e136a6b7 fix(xai): update default Grok model to grok-4.20-non-reasoning
grok-3 is being retired from the xAI API on May 15, 2026. Switch the
default to grok-4.20-non-reasoning, which xAI recommends for non-reasoning
workloads and is appropriate for real-time voice AI.
2026-05-06 09:48:39 -04:00
Mark Backman
9598e262b5 docs(changelog): add 4428 deprecation entry for ResampyResampler 2026-05-06 09:41:14 -04:00
Mark Backman
8c3521f2e4 chore(audio): deprecate ResampyResampler in favor of SOXR resamplers
Emits a DeprecationWarning on instantiation. ResampyResampler will be
removed in Pipecat 2.0 along with the default resampy and numba
dependencies.
2026-05-06 09:40:13 -04:00
Mark Backman
eda98fb13f Merge pull request #4424 from pipecat-ai/mb/revert-elevenlabs-tts-alignment
fix(elevenlabs): only use normalizedAlignment when pronunciation dict is set
2026-05-06 08:27:25 -04:00
Mark Backman
3722ee223c Merge pull request #4419 from pipecat-ai/mb/fix-changelog-entry-4416
Fix changelog filename for 4416
2026-05-05 14:50:24 -04:00
Mark Backman
2620e76dab docs(elevenlabs): clarify alignment leading-space handling 2026-05-05 14:49:41 -04:00
Mark Backman
2447db766e docs(changelog): add 4424 entry for elevenlabs alignment selection fix 2026-05-05 14:49:41 -04:00
Mark Backman
61a81ed87b fix(elevenlabs): use alignment by default, normalizedAlignment only with pronunciation dicts
PR #4344 unconditionally switched to normalizedAlignment to fix garbled
words with pronunciation dictionaries (#4316). But normalizedAlignment
returns the post-normalized form of what was spoken - including
romanization of non-Latin scripts (Chinese rendered as pinyin), which
ends up in the LLM context and degrades subsequent turns.

Gate the switch on pronunciation_dictionary_locators being configured.
Adds a _select_alignment helper with preferred-with-fallback (both
fields are nullable per the API schema), used by both the WebSocket
and HTTP services. Tests cover dictionary mode, default mode, fallback
when preferred is missing or null, and HTTP field-name variants.
2026-05-05 14:49:41 -04:00
Mark Backman
735cd09c7e Merge pull request #4422 from cshape/tts-2
feat(inworld): default to inworld-tts-2
2026-05-05 14:00:04 -04:00
Paul Kompfner
2616076bec Add deterministic dev-error demo example
``examples/function-calling/function-calling-missing-handler.py``
demonstrates the missing-handler path by deliberately advertising a
tool to the LLM without registering its handler — what happens when a
developer forgets to call ``register_function``. Exercises the new
``logger.error`` severity end-to-end without needing to coax the LLM
into hallucinating.
2026-05-05 13:08:00 -04:00
Paul Kompfner
40667e50fc Add changelog for #4404 2026-05-05 13:03:49 -04:00
Paul Kompfner
e06e0c0282 Mitigate tool-call-related hallucination
When tools change mid-conversation, LLMs can produce a few different
flavors of tool-call-related hallucination: calling tools that have
been removed, avoiding tools that have been re-added, or hallucinating
output (made-up answers or tool-call-shaped non-tool-calls) when tools
are unavailable.

This change introduces an opt-in ``add_tool_change_messages`` flag on
the LLM aggregators (preferred entry point: ``LLMContextAggregatorPair(
..., add_tool_change_messages=True)``) that appends a developer-role
message to the context whenever ``LLMSetToolsFrame`` changes the set
of advertised standard tools. Helps the LLM stay coherent across tool
changes by spelling out exactly what just became available or
unavailable. Both aggregators participate; whichever handles the
frame first wins, and the other (if any) sees an empty diff against
the shared context and stays silent — order-independent regardless of
whether the frame flows downstream or upstream.

Also tightens the existing missing-handler path (introduced in #4301):

- Reworded the terminal tool result to a neutral "The function
  ``X`` is not currently available." (overridable via
  ``LLMService.MISSING_FUNCTION_CALL_MESSAGE_TEMPLATE``). Previously
  read "Error: function 'X' is not registered."
- Logs at the call site now distinguish developer error (tool
  advertised but no handler registered → ``logger.error``) from
  hallucination (tool not advertised → ``logger.warning``).

Includes a manual validation harness
(``examples/features/features-add-tool-change-messages.py``) that
exercises the new ``add_tool_change_messages`` mitigation by flipping
tool availability on a turn counter so its effect can be observed
end-to-end with the flag on vs. off.
2026-05-05 13:02:43 -04:00
Cale Shapera
84eefba4df docs: add changelog fragment for tts-2 default flip 2026-05-05 09:20:16 -07:00
Cale Shapera
fe3af5d9f7 feat(inworld): default to inworld-tts-2
Flip the default Inworld TTS model from inworld-tts-1.5-max to
inworld-tts-2 across:
- InworldHttpTTSService (HTTP)
- InworldTTSService (WebSocket)
- InworldRealtimeLLMService (cascade Realtime)

inworld-tts-1.5-max and inworld-tts-1.5-mini remain valid options;
existing users can pin the prior model explicitly via the model
setting. Docstring examples updated to reference the new default.
2026-05-05 09:20:16 -07:00
Mark Backman
7729eecfe4 Fix changelog filename for 4416 2026-05-04 21:54:58 -04:00
Mark Backman
fa31a2fd63 Merge pull request #4416 from pipecat-ai/mb/pr-4333-aws-credentials-review
feat(aws): add shared credential resolver with boto3 chain fallback
2026-05-04 21:48:33 -04:00
Mark Backman
678d40e102 docs(changelog): add 4333 entries for AWS credential resolver expansion 2026-05-04 19:30:37 -04:00
Mark Backman
8becafee38 fix(aws): use shared credential resolver in Polly, Bedrock, AgentCore
Polly TTS, Bedrock LLM, and AgentCore previously did
`arg or os.getenv("AWS_...")` and handed the result straight to
aioboto3.  When only one of `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`
was set, aioboto3 received a half-populated kwarg and errored instead of
falling through to the boto3 credential provider chain (instance
profiles, IRSA, ECS task roles, SSO, etc.).

Route credential resolution through the shared `resolve_credentials()`
helper introduced for AWS Transcribe so all four services follow the
same `explicit → env → boto3 chain` fallback.  Add an
`AWSCredentials.to_boto_kwargs()` method to bridge the dataclass field
names (`access_key`, `secret_key`) to the aioboto3 kwargs
(`aws_access_key_id`, `aws_secret_access_key`).

No public API changes.  Behaviour is identical for fully-explicit and
fully-env-var configurations; partial env vars now correctly trigger
the chain instead of erroring.
2026-05-04 19:23:53 -04:00
Mark Backman
83190d38e9 Merge pull request #4414 from pipecat-ai/mb/fix-ttsspeakframe-assistant-turn-stopped 2026-05-04 18:12:33 -04:00
Mark Backman
7519c26ac5 Merge pull request #4417 from pipecat-ai/mb/resolve-runner-filepath 2026-05-04 18:09:34 -04:00
Mark Backman
b2b7e9ee6f Merge pull request #4415 from pipecat-ai/mb/fix-elevenlabs-leading-spaces-flash 2026-05-04 18:08:31 -04:00
Mark Backman
e864d5778a ci: install runner extra for the coverage job 2026-05-04 16:44:47 -04:00
Mark Backman
89f10dd9a1 test: drop webrtc-dependent test, remove webrtc extra from CI 2026-05-04 16:42:05 -04:00
Mark Backman
f67e3ef0b2 ci: install runner and webrtc extras for the test job 2026-05-04 16:29:58 -04:00
Mark Backman
5b087d6aeb docs: add changelog for #4417 2026-05-04 16:22:26 -04:00
Mark Backman
e780f759d0 fix: validate download path containment in runner
Resolve and contain the user-supplied filename before serving it from
the runner's /files endpoint. Also raise a 404 (instead of returning
None) when the downloads folder is unset, and use the resolved
basename for Content-Disposition.
2026-05-04 16:20:27 -04:00
Daniel Wirjo
35153de28e feat(aws): add shared credential resolver with boto3 chain fallback
AWS Transcribe STT previously only supported credentials via explicit
parameters or environment variables. Services running with IAM roles
(EKS pod roles, IRSA, ECS task roles, EC2 instance profiles) or SSO
couldn't use Transcribe without exporting static credentials.

Changes:
- Add resolve_credentials() to utils.py providing a standard fallback
  chain: explicit params → environment variables → boto3 credential
  provider chain (instance profiles, IRSA, pod roles, SSO, etc.)
- Add AWSCredentials dataclass for type-safe credential passing
- Update AWSTranscribeSTTService to use resolve_credentials() instead
  of manual os.getenv() calls
- The boto3 fallback is only attempted when both access key and secret
  key are unresolved, avoiding replacement of explicitly provided creds
- boto3 is imported lazily inside the function to avoid hard dependency
  for services that don't need the fallback chain
- Add 7 unit tests covering the credential resolution chain

The Bedrock LLM and Polly TTS services already support the full
credential chain via aioboto3.Session() and are not modified.

Related to #4197
2026-05-04 15:40:06 -04:00
Mark Backman
9886d72f5e Add changelog for PR #4415 2026-05-04 15:18:15 -04:00
Mark Backman
90e6b51acd Fix ElevenLabs alignment chunk spacing 2026-05-04 15:15:37 -04:00
Mark Backman
61acdba3ae docs: add changelog entry for #4414 2026-05-04 10:43:52 -04:00
Mark Backman
f1a3ee97de fix: surface TTSSpeakFrame greetings in on_assistant_turn_stopped
Two issues were causing TTSSpeakFrame(append_to_context=True) greetings to
silently lose their trailing words and never fire on_assistant_turn_stopped:

- LLMAssistantPushAggregationFrame was emitted without a PTS, so the
  transport routed it through the audio (sync) queue while word-level
  TTSTextFrames travel through the clock queue. The aggregation could reach
  the assistant aggregator before the final words, leaving them orphaned
  in the buffer. Stamp the frame with `_word_last_pts + 1` when there are
  word timestamps so it can't overtake them.

- The aggregator's LLMAssistantPushAggregationFrame handler called
  push_aggregation() directly, bypassing _trigger_assistant_turn_stopped.
  For TTS-only flows there is no LLMFullResponseStartFrame, so the turn
  start timestamp was never set and on_assistant_turn_stopped never fired.
  Open a turn (if needed) and trigger stopped from the handler.

Fixes #4264.
2026-05-04 10:41:22 -04:00
Mark Backman
b363b91d12 Merge pull request #4401 from pipecat-ai/mb/grok-realtime-model
fix(xai/realtime): pass model as query param on connect
2026-05-04 09:44:33 -04:00
Mark Backman
43abca0b06 feat(rtvi): add UI Agent Protocol as first-class RTVI message types
The UI Agent Protocol lets server-side AI agents observe and drive
a GUI app on the client side through structured RTVI messages.
Five new top-level RTVI types in kebab-case, in line with the rest
of the protocol:

  ui-event         client → server  (named event with payload)
  ui-command       server → client  (named command with payload)
  ui-snapshot      client → server  (accessibility tree of the page)
  ui-cancel-task   client → server  (cancel an in-flight task group)
  ui-task          server → client  (task lifecycle envelope)

Each ships paired ``*Data`` / ``*Message`` pydantic models in
``rtvi.models``, following the existing RTVI envelope convention
(``BotReady`` / ``BotReadyData``, ``Error`` / ``ErrorData``, etc.).
Built-in command payload models (``Toast``, ``Navigate``,
``ScrollTo``, ``Highlight``, ``Focus``, ``Click``, ``SetInputValue``,
``SelectText``) ship alongside; matching default React handlers
live in ``@pipecat-ai/client-react``.

Bumps the RTVI ``PROTOCOL_VERSION`` from ``1.2.0`` to ``1.3.0``.
Purely additive: only new top-level message types are introduced;
no existing wire shapes are changed. The major-version
compatibility check on ``client-ready`` still passes for older
1.x clients, so old clients continue to connect without warning;
they simply will not exercise the new types.

The ``RTVIProcessor`` registers a new ``on_ui_message`` event
handler that fires for inbound ``ui-event`` / ``ui-snapshot`` /
``ui-cancel-task`` with the parsed Message envelope, mirroring how
``on_client_message`` works for ``client-message``.

Five new pipeline frames let pipeline observers and processors see
UI traffic the same way they see other RTVI messages, mirroring
the frame-and-event pattern used by ``client-message``:

  RTVIUICommandFrame(command_name, payload)
    Pushed by downstream code (e.g. ``pipecat-ai-subagents``'s
    bridge) to send a UI command to the client. Wrapped by the
    observer into a ``UICommandMessage`` envelope.

  RTVIUITaskFrame(data: UITaskData)
    Same shape but for ``ui-task``; wrapped into ``UITaskMessage``.
    ``UITaskData`` is a discriminated union of the four lifecycle
    kinds (group_started / task_update / task_completed /
    group_completed).

  RTVIUIEventFrame(msg_id, event_name, payload)
  RTVIUISnapshotFrame(msg_id, tree)
  RTVIUICancelTaskFrame(msg_id, task_id, reason)
    Pushed by ``RTVIProcessor._handle_message`` whenever the
    matching inbound message arrives, alongside firing
    ``on_ui_message``. Pipeline observers and processors can match
    on the frame; subscribers like the subagents bridge keep using
    the event handler.

The data layer is the canonical authority for the wire format:
higher-level frameworks like ``pipecat-ai-subagents`` build the
agent abstractions on top, and single-LLM Pipecat apps can target
the same wire format directly via custom tools that emit these
typed messages.
2026-05-02 12:09:01 -04:00
Mark Backman
30efd11e15 Merge pull request #4397 from pipecat-ai/mb/smallwebrtc-trace-app-message 2026-05-01 20:47:04 -04:00
kompfner
a745e8d318 Merge pull request #4378 from pipecat-ai/pk/more-pyright-fixes
More pyright fixes
2026-05-01 14:09:27 -04:00
Paul Kompfner
2730e47e61 ci: install all extras for the pyright type-check job
The pyright job in `format.yaml` previously installed only `--extra
daily --extra tracing`. That was sufficient when most optional-dep-
using files were in the pyright ignore list, but as this PR has
cleared dozens of files, those files now reference symbols from
optional-dep modules (`aiortc.RTCIceServer` via `IceServer`,
`google.genai.types.HttpOptions`, etc.). `reportMissingImports: false`
tolerates the failed imports themselves, but the imported names
become `Unknown` and using them as type expressions trips
`reportInvalidTypeForm` / `reportAttributeAccessIssue` — errors
that aren't gated by that flag.

Switch to `--all-extras --no-extra gstreamer --no-extra local`
(matching the dev setup in README.md), so pyright sees the same
dependency set the code is intended to be type-checked against and
the install-set scales naturally as more files leave the ignore list.

Also reconcile CLAUDE.md's setup command, which only excluded
`gstreamer`. README.md is canonical and additionally excludes
`local` (pyaudio requires `portaudio` native libs that aren't
installed by default on a clean Ubuntu CI runner).
2026-05-01 09:36:14 -04:00
Paul Kompfner
4703df8686 fix: clear 8 more services from pyright ignore list
A fourth pass over low-error-count files. Drops 8 files (57 → 49) and
full-pyright errors from 525 → 496. Default pyright stays clean.

Optional access on transport/client receivers (4 files). Same fix
shape as #4359 — a receiver typed `X | None` accessed without a
guard. For "should never happen" cases (caller's lifecycle ensures
the field is non-None when the method runs), used `assert` rather
than silent early-return so an invariant violation surfaces loudly:

- `transports/whatsapp/client.py` (5 errors): `_validate_whatsapp_webhook_request`
  was typed `bytes` / `str` but called with `bytes | None` / `str | None`.
  Widened the helper signature and pushed the explicit None-check
  inside (matching its existing empty-string check). Also handled
  `pipecat_connection.get_answer()` returning `None` — would have
  crashed at `.get("sdp")` before.
- `transports/websocket/client.py` (5 errors): four are the deprecated
  `websockets.WebSocketClientProtocol` alias (same `# pyright: ignore[reportAttributeAccessIssue]`
  as the `services/websocket_service.py` fix from earlier in this PR).
  The fifth was `async for message in self._websocket` — traced the
  call chain and confirmed `_client_task` is created only after
  `self._websocket` is assigned and cancelled before it's cleared, so
  the field is never None when `_client_task_handler` runs. Used `assert`.
- `services/openai/stt.py` (4 errors): same pattern. `_receive_messages`
  is started by `_connect()` only when `self._websocket` is set, and
  the reconnect loop in `WebsocketService._receive_task_handler`
  re-establishes it before each retry. `assert` at entry. Plus L478/L483:
  the `try`/`except ModuleNotFoundError` import-guard makes
  `websocket_connect` and `State` `<type> | None`; `__init__` already
  raises `ImportError` if either is None, so an `assert` at the
  `_connect_websocket` use site is honest. Plus an L538 `Language | str`
  cast (same shape as last batch).
- `services/deepgram/flux/base.py` (2 errors): `event = data.get("event")`
  flowed into `_handle_turn_resumed(event: str)` as `Any | None`.
  Tightened with an `isinstance(event, str)` guard before the
  `FluxEventType(event)` lookup. The other error (`average_confidence > min_confidence`
  where `min_confidence: float | None`) was a latent crash on missing
  confidence data — restored the original `not min_confidence` (which
  treats both `None` and `0.0` as "no filter") and added an explicit
  drop-on-missing-confidence-data branch.

`gemini_live` Settings/InputParams (vertex). The deprecated `InputParams`
declares `modalities: GeminiModalities | None` and `media_resolution: GeminiMediaResolution | None`,
but their downstream usage at `services/google/gemini_live/llm.py:952,959`
calls `.value` on each — `None` would crash. Rather than touching the
deprecated input model, translate `None` to the canonical defaults
(`GeminiModalities.AUDIO`, `GeminiMediaResolution.UNSPECIFIED`) at the
assignment site in `vertex/llm.py`. Also fixed an unrelated annotation
bug: `_get_credentials` was annotated `-> str` but actually returns
`service_account.Credentials` (used correctly by the caller — only
the annotation was wrong).

`moondream/vision.py` (3 errors). `frame.format` is `str | None` but
`Image.frombytes(mode, ...)` requires `str`; raise instead of crashing
on missing format. The other two errors are pyright thinking the
moondream2-custom `encode_image` and `query` methods are `Tensor`
(rather than callables) — those are provided by the model code via
`trust_remote_code=True` and aren't visible to pyright on the base
`AutoModelForCausalLM` type. Scoped `# pyright: ignore[reportCallIssue]`
on the two call sites.

`transports/base_output.py` (3 errors). Two are `self._mixer.mix(...)`
calls in `with_mixer`, a closure invoked only when `self._mixer` is
truthy at the call site — captured the mixer to a local variable
inside the closure with an `assert`, then used that. Third is the
PIL `frombytes(mode, ...)` shape — `frame.format is None` early-
return guard at the top of `resize_frame` so the main resize logic
reads cleanly.

`elevenlabs/tts.py` (4 errors). The payload-building dict at L1271
was typed `dict[str, str | dict[str, float | bool]]` — an aspirational
shape that matched only the first two assignments. Subsequent code
assigned `list[dict[...]]` (pronunciation locators) and bools, all
violating the annotation. Same pattern at L926 (the WebSocket-init
`msg`). Both widened to `dict[str, Any]`, which is the honest shape
for a JSON request payload and what similar code uses elsewhere.

Files dropped from the ignore list (57 → 49):
services/deepgram/flux/base.py, services/elevenlabs/tts.py,
services/google/gemini_live/vertex/llm.py,
services/moondream/vision.py, services/openai/stt.py,
transports/base_output.py, transports/websocket/client.py,
transports/whatsapp/client.py.
2026-05-01 09:36:14 -04:00
Paul Kompfner
26a40e2e62 fix: clear 10 more services from pyright ignore list
A third pass over low-error-count files in the ignore list. Drops 10
files (67 → 57) and full-pyright errors from 555 → 525. Default
pyright stays clean.

Optional access guards (4 files). The same fix shape as 9e9b1f39e:
a receiver typed `X | None` accessed without a guard, fixed with a
local-var capture or an early return.

- `mistral/stt.py`: `_connection.send_audio` could crash if
  `_connect()` swallowed an exception and left `_connection` unset;
  drop the audio chunk with a warning instead. `_receive_events`
  iterating `_connection.events()` got the same defensive narrowing.
- `deepgram/flux/stt.py`: `_websocket_url` is set in `_connect`
  before `_connect_websocket` is called, but pyright doesn't track
  that across methods — assert at the use site. `websocket.response`
  is `Response | None` in the websockets stubs even though it's
  always populated post-handshake; guarded with a fallback.
- `audio/filters/rnnoise_filter.py`: the module-level import sets
  `RNNoise` to `None` if `pyrnnoise` isn't installed; raise
  `ImportError` explicitly instead of relying on the existing try-
  block to catch the `None(...)` call. Also gated `filter()` with
  `or self._rnnoise is None` so pyright sees the narrowing.
- `transports/smallwebrtc/request_handler.py`: `get_answer()`
  legitimately returns `None`; raise instead of crashing on three
  subscript accesses.

`TTSService` `audio-context` API tightening. Mirroring the
`append_to_audio_context` fix from the previous batch:
`remove_audio_context` was typed `str` but is called with `str | None`
from `get_active_audio_context_id()` results. Widened to `str | None`
and the `None` handling lives in the function body (early debug log
+ return) — matching `append_to_audio_context`'s shape.
`audio_context_available` keeps its narrow `str` signature; asking
"is `None` available?" isn't a meaningful question (`_audio_contexts`
is `dict[str, asyncio.Queue]`). The internal call site in
`on_turn_context_completed` narrows `_turn_context_id` explicitly
before passing it. Side effect: deepgram/tts.py's L307 error clears
without local changes.

`deepgram/tts.py` (4 errors → 0): the same `push_error(ErrorFrame(...))`
latent bug we fixed in resembleai earlier in this PR — `push_error`
takes a string; there's a separate `push_error_frame` for frames.
Two sites switched. The Optional `_websocket.response` access is
guarded the same way as deepgram/flux/stt.py. The `remove_audio_context`
error was cleared by the tightening above.

`aws/utils.py` (3 errors → 0): `AWSTranscribePresignedURL` declared
`session_token: str` but the dict source is `str | None` (AWS
supports long-term IAM creds without a session token). Same for
`vocabulary_name`/`vocabulary_filter_name` on `get_request_url`,
which were typed `str = ""` even though the body uses truthy checks
to skip them. Widened to `str | None = None` — matches actual
runtime semantics.

`audio/dtmf/utils.py` (2 errors → 0): `files("...").joinpath(...)`
returns a `Traversable`, but `aiofiles.open` wants a real path. For
regular pip installs this worked in practice (Traversable was a
`Path`), but it would fail for zipped distributions (zipapp,
zipimport) where the resource isn't on disk. Wrapped in
`importlib.resources.as_file(...)` — the canonical bridge that
extracts to a temp file when the resource isn't already on the
filesystem. Validated end-to-end: regular install still reads bytes;
ad-hoc zipapp test confirmed `as_file` extracts the resource and
returns a real Path.

`openai/image.py` (2 errors → 0): the `size` arg to
`images.generate` is `Literal[...] | None` in the SDK but our
settings field is `str | None`. Mirrored the `groq/tts.py`
hint-not-constraint pattern from the previous batch: defined a
module-level `OpenAIImageSize = Literal[...]` alias with a comment
attributing the upstream symbol and documenting the cast contract
(callers can pass any string; invalid values surface as an OpenAI
API error). Also guarded `image.data[0]` (response.data is
`list[Image] | None`).

`processors/frameworks/{langchain,strands_agents}.py` (4 + 4 → 0):
both processors do `messages[-1]["content"]` on a value typed
`LLMStandardMessage | LLMSpecificMessage` (the latter is a dataclass,
not a dict, so `__getitem__` errors). Historically these only
handled plain-text user messages, so the fix is two explicit guards
(skip if the last message isn't a dict; skip if `content` isn't a
string) plus a TODO noting that other shapes (multi-modal content,
provider-specific messages) aren't supported yet. langchain's
`__get_token_value` also got a small fix where `AIMessageChunk.content`
is `str | list[parts]` but the function declares `-> str`; stringify
the list case. strands_agents' surfaced two unrelated narrows: a
`graph_exit_node: str | None` arg gated by an `__init__`-time assert,
and `agent.stream_async` reached only when we're not in graph mode.

Files dropped from the ignore list (67 → 57):
audio/dtmf/utils.py, audio/filters/rnnoise_filter.py,
processors/frameworks/langchain.py,
processors/frameworks/strands_agents.py, services/aws/utils.py,
services/deepgram/flux/stt.py, services/deepgram/tts.py,
services/mistral/stt.py, services/openai/image.py,
transports/smallwebrtc/request_handler.py.
2026-05-01 09:36:14 -04:00
Paul Kompfner
31ff07916f fix: clear 10 more services from pyright ignore list
A second pass over the low-error-count files in the ignore list. Drops
10 files (77 → 67) and full-pyright errors from 580 → 555. Default
pyright stays clean.

Three coherent shapes plus a handful of one-offs:

`Language | str | None` → `Language | None` at STT frame boundaries.
`assert_given(self._settings.language)` returns `Language | str | None`
(strips `_NotGiven`, keeps the rest), but `TranscriptionFrame.language`
expects `Language | None`. In practice both `_settings.language` and
SDK-supplied codes resolve to a `Language` enum value, but technically
they could be raw strings — and `Language` is a StrEnum, so downstream
consumers (which mostly compare/serialize as strings) handle either.
Used `cast("Language | None", ...)` at each call site rather than a
runtime-validating helper, so an unrecognised code (e.g. one we
haven't added to the enum yet) still flows through unchanged. Cleared
azure/stt.py, aws/stt.py, gradium/stt.py; mistral/stt.py keeps the
cast at the SDK boundary (storing under `_detected_language: Language
| None`) but stays in the ignore list because of two unrelated
Optional-access errors.

aiobotocore `async with` stub gap. `aioboto3.Session().client(...)`
is an async context manager at runtime but its stubs don't advertise
`__aenter__`/`__aexit__` to pyright. Scoped
`# pyright: ignore[reportGeneralTypeIssues]` on the two affected
sites: aws/agent_core.py and aws/tts.py. aws/tts.py also had a latent
bug on the no-`AudioStream` path: the original code set
`audio_data = None` and then crashed in `resample(...)` and
`len(audio_data)` below; replaced with an early `return` after
logging — matches the convention elsewhere (OpenAI TTS, etc.) of not
recording usage metrics on the error path.

heygen `event_id: str | None` → `str` at transport→client boundary.
Three call sites in transports/heygen/transport.py passed `self._event_id`
(`str | None`) into client methods that take `str`. Added a guard at
each: `agent_speak_end` and `interrupt` only fire when `_event_id` is
set; `write_audio_frame` warn-and-drops when there's no active bot
event rather than sending a malformed message.

`OpenAIResponsesLLMInvocationParams` TypedDict.
`get_llm_invocation_params` always sets both `input` and `tools` in
the same dict literal, but the TypedDict was `total=False` so direct
subscript access (`invocation_params["input"]`) tripped
`reportTypedDictNotRequiredAccess` in services/openai/responses/llm.py.
Marked both keys `Required[...]`; `instructions` stays non-required
since it's only added when a system instruction is present.

Latent bug in heygen/api_interactive_avatar.py: the code accessed
`request_data.voice.voiceId` and `request_data.voice.elevenlabsSettings`,
but those names are Pydantic *aliases*; the actual attribute names
(used for attribute access) are `voice_id` and `elevenlabs_settings`.
Switched to the field names — those camelCase accesses would have
raised AttributeError at runtime if `voice` was set.

Other small fixes:

- assemblyai/stt.py: the deprecated `connection_params=` init path
  was reading `formatted_finals` and `word_finalization_max_wait_time`
  off `AssemblyAIConnectionParams`, but those fields were never on
  the deprecated input model — they were added to Settings later.
  Removed the reads (with a comment noting they're only available
  via the canonical `settings=...` API); the deprecated input model
  is unchanged.
- rtvi/processor.py: two `about: Mapping[str, Any] = None` parameter
  signatures — declared `Mapping`, defaulted to `None`, and both
  function bodies already handled the None case. Widened to
  `Mapping[str, Any] | None = None`.
- aws/stt.py: `subprotocols=["mqtt"]` failed against websockets'
  `Sequence[Subprotocol] | None` (Subprotocol is a NewType wrapper).
  Wrapped: `subprotocols=[Subprotocol("mqtt")]`.

Files dropped from the ignore list (77 → 67):
processors/frameworks/rtvi/processor.py, services/assemblyai/stt.py,
services/aws/agent_core.py, services/aws/stt.py, services/aws/tts.py,
services/azure/stt.py, services/gradium/stt.py,
services/heygen/api_interactive_avatar.py,
services/openai/responses/llm.py, transports/heygen/transport.py.
2026-05-01 09:36:14 -04:00
Paul Kompfner
814f00ce41 fix: clear 19 TTS/STT/etc. services from pyright ignore list
Several adjacent fix shapes that together drop 19 files from the
pyrightconfig.json ignore list (96 → 77) and full-pyright errors from
605 → 580. Default pyright stays clean.

TTS voice/context_id None handling — most files in this batch had a
single error of the shape "value typed `T | None` passed where `T` is
required" coming out of `assert_given(self._settings.voice)` (which
strips `_NotGiven` but not `None`) or `get_active_audio_context_id()`.
Two patterns:

- For services where a missing voice means the request can't proceed
  (hume, openai, xtts, groq, kokoro, piper), added an explicit None
  check. Inside `run_tts` we yield an `ErrorFrame` and return — matching
  each service's existing error-emission style (a few wrap `Exception`
  broadly and were fine; openai/hume/xtts had narrower or no try blocks
  so a bare `raise ValueError` would have escaped uncaught). Piper
  validates in `__init__`, where failing fast at construction is the
  right shape. OpenAI also gained a `voice not in VALID_VOICES` guard
  with a clear message listing supported voices.

- For services where a missing audio context just means "skip this
  message" (fish, lmnt, smallest, sarvam, neuphonic), widened
  `TTSService.append_to_audio_context`'s `context_id` signature to
  `str | None`. The function body already explicitly handled the None
  case with a debug log + early return, so the prior `str` annotation
  was a lie; making it honest cleared call sites without local guards.
  inworld's `_close_context` got the same treatment.

google.genai imports — switched `from google import genai` to
`import google.genai as genai` in google/image.py and google/llm.py.
The dotted form sidesteps a PEP 420 namespace-package stub gap (the
`google` namespace stubs come from a different distribution and don't
declare `genai`), which means pyright now resolves `genai` to the
real module rather than `Unknown`. IDE autocomplete on `genai.<x>`
works for the first time. In image.py this surfaced three latent
bugs that the `Unknown` resolution had been hiding (model was
`str | _NotGiven | None` not narrowed before passing to the SDK; two
spots accessed `.image_bytes` on an `Image | None` without a guard) —
all fixed. llm.py's dotted import surfaced 8 errors (Content-list
typing nuances, internal `_api_client` access, a few small Optionals);
deferred to a future pass since they're outside this commit's scope,
so the file stays in the ignore list with the dotted import.

Latent bug fixes spotted along the way:

- resembleai/tts.py was calling `push_error(ErrorFrame(...))`, but
  `push_error` takes a string — there's a separate `push_error_frame`
  for the frame case. Switched to the right method.
- openai/base_llm.py: `max_completion_tokens` was the only sibling
  field on `OpenAILLMSettings` missing `| None` in its type, which
  caused the assignment in openai/llm.py from `params.max_completion_tokens`
  (`int | None`) to fail. Added `| None` for consistency with
  `max_tokens` etc.
- heygen/base_api.py: `livekit_url: str = None` and `ws_url: str = None`
  declared `str` while defaulting to `None`. Removed the bogus
  defaults — both fields are required at construction in every
  in-tree call site, and the previous `str = None` was a Pydantic
  footgun.

Other small ones: gladia/stt.py needed a None guard on `_session_url`
before `websocket_connect`; openrouter/llm.py's
`build_chat_completion_params` override widened to `dict[str, Any]`
diverging from the parent's `OpenAILLMInvocationParams` — restored
the parent's type; neuphonic/tts.py guarded the receive loop's
`async for message in self._websocket` with a local-variable narrowing
matching the pattern from 9e9b1f39e.

groq/tts.py: tightened `output_format`'s typing to
`Literal["flac","mp3","mulaw","ogg","wav"] | str = "wav"`. The literal
side gives IDE autocomplete hints for the currently-supported set;
the `| str` side keeps callers unblocked if groq adds a new format
before this list is updated. A `cast` at the API boundary satisfies
groq's stricter `Literal` parameter type. The literal alias mirrors
the inlined Literal on `groq.resources.audio.speech.AsyncSpeech.create`'s
`response_format` (the SDK doesn't export it as a named symbol).

websocket_service.py: scoped `# pyright: ignore[reportAttributeAccessIssue]`
on `websockets.WebSocketClientProtocol`. That alias is now a deprecated
re-export from the legacy submodule and pyright doesn't surface it
on the top-level `websockets` namespace; runtime is fine. Migrating
to `websockets.ClientConnection` is a separate piece of work
(transports/websocket/client.py uses the same alias four times) and
left for a future commit.

Files dropped from the ignore list: fish/tts.py, gladia/stt.py,
google/image.py, groq/tts.py, heygen/base_api.py, hume/tts.py,
inworld/tts.py, kokoro/tts.py, lmnt/tts.py, neuphonic/tts.py,
openai/llm.py, openai/tts.py, openrouter/llm.py, piper/tts.py,
resembleai/tts.py, sarvam/tts.py, smallest/tts.py,
websocket_service.py, xtts/tts.py.
2026-05-01 09:36:14 -04:00
Paul Kompfner
96756bc1f6 fix: clean up TypedDict / Optional patterns in 6 more LLM adapters
Same approach as the previous round — apply boundary casts where the
code does dict-style mutation on TypedDict-typed values, narrow at
return sites, and document the LLMSpecificMessage limitation in
realtime adapters that pack history into a single text message.

aws_nova_sonic_adapter.py — pure typing + small narrowing fixes:
- Filter LLMSpecific items in `_from_universal_context_messages`
  (documented).
- `_from_universal_context_message` now declared
  `-> AWSNovaSonicConversationHistoryMessage | None` (it already had
  paths returning None implicitly).
- `get_messages_for_logging` returns `dict[str, Any]` per element
  via `dataclasses.asdict`, matching the declared return type.
- Use a local `role` variable so pyright keeps the narrowing across
  the truthy-content guard.

grok_realtime_adapter.py / inworld_realtime_adapter.py — same shape
of fix as `open_ai_realtime_adapter.py` from the previous batch.
The two files are essentially copies of the OpenAI Realtime adapter,
so the same template applies: cast at the boundary, filter
LLMSpecificMessage with a documented note, replace the implicit-None
fallthrough with `raise ValueError`, and switch the `text_content +=`
pattern (which fails when one of the parts is None) to a
`text_parts.append(...)` + `" ".join(...)` pattern.

open_ai_adapter.py — pure typing. Cast at the
`OpenAILLMInvocationParams` return, narrow the system-instruction
warning's `initial_content` to `str | None`, and cast the custom-tools
list to `list[ChatCompletionToolParam]`.

open_ai_responses_adapter.py — pure typing. Same shape: narrow
`first_content` to `str | None` for the warning resolver, cast the
constructed dict literals at append sites where the target is
`ResponseInputItemParam`, and cast `get_messages_for_logging`'s
return to the declared `list[dict[str, Any]]`.

processors/aggregators/llm_context.py — pure typing. Cast the
deepcopied message in the redaction loop in `get_messages` to
`dict[str, Any]` and the create_image/audio_message return-dict
literals to `LLMContextMessage`.

Removes 6 newly-clean files from the pyright ignore list.

Net: -77 pyright errors (full-config: 680 -> 603).
2026-05-01 09:36:14 -04:00
Paul Kompfner
5e24027fd5 fix: type fixes (and a few latent bug fixes) in 4 LLM adapters
Same shape of fix we applied to anthropic_adapter.py earlier — these
adapters do dict-style mutation on values typed as
ChatCompletionMessageParam (a union of TypedDicts) or against Optional
fields. Apply boundary casts (`cast(dict[str, Any], ...)` for the
mutation block, cast back to the TypedDict at return sites). Most
changes are pure typing (rename + cast); a handful in gemini and
openai_realtime are small defensive bug fixes for code paths that
were latently broken by Optional fields slipping through:

perplexity_adapter.py — pure typing. Cast the deepcopied messages to
`list[dict[str, Any]]` for the role-merging / system-conversion /
trailing-assistant-removal transformations and cast back to
ChatCompletionMessageParam at the return.

bedrock_adapter.py — pure typing. Cast the message to
`dict[str, Any]` at the top of `_from_standard_message` for the
tool-result / tool-use / image-content transformations. Cast the
constructed dict at the return site of `get_llm_invocation_params`.

gemini_adapter.py — typing + several None guards on Content.parts and
related Optional fields. Each guard turns a latent
`TypeError`/`AttributeError` (when the type-system-allowed None
showed up at runtime) into a defensive skip — the type annotations
say these can be None and we now handle that.

open_ai_realtime_adapter.py:
- Typing: cast the deepcopied messages, cast back where needed.
- LLMSpecificMessage handling: previously the function would crash on
  the first `.get()` call if any LLMSpecificMessage was in the list.
  Filter them out and document the limitation — this adapter's
  pack-into-single-text-message strategy doesn't compose with opaque
  per-provider payloads.
- Real bug fix: `events.ConversationItem` is a Pydantic BaseModel,
  not a TypedDict. The bulk-packing path was constructing a raw dict
  where a ConversationItem was expected. Replaced with proper
  constructor calls (matches what the single-user-message path
  already does).
- Real bug fix: `_from_universal_context_message` was declared
  `-> events.ConversationItem` but on the unhandled-message
  fallthrough it logged and returned None implicitly. Raise
  ValueError so the violation is loud, not silent.

Removes 4 newly-clean files from the pyright ignore list:
adapters/services/{perplexity,bedrock,gemini,open_ai_realtime}_adapter.py.

Net: -95 pyright errors (full-config: 775 -> 680).
2026-05-01 09:36:14 -04:00
Paul Kompfner
ef226c8a8e fix: silence _settings NotGiven leaks and tighten Google STT language method
Six pyright errors followed the same pattern: a value flowed out of
`self._settings.X` (typed `T | _NotGiven`) into a context that wanted
the plain `T`. Wrap each with `assert_given(...)` so the sentinel
gets stripped at the boundary:

- aws/nova_sonic/llm.py: `_settings.model` (in InvokeModel...Input)
  and `_settings.system_instruction` (passed to the adapter).
- deepgram/flux/base.py: iterating `_settings.keyterm`.
- google/stt.py: iterating `_settings.languages`.
- google/tts.py: iterating `_settings.speaker_configs`.
- openai/base_llm.py: `_settings.system_instruction` passed to the
  adapter.

Also takes a deeper pass at the related Google STT issue: the override
of `language_to_service_language` had been broadened to take
`Language | list[Language]` and return `str | list[str]`, a Liskov
violation against the base's `Language -> str | None` contract.
External callers always pass a single Language, and the only consumer
of the list path was Google STT's own `_get_language_codes`. Restore
the override to a single-Language signature and let
`_get_language_codes` iterate. The override is also tightened to
return `str` (narrower than the base's `str | None`, which is
LSP-compatible) since it always falls back to `"en-US"` rather than
returning None.

Net: -7 pyright errors (full-config run: 782 -> 775).
2026-05-01 09:36:14 -04:00
Paul Kompfner
2a731336be fix: tighten language_to_<service>_language return types to plain str
These provider-specific helpers are all thin wrappers around
`resolve_language(...)`, which itself returns `str` — never `None`.
The `str | None` annotations were misleading and were producing
spurious pyright errors at the call sites that assigned the result
into a `str` field. Update each helper's signature to `str` and
rewrite the `Returns:` docstring to describe the actual fallback
behaviour (resolve to base or full code, with a warning).

Importantly, the per-class `language_to_service_language(...)`
methods on `STTService`/`TTSService` subclasses keep `str | None` as
their return type. That signature is an extension hook for future
and/or third-party subclasses that may genuinely not be able to
produce a code for some languages, even though all in-tree first-
party services currently return a string.

Also includes one small unrelated tightening in azure/stt.py: wrap
`self._settings.language` with `assert_given(...)` so the truthy
fallback to `language_to_azure_language(Language.EN_US)` doesn't
silently swallow a NotGiven sentinel.

Net: -3 pyright errors (full-config run: 785 -> 782).
2026-05-01 09:36:14 -04:00
Paul Kompfner
bec407ce3a fix: handle Optional websocket/client receivers across services
Pyright flagged 19 sites where `await self._<connection>.send/recv/...`
was called on a receiver typed `X | None`. Each kind of call site
needed a slightly different fix to be both type-safe and behaviour-
preserving:

Streaming/user-facing paths (early return + warn — drop and warn is
the right runtime fail-safe when reconnect didn't succeed):

- cartesia/stt.py (run_stt)
- soniox/stt.py (_send_keepalive)
- elevenlabs/tts.py (run_tts — yields ErrorFrame and returns)
- deepgram/sagemaker/tts.py (run_tts)
- transports/lemonslice/transport.py (send_message)
- transports/tavus/transport.py (send_message)

"Should never happen" cases (early return with comment, no warn —
caller already gated on a separate `_is_*` check, so a warn would be
noise):

- deepgram/flux/stt.py (transport methods, gated by _transport_is_active)
- deepgram/flux/sagemaker/stt.py (same)
- stt_service.py (_send_keepalive, gated by _is_keepalive_ready)
- elevenlabs/stt.py (_send_keepalive, same)
- llm_service.py (_ws_recv — raises ConnectionError to match
  _ensure_connected's contract)
- heygen/client.py (receive loop, gated by self._connected)

Just-assigned-above (use a local variable so pyright keeps the
narrowing across statements):

- lmnt/tts.py
- gradium/stt.py
- fish/tts.py

Other:

- transports/websocket/server.py — used the existing local `websocket`
  parameter in scope instead of `self._websocket` for the close call.
- websocket_service.py — `send_with_retry` raises ConnectionError when
  `self._websocket` is None inside the existing try-block, so the
  broad `except Exception` triggers reconnect just as it would on a
  real send failure (preserving the prior behaviour where None
  silently fell through to the AttributeError-driven reconnect path).

Drops three now-clean files from the pyright ignore list: cartesia/stt.py,
elevenlabs/stt.py, and soniox/stt.py.
2026-05-01 09:36:14 -04:00
Paul Kompfner
1cd73b1ef8 refactor: give TAdapter a default to restore precise typing for unparameterized LLMService subclasses
After making LLMService generic, an unparameterized subclass
(`class MyService(LLMService):` with no bracket — the third-party
provider pattern) saw `get_llm_adapter()` return `Unknown` rather
than `BaseLLMAdapter` as it did before the refactor.

Add `default=BaseLLMAdapter` (PEP 696) on the TypeVar — via
`typing_extensions.TypeVar` so older Python targets keep working —
so unparameterized callers get `LLMService[BaseLLMAdapter]` and
`get_llm_adapter()` returns `BaseLLMAdapter`, matching the
pre-refactor type precision.

Two internal fallouts of having a default (where the default makes
unannotated `LLMService` resolve invariantly to
`LLMService[BaseLLMAdapter]`):

- `FunctionCallParams.llm` is now `LLMService[Any]` so concrete
  parameterizations like `LLMService[OpenAILLMAdapter]` can be
  passed where the field is set.
- The explicit `LLMService.__init__(self, **kwargs)` in
  `WebsocketLLMService.__init__` gets a `pyright: ignore[reportArgumentType]`
  comment — pyright's invariance handling can't see through the
  multi-inheritance + generic + default combination, but the
  runtime call is correct (generics are erased).
2026-05-01 09:36:14 -04:00
Paul Kompfner
c4f5f1ebbb test, refactor: follow-ups to LLMService generic refactor
Two follow-ups now that LLMService is generic over its adapter:

- Add an explicit backward-compat test verifying that an LLMService
  subclass with no generic parameter (the third-party-provider
  pattern) instantiates and returns a usable adapter. The existing
  MockLLMService (declared without brackets) already exercised this
  implicitly, but it's worth a named assertion.

- Drop the now-redundant `params: SomeLLMInvocationParams = ...`
  variable annotations on `adapter.get_llm_invocation_params()`
  results. Since `get_llm_adapter()` now returns the precise adapter
  type, and `BaseLLMAdapter` is generic in its invocation-params
  type, the call already infers the right TypedDict.
2026-05-01 09:36:14 -04:00
Paul Kompfner
49068ff557 refactor: make LLMService generic over its adapter type
Previously, `LLMService.get_llm_adapter()` returned `BaseLLMAdapter`,
which forced every caller that wanted the precise adapter type to
write `adapter: SomeAdapter = self.get_llm_adapter()` and accept
pyright's complaint that the assignment doesn't match the declared
type. That pattern existed in 17 places across the LLM services.

Make `LLMService` generic over its adapter type — `LLMService(...,
Generic[TAdapter])` with `TAdapter = TypeVar("TAdapter",
bound=BaseLLMAdapter)` — so subclasses opt in via
`LLMService[XAdapter]` and callers get the precise type back from
`get_llm_adapter()` automatically.

Backward-compatible for third-party providers: code that says
`class MyService(LLMService):` (no bracket) still type-checks, with
TAdapter resolving to BaseLLMAdapter from the bound — identical to
the pre-refactor behavior. The `adapter_class` attribute keeps its
loose `type[BaseLLMAdapter] = OpenAILLMAdapter` typing so the default
remains usable; one localized cast in `__init__` bridges the loose
class attr to the precise instance attr.

In-tree subclasses opted in:

- AnthropicLLMService -> LLMService[AnthropicLLMAdapter]
- AWSBedrockLLMService -> LLMService[AWSBedrockLLMAdapter]
- AWSNovaSonicLLMService -> LLMService[AWSNovaSonicLLMAdapter]
- BaseOpenAILLMService -> LLMService[OpenAILLMAdapter] (propagates to
  ~15 OpenAI-compatible providers like Cerebras, Groq, Together)
- GeminiLiveLLMService -> LLMService[GeminiLLMAdapter]
- GoogleLLMService -> LLMService[GeminiLLMAdapter]
- GrokRealtimeLLMService -> LLMService[GrokRealtimeLLMAdapter]
- InworldRealtimeLLMService -> LLMService[InworldRealtimeLLMAdapter]
- OpenAIRealtimeLLMService -> LLMService[OpenAIRealtimeLLMAdapter]
- _BaseOpenAIResponsesLLMService -> LLMService[OpenAIResponsesLLMAdapter]
- WebsocketLLMService is also generic so the multi-inheritance case
  (OpenAIResponsesLLMService) can keep both bases agreeing on TAdapter.

All 17 redundant `adapter: SomeAdapter = self.get_llm_adapter()`
annotations are now plain `adapter = self.get_llm_adapter()`.
2026-05-01 09:36:14 -04:00
Paul Kompfner
d23bdaaacd fix: handle NotGiven from from_standard_tools in Nova Sonic connect
Same pattern as the earlier get_setup_params fix: when context tools
are absent, the fallback `adapter.from_standard_tools(self._tools)`
can return the NotGiven sentinel, and `_send_prompt_start_event`
expects a list. Coerce via `or []` so the NotGiven case becomes an
empty list.
2026-05-01 09:36:14 -04:00
Paul Kompfner
53ce57b7fa fix: tighten _process_completed_function_calls in AWS Nova Sonic
Three small changes that resolve pyright errors and sharpen the logic:

- Guard `self._context` with the codebase's "should never happen"
  early-return pattern, so we don't blindly call `.get_messages()` on
  None.
- Skip `LLMSpecificMessage` items in the iteration. They're opaque
  provider-specific payloads with no `.get()`, and the surrounding
  logic only applies to standard tool-result messages.
- Match `role == "tool"` explicitly. The previous truthy-only check
  was working by accident — the `tool_call_id` filter further down
  was effectively narrowing to tool messages, but the intent is
  clearer when stated upfront.
2026-05-01 09:36:14 -04:00
Paul Kompfner
dabca70744 fix: warn and bail in reset_conversation when no context exists
reset_conversation is part of the public AWSNovaSonicLLMService API and
is also called internally from the receive-task error handler.
Previously it captured `self._context` (typed `LLMContext | None`) and
unconditionally passed it to `_handle_context`, which expects a real
context — silently doing the wrong thing if no initial context had
been received yet.

Treat that as developer error: log a warning and return early. Nothing
to preserve means nothing to reset.
2026-05-01 09:36:14 -04:00
Paul Kompfner
191bdc733f fix: conform AWSNovaSonicLLMService.get_setup_params to its protocol
The service implements the NovaSonicSessionSender protocol so the
session-continuation helper can target either the current or next
session. The protocol declares
`get_setup_params(self) -> tuple[str | None, list]`, but the
implementation was unannotated and could return NotGiven in the tools
position when from_standard_tools fell through to its NotGiven
sentinel. Add the matching return annotation and coerce the NotGiven
case to an empty list.
2026-05-01 09:36:14 -04:00
Paul Kompfner
5e1bb4cbe5 chore: remove anthropic_adapter.py from pyright ignore list
The file is now clean under pyright's basic type checking, so it can
move out of the ignore list and be enforced on every run.
2026-05-01 09:36:14 -04:00
Paul Kompfner
9ee123bf33 fix: resolve final pyright error in Anthropic cache control marker
Same MessageParam content-typing issue as the consecutive-message merge
fix: pyright doesn't carry the str-to-list narrowing forward, and
Iterable has no `[-1]` access. Cast to `list[Any]` and document the
chain of assumptions (list, non-empty, dict-typed last item) and where
each is upheld upstream.

This brings anthropic_adapter.py to 0 pyright errors (down from 115).
2026-05-01 09:36:14 -04:00
Paul Kompfner
66f43baf8f fix: resolve pyright errors in Anthropic _from_standard_message
The function takes an OpenAI ChatCompletionMessageParam (a union of
TypedDicts) and returns an Anthropic MessageParam (a different
TypedDict). It does the conversion via dict-level mutations that don't
type-check against either side's TypedDict schema. Work with the
deepcopied message as a plain dict and cast to MessageParam at the
return sites — matching the boundary-cast convention noted in
llm_context.py.

Drops anthropic_adapter.py from 20 to 2 pyright errors.
2026-05-01 09:36:14 -04:00
Paul Kompfner
252bb493af fix: cast Anthropic-format passthrough message to MessageParam
The fallback path in `_from_universal_context_message` returns
`message.message` from an `LLMSpecificMessage`, which is typed loosely
(`Any | dict`). The surrounding comment already documents the
assumption that the message is already in Anthropic format — make that
assumption explicit to pyright with a cast.
2026-05-01 09:36:14 -04:00
Paul Kompfner
c517b67bad fix: resolve pyright error when merging consecutive Anthropic messages
MessageParam types content as `str | Iterable[...]`, and Iterable has
no `.extend()`. After the str-to-list conversions, pyright re-reads
the TypedDict field as the original wide type rather than carrying the
narrowing forward. Cast to `list[Any]` to express the codebase's
existing str-or-list assumption.

Drops anthropic_adapter.py from 23 to 21 pyright errors.
2026-05-01 09:36:14 -04:00
Paul Kompfner
70aeb5c7c2 fix: resolve pyright errors in Anthropic get_messages_for_logging
Content items in MessageParam have a heterogeneous union type (Pydantic
ContentBlock variants and TypedDict *BlockParam variants), neither of
which supports the dict-style access and mutation this sanitizer does.
Treat the deepcopied message as a plain dict and guard each content
item with isinstance(item, dict) — matches the runtime shape produced
by _from_standard_message and avoids crashing if a non-dict ever flows
through the LLMSpecificMessage path.

Drops anthropic_adapter.py from 115 to 23 pyright errors.
2026-05-01 09:36:14 -04:00
Mark Backman
440738f727 Update changelog for #4401: split fix and default-model change 2026-05-01 09:19:27 -04:00
Mark Backman
7da94436f5 Add changelog for #4401 2026-05-01 09:18:34 -04:00
Mark Backman
492c9702ee fix(xai/realtime): pass model as query param on connect
xAI's Voice Agent API selects the model via the ?model= query
parameter on the WebSocket URL; it cannot be changed later via
session.update. The Grok Realtime service was setting the model in
Settings but never including it in the connection URL, so every
session silently fell back to the deprecated default
grok-voice-fast-1.0.

Append the model from Settings to the WebSocket URL on connect, and
default to the recommended grok-voice-think-fast-1.0.
2026-05-01 09:16:52 -04:00
Mark Backman
f1eef9ba0a Merge pull request #4400 from pipecat-ai/mb/deepgram-tts-mip-opt-out
feat(deepgram): add mip_opt_out to TTS services
2026-05-01 09:12:03 -04:00
Mark Backman
132b9b1002 Add changelog for #4400 2026-05-01 08:58:38 -04:00
Mark Backman
eb4e56d2d9 feat(deepgram): expose mip_opt_out on TTS services
Adds a `mip_opt_out` init parameter to both `DeepgramTTSService` (WebSocket)
and `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model
Improvement Program. When set, the value is forwarded as a query parameter
on the request, matching the pattern used by the Deepgram STT services.
2026-05-01 08:55:23 -04:00
Mark Backman
13643b192b Add changelog for #4397 2026-04-30 21:41:28 -04:00
kompfner
6d66bbceeb Merge pull request #4395 from pipecat-ai/pk/app-resources-api-updates
Broaden tool_resources to app_resources
2026-04-30 21:19:05 -04:00
Mark Backman
6cab2ce3f7 chore(smallwebrtc): lower app message log to trace level
App messages can be high-frequency, so logging each one at debug is noisy.
2026-04-30 21:06:47 -04:00
Aleix Conchillo Flaqué
a27d9fc30b Merge pull request #4396 from pipecat-ai/aleix/remove-unused-user-mute-reset
refactor(user_mute): remove unused reset() method from strategies
2026-04-30 17:27:54 -07:00
Aleix Conchillo Flaqué
2a8f4734e0 refactor(user_mute): remove unused reset() method from strategies
The reset() method on BaseUserMuteStrategy and its subclasses was never
called anywhere in the codebase.
2026-04-30 16:31:29 -07:00
Mark Backman
48ac68e3c8 Merge pull request #4393 from pipecat-ai/mb/fix-smart-turn-import
fix(turns): defer LocalSmartTurnAnalyzerV3 import to fix transformers warning
2026-04-30 16:40:17 -04:00
Paul Kompfner
c3ef199efa Add changelog for #4395 2026-04-30 16:19:35 -04:00
Paul Kompfner
1b5c4cfa2a feat: broaden tool_resources to app_resources
Broaden `tool_resources` to `app_resources` for easy access not just in
tool handlers but in other places like custom `FrameProcessor`s.

Involves 3 changes:

- A rename: `tool_resources` -> `app_resources`
- A new property on `PipelineTask`: `app_resources`
- A new property on `FrameProcessor`: `pipeline_task`

Usage in tool handler:

    async def get_weather(params: FunctionCallParams):
        resources = cast(MyAppResources, params.app_resources)
        ...

Usage in custom `FrameProcessor`:

    class MyProcessor(FrameProcessor):
        async def process_frame(self, frame, direction):
            await super().process_frame(frame, direction)
            if self.pipeline_task is not None:
                resources = cast(MyAppResources, self.pipeline_task.app_resources)
                ...

The previous `tool_resources` aliases (on `PipelineTask`,
`FunctionCallParams`, and `FrameProcessorSetup`) keep working but are
deprecated as of 1.2.0 and emit `DeprecationWarning`s.
2026-04-30 16:16:17 -04:00
Mark Backman
6e9dd1dbcc Merge pull request #4390 from pipecat-ai/mb/cartesia-tts-api-updates
feat(cartesia): align TTS services with latest API and buffering guidance
2026-04-30 15:59:15 -04:00
Mark Backman
6487f895b3 Setting use_normalized_timestamps to False so that input and output text match 2026-04-30 14:21:14 -04:00
Mark Backman
351105a975 test(krisp): scope importlib.metadata.version mock to imports only
The four krisp test files installed a process-wide mock of
importlib.metadata.version with `patch(...).start()` at module level and
never called .stop(). Once any of these files was collected, the mock
leaked across the rest of the test session, returning '0.0.0-dev' for
every version check. This corrupted unrelated tests that triggered
transformers' import-time dependency check (e.g. lazy imports of
LocalSmartTurnAnalyzerV3) — transformers saw tqdm=='0.0.0-dev' and
refused to load.

Wrap the pipecat imports in `with patch(...)` so the mock is active
during import (where pipecat's krisp version check needs it) and torn
down before any tests run.
2026-04-30 14:16:54 -04:00
Mark Backman
8ea963852d Add changelog for #4393 2026-04-30 14:16:46 -04:00
Mark Backman
6f4458f21d fix(turns): defer LocalSmartTurnAnalyzerV3 import to avoid loading transformers at module load
Importing pipecat.turns.user_turn_strategies pulled in
LocalSmartTurnAnalyzerV3 → transformers → onnxruntime at module load
time. Since this module is imported by llm_response_universal (and
therefore most LLM services), any LLM service import paid the cost of
loading transformers and triggered its missing-backend warning in
environments without PyTorch/TF/Flax.

Move the LocalSmartTurnAnalyzerV3 import into
default_user_turn_stop_strategies() so it only loads when the default
smart-turn strategy is actually constructed.

Fixes #4392
2026-04-30 14:16:41 -04:00
Mark Backman
fb42a7dcf3 Add changelog for #4390 2026-04-30 09:45:16 -04:00
Mark Backman
21547c8680 fix(cartesia): stop double-yielding ErrorFrame on HTTP non-200
The non-200 branch yielded an ErrorFrame and then raised, which the outer
except caught and yielded a second, less informative "Unknown error" frame.
Return after the yield and fold the status code into the message.
2026-04-30 09:41:43 -04:00
Mark Backman
3e5aabc5f2 fix(cartesia): guard HTTP session before use
Pyright flagged the .post() call on a possibly-None _session. Raise a
clear RuntimeError if start() wasn't called instead of crashing on the
attribute access.
2026-04-30 09:34:02 -04:00
Mark Backman
e508642b0a refactor(cartesia): mark tag helpers as @staticmethod
SPELL/EMOTION_TAG/PAUSE_TAG/VOLUME_TAG/SPEED_TAG are stateless and worked
only via class-level access. Decorating them lets instance access work too
and silences the missing-self lint warning.
2026-04-30 09:31:22 -04:00
Mark Backman
e546541e20 feat(cartesia): align WebSocket TTS with latest API and buffering guidance
- Bump default cartesia_version to 2026-03-01.
- Replace deprecated use_original_timestamps with use_normalized_timestamps
  so word timestamps match what was actually spoken.
- Add max_buffer_delay_ms init arg; auto-derive 0 in SENTENCE mode to avoid
  the doc-warned "middle ground" of client + server buffering, leave unset
  in TOKEN mode for managed buffering.
- Silently consume flush_done messages now emitted per transcript when
  server-side buffering is disabled.
2026-04-30 09:25:31 -04:00
Mark Backman
bfdd19464f Merge pull request #4385 from pipecat-ai/mb/runner-session-id
feat(runner): add session_id to RunnerArguments
2026-04-29 13:17:47 -04:00
Mark Backman
1a93ff52f1 Merge pull request #4386 from pipecat-ai/mb/update-soniox-model
feat(soniox): update default TTS model to tts-rt-v1
2026-04-29 13:17:09 -04:00
Mark Backman
6e2008a7a6 Add changelog for #4386 2026-04-29 11:09:38 -04:00
Mark Backman
da8d3a2d80 feat(soniox): update default TTS model to tts-rt-v1
Promotes the Soniox TTS default model from `tts-rt-v1-preview` to the
generally available `tts-rt-v1`.
2026-04-29 11:05:12 -04:00
Mark Backman
6b608e7e22 Add changelog for #4385 2026-04-29 09:53:42 -04:00
Mark Backman
924b9a9d8c feat(runner): add session_id to RunnerArguments
Adds a `session_id: str | None` field to `RunnerArguments` so bots can
log/trace a per-session identifier in local development the same way
they can in Pipecat Cloud (where it is provided via the
`x-daily-session-id` header).

The local runner now mints a UUID at every `*RunnerArguments`
construction site. For paths that already returned a `sessionId` to the
caller (Daily `/start`, dial-in webhook), a single UUID is now generated
and shared between `runner_args.session_id` and the response body
instead of being thrown away. The SmallWebRTC `/api/offer` endpoint
accepts an optional `session_id` so the `/sessions/{session_id}/...`
proxy can thread it through.

This is the prerequisite step for collapsing pipecat-cloud's
`SessionArguments` / `*SessionArguments` hierarchy onto the upstream
runner types.
2026-04-29 09:45:55 -04:00
Aleix Conchillo Flaqué
9411c4b67e Merge pull request #4382 from pipecat-ai/aleix/unfill-changelog-script
chore(changelog): add release-changelog.py and fix (PR line indentation in towncrier template
2026-04-28 13:18:49 -07:00
Mark Backman
ac5eb97670 Merge pull request #4384 from pipecat-ai/mb/nvidia-remove-riva-ref
Update README to remove NVIDIA references to RIVA
2026-04-28 13:18:36 -04:00
Mark Backman
3034f8bb3b Update README to remove NVIDIA references to RIVA 2026-04-28 12:42:58 -04:00
Aleix Conchillo Flaqué
60c66eda48 chore(towncrier): indent (PR ref line by two spaces in template
So the rendered changelog has the (PR [...]) line aligned as a list
continuation under its bullet. Verified with both short and wrapped
entries via `towncrier build --draft`.
2026-04-27 15:07:53 -07:00
Aleix Conchillo Flaqué
ea3585146c chore(scripts): add release-changelog.py
Adds a script to unfill (single-line) entry paragraphs in CHANGELOG.md
while keeping `(PR [...])` on its own continuation line.
2026-04-27 15:07:53 -07:00
Aleix Conchillo Flaqué
9697abe559 Merge pull request #4381 from pipecat-ai/changelog-1.1.0
Release 1.1.0 - Changelog Update
2026-04-27 14:02:20 -07:00
aconchillo
cb0335c82a Update changelog for version 1.1.0 2026-04-27 13:59:17 -07:00
Aleix Conchillo Flaqué
f560614af9 Merge pull request #4379 from pipecat-ai/aleix/bump-daily-python-0.28
chore(daily): bump daily-python to ~=0.28.0
2026-04-27 13:46:00 -07:00
Aleix Conchillo Flaqué
d7a196a3f4 docs(changelog): add entry for daily-python 0.28.0 bump 2026-04-27 13:35:14 -07:00
Aleix Conchillo Flaqué
644e106c03 chore(daily): bump daily-python to ~=0.28.0 2026-04-27 13:35:14 -07:00
Mark Backman
70f83b4a75 Merge pull request #4360 from pipecat-ai/mb/soniox-tts
Add Soniox real-time TTS service
2026-04-27 16:06:24 -04:00
Mark Backman
35ed37c539 chore: add changelog fragment for PR #4360 2026-04-27 16:04:02 -04:00
Mark Backman
58a038ddb2 Add Soniox real-time TTS service
Introduce SonioxTTSService, a WebSocket TTS provider that streams text and
receives audio over a persistent connection, multiplexing up to 5 concurrent
streams per socket via Soniox's `stream_id`. Also updates the README service
table and the Soniox voice example to use the new TTS end-to-end.
2026-04-27 16:04:02 -04:00
Aleix Conchillo Flaqué
de3c1d6e8b Merge pull request #4370 from pipecat-ai/aleix/daily-screen-video-destination
feat(daily): support screenVideo destination and configurable camera send settings
2026-04-27 11:36:22 -07:00
Aleix Conchillo Flaqué
0a9878998f docs(changelog): add entries for camera_out_send_settings and video_out_bitrate deprecation 2026-04-27 11:28:59 -07:00
Aleix Conchillo Flaqué
8459c01af8 feat(daily): add camera_out_send_settings and deprecate video_out_bitrate
Replaces the hardcoded camera publishing send settings in
DailyTransport with a new DailyParams.camera_out_send_settings dict that
applications can pass through verbatim to the Daily client. This makes
the encoding/codec/bitrate configuration user-controllable instead of
being driven solely by the generic TransportParams fields.

As a consequence, TransportParams.video_out_bitrate is deprecated for
the Daily transport (now configured via camera_out_send_settings) and
its default is changed to None.
2026-04-27 11:28:59 -07:00
Aleix Conchillo Flaqué
baaabf7d73 docs(changelog): add entry for screenVideo destination support 2026-04-27 11:28:59 -07:00
Aleix Conchillo Flaqué
4735b74776 feat(daily): support screenVideo destination for video output
Adds a dedicated screen video track alongside the existing camera track
so applications can publish to Daily's built-in "screenVideo" destination
via video_out_destinations. The track is created at join time and wired
into the client settings (inputs and publishing) when "screenVideo" is
configured; write_video_frame routes frames to the appropriate track
based on the frame's transport_destination.
2026-04-27 11:28:59 -07:00
kompfner
0109aea04c Merge pull request #4377 from pipecat-ai/pk/add-example-for-tool-resources
Add example demonstrating usage of `tool_resources`
2026-04-27 13:02:39 -04:00
kompfner
ce1311f6ba Merge pull request #4301 from bnovik0v/fix-4300-missing-tool-lifecycle
Fail missing tool calls cleanly
2026-04-27 11:54:43 -04:00
Paul Kompfner
2520243d9d style: apply ruff format 2026-04-27 11:48:27 -04:00
borislav
8869e25142 fix: compare bound method by equality, not identity
Bound methods are created fresh on each attribute access, so
'self._missing_function_call_handler is self._missing_function_call_handler'
is always False. Using 'is' meant the placeholder branch never fired and
both warnings logged when a function was missing at queue time.

Switch to == so equality compares the underlying function and instance.
Strengthen the missing-at-queue-time test to assert the second warning
does not fire.
2026-04-27 17:34:31 +02:00
borislav
822392b0d4 fix: re-resolve registry item at execution time
Address review feedback: a function may be unregistered between when
run_function_calls queues it and when _run_function_call executes it.
Restore the live lookup, falling back to the missing-function handler
when the entry is gone, so the call still terminates with a normal
tool result. Factor the missing-handler item construction into a
helper since it's now built in two places.
2026-04-27 17:22:30 +02:00
Paul Kompfner
124863175a Add example demonstrating usage of tool_resources 2026-04-27 11:20:53 -04:00
kompfner
17a5e78fb4 Merge pull request #4376 from pipecat-ai/pk/add-openai-responses-to-readme
Add OpenAI responses to readme
2026-04-27 11:02:39 -04:00
kompfner
bc29bdb95e Merge pull request #4371 from Stoic-Angel/feat-global-context
Add a global context for tool calls: tool_resources
2026-04-27 10:55:03 -04:00
Paul Kompfner
005fe33b25 Update docs URLs in README to reflect new docs site structure and avoid redirects 2026-04-27 10:22:49 -04:00
Paul Kompfner
24154474c9 Add OpenAI Responses to the README's list of LLM services 2026-04-27 10:19:13 -04:00
kompfner
86effc4d10 Merge pull request #4015 from prettyprettyprettygood/feat/nova-sonic-session-continuation
feat(nova-sonic): add proactive session continuation for conversation…
2026-04-27 09:36:48 -04:00
Mark Backman
58e50882d8 Merge pull request #4374 from pipecat-ai/mb/fix-daily-runner-room-props
Expire runner-created Daily rooms after 4h
2026-04-27 09:07:31 -04:00
Mark Backman
ef183d0c96 Add changelog for #4374 2026-04-27 09:00:17 -04:00
Mark Backman
f078df7805 runner: expire Daily rooms after 4h to mirror Pipecat Cloud session limit
Runner-created Daily rooms previously had no expiration when callers
posted partial `dailyRoomProperties` (e.g. `{"start_video_off": true}`).
The model-default `exp=None` and `eject_at_room_exp=False` meant Daily's
cron never cleaned them up, so rooms accumulated indefinitely.

Encode the policy in the runner: define `PIPECAT_CLOUD_ROOM_EXP_HOURS=4.0`,
inject `exp` and `eject_at_room_exp=True` into user-supplied properties via
`setdefault` (so explicit caller values still win), and pass
`room_exp_duration` to all four `configure()` call sites.
2026-04-27 09:00:17 -04:00
Mark Backman
815cd44c2a Merge pull request #4372 from pipecat-ai/mb/relax-frames-proto-5x
Relax protobuf pin to support both 5.x and 6.x runtimes
2026-04-27 08:58:23 -04:00
Garegin Harutyunyan
e5941926be Krisp tt demo tool (#4335)
* VIVA SDK TT v3 support

* Format fix.

* Renamed the API naming, removed '3' from the name.

* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.

* TT demo tool

* Some improvements for demo scripts, audio recordin, etc.

* Enhance demo scripts with VAD selection and audio embedding features. Updated HTML report to include annotated audio players and improved response time metrics in summary formatting. Added README for setup and usage instructions.

* Refactor interrupt prediction demo to compare multiple interruption strategies (Krisp IP vs VAD). Updated README with usage instructions and output details. Enhanced audio processing with new helper functions for generating beeps and mixing audio.

* Refactor demo scripts to improve latency metrics by introducing total_delay property in TurnEvent. Update formatting in reports and visualizations to reflect accurate speech end times, including VAD wait times. Enhance HTML report with detailed latency information and adjust audio processing to account for VAD stop seconds.

* Add audio resampling functionality and update demo scripts for improved audio processing

- Introduced `resample_audio` function to handle audio resampling with linear interpolation.
- Updated `demo_audio_recorder.py` to utilize the new resampling feature, ensuring audio is saved at the requested sample rate.
- Modified `demo_interrupt_prediction.py` and `demo_turn_taking.py` to resample audio to 16 kHz for compatibility with Silero VAD.
- Adjusted imports in demo scripts to include the new resampling function.
- Enhanced error handling for sample rate discrepancies in audio recording.

* Enhance demo_interrupt_prediction.py with VAD type selection and improved processing logic

- Added support for selecting between "silero" and "krisp" VAD engines in the demo script.
- Introduced a new create_vad function to configure VAD analyzers based on the selected type.
- Updated audio processing logic to handle VAD type-specific resampling and state management.
- Modified the KrispVivaIPUserTurnStartStrategy to utilize a separate vad_flag for per-frame VAD input, improving interruption detection accuracy.

* Refactor audio processing scripts for improved readability and consistency

- Updated type hinting in `resample_audio` function to use `tuple` instead of `Tuple`.
- Simplified print statements in `demo_audio_recorder.py`, `demo_formatting.py`, and `demo_interrupt_prediction.py` for better readability.
- Adjusted argument formatting in `demo_audio_recorder.py` and `demo_formatting.py` for consistency.
- Cleaned up list comprehensions in `demo_formatting.py`, `demo_html_report.py`, and `demo_interrupt_prediction.py` for clarity.
- Enhanced error handling in `__init__.py` for the KrispVivaIPUserTurnStartStrategy import.

* Refactor VAD handling in KrispVivaIPUserTurnStartStrategy and update tests for clarity

- Simplified the argument formatting in the _handle_vad_started method for improved readability.
- Updated test assertions to reflect changes in VAD processing logic, ensuring that the vad_flag is correctly set to False during continuous state processing.
- Enhanced test cases to verify that the process method is called appropriately under different conditions.

* more format fixes.

* removed demo scripts.

* reverted wrongly removed file.

* Corrected the IP integration logic.

* style fix.

* Refactor audio processing and state management in KrispVivaIPUserTurnStartStrategy

- Removed the unused _vad_flag attribute to streamline state tracking.
- Updated the reset method to clear the audio buffer instead of resetting the vad_flag.
- Adjusted the process_frame method to use _speech_active for VAD input, enhancing clarity in the logic.
- Modified tests to reflect changes in state management and ensure proper functionality of the reset method and audio buffer handling.

* FIxed formatting

---------

Co-authored-by: Aram Poghosyan <apoghosyan@krisp.ai>
2026-04-27 08:14:00 -04:00
Mark Backman
6266c026a6 Merge pull request #4362 from ai-coustics/ai-coustics/aic-sdk-py-v2.2.0-update
Update aic-sdk to v2.2.0
2026-04-25 06:51:41 -04:00
Gökmen Görgen
e25dccfc6b update aic-sdk to ~=2.2.0 and rename AICOUSTICS_LICENSE_KEY to AIC_LICENSE_KEY. 2026-04-25 10:13:06 +02:00
Gökmen Görgen
3bbfc42854 remove adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 10:05:47 +02:00
Gökmen Görgen
3b2127f912 rename environment variables and references from AICOUSTICS to AIC. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
ea12b10742 rename mcp-aic-adaptive.py to mcp-aicoustics-adaptive.py. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
a2fbed86cf add adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
f75f361629 bump aic-sdk to 2.2.0 and update AICFilter with model_id and enhancement_level changes. 2026-04-25 09:51:23 +02:00
Mark Backman
4c153e5d3c Add changelog for #4372 2026-04-24 21:20:46 -04:00
Mark Backman
4088992d97 Relax protobuf pin to support both 5.x and 6.x runtimes
Pipecat 1.0.8 hard-required protobuf 6.x via the base `protobuf>=6.31.1,<7`
pin, blocking users whose dependency graph already constrains protobuf to
the 5.x line. The original bump (PR #4136) was only needed because
`nvidia-riva-client>=2.25.1` ships gencode compiled with protoc 6.31.1.

Changes:

- Widen base pin to `protobuf>=5.29.6,<7`.
- Regenerate `frames_pb2.py` with `grpcio-tools~=1.67.1` (protoc 5.x). Per
  Google's cross-version runtime guarantee, 5.x gencode runs on both 5.x
  and 6.x runtimes, so this single artifact serves all users.
- Loosen the dev pin `grpcio-tools` to `>=1.67.1,<2` so contributors can
  install `pipecat[dev,nvidia]` without resolver conflict. Comment in
  `frames.proto` documents the 1.67.x requirement for regeneration.
- Add an explicit `protobuf>=6.31.1,<7` to the `nvidia` extra. This
  compensates for nvidia-riva-client's missing `protobuf` install
  requirement (upstream packaging gap, see
  https://github.com/nvidia-riva/python-clients/issues/172). When that
  issue is resolved, the explicit protobuf entry in the `nvidia` extra
  can be removed.

Verified: pipecat imports cleanly on both protobuf 5.29.6 and 6.33.6;
`tests/test_protobuf_serializer.py` passes; `import riva.client` succeeds
when `pipecat[nvidia]` is installed.
2026-04-24 21:15:32 -04:00
Osman Ipek
f1b16a672a feat(nova-sonic): add proactive session continuation for conversations >8min
Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds
transparent session continuation that rotates sessions in the background
before the limit is reached, preserving conversation context with no
user-perceptible interruption.

Implementation follows the AWS reference architecture:
- Monitor loop detects when session age exceeds threshold
- On assistant AUDIO contentStart: start buffering user audio, create next
  session (sessionStart + promptStart + system instruction)
- Track SPECULATIVE/FINAL text counts as completion signal
- On completion signal: send conversation history + audioInputStart +
  buffered audio to next session, then promote immediately
- Close old session in background (non-blocking)
- Dead session detection: recreate next session if idle >30s

Key design decisions:
- Session continuation enabled by default (fundamental for long conversations)
- Conversation history tracked in real-time via _sc_conversation_history
  (independent of pipeline context aggregator which updates asynchronously)
- Completion signal check in _handle_content_end_event (after history update)
  to ensure latest text is included in handoff
- Rolling audio buffer (default 3s) captures user audio during transition
- transition_threshold_seconds capped at 420s (7min) for safety margin
- Unified event methods (_send_text_event, _send_client_event, etc.) accept
  optional stream/prompt_name params, eliminating duplicate SC methods

Also adds:
- SessionContinuationParams config (enabled, threshold, buffer, timeout)
2026-04-24 14:55:55 -07:00
Aayush Jain
65b15a8528 add changelog 2026-04-25 02:23:25 +05:30
Aayush Jain
108e32eb72 Add a global context for tool calls - tool_resources, as a parameter to PipelineTask and FrameProcessorSetup 2026-04-25 02:12:40 +05:30
Filipi da Silva Fuchter
38a02271c5 Merge pull request #4368 from pipecat-ai/filipi/stt_service
Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService.
2026-04-24 14:31:36 -03:00
filipi87
2ce203aeb8 Renaming the method to _maybe_reconnect_on_user_stopped_speaking. 2026-04-24 13:08:32 -03:00
filipi87
b30df95f13 Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService. 2026-04-24 13:00:38 -03:00
kompfner
6be8deee2a Merge pull request #4361 from pipecat-ai/pk/pyright-fixes
Some pyright fixes
2026-04-24 11:58:28 -04:00
Paul Kompfner
c113cacd59 refactor(types): name the LLMContext/OpenAI boundary with explicit cast helpers
LLMContext's NotGiven, LLMContextToolChoice, and LLMStandardMessage are
currently aliased to their OpenAI equivalents, so passing values
between the two sides type-checks implicitly. That works today but
obscures the fact that these are meant to be conceptually distinct —
if LLMContext ever diverges from OpenAI's types, every implicit
crossing would silently break.

Introduce two module-private cast helpers in open_ai_adapter.py:

- _openai_from_llm_context_tool_choice(tool_choice)
- _openai_from_llm_standard_message(message)

Both are typed no-ops today (implemented with typing.cast) but each
carries a docstring explaining why the cast is present, and every
boundary crossing now routes through a named function. Future readers
(and future greps) can find the crossings; a later divergence becomes
a mechanical find-and-update rather than hunting through adapter code.

No behavior change, no pyright error delta.
2026-04-24 10:10:03 -04:00
Paul Kompfner
d0495eeef6 fix(types): narrow voice in SpeechmaticsTTSSettings to disallow None
After widening TTSSettings.voice to str | None | _NotGiven (so other
TTS services can opt into None as a valid "no voice" state), pyright
flagged Speechmatics' URL builder receiving str | None where it
required str.

Speechmatics has no "no voice" mode (the URL path includes the voice
name), so override the inherited field in SpeechmaticsTTSSettings to
str | _NotGiven. The call site stays as a plain assert_given(...)
without an extra None check.
2026-04-23 21:08:47 -04:00
Paul Kompfner
c3eb69165c fix(types): accept SDK NotGiven in LLM Settings fields used for passthrough
Three LLM services initialize certain Settings fields with the SDK's
NOT_GIVEN (openai.NOT_GIVEN or anthropic.NOT_GIVEN) so the value
flows unmodified into SDK API calls. The inherited field types from
LLMSettings only admit pipecat's _NotGiven, so pyright flagged each
constructor call as a flavor mismatch.

Widen the field types in each service-specific Settings subclass so
they accept both pipecat's _NotGiven (for delta-mode defaults) and
the corresponding SDK NotGiven (for store-mode passthrough):

- OpenAILLMSettings: frequency_penalty, presence_penalty, seed,
  temperature, top_p, max_tokens, max_completion_tokens.
- OpenAIResponsesLLMSettings: temperature, top_p,
  max_completion_tokens.
- AnthropicLLMSettings: temperature, top_k, top_p, thinking.

Every overridden field is genuinely read from self._settings and
passed directly to the SDK, so none of the overrides are vestigial.

Clears 21 pyright errors and restores test_service_settings_complete
parity with the pre-NOT_GIVEN-swap state.
2026-04-23 18:32:46 -04:00
Paul Kompfner
0302f6d05c chore(pyright): drop newly-clean files from ignore list
asyncai/tts and google/vertex/llm are now clean after the missing-None
sweep (both benefited from the TTSSettings.voice / LLMSettings
cascades).

- src/pipecat/services/asyncai/tts.py
- src/pipecat/services/google/vertex/llm.py
2026-04-23 18:18:00 -04:00
Paul Kompfner
b9ff333654 fix(types): admit None on settings fields that accept it as a default
Service-specific Settings subclasses declared fields as T | _NotGiven
(no None), but the services routinely pass None to those fields during
init to mean "don't override — use the vendor's default". The field
type just didn't reflect that a None value is valid, so pyright
flagged every None at the call sites.

Change the declarations to T | None | _NotGiven, matching the pattern
already used by ServiceSettings.model and TTSSettings.language. No
constructor-call changes; the default_factory stays NOT_GIVEN.

Fields touched across 11 files:

- services/settings.py: TTSSettings.voice (base class; covers
  asyncai, cartesia, elevenlabs, fish, hume, kokoro, lmnt, mistral,
  neuphonic, piper, resembleai, rime, xtts TTS services).
- services/aws/llm.py: latency.
- services/aws/tts.py: engine, pitch, rate, volume, lexicon_names.
- services/azure/tts.py: emphasis, pitch, rate, role, style,
  style_degree, volume.
- services/google/gemini_live/llm.py: vad.
- services/google/llm.py: thinking.
- services/google/stt.py: language_codes.
- services/inworld/tts.py: speaking_rate, temperature.
- services/openai/tts.py: instructions, speed.
- services/speechmatics/stt.py: 13 fields (domain, operating_point,
  max_delay, end_of_utterance_*, punctuation_overrides, *_partials,
  split_sentences, enable_diarization, speaker_*, max_speakers,
  prefer_current_speaker, extra_params).
- services/ultravox/llm.py: output_medium.

Clears 94 pyright errors (1035 -> 941).
2026-04-23 18:18:00 -04:00
Paul Kompfner
92610944af chore(pyright): drop newly-clean files from ignore list
Three files no longer have pyright errors after the is_given /
assert_given sweep — remove them from the ignore list (which serves as
a live todo of files with remaining type errors).

- src/pipecat/processors/gstreamer/pipeline_source.py
- src/pipecat/services/camb/tts.py
- src/pipecat/services/speechmatics/tts.py
2026-04-23 17:44:17 -04:00
Paul Kompfner
6a337f1bc6 fix(types): assert_given at store-mode settings read sites
Apply assert_given across service modules to narrow reads from
store-mode settings fields (self._settings.X, default_settings.X),
where _NotGiven is declared in the field type but should never appear
at runtime (enforced by validate_complete()).

Two idioms used:

- Inline wrap for single uses:
    func(assert_given(self._settings.enable_prompt_caching), ...)

- Extract-and-reuse when the same value is used multiple times:
    thinking = assert_given(self._settings.thinking)
    if thinking:
        params["thinking"] = thinking.model_dump(...)

43 service files touched. Cleared ~172 pyright errors; remaining
_NotGiven-related errors are in adjacent categories (flavor mismatch
between openai/anthropic NotGiven and pipecat _NotGiven, settings
field types that should allow None but don't) that need different
fixes.
2026-04-23 17:39:17 -04:00
Filipi da Silva Fuchter
ef7fa07bf7 Merge pull request #4358 from pipecat-ai/filipi/fix_aiortc_sctp
Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU
2026-04-23 17:49:18 -03:00
filipi87
ce1506792e Linking to the docs instead of full explanation. 2026-04-23 17:46:54 -03:00
Paul Kompfner
70f3d32734 feat(types): add assert_given for narrowing store-mode settings reads
In store-mode settings objects, _NotGiven should never appear (the
invariant enforced by validate_complete). But the declared field types
still include _NotGiven because the same class doubles as delta mode,
so every field read is typed X | None | _NotGiven and pyright flags
operations that assume X | None.

assert_given is a one-line extractor that narrows away _NotGiven and
raises loudly if the invariant is violated — preferable to scattering
is_given guards that defend against something that can't occur in
practice.

    resolved_model = assert_given(self._settings.model)  # str | None
2026-04-23 16:40:07 -04:00
Paul Kompfner
356618b448 fix(types): use is_given at call sites pyright flagged
Replace direct identity checks against NOT_GIVEN with is_given() at
sites where pyright's inability to narrow on non-singleton sentinels
was causing type errors.

- adapters/services/anthropic_adapter.py: narrow converted.system for
  _resolve_system_instruction.
- services/openai/llm.py: narrow params.service_tier using OpenAI's
  is_given.
- services/sarvam/llm.py: narrow tools / tool_choice using OpenAI's
  is_given (aliased as openai_is_given alongside the existing
  settings.is_given import).
- services/sarvam/tts.py: narrow settings.voice using settings.is_given.
2026-04-23 16:15:07 -04:00
Paul Kompfner
1624d7a474 feat(types): add is_given TypeGuard helpers for NotGiven sentinels
Pyright can't narrow identity checks against module-level NotGiven
sentinels (they aren't typed as singletons), which leaves many
NotGiven-bearing unions stuck as unnarrowed types throughout the
codebase. Introduce is_given TypeGuard helpers so narrowing works via
isinstance under the hood.

Each helper is co-located with the NotGiven flavor it guards:

- services/settings.py: upgrade the existing is_given to a TypeGuard.
- processors/aggregators/llm_context.py: add an is_given for
  LLMContext's NotGiven. Treat LLMContext's re-exported types
  (LLMStandardMessage, LLMContextToolChoice, NOT_GIVEN, NotGiven) as
  LLMContext's own — independent definitions that happen to coincide
  with OpenAI's as an implementation detail.
- adapters/services/anthropic_adapter.py: add is_given for anthropic's
  NotGiven.
- adapters/services/open_ai_adapter.py: add is_given for openai's
  NotGiven.
2026-04-23 15:33:43 -04:00
Paul Kompfner
092b1dcb0f fix(types): widen TLLMInvocationParams bound to Mapping[str, Any]
TypedDict types are not subtypes of dict[...] in the type system
(per PEP 589), so TypedDict-based invocation param classes could not
satisfy the TypeVar bound. Mapping[str, Any] accepts TypedDicts while
preserving the "string-keyed mapping" constraint.
2026-04-23 14:35:59 -04:00
Mark Backman
b90ea9bf6a Merge pull request #4352 from pipecat-ai/mb/pyright-fixes-1-per-file
More pyright fixes
2026-04-23 14:14:36 -04:00
kompfner
05c97804d5 Merge pull request #4359 from pipecat-ai/pk/changelog-4355-rename
chore: rebind Gemini Live reconnect changelog fragment to PR #4355
2026-04-23 14:10:36 -04:00
Paul Kompfner
7a8357a569 chore: rebind Gemini Live reconnect changelog fragment to PR #4355
The original contributor's PR (#4328) landed as #4355. Rename the fragment
so the rendered changelog links to the merged PR, and add the leading `- `
bullet prefix that towncrier expects.
2026-04-23 12:00:56 -04:00
filipi87
44756de15a Adding changelog for the SmallWebRTC fix. 2026-04-23 12:19:56 -03:00
filipi87
94304ec74e Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU. 2026-04-23 12:18:33 -03:00
kompfner
a3fe34f4a2 Merge pull request #4355 from pipecat-ai/pk/gemini-live-context-reseed-on-reconnect
Re-seed Gemini Live context on reconnect without session resumption
2026-04-23 11:00:22 -04:00
Sathwika Reddy Geereddy
21f6c2afa5 Update NVIDIA STT services for Nemotron Speech defaults and config parity (#4269)
* Update NVIDIA STT services for Nemotron Speech defaults and config parity

* Add changelog entry for PR #4269

* initialize boosted LM settings defaults in streaming STT

* Align NVIDIA STT language handling with other STT services

* add finalised flag to Nvidia stt final transcripts, remove processing latency logs

* Changing interim transcription logging to tracing.

---------

Co-authored-by: sathwika <geereddysath@nvidia.com>
Co-authored-by: filipi87 <filipi87@gmail.com>
2026-04-23 09:01:27 -04:00
Filipi da Silva Fuchter
4d14251f4a Merge pull request #4354 from pipecat-ai/filipi/includes_inter_frame_spaces
feat(tts): add includes_inter_frame_spaces flag to word-timestamp API - follow-up
2026-04-23 08:49:26 -03:00
Paul Kompfner
1421c4ba22 fix: handle Gemini Live 2.5 quirks when re-seeding context on reconnect
Extends the reconnect re-seeding fix to work cleanly on Gemini Live 2.5,
which has stricter seed requirements than 3.x and a documented audio-input /
history-recall limitation. Both initial connection and reconnect now share a
single code path (`_create_initial_response(for_reconnect=...)`), with four
well-documented cases.

On Gemini 2.5 reconnect, `turn_complete=True` is now forced on the seed so
the model produces a recap-style response immediately instead of briefly
acting "forgetful" on the user's next utterance — the latter being
especially jarring mid-conversation. When a 2.5 seed doesn't already end
with a user turn (e.g. the bot had finished speaking before the disconnect),
a blank user turn is appended to satisfy the server's seed-shape
requirement. Gemini 3.x needs neither workaround.
2026-04-22 15:58:54 -04:00
filipi87
6b1d8d9fa5 Fixing merge conflicts. 2026-04-22 15:22:32 -03:00
filipi87
ac810e57ed Merge branch 'main' into filipi/includes_inter_frame_spaces
# Conflicts:
#	uv.lock
2026-04-22 15:22:06 -03:00
filipi87
bba7ca80e3 Bumping to small-webrtc-prebuilt 2.5.0 to fix karaoke highlighting. 2026-04-22 15:20:37 -03:00
filipi87
79250f1fe0 Making includes_inter_frame_spaces optional for word-timestamp. 2026-04-22 14:20:30 -03:00
Mark Backman
4f6e76e6fd Add changelog entries for #4352 2026-04-22 12:23:33 -04:00
Mark Backman
b0962861c8 Acknowledge Tkinter's GC-reference idiom with a scoped type ignore
Tkinter's `Label` only stores `PhotoImage` references at the C level, so
Python GC eats them unless something on the Python side keeps a
reference. The canonical fix is to stash the reference on the widget
itself: `label.image = photo`. Tkinter widgets are plain Python objects,
so the assignment works at runtime, but the stub declares no `image`
attribute (correctly — there isn't one; we're adding it).

Narrow the suppression to `# type: ignore[attr-defined]` on the one
line. The existing comment above the assignment already documents why.
2026-04-22 12:19:16 -04:00
Mark Backman
ec7c35fe98 Move Mistral message fixups into MistralLLMAdapter
Mistral imposes three conversation-history quirks on top of the
OpenAI-compatible wire format: tool messages must be followed by an
assistant message; non-initial system messages are rejected; trailing
assistant messages require `prefix=True`. These rules were applied
inline in `MistralLLMService.build_chat_completion_params`, which is the
wrong layer — every other provider with OpenAI-compatible-but-quirky
shape (Perplexity, etc.) owns its transformations in a
`BaseLLMAdapter` subclass that runs during `get_llm_invocation_params`.

Create `MistralLLMAdapter(OpenAILLMAdapter)` on the Perplexity template
and wire it in via the existing `adapter_class` dispatch. The service
now only handles Mistral-specific request-level mapping (`random_seed`
in place of `seed`), and the message shape concerns live with other
provider format logic.

No behavior change. The transform function casts to `list[dict[str,
Any]]` internally because mutating `role` and attaching Mistral's
non-standard `prefix` field both step outside OpenAI's TypedDict
contract; the cast at the return boundary encodes that we're emitting
Mistral's extended schema, not OpenAI's.
2026-04-22 12:17:46 -04:00
Mark Backman
10b86b4bbe Coerce inspect.getdoc() None to empty string before parsing
`inspect.getdoc()` returns `str | None`, but `docstring_parser.parse()`
requires `str`. Functions without a docstring produced `None`, which
the type checker correctly flagged.

Coerce to `""` at the call site. `docstring_parser.parse("")` returns
an empty docstring whose `.description` and `.params` are already
handled by the surrounding `or ""` fallbacks, so runtime behavior is
unchanged.
2026-04-22 12:01:00 -04:00
Mark Backman
8ec56092c0 Remove duplicate ResponseCreated type 2026-04-22 11:58:15 -04:00
Mark Backman
0c3c5e5c7d Widen ToolsSchema.standard_tools to Sequence for covariance
`ToolsSchema.__init__` declared `standard_tools: list[FunctionSchema |
DirectFunction]`. Callers (`BaseLLMAdapter`, `MCPService`) pass in
`list[FunctionSchema]`, which is not assignable to the union list
because `list` is invariant in its element type.

Widen the parameter to `Sequence[...]` (covariant) so `list[X]` and
`list[X | Y]` both fit. A narrower `list[FunctionSchema]` is still
accepted, and nothing in this class mutates the argument — the
constructor immediately copies it via `_map_standard_tools`.

Also correct the `custom_tools` property return type to include
`None`, matching the stored `_custom_tools` field.

This single edit clears the pyright errors for three ignore-list
entries: `tools_schema.py`, `base_llm_adapter.py`, and `mcp_service.py`.
2026-04-22 11:54:20 -04:00
Mark Backman
b64ed3f9e2 Narrow settings.model at service boundaries, not via truthiness
Two services were reading `_settings.model` (typed `str | _NotGiven |
None` because NOT_GIVEN is the default) and coercing it with `or ""`
or similar. `_NotGiven.__bool__` returns False, so the runtime
behavior happened to work, but the type was a lie — pyright saw
`str | _NotGiven` flowing into APIs that required `str` or `str | None`.

- `AIService._sync_model_name_to_metrics`: use `isinstance(model, str)`
  narrowing with an empty-string fallback. Equivalent runtime behavior,
  honest type, no truthiness dependency on a sentinel.
- `SarvamLLMService.__init__`: validate the model is a real string
  before handing it to `_validate_model(str)`. A non-string model at
  this point is a configuration bug; raise `ValueError` so the error
  is clear and survives `python -O` (unlike an assert).
2026-04-22 11:52:20 -04:00
Mark Backman
5872006d6b Encode lazy-init invariants at the right site, not at read sites
Three spots had the same shape: a field starts None, a later method
populates it, a read site later reads it. Pyright can't track the
cross-method invariant. Rather than spray assertions at the read
sites, fix each site at the structural level:

- `FastAPIWebsocketInputTransport._monitor_websocket` now takes the
  session timeout as an argument. The task-creation site already
  guards on truthiness, so the call can pass the non-None value
  directly and the method's signature tells the truth.
- `FrameProcessorMetrics.task_manager` raises `RuntimeError` instead
  of asserting. Asserts are stripped under `python -O`; a real raise
  keeps the runtime safety net and still narrows the type for pyright.
- `SOXRStreamAudioResampler._maybe_initialize_sox_stream` returns the
  initialized stream. Callers use the return value and never touch
  the Optional `_soxr_stream` attribute, so narrowing stays inside
  the init method where the invariant is established.
2026-04-22 11:45:18 -04:00
Mark Backman
457eb7aa92 Mark abstract image/vision generators as real async generators
`ImageGenService.run_image_gen` and `VisionService.run_vision` were
declared `async def ... -> AsyncGenerator[Frame, None]` with `pass`
bodies. Without a `yield` anywhere in the body, Python treats the
function as a coroutine returning an `AsyncGenerator`, not as an async
generator itself, so callers got a coroutine where they expected an
iterator.

Add `raise NotImplementedError; yield` so the body contains a yield
(making this a real async generator) while still raising cleanly if a
subclass ever calls `super().run_*` by mistake.
2026-04-22 11:19:23 -04:00
Mark Backman
14cd476b20 Drop pyright ignores for services fixed by run_stt/run_tts widening
Deepgram STT, Gradium TTS, Smallest STT, and xAI STT/TTS had exactly
one pyright error each, all of them the AsyncGenerator return-type
mismatch resolved in 08fe9157c. Remove them from the ignore list.
2026-04-22 11:09:27 -04:00
Mark Backman
3b0affe5b4 Guard run_stt WebSocket sends with try/except
AssemblyAI, Cartesia, Gradium, and Soniox STT services sent audio over
the WebSocket without catching transient send failures, so a single
network hiccup could propagate an exception up through process_frame
and end the pipeline. Other push-based STT services (Deepgram, xAI,
Azure, Smallest, etc.) already guard their sends.

Follow the deepgram/stt.py pattern: log a warning and continue. The
existing connection-state check at the top of each call handles
recovery on the next invocation.
2026-04-22 11:03:41 -04:00
Mark Backman
08fe9157cc Widen run_stt/run_tts return type to AsyncGenerator[Frame | None, None]
The push-based STT/TTS implementations send audio/text over a socket and
receive results via a separate receive task, so there is nothing to
yield inline. They yield `None` by design. The previous declaration of
`AsyncGenerator[Frame, None]` disagreed with that, while the consumer
(`AIService.process_generator`) already accepted `Frame | None`. Widen
the producer side (abstract base and every subclass) so the type honestly
describes the contract.

Pure annotation change; no runtime behavior difference.
2026-04-22 11:01:50 -04:00
Mark Backman
3f3d3c9203 Merge pull request #4337 from pipecat-ai/mb/fix-speech-stop-strategy
Split user-turn stop timeout into independent speech and STT timers
2026-04-22 10:23:03 -04:00
Mark Backman
6b6896a543 Merge pull request #4350 from pipecat-ai/mb/pyright-precise-ignore-list
Expand pyright coverage to full src/pipecat with per-file ignores
2026-04-22 09:56:59 -04:00
Filipi da Silva Fuchter
7858813871 Merge pull request #4270 from sathwikareddy02/nvidia-llm-update
Enhance NVIDIA LLM reasoning tokens handling and allow keyless local …
2026-04-22 10:47:54 -03:00
Mark Backman
7bba74ebd6 Expand pyright coverage to full src/pipecat with per-file ignores
Previously, six modules (adapters, audio, processors, serializers,
services, transports) were ignored wholesale. Many files in those
modules already pass type checking, but we had no way to protect them
from regressions or make the remaining work visible.

Switch the include list to src/pipecat so any new module is checked by
default, and replace directory-level ignores with the 140 specific
files that still fail. This puts 189 previously-untyped files under
type checking immediately and turns the remaining work into a concrete,
shrinking TODO list.
2026-04-22 09:45:31 -04:00
Mark Backman
f425e946eb Merge pull request #4349 from pipecat-ai/mb/serializer-pyright
Fix type errors in serializers and add to pyright checked set
2026-04-22 09:43:31 -04:00
Filipi da Silva Fuchter
75bd1b5b9b Merge pull request #4323 from dakshdua/daksh/allow-noninitial-whitespace-chunks
fix: when aggregating by tokens, allow inter-token whitespace once non-whitespace has been sent
2026-04-22 10:27:08 -03:00
filipi87
d953c201bd Adding changelog entry to the fix. 2026-04-22 10:24:21 -03:00
Mark Backman
263cad41f0 Add changelog for #4349 2026-04-21 18:14:15 -04:00
Mark Backman
df9642eb5a Fix type errors in serializers and add to pyright checked set
Moves src/pipecat/serializers into pyright's include list. Narrows
self._params to each subclass's InputParams in exotel, vonage, plivo,
twilio, genesys, and telnyx. In protobuf.py, renames the reassigned
frame local to avoid clobbering its Frame type and silences two dynamic
attribute accesses on the generated frames_pb2 module.

Also aligns telnyx and plivo hangup validation with twilio: if
auto_hang_up=True (the default) but required credentials are missing,
__init__ now raises ValueError instead of silently logging a warning
at call-end time. Previously a misconfigured serializer would construct
fine and fail to hang up the call later, leaving a phantom billable
session.
2026-04-21 18:12:54 -04:00
Mark Backman
dcbe86d0fc Unify fallback timeout into the user-speech timer
Collapse the separate fallback timer into the existing user_speech_timeout
timer, restarted when a transcript arrives without a VAD stop. stt_timeout
has no meaning on the fallback path, so the stt wait is marked done
immediately. This drops the _fallback_timeout_task / _fallback_expired
bookkeeping and the branched trigger condition.
2026-04-21 17:33:12 -04:00
Mark Backman
7fc79511dd Merge pull request #4348 from pipecat-ai/mb/pyright-scripts-docs
Fix type errors in scripts and add to pyright checked set
2026-04-21 16:56:49 -04:00
Mark Backman
4d9dc64af8 Install all extras in format workflow for pyright
CI was running `uv sync --group dev` without extras. Adds daily and
tracing to extras.
2026-04-21 16:53:57 -04:00
Mark Backman
21f5cfe21a Fix type errors in utils and add to pyright checked set 2026-04-21 16:47:12 -04:00
Mark Backman
308044808d Rename to _user_speech_wait_done 2026-04-21 16:39:30 -04:00
Mark Backman
c244a950eb Add src/pipecat/tests to include list, alphabetize list 2026-04-21 16:24:53 -04:00
Mark Backman
847bd8af4b Remove src/pipecat/sync which doesn't exist 2026-04-21 16:21:46 -04:00
Mark Backman
10e58d6e42 Fix type errors in scripts and add to pyright checked set 2026-04-21 16:17:49 -04:00
Mark Backman
609a0a14e7 Merge pull request #4341 from pipecat-ai/mb/xai-tts
Add XAITTSService for xAI streaming WebSocket TTS
2026-04-21 15:52:37 -04:00
Mark Backman
84891de04d Add voice/xai-http.py to release evals 2026-04-21 15:49:59 -04:00
Mark Backman
9a49517609 Add changelog entry for #4341 2026-04-21 15:48:27 -04:00
Mark Backman
d8f5c0be71 Add XAITTSService for xAI streaming WebSocket TTS
Adds XAITTSService in the existing xai/tts.py module, alongside the
existing XAIHttpTTSService. Connects to xAI's streaming endpoint at
wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta
chunks down on the same connection so audio starts flowing before the full
utterance is synthesized.

Extends InterruptibleTTSService since xAI's protocol is strictly sequential
per connection and exposes neither a cancel verb nor a context ID — the
only way to stop an in-flight utterance is to tear down the WebSocket,
which is exactly what InterruptibleTTSService does on interruption when
the bot is speaking.

Voice, language, codec, and sample_rate are passed as query-string params
at connect time; runtime setting changes reconnect the socket. Defaults to
raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream.

Splits the existing example into voice-xai.py (WebSocket) and
voice-xai-http.py (batch HTTP) so each variant has its own entry point.
Promotes the xai extra to depend on pipecat-ai[websockets-base] since the
new service imports the websockets library.
2026-04-21 15:48:26 -04:00
Mark Backman
93393ea91c Merge pull request #4338 from pipecat-ai/mb/fix-examples-types
Include examples in type checking
2026-04-21 15:47:10 -04:00
Mark Backman
58a17c7b1b Include examples in type checking
Remove `examples/` from the `pyrightconfig.json` ignore list and fix
the resulting type errors across all example files. Common fixes:

- Required API keys: `os.getenv("X")` -> `os.environ["X"]` so the
  return type is `str` rather than `str | None`, and misconfiguration
  fails fast.
- Narrow `LLMContextMessage` union members with `isinstance(..., dict)`
  before dict-style access.
- `assert isinstance(params.llm, ...)` before calling service-specific
  methods that aren't on the base `LLMService`.
- Guard optional frame fields (e.g. `LLMSearchResponseFrame.search_result`)
  before use.
2026-04-21 15:43:31 -04:00
Mark Backman
103ced1eaa Merge pull request #4347 from pipecat-ai/mb/deepgram-stt-keepalive-unbound 2026-04-21 15:15:55 -04:00
Mark Backman
ac9bea27aa Merge pull request #4340 from pipecat-ai/mb/xai-stt
Add xAI streaming STT service
2026-04-21 14:52:38 -04:00
Mark Backman
648094da26 Add changelog for #4347 2026-04-21 14:51:30 -04:00
Mark Backman
29d604f608 Fix UnboundLocalError in Deepgram STT connection handler
If the WebSocket handshake is cancelled or fails before `keepalive_task`
is assigned (e.g. an STTUpdateSettingsFrame triggers a reconnect during
initial connect), the `finally` block tried to cancel an unbound local.

Initialize `keepalive_task = None` before the try and guard the cancel.
2026-04-21 14:48:55 -04:00
Mark Backman
b838bd906b Add changelog for #4340 2026-04-21 13:45:34 -04:00
Mark Backman
c091232f2f Add xAI streaming STT service
New `XAISTTService` wraps xAI's real-time speech-to-text WebSocket
(`wss://api.x.ai/v1/stt`). It extends `WebsocketSTTService`, authenticates
with the `XAI_API_KEY` as a Bearer token on the WS handshake, and streams
raw audio (PCM/mu-law/A-law) with configurable interim results, endpointing,
language, multichannel, and diarization settings.

- `src/pipecat/services/xai/stt.py`: new service, settings dataclass, and
  `language_to_xai_stt_language` helper.
- `src/pipecat/services/stt_latency.py`: `XAI_TTFS_P99` default.
- `pyproject.toml` / `uv.lock`: `xai` extra now pulls in `websockets-base`.
- `README.md`: link to xAI STT in the services table.
- `examples/voice/voice-xai.py`: swap DeepgramSTTService for XAISTTService so
  the xAI voice example is fully xAI.
- `examples/transcription/transcription-xai.py`: new transcription-only
  example using the new service.
2026-04-21 13:45:34 -04:00
Mark Backman
8e247f395b Merge pull request #4344 from pipecat-ai/mb/11labs-normalized-alignment 2026-04-21 13:41:04 -04:00
Mark Backman
b0e3b69b35 Merge pull request #4342 from pipecat-ai/mb/docs-workflow-label 2026-04-21 13:40:38 -04:00
kompfner
9213b22852 Merge pull request #4346 from pipecat-ai/pk/use-ExternalUserTurnStrategies-in-deepgram-flux-example
Use ExternalUserTurnStrategies, as expected, in a Deepgram Flux example
2026-04-21 13:20:27 -04:00
Paul Kompfner
81571beb1b Use ExternalUserTurnStrategies, as expected, in a Deepgram Flux example 2026-04-21 10:51:59 -04:00
Mark Backman
a07bee2318 Add changelog for #4344 2026-04-21 09:12:15 -04:00
Mark Backman
a0f79b4700 Use ElevenLabs normalized_alignment so word timestamps match spoken audio 2026-04-21 09:09:19 -04:00
Mark Backman
2c3f051a1f Merge pull request #4325 from radhikagpt1208/fix/sentry-metrics-drop-metricsframe
Fix SentryMetrics dropping MetricsFrame from stop_ttfb/stop_processing
2026-04-21 07:57:42 -04:00
Mark Backman
c1b3a9f4b5 Add pipecat label to update-docs CI workflow 2026-04-20 20:40:54 -04:00
Mark Backman
9ded7bab1b Merge pull request #4334 from dhruvladia-sarvam/feat/sarvam-stt-vad-parameters-exposed
Sarvam - VAD parameters configurable on saaras:v3
2026-04-20 16:04:23 -04:00
dhruvladia-sarvam
34fb303c44 changelog descriptions 2026-04-21 00:29:38 +05:30
dhruvladia-sarvam
2aec2467cb Deprecated InputParams fix and default model change to saaras:v3 2026-04-21 00:19:49 +05:30
Mark Backman
9d8eefd2a2 Add changelog for #4337 2026-04-20 12:02:20 -04:00
Mark Backman
b59c4775da Split user-turn stop timeout into independent speech and STT timers
SpeechTimeoutUserTurnStopStrategy previously collapsed two waits into
max(stt_timeout, user_speech_timeout), which over-waited for finalizing
STT services and could also end the turn early in a legacy code path.
Run them as independent timers instead:

- user_speech_timeout: policy floor, always runs to completion.
- stt_timeout: latency safety net, short-circuited by a finalized
  transcript since STT has signaled it has nothing more to send.

The no-VAD fallback now waits only user_speech_timeout rather than
max(stt_timeout, user_speech_timeout); stt_timeout is defined relative
to VAD stop and has no meaning when no VAD event occurred. This
shortens the fallback wait for users who set stt_timeout greater than
user_speech_timeout.
2026-04-20 11:55:09 -04:00
Harshita Jain
03bd667f95 Fix Smallest AI TTS WebSocket endpoint URL and remove unsupported flush (#4320)
* Fix Smallest AI TTS WebSocket endpoint URL to match API documentation

Update base URL from waves-api.smallest.ai to api.smallest.ai and
fix path prefix from /api/v1/ to /waves/v1/ per the v4.0.0 docs.

* Update keepalive using silent space message instead of unsupported flush
2026-04-20 11:15:25 -04:00
Mark Backman
e8c3f73968 Merge pull request #4336 from pipecat-ai/mb/pyright-ignore-modules
Silence pyright diagnostics for unchecked modules in IDE
2026-04-20 09:15:02 -04:00
sathwika
91e5b1ad9a Handle NVIDIA LLM reasoning content in stream wrapper 2026-04-20 14:17:39 +05:30
dhruvladia-sarvam
f2a19cb1a3 Initial commit for vad parameters on saaras:v3 2026-04-20 13:52:48 +05:30
sathwika
74becffe55 add changelog 2026-04-20 11:47:20 +05:30
sathwika
995f897b80 Enhance NVIDIA LLM reasoning tokens handling and allow keyless local NIM endpoints 2026-04-20 11:47:16 +05:30
Mark Backman
74d11dc0aa Silence pyright diagnostics for unchecked modules in IDE
Pylance analyzes open files even when they're outside the `include`
set, producing noise in the editor. Adding these paths to `ignore`
suppresses diagnostics without affecting import resolution.
2026-04-19 09:19:15 -04:00
Ian Lee
b435ddfa44 feat(tts): add includes_inter_frame_spaces flag to word-timestamp API
Some TTS providers (e.g. Inworld) return verbatim tokens where spaces and
punctuation are already embedded in the token text. When downstream consumers
join these tokens with an extra space they produce "hello , world" instead of
"hello, world".

Add an opt-in `includes_inter_frame_spaces: bool = False` parameter to
`add_word_timestamps` / `_add_word_timestamps`. The flag is threaded through
`_WordTimestampEntry` and stamped onto every emitted `TTSTextFrame`.
Defaults to `False` — no behaviour change for existing services.

`InworldTTSService` passes `includes_inter_frame_spaces=True` and stops
pre-processing tokens in `_calculate_word_times`, returning them verbatim.

Tests added to `test_tts_frame_ordering.py` covering both HTTP and WebSocket
delivery paths: verbatim text preservation, PTS ordering, text-before-audio
ordering, and the Inworld punctuation-token scenario.

Made-with: Cursor
2026-04-18 12:03:32 -07:00
Mark Backman
6d3dfd8f64 Merge pull request #4329 from pipecat-ai/mb/resolve-krisp-warning
Silence krisp_audio import logs on auto-import
2026-04-17 18:23:01 -04:00
Mark Backman
ce9c214eec Silence krisp_audio import logs on auto-import
The two logger.error lines in krisp_instance.py fired at module-load time
whenever anything transitively imported it (e.g. pipecat.turns.user_start
pulling in krisp_viva_ip_user_turn_start_strategy), producing noisy output
for users who never asked for Krisp. Drop the log calls and raise a more
informative ImportError that names the affected classes so direct
importers still get clear guidance.
2026-04-17 18:18:33 -04:00
Mark Backman
8c8b76e9d2 Merge pull request #4326 from pipecat-ai/mb/flux-multilingual 2026-04-17 15:59:11 -04:00
denxxs
7b3141ba19 chore: update changelog fragment to PR #4328 2026-04-18 01:15:27 +05:30
denxxs
928ade993b fix: re-seed Gemini Live context on reconnect without session resumption 2026-04-18 01:14:05 +05:30
Mark Backman
42a6fc703c Address review feedback
- Fall back to Language.EN in _primary_detected_language when model is
  flux-general-en, preserving prior behavior on the default model.
- Standardize example on DeepgramFluxSTTService.Settings and drop the
  now-redundant DeepgramFluxSTTSettings import.
- Narrow the changed-behavior changelog to reflect that flux-general-en
  frames still carry Language.EN.
2026-04-17 15:38:14 -04:00
Mark Backman
c5c18335fd Merge pull request #4324 from pipecat-ai/mb/pyright-initial
Add pyright type checking: step 1
2026-04-17 14:04:35 -04:00
Mark Backman
3159503c7f Merge pull request #4327 from pipecat-ai/filipi/pyright_service_switcher
Fixing typecheck for service switcher.
2026-04-17 13:59:40 -04:00
filipi87
0340e25e9f Fixing typecheck for service switcher. 2026-04-17 12:44:57 -03:00
Mark Backman
af861b7975 Add changelog for #4326 2026-04-17 10:31:37 -04:00
Mark Backman
6bb4e8295f Add multilingual support for Deepgram Flux STT
Enables the flux-general-multi model with one or more language_hints.
Hints are sent as repeatable URL params at connect time and via a
Configure control message when updated mid-stream (detect-then-lock).
TranscriptionFrame.language now reflects the language Flux detected
for each turn via the TurnInfo `languages` field.
2026-04-17 10:30:45 -04:00
Mark Backman
f5f92dea63 Add changelog entries and restore multi-line WhatsApp error log
Add changelog entries for the pyright introduction and the
LiveKitRunnerArguments.token signature tightening. Restore the
indented multi-line format for the WhatsApp missing-env error,
now listing only the vars that are actually missing.
2026-04-17 09:39:55 -04:00
Mark Backman
cb1463f9f1 Fix type errors in runner and add to pyright checked set
Make required parameters non-optional: LiveKitRunnerArguments.token,
_create_telephony_transport args. Use os.environ[] instead of
os.getenv() for required WhatsApp env vars. Guard spec/loader None
in module loading. Tighten sip_caller_phone guard in daily.py.
2026-04-17 09:39:55 -04:00
Garegin Harutyunyan
4c19f5584c VIVA SDK TT v3 support (#4252)
* VIVA SDK TT v3 support

* Format fix.

* Renamed the API naming, removed '3' from the name.

* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.

* Typo fix in voice-krisp-viva example to use KrispVivaFilter class

* style fix.

* test run error fixes.

* some test related changes.

* Fixed tests

* Stule fixes.
2026-04-17 07:53:41 -04:00
Radhika Gupta
80fecab4de Fix SentryMetrics dropping MetricsFrame from stop_ttfb/stop_processing
SentryMetrics.stop_ttfb_metrics and stop_processing_metrics called the
base FrameProcessorMetrics implementation but discarded its return
value (implicit `return None`). FrameProcessorMetrics.stop_ttfb_metrics
/ stop_processing_metrics build and return a MetricsFrame, which
FrameProcessor.stop_ttfb_metrics / stop_processing_metrics then pushes
downstream so observers (e.g. UserBotLatencyObserver,
MetricsLogObserver) can see TTFB / processing metrics.

Because SentryMetrics returned None, the FrameProcessor never pushed
the MetricsFrame, so any pipeline using metrics=SentryMetrics() on STT
/ LLM / TTS services silently lost all downstream TTFB and processing
MetricsFrames. The metrics were still calculated and logged
internally, and Sentry transactions still finished correctly, but
observers never saw them.

Forward the MetricsFrame returned by the base class so FrameProcessor
can push it into the pipeline.
2026-04-17 14:48:36 +05:30
Mark Backman
ab91047300 Fix type errors in pipeline and add to pyright checked set
Use Sequence[FrameProcessor] instead of list[FrameProcessor] in Pipeline,
ServiceSwitcher, and ServiceSwitcherStrategy parameters to accept subtype
lists. Add cast() in LLMSwitcher for narrowed return types. Guard against
None in task_observer._send_to_proxy and replace hasattr with truthiness
check in task._cleanup.
2026-04-16 21:47:11 -04:00
Mark Backman
3127cc6161 Fix type errors in turns and add to pyright checked set
Widen base strategy process_frame return types to ProcessFrameResult |
None to match actual behavior (None treated as CONTINUE). Give
UserTurnCompletionLLMServiceMixin a FrameProcessor base class so pyright
can see create_task, cancel_task, process_frame, and push_frame.
2026-04-16 21:33:43 -04:00
Mark Backman
36319ecbf0 Replace system role message
In UserTurnCompletionMixin, use a developer role message for
LLM messages following an incomplete turn
2026-04-16 21:26:08 -04:00
Mark Backman
c6a1837844 Fix type errors in extensions and add to pyright checked set
Tighten LLMMessagesAppendFrame and LLMMessagesUpdateFrame message fields
from list[dict] to list[LLMContextMessage] to match actual usage. Add
type annotations on inline message lists in IVR navigator and voicemail
detector.
2026-04-16 21:22:46 -04:00
Daksh Dua
31127abd9a Allow inter-token whitespace once non-whitespace has been sent
In token-streaming mode, _push_tts_frames previously stripped only
leading newlines and dropped any pure-whitespace frame. That silently
discarded meaningful inter-token whitespace (e.g. a standalone "\n"
token between "hello" and "world"), losing prosody cues and any
downstream sentence-boundary semantics.

Track whether a non-whitespace character has been sent in the current
context. While the flag is false, strip all leading whitespace; once
true, let whitespace tokens flow through. Reset the flag on
LLMFullResponseEndFrame/EndFrame and on interruption, and save/restore
it around TTSSpeakFrame since each utterance is its own context.

Sentence-aggregation mode preserves the existing behavior.
2026-04-16 15:51:35 -07:00
Mark Backman
aa355e3d32 Fix type errors in observers and add to pyright checked set
Group three co-assigned fields (_start_frame_id, _start_frame_arrival_ns,
_start_wall_clock) into a single _StartFrameInfo dataclass. This makes
the "always set together" invariant structural rather than implicit, and
fixes the incorrect str | None annotation on _start_frame_id (Frame.id
is int).
2026-04-16 18:25:10 -04:00
Mark Backman
9bd51cd88c Add incremental pyright type checking with CI enforcement
Add pyrightconfig.json with basic type checking for zero-error modules
(clocks, metrics, transcriptions, frames) and enforce via CI. The
include list will expand as modules are fixed.
2026-04-16 18:04:42 -04:00
Aleix Conchillo Flaqué
fc1c3b48dc Merge pull request #4322 from pipecat-ai/aleix/readme-subagents
Add Pipecat Subagents to the ecosystem section in README
2026-04-16 10:38:56 -07:00
Aleix Conchillo Flaqué
4278a37ebc Merge pull request #4321 from pipecat-ai/aleix/fix-redundant-type-checks
Remove redundant duplicate type checks in direct_function.py
2026-04-16 10:38:45 -07:00
Mark Backman
7e045257e8 Merge pull request #4314 from pipecat-ai/mb/prudent-system-instruction-logging
Log system instruction once at composition time, not on every LLM call
2026-04-16 13:18:33 -04:00
dyi1
b8a1f45d4c Improve HeyGen LiveAvatar plugin reliability and performance (#4312)
* Improve HeyGen LiveAvatar plugin reliability and performance

- Add WebSocket ready gate: wait for session.state_updated connected
  event before sending commands (prevents silently dropped messages)
- Add keep-alive mechanism: send session.keep_alive every 2.5 min to
  prevent 5-minute inactivity timeout
- Optimize audio chunking: 600ms first chunk for faster initial
  response, 1s subsequent chunks for efficient streaming
- Fix audio buffer flush: send remaining buffered audio on utterance
  end instead of discarding it
- Fix WS state cleanup: properly reset connected/ready state when
  WebSocket drops unexpectedly
- Add livekit_config passthrough in LiveAvatar session token creation
- Replace stray print() with logger.debug()

* Fix HeyGenOutputTransport.start() signature and use 400ms first chunk

- Update transport.py to match new client.start() signature (no
  audio_chunk_size param)
- Change first chunk size from 600ms to 400ms per feedback

* Fix transport audio resampling and client.start() error propagation

- Add audio resampling in HeyGenOutputTransport.write_audio_frame() to
  ensure audio is always 24kHz before sending to HeyGen (was sending
  at pipeline sample rate, causing garbled audio)
- Raise exception on WS ready timeout instead of silently returning,
  preventing transport from appearing ready when WS connection failed

* Fix session readiness gate to work with LITE mode

LITE mode does not send session.state_updated WS events. Instead,
use a dual-signal _session_ready event that fires on either:
- WS session.state_updated connected (FULL mode)
- LiveKit participant connected (LITE mode)

Also reorder start() to connect both WS and LiveKit before waiting,
since the WS events may depend on LiveKit being connected.

Verified with live sandbox session - all tests pass.

* Simplify session readiness to use only WS ready gate

Remove _session_ready dual-signal and use only _ws_ready, which fires
on the session.state_updated connected WS event. Increase timeout to
30s. LiveKit is connected before waiting so the WS event can arrive.

* Reduce WS ready gate timeout back to 10s

* Remove WS ready gate (session.state_updated not reliably received)

The session.state_updated connected event is not reliably received
via the websockets library. Remove the gate for now and assume the
session is ready after WS + LiveKit connect. Keep-alive, chunking,
buffer flush, state cleanup, and other improvements remain.
2026-04-16 12:58:14 -04:00
Aleix Conchillo Flaqué
8ec85f981d Add Pipecat Subagents to the ecosystem section in README 2026-04-16 09:57:23 -07:00
Aleix Conchillo Flaqué
2f52905d32 Remove redundant duplicate type checks in direct_function.py
After the typing modernization, `dict or dict` and `list or list`
were left behind where `Dict`/`List` had been replaced by `dict`/`list`.
2026-04-16 09:51:21 -07:00
Aleix Conchillo Flaqué
f86cf98c6d Merge pull request #4319 from pipecat-ai/aleix/modernize-typing
Modernize Python typing across the codebase
2026-04-16 09:43:17 -07:00
Aleix Conchillo Flaqué
84fcba772d Replace percent format with f-string in daily/utils.py 2026-04-16 09:30:19 -07:00
Aleix Conchillo Flaqué
b3bb6fdaa5 Modernize Python typing across the codebase
Automated via ruff UP006, UP007, UP035, UP045 rules (target: py311):

- Replace `typing.List`, `Dict`, `Tuple`, `Set`, `FrozenSet`, `Type`
  with their built-in equivalents (`list`, `dict`, `tuple`, etc.)
- Replace `typing.Optional[X]` with `X | None`
- Replace `typing.Union[X, Y]` with `X | Y`
- Move `Mapping`, `Sequence`, `Callable`, `Awaitable`,
  `MutableMapping`, `MutableSequence`, `Iterator`, `AsyncIterator`,
  `AsyncGenerator` imports from `typing` to `collections.abc`
- Remove now-unused `typing` imports
- Add `from __future__ import annotations` to 5 files that use
  forward-reference strings in `X | "Y"` annotations
2026-04-16 09:28:23 -07:00
Aleix Conchillo Flaqué
12b8af3d89 pyproject: use UP ruff linting option 2026-04-16 09:26:12 -07:00
Aleix Conchillo Flaqué
1c4ffb7845 Merge pull request #4313 from pipecat-ai/ac/daily-send-dtmf
Add send_dtmf() to DailyTransport
2026-04-16 08:57:48 -07:00
Aleix Conchillo Flaqué
8d4feede23 Split #4313 changelog into one entry per file 2026-04-16 08:55:03 -07:00
Aleix Conchillo Flaqué
b11a3bc43f Add method field to Daily DTMF output frames
Lets callers specify Daily's DTMF delivery method (e.g. "rfc2833"
or "info") alongside `session_id` and `digit_duration_ms`. Forwarded
to Daily's `send_dtmf` as `method`.
2026-04-16 08:55:03 -07:00
Mark Backman
8dce66933f Merge pull request #4315 from pipecat-ai/mb/update-tavus-transport-on-connected
Update Tavus transport example
2026-04-16 09:20:52 -04:00
Mark Backman
7291026695 Update Tavus transport example
Show how to use on_connected event handler to obtain
Daily room URL
2026-04-15 23:04:31 -04:00
Mark Backman
686e250db1 Add changelog for #4314 2026-04-15 21:03:13 -04:00
Mark Backman
e8d6f611cd Log system_instruction once at composition time 2026-04-15 21:02:20 -04:00
Aleix Conchillo Flaqué
f094ce80fb Add to_string helper on output DTMF frames
Mirrors the existing `from_string` classmethod and lets callers
turn a frame's `buttons` list back into a dial string like `"123#"`.
`__str__` and the Daily transport's native DTMF path reuse it.
2026-04-15 15:14:47 -07:00
Aleix Conchillo Flaqué
9fbe1bf2a3 Document button as a convenience shortcut, not a deprecation
The single-key `button` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` is kept as a first-class ergonomic shortcut
for the common single-keypress case, equivalent to
`buttons=[button]`. `buttons` takes precedence when both are set.
2026-04-15 15:09:01 -07:00
Aleix Conchillo Flaqué
d8b0e78bc8 Represent DTMF sequences as list[KeypadEntry] via buttons field
Replaces the string-based `tones` field with a type-safe
`buttons: list[KeypadEntry]` on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame`, matching the existing singular `button`
field on `InputDTMFFrame`. A `from_string` classmethod builds the
list from a dial string like `"123#"` (invalid characters raise
ValueError from the `KeypadEntry` constructor).

The base output audio fallback now iterates `frame.buttons`
directly, LiveKit sends `frame.buttons[0].value`, and the Daily
transport joins the button values into the single string Daily's
`send_dtmf` expects.
2026-04-15 15:05:45 -07:00
Aleix Conchillo Flaqué
675b7df408 Add tones to OutputDTMFFrame and simplify DTMF frame hierarchy
Introduces a new `tones` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` for sending multi-digit DTMF sequences and
deprecates the existing single-key `button` field. When only `button`
is set, it is used as a single-character `tones` string for backward
compatibility.

`DTMFFrame` is kept as an empty marker class so both input and output
DTMF frames can still be identified via isinstance. `InputDTMFFrame`
keeps its required `button` field (single keypress semantics).

The Daily-specific `DailyOutputDTMFFrame` and
`DailyOutputDTMFUrgentFrame` frames no longer need to override
`button` and simply add `session_id` and `digit_duration_ms`, which
are forwarded to Daily's `send_dtmf` as `sessionId` and
`digitDurationMs`.

The base output audio fallback now iterates `tones` and generates a
tone per character; LiveKit's native DTMF path sends `tones[0]` since
its API is single-tone.
2026-04-15 14:48:02 -07:00
Aleix Conchillo Flaqué
30f39d7395 Add DailyOutputDTMFFrame and DailyOutputDTMFUrgentFrame
Introduces Daily-specific DTMF output frames that carry explicit
`tones`, `session_id` and `digit_duration_ms` fields, forwarded to
Daily's `send_dtmf` as `tones`, `sessionId` and `digitDurationMs`.
The inherited `button` and `transport_destination` fields are
ignored for these frames in the Daily transport.
2026-04-15 14:20:08 -07:00
Aleix Conchillo Flaqué
fe2ef9c712 Add changelog for #4313 2026-04-15 10:43:28 -07:00
Aleix Conchillo Flaqué
173cf39aee Add send_dtmf() to DailyTransport
Exposes the Daily call client's DTMF sending capability so
applications can send tones during a call (e.g. IVR navigation).
2026-04-15 10:43:28 -07:00
Filipi da Silva Fuchter
ac43a70d36 Merge pull request #4311 from pipecat-ai/filipi/reconnect_websocket
New approach to reconnect STT services after updating settings.
2026-04-15 14:39:24 -03:00
filipi87
8e4fd10e0f Removing CancelledError handling from DeepgramSTTService. 2026-04-15 14:36:17 -03:00
filipi87
aeab417cd1 Changelogs for the STT service reconnect improvements. 2026-04-15 13:23:25 -03:00
filipi87
d263ad3c34 Refactoring DeepgramSTT to use request to reconnect. 2026-04-15 13:21:12 -03:00
filipi87
f3c454dc54 Refactoring CartesiaSTT to use request to reconnect. 2026-04-15 13:19:36 -03:00
filipi87
fc63790657 New approach to reconnect STT services after updating settings. 2026-04-15 11:01:58 -03:00
Mark Backman
9ffcccdd84 Merge pull request #4253 from pipecat-ai/mb/mistral-stt
Add Mistral Voxtral Realtime STT service
2026-04-15 09:00:27 -04:00
Yan Fortin
6feeee515f chore: rename changelog fragment to match PR #4306 2026-04-14 18:49:35 -04:00
Yan Fortin
55fb4b0845 fix(azure-tts): route completion through word boundary queue to prevent last word from being missed
The Azure TTS _handle_completed callback was putting the audio stream
completion signal (None) directly into _audio_queue while the last word
was still pending in _word_boundary_queue. This caused a race condition
where run_tts could exit and TTSStoppedFrame could be emitted before the
word processor task had a chance to process and emit the final word's
TTSTextFrame.

The fix routes the completion signal through _word_boundary_queue as a
None sentinel. The word processor task now recognizes this sentinel and
only signals _audio_queue after all pending words have been drained.
This guarantees the last word's TTSTextFrame is always emitted before
TTSStoppedFrame.

The cancellation/interruption path (_handle_canceled) is unchanged and
still signals _audio_queue directly, which is correct since word ordering
does not matter when speech is interrupted.
2026-04-14 18:48:40 -04:00
Mark Backman
503782c8b2 Merge pull request #4304 from pipecat-ai/mb/tavus-deps
Add missing daily-python dependency for tavus extra
2026-04-14 18:14:19 -04:00
Mark Backman
b834a893fe Add changelog for #4304 2026-04-14 17:52:29 -04:00
Mark Backman
ba023248d9 Add missing daily-python dependency for tavus extra 2026-04-14 17:48:37 -04:00
borislav
14cf783647 chore: add changelog for #4301 2026-04-14 22:41:09 +02:00
borislav
86e726107f fix: fail missing tool calls cleanly 2026-04-14 22:40:45 +02:00
Aleix Conchillo Flaqué
457f55e99a Merge pull request #4297 from pipecat-ai/changelog-1.0.0
Release 1.0.0 - Changelog Update
2026-04-14 12:08:35 -07:00
aconchillo
f8318289d4 Update changelog for version 1.0.0 2026-04-14 12:06:43 -07:00
Aleix Conchillo Flaqué
958d90819f Merge pull request #4294 from pipecat-ai/ac/fix-assistant-turn-stopped-event
Fix on_assistant_turn_stopped not firing for tool-call-only responses
2026-04-14 10:09:55 -07:00
Aleix Conchillo Flaqué
403235eb48 Add changelog for #4294 2026-04-14 10:07:19 -07:00
Aleix Conchillo Flaqué
698c2ba92e Fix on_assistant_turn_stopped not firing for empty LLM responses
When the LLM returned zero text tokens (e.g. it was interrupted before producing
tokens or about to push tokens), push_aggregation() returned an empty string and
on_assistant_turn_stopped was never emitted. This left consumers waiting for an
event that would never arrive.

Now on_assistant_turn_stopped always fires, with an empty content string when
the LLM produced no text tokens.

Fixes #4292
2026-04-14 10:07:19 -07:00
Mark Backman
f013d5632b Merge pull request #4293 from pipecat-ai/mb/fix-elevenlabs-tts-enable-logging
Fix ElevenLabs TTS boolean params and add missing features
2026-04-14 12:58:31 -04:00
Mark Backman
570849955c Merge pull request #4295 from pipecat-ai/mb/context-summarization-index-0
Fix context summarization failing with mid-conversation system messages
2026-04-14 12:24:47 -04:00
Mark Backman
84b885682f Add changelog for #4295 2026-04-14 11:49:31 -04:00
Mark Backman
989fb4deaa Fix context summarization failing with mid-conversation system messages
Only treat messages[0] as the initial system prompt when determining the
summarization range. Previously, the code scanned the entire context for
the first system-role message, which caused failures when the only system
message was a mid-conversation injection (e.g. "The user has been quiet").
In that case summary_start exceeded summary_end, producing an empty range
and "No messages to summarize" errors.

Fixes #4286
2026-04-14 11:48:50 -04:00
dhruvladia-sarvam
ab74605a26 Sarvam TTS request id added to agent logs (#4278)
- Added trace logging to correlate Sarvam request_id with context_id
2026-04-14 11:02:05 -04:00
Mark Backman
49998d252b Add changelog for #4293 2026-04-14 10:13:12 -04:00
Mark Backman
84566c1110 Remove unused ElevenLabsOutputFormat and add missing sample rates
Remove dead ElevenLabsOutputFormat type alias. Add pcm_32000 and
pcm_48000 to output_format_from_sample_rate to match the ElevenLabs API.
2026-04-14 10:11:31 -04:00
Mark Backman
45aa95fa10 Fix ElevenLabs boolean query params and add enable_logging to HTTP service
The enable_logging and enable_ssml_parsing URL params used truthy checks,
so False was treated the same as None (both skipped). Also, Python's
str(False) produces "False" but the API expects lowercase "false".

Additionally, add enable_logging support to ElevenLabsHttpTTSService
which was missing entirely.
2026-04-14 10:04:23 -04:00
Mark Backman
d1f7af0330 Merge pull request #4283 from pipecat-ai/mb/user-stop-transcript-improvements 2026-04-13 19:27:05 -04:00
Mark Backman
31b5a64382 Merge pull request #4282 from pipecat-ai/mb/cartesia-stt-settings-update
Reconnect Cartesia STT websocket on settings change
2026-04-13 18:18:36 -04:00
Mark Backman
d20013d7a6 Add changelog for #4283 2026-04-13 18:12:04 -04:00
Mark Backman
804e3ea9ec Trigger turn stop immediately when transcript arrives after p99 timeout
When the STT p99 timeout fires without a transcript, the turn stop
strategy previously did nothing — falling through to the 5-second
user_turn_stop_timeout. Now, a _timeout_expired flag tracks when the
timeout has elapsed so that a late transcript triggers the turn stop
immediately instead of waiting for the fallback.
2026-04-13 18:11:32 -04:00
Aleix Conchillo Flaqué
a14d257cf2 update pytest to >=9 2026-04-13 15:08:47 -07:00
Aleix Conchillo Flaqué
a8660aabfe update uv.lock 2026-04-13 15:06:25 -07:00
Aleix Conchillo Flaqué
7dc763d512 Merge pull request #4272 from pipecat-ai/pk/llm-context-get-messages-elide-large-values
Add truncate_large_values to LLMContext.get_messages()
2026-04-13 15:04:41 -07:00
Mark Backman
36b15c92ef Add changelog for #4282 2026-04-13 17:29:39 -04:00
Mark Backman
64ed0aae13 Reconnect Cartesia STT websocket when settings change at runtime
Previously settings updates were ignored with a TODO comment. Now when
model/language changes via STTUpdateSettingsFrame the service disconnects
and reconnects with the new query parameters.

Key changes:
- Implement _update_settings to disconnect/reconnect on changes
- Check `is not State.OPEN` in run_stt to catch CLOSING state
- Send `done` command before closing for clean session shutdown
- Capture websocket reference in _disconnect_websocket to prevent a
  concurrent _connect from having its new connection nulled by a stale
  finally block
2026-04-13 17:28:34 -04:00
Mark Backman
be81dac723 Merge pull request #4280 from pipecat-ai/mb/resolve-vuln-2026-04-13
Update uv.lock resolving langchain-core and cryptography vulnerabilities
2026-04-13 11:58:25 -04:00
Mark Backman
d942a713af Update uv.lock resolving langchain-core and cryptography vulnerabilities 2026-04-13 11:09:31 -04:00
Filipi da Silva Fuchter
e248c4c049 Merge pull request #4249 from sathwikareddy02/nvidia-tts-update
Add stitching support and enhancements for NvidiaTTSService
2026-04-13 09:39:48 -03:00
filipi87
1d5dcf1698 Invoking to remove the audio context when there is no more audio to receive. 2026-04-13 09:34:13 -03:00
sathwika
f45a410f56 refactor/simplify NvidiaTTSService synthesis stream shutdown 2026-04-13 14:35:17 +05:30
Paul Kompfner
e38647151d Fix language: binary data is replaced with placeholders, not truncated 2026-04-11 14:39:25 -04:00
Paul Kompfner
1a02b5d61a Rename elide_large_values to truncate_large_values 2026-04-11 14:29:05 -04:00
Aleix Conchillo Flaqué
4254c1f0e0 Merge pull request #4273 from pipecat-ai/ac/test-fixes
Fix LLM test constructors and wake phrase test race
2026-04-10 21:27:00 -07:00
Aleix Conchillo Flaqué
f91a113de7 tests: yield in wake phrase strategy setup to let tasks start
The strategy schedules background tasks during setup. Fast-running
tests could observe state before those tasks had a chance to run;
yielding once via asyncio.sleep(0) ensures they do.
2026-04-10 17:37:50 -07:00
Aleix Conchillo Flaqué
e553bb010f tests: migrate LLM tests to Settings-based constructor API
Replace the old `model=` / `params=InputParams(...)` style with the
new `settings=<Service>.Settings(...)` form across LLM service tests.
2026-04-10 17:37:49 -07:00
Paul Kompfner
245339e885 Add changelog for #4272 2026-04-10 16:37:49 -04:00
Paul Kompfner
812cdc6822 Add elide_large_values to LLMContext.get_messages()
Enable callers to get a compact version of context messages suitable
for serialization, logging, and debugging tools. For standard
messages, known binary data (base64 images, audio) is fully elided.
For LLM-specific messages, long string values are recursively
truncated. Adapter get_messages_for_logging() methods now use this.
2026-04-10 16:35:36 -04:00
Aleix Conchillo Flaqué
153814ecc2 scripts/evals: create recording subdirectories when saving audio
Example files can live under subdirectories (e.g. foundational/01.py),
so the recording path needs its parent directory created before the
audio file is written.
2026-04-10 13:19:20 -07:00
Filipi da Silva Fuchter
b1204cc430 Merge pull request #4241 from pipecat-ai/filipi/async_tools_cancellable
Enable async tool cancellation feature.
2026-04-10 15:28:01 -03:00
filipi87
c542167065 Refactored on_function_calls_cancelled to use FunctionCallFromLLM. 2026-04-10 15:06:39 -03:00
Aleix Conchillo Flaqué
02116c58de Merge pull request #4244 from omChauhanDev/fix/vad-stuck-speaking-on-mute
fix VAD stuck in SPEAKING state when audio stops mid-speech
2026-04-10 10:46:53 -07:00
Aleix Conchillo Flaqué
dcd21e7ff4 Rework audio idle detection with timestamp-based adaptive sleep
Replaces the per-frame asyncio.Event signaling with a monotonic
timestamp updated on each audio frame. The handler sleeps until the
next deadline (last_audio_time + timeout), recomputing on each wake-up
to account for audio arriving during sleep.

This avoids waking the handler on every audio frame (~50/s at 20ms
chunks), and guarantees detection latency is bounded by timeout rather
than 2 * timeout.

Also renames audio_starvation_timeout to audio_idle_timeout and
associated identifiers for consistency with existing pipecat naming
(user_idle_timeout, etc.).
2026-04-10 10:35:18 -07:00
Aleix Conchillo Flaqué
5356f3028b Merge pull request #4271 from pipecat-ai/mb/fix-translation-readme
Fix translation example in README
2026-04-10 10:26:27 -07:00
Om Chauhan
cb2c1868b0 fix VAD stuck in SPEAKING state when audio stops mid-speech 2026-04-10 09:54:48 -07:00
Aleix Conchillo Flaqué
dac88c0a47 Merge pull request #4267 from pipecat-ai/ac/fix-observer-cleanup-ordering
Fix observer cleanup ordering to stop proxy tasks before closing resources
2026-04-10 09:05:33 -07:00
kompfner
8e5fe8afda Merge pull request #4067 from omChauhanDev/fix-gemini3-flash-thinking-default
fix: default thinking config for Gemini 3+ Flash models
2026-04-10 10:41:44 -04:00
kompfner
d07eebff20 Merge pull request #4248 from omChauhanDev/add-openai-custom-tools-support
Add custom_tools support for OpenAI adapters
2026-04-10 10:27:28 -04:00
Paul Kompfner
ef4dcca4f1 Update changelog to describe user-facing custom_tools support 2026-04-10 10:23:13 -04:00
Paul Kompfner
fc3307bc63 Use OpenAI SDK types for tool params in adapters and tests
These are TypedDicts (plain dicts at runtime), so no behavioral change
— just more descriptive type hints for readers. Use ToolParam instead
of FunctionToolParam for the Responses adapter to reflect that custom
non-function tools are supported. Use ChatCompletionToolParam instead
of Any for the completions adapter return type. Update tests to use
typed params in expected values.
2026-04-10 10:15:39 -04:00
Mark Backman
da9a55a430 Fix translation example in README 2026-04-10 09:13:42 -04:00
Filipi da Silva Fuchter
094d36904c Merge pull request #4268 from pipecat-ai/filipi/lemonslice_improments
LemonSlice transport updates - new events, extra params
2026-04-10 08:50:39 -03:00
sathwika
746fadc2b5 thread simplification + handling interuption 2026-04-10 17:18:22 +05:30
filipi87
8cce25d2d2 Fixing openai examples. 2026-04-10 08:25:50 -03:00
filipi87
891f00cb5f Using the on_function_calls_cancelled inside the examples. 2026-04-10 07:45:20 -03:00
filipi87
1ca094dad7 Not invoking on_function_calls_started for the cancel function, and creating on_function_calls_cancelled 2026-04-10 07:40:52 -03:00
filipi87
346c585290 Enabling the option to cancel the tools for all the async examples. 2026-04-10 07:31:51 -03:00
jp-lemon
c134110399 LemonSlice transport updates 2026-04-10 07:10:41 -03:00
Aleix Conchillo Flaqué
f9117e6d4a Add changelog for PIPECAT_OBSERVER_FILES removal 2026-04-09 17:39:54 -07:00
Aleix Conchillo Flaqué
360e4480e0 Remove deprecated _load_observer_files in favor of setup files 2026-04-09 17:38:46 -07:00
Aleix Conchillo Flaqué
9b7e15c9bc Add changelog for #4267 2026-04-09 16:55:40 -07:00
Aleix Conchillo Flaqué
00ea86fda8 Fix observer cleanup ordering to stop proxy tasks before closing resources
During pipeline shutdown, proxy tasks must be cancelled before observer
resources are cleaned up. Previously, stop() was called inside
_cancel_tasks() and start() was called in _start_tasks(), which could
lead to proxy tasks still consuming frames after observer resources
were closed.

Now the lifecycle is explicit in _handle_start_frame: start() after all
observers are loaded, and stop() before cleanup() on shutdown.

Also fixes misleading variable name in TaskObserver.cleanup() where
iterating self._proxies yields observer keys, not Proxy values.

Fixes #4195
2026-04-09 16:55:40 -07:00
Aleix Conchillo Flaqué
5f75728207 EventNotifier: update docstring with single-consumer use case 2026-04-09 16:21:42 -07:00
Aleix Conchillo Flaqué
9d274f0fb3 PipelineTask: update dangling task logging 2026-04-09 16:21:05 -07:00
Aleix Conchillo Flaqué
43ddbdf1ec Merge pull request #3797 from iamjr15/fix/idle-processor-event-race
Fix asyncio.Event race conditions in idle processors
2026-04-09 16:04:03 -07:00
iamjr15
565349d332 Fix asyncio.Event race conditions in idle processors
Move event.clear() from finally block to success path in
IdleFrameProcessor and UserIdleProcessor._idle_task_handler().
The finally block unconditionally cleared signals set during
async timeout callbacks, causing false-positive idle detection.

Closes #3402
2026-04-09 13:41:01 -07:00
filipi87
2dd1170229 Updating the Anthropic stream example to allow cancel the location tracking. 2026-04-09 17:26:51 -03:00
filipi87
5cf90cba98 Addressing PR review comments. 2026-04-09 17:11:04 -03:00
Aleix Conchillo Flaqué
981b7bdcb7 Merge pull request #4255 from omChauhanDev/fix/async-gc-collect
PipelineRunner: make _gc_collect async
2026-04-09 12:09:38 -07:00
Filipi da Silva Fuchter
c4320e7f07 Merge pull request #4265 from pipecat-ai/filipi/fix_elevenlabs_token_aggregation
Using the correct default for auto_mode based on text_aggregation_mode.
2026-04-09 15:30:36 -03:00
filipi87
ea0be4d39c Changelog for the elevenlabs fix. 2026-04-09 15:25:06 -03:00
filipi87
dca4e1090a Using the correct default for auto_mode based on text_aggregation_mode. 2026-04-09 15:21:30 -03:00
Cale Shapera
ec574edd53 Add Inworld Realtime Service (#4140)
* Add Inworld Realtime LLM service

Adds a WebSocket-based realtime service for Inworld's cascade
STT/LLM/TTS API with semantic VAD, function calling, and streaming
transcription support.

New files:
- src/pipecat/services/inworld/realtime/ (service, events)
- src/pipecat/adapters/services/inworld_realtime_adapter.py
- examples/foundational/19zb-inworld-realtime.py

Also includes:
- websockets dependency for inworld extra in pyproject.toml
- Adapter and settings tests matching OpenAI/Grok realtime patterns
- Fix for double-response when server-side VAD is enabled

* Prefer init-provided system instruction in Inworld Realtime

Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the
pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and
Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup
branch.

* Update changelog entry with PR number

* Fix changelog format to use bullet point

* Polish PR: default model, example cleanup, changelog update

- Change default model from gpt-4.1-nano to gpt-4.1-mini
- Add function calling demo to example
- Remove demo-testing artifact from system instruction
- Mention Router support in changelog

* Address PR review feedback for Inworld Realtime

- Move example to examples/realtime/realtime-inworld.py
- Change initial context role from "user" to "developer"
- Remove explicit sample rates from example; sync them in
  _ensure_audio_config so Inworld gets the transport's actual rates
- Add audio race condition guard in _handle_evt_audio_delta (matches
  OpenAI realtime pattern)
- Convert remaining "system"/"developer" messages to "user" in adapter
- Add clarifying comment for local-VAD vs server-VAD metrics paths

* Simplify example, add provider tracking, remove local VAD path

- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning
- Add pipecat-realtime session key prefix and provider_data metadata
  for Inworld traffic attribution
- Remove local VAD code path (Inworld only supports server-side VAD)
- Use typed InputAudioBufferAppendEvent for audio sends

* Default TTS model to inworld-tts-1.5-max

* Remove dead shimmed tools code, set STT/VAD defaults

- Remove non-functional AdapterType.SHIM custom tools code from adapter
- Default STT model to assemblyai/u3-rt-pro
- Default VAD eagerness to low
2026-04-09 13:04:17 -04:00
filipi87
772fb57090 Enable async tool cancellation feature. 2026-04-09 10:29:23 -03:00
Filipi da Silva Fuchter
76601944c6 Merge pull request #4230 from pipecat-ai/filipi/async_tools_stream
Support for streaming multiple responses via function calls
2026-04-09 10:26:33 -03:00
filipi87
178985ec8a Refactoring the frame queue to avoid overhead. 2026-04-09 10:24:22 -03:00
filipi87
edc197d050 Creating a new example for async stream using Google. 2026-04-09 09:50:00 -03:00
filipi87
7ece8e3c4a Creating a new example for async stream using Anthropic. 2026-04-09 09:41:07 -03:00
filipi87
7b45a56119 Changelogs for the new feature and the fix. 2026-04-09 09:04:19 -03:00
filipi87
a544f885a3 Added new examples: function-calling-openai-async-stream.py and function-calling-openai-responses-async-stream.py 2026-04-09 09:04:06 -03:00
filipi87
375deac912 Support for streaming multiple responses via function calls. 2026-04-09 09:03:53 -03:00
filipi87
699ca38dc1 Allowing to check if a specific frame is in the queue. 2026-04-09 09:03:06 -03:00
filipi87
aeda60f761 Refactoring the FrameQueue to be able to track any Frame. 2026-04-09 09:02:47 -03:00
Om Chauhan
b010dd58d2 added changelog 2026-04-08 09:37:58 +05:30
Om Chauhan
225ea907d5 make PipelineRunner._gc_collect async 2026-04-08 09:27:18 +05:30
Om Chauhan
1443dfb070 added changelog 2026-04-08 08:48:26 +05:30
Om Chauhan
4bef85e363 added custom_tools support for OpenAI adapters 2026-04-08 08:40:03 +05:30
Mark Backman
215b2dc7f3 Add voice-mistral to evals 2026-04-07 15:37:07 -04:00
Mark Backman
874e2878be Update README with Mistral services 2026-04-07 15:36:22 -04:00
Mark Backman
9131fa5c12 Add changelog for PR #4253 2026-04-07 15:32:38 -04:00
Mark Backman
68a3070ad4 Add Mistral Voxtral Realtime STT service 2026-04-07 15:26:56 -04:00
Mark Backman
a7bf9f538c Clean up comments in MistralTTSService 2026-04-07 12:56:10 -04:00
Mark Backman
0acfb4dd49 Merge pull request #4251 from pipecat-ai/mb/mistral-tts
Add Mistral Voxtral streaming TTS service
2026-04-07 12:50:48 -04:00
Mark Backman
8594401024 Add changelog for PR #4251 2026-04-07 12:32:06 -04:00
Mark Backman
aa7a014518 Add mistral voice example 2026-04-07 12:32:06 -04:00
Filipi da Silva Fuchter
27a8a973b1 Merge pull request #4201 from pipecat-ai/mb/handle-recurring-disconnects
Fix WebsocketService infinite reconnection loop
2026-04-07 11:02:24 -03:00
sathwika
8abda808ca Add Nvidia copyright header 2026-04-07 19:27:04 +05:30
Mark Backman
7f3f23dcb9 Add Mistral Voxtral streaming TTS service
Integrate with Mistral's Voxtral TTS API (voxtral-mini-tts-2603) using
HTTP streaming with Server-Sent Events. Converts base64-encoded float32
PCM chunks from the API to int16 for the Pipecat pipeline.
2026-04-07 09:39:36 -04:00
Filipi da Silva Fuchter
be509e5647 Merge pull request #4245 from kollaikal-rupesh/fix/mixer-cancel-cleanup
Stop audio mixer on pipeline cancellation
2026-04-07 10:36:18 -03:00
sathwika
9f0b18b03d Add changelog fragments for PR #4249 2026-04-07 18:18:55 +05:30
Filipi da Silva Fuchter
6eccd16543 Merge pull request #4217 from pipecat-ai/filipi/async_tools
Supporting async function calls.
2026-04-07 09:35:03 -03:00
filipi87
d8dc6bc7d0 New example for async function calls using Google. 2026-04-07 09:31:22 -03:00
filipi87
d12a8529e2 New example for async function calls using OpenAI responses. 2026-04-07 09:28:01 -03:00
filipi87
aa061f7e2c Renaming the openai and anthropic examples to async instead of delayed. 2026-04-07 09:23:45 -03:00
Filipi da Silva Fuchter
e863293198 Improving docstring description.
Co-authored-by: kompfner <paul@daily.co>
2026-04-07 08:14:39 -04:00
filipi87
9c7d5a9de2 Improving changelog description to mention group_parallel_tools. 2026-04-07 09:13:08 -03:00
Filipi da Silva Fuchter
a451c42dc7 Merge pull request #4247 from pipecat-ai/filipi/background_sound_example
Fixing the background sound example.
2026-04-07 09:06:14 -03:00
sathwika
bc009d8f98 Add stitching support and enhancements for NvidiaTTSService 2026-04-07 14:49:45 +05:30
Rupesh
67ee802772 Remove changelog entry per review feedback 2026-04-06 21:36:53 -07:00
filipi87
ceaa27ee6e Fixing the background sound example. 2026-04-06 18:25:30 -03:00
filipi87
42335e2ef0 Renaming to async_tool and providing description. 2026-04-06 09:56:48 -03:00
Rupesh
7585864113 Stop audio mixer on pipeline cancellation to prevent 100% CPU usage 2026-04-06 01:51:29 -07:00
kompfner
18852adc28 Merge pull request #4242 from pipecat-ai/pk/gemini-live-fix-session-resumption
Fix Gemini Live session resumption hanging after reconnect
2026-04-04 11:43:24 -04:00
Paul Kompfner
f11b6d7151 Fix Gemini Live session resumption hanging after reconnect
After a reconnect, _ready_for_realtime_input was never set back to True
because _create_initial_response (which sets the flag) is only called on
initial connection. This caused all audio/video/text to be silently
dropped after reconnecting, making the bot appear to hang.

Set the flag in _handle_session_ready when we detect a reconnect, either
via session_resumption_handle (server restores state) or via existing
context (rare case where connection drops before first resumption handle).
2026-04-03 18:27:10 -04:00
Paul Kompfner
9df1e18b43 Fix Gemini Live session resumption hanging after reconnect
After a reconnect, _ready_for_realtime_input was never set back to True
because _create_initial_response (which sets the flag) is only called on
initial connection. This caused all audio/video/text to be silently
dropped after reconnecting, making the bot appear to hang.

Set the flag in _handle_session_ready when context already exists
(i.e. reconnect case) since we don't need to go through
_create_initial_response again.
2026-04-03 16:32:03 -04:00
Mark Backman
b8f9a21e0c Merge pull request #4240 from pipecat-ai/mb/remove-old-files
Remove orphaned .dockerignore and CHANGELOG.md.template
2026-04-03 15:40:57 -04:00
Mark Backman
c18d997ad8 Remove orphaned .dockerignore and CHANGELOG.md.template 2026-04-03 14:55:25 -04:00
Mark Backman
56aaebe1b0 Merge pull request #4239 from pipecat-ai/mb/remove-deprecation-module-proxy
Remove DeprecatedModuleProxy and service re-export shims
2026-04-03 14:03:17 -04:00
Mark Backman
916af84974 Remove DeprecatedModuleProxy and service re-export shims
Remove the deprecation proxy infrastructure that allowed old-style flat
imports (e.g. `from pipecat.services.openai import OpenAILLMService`).
Users must now import from specific submodules
(`from pipecat.services.openai.llm import OpenAILLMService`), which is
already the established pattern across all internal code and 179+ examples.

- Strip 32 proxy `__init__.py` files to empty
- Strip 3 non-proxy files with bare star imports (minimax, sambanova, sarvam)
- Strip google/gemini_live `__init__.py` re-exports
- Remove DeprecatedModuleProxy class and helpers from services/__init__.py
- Remove ruff per-file ignore for services/__init__.py
- Fix 2 examples using old-style imports
2026-04-03 13:43:02 -04:00
Mark Backman
3e911b5fa0 Merge pull request #4236 from pipecat-ai/mb/more-deprecation-removals-2026-04-03
Remove deprecated fields, shims, and backward-compatibility code
2026-04-03 13:28:03 -04:00
Aleix Conchillo Flaqué
7c08779a2f Merge pull request #4234 from pipecat-ai/aleix/export-runner-app
Export FastAPI app from runner for custom routes
2026-04-03 09:45:39 -07:00
Mark Backman
988c08a5b6 Merge pull request #4238 from pipecat-ai/mb/fix-daily-utils-docs
Fix Pydantic v2 + Sphinx autodoc incompatibility for Daily utils
2026-04-03 12:39:09 -04:00
Mark Backman
7351298849 Fix Pydantic v2 + Sphinx autodoc incompatibility for Daily utils
Patch Pydantic's DICT_TYPES check in conf.py to accept Union-wrapped
dict types, fixing the autodoc import failure for models using
ConfigDict(extra="allow").
2026-04-03 12:00:11 -04:00
kompfner
392134be46 Merge pull request #4231 from pipecat-ai/pk/llm-messages-transform-frame
Add a `LLMMessagesTransformFrame` to facilitate programmatically edit…
2026-04-03 11:54:34 -04:00
Paul Kompfner
9266e1e7ad Remove comment referencing removed OpenAILLMContext 2026-04-03 11:53:57 -04:00
Mark Backman
e9eff4626f Merge pull request #4237 from pipecat-ai/mb/docstring-fixes-2026-04-03
Docstring fixes for docs auto-generation
2026-04-03 11:50:20 -04:00
Mark Backman
21aa50283e Update docs build script and README for current workflow
Make -W (warnings as errors) opt-in via --strict flag instead of
default, and update README to reflect uv-based workflow and current
directory structure.
2026-04-03 11:43:44 -04:00
Paul Kompfner
70469e3c0c Assert no LLMContextFrame when run_llm is not set in message frame tests 2026-04-03 11:34:58 -04:00
Paul Kompfner
6111df947e Test LLMAssistantAggregator handling of upstream message frames
Add tests for LLMRunFrame, LLMMessagesAppendFrame, LLMMessagesUpdateFrame,
and LLMMessagesTransformFrame sent upstream to LLMAssistantAggregator,
mirroring the existing LLMUserAggregator downstream tests. Add
frames_to_send_direction param to run_test helper to support this.
2026-04-03 11:34:58 -04:00
Paul Kompfner
4eebfd65d9 Add a LLMMessagesTransformFrame to facilitate programmatically editing context in a frame-based way.
The previous approach required the caller to directly grab a reference to the context object, grab a "snapshot" of its messages *at that point in time*, transform the messages, and then push an `LLMMessagesUpdateFrame` with the transformed messages. This approach can lead to problems: what if there had already been a change to the context queued in the pipeline? The transformed messages would simply overwrite it without consideration.
2026-04-03 11:34:50 -04:00
Mark Backman
c2358b273b Use Parameters instead of Attributes in docstrings to fix duplicate object warnings
Napoleon's Attributes section creates class-level attribute docs that
duplicate the __init__ parameter docs when napoleon_include_init_with_doc
is enabled. Using Parameters avoids the duplication.
2026-04-03 10:36:36 -04:00
Mark Backman
3a10a528c0 Remove deprecated fields, shims, and backward-compatibility code
- Remove expect_stripped_words from LLMAssistantAggregatorParams and related warnings
- Remove old multi-parameter on_push_frame observer signature support in TaskObserver
- Remove deprecated context field from UserImageRequestFrame
- Remove deprecated LiveKitTransportMessageFrame and LiveKitTransportMessageUrgentFrame
- Remove deprecated pipecat.turns.mute shim module
2026-04-03 10:10:51 -04:00
Mark Backman
f078b8b867 Fix Sphinx docstring RST formatting warnings
Replace Markdown code blocks with RST syntax in genesys.py, fix
deprecated directive transitions in nvidia and summarization modules,
remove stray bullet prefix in whisper arg docs, restructure code block
in turn completion mixin, and add deepgram mock to Sphinx conf.
2026-04-03 09:57:20 -04:00
Mark Backman
5490820338 Merge pull request #4235 from pipecat-ai/mb/deprecation-docs-cleanup
Clean up docs config after deprecation pass
2026-04-03 09:57:05 -04:00
Mark Backman
10697636c9 Add changelog for #4235 2026-04-03 09:52:31 -04:00
Mark Backman
e1638a9342 Clean up docs config after riva removal and add missing modules
Remove stale riva mock imports from autodoc_mock_imports since the riva
service was removed and nvidia-riva-client is installed during doc builds.
Add pipecat.turns and pipecat.extensions to import_core_modules() and
add Turns to the index.rst toctree. Regenerate uv.lock to reflect the
riva extra removal from pyproject.toml.
2026-04-03 09:52:31 -04:00
Mark Backman
bfffefa95c Remove leftover riva and remote-smart-turn references
Clean up deprecated extras from pyproject.toml and the docs
build script.
2026-04-03 09:29:29 -04:00
Mark Backman
fbb49ffc8d Merge pull request #4233 from pipecat-ai/mb/remove-unused-imports-2026-04-02
Remove unused imports across codebase
2026-04-03 07:26:13 -04:00
filipi87
eace782752 Renaming from async_tool to tool. 2026-04-03 08:20:14 -03:00
Mark Backman
b94071d37f Merge pull request #4232 from pipecat-ai/mb/more-deprecation-removals 2026-04-03 06:52:56 -04:00
Aleix Conchillo Flaqué
796a10fe9c Add changelog for #4234 2026-04-02 21:16:49 -07:00
Aleix Conchillo Flaqué
1ab07d312f Export FastAPI app from runner so custom routes can be added
Move the FastAPI instance to module level so other packages can import
it and register routes before main() is called. main() now configures
the existing app with transport-specific routes instead of creating a
new one.
2026-04-02 21:16:17 -07:00
Mark Backman
8adb38f87c Remove unused imports across codebase 2026-04-02 22:21:16 -04:00
Mark Backman
33f145d70a Add changelog fragments for #4232 2026-04-02 22:10:09 -04:00
Mark Backman
41e46ee69e Remove deprecated vad_events and should_interrupt from DeepgramSTTService
Deepgram's built-in VAD events were deprecated in 0.0.99 in favor of
Silero VAD. This removes vad_events from settings and LiveOptions,
the should_interrupt parameter, the vad_enabled property,
_on_speech_started/_on_utterance_end handlers, and simplifies
_on_message and process_frame accordingly.
2026-04-02 22:05:49 -04:00
Mark Backman
60933b7a56 Remove deprecated send_transcription_frames param and fix broken _warn_deprecated_param calls
Remove the send_transcription_frames parameter from OpenAI Realtime LLM
(deprecated since 0.0.92). Also fix undefined _warn_deprecated_param
calls in both OpenAI and xAI realtime services, replacing them with the
existing _warn_init_param_moved_to_settings method.
2026-04-02 21:58:57 -04:00
Mark Backman
64e09d592e Remove deprecated TranscriptionUserTurnStopStrategy alias
Replaced by SpeechTimeoutUserTurnStopStrategy since 0.0.102.
2026-04-02 21:57:03 -04:00
Mark Backman
883de8ab08 Remove dangling turn_analyzer docstring and unused imports from TransportParams 2026-04-02 21:56:11 -04:00
Mark Backman
793ed8f9e3 Remove deprecated UserBotLatencyLogObserver and UserIdleProcessor
UserBotLatencyLogObserver (deprecated 0.0.102) is replaced by
UserBotLatencyObserver. UserIdleProcessor (deprecated 0.0.100) is
replaced by LLMUserAggregator with user_idle_timeout.
2026-04-02 21:54:36 -04:00
Vanessa Pyne
d8ea33e1a4 Merge pull request #4034 from omChauhanDev/fix/mcp-persistent-session
fixed MCPClient to reuse session across tool calls
2026-04-02 18:51:31 -05:00
vipyne
1d7404ef21 Update MCP examples 2026-04-02 18:15:56 -05:00
Om Chauhan
dc909e2713 add changelog fragments 2026-04-02 18:06:28 -05:00
Om Chauhan
e22f9f84bb fixed MCPClient to reuse session across tool calls 2026-04-02 18:06:28 -05:00
filipi87
7af72eee3e Creating new delayed examples for openai and anthropic. 2026-04-02 18:40:41 -03:00
Aleix Conchillo Flaqué
57068f1b38 Merge pull request #4229 from pipecat-ai/aleix/deprecate-transport-vad-turn-analyzers
Remove deprecated transport VAD/turn analyzers and ExternalUserTurnStrategies
2026-04-02 14:30:12 -07:00
filipi87
bbb605accc Changelog entries for the fixes and improvements. 2026-04-02 16:58:42 -03:00
filipi87
929a0e33f4 Fixing the automated tests. 2026-04-02 16:58:28 -03:00
filipi87
3724ecd378 Supporting async function calls. 2026-04-02 16:58:19 -03:00
filipi87
4c8734c5e1 Fixing an issue where the BotOutputTransport was discarding the UninterruptibleFrames. 2026-04-02 16:57:46 -03:00
filipi87
283f6df205 Creating a FrameQueue so we can properly reset without discarding uninterruptible frames. 2026-04-02 16:57:22 -03:00
Aleix Conchillo Flaqué
a29be38f48 LLMUserAggregator: remove self-queued frame tracking
The _self_queued_frames set and _internal_queue_frame wrapper were used
to prevent re-processing SpeechControlParamsFrame that the aggregator
queued to itself. Now that the frame is no longer special-cased, this
tracking is unnecessary. Also removes unused FrameCallback import.
2026-04-02 12:42:06 -07:00
Aleix Conchillo Flaqué
976c644f90 Fix tests to expect SpeechControlParamsFrame from default turn strategy 2026-04-02 12:42:06 -07:00
Aleix Conchillo Flaqué
34aa37f395 Add changelog for #4229 2026-04-02 11:54:07 -07:00
Aleix Conchillo Flaqué
380867a87a LLMUserAggregator: remove auto ExternalUserTurnStrategies() 2026-04-02 11:52:26 -07:00
Aleix Conchillo Flaqué
cc3af59db4 transports: remove deprecated VAD and turn analyzers 2026-04-02 11:51:08 -07:00
Mark Backman
f93d13efff Merge pull request #4228 from pipecat-ai/mb/remove-turn-deprecations 2026-04-02 14:32:21 -04:00
Mark Backman
c28b7e8f26 Merge pull request #4219 from lukehalley/feat/bedrock-prompt-caching
feat(aws): add prompt caching support for Bedrock ConverseStream
2026-04-02 12:26:28 -04:00
Mark Backman
d1a2dee7a1 fix(aws): initialize enable_prompt_caching in default settings 2026-04-02 12:20:47 -04:00
Luke Halley
da1a1a59a4 feat(aws): handle LLMEnablePromptCachingFrame for runtime toggling
Add LLMEnablePromptCachingFrame handler to process_frame for parity
with AnthropicLLMService, enabling runtime toggling of prompt caching.
2026-04-02 12:13:46 -04:00
Luke Halley
134790b17c chore: add changelog fragment for PR #4219 2026-04-02 12:10:57 -04:00
Luke Halley
e5aa3bbc20 feat(aws): add prompt caching support for Bedrock ConverseStream
Adds `enable_prompt_caching` setting to `AWSBedrockLLMSettings`. When
enabled, appends `cachePoint` markers to system prompts and tool
definitions in ConverseStream requests.

This can reduce TTFT by up to 85% for multi-turn conversations where
the system prompt stays constant (e.g. voice agents, chat assistants).

Follows the same pattern as `AnthropicLLMService.enable_prompt_caching`.

Usage:
```python
llm = AWSBedrockLLMService(
    settings=AWSBedrockLLMSettings(
        model="au.anthropic.claude-haiku-4-5-20251001-v1:0",
        enable_prompt_caching=True,
    ),
)
```

See: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
2026-04-02 12:10:57 -04:00
Mark Backman
3be0ea05ef Add changelog entries for #4228 2026-04-02 11:34:22 -04:00
Mark Backman
0c59819682 Remove allow_interruptions from voice-sarvam example
This was missed from the allow_interruptions removal commit.
2026-04-02 11:32:44 -04:00
Mark Backman
5b67dcd9e7 Remove deprecated EmulateUser{Started,Stopped}SpeakingFrame and emulated field
Remove EmulateUserStartedSpeakingFrame, EmulateUserStoppedSpeakingFrame
(deprecated since v0.0.99), and the emulated field from
UserStartedSpeakingFrame and UserStoppedSpeakingFrame. Clean up the
handling code in base_input.py and a stale comment in nova_sonic/llm.py.
2026-04-02 11:31:29 -04:00
Mark Backman
d503383c23 Remove deprecated interruption_strategies plumbing
The interruption_strategies mechanism was deprecated in v0.0.99 in favor
of LLMUserAggregator's user_turn_strategies. All evaluation logic was
already removed — this removes the remaining field definitions, property,
StartFrame propagation, conditional check in base_input.py, strategy
files, and test.
2026-04-02 11:19:17 -04:00
Mark Backman
fa30268b84 Remove deprecated TranscriptionMessage, ThoughtTranscriptionMessage, and TranscriptionUpdateFrame 2026-04-02 11:03:23 -04:00
Mark Backman
2a118084bd Remove deprecated transcript_processor module 2026-04-02 10:57:05 -04:00
Mark Backman
87e8ed109a Remove deprecated STTMuteFilter, STTMuteConfig, and STTMuteStrategy 2026-04-02 10:52:41 -04:00
Mark Backman
a5e1bbf4a3 Remove deprecated UserResponseAggregator class 2026-04-02 10:50:05 -04:00
Mark Backman
f8267f1ea6 Remove deprecated allow_interruptions parameter
This field was deprecated in v0.0.99 in favor of LLMUserAggregator's
user_turn_strategies / user_mute_strategies parameters. Since the default
was True (interruptions allowed), removing the guards keeps the current
default behavior.
2026-04-02 10:47:44 -04:00
Mark Backman
74acb0b7d0 Remove deprecated class_decorators tracing module 2026-04-02 10:31:15 -04:00
Mark Backman
41e3afbc2f Remove deprecated add_pattern_pair method from PatternPairAggregator 2026-04-02 10:28:01 -04:00
Aleix Conchillo Flaqué
d4824ffe8a Merge pull request #4225 from pipecat-ai/aleix/transport-and-other-deprecations
Remove deprecated transport module aliases and sync package
2026-04-01 19:43:22 -07:00
Mark Backman
2426f80789 Merge pull request #4220 from pipecat-ai/mb/more-service-deprecations
Remove more deprecated service parameters and shims
2026-04-01 22:23:39 -04:00
Mark Backman
5ce46df599 Use self.create_context_id() instead of raw uuid in CartesiaTTSService 2026-04-01 22:18:41 -04:00
Aleix Conchillo Flaqué
a6013ba437 update uv.lock 2026-04-01 19:12:39 -07:00
Aleix Conchillo Flaqué
279ca5a87b Add changelog for #4225 2026-04-01 19:04:11 -07:00
Aleix Conchillo Flaqué
c6f79592d8 remove deprecated sync package 2026-04-01 19:04:11 -07:00
Aleix Conchillo Flaqué
e74e497b8d transports: remove old deprecated modules 2026-04-01 19:04:11 -07:00
Aleix Conchillo Flaqué
d245b79bba Merge pull request #3984 from pipecat-ai/aleix/update-onnxruntime
Update onnxruntime to 1.24.3
2026-04-01 19:03:57 -07:00
Mark Backman
8a794424dd Update uv.lock 2026-04-01 19:05:17 -04:00
Aleix Conchillo Flaqué
f4743a6c91 require python >= 3.11 2026-04-01 19:02:34 -04:00
Aleix Conchillo Flaqué
ba32a48510 github: remove python 3.10 from compatibility chart 2026-04-01 19:02:34 -04:00
Aleix Conchillo Flaqué
a9cafa2a3b Add changelog for #3984 2026-04-01 19:02:34 -04:00
Aleix Conchillo Flaqué
58b1b7249e Update onnxruntime to 1.24.3
This version adds support for Python 3.14.
2026-04-01 19:02:32 -04:00
Aleix Conchillo Flaqué
db8e73e5ca Merge pull request #4224 from pipecat-ai/aleix/optional-function-call-timeout
Make function_call_timeout_secs optional
2026-04-01 14:39:10 -07:00
Mark Backman
170f6dfe8b Add changelog for #4220 2026-04-01 17:03:05 -04:00
Mark Backman
c763abc4ae Add deprecation version to update_options in GoogleSTTService 2026-04-01 17:03:05 -04:00
Mark Backman
197d96fc49 Remove deprecated enable_prompt_caching_beta from Anthropic InputParams 2026-04-01 17:03:05 -04:00
Mark Backman
c8e9bf77fd Remove deprecated simli_config and use_turn_server params from SimliVideoService 2026-04-01 17:03:05 -04:00
Mark Backman
48b25962e2 Remove deprecated english_normalization param from MiniMax TTS InputParams 2026-04-01 17:03:05 -04:00
Mark Backman
5d093c9ad7 Remove deprecated InputParams class from GoogleVertexLLMService
The location and project_id fields were deprecated since 0.0.90 in
favor of direct __init__ parameters. Now that InputParams is removed,
project_id is required and location defaults to "us-east4" directly
in the signature.
2026-04-01 17:03:05 -04:00
Mark Backman
d93f63deb5 Remove deprecated base_url param from GeminiLiveLLMService 2026-04-01 17:03:05 -04:00
Mark Backman
09a57972f5 Remove deprecated api_key param from GeminiTTSService 2026-04-01 17:03:05 -04:00
Mark Backman
f83d062df9 Remove deprecated InputParams alias from GladiaSTTService 2026-04-01 17:03:05 -04:00
Mark Backman
a2a42b8703 Remove deprecated confidence param from GladiaSTTService 2026-04-01 17:03:05 -04:00
Mark Backman
e60a72e2d4 Remove deprecated language param from GladiaInputParams 2026-04-01 17:03:05 -04:00
Mark Backman
83f4989a78 Remove deprecated model param from FishAudioTTSService 2026-04-01 17:03:05 -04:00
Mark Backman
5d2b288274 Remove deprecated url param from DeepgramSTTService 2026-04-01 17:03:05 -04:00
Mark Backman
52ece87ac9 Remove deprecated send_transcription_frames param from AWSNovaSonicLLMService 2026-04-01 17:03:05 -04:00
Mark Backman
bc4bbb1895 Remove deprecated PollyTTSService alias 2026-04-01 17:03:05 -04:00
Mark Backman
eb014fffc4 Flush Cartesia context on voice/model/language changes
Override _update_settings in CartesiaTTSService to flush the current
audio context and assign a new turn context ID when voice, model, or
language settings change. This prevents Context has closed errors
from Cartesia API, which locks these parameters per context.
2026-04-01 17:03:05 -04:00
Mark Backman
e74930b954 Remove deprecated text_aggregator and text_filter params from TTS
Remove the deprecated text_aggregator parameter from TTSService,
CartesiaTTSService, and RimeTTSService, and the deprecated text_filter
parameter from TTSService. Users should use LLMTextProcessor before
the TTS service instead. Update the voice-switching example to use
LLMTextProcessor with PatternPairAggregator.
2026-04-01 17:03:05 -04:00
Aleix Conchillo Flaqué
6ed4109da9 Add changelog for #4224 2026-04-01 13:58:45 -07:00
Aleix Conchillo Flaqué
53f809b7d5 Make function_call_timeout_secs optional and skip timeout task when unset
Change the default from 10s to None so deferred function calls can run
indefinitely when no timeout is configured. Only create the timeout
task when a timeout is actually provided (per-call or service-level).
2026-04-01 13:58:09 -07:00
Mark Backman
f6a3678f93 Improve tests 2026-03-30 12:46:30 -04:00
Mark Backman
3af93ed257 Add changelog for #4201 2026-03-30 12:31:26 -04:00
Mark Backman
f37bf989dd Make reconnection failure error non-fatal to allow service failover
A single service failing to reconnect should not kill the entire
pipeline. Non-fatal errors flow through the pipeline so application
code (e.g. ServiceSwitcher) can handle failover to a backup service.
2026-03-30 12:29:53 -04:00
Mark Backman
86a16d53bc Detect quick connection failures in WebsocketService to prevent infinite reconnection loops
When a WebSocket server accepts the handshake but immediately closes the
connection (e.g. invalid API key returning close code 1008), the existing
exponential backoff does not help because the handshake keeps succeeding.
This tracks how long each connection survives and emits a non-fatal
ErrorFrame after 3 consecutive sub-5s failures, allowing ServiceSwitcher
failover instead of killing the pipeline.

Fixes #3711
2026-03-30 12:23:11 -04:00
Om Chauhan
fa982a05c0 added changelog 2026-03-18 09:46:15 +05:30
Om Chauhan
419c7d4450 fix: default thinking config for Gemini 3+ Flash models 2026-03-18 09:33:54 +05:30
980 changed files with 80210 additions and 18764 deletions

1
.agents/skills/changelog Symbolic link
View File

@@ -0,0 +1 @@
../../.claude/skills/changelog

1
.agents/skills/cleanup Symbolic link
View File

@@ -0,0 +1 @@
../../.claude/skills/cleanup

1
.agents/skills/code-review Symbolic link
View File

@@ -0,0 +1 @@
../../.claude/skills/code-review

1
.agents/skills/docstring Symbolic link
View File

@@ -0,0 +1 @@
../../.claude/skills/docstring

View File

@@ -0,0 +1 @@
../../.claude/skills/pr-description

1
.agents/skills/pr-submit Symbolic link
View File

@@ -0,0 +1 @@
../../.claude/skills/pr-submit

1
.agents/skills/update-docs Symbolic link
View File

@@ -0,0 +1 @@
../../.claude/skills/update-docs

View File

@@ -1,3 +1,8 @@
---
name: cleanup
description: Review, refactor, document, and validate code changes in the current branch
---
# Code Cleanup Skill
The **Code Cleanup Skill** reviews, refactors, and documents code changes in your current branch, ensuring alignment with **Pipecat's architecture, coding standards, and example patterns**.

View File

@@ -0,0 +1,91 @@
---
name: squash-commits
description: Reorganize messy branch commits into a small set of logical, meaningful commits without changing any content. Drops merge-from-main commits. Safe: creates a backup branch first.
---
Reorganize the commits on the current branch into a small number of logical commits. Do NOT change any file content — only the commit structure changes.
## Instructions
### 1. Safety check
```bash
git status --short
```
If there are uncommitted changes, stop and tell the user to commit or stash them first.
### 2. Inspect the branch
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
```
List every file changed vs `main` and every commit on the branch (excluding merge commits from main).
### 3. Create a backup branch
```bash
git branch backup/<current-branch-name>
```
Tell the user the backup exists so they can recover if needed.
### 4. Soft-reset to main and unstage everything
```bash
git reset --soft main
git restore --staged .
```
All branch changes are now in the working tree, unstaged. No content has changed.
### 5. Plan the logical groups
Read the changed files and the original commit messages to understand what the work covers. Group related files into logical commits. Typical groups:
- Core feature or fix (new source files + modified core files)
- Secondary features or fixes (each as its own commit if distinct)
- Refactoring or renames
- Tests
- Changelogs / docs
Use the changelog files (if any) as a strong hint — each changelog entry often maps to one commit.
Present the proposed grouping to the user and ask for confirmation before committing.
### 6. Commit in logical groups
For each group, stage only the relevant files and commit with a clear message following the project's conventions:
```bash
git add <file1> <file2> ...
git commit -m "..."
```
Use conventional commit prefixes if the project uses them (`feat:`, `fix:`, `refactor:`, `test:`, `chore:`).
### 7. Verify
```bash
git log main..HEAD --oneline
git diff main..HEAD --name-only
git status --short
```
Confirm:
- Commit count is small and each message is meaningful
- The set of changed files vs `main` is identical to before
- Working tree is clean
### 8. Remind about force-push
The branch history has been rewritten. Tell the user they will need to `git push --force-with-lease` when they are ready to update the remote. Do NOT push automatically.
## Rules
- Never change file contents. If you find yourself editing a file, stop.
- Never skip the backup branch step.
- Never force-push without explicit user instruction.
- If any step fails or the result looks wrong, tell the user and suggest restoring from the backup: `git reset --hard backup/<branch-name>`.

View File

@@ -1,30 +0,0 @@
# flyctl launch added from .gitignore
**/.vscode
**/env
**/__pycache__
**/*~
**/venv
#*#
# Distribution / packaging
**/.Python
**/build
**/develop-eggs
**/dist
**/downloads
**/eggs
**/.eggs
**/lib
**/lib64
**/parts
**/sdist
**/var
**/wheels
**/share/python-wheels
**/*.egg-info
**/.installed.cfg
**/*.egg
**/MANIFEST
**/.DS_Store
**/.env
fly.toml

View File

@@ -41,7 +41,10 @@ jobs:
--extra google \
--extra langchain \
--extra livekit \
--extra pgmq \
--extra piper \
--extra redis \
--extra runner \
--extra sagemaker \
--extra tracing \
--extra websocket

View File

@@ -32,7 +32,9 @@ jobs:
run: uv python install 3.12
- name: Install development dependencies
run: uv sync --group dev
# `--all-extras` (matching the dev setup in README.md) so pyright can
# resolve types from various optional dependencies.
run: uv sync --group dev --all-extras --no-extra gstreamer --no-extra local
- name: Ruff formatter
id: ruff-format
@@ -41,3 +43,7 @@ jobs:
- name: Ruff linter (all rules)
id: ruff-check
run: uv run ruff check
- name: Type check (pyright)
id: pyright
run: uv run pyright

View File

@@ -14,7 +14,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ['3.10.19', '3.11.14', '3.12.12', '3.13.12']
python-version: ['3.11.15', '3.12.13', '3.13.12', '3.14.3']
name: Python ${{ matrix.python-version }}
steps:

View File

@@ -45,7 +45,10 @@ jobs:
--extra google \
--extra langchain \
--extra livekit \
--extra pgmq \
--extra piper \
--extra redis \
--extra runner \
--extra sagemaker \
--extra tracing \
--extra websocket

View File

@@ -114,6 +114,7 @@ jobs:
GH_TOKEN=$DOCS_SYNC_TOKEN gh pr create \
--repo pipecat-ai/docs \
--label auto-docs \
--label pipecat \
--title "docs: update for pipecat PR #${{ steps.pr.outputs.number }}" \
--body "$(cat <<'BODY'
Automated documentation update for [pipecat PR #${{ steps.pr.outputs.number }}](https://github.com/pipecat-ai/pipecat/pull/${{ steps.pr.outputs.number }}).

View File

@@ -11,7 +11,7 @@ build:
jobs:
post_install:
- pip install uv
- UV_PROJECT_ENVIRONMENT=$READTHEDOCS_VIRTUALENV_PATH uv sync --group docs --all-extras --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
- UV_PROJECT_ENVIRONMENT=$READTHEDOCS_VIRTUALENV_PATH uv sync --group docs --all-extras --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra mlx-whisper
sphinx:
configuration: docs/api/conf.py

174
AGENTS.md Normal file
View File

@@ -0,0 +1,174 @@
# AGENTS.md
This file provides guidance to AI coding agents when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer --no-extra local
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
uv run towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
uv run ruff format --check
# Update dependencies (after editing pyproject.toml)
uv lock && uv sync
```
## Architecture
### Frame-Based Pipeline Processing
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
[Processor1] → [Processor2] → ... → [ProcessorN]
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): Transports are frame processors used for external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
### Key Directories
| Directory | Purpose |
| -------------------------- | -------------------------------------------------- |
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Deprecations**: Use the `.. deprecated:: <version>` Sphinx directive in docstrings (never inline tags like `[DEPRECATED]`), and pair it with a runtime `warnings.warn(..., DeprecationWarning)` at the call site. See `CONTRIBUTING.md` for full conventions.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
- `@dataclass`: Frame types, context aggregator pairs, internal data containers
- `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
```python
class MyService(LLMService):
"""Description of what the service does.
More detailed description.
Event handlers available:
- on_connected: Called when we are connected
Example::
@service.event_handler("on_connected")
async def on_connected(service, frame):
...
"""
def __init__(self, param1: str, **kwargs):
"""Initialize the service.
Args:
param1: Description of param1.
**kwargs: Additional arguments passed to parent.
"""
super().__init__(**kwargs)
# Pydantic params class with a deprecated field
class MyParams(BaseModel):
"""Configuration parameters for MyService.
Parameters:
new_setting: Replacement for ``old_setting``.
old_setting: Legacy setting, no longer used.
.. deprecated:: 1.2.0
Use ``new_setting`` instead. Will be removed in 2.0.0.
"""
new_setting: str = "default"
old_setting: str | None = None
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.

File diff suppressed because it is too large Load Diff

View File

@@ -1,62 +0,0 @@
# Changelog
All notable changes to the **&lt;project name&gt;** SDK will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Please make sure to add your changes to the appropriate categories:
## [Unreleased]
### Added
<!-- for new functionality -->
- n/a
### Changed
<!-- for changed functionality -->
- n/a
### Deprecated
<!-- for soon-to-be removed functionality -->
- n/a
### Removed
<!-- for removed functionality -->
- n/a
### Fixed
<!-- for fixed bugs -->
- n/a
### Performance
<!-- for performance-relevant changes -->
- n/a
### Security
<!-- for security-relevant changes -->
- n/a
### Other
<!-- for everything else -->
- n/a
## [0.1.0] - YYYY-MM-DD
Initial release.

158
CLAUDE.md
View File

@@ -1,157 +1 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational AI agents. It orchestrates audio/video, AI services, transports, and conversation pipelines using a frame-based architecture.
## Common Commands
```bash
# Setup development environment
uv sync --group dev --all-extras --no-extra gstreamer
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest
# Run a single test file
uv run pytest tests/test_name.py
# Run a specific test
uv run pytest tests/test_name.py::test_function_name
# Preview changelog
uv run towncrier build --draft --version Unreleased
# Lint and format check
uv run ruff check
uv run ruff format --check
# Update dependencies (after editing pyproject.toml)
uv lock && uv sync
```
## Architecture
### Frame-Based Pipeline Processing
All data flows as **Frame** objects through a pipeline of **FrameProcessors**:
```
[Processor1] → [Processor2] → ... → [ProcessorN]
```
**Key components:**
- **Frames** (`src/pipecat/frames/frames.py`): Data units (audio, text, video) and control signals. Flow DOWNSTREAM (input→output) or UPSTREAM (acknowledgments/errors).
- **FrameProcessor** (`src/pipecat/processors/frame_processor.py`): Base processing unit. Each processor receives frames, processes them, and pushes results downstream.
- **Pipeline** (`src/pipecat/pipeline/pipeline.py`): Chains processors together.
- **ParallelPipeline** (`src/pipecat/pipeline/parallel_pipeline.py`): Runs multiple pipelines in parallel.
- **Transports** (`src/pipecat/transports/`): Transports are frame processors used for external I/O layer (Daily WebRTC, LiveKit WebRTC, WebSocket, Local). Abstract interface via `BaseTransport`, `BaseInputTransport` and `BaseOutputTransport`.
- **Pipeline Task (`src/pipecat/pipeline/task.py`)**: Runs and manages a pipeline. Pipeline tasks send the first frame, `StartFrame`, to the pipeline in order for processors to know they can start processing and pushing frames. Pipeline tasks internally create a pipeline with two additional processors, a source processor before the user-defined pipeline and a sink processor at the end. Those are used for multiple things: error handling, pipeline task level events, heartbeat monitoring, etc.
- **Pipeline Runner (`src/pipecat/pipeline/runner.py`)**: High-level entry point for executing pipeline tasks. Handles signal management (SIGINT/SIGTERM) for graceful shutdown and optional garbage collection. Run a single pipeline task with `await runner.run(task)` or multiple concurrently with `await asyncio.gather(runner.run(task1), runner.run(task2))`.
- **Services** (`src/pipecat/services/`): 60+ AI provider integrations (STT, TTS, LLM, etc.). Extend base classes: `AIService`, `LLMService`, `STTService`, `TTSService`, `VisionService`.
- **Serializers** (`src/pipecat/serializers/`): Convert frames to/from wire formats for WebSocket transports. `FrameSerializer` base class defines `serialize()` and `deserialize()`. Telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) handle provider-specific protocols and audio encoding (e.g., μ-law).
- **RTVI** (`src/pipecat/processors/frameworks/rtvi.py`): Real-Time Voice Interface protocol bridging clients and the pipeline. `RTVIProcessor` handles incoming client messages (text input, audio, function call results). `RTVIObserver` converts pipeline frames to outgoing messages: user/bot speaking events, transcriptions, LLM/TTS lifecycle, function calls, metrics, and audio levels.
- **Observers** (`src/pipecat/observers/`): Monitor frame flow without modifying the pipeline. Passed to `PipelineTask` via the `observers` parameter. Implement `on_process_frame()` and `on_push_frame()` callbacks.
### Important Patterns
- **Context Aggregation**: `LLMContext` accumulates messages for LLM calls; `UserResponse` aggregates user input
- **Turn Management**: Turn management is done through `LLMUserAggregator` and
`LLMAssistantAggregator`, created with `LLMContextAggregatorPair`
- **User turn strategies**: Detection of when the user starts and stops speaking is done via user turn start/stop strategies. They push `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` respectively.
- **Interruptions**: Interruptions are usually triggered by a user turn start strategy (e.g. `VADUserTurnStartStrategy`) but they can be triggered by other processors as well, in which case the user turn start strategies don't need to. An `InterruptionFrame` carries an optional `asyncio.Event` that is set when the frame reaches the pipeline sink. If a processor stops an `InterruptionFrame` from propagating downstream (i.e., doesn't push it), it **must** call `frame.complete()` to avoid stalling `push_interruption_task_frame_and_wait()` callers.
- **Uninterruptible Frames**: These are frames that will not be removed from internal queues even if there's an interruption. For example, `EndFrame` and `StopFrame`.
- **Events**: Most classes in Pipecat have `BaseObject` as the very base class. `BaseObject` has support for events. Events can run in the background in an async task (default) or synchronously (`sync=True`) if we want immediate action. Synchronous event handlers need to execute fast.
- **Async Task Management**: Always use `self.create_task(coroutine, name)` instead of raw `asyncio.create_task()`. The `TaskManager` automatically tracks tasks and cleans them up on processor shutdown. Use `await self.cancel_task(task, timeout)` for cancellation.
- **Error Handling**: Use `await self.push_error(msg, exception, fatal)` to push errors upstream. Services should use `fatal=False` (the default) so application code can handle errors and take action (e.g. switch to another service).
### Key Directories
| Directory | Purpose |
| -------------------------- | -------------------------------------------------- |
| `src/pipecat/frames/` | Frame definitions (100+ types) |
| `src/pipecat/processors/` | FrameProcessor base + aggregators, filters, audio |
| `src/pipecat/pipeline/` | Pipeline orchestration |
| `src/pipecat/services/` | AI service integrations (60+ providers) |
| `src/pipecat/transports/` | Transport layer (Daily, LiveKit, WebSocket, Local) |
| `src/pipecat/serializers/` | Frame serialization for WebSocket protocols |
| `src/pipecat/observers/` | Pipeline observers for monitoring frame flow |
| `src/pipecat/audio/` | VAD, filters, mixers, turn detection, DTMF |
| `src/pipecat/turns/` | User turn management |
## Code Style
- **Docstrings**: Google-style. Classes describe purpose; `__init__` has `Args:` section; dataclasses use `Parameters:` section.
- **Linting**: Ruff (line length 100). Pre-commit hooks enforce formatting.
- **Type hints**: Required for complex async code.
- **Dataclass vs Pydantic**: Use `@dataclass` for frames and internal pipeline data (high-frequency, no validation needed). Use Pydantic `BaseModel` for configuration, parameters, metrics, and external API data (benefits from validation and serialization). Specifically:
- `@dataclass`: Frame types, context aggregator pairs, internal data containers
- `BaseModel`: Service `InputParams`, transport/VAD/turn params, metrics data, API request/response models, serializer params
### Docstring Example
```python
class MyService(LLMService):
"""Description of what the service does.
More detailed description.
Event handlers available:
- on_connected: Called when we are connected
Example::
@service.event_handler("on_connected")
async def on_connected(service, frame):
...
"""
def __init__(self, param1: str, **kwargs):
"""Initialize the service.
Args:
param1: Description of param1.
**kwargs: Additional arguments passed to parent.
"""
super().__init__(**kwargs)
```
## Service Implementation
When adding a new service:
1. Extend the appropriate base class (`STTService`, `TTSService`, `LLMService`, etc.)
2. Implement required abstract methods
3. Handle necessary frames
4. By default, all frames should be pushed in the direction they came
5. Push `ErrorFrame` on failures
6. Add metrics tracking via `MetricsData` if relevant
7. Follow the pattern of existing services in `src/pipecat/services/`
## Testing
Test utilities live in `src/pipecat/tests/utils.py`. Use `run_test()` to send frames through a pipeline and assert expected output frames in each direction. Use `SleepFrame(sleep=N)` to add delays between frames.
@AGENTS.md

View File

@@ -28,6 +28,10 @@
## 🌐 Pipecat Ecosystem
### 🧩 Multi-agent systems
Need multiple AI agents working together? [Pipecat Subagents](https://github.com/pipecat-ai/pipecat-subagents) lets you build distributed multi-agent systems where each agent runs its own pipeline and communicates through a shared message bus. Hand off conversations between specialists, dispatch background tasks, and scale agents across processes or machines.
### 📱 Client SDKs
Building client applications? You can connect to Pipecat from any platform using our official SDKs:
@@ -67,7 +71,7 @@ and install any of the available plugins.
### 🧩 Community Integrations
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/api-reference/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
### 📺️ Pipecat TV Channel
@@ -79,28 +83,28 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/simple-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/simple-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/storytelling-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/storytelling-chatbot/image.png" width="400" /></a>
<br/>
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/translation-chatbot"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/translation-chatbot/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat-examples/tree/main/daily-multi-translation"><img src="https://raw.githubusercontent.com/pipecat-ai/pipecat-examples/main/daily-multi-translation/image.png" width="400" /></a>&nbsp;
<a href="https://github.com/pipecat-ai/pipecat/blob/main/examples/vision/vision-moondream.py"><img src="https://github.com/pipecat-ai/pipecat/blob/main/examples/assets/moondream.png" width="400" /></a>
</p>
## 🧩 Available services
| Category | Services |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/server/services/tts/smallest), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/server/services/video/tavus), [Simli](https://docs.pipecat.ai/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/server/utilities/audio/silero-vad-analyzer), [Krisp Viva](https://docs.pipecat.ai/guides/features/krisp-viva), [Koala](https://docs.pipecat.ai/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/server/utilities/audio/aic-filter), [RNNoise](https://docs.pipecat.ai/server/utilities/audio/rnnoise-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/server/services/community-integrations) |
| Category | Services |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/api-reference/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/api-reference/server/services/stt/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/api-reference/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/api-reference/server/services/stt/gladia), [Google](https://docs.pipecat.ai/api-reference/server/services/stt/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/groq), [Mistral](https://docs.pipecat.ai/api-reference/server/services/stt/mistral), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/stt/nvidia), [OpenAI (Whisper)](https://docs.pipecat.ai/api-reference/server/services/stt/openai), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/api-reference/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/api-reference/server/services/stt/whisper), [xAI](https://docs.pipecat.ai/api-reference/server/services/stt/xai) |
| LLMs | [Anthropic](https://docs.pipecat.ai/api-reference/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/api-reference/server/services/llm/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/api-reference/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/api-reference/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/api-reference/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/api-reference/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/api-reference/server/services/llm/grok), [Groq](https://docs.pipecat.ai/api-reference/server/services/llm/groq), [Inception](https://docs.pipecat.ai/api-reference/server/services/llm/inception), [Mistral](https://docs.pipecat.ai/api-reference/server/services/llm/mistral), [Nebius](https://docs.pipecat.ai/api-reference/server/services/llm/nebius), [Novita](https://docs.pipecat.ai/api-reference/server/services/llm/novita), [NVIDIA NIM](https://docs.pipecat.ai/api-reference/server/services/llm/nvidia), [Ollama](https://docs.pipecat.ai/api-reference/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/llm/openai), [OpenAI Responses](https://docs.pipecat.ai/api-reference/server/services/llm/openai-responses), [OpenRouter](https://docs.pipecat.ai/api-reference/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/api-reference/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/api-reference/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/api-reference/server/services/llm/sambanova), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/llm/sarvam), [Together AI](https://docs.pipecat.ai/api-reference/server/services/llm/together) |
| Text-to-Speech | [Async](https://docs.pipecat.ai/api-reference/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/api-reference/server/services/tts/aws), [Azure](https://docs.pipecat.ai/api-reference/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/api-reference/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/api-reference/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/api-reference/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/api-reference/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/api-reference/server/services/tts/fish), [Google](https://docs.pipecat.ai/api-reference/server/services/tts/google), [Gradium](https://docs.pipecat.ai/api-reference/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/api-reference/server/services/tts/groq), [Hume](https://docs.pipecat.ai/api-reference/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/api-reference/server/services/tts/inworld), [Kokoro](https://docs.pipecat.ai/api-reference/server/services/tts/kokoro), [LMNT](https://docs.pipecat.ai/api-reference/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/api-reference/server/services/tts/minimax), [Mistral](https://docs.pipecat.ai/api-reference/server/services/tts/mistral), [Neuphonic](https://docs.pipecat.ai/api-reference/server/services/tts/neuphonic), [NVIDIA](https://docs.pipecat.ai/api-reference/server/services/tts/nvidia), [OpenAI](https://docs.pipecat.ai/api-reference/server/services/tts/openai), [Piper](https://docs.pipecat.ai/api-reference/server/services/tts/piper), [Resemble](https://docs.pipecat.ai/api-reference/server/services/tts/resemble), [Rime](https://docs.pipecat.ai/api-reference/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/api-reference/server/services/tts/sarvam), [Smallest](https://docs.pipecat.ai/api-reference/server/services/tts/smallest), [Soniox](https://docs.pipecat.ai/api-reference/server/services/tts/soniox), [Speechmatics](https://docs.pipecat.ai/api-reference/server/services/tts/speechmatics), [xAI](https://docs.pipecat.ai/api-reference/server/services/tts/xai), [XTTS](https://docs.pipecat.ai/api-reference/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/api-reference/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/api-reference/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/api-reference/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/api-reference/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/api-reference/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/api-reference/server/services/transport/fastapi-websocket), [LiveKit (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/livekit), [SmallWebRTCTransport](https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc), [Vonage (WebRTC)](https://docs.pipecat.ai/api-reference/server/services/transport/vonage), [WebSocket Server](https://docs.pipecat.ai/api-reference/server/services/transport/websocket-server), [WhatsApp](https://docs.pipecat.ai/api-reference/server/services/transport/whatsapp), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/api-reference/server/services/serializers/exotel), [Genesys](https://docs.pipecat.ai/api-reference/server/services/serializers/genesys), [Plivo](https://docs.pipecat.ai/api-reference/server/services/serializers/plivo), [Twilio](https://docs.pipecat.ai/api-reference/server/services/serializers/twilio), [Telnyx](https://docs.pipecat.ai/api-reference/server/services/serializers/telnyx), [Vonage](https://docs.pipecat.ai/api-reference/server/services/serializers/vonage) |
| Video | [HeyGen](https://docs.pipecat.ai/api-reference/server/services/video/heygen), [LemonSlice](https://docs.pipecat.ai/api-reference/server/services/transport/lemonslice), [Tavus](https://docs.pipecat.ai/api-reference/server/services/video/tavus), [Simli](https://docs.pipecat.ai/api-reference/server/services/video/simli) |
| Memory | [mem0](https://docs.pipecat.ai/api-reference/server/services/memory/mem0) |
| Vision & Image | [fal](https://docs.pipecat.ai/api-reference/server/services/image-generation/fal), [Google Imagen](https://docs.pipecat.ai/api-reference/server/services/image-generation/google-imagen), [Moondream](https://docs.pipecat.ai/api-reference/server/services/vision/moondream) |
| Audio Processing | [Silero VAD](https://docs.pipecat.ai/api-reference/server/utilities/audio/silero-vad-analyzer), [Krisp Viva](https://docs.pipecat.ai/guides/features/krisp-viva), [Koala](https://docs.pipecat.ai/api-reference/server/utilities/audio/koala-filter), [ai-coustics](https://docs.pipecat.ai/api-reference/server/utilities/audio/aic-filter), [RNNoise](https://docs.pipecat.ai/api-reference/server/utilities/audio/rnnoise-filter) |
| Analytics & Metrics | [OpenTelemetry](https://docs.pipecat.ai/api-reference/server/utilities/opentelemetry), [Sentry](https://docs.pipecat.ai/api-reference/server/services/analytics/sentry) |
| Community | [Browse community integrations →](https://docs.pipecat.ai/api-reference/server/services/community-integrations) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
📚 [View full services documentation →](https://docs.pipecat.ai/api-reference/server/services/supported-services)
## ⚡ Getting started
@@ -149,8 +153,8 @@ You can get started with Pipecat running on your local machine, then move your a
### Prerequisites
**Minimum Python Version:** 3.10
**Recommended Python Version:** 3.12
**Minimum Python Version:** 3.11
**Recommended Python Version:** >= 3.12
### Setup Steps

1
changelog/4052.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `VonageVideoConnectorTransport`, a new transport integration for real-time Vonage WebRTC sessions using the Vonage Video Connector library.

View File

@@ -1 +0,0 @@
- ⚠️ Added WebSocket-based `OpenAIResponsesLLMService` as the new default for the OpenAI Responses API. It maintains a persistent connection to `wss://api.openai.com/v1/responses` and automatically uses `previous_response_id` to send only incremental context, falling back to full context on reconnection or cache miss. The previous HTTP-based implementation is now available as `OpenAIResponsesHttpLLMService`.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `OpenPipeLLMService` and the `openpipe` extra. OpenPipe was acquired by CoreWeave and the package is no longer maintained. If you were using `openpipe` as an LLM provider, switch to the underlying provider directly (e.g. `openai`). The OpenPipe interface can still be used with `OpenAILLMService` by specifying a `base_url`.

View File

@@ -1 +0,0 @@
- ⚠️ Updated `langchain` extra to require langchain 1.x (from 0.3.x), langchain-community 0.4.x (from 0.3.x), and langchain-openai 1.x (from 0.3.x). If you pin these packages in your project, update your pins accordingly.

View File

@@ -1 +0,0 @@
- Fixed `InworldHttpTTSService` streaming responses crashing with `UnicodeDecodeError` when multi-byte UTF-8 characters were split across chunk boundaries. This caused TTS audio to cut off mid-sentence intermittently.

View File

@@ -1 +0,0 @@
- Fixed a crash (`JSONDecodeError`) when a user interruption occurs while the LLM is streaming function call arguments. Previously, the incomplete JSON arguments were passed directly to `json.loads()`, causing an unhandled exception. Affected services: OpenAI, Google (OpenAI-compatible), and SambaNova.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `observers` field from `PipelineParams`. Pass observers directly to `PipelineTask` constructor instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `on_pipeline_ended`, `on_pipeline_cancelled`, and `on_pipeline_stopped` events from `PipelineTask`. Use `on_pipeline_finished` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `AudioBufferProcessor.user_continuous_stream` parameter. Use `user_audio_passthrough` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `camera_in_enabled`, `camera_in_is_live`, `camera_in_width`, `camera_in_height`, `camera_out_enabled`, `camera_out_is_live`, `camera_out_width`, `camera_out_height`, and `camera_out_color` transport params. Use the `video_in_*` and `video_out_*` equivalents instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `RTVIObserver.errors_enabled` parameter.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `vad_enabled` and `vad_audio_passthrough` transport params.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `TTSService.say()`. Push a `TTSSpeakFrame` into the pipeline instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `DailyRunner.configure_with_args()`. Use `PipelineRunner` with `RunnerArguments` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated RTVI models, frames, and processor methods including `RTVIConfig`, `RTVIServiceConfig`, `RTVIServiceOptionConfig`, various `RTVI*Data` models, `RTVIActionFrame`, and `RTVIProcessor.handle_function_call`/`handle_function_call_start`. Use the updated RTVI processor API instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `FrameProcessor.wait_for_task()`. Use `create_task()` and manage tasks with the built-in `TaskManager` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `KrispFilter`. The `krisp` extra has been removed from `pyproject.toml`.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `LLMService.request_image_frame()`. Push a `UserImageRequestFrame` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `create_default_resampler()` from `pipecat.audio.utils`.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `FalSmartTurnAnalyzer` and `LocalSmartTurnAnalyzer`.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated transport frames: `TransportMessageFrame`, `TransportMessageUrgentFrame`, `InputTransportMessageUrgentFrame`, `DailyTransportMessageFrame`, and `DailyTransportMessageUrgentFrame`. Use `OutputTransportMessageFrame`, `OutputTransportMessageUrgentFrame`, `InputTransportMessageFrame`, `DailyOutputTransportMessageFrame`, and `DailyOutputTransportMessageUrgentFrame` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `KeypadEntryFrame` alias.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated interruption frames: `StartInterruptionFrame` and `BotInterruptionFrame`. Use `InterruptionFrame` and `InterruptionTaskFrame` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `LLMService.start_callback` parameter. Register an `on_llm_response_start` event handler instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed single-argument function call support from `LLMService`. Functions must use named parameters instead of a single `arguments` parameter.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `NoisereduceFilter`. Use system-level noise reduction or a service-based alternative instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `pipecat.services.riva` package. Use `pipecat.services.nvidia.stt` and `pipecat.services.nvidia.tts` instead (`RivaSTTService``NvidiaSTTService`, `RivaTTSService``NvidiaTTSService`).

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `pipecat.services.nim` package. Use `pipecat.services.nvidia.llm` instead (`NimLLMService``NvidiaLLMService`).

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `pipecat.services.gemini_multimodal_live` package. Use `pipecat.services.google.gemini_live` instead. Note that class names no longer include "Multimodal" (e.g. `GeminiMultimodalLiveLLMService``GeminiLiveLLMService`).

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `pipecat.services.aws_nova_sonic` package. Use `pipecat.services.aws.nova_sonic` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `pipecat.services.openai_realtime` package. Use `pipecat.services.openai.realtime` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `OpenAIRealtimeBetaLLMService` and `AzureRealtimeBetaLLMService`. Use `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService` from `pipecat.services.openai.realtime` and `pipecat.services.azure.realtime` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `pipecat.services.deepgram.stt_sagemaker` and `pipecat.services.deepgram.tts_sagemaker` modules. Use `pipecat.services.deepgram.sagemaker.stt` and `pipecat.services.deepgram.sagemaker.tts` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `GoogleLLMOpenAIBetaService` from `pipecat.services.google.openai`. Use `GoogleLLMService` from `pipecat.services.google.llm` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `pipecat.services.google.llm_vertex` module. Use `pipecat.services.google.vertex.llm` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `pipecat.services.google.gemini_live.llm_vertex` module. Use `pipecat.services.google.gemini_live.vertex.llm` instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated `pipecat.services.ai_services` module. Import from `pipecat.services.ai_service`, `pipecat.services.llm_service`, `pipecat.services.stt_service`, `pipecat.services.tts_service`, etc. instead.

View File

@@ -1 +0,0 @@
- Changed `GrokLLMService` default model from `grok-3-beta` to `grok-3`, now that the model is generally available.

View File

@@ -1 +0,0 @@
- `GoogleImageGenService` now defaults to `imagen-4.0-generate-001` (previously `imagen-3.0-generate-002`).

View File

@@ -1 +0,0 @@
- ⚠️ `BaseOpenAILLMService.get_chat_completions()` now accepts an `LLMContext` instead of `OpenAILLMInvocationParams`. If you override this method, update your signature accordingly.

View File

@@ -1,22 +0,0 @@
- ⚠️ Removed deprecated service-specific context and aggregator machinery, which was superseded by the universal `LLMContext` system.
Service-specific classes removed: `AnthropicLLMContext`, `AnthropicContextAggregatorPair`, `AWSBedrockLLMContext`, `AWSBedrockContextAggregatorPair`, `OpenAIContextAggregatorPair`, and their user/assistant aggregators. Also removed `create_context_aggregator()` from `LLMService`, `OpenAILLMService`, `AnthropicLLMService`, and `AWSBedrockLLMService`.
Base aggregator classes removed (from `pipecat.processors.aggregators.llm_response`): `BaseLLMResponseAggregator`, `LLMContextResponseAggregator`, `LLMUserContextAggregator`, `LLMAssistantContextAggregator`, `LLMUserResponseAggregator`, `LLMAssistantResponseAggregator`.
From the developer's point of view, migrating will usually be a matter of going from this:
```python
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```
To this:
```python
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
context = LLMContext(messages, tools)
context_aggregator = LLMContextAggregatorPair(context)
```

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated frame types `LLMMessagesFrame` and `OpenAILLMContextAssistantTimestampFrame` from `pipecat.frames.frames`. Instead of `LLMMessagesFrame`, use `LLMContextFrame` with the new messages, or `LLMMessagesUpdateFrame` with `run_llm=True`.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `GatedOpenAILLMContextAggregator` (from `pipecat.processors.aggregators.gated_open_ai_llm_context`). Use `GatedLLMContextAggregator` (from `pipecat.processors.aggregators.gated_llm_context`) instead.

View File

@@ -1 +0,0 @@
- ⚠️ Removed `VisionImageFrameAggregator` (from `pipecat.processors.aggregators.vision_image_frame`). Vision/image handling is now built into `LLMContext` (from `pipecat.processors.aggregators.llm_context`). See the `12*` examples for the recommended replacement pattern.

View File

@@ -1 +0,0 @@
- ⚠️ Removed deprecated compatibility modules: `pipecat.services.openai_realtime_beta` (use `pipecat.services.openai.realtime`), `pipecat.services.openai_realtime.context`, `pipecat.services.openai_realtime.frames`, `pipecat.services.openai.realtime.context`, `pipecat.services.openai.realtime.frames`, `pipecat.services.gemini_multimodal_live` (use `pipecat.services.google.gemini_live`), `pipecat.services.aws_nova_sonic.context` (use `pipecat.services.aws.nova_sonic`), `pipecat.services.google.openai` and `pipecat.services.google.llm_openai` (use `pipecat.services.google.llm`).

View File

@@ -1,18 +0,0 @@
- ⚠️ Removed `OpenAILLMContext`, `OpenAILLMContextFrame`, and `OpenAILLMContext.from_messages()`. Use `LLMContext` (from `pipecat.processors.aggregators.llm_context`) and `LLMContextFrame` (from `pipecat.frames.frames`) instead. All services now exclusively use the universal `LLMContext`.
From the developer's point of view, migrating will usually be a matter of going from this:
```python
context = OpenAILLMContext(messages, tools)
context_aggregator = llm.create_context_aggregator(context)
```
To this:
```python
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import LLMContextAggregatorPair
context = LLMContext(messages, tools)
context_aggregator = LLMContextAggregatorPair(context)
```

1
changelog/4306.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.

View File

@@ -0,0 +1 @@
- Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.

View File

@@ -0,0 +1 @@
- Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.

View File

@@ -0,0 +1 @@
- Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.

1
changelog/4380.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.

1
changelog/4423.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.

View File

@@ -0,0 +1 @@
- Added `GET /status` endpoint to the development runner that reports which transports the running instance accepts (all by default, or the single transport passed via `-t`).

1
changelog/4442.added.md Normal file
View File

@@ -0,0 +1 @@
- Added plain WebSocket transport support to the development runner. Bots can now accept connections from non-telephony WebSocket clients (e.g., browser apps using protobuf framing) via the `/ws-client` endpoint alongside other transports.

View File

@@ -0,0 +1 @@
- ⚠️ The development runner now supports all transports (WebRTC, Daily, telephony, plain WebSocket) simultaneously from a single server. The `/start` endpoint accepts a `"transport"` field to select the transport per-request; omitting `-t` at startup enables all transports instead of defaulting to WebRTC. The Daily browser-redirect route moved from `GET /` to `GET /daily`.

1
changelog/4493.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `pipecat.workers`, a worker-based agent framework folded in from the standalone `pipecat-subagents` package. Workers inherit from `BaseWorker`, share a `WorkerBus`, register in a `WorkerRegistry`, and exchange typed work via `@job` handlers. `LLMWorker` and `LLMContextWorker` provide ready-made LLM-driven workers. `PipelineRunner.spawn(worker)` registers fire-and-forget workers alongside the main pipeline worker.

View File

@@ -0,0 +1 @@
- ⚠️ `FrameProcessorSetup.pipeline_worker` and `FunctionCallParams.pipeline_worker` are now mandatory fields, and `FrameProcessor.pipeline_worker` raises if read before `setup()` instead of returning `None`. Real-world code (frame processors set up by `PipelineWorker`, tool handlers invoked by `LLMService`) is unaffected; only callers that construct these dataclasses by hand (typically tests) now have to supply a `pipeline_worker` reference.

View File

@@ -0,0 +1 @@
- `PipelineWorker` now inherits from `BaseWorker`, so every pipeline worker is also a bus participant. It accepts a new optional `bridged=()` parameter that auto-wraps the pipeline with bus edge processors, letting the worker exchange frames with other bridged workers over the shared `WorkerBus`. The bus is supplied by `PipelineRunner` via `worker.attach(registry=..., bus=...)` instead of through the constructor.

1
changelog/4507.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed `ElevenLabsSTTService` crashing when `language` was passed as `None`. When `language` is not set, the service now lets ElevenLabs auto-detect the audio language.

1
changelog/4514.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.

1
changelog/4521.added.md Normal file
View File

@@ -0,0 +1 @@
- Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.

View File

@@ -0,0 +1 @@
- `SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.

View File

@@ -0,0 +1 @@
- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.

View File

@@ -0,0 +1 @@
- Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.

View File

@@ -0,0 +1 @@
- Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.

1
changelog/4524.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.

1
changelog/4527.fixed.md Normal file
View File

@@ -0,0 +1 @@
- Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.

View File

@@ -0,0 +1 @@
- Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.

View File

@@ -5,7 +5,7 @@
{% for text, values in sections[section][category].items() %}
{{ text }}
(PR {{ values|join(', ') }})
(PR {{ values|join(', ') }})
{% endfor %}
{% endfor %}

View File

@@ -0,0 +1 @@
- Added `LLMService.append_system_instruction(...)`, which composes durable text onto a user-provided system instruction (alongside the turn-completion and async-tool-cancellation instructions) so it is prepended on every inference and survives context-message resets.

3
changelog/xxxx.added.md Normal file
View File

@@ -0,0 +1,3 @@
- Added `pipecat.workers.ui.UIWorker`, an `LLMContextWorker` that observes and drives a client GUI over the RTVI UI channel: it stores live accessibility snapshots, auto-injects `<ui_state>` into the LLM context before every inference (via the LLM's `on_before_process_frame` hook), dispatches client events to `@on_ui_event` handlers, and sends UI commands (`scroll_to`, `highlight`, `select_text`, `click`, `set_input_value`) back to the client. The optional `ReplyToolMixin` exposes a bundled `reply` tool, and `user_job_group(...)` surfaces fan-out work to the client as cancellable task cards. A native RTVI⇄bus UI bridge is built into `PipelineWorker` (active whenever RTVI is enabled), so no decorator or manual wiring is needed: inbound UI messages are broadcast on the bus as `BusUIEventMessage`, and outbound `BusUICommandMessage` / `BusUITask*` carriers are translated into RTVI frames for the client.
- `UIWorker` auto-injects the UI wire-format guide (`UI_STATE_PROMPT_GUIDE`) into its LLM's system instruction by default, via a `prompt_guide` parameter — pass your own string to override the guide, or `None` to disable. Apps no longer need to concatenate `UI_STATE_PROMPT_GUIDE` into the LLM's `system_instruction` by hand.

View File

@@ -1,108 +1,60 @@
# Pipecat Documentation
# Pipecat API Documentation
This directory contains the source files for auto-generating Pipecat's server API reference documentation.
## Setup
1. Install documentation dependencies:
```bash
pip install -r requirements.txt
```
2. Make the build scripts executable:
```bash
chmod +x build-docs.sh rtd-test.py
```
This directory contains the source files for auto-generating Pipecat's API reference documentation.
## Building Documentation
From this directory, you can build the documentation in several ways:
### Local Build
From this directory:
```bash
# Using the build script (automatically opens docs when done)
./build-docs.sh
# Build docs (warnings shown but don't fail the build)
cd docs/api && uv run ./build-docs.sh
# Or directly with sphinx-build
sphinx-build -b html . _build/html -W --keep-going
# Build with strict mode (warnings treated as errors)
cd docs/api && uv run ./build-docs.sh --strict
```
### ReadTheDocs Test Build
The build script will:
To test the documentation build process exactly as it would run on ReadTheDocs:
```bash
./rtd-test.py
```
This script:
- Creates a fresh virtual environment
- Installs all dependencies as specified in requirements files
- Handles conflicting dependencies (like grpcio versions for Riva)
- Builds the documentation in an isolated environment
- Provides detailed logging of the build process
Use this script to verify your documentation will build correctly on ReadTheDocs before pushing changes.
## Viewing Documentation
The built documentation will be available at `_build/html/index.html`. To open:
```bash
# On MacOS
open _build/html/index.html
# On Linux
xdg-open _build/html/index.html
# On Windows
start _build/html/index.html
```
1. Install documentation dependencies via `uv sync --group docs`
2. Clean previous build output
3. Run `sphinx-build` to generate HTML documentation
4. Open the result in your browser (macOS)
## Directory Structure
```
.
├── api/ # Auto-generated API documentation
├── _build/ # Built documentation
├── _static/ # Static files (images, css, etc.)
├── conf.py # Sphinx configuration
├── api/ # Auto-generated API documentation (created during build)
├── _build/ # Built documentation output
├── conf.py # Sphinx configuration (mock imports, extensions, etc.)
├── index.rst # Main documentation entry point
├── requirements-base.txt # Base documentation dependencies
├── requirements-riva.txt # Riva-specific dependencies
├── build-docs.sh # Local build script
└── rtd-test.py # ReadTheDocs test build script
└── rtd-test.sh # ReadTheDocs test build script (uses pip, not uv)
```
## Notes
## How It Works
- Documentation is auto-generated from Python docstrings
- Service modules are automatically detected and included
- The build process matches our ReadTheDocs configuration
- Warnings are treated as errors (-W flag) to maintain consistency
- The --keep-going flag ensures all errors are reported
- Dependencies are split into multiple requirements files to handle version conflicts
- `conf.py` runs `sphinx-apidoc` during Sphinx's `setup()` phase to generate `.rst` files from Python source
- Sphinx autodoc imports each module to extract docstrings
- Modules with unavailable dependencies are listed in `autodoc_mock_imports` in `conf.py`
- Napoleon extension converts Google-style docstrings to reStructuredText
## Troubleshooting
If you encounter missing service modules:
**Module not appearing in docs:**
1. Verify the service is installed with its extras: `pip install pipecat-ai[service-name]`
2. Check the build logs for import errors
3. Ensure the service module is properly initialized in the package
4. Run `./rtd-test.py` to test in an isolated environment matching ReadTheDocs
1. Check the build output for `autodoc: failed to import` warnings
2. If the module has an unresolvable import dependency, add it to `autodoc_mock_imports` in `conf.py`
3. Verify the module is importable: `uv run python -c "import pipecat.module.name"`
For dependency conflicts:
**Duplicate object warnings:**
1. Check the requirements files for version specifications
2. Use `rtd-test.py` to verify dependency resolution
3. Consider adding service-specific requirements files if needed
These come from re-export modules or Sphinx discovering the same class through multiple import paths. Usually cosmetic.
For more information:
**Docstring formatting warnings:**
- [ReadTheDocs Configuration](.readthedocs.yaml)
- [Sphinx Documentation](https://www.sphinx-doc.org/)
Docstrings use reStructuredText, not Markdown. Common issues:
- Use `Example::` with indented code blocks, not `` ```python ``
- Ensure blank lines between directive content and subsequent sections
- Use `Parameters:` (not `Attributes:`) for dataclass field documentation to avoid duplicate entries

View File

@@ -1,8 +1,16 @@
#!/bin/bash
# Usage: ./build-docs.sh [--strict]
# --strict: Treat warnings as errors (default: warnings only)
SPHINX_OPTS=""
if [ "$1" = "--strict" ]; then
SPHINX_OPTS="-W --keep-going"
fi
# Build docs using uv
echo "Installing dependencies with uv..."
uv sync --group docs --all-extras --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra riva --no-extra mlx-whisper
uv sync --group docs --all-extras --no-extra gstreamer --no-extra local_smart_turn --no-extra moondream --no-extra mlx-whisper
# Check if sphinx-build is available
if ! uv run sphinx-build --version &> /dev/null; then
@@ -14,8 +22,7 @@ fi
rm -rf _build
echo "Building documentation..."
# Build docs matching ReadTheDocs configuration
uv run sphinx-build -b html -d _build/doctrees . _build/html -W --keep-going
uv run sphinx-build -b html -d _build/doctrees . _build/html $SPHINX_OPTS
if [ $? -eq 0 ]; then
echo "Documentation built successfully!"

View File

@@ -4,6 +4,19 @@ import sys
from datetime import datetime
from pathlib import Path
# Fix Pydantic v2 + Sphinx autodoc incompatibility: ConfigDict(extra="allow") fails
# during Sphinx's import because __pydantic_extra__ annotation on BaseModel resolves to
# `Dict[str, Any] | None` whose get_origin() is Union, not dict. Patch the check to
# accept Union-wrapped dict types (i.e., Optional[Dict[str, Any]]).
import pydantic._internal._generate_schema as _pydantic_gs
_ORIG_DICT_TYPES = _pydantic_gs.DICT_TYPES
# Expand the accepted types to include Union (Optional[Dict[str, Any]])
import types
import typing
_pydantic_gs.DICT_TYPES = [*_ORIG_DICT_TYPES, typing.Union, types.UnionType]
# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger("sphinx-build")
@@ -76,16 +89,6 @@ autodoc_mock_imports = [
"einops",
"intel_extension_for_pytorch",
"huggingface_hub",
# riva dependencies
"riva",
"riva.client",
"riva.client.Auth",
"riva.client.ASRService",
"riva.client.StreamingRecognitionConfig",
"riva.client.RecognitionConfig",
"riva.client.AudioEncoding",
"riva.client.proto.riva_tts_pb2",
"riva.client.SpeechSynthesisService",
# MLX dependencies (Apple Silicon specific)
"mlx",
"mlx_whisper", # Note: might need underscore format too
@@ -107,6 +110,8 @@ autodoc_mock_imports = [
"fastapi.middleware",
"fastapi.responses",
"uvicorn",
# Deepgram dependencies
"deepgram",
]
# HTML output settings
@@ -133,6 +138,8 @@ def import_core_modules():
"pipecat.runner",
"pipecat.serializers",
"pipecat.transcriptions",
"pipecat.turns",
"pipecat.extensions",
"pipecat.utils",
]
@@ -177,7 +184,6 @@ def setup(app):
logger.info(f"Source directory: {source_dir}")
excludes = [
str(project_root / "src/pipecat/pipeline/to_be_updated"),
str(project_root / "src/pipecat/examples"),
str(project_root / "src/pipecat/tests"),
"**/test_*.py",

View File

@@ -32,4 +32,5 @@ Quick Links
Services <api/pipecat.services>
Transcriptions <api/pipecat.transcriptions>
Transports <api/pipecat.transports>
Turns <api/pipecat.turns>
Utils <api/pipecat.utils>

View File

@@ -1,5 +1,5 @@
# AI-COUSTICS
AICOUSTICS_LICENSE_KEY=...
AIC_LICENSE_KEY=...
# Anthropic
ANTHROPIC_API_KEY=...
@@ -91,6 +91,9 @@ HEYGEN_LIVE_AVATAR_API_KEY=...
HUME_API_KEY=...
HUME_VOICE_ID=...
# Inception
INCEPTION_API_KEY=...
# Inworld
INWORLD_API_KEY=...
@@ -132,6 +135,10 @@ NOVITA_API_KEY=...
# NVIDIA
NVIDIA_API_KEY=...
# For a full example of how to deploy to SageMaker, see:
# https://github.com/pipecat-ai/pipecat-examples/tree/main/nvidia_sagemaker_example/deployment/aws-sagemaker-nvidia
SAGEMAKER_ASR_ENDPOINT_NAME=...
SAGEMAKER_MAGPIE_ENDPOINT_NAME=...
# OpenAI
OPENAI_API_KEY=...
@@ -207,6 +214,11 @@ TWILIO_AUTH_TOKEN=...
# Ultravox Realtime
ULTRAVOX_API_KEY=...
# Vonage
VONAGE_APPLICATION_ID=...
VONAGE_SESSION_ID=...
VONAGE_TOKEN=...
# WhatsApp
WHATSAPP_TOKEN=...
WHATSAPP_WEBHOOK_VERIFICATION_TOKEN=...
@@ -214,4 +226,10 @@ WHATSAPP_PHONE_NUMBER_ID=...
WHATSAPP_APP_SECRET=...
# xAI / Grok
XAI_API_KEY=...
XAI_API_KEY=...
# PIPECAT_SCTP_MAX_CHUNK_SIZE controls the maximum SCTP DATA-chunk payload
# size (bytes) used by aiortc's data channel. The default is 1100.
# All the details here:
# https://docs.pipecat.ai/api-reference/server/services/transport/small-webrtc#pipecat_sctp_max_chunk_size
#PIPECAT_SCTP_MAX_CHUNK_SIZE=1100

View File

@@ -16,7 +16,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame, MixerEnableFrame, MixerUpdateSettingsFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -34,7 +34,7 @@ from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
load_dotenv(override=True)
OFFICE_SOUND_FILE = os.path.join(
os.path.dirname(__file__), "assets", "office-ambience-24000-mono.mp3"
os.path.dirname(__file__), "../assets", "office-ambience-24000-mono.mp3"
)
# We use lambdas to defer transport parameter creation until the transport
@@ -71,17 +71,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
@@ -105,7 +105,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -120,27 +120,27 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Listening for background sound for a bit...")
await asyncio.sleep(5.0)
logger.info(f"Reducing volume...")
await task.queue_frame(MixerUpdateSettingsFrame({"volume": 0.5}))
await worker.queue_frame(MixerUpdateSettingsFrame({"volume": 0.5}))
await asyncio.sleep(5.0)
logger.info(f"Disabling background sound for a bit...")
await task.queue_frame(MixerEnableFrame(False))
await worker.queue_frame(MixerEnableFrame(False))
await asyncio.sleep(5.0)
logger.info(f"Re-enabling background sound and starting bot...")
await task.queue_frame(MixerEnableFrame(True))
await worker.queue_frame(MixerEnableFrame(True))
# Kick off the conversation.
context.add_message(
{"role": "developer", "content": "Please introduce yourself to the user."}
)
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -54,7 +54,7 @@ from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -108,17 +108,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"), audio_passthrough=True)
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"], audio_passthrough=True)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121",
),
)
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
@@ -146,7 +146,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
params=PipelineParams(
enable_metrics=True,
@@ -161,12 +161,12 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
# Start recording audio
await audiobuffer.start_recording()
# Start conversation - empty prompt to let LLM follow system instructions
await task.queue_frames([LLMRunFrame()])
await worker.queue_frames([LLMRunFrame()])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
# Handler for merged audio
@audiobuffer.event_handler("on_audio_data")
@@ -191,7 +191,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
await save_audio_file(bot_audio, bot_filename, sample_rate, 1)
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

View File

@@ -20,7 +20,7 @@ from pipecat.frames.frames import (
)
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.worker import PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
@@ -102,17 +102,17 @@ transport_params = {
async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
logger.info(f"Starting bot")
stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
llm = OpenAILLMService(
api_key=os.getenv("OPENAI_API_KEY"),
api_key=os.environ["OPENAI_API_KEY"],
settings=OpenAILLMService.Settings(
system_instruction="You are a helpful assistant in a voice conversation. Your responses will be spoken aloud, so avoid emojis, bullet points, or other formatting that can't be spoken. Respond to what the user said in a creative, helpful, and brief way.",
),
)
tts = CartesiaTTSService(
api_key=os.getenv("CARTESIA_API_KEY"),
api_key=os.environ["CARTESIA_API_KEY"],
settings=CartesiaTTSService.Settings(
voice="71a7ad14-091c-4e8e-a314-022ece01c121", # British Reading Lady
),
@@ -144,7 +144,7 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
]
)
task = PipelineTask(
worker = PipelineWorker(
pipeline,
idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
)
@@ -153,17 +153,17 @@ async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
async def on_client_connected(transport, client):
logger.info(f"Client connected")
# Kick off the conversation.
await task.queue_frame(TTSSpeakFrame("Hi, I'm listening!"))
await worker.queue_frame(TTSSpeakFrame("Hi, I'm listening!"))
await transport.send_audio(sounds["ding1.wav"])
@transport.event_handler("on_client_disconnected")
async def on_client_disconnected(transport, client):
logger.info(f"Client disconnected")
await task.cancel()
await worker.cancel()
runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
await runner.run(task)
await runner.run(worker)
async def bot(runner_args: RunnerArguments):

Some files were not shown because too many files have changed in this diff Show More