Empirical testing showed the previous design — grafting a verbose
re-invocation reminder into the payload's `description` field for
started and intermediate messages — was actually making Nova Sonic
*worse*, not better: more spurious re-invocations of the same tool,
not fewer. Plausibly the long, instruction-shaped description text
reads as content the model has to respond to, where a terse status
update reads as ambient state.
Replace the reminder grafting with a caller-supplied `template`
keyword argument on `prepare_message_payload_for_realtime`. When
`None` (the default), the payload is serialized to its canonical
JSON form. When provided, `template.format(tool_call_id=…, status=…,
result=…, description=…)` is applied. The template is honored across
all kinds, so callers route per kind based on which wire channel
they're using.
Nova Sonic now defines its own bracketed plain-text template
(`_ASYNC_TOOL_RESULT_TEXT_TEMPLATE`) and applies it on the
cross-modal user-text channel (intermediate / final). The started
path stays on raw JSON (the formal AWS tool-result channel requires
valid JSON). A code comment at the template constant captures the
empirical finding for the next person — short framing yields much
better behavior, surprising as it sounds.
Tests updated for the new template behavior across all kinds. Also
reverts a stream-tool example sleep-duration tweak (20s → 10s) and
adds a commented-out alternative in the function-calling-openai-async-stream
example for parallel testing.
Reshape the helper module so AsyncToolMessagePayload is the canonical
in-memory form and the on-the-wire JSON is always derived from it
(never stored). This eliminates a drift risk that came with caching
the JSON in raw_content, and it lets prepare_message_payload_for_realtime
edit the payload (graft the re-invocation reminder into 'description')
and then serialize cleanly — which fixes a 'Tool Response parsing error'
from AWS Nova Sonic that was caused by wrapping the JSON with extra
prose.
Other changes:
- Builders construct an AsyncToolMessagePayload internally and convert
via shared private _payload_to_message and _payload_to_json helpers
(centralizing field-omission rules, e.g. no 'result' on 'started').
- prepare_message_payload_for_realtime replaces format_text_for_provider,
dispatching to per-kind helpers. Reminder is now appended after the
canonical description so the model reads the protocol explanation
first and the directive flows from it.
- Final-result payloads are pass-through; the task is done at that
point and re-invocation is no longer a mistake.
- Stream-tool example: lengthen intermediate sleeps 10s → 20s for more
interesting empirical testing.
Add support to AWSNovaSonicLLMService for the new "async tool call"
mechanism activated by `cancel_on_interruption=False`, which includes:
- delivering results asynchronously
- delivering result streams
- cancelling running async tools
Note that the introduction of the new mechanism had actually caused a
regression in AWS Nova Sonic, which previously supported
`cancel_on_interruption=False` with the old mechanism (simply avoiding
discarding tool calls on interruptions).
Support for the other major realtime services (`GeminiLiveLLMService`,
`OpenAIRealtimeLLMService`) will follow in a separate PR — Gemini Live
in particular needs more work before it can support long-running tool
calls reliably.
The pyright job in `format.yaml` previously installed only `--extra
daily --extra tracing`. That was sufficient when most optional-dep-
using files were in the pyright ignore list, but as this PR has
cleared dozens of files, those files now reference symbols from
optional-dep modules (`aiortc.RTCIceServer` via `IceServer`,
`google.genai.types.HttpOptions`, etc.). `reportMissingImports: false`
tolerates the failed imports themselves, but the imported names
become `Unknown` and using them as type expressions trips
`reportInvalidTypeForm` / `reportAttributeAccessIssue` — errors
that aren't gated by that flag.
Switch to `--all-extras --no-extra gstreamer --no-extra local`
(matching the dev setup in README.md), so pyright sees the same
dependency set the code is intended to be type-checked against and
the install-set scales naturally as more files leave the ignore list.
Also reconcile CLAUDE.md's setup command, which only excluded
`gstreamer`. README.md is canonical and additionally excludes
`local` (pyaudio requires `portaudio` native libs that aren't
installed by default on a clean Ubuntu CI runner).
A fourth pass over low-error-count files. Drops 8 files (57 → 49) and
full-pyright errors from 525 → 496. Default pyright stays clean.
Optional access on transport/client receivers (4 files). Same fix
shape as #4359 — a receiver typed `X | None` accessed without a
guard. For "should never happen" cases (caller's lifecycle ensures
the field is non-None when the method runs), used `assert` rather
than silent early-return so an invariant violation surfaces loudly:
- `transports/whatsapp/client.py` (5 errors): `_validate_whatsapp_webhook_request`
was typed `bytes` / `str` but called with `bytes | None` / `str | None`.
Widened the helper signature and pushed the explicit None-check
inside (matching its existing empty-string check). Also handled
`pipecat_connection.get_answer()` returning `None` — would have
crashed at `.get("sdp")` before.
- `transports/websocket/client.py` (5 errors): four are the deprecated
`websockets.WebSocketClientProtocol` alias (same `# pyright: ignore[reportAttributeAccessIssue]`
as the `services/websocket_service.py` fix from earlier in this PR).
The fifth was `async for message in self._websocket` — traced the
call chain and confirmed `_client_task` is created only after
`self._websocket` is assigned and cancelled before it's cleared, so
the field is never None when `_client_task_handler` runs. Used `assert`.
- `services/openai/stt.py` (4 errors): same pattern. `_receive_messages`
is started by `_connect()` only when `self._websocket` is set, and
the reconnect loop in `WebsocketService._receive_task_handler`
re-establishes it before each retry. `assert` at entry. Plus L478/L483:
the `try`/`except ModuleNotFoundError` import-guard makes
`websocket_connect` and `State` `<type> | None`; `__init__` already
raises `ImportError` if either is None, so an `assert` at the
`_connect_websocket` use site is honest. Plus an L538 `Language | str`
cast (same shape as last batch).
- `services/deepgram/flux/base.py` (2 errors): `event = data.get("event")`
flowed into `_handle_turn_resumed(event: str)` as `Any | None`.
Tightened with an `isinstance(event, str)` guard before the
`FluxEventType(event)` lookup. The other error (`average_confidence > min_confidence`
where `min_confidence: float | None`) was a latent crash on missing
confidence data — restored the original `not min_confidence` (which
treats both `None` and `0.0` as "no filter") and added an explicit
drop-on-missing-confidence-data branch.
`gemini_live` Settings/InputParams (vertex). The deprecated `InputParams`
declares `modalities: GeminiModalities | None` and `media_resolution: GeminiMediaResolution | None`,
but their downstream usage at `services/google/gemini_live/llm.py:952,959`
calls `.value` on each — `None` would crash. Rather than touching the
deprecated input model, translate `None` to the canonical defaults
(`GeminiModalities.AUDIO`, `GeminiMediaResolution.UNSPECIFIED`) at the
assignment site in `vertex/llm.py`. Also fixed an unrelated annotation
bug: `_get_credentials` was annotated `-> str` but actually returns
`service_account.Credentials` (used correctly by the caller — only
the annotation was wrong).
`moondream/vision.py` (3 errors). `frame.format` is `str | None` but
`Image.frombytes(mode, ...)` requires `str`; raise instead of crashing
on missing format. The other two errors are pyright thinking the
moondream2-custom `encode_image` and `query` methods are `Tensor`
(rather than callables) — those are provided by the model code via
`trust_remote_code=True` and aren't visible to pyright on the base
`AutoModelForCausalLM` type. Scoped `# pyright: ignore[reportCallIssue]`
on the two call sites.
`transports/base_output.py` (3 errors). Two are `self._mixer.mix(...)`
calls in `with_mixer`, a closure invoked only when `self._mixer` is
truthy at the call site — captured the mixer to a local variable
inside the closure with an `assert`, then used that. Third is the
PIL `frombytes(mode, ...)` shape — `frame.format is None` early-
return guard at the top of `resize_frame` so the main resize logic
reads cleanly.
`elevenlabs/tts.py` (4 errors). The payload-building dict at L1271
was typed `dict[str, str | dict[str, float | bool]]` — an aspirational
shape that matched only the first two assignments. Subsequent code
assigned `list[dict[...]]` (pronunciation locators) and bools, all
violating the annotation. Same pattern at L926 (the WebSocket-init
`msg`). Both widened to `dict[str, Any]`, which is the honest shape
for a JSON request payload and what similar code uses elsewhere.
Files dropped from the ignore list (57 → 49):
services/deepgram/flux/base.py, services/elevenlabs/tts.py,
services/google/gemini_live/vertex/llm.py,
services/moondream/vision.py, services/openai/stt.py,
transports/base_output.py, transports/websocket/client.py,
transports/whatsapp/client.py.
A third pass over low-error-count files in the ignore list. Drops 10
files (67 → 57) and full-pyright errors from 555 → 525. Default
pyright stays clean.
Optional access guards (4 files). The same fix shape as 9e9b1f39e:
a receiver typed `X | None` accessed without a guard, fixed with a
local-var capture or an early return.
- `mistral/stt.py`: `_connection.send_audio` could crash if
`_connect()` swallowed an exception and left `_connection` unset;
drop the audio chunk with a warning instead. `_receive_events`
iterating `_connection.events()` got the same defensive narrowing.
- `deepgram/flux/stt.py`: `_websocket_url` is set in `_connect`
before `_connect_websocket` is called, but pyright doesn't track
that across methods — assert at the use site. `websocket.response`
is `Response | None` in the websockets stubs even though it's
always populated post-handshake; guarded with a fallback.
- `audio/filters/rnnoise_filter.py`: the module-level import sets
`RNNoise` to `None` if `pyrnnoise` isn't installed; raise
`ImportError` explicitly instead of relying on the existing try-
block to catch the `None(...)` call. Also gated `filter()` with
`or self._rnnoise is None` so pyright sees the narrowing.
- `transports/smallwebrtc/request_handler.py`: `get_answer()`
legitimately returns `None`; raise instead of crashing on three
subscript accesses.
`TTSService` `audio-context` API tightening. Mirroring the
`append_to_audio_context` fix from the previous batch:
`remove_audio_context` was typed `str` but is called with `str | None`
from `get_active_audio_context_id()` results. Widened to `str | None`
and the `None` handling lives in the function body (early debug log
+ return) — matching `append_to_audio_context`'s shape.
`audio_context_available` keeps its narrow `str` signature; asking
"is `None` available?" isn't a meaningful question (`_audio_contexts`
is `dict[str, asyncio.Queue]`). The internal call site in
`on_turn_context_completed` narrows `_turn_context_id` explicitly
before passing it. Side effect: deepgram/tts.py's L307 error clears
without local changes.
`deepgram/tts.py` (4 errors → 0): the same `push_error(ErrorFrame(...))`
latent bug we fixed in resembleai earlier in this PR — `push_error`
takes a string; there's a separate `push_error_frame` for frames.
Two sites switched. The Optional `_websocket.response` access is
guarded the same way as deepgram/flux/stt.py. The `remove_audio_context`
error was cleared by the tightening above.
`aws/utils.py` (3 errors → 0): `AWSTranscribePresignedURL` declared
`session_token: str` but the dict source is `str | None` (AWS
supports long-term IAM creds without a session token). Same for
`vocabulary_name`/`vocabulary_filter_name` on `get_request_url`,
which were typed `str = ""` even though the body uses truthy checks
to skip them. Widened to `str | None = None` — matches actual
runtime semantics.
`audio/dtmf/utils.py` (2 errors → 0): `files("...").joinpath(...)`
returns a `Traversable`, but `aiofiles.open` wants a real path. For
regular pip installs this worked in practice (Traversable was a
`Path`), but it would fail for zipped distributions (zipapp,
zipimport) where the resource isn't on disk. Wrapped in
`importlib.resources.as_file(...)` — the canonical bridge that
extracts to a temp file when the resource isn't already on the
filesystem. Validated end-to-end: regular install still reads bytes;
ad-hoc zipapp test confirmed `as_file` extracts the resource and
returns a real Path.
`openai/image.py` (2 errors → 0): the `size` arg to
`images.generate` is `Literal[...] | None` in the SDK but our
settings field is `str | None`. Mirrored the `groq/tts.py`
hint-not-constraint pattern from the previous batch: defined a
module-level `OpenAIImageSize = Literal[...]` alias with a comment
attributing the upstream symbol and documenting the cast contract
(callers can pass any string; invalid values surface as an OpenAI
API error). Also guarded `image.data[0]` (response.data is
`list[Image] | None`).
`processors/frameworks/{langchain,strands_agents}.py` (4 + 4 → 0):
both processors do `messages[-1]["content"]` on a value typed
`LLMStandardMessage | LLMSpecificMessage` (the latter is a dataclass,
not a dict, so `__getitem__` errors). Historically these only
handled plain-text user messages, so the fix is two explicit guards
(skip if the last message isn't a dict; skip if `content` isn't a
string) plus a TODO noting that other shapes (multi-modal content,
provider-specific messages) aren't supported yet. langchain's
`__get_token_value` also got a small fix where `AIMessageChunk.content`
is `str | list[parts]` but the function declares `-> str`; stringify
the list case. strands_agents' surfaced two unrelated narrows: a
`graph_exit_node: str | None` arg gated by an `__init__`-time assert,
and `agent.stream_async` reached only when we're not in graph mode.
Files dropped from the ignore list (67 → 57):
audio/dtmf/utils.py, audio/filters/rnnoise_filter.py,
processors/frameworks/langchain.py,
processors/frameworks/strands_agents.py, services/aws/utils.py,
services/deepgram/flux/stt.py, services/deepgram/tts.py,
services/mistral/stt.py, services/openai/image.py,
transports/smallwebrtc/request_handler.py.
A second pass over the low-error-count files in the ignore list. Drops
10 files (77 → 67) and full-pyright errors from 580 → 555. Default
pyright stays clean.
Three coherent shapes plus a handful of one-offs:
`Language | str | None` → `Language | None` at STT frame boundaries.
`assert_given(self._settings.language)` returns `Language | str | None`
(strips `_NotGiven`, keeps the rest), but `TranscriptionFrame.language`
expects `Language | None`. In practice both `_settings.language` and
SDK-supplied codes resolve to a `Language` enum value, but technically
they could be raw strings — and `Language` is a StrEnum, so downstream
consumers (which mostly compare/serialize as strings) handle either.
Used `cast("Language | None", ...)` at each call site rather than a
runtime-validating helper, so an unrecognised code (e.g. one we
haven't added to the enum yet) still flows through unchanged. Cleared
azure/stt.py, aws/stt.py, gradium/stt.py; mistral/stt.py keeps the
cast at the SDK boundary (storing under `_detected_language: Language
| None`) but stays in the ignore list because of two unrelated
Optional-access errors.
aiobotocore `async with` stub gap. `aioboto3.Session().client(...)`
is an async context manager at runtime but its stubs don't advertise
`__aenter__`/`__aexit__` to pyright. Scoped
`# pyright: ignore[reportGeneralTypeIssues]` on the two affected
sites: aws/agent_core.py and aws/tts.py. aws/tts.py also had a latent
bug on the no-`AudioStream` path: the original code set
`audio_data = None` and then crashed in `resample(...)` and
`len(audio_data)` below; replaced with an early `return` after
logging — matches the convention elsewhere (OpenAI TTS, etc.) of not
recording usage metrics on the error path.
heygen `event_id: str | None` → `str` at transport→client boundary.
Three call sites in transports/heygen/transport.py passed `self._event_id`
(`str | None`) into client methods that take `str`. Added a guard at
each: `agent_speak_end` and `interrupt` only fire when `_event_id` is
set; `write_audio_frame` warn-and-drops when there's no active bot
event rather than sending a malformed message.
`OpenAIResponsesLLMInvocationParams` TypedDict.
`get_llm_invocation_params` always sets both `input` and `tools` in
the same dict literal, but the TypedDict was `total=False` so direct
subscript access (`invocation_params["input"]`) tripped
`reportTypedDictNotRequiredAccess` in services/openai/responses/llm.py.
Marked both keys `Required[...]`; `instructions` stays non-required
since it's only added when a system instruction is present.
Latent bug in heygen/api_interactive_avatar.py: the code accessed
`request_data.voice.voiceId` and `request_data.voice.elevenlabsSettings`,
but those names are Pydantic *aliases*; the actual attribute names
(used for attribute access) are `voice_id` and `elevenlabs_settings`.
Switched to the field names — those camelCase accesses would have
raised AttributeError at runtime if `voice` was set.
Other small fixes:
- assemblyai/stt.py: the deprecated `connection_params=` init path
was reading `formatted_finals` and `word_finalization_max_wait_time`
off `AssemblyAIConnectionParams`, but those fields were never on
the deprecated input model — they were added to Settings later.
Removed the reads (with a comment noting they're only available
via the canonical `settings=...` API); the deprecated input model
is unchanged.
- rtvi/processor.py: two `about: Mapping[str, Any] = None` parameter
signatures — declared `Mapping`, defaulted to `None`, and both
function bodies already handled the None case. Widened to
`Mapping[str, Any] | None = None`.
- aws/stt.py: `subprotocols=["mqtt"]` failed against websockets'
`Sequence[Subprotocol] | None` (Subprotocol is a NewType wrapper).
Wrapped: `subprotocols=[Subprotocol("mqtt")]`.
Files dropped from the ignore list (77 → 67):
processors/frameworks/rtvi/processor.py, services/assemblyai/stt.py,
services/aws/agent_core.py, services/aws/stt.py, services/aws/tts.py,
services/azure/stt.py, services/gradium/stt.py,
services/heygen/api_interactive_avatar.py,
services/openai/responses/llm.py, transports/heygen/transport.py.
Several adjacent fix shapes that together drop 19 files from the
pyrightconfig.json ignore list (96 → 77) and full-pyright errors from
605 → 580. Default pyright stays clean.
TTS voice/context_id None handling — most files in this batch had a
single error of the shape "value typed `T | None` passed where `T` is
required" coming out of `assert_given(self._settings.voice)` (which
strips `_NotGiven` but not `None`) or `get_active_audio_context_id()`.
Two patterns:
- For services where a missing voice means the request can't proceed
(hume, openai, xtts, groq, kokoro, piper), added an explicit None
check. Inside `run_tts` we yield an `ErrorFrame` and return — matching
each service's existing error-emission style (a few wrap `Exception`
broadly and were fine; openai/hume/xtts had narrower or no try blocks
so a bare `raise ValueError` would have escaped uncaught). Piper
validates in `__init__`, where failing fast at construction is the
right shape. OpenAI also gained a `voice not in VALID_VOICES` guard
with a clear message listing supported voices.
- For services where a missing audio context just means "skip this
message" (fish, lmnt, smallest, sarvam, neuphonic), widened
`TTSService.append_to_audio_context`'s `context_id` signature to
`str | None`. The function body already explicitly handled the None
case with a debug log + early return, so the prior `str` annotation
was a lie; making it honest cleared call sites without local guards.
inworld's `_close_context` got the same treatment.
google.genai imports — switched `from google import genai` to
`import google.genai as genai` in google/image.py and google/llm.py.
The dotted form sidesteps a PEP 420 namespace-package stub gap (the
`google` namespace stubs come from a different distribution and don't
declare `genai`), which means pyright now resolves `genai` to the
real module rather than `Unknown`. IDE autocomplete on `genai.<x>`
works for the first time. In image.py this surfaced three latent
bugs that the `Unknown` resolution had been hiding (model was
`str | _NotGiven | None` not narrowed before passing to the SDK; two
spots accessed `.image_bytes` on an `Image | None` without a guard) —
all fixed. llm.py's dotted import surfaced 8 errors (Content-list
typing nuances, internal `_api_client` access, a few small Optionals);
deferred to a future pass since they're outside this commit's scope,
so the file stays in the ignore list with the dotted import.
Latent bug fixes spotted along the way:
- resembleai/tts.py was calling `push_error(ErrorFrame(...))`, but
`push_error` takes a string — there's a separate `push_error_frame`
for the frame case. Switched to the right method.
- openai/base_llm.py: `max_completion_tokens` was the only sibling
field on `OpenAILLMSettings` missing `| None` in its type, which
caused the assignment in openai/llm.py from `params.max_completion_tokens`
(`int | None`) to fail. Added `| None` for consistency with
`max_tokens` etc.
- heygen/base_api.py: `livekit_url: str = None` and `ws_url: str = None`
declared `str` while defaulting to `None`. Removed the bogus
defaults — both fields are required at construction in every
in-tree call site, and the previous `str = None` was a Pydantic
footgun.
Other small ones: gladia/stt.py needed a None guard on `_session_url`
before `websocket_connect`; openrouter/llm.py's
`build_chat_completion_params` override widened to `dict[str, Any]`
diverging from the parent's `OpenAILLMInvocationParams` — restored
the parent's type; neuphonic/tts.py guarded the receive loop's
`async for message in self._websocket` with a local-variable narrowing
matching the pattern from 9e9b1f39e.
groq/tts.py: tightened `output_format`'s typing to
`Literal["flac","mp3","mulaw","ogg","wav"] | str = "wav"`. The literal
side gives IDE autocomplete hints for the currently-supported set;
the `| str` side keeps callers unblocked if groq adds a new format
before this list is updated. A `cast` at the API boundary satisfies
groq's stricter `Literal` parameter type. The literal alias mirrors
the inlined Literal on `groq.resources.audio.speech.AsyncSpeech.create`'s
`response_format` (the SDK doesn't export it as a named symbol).
websocket_service.py: scoped `# pyright: ignore[reportAttributeAccessIssue]`
on `websockets.WebSocketClientProtocol`. That alias is now a deprecated
re-export from the legacy submodule and pyright doesn't surface it
on the top-level `websockets` namespace; runtime is fine. Migrating
to `websockets.ClientConnection` is a separate piece of work
(transports/websocket/client.py uses the same alias four times) and
left for a future commit.
Files dropped from the ignore list: fish/tts.py, gladia/stt.py,
google/image.py, groq/tts.py, heygen/base_api.py, hume/tts.py,
inworld/tts.py, kokoro/tts.py, lmnt/tts.py, neuphonic/tts.py,
openai/llm.py, openai/tts.py, openrouter/llm.py, piper/tts.py,
resembleai/tts.py, sarvam/tts.py, smallest/tts.py,
websocket_service.py, xtts/tts.py.
Same approach as the previous round — apply boundary casts where the
code does dict-style mutation on TypedDict-typed values, narrow at
return sites, and document the LLMSpecificMessage limitation in
realtime adapters that pack history into a single text message.
aws_nova_sonic_adapter.py — pure typing + small narrowing fixes:
- Filter LLMSpecific items in `_from_universal_context_messages`
(documented).
- `_from_universal_context_message` now declared
`-> AWSNovaSonicConversationHistoryMessage | None` (it already had
paths returning None implicitly).
- `get_messages_for_logging` returns `dict[str, Any]` per element
via `dataclasses.asdict`, matching the declared return type.
- Use a local `role` variable so pyright keeps the narrowing across
the truthy-content guard.
grok_realtime_adapter.py / inworld_realtime_adapter.py — same shape
of fix as `open_ai_realtime_adapter.py` from the previous batch.
The two files are essentially copies of the OpenAI Realtime adapter,
so the same template applies: cast at the boundary, filter
LLMSpecificMessage with a documented note, replace the implicit-None
fallthrough with `raise ValueError`, and switch the `text_content +=`
pattern (which fails when one of the parts is None) to a
`text_parts.append(...)` + `" ".join(...)` pattern.
open_ai_adapter.py — pure typing. Cast at the
`OpenAILLMInvocationParams` return, narrow the system-instruction
warning's `initial_content` to `str | None`, and cast the custom-tools
list to `list[ChatCompletionToolParam]`.
open_ai_responses_adapter.py — pure typing. Same shape: narrow
`first_content` to `str | None` for the warning resolver, cast the
constructed dict literals at append sites where the target is
`ResponseInputItemParam`, and cast `get_messages_for_logging`'s
return to the declared `list[dict[str, Any]]`.
processors/aggregators/llm_context.py — pure typing. Cast the
deepcopied message in the redaction loop in `get_messages` to
`dict[str, Any]` and the create_image/audio_message return-dict
literals to `LLMContextMessage`.
Removes 6 newly-clean files from the pyright ignore list.
Net: -77 pyright errors (full-config: 680 -> 603).
Same shape of fix we applied to anthropic_adapter.py earlier — these
adapters do dict-style mutation on values typed as
ChatCompletionMessageParam (a union of TypedDicts) or against Optional
fields. Apply boundary casts (`cast(dict[str, Any], ...)` for the
mutation block, cast back to the TypedDict at return sites). Most
changes are pure typing (rename + cast); a handful in gemini and
openai_realtime are small defensive bug fixes for code paths that
were latently broken by Optional fields slipping through:
perplexity_adapter.py — pure typing. Cast the deepcopied messages to
`list[dict[str, Any]]` for the role-merging / system-conversion /
trailing-assistant-removal transformations and cast back to
ChatCompletionMessageParam at the return.
bedrock_adapter.py — pure typing. Cast the message to
`dict[str, Any]` at the top of `_from_standard_message` for the
tool-result / tool-use / image-content transformations. Cast the
constructed dict at the return site of `get_llm_invocation_params`.
gemini_adapter.py — typing + several None guards on Content.parts and
related Optional fields. Each guard turns a latent
`TypeError`/`AttributeError` (when the type-system-allowed None
showed up at runtime) into a defensive skip — the type annotations
say these can be None and we now handle that.
open_ai_realtime_adapter.py:
- Typing: cast the deepcopied messages, cast back where needed.
- LLMSpecificMessage handling: previously the function would crash on
the first `.get()` call if any LLMSpecificMessage was in the list.
Filter them out and document the limitation — this adapter's
pack-into-single-text-message strategy doesn't compose with opaque
per-provider payloads.
- Real bug fix: `events.ConversationItem` is a Pydantic BaseModel,
not a TypedDict. The bulk-packing path was constructing a raw dict
where a ConversationItem was expected. Replaced with proper
constructor calls (matches what the single-user-message path
already does).
- Real bug fix: `_from_universal_context_message` was declared
`-> events.ConversationItem` but on the unhandled-message
fallthrough it logged and returned None implicitly. Raise
ValueError so the violation is loud, not silent.
Removes 4 newly-clean files from the pyright ignore list:
adapters/services/{perplexity,bedrock,gemini,open_ai_realtime}_adapter.py.
Net: -95 pyright errors (full-config: 775 -> 680).
Six pyright errors followed the same pattern: a value flowed out of
`self._settings.X` (typed `T | _NotGiven`) into a context that wanted
the plain `T`. Wrap each with `assert_given(...)` so the sentinel
gets stripped at the boundary:
- aws/nova_sonic/llm.py: `_settings.model` (in InvokeModel...Input)
and `_settings.system_instruction` (passed to the adapter).
- deepgram/flux/base.py: iterating `_settings.keyterm`.
- google/stt.py: iterating `_settings.languages`.
- google/tts.py: iterating `_settings.speaker_configs`.
- openai/base_llm.py: `_settings.system_instruction` passed to the
adapter.
Also takes a deeper pass at the related Google STT issue: the override
of `language_to_service_language` had been broadened to take
`Language | list[Language]` and return `str | list[str]`, a Liskov
violation against the base's `Language -> str | None` contract.
External callers always pass a single Language, and the only consumer
of the list path was Google STT's own `_get_language_codes`. Restore
the override to a single-Language signature and let
`_get_language_codes` iterate. The override is also tightened to
return `str` (narrower than the base's `str | None`, which is
LSP-compatible) since it always falls back to `"en-US"` rather than
returning None.
Net: -7 pyright errors (full-config run: 782 -> 775).
These provider-specific helpers are all thin wrappers around
`resolve_language(...)`, which itself returns `str` — never `None`.
The `str | None` annotations were misleading and were producing
spurious pyright errors at the call sites that assigned the result
into a `str` field. Update each helper's signature to `str` and
rewrite the `Returns:` docstring to describe the actual fallback
behaviour (resolve to base or full code, with a warning).
Importantly, the per-class `language_to_service_language(...)`
methods on `STTService`/`TTSService` subclasses keep `str | None` as
their return type. That signature is an extension hook for future
and/or third-party subclasses that may genuinely not be able to
produce a code for some languages, even though all in-tree first-
party services currently return a string.
Also includes one small unrelated tightening in azure/stt.py: wrap
`self._settings.language` with `assert_given(...)` so the truthy
fallback to `language_to_azure_language(Language.EN_US)` doesn't
silently swallow a NotGiven sentinel.
Net: -3 pyright errors (full-config run: 785 -> 782).
Pyright flagged 19 sites where `await self._<connection>.send/recv/...`
was called on a receiver typed `X | None`. Each kind of call site
needed a slightly different fix to be both type-safe and behaviour-
preserving:
Streaming/user-facing paths (early return + warn — drop and warn is
the right runtime fail-safe when reconnect didn't succeed):
- cartesia/stt.py (run_stt)
- soniox/stt.py (_send_keepalive)
- elevenlabs/tts.py (run_tts — yields ErrorFrame and returns)
- deepgram/sagemaker/tts.py (run_tts)
- transports/lemonslice/transport.py (send_message)
- transports/tavus/transport.py (send_message)
"Should never happen" cases (early return with comment, no warn —
caller already gated on a separate `_is_*` check, so a warn would be
noise):
- deepgram/flux/stt.py (transport methods, gated by _transport_is_active)
- deepgram/flux/sagemaker/stt.py (same)
- stt_service.py (_send_keepalive, gated by _is_keepalive_ready)
- elevenlabs/stt.py (_send_keepalive, same)
- llm_service.py (_ws_recv — raises ConnectionError to match
_ensure_connected's contract)
- heygen/client.py (receive loop, gated by self._connected)
Just-assigned-above (use a local variable so pyright keeps the
narrowing across statements):
- lmnt/tts.py
- gradium/stt.py
- fish/tts.py
Other:
- transports/websocket/server.py — used the existing local `websocket`
parameter in scope instead of `self._websocket` for the close call.
- websocket_service.py — `send_with_retry` raises ConnectionError when
`self._websocket` is None inside the existing try-block, so the
broad `except Exception` triggers reconnect just as it would on a
real send failure (preserving the prior behaviour where None
silently fell through to the AttributeError-driven reconnect path).
Drops three now-clean files from the pyright ignore list: cartesia/stt.py,
elevenlabs/stt.py, and soniox/stt.py.
After making LLMService generic, an unparameterized subclass
(`class MyService(LLMService):` with no bracket — the third-party
provider pattern) saw `get_llm_adapter()` return `Unknown` rather
than `BaseLLMAdapter` as it did before the refactor.
Add `default=BaseLLMAdapter` (PEP 696) on the TypeVar — via
`typing_extensions.TypeVar` so older Python targets keep working —
so unparameterized callers get `LLMService[BaseLLMAdapter]` and
`get_llm_adapter()` returns `BaseLLMAdapter`, matching the
pre-refactor type precision.
Two internal fallouts of having a default (where the default makes
unannotated `LLMService` resolve invariantly to
`LLMService[BaseLLMAdapter]`):
- `FunctionCallParams.llm` is now `LLMService[Any]` so concrete
parameterizations like `LLMService[OpenAILLMAdapter]` can be
passed where the field is set.
- The explicit `LLMService.__init__(self, **kwargs)` in
`WebsocketLLMService.__init__` gets a `pyright: ignore[reportArgumentType]`
comment — pyright's invariance handling can't see through the
multi-inheritance + generic + default combination, but the
runtime call is correct (generics are erased).
Two follow-ups now that LLMService is generic over its adapter:
- Add an explicit backward-compat test verifying that an LLMService
subclass with no generic parameter (the third-party-provider
pattern) instantiates and returns a usable adapter. The existing
MockLLMService (declared without brackets) already exercised this
implicitly, but it's worth a named assertion.
- Drop the now-redundant `params: SomeLLMInvocationParams = ...`
variable annotations on `adapter.get_llm_invocation_params()`
results. Since `get_llm_adapter()` now returns the precise adapter
type, and `BaseLLMAdapter` is generic in its invocation-params
type, the call already infers the right TypedDict.
Previously, `LLMService.get_llm_adapter()` returned `BaseLLMAdapter`,
which forced every caller that wanted the precise adapter type to
write `adapter: SomeAdapter = self.get_llm_adapter()` and accept
pyright's complaint that the assignment doesn't match the declared
type. That pattern existed in 17 places across the LLM services.
Make `LLMService` generic over its adapter type — `LLMService(...,
Generic[TAdapter])` with `TAdapter = TypeVar("TAdapter",
bound=BaseLLMAdapter)` — so subclasses opt in via
`LLMService[XAdapter]` and callers get the precise type back from
`get_llm_adapter()` automatically.
Backward-compatible for third-party providers: code that says
`class MyService(LLMService):` (no bracket) still type-checks, with
TAdapter resolving to BaseLLMAdapter from the bound — identical to
the pre-refactor behavior. The `adapter_class` attribute keeps its
loose `type[BaseLLMAdapter] = OpenAILLMAdapter` typing so the default
remains usable; one localized cast in `__init__` bridges the loose
class attr to the precise instance attr.
In-tree subclasses opted in:
- AnthropicLLMService -> LLMService[AnthropicLLMAdapter]
- AWSBedrockLLMService -> LLMService[AWSBedrockLLMAdapter]
- AWSNovaSonicLLMService -> LLMService[AWSNovaSonicLLMAdapter]
- BaseOpenAILLMService -> LLMService[OpenAILLMAdapter] (propagates to
~15 OpenAI-compatible providers like Cerebras, Groq, Together)
- GeminiLiveLLMService -> LLMService[GeminiLLMAdapter]
- GoogleLLMService -> LLMService[GeminiLLMAdapter]
- GrokRealtimeLLMService -> LLMService[GrokRealtimeLLMAdapter]
- InworldRealtimeLLMService -> LLMService[InworldRealtimeLLMAdapter]
- OpenAIRealtimeLLMService -> LLMService[OpenAIRealtimeLLMAdapter]
- _BaseOpenAIResponsesLLMService -> LLMService[OpenAIResponsesLLMAdapter]
- WebsocketLLMService is also generic so the multi-inheritance case
(OpenAIResponsesLLMService) can keep both bases agreeing on TAdapter.
All 17 redundant `adapter: SomeAdapter = self.get_llm_adapter()`
annotations are now plain `adapter = self.get_llm_adapter()`.
Same pattern as the earlier get_setup_params fix: when context tools
are absent, the fallback `adapter.from_standard_tools(self._tools)`
can return the NotGiven sentinel, and `_send_prompt_start_event`
expects a list. Coerce via `or []` so the NotGiven case becomes an
empty list.
Three small changes that resolve pyright errors and sharpen the logic:
- Guard `self._context` with the codebase's "should never happen"
early-return pattern, so we don't blindly call `.get_messages()` on
None.
- Skip `LLMSpecificMessage` items in the iteration. They're opaque
provider-specific payloads with no `.get()`, and the surrounding
logic only applies to standard tool-result messages.
- Match `role == "tool"` explicitly. The previous truthy-only check
was working by accident — the `tool_call_id` filter further down
was effectively narrowing to tool messages, but the intent is
clearer when stated upfront.
reset_conversation is part of the public AWSNovaSonicLLMService API and
is also called internally from the receive-task error handler.
Previously it captured `self._context` (typed `LLMContext | None`) and
unconditionally passed it to `_handle_context`, which expects a real
context — silently doing the wrong thing if no initial context had
been received yet.
Treat that as developer error: log a warning and return early. Nothing
to preserve means nothing to reset.
The service implements the NovaSonicSessionSender protocol so the
session-continuation helper can target either the current or next
session. The protocol declares
`get_setup_params(self) -> tuple[str | None, list]`, but the
implementation was unannotated and could return NotGiven in the tools
position when from_standard_tools fell through to its NotGiven
sentinel. Add the matching return annotation and coerce the NotGiven
case to an empty list.
Same MessageParam content-typing issue as the consecutive-message merge
fix: pyright doesn't carry the str-to-list narrowing forward, and
Iterable has no `[-1]` access. Cast to `list[Any]` and document the
chain of assumptions (list, non-empty, dict-typed last item) and where
each is upheld upstream.
This brings anthropic_adapter.py to 0 pyright errors (down from 115).
The function takes an OpenAI ChatCompletionMessageParam (a union of
TypedDicts) and returns an Anthropic MessageParam (a different
TypedDict). It does the conversion via dict-level mutations that don't
type-check against either side's TypedDict schema. Work with the
deepcopied message as a plain dict and cast to MessageParam at the
return sites — matching the boundary-cast convention noted in
llm_context.py.
Drops anthropic_adapter.py from 20 to 2 pyright errors.
The fallback path in `_from_universal_context_message` returns
`message.message` from an `LLMSpecificMessage`, which is typed loosely
(`Any | dict`). The surrounding comment already documents the
assumption that the message is already in Anthropic format — make that
assumption explicit to pyright with a cast.
MessageParam types content as `str | Iterable[...]`, and Iterable has
no `.extend()`. After the str-to-list conversions, pyright re-reads
the TypedDict field as the original wide type rather than carrying the
narrowing forward. Cast to `list[Any]` to express the codebase's
existing str-or-list assumption.
Drops anthropic_adapter.py from 23 to 21 pyright errors.
Content items in MessageParam have a heterogeneous union type (Pydantic
ContentBlock variants and TypedDict *BlockParam variants), neither of
which supports the dict-style access and mutation this sanitizer does.
Treat the deepcopied message as a plain dict and guard each content
item with isinstance(item, dict) — matches the runtime shape produced
by _from_standard_message and avoids crashing if a non-dict ever flows
through the LLMSpecificMessage path.
Drops anthropic_adapter.py from 115 to 23 pyright errors.
Adds a `mip_opt_out` init parameter to both `DeepgramTTSService` (WebSocket)
and `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model
Improvement Program. When set, the value is forwarded as a query parameter
on the request, matching the pattern used by the Deepgram STT services.
Broaden `tool_resources` to `app_resources` for easy access not just in
tool handlers but in other places like custom `FrameProcessor`s.
Involves 3 changes:
- A rename: `tool_resources` -> `app_resources`
- A new property on `PipelineTask`: `app_resources`
- A new property on `FrameProcessor`: `pipeline_task`
Usage in tool handler:
async def get_weather(params: FunctionCallParams):
resources = cast(MyAppResources, params.app_resources)
...
Usage in custom `FrameProcessor`:
class MyProcessor(FrameProcessor):
async def process_frame(self, frame, direction):
await super().process_frame(frame, direction)
if self.pipeline_task is not None:
resources = cast(MyAppResources, self.pipeline_task.app_resources)
...
The previous `tool_resources` aliases (on `PipelineTask`,
`FunctionCallParams`, and `FrameProcessorSetup`) keep working but are
deprecated as of 1.2.0 and emit `DeprecationWarning`s.
The four krisp test files installed a process-wide mock of
importlib.metadata.version with `patch(...).start()` at module level and
never called .stop(). Once any of these files was collected, the mock
leaked across the rest of the test session, returning '0.0.0-dev' for
every version check. This corrupted unrelated tests that triggered
transformers' import-time dependency check (e.g. lazy imports of
LocalSmartTurnAnalyzerV3) — transformers saw tqdm=='0.0.0-dev' and
refused to load.
Wrap the pipecat imports in `with patch(...)` so the mock is active
during import (where pipecat's krisp version check needs it) and torn
down before any tests run.
Importing pipecat.turns.user_turn_strategies pulled in
LocalSmartTurnAnalyzerV3 → transformers → onnxruntime at module load
time. Since this module is imported by llm_response_universal (and
therefore most LLM services), any LLM service import paid the cost of
loading transformers and triggered its missing-backend warning in
environments without PyTorch/TF/Flax.
Move the LocalSmartTurnAnalyzerV3 import into
default_user_turn_stop_strategies() so it only loads when the default
smart-turn strategy is actually constructed.
Fixes#4392
The non-200 branch yielded an ErrorFrame and then raised, which the outer
except caught and yielded a second, less informative "Unknown error" frame.
Return after the yield and fold the status code into the message.
Pyright flagged the .post() call on a possibly-None _session. Raise a
clear RuntimeError if start() wasn't called instead of crashing on the
attribute access.
SPELL/EMOTION_TAG/PAUSE_TAG/VOLUME_TAG/SPEED_TAG are stateless and worked
only via class-level access. Decorating them lets instance access work too
and silences the missing-self lint warning.
- Bump default cartesia_version to 2026-03-01.
- Replace deprecated use_original_timestamps with use_normalized_timestamps
so word timestamps match what was actually spoken.
- Add max_buffer_delay_ms init arg; auto-derive 0 in SENTENCE mode to avoid
the doc-warned "middle ground" of client + server buffering, leave unset
in TOKEN mode for managed buffering.
- Silently consume flush_done messages now emitted per transcript when
server-side buffering is disabled.
Adds a `session_id: str | None` field to `RunnerArguments` so bots can
log/trace a per-session identifier in local development the same way
they can in Pipecat Cloud (where it is provided via the
`x-daily-session-id` header).
The local runner now mints a UUID at every `*RunnerArguments`
construction site. For paths that already returned a `sessionId` to the
caller (Daily `/start`, dial-in webhook), a single UUID is now generated
and shared between `runner_args.session_id` and the response body
instead of being thrown away. The SmallWebRTC `/api/offer` endpoint
accepts an optional `session_id` so the `/sessions/{session_id}/...`
proxy can thread it through.
This is the prerequisite step for collapsing pipecat-cloud's
`SessionArguments` / `*SessionArguments` hierarchy onto the upstream
runner types.
So the rendered changelog has the (PR [...]) line aligned as a list
continuation under its bullet. Verified with both short and wrapped
entries via `towncrier build --draft`.
Introduce SonioxTTSService, a WebSocket TTS provider that streams text and
receives audio over a persistent connection, multiplexing up to 5 concurrent
streams per socket via Soniox's `stream_id`. Also updates the README service
table and the Soniox voice example to use the new TTS end-to-end.
Replaces the hardcoded camera publishing send settings in
DailyTransport with a new DailyParams.camera_out_send_settings dict that
applications can pass through verbatim to the Daily client. This makes
the encoding/codec/bitrate configuration user-controllable instead of
being driven solely by the generic TransportParams fields.
As a consequence, TransportParams.video_out_bitrate is deprecated for
the Daily transport (now configured via camera_out_send_settings) and
its default is changed to None.
Adds a dedicated screen video track alongside the existing camera track
so applications can publish to Daily's built-in "screenVideo" destination
via video_out_destinations. The track is created at join time and wired
into the client settings (inputs and publishing) when "screenVideo" is
configured; write_video_frame routes frames to the appropriate track
based on the frame's transport_destination.
Bound methods are created fresh on each attribute access, so
'self._missing_function_call_handler is self._missing_function_call_handler'
is always False. Using 'is' meant the placeholder branch never fired and
both warnings logged when a function was missing at queue time.
Switch to == so equality compares the underlying function and instance.
Strengthen the missing-at-queue-time test to assert the second warning
does not fire.
Address review feedback: a function may be unregistered between when
run_function_calls queues it and when _run_function_call executes it.
Restore the live lookup, falling back to the missing-function handler
when the entry is gone, so the call still terminates with a normal
tool result. Factor the missing-handler item construction into a
helper since it's now built in two places.
Runner-created Daily rooms previously had no expiration when callers
posted partial `dailyRoomProperties` (e.g. `{"start_video_off": true}`).
The model-default `exp=None` and `eject_at_room_exp=False` meant Daily's
cron never cleaned them up, so rooms accumulated indefinitely.
Encode the policy in the runner: define `PIPECAT_CLOUD_ROOM_EXP_HOURS=4.0`,
inject `exp` and `eject_at_room_exp=True` into user-supplied properties via
`setdefault` (so explicit caller values still win), and pass
`room_exp_duration` to all four `configure()` call sites.
* VIVA SDK TT v3 support
* Format fix.
* Renamed the API naming, removed '3' from the name.
* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.
* TT demo tool
* Some improvements for demo scripts, audio recordin, etc.
* Enhance demo scripts with VAD selection and audio embedding features. Updated HTML report to include annotated audio players and improved response time metrics in summary formatting. Added README for setup and usage instructions.
* Refactor interrupt prediction demo to compare multiple interruption strategies (Krisp IP vs VAD). Updated README with usage instructions and output details. Enhanced audio processing with new helper functions for generating beeps and mixing audio.
* Refactor demo scripts to improve latency metrics by introducing total_delay property in TurnEvent. Update formatting in reports and visualizations to reflect accurate speech end times, including VAD wait times. Enhance HTML report with detailed latency information and adjust audio processing to account for VAD stop seconds.
* Add audio resampling functionality and update demo scripts for improved audio processing
- Introduced `resample_audio` function to handle audio resampling with linear interpolation.
- Updated `demo_audio_recorder.py` to utilize the new resampling feature, ensuring audio is saved at the requested sample rate.
- Modified `demo_interrupt_prediction.py` and `demo_turn_taking.py` to resample audio to 16 kHz for compatibility with Silero VAD.
- Adjusted imports in demo scripts to include the new resampling function.
- Enhanced error handling for sample rate discrepancies in audio recording.
* Enhance demo_interrupt_prediction.py with VAD type selection and improved processing logic
- Added support for selecting between "silero" and "krisp" VAD engines in the demo script.
- Introduced a new create_vad function to configure VAD analyzers based on the selected type.
- Updated audio processing logic to handle VAD type-specific resampling and state management.
- Modified the KrispVivaIPUserTurnStartStrategy to utilize a separate vad_flag for per-frame VAD input, improving interruption detection accuracy.
* Refactor audio processing scripts for improved readability and consistency
- Updated type hinting in `resample_audio` function to use `tuple` instead of `Tuple`.
- Simplified print statements in `demo_audio_recorder.py`, `demo_formatting.py`, and `demo_interrupt_prediction.py` for better readability.
- Adjusted argument formatting in `demo_audio_recorder.py` and `demo_formatting.py` for consistency.
- Cleaned up list comprehensions in `demo_formatting.py`, `demo_html_report.py`, and `demo_interrupt_prediction.py` for clarity.
- Enhanced error handling in `__init__.py` for the KrispVivaIPUserTurnStartStrategy import.
* Refactor VAD handling in KrispVivaIPUserTurnStartStrategy and update tests for clarity
- Simplified the argument formatting in the _handle_vad_started method for improved readability.
- Updated test assertions to reflect changes in VAD processing logic, ensuring that the vad_flag is correctly set to False during continuous state processing.
- Enhanced test cases to verify that the process method is called appropriately under different conditions.
* more format fixes.
* removed demo scripts.
* reverted wrongly removed file.
* Corrected the IP integration logic.
* style fix.
* Refactor audio processing and state management in KrispVivaIPUserTurnStartStrategy
- Removed the unused _vad_flag attribute to streamline state tracking.
- Updated the reset method to clear the audio buffer instead of resetting the vad_flag.
- Adjusted the process_frame method to use _speech_active for VAD input, enhancing clarity in the logic.
- Modified tests to reflect changes in state management and ensure proper functionality of the reset method and audio buffer handling.
* FIxed formatting
---------
Co-authored-by: Aram Poghosyan <apoghosyan@krisp.ai>
Pipecat 1.0.8 hard-required protobuf 6.x via the base `protobuf>=6.31.1,<7`
pin, blocking users whose dependency graph already constrains protobuf to
the 5.x line. The original bump (PR #4136) was only needed because
`nvidia-riva-client>=2.25.1` ships gencode compiled with protoc 6.31.1.
Changes:
- Widen base pin to `protobuf>=5.29.6,<7`.
- Regenerate `frames_pb2.py` with `grpcio-tools~=1.67.1` (protoc 5.x). Per
Google's cross-version runtime guarantee, 5.x gencode runs on both 5.x
and 6.x runtimes, so this single artifact serves all users.
- Loosen the dev pin `grpcio-tools` to `>=1.67.1,<2` so contributors can
install `pipecat[dev,nvidia]` without resolver conflict. Comment in
`frames.proto` documents the 1.67.x requirement for regeneration.
- Add an explicit `protobuf>=6.31.1,<7` to the `nvidia` extra. This
compensates for nvidia-riva-client's missing `protobuf` install
requirement (upstream packaging gap, see
https://github.com/nvidia-riva/python-clients/issues/172). When that
issue is resolved, the explicit protobuf entry in the `nvidia` extra
can be removed.
Verified: pipecat imports cleanly on both protobuf 5.29.6 and 6.33.6;
`tests/test_protobuf_serializer.py` passes; `import riva.client` succeeds
when `pipecat[nvidia]` is installed.
Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds
transparent session continuation that rotates sessions in the background
before the limit is reached, preserving conversation context with no
user-perceptible interruption.
Implementation follows the AWS reference architecture:
- Monitor loop detects when session age exceeds threshold
- On assistant AUDIO contentStart: start buffering user audio, create next
session (sessionStart + promptStart + system instruction)
- Track SPECULATIVE/FINAL text counts as completion signal
- On completion signal: send conversation history + audioInputStart +
buffered audio to next session, then promote immediately
- Close old session in background (non-blocking)
- Dead session detection: recreate next session if idle >30s
Key design decisions:
- Session continuation enabled by default (fundamental for long conversations)
- Conversation history tracked in real-time via _sc_conversation_history
(independent of pipeline context aggregator which updates asynchronously)
- Completion signal check in _handle_content_end_event (after history update)
to ensure latest text is included in handoff
- Rolling audio buffer (default 3s) captures user audio during transition
- transition_threshold_seconds capped at 420s (7min) for safety margin
- Unified event methods (_send_text_event, _send_client_event, etc.) accept
optional stream/prompt_name params, eliminating duplicate SC methods
Also adds:
- SessionContinuationParams config (enabled, threshold, buffer, timeout)
LLMContext's NotGiven, LLMContextToolChoice, and LLMStandardMessage are
currently aliased to their OpenAI equivalents, so passing values
between the two sides type-checks implicitly. That works today but
obscures the fact that these are meant to be conceptually distinct —
if LLMContext ever diverges from OpenAI's types, every implicit
crossing would silently break.
Introduce two module-private cast helpers in open_ai_adapter.py:
- _openai_from_llm_context_tool_choice(tool_choice)
- _openai_from_llm_standard_message(message)
Both are typed no-ops today (implemented with typing.cast) but each
carries a docstring explaining why the cast is present, and every
boundary crossing now routes through a named function. Future readers
(and future greps) can find the crossings; a later divergence becomes
a mechanical find-and-update rather than hunting through adapter code.
No behavior change, no pyright error delta.
After widening TTSSettings.voice to str | None | _NotGiven (so other
TTS services can opt into None as a valid "no voice" state), pyright
flagged Speechmatics' URL builder receiving str | None where it
required str.
Speechmatics has no "no voice" mode (the URL path includes the voice
name), so override the inherited field in SpeechmaticsTTSSettings to
str | _NotGiven. The call site stays as a plain assert_given(...)
without an extra None check.
Three LLM services initialize certain Settings fields with the SDK's
NOT_GIVEN (openai.NOT_GIVEN or anthropic.NOT_GIVEN) so the value
flows unmodified into SDK API calls. The inherited field types from
LLMSettings only admit pipecat's _NotGiven, so pyright flagged each
constructor call as a flavor mismatch.
Widen the field types in each service-specific Settings subclass so
they accept both pipecat's _NotGiven (for delta-mode defaults) and
the corresponding SDK NotGiven (for store-mode passthrough):
- OpenAILLMSettings: frequency_penalty, presence_penalty, seed,
temperature, top_p, max_tokens, max_completion_tokens.
- OpenAIResponsesLLMSettings: temperature, top_p,
max_completion_tokens.
- AnthropicLLMSettings: temperature, top_k, top_p, thinking.
Every overridden field is genuinely read from self._settings and
passed directly to the SDK, so none of the overrides are vestigial.
Clears 21 pyright errors and restores test_service_settings_complete
parity with the pre-NOT_GIVEN-swap state.
asyncai/tts and google/vertex/llm are now clean after the missing-None
sweep (both benefited from the TTSSettings.voice / LLMSettings
cascades).
- src/pipecat/services/asyncai/tts.py
- src/pipecat/services/google/vertex/llm.py
Service-specific Settings subclasses declared fields as T | _NotGiven
(no None), but the services routinely pass None to those fields during
init to mean "don't override — use the vendor's default". The field
type just didn't reflect that a None value is valid, so pyright
flagged every None at the call sites.
Change the declarations to T | None | _NotGiven, matching the pattern
already used by ServiceSettings.model and TTSSettings.language. No
constructor-call changes; the default_factory stays NOT_GIVEN.
Fields touched across 11 files:
- services/settings.py: TTSSettings.voice (base class; covers
asyncai, cartesia, elevenlabs, fish, hume, kokoro, lmnt, mistral,
neuphonic, piper, resembleai, rime, xtts TTS services).
- services/aws/llm.py: latency.
- services/aws/tts.py: engine, pitch, rate, volume, lexicon_names.
- services/azure/tts.py: emphasis, pitch, rate, role, style,
style_degree, volume.
- services/google/gemini_live/llm.py: vad.
- services/google/llm.py: thinking.
- services/google/stt.py: language_codes.
- services/inworld/tts.py: speaking_rate, temperature.
- services/openai/tts.py: instructions, speed.
- services/speechmatics/stt.py: 13 fields (domain, operating_point,
max_delay, end_of_utterance_*, punctuation_overrides, *_partials,
split_sentences, enable_diarization, speaker_*, max_speakers,
prefer_current_speaker, extra_params).
- services/ultravox/llm.py: output_medium.
Clears 94 pyright errors (1035 -> 941).
Three files no longer have pyright errors after the is_given /
assert_given sweep — remove them from the ignore list (which serves as
a live todo of files with remaining type errors).
- src/pipecat/processors/gstreamer/pipeline_source.py
- src/pipecat/services/camb/tts.py
- src/pipecat/services/speechmatics/tts.py
Apply assert_given across service modules to narrow reads from
store-mode settings fields (self._settings.X, default_settings.X),
where _NotGiven is declared in the field type but should never appear
at runtime (enforced by validate_complete()).
Two idioms used:
- Inline wrap for single uses:
func(assert_given(self._settings.enable_prompt_caching), ...)
- Extract-and-reuse when the same value is used multiple times:
thinking = assert_given(self._settings.thinking)
if thinking:
params["thinking"] = thinking.model_dump(...)
43 service files touched. Cleared ~172 pyright errors; remaining
_NotGiven-related errors are in adjacent categories (flavor mismatch
between openai/anthropic NotGiven and pipecat _NotGiven, settings
field types that should allow None but don't) that need different
fixes.
In store-mode settings objects, _NotGiven should never appear (the
invariant enforced by validate_complete). But the declared field types
still include _NotGiven because the same class doubles as delta mode,
so every field read is typed X | None | _NotGiven and pyright flags
operations that assume X | None.
assert_given is a one-line extractor that narrows away _NotGiven and
raises loudly if the invariant is violated — preferable to scattering
is_given guards that defend against something that can't occur in
practice.
resolved_model = assert_given(self._settings.model) # str | None
Replace direct identity checks against NOT_GIVEN with is_given() at
sites where pyright's inability to narrow on non-singleton sentinels
was causing type errors.
- adapters/services/anthropic_adapter.py: narrow converted.system for
_resolve_system_instruction.
- services/openai/llm.py: narrow params.service_tier using OpenAI's
is_given.
- services/sarvam/llm.py: narrow tools / tool_choice using OpenAI's
is_given (aliased as openai_is_given alongside the existing
settings.is_given import).
- services/sarvam/tts.py: narrow settings.voice using settings.is_given.
Pyright can't narrow identity checks against module-level NotGiven
sentinels (they aren't typed as singletons), which leaves many
NotGiven-bearing unions stuck as unnarrowed types throughout the
codebase. Introduce is_given TypeGuard helpers so narrowing works via
isinstance under the hood.
Each helper is co-located with the NotGiven flavor it guards:
- services/settings.py: upgrade the existing is_given to a TypeGuard.
- processors/aggregators/llm_context.py: add an is_given for
LLMContext's NotGiven. Treat LLMContext's re-exported types
(LLMStandardMessage, LLMContextToolChoice, NOT_GIVEN, NotGiven) as
LLMContext's own — independent definitions that happen to coincide
with OpenAI's as an implementation detail.
- adapters/services/anthropic_adapter.py: add is_given for anthropic's
NotGiven.
- adapters/services/open_ai_adapter.py: add is_given for openai's
NotGiven.
TypedDict types are not subtypes of dict[...] in the type system
(per PEP 589), so TypedDict-based invocation param classes could not
satisfy the TypeVar bound. Mapping[str, Any] accepts TypedDicts while
preserving the "string-keyed mapping" constraint.
The original contributor's PR (#4328) landed as #4355. Rename the fragment
so the rendered changelog links to the merged PR, and add the leading `- `
bullet prefix that towncrier expects.
Extends the reconnect re-seeding fix to work cleanly on Gemini Live 2.5,
which has stricter seed requirements than 3.x and a documented audio-input /
history-recall limitation. Both initial connection and reconnect now share a
single code path (`_create_initial_response(for_reconnect=...)`), with four
well-documented cases.
On Gemini 2.5 reconnect, `turn_complete=True` is now forced on the seed so
the model produces a recap-style response immediately instead of briefly
acting "forgetful" on the user's next utterance — the latter being
especially jarring mid-conversation. When a 2.5 seed doesn't already end
with a user turn (e.g. the bot had finished speaking before the disconnect),
a blank user turn is appended to satisfy the server's seed-shape
requirement. Gemini 3.x needs neither workaround.
Tkinter's `Label` only stores `PhotoImage` references at the C level, so
Python GC eats them unless something on the Python side keeps a
reference. The canonical fix is to stash the reference on the widget
itself: `label.image = photo`. Tkinter widgets are plain Python objects,
so the assignment works at runtime, but the stub declares no `image`
attribute (correctly — there isn't one; we're adding it).
Narrow the suppression to `# type: ignore[attr-defined]` on the one
line. The existing comment above the assignment already documents why.
Mistral imposes three conversation-history quirks on top of the
OpenAI-compatible wire format: tool messages must be followed by an
assistant message; non-initial system messages are rejected; trailing
assistant messages require `prefix=True`. These rules were applied
inline in `MistralLLMService.build_chat_completion_params`, which is the
wrong layer — every other provider with OpenAI-compatible-but-quirky
shape (Perplexity, etc.) owns its transformations in a
`BaseLLMAdapter` subclass that runs during `get_llm_invocation_params`.
Create `MistralLLMAdapter(OpenAILLMAdapter)` on the Perplexity template
and wire it in via the existing `adapter_class` dispatch. The service
now only handles Mistral-specific request-level mapping (`random_seed`
in place of `seed`), and the message shape concerns live with other
provider format logic.
No behavior change. The transform function casts to `list[dict[str,
Any]]` internally because mutating `role` and attaching Mistral's
non-standard `prefix` field both step outside OpenAI's TypedDict
contract; the cast at the return boundary encodes that we're emitting
Mistral's extended schema, not OpenAI's.
`inspect.getdoc()` returns `str | None`, but `docstring_parser.parse()`
requires `str`. Functions without a docstring produced `None`, which
the type checker correctly flagged.
Coerce to `""` at the call site. `docstring_parser.parse("")` returns
an empty docstring whose `.description` and `.params` are already
handled by the surrounding `or ""` fallbacks, so runtime behavior is
unchanged.
`ToolsSchema.__init__` declared `standard_tools: list[FunctionSchema |
DirectFunction]`. Callers (`BaseLLMAdapter`, `MCPService`) pass in
`list[FunctionSchema]`, which is not assignable to the union list
because `list` is invariant in its element type.
Widen the parameter to `Sequence[...]` (covariant) so `list[X]` and
`list[X | Y]` both fit. A narrower `list[FunctionSchema]` is still
accepted, and nothing in this class mutates the argument — the
constructor immediately copies it via `_map_standard_tools`.
Also correct the `custom_tools` property return type to include
`None`, matching the stored `_custom_tools` field.
This single edit clears the pyright errors for three ignore-list
entries: `tools_schema.py`, `base_llm_adapter.py`, and `mcp_service.py`.
Two services were reading `_settings.model` (typed `str | _NotGiven |
None` because NOT_GIVEN is the default) and coercing it with `or ""`
or similar. `_NotGiven.__bool__` returns False, so the runtime
behavior happened to work, but the type was a lie — pyright saw
`str | _NotGiven` flowing into APIs that required `str` or `str | None`.
- `AIService._sync_model_name_to_metrics`: use `isinstance(model, str)`
narrowing with an empty-string fallback. Equivalent runtime behavior,
honest type, no truthiness dependency on a sentinel.
- `SarvamLLMService.__init__`: validate the model is a real string
before handing it to `_validate_model(str)`. A non-string model at
this point is a configuration bug; raise `ValueError` so the error
is clear and survives `python -O` (unlike an assert).
Three spots had the same shape: a field starts None, a later method
populates it, a read site later reads it. Pyright can't track the
cross-method invariant. Rather than spray assertions at the read
sites, fix each site at the structural level:
- `FastAPIWebsocketInputTransport._monitor_websocket` now takes the
session timeout as an argument. The task-creation site already
guards on truthiness, so the call can pass the non-None value
directly and the method's signature tells the truth.
- `FrameProcessorMetrics.task_manager` raises `RuntimeError` instead
of asserting. Asserts are stripped under `python -O`; a real raise
keeps the runtime safety net and still narrows the type for pyright.
- `SOXRStreamAudioResampler._maybe_initialize_sox_stream` returns the
initialized stream. Callers use the return value and never touch
the Optional `_soxr_stream` attribute, so narrowing stays inside
the init method where the invariant is established.
`ImageGenService.run_image_gen` and `VisionService.run_vision` were
declared `async def ... -> AsyncGenerator[Frame, None]` with `pass`
bodies. Without a `yield` anywhere in the body, Python treats the
function as a coroutine returning an `AsyncGenerator`, not as an async
generator itself, so callers got a coroutine where they expected an
iterator.
Add `raise NotImplementedError; yield` so the body contains a yield
(making this a real async generator) while still raising cleanly if a
subclass ever calls `super().run_*` by mistake.
Deepgram STT, Gradium TTS, Smallest STT, and xAI STT/TTS had exactly
one pyright error each, all of them the AsyncGenerator return-type
mismatch resolved in 08fe9157c. Remove them from the ignore list.
AssemblyAI, Cartesia, Gradium, and Soniox STT services sent audio over
the WebSocket without catching transient send failures, so a single
network hiccup could propagate an exception up through process_frame
and end the pipeline. Other push-based STT services (Deepgram, xAI,
Azure, Smallest, etc.) already guard their sends.
Follow the deepgram/stt.py pattern: log a warning and continue. The
existing connection-state check at the top of each call handles
recovery on the next invocation.
The push-based STT/TTS implementations send audio/text over a socket and
receive results via a separate receive task, so there is nothing to
yield inline. They yield `None` by design. The previous declaration of
`AsyncGenerator[Frame, None]` disagreed with that, while the consumer
(`AIService.process_generator`) already accepted `Frame | None`. Widen
the producer side (abstract base and every subclass) so the type honestly
describes the contract.
Pure annotation change; no runtime behavior difference.
Previously, six modules (adapters, audio, processors, serializers,
services, transports) were ignored wholesale. Many files in those
modules already pass type checking, but we had no way to protect them
from regressions or make the remaining work visible.
Switch the include list to src/pipecat so any new module is checked by
default, and replace directory-level ignores with the 140 specific
files that still fail. This puts 189 previously-untyped files under
type checking immediately and turns the remaining work into a concrete,
shrinking TODO list.
Moves src/pipecat/serializers into pyright's include list. Narrows
self._params to each subclass's InputParams in exotel, vonage, plivo,
twilio, genesys, and telnyx. In protobuf.py, renames the reassigned
frame local to avoid clobbering its Frame type and silences two dynamic
attribute accesses on the generated frames_pb2 module.
Also aligns telnyx and plivo hangup validation with twilio: if
auto_hang_up=True (the default) but required credentials are missing,
__init__ now raises ValueError instead of silently logging a warning
at call-end time. Previously a misconfigured serializer would construct
fine and fail to hang up the call later, leaving a phantom billable
session.
Collapse the separate fallback timer into the existing user_speech_timeout
timer, restarted when a transcript arrives without a VAD stop. stt_timeout
has no meaning on the fallback path, so the stt wait is marked done
immediately. This drops the _fallback_timeout_task / _fallback_expired
bookkeeping and the branched trigger condition.
Adds XAITTSService in the existing xai/tts.py module, alongside the
existing XAIHttpTTSService. Connects to xAI's streaming endpoint at
wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta
chunks down on the same connection so audio starts flowing before the full
utterance is synthesized.
Extends InterruptibleTTSService since xAI's protocol is strictly sequential
per connection and exposes neither a cancel verb nor a context ID — the
only way to stop an in-flight utterance is to tear down the WebSocket,
which is exactly what InterruptibleTTSService does on interruption when
the bot is speaking.
Voice, language, codec, and sample_rate are passed as query-string params
at connect time; runtime setting changes reconnect the socket. Defaults to
raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream.
Splits the existing example into voice-xai.py (WebSocket) and
voice-xai-http.py (batch HTTP) so each variant has its own entry point.
Promotes the xai extra to depend on pipecat-ai[websockets-base] since the
new service imports the websockets library.
Remove `examples/` from the `pyrightconfig.json` ignore list and fix
the resulting type errors across all example files. Common fixes:
- Required API keys: `os.getenv("X")` -> `os.environ["X"]` so the
return type is `str` rather than `str | None`, and misconfiguration
fails fast.
- Narrow `LLMContextMessage` union members with `isinstance(..., dict)`
before dict-style access.
- `assert isinstance(params.llm, ...)` before calling service-specific
methods that aren't on the base `LLMService`.
- Guard optional frame fields (e.g. `LLMSearchResponseFrame.search_result`)
before use.
If the WebSocket handshake is cancelled or fails before `keepalive_task`
is assigned (e.g. an STTUpdateSettingsFrame triggers a reconnect during
initial connect), the `finally` block tried to cancel an unbound local.
Initialize `keepalive_task = None` before the try and guard the cancel.
New `XAISTTService` wraps xAI's real-time speech-to-text WebSocket
(`wss://api.x.ai/v1/stt`). It extends `WebsocketSTTService`, authenticates
with the `XAI_API_KEY` as a Bearer token on the WS handshake, and streams
raw audio (PCM/mu-law/A-law) with configurable interim results, endpointing,
language, multichannel, and diarization settings.
- `src/pipecat/services/xai/stt.py`: new service, settings dataclass, and
`language_to_xai_stt_language` helper.
- `src/pipecat/services/stt_latency.py`: `XAI_TTFS_P99` default.
- `pyproject.toml` / `uv.lock`: `xai` extra now pulls in `websockets-base`.
- `README.md`: link to xAI STT in the services table.
- `examples/voice/voice-xai.py`: swap DeepgramSTTService for XAISTTService so
the xAI voice example is fully xAI.
- `examples/transcription/transcription-xai.py`: new transcription-only
example using the new service.
SpeechTimeoutUserTurnStopStrategy previously collapsed two waits into
max(stt_timeout, user_speech_timeout), which over-waited for finalizing
STT services and could also end the turn early in a legacy code path.
Run them as independent timers instead:
- user_speech_timeout: policy floor, always runs to completion.
- stt_timeout: latency safety net, short-circuited by a finalized
transcript since STT has signaled it has nothing more to send.
The no-VAD fallback now waits only user_speech_timeout rather than
max(stt_timeout, user_speech_timeout); stt_timeout is defined relative
to VAD stop and has no meaning when no VAD event occurred. This
shortens the fallback wait for users who set stt_timeout greater than
user_speech_timeout.
* Fix Smallest AI TTS WebSocket endpoint URL to match API documentation
Update base URL from waves-api.smallest.ai to api.smallest.ai and
fix path prefix from /api/v1/ to /waves/v1/ per the v4.0.0 docs.
* Update keepalive using silent space message instead of unsupported flush
Pylance analyzes open files even when they're outside the `include`
set, producing noise in the editor. Adding these paths to `ignore`
suppresses diagnostics without affecting import resolution.
Some TTS providers (e.g. Inworld) return verbatim tokens where spaces and
punctuation are already embedded in the token text. When downstream consumers
join these tokens with an extra space they produce "hello , world" instead of
"hello, world".
Add an opt-in `includes_inter_frame_spaces: bool = False` parameter to
`add_word_timestamps` / `_add_word_timestamps`. The flag is threaded through
`_WordTimestampEntry` and stamped onto every emitted `TTSTextFrame`.
Defaults to `False` — no behaviour change for existing services.
`InworldTTSService` passes `includes_inter_frame_spaces=True` and stops
pre-processing tokens in `_calculate_word_times`, returning them verbatim.
Tests added to `test_tts_frame_ordering.py` covering both HTTP and WebSocket
delivery paths: verbatim text preservation, PTS ordering, text-before-audio
ordering, and the Inworld punctuation-token scenario.
Made-with: Cursor
The two logger.error lines in krisp_instance.py fired at module-load time
whenever anything transitively imported it (e.g. pipecat.turns.user_start
pulling in krisp_viva_ip_user_turn_start_strategy), producing noisy output
for users who never asked for Krisp. Drop the log calls and raise a more
informative ImportError that names the affected classes so direct
importers still get clear guidance.
- Fall back to Language.EN in _primary_detected_language when model is
flux-general-en, preserving prior behavior on the default model.
- Standardize example on DeepgramFluxSTTService.Settings and drop the
now-redundant DeepgramFluxSTTSettings import.
- Narrow the changed-behavior changelog to reflect that flux-general-en
frames still carry Language.EN.
Enables the flux-general-multi model with one or more language_hints.
Hints are sent as repeatable URL params at connect time and via a
Configure control message when updated mid-stream (detect-then-lock).
TranscriptionFrame.language now reflects the language Flux detected
for each turn via the TurnInfo `languages` field.
Add changelog entries for the pyright introduction and the
LiveKitRunnerArguments.token signature tightening. Restore the
indented multi-line format for the WhatsApp missing-env error,
now listing only the vars that are actually missing.
Make required parameters non-optional: LiveKitRunnerArguments.token,
_create_telephony_transport args. Use os.environ[] instead of
os.getenv() for required WhatsApp env vars. Guard spec/loader None
in module loading. Tighten sip_caller_phone guard in daily.py.
* VIVA SDK TT v3 support
* Format fix.
* Renamed the API naming, removed '3' from the name.
* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.
* Typo fix in voice-krisp-viva example to use KrispVivaFilter class
* style fix.
* test run error fixes.
* some test related changes.
* Fixed tests
* Stule fixes.
SentryMetrics.stop_ttfb_metrics and stop_processing_metrics called the
base FrameProcessorMetrics implementation but discarded its return
value (implicit `return None`). FrameProcessorMetrics.stop_ttfb_metrics
/ stop_processing_metrics build and return a MetricsFrame, which
FrameProcessor.stop_ttfb_metrics / stop_processing_metrics then pushes
downstream so observers (e.g. UserBotLatencyObserver,
MetricsLogObserver) can see TTFB / processing metrics.
Because SentryMetrics returned None, the FrameProcessor never pushed
the MetricsFrame, so any pipeline using metrics=SentryMetrics() on STT
/ LLM / TTS services silently lost all downstream TTFB and processing
MetricsFrames. The metrics were still calculated and logged
internally, and Sentry transactions still finished correctly, but
observers never saw them.
Forward the MetricsFrame returned by the base class so FrameProcessor
can push it into the pipeline.
Use Sequence[FrameProcessor] instead of list[FrameProcessor] in Pipeline,
ServiceSwitcher, and ServiceSwitcherStrategy parameters to accept subtype
lists. Add cast() in LLMSwitcher for narrowed return types. Guard against
None in task_observer._send_to_proxy and replace hasattr with truthiness
check in task._cleanup.
Widen base strategy process_frame return types to ProcessFrameResult |
None to match actual behavior (None treated as CONTINUE). Give
UserTurnCompletionLLMServiceMixin a FrameProcessor base class so pyright
can see create_task, cancel_task, process_frame, and push_frame.
Tighten LLMMessagesAppendFrame and LLMMessagesUpdateFrame message fields
from list[dict] to list[LLMContextMessage] to match actual usage. Add
type annotations on inline message lists in IVR navigator and voicemail
detector.
In token-streaming mode, _push_tts_frames previously stripped only
leading newlines and dropped any pure-whitespace frame. That silently
discarded meaningful inter-token whitespace (e.g. a standalone "\n"
token between "hello" and "world"), losing prosody cues and any
downstream sentence-boundary semantics.
Track whether a non-whitespace character has been sent in the current
context. While the flag is false, strip all leading whitespace; once
true, let whitespace tokens flow through. Reset the flag on
LLMFullResponseEndFrame/EndFrame and on interruption, and save/restore
it around TTSSpeakFrame since each utterance is its own context.
Sentence-aggregation mode preserves the existing behavior.
Group three co-assigned fields (_start_frame_id, _start_frame_arrival_ns,
_start_wall_clock) into a single _StartFrameInfo dataclass. This makes
the "always set together" invariant structural rather than implicit, and
fixes the incorrect str | None annotation on _start_frame_id (Frame.id
is int).
Add pyrightconfig.json with basic type checking for zero-error modules
(clocks, metrics, transcriptions, frames) and enforce via CI. The
include list will expand as modules are fixed.
* Improve HeyGen LiveAvatar plugin reliability and performance
- Add WebSocket ready gate: wait for session.state_updated connected
event before sending commands (prevents silently dropped messages)
- Add keep-alive mechanism: send session.keep_alive every 2.5 min to
prevent 5-minute inactivity timeout
- Optimize audio chunking: 600ms first chunk for faster initial
response, 1s subsequent chunks for efficient streaming
- Fix audio buffer flush: send remaining buffered audio on utterance
end instead of discarding it
- Fix WS state cleanup: properly reset connected/ready state when
WebSocket drops unexpectedly
- Add livekit_config passthrough in LiveAvatar session token creation
- Replace stray print() with logger.debug()
* Fix HeyGenOutputTransport.start() signature and use 400ms first chunk
- Update transport.py to match new client.start() signature (no
audio_chunk_size param)
- Change first chunk size from 600ms to 400ms per feedback
* Fix transport audio resampling and client.start() error propagation
- Add audio resampling in HeyGenOutputTransport.write_audio_frame() to
ensure audio is always 24kHz before sending to HeyGen (was sending
at pipeline sample rate, causing garbled audio)
- Raise exception on WS ready timeout instead of silently returning,
preventing transport from appearing ready when WS connection failed
* Fix session readiness gate to work with LITE mode
LITE mode does not send session.state_updated WS events. Instead,
use a dual-signal _session_ready event that fires on either:
- WS session.state_updated connected (FULL mode)
- LiveKit participant connected (LITE mode)
Also reorder start() to connect both WS and LiveKit before waiting,
since the WS events may depend on LiveKit being connected.
Verified with live sandbox session - all tests pass.
* Simplify session readiness to use only WS ready gate
Remove _session_ready dual-signal and use only _ws_ready, which fires
on the session.state_updated connected WS event. Increase timeout to
30s. LiveKit is connected before waiting so the WS event can arrive.
* Reduce WS ready gate timeout back to 10s
* Remove WS ready gate (session.state_updated not reliably received)
The session.state_updated connected event is not reliably received
via the websockets library. Remove the gate for now and assume the
session is ready after WS + LiveKit connect. Keep-alive, chunking,
buffer flush, state cleanup, and other improvements remain.
Mirrors the existing `from_string` classmethod and lets callers
turn a frame's `buttons` list back into a dial string like `"123#"`.
`__str__` and the Daily transport's native DTMF path reuse it.
The single-key `button` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` is kept as a first-class ergonomic shortcut
for the common single-keypress case, equivalent to
`buttons=[button]`. `buttons` takes precedence when both are set.
Replaces the string-based `tones` field with a type-safe
`buttons: list[KeypadEntry]` on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame`, matching the existing singular `button`
field on `InputDTMFFrame`. A `from_string` classmethod builds the
list from a dial string like `"123#"` (invalid characters raise
ValueError from the `KeypadEntry` constructor).
The base output audio fallback now iterates `frame.buttons`
directly, LiveKit sends `frame.buttons[0].value`, and the Daily
transport joins the button values into the single string Daily's
`send_dtmf` expects.
Introduces a new `tones` field on `OutputDTMFFrame` and
`OutputDTMFUrgentFrame` for sending multi-digit DTMF sequences and
deprecates the existing single-key `button` field. When only `button`
is set, it is used as a single-character `tones` string for backward
compatibility.
`DTMFFrame` is kept as an empty marker class so both input and output
DTMF frames can still be identified via isinstance. `InputDTMFFrame`
keeps its required `button` field (single keypress semantics).
The Daily-specific `DailyOutputDTMFFrame` and
`DailyOutputDTMFUrgentFrame` frames no longer need to override
`button` and simply add `session_id` and `digit_duration_ms`, which
are forwarded to Daily's `send_dtmf` as `sessionId` and
`digitDurationMs`.
The base output audio fallback now iterates `tones` and generates a
tone per character; LiveKit's native DTMF path sends `tones[0]` since
its API is single-tone.
Introduces Daily-specific DTMF output frames that carry explicit
`tones`, `session_id` and `digit_duration_ms` fields, forwarded to
Daily's `send_dtmf` as `tones`, `sessionId` and `digitDurationMs`.
The inherited `button` and `transport_destination` fields are
ignored for these frames in the Daily transport.
When the LLM returned zero text tokens (e.g. it was interrupted before producing
tokens or about to push tokens), push_aggregation() returned an empty string and
on_assistant_turn_stopped was never emitted. This left consumers waiting for an
event that would never arrive.
Now on_assistant_turn_stopped always fires, with an empty content string when
the LLM produced no text tokens.
Fixes#4292
Only treat messages[0] as the initial system prompt when determining the
summarization range. Previously, the code scanned the entire context for
the first system-role message, which caused failures when the only system
message was a mid-conversation injection (e.g. "The user has been quiet").
In that case summary_start exceeded summary_end, producing an empty range
and "No messages to summarize" errors.
Fixes#4286
The enable_logging and enable_ssml_parsing URL params used truthy checks,
so False was treated the same as None (both skipped). Also, Python's
str(False) produces "False" but the API expects lowercase "false".
Additionally, add enable_logging support to ElevenLabsHttpTTSService
which was missing entirely.
When the STT p99 timeout fires without a transcript, the turn stop
strategy previously did nothing — falling through to the 5-second
user_turn_stop_timeout. Now, a _timeout_expired flag tracks when the
timeout has elapsed so that a late transcript triggers the turn stop
immediately instead of waiting for the fallback.
Previously settings updates were ignored with a TODO comment. Now when
model/language changes via STTUpdateSettingsFrame the service disconnects
and reconnects with the new query parameters.
Key changes:
- Implement _update_settings to disconnect/reconnect on changes
- Check `is not State.OPEN` in run_stt to catch CLOSING state
- Send `done` command before closing for clean session shutdown
- Capture websocket reference in _disconnect_websocket to prevent a
concurrent _connect from having its new connection nulled by a stale
finally block
The strategy schedules background tasks during setup. Fast-running
tests could observe state before those tasks had a chance to run;
yielding once via asyncio.sleep(0) ensures they do.
Enable callers to get a compact version of context messages suitable
for serialization, logging, and debugging tools. For standard
messages, known binary data (base64 images, audio) is fully elided.
For LLM-specific messages, long string values are recursively
truncated. Adapter get_messages_for_logging() methods now use this.
Example files can live under subdirectories (e.g. foundational/01.py),
so the recording path needs its parent directory created before the
audio file is written.
Replaces the per-frame asyncio.Event signaling with a monotonic
timestamp updated on each audio frame. The handler sleeps until the
next deadline (last_audio_time + timeout), recomputing on each wake-up
to account for audio arriving during sleep.
This avoids waking the handler on every audio frame (~50/s at 20ms
chunks), and guarantees detection latency is bounded by timeout rather
than 2 * timeout.
Also renames audio_starvation_timeout to audio_idle_timeout and
associated identifiers for consistency with existing pipecat naming
(user_idle_timeout, etc.).
These are TypedDicts (plain dicts at runtime), so no behavioral change
— just more descriptive type hints for readers. Use ToolParam instead
of FunctionToolParam for the Responses adapter to reflect that custom
non-function tools are supported. Use ChatCompletionToolParam instead
of Any for the completions adapter return type. Update tests to use
typed params in expected values.
During pipeline shutdown, proxy tasks must be cancelled before observer
resources are cleaned up. Previously, stop() was called inside
_cancel_tasks() and start() was called in _start_tasks(), which could
lead to proxy tasks still consuming frames after observer resources
were closed.
Now the lifecycle is explicit in _handle_start_frame: start() after all
observers are loaded, and stop() before cleanup() on shutdown.
Also fixes misleading variable name in TaskObserver.cleanup() where
iterating self._proxies yields observer keys, not Proxy values.
Fixes#4195
Move event.clear() from finally block to success path in
IdleFrameProcessor and UserIdleProcessor._idle_task_handler().
The finally block unconditionally cleared signals set during
async timeout callbacks, causing false-positive idle detection.
Closes#3402
* Add Inworld Realtime LLM service
Adds a WebSocket-based realtime service for Inworld's cascade
STT/LLM/TTS API with semantic VAD, function calling, and streaming
transcription support.
New files:
- src/pipecat/services/inworld/realtime/ (service, events)
- src/pipecat/adapters/services/inworld_realtime_adapter.py
- examples/foundational/19zb-inworld-realtime.py
Also includes:
- websockets dependency for inworld extra in pyproject.toml
- Adapter and settings tests matching OpenAI/Grok realtime patterns
- Fix for double-response when server-side VAD is enabled
* Prefer init-provided system instruction in Inworld Realtime
Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the
pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and
Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup
branch.
* Update changelog entry with PR number
* Fix changelog format to use bullet point
* Polish PR: default model, example cleanup, changelog update
- Change default model from gpt-4.1-nano to gpt-4.1-mini
- Add function calling demo to example
- Remove demo-testing artifact from system instruction
- Mention Router support in changelog
* Address PR review feedback for Inworld Realtime
- Move example to examples/realtime/realtime-inworld.py
- Change initial context role from "user" to "developer"
- Remove explicit sample rates from example; sync them in
_ensure_audio_config so Inworld gets the transport's actual rates
- Add audio race condition guard in _handle_evt_audio_delta (matches
OpenAI realtime pattern)
- Convert remaining "system"/"developer" messages to "user" in adapter
- Add clarifying comment for local-VAD vs server-VAD metrics paths
* Simplify example, add provider tracking, remove local VAD path
- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning
- Add pipecat-realtime session key prefix and provider_data metadata
for Inworld traffic attribution
- Remove local VAD code path (Inworld only supports server-side VAD)
- Use typed InputAudioBufferAppendEvent for audio sends
* Default TTS model to inworld-tts-1.5-max
* Remove dead shimmed tools code, set STT/VAD defaults
- Remove non-functional AdapterType.SHIM custom tools code from adapter
- Default STT model to assemblyai/u3-rt-pro
- Default VAD eagerness to low
Integrate with Mistral's Voxtral TTS API (voxtral-mini-tts-2603) using
HTTP streaming with Server-Sent Events. Converts base64-encoded float32
PCM chunks from the API to int16 for the Pipecat pipeline.
After a reconnect, _ready_for_realtime_input was never set back to True
because _create_initial_response (which sets the flag) is only called on
initial connection. This caused all audio/video/text to be silently
dropped after reconnecting, making the bot appear to hang.
Set the flag in _handle_session_ready when we detect a reconnect, either
via session_resumption_handle (server restores state) or via existing
context (rare case where connection drops before first resumption handle).
After a reconnect, _ready_for_realtime_input was never set back to True
because _create_initial_response (which sets the flag) is only called on
initial connection. This caused all audio/video/text to be silently
dropped after reconnecting, making the bot appear to hang.
Set the flag in _handle_session_ready when context already exists
(i.e. reconnect case) since we don't need to go through
_create_initial_response again.
Remove the deprecation proxy infrastructure that allowed old-style flat
imports (e.g. `from pipecat.services.openai import OpenAILLMService`).
Users must now import from specific submodules
(`from pipecat.services.openai.llm import OpenAILLMService`), which is
already the established pattern across all internal code and 179+ examples.
- Strip 32 proxy `__init__.py` files to empty
- Strip 3 non-proxy files with bare star imports (minimax, sambanova, sarvam)
- Strip google/gemini_live `__init__.py` re-exports
- Remove DeprecatedModuleProxy class and helpers from services/__init__.py
- Remove ruff per-file ignore for services/__init__.py
- Fix 2 examples using old-style imports
Patch Pydantic's DICT_TYPES check in conf.py to accept Union-wrapped
dict types, fixing the autodoc import failure for models using
ConfigDict(extra="allow").
Make -W (warnings as errors) opt-in via --strict flag instead of
default, and update README to reflect uv-based workflow and current
directory structure.
Add tests for LLMRunFrame, LLMMessagesAppendFrame, LLMMessagesUpdateFrame,
and LLMMessagesTransformFrame sent upstream to LLMAssistantAggregator,
mirroring the existing LLMUserAggregator downstream tests. Add
frames_to_send_direction param to run_test helper to support this.
The previous approach required the caller to directly grab a reference to the context object, grab a "snapshot" of its messages *at that point in time*, transform the messages, and then push an `LLMMessagesUpdateFrame` with the transformed messages. This approach can lead to problems: what if there had already been a change to the context queued in the pipeline? The transformed messages would simply overwrite it without consideration.
Napoleon's Attributes section creates class-level attribute docs that
duplicate the __init__ parameter docs when napoleon_include_init_with_doc
is enabled. Using Parameters avoids the duplication.
- Remove expect_stripped_words from LLMAssistantAggregatorParams and related warnings
- Remove old multi-parameter on_push_frame observer signature support in TaskObserver
- Remove deprecated context field from UserImageRequestFrame
- Remove deprecated LiveKitTransportMessageFrame and LiveKitTransportMessageUrgentFrame
- Remove deprecated pipecat.turns.mute shim module
Replace Markdown code blocks with RST syntax in genesys.py, fix
deprecated directive transitions in nvidia and summarization modules,
remove stray bullet prefix in whisper arg docs, restructure code block
in turn completion mixin, and add deepgram mock to Sphinx conf.
Remove stale riva mock imports from autodoc_mock_imports since the riva
service was removed and nvidia-riva-client is installed during doc builds.
Add pipecat.turns and pipecat.extensions to import_core_modules() and
add Turns to the index.rst toctree. Regenerate uv.lock to reflect the
riva extra removal from pyproject.toml.
Move the FastAPI instance to module level so other packages can import
it and register routes before main() is called. main() now configures
the existing app with transport-specific routes instead of creating a
new one.
Deepgram's built-in VAD events were deprecated in 0.0.99 in favor of
Silero VAD. This removes vad_events from settings and LiveOptions,
the should_interrupt parameter, the vad_enabled property,
_on_speech_started/_on_utterance_end handlers, and simplifies
_on_message and process_frame accordingly.
Remove the send_transcription_frames parameter from OpenAI Realtime LLM
(deprecated since 0.0.92). Also fix undefined _warn_deprecated_param
calls in both OpenAI and xAI realtime services, replacing them with the
existing _warn_init_param_moved_to_settings method.
UserBotLatencyLogObserver (deprecated 0.0.102) is replaced by
UserBotLatencyObserver. UserIdleProcessor (deprecated 0.0.100) is
replaced by LLMUserAggregator with user_idle_timeout.
The _self_queued_frames set and _internal_queue_frame wrapper were used
to prevent re-processing SpeechControlParamsFrame that the aggregator
queued to itself. Now that the frame is no longer special-cased, this
tracking is unnecessary. Also removes unused FrameCallback import.
Adds `enable_prompt_caching` setting to `AWSBedrockLLMSettings`. When
enabled, appends `cachePoint` markers to system prompts and tool
definitions in ConverseStream requests.
This can reduce TTFT by up to 85% for multi-turn conversations where
the system prompt stays constant (e.g. voice agents, chat assistants).
Follows the same pattern as `AnthropicLLMService.enable_prompt_caching`.
Usage:
```python
llm = AWSBedrockLLMService(
settings=AWSBedrockLLMSettings(
model="au.anthropic.claude-haiku-4-5-20251001-v1:0",
enable_prompt_caching=True,
),
)
```
See: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
Remove EmulateUserStartedSpeakingFrame, EmulateUserStoppedSpeakingFrame
(deprecated since v0.0.99), and the emulated field from
UserStartedSpeakingFrame and UserStoppedSpeakingFrame. Clean up the
handling code in base_input.py and a stale comment in nova_sonic/llm.py.
The interruption_strategies mechanism was deprecated in v0.0.99 in favor
of LLMUserAggregator's user_turn_strategies. All evaluation logic was
already removed — this removes the remaining field definitions, property,
StartFrame propagation, conditional check in base_input.py, strategy
files, and test.
This field was deprecated in v0.0.99 in favor of LLMUserAggregator's
user_turn_strategies / user_mute_strategies parameters. Since the default
was True (interruptions allowed), removing the guards keeps the current
default behavior.
The location and project_id fields were deprecated since 0.0.90 in
favor of direct __init__ parameters. Now that InputParams is removed,
project_id is required and location defaults to "us-east4" directly
in the signature.
Override _update_settings in CartesiaTTSService to flush the current
audio context and assign a new turn context ID when voice, model, or
language settings change. This prevents Context has closed errors
from Cartesia API, which locks these parameters per context.
Remove the deprecated text_aggregator parameter from TTSService,
CartesiaTTSService, and RimeTTSService, and the deprecated text_filter
parameter from TTSService. Users should use LLMTextProcessor before
the TTS service instead. Update the voice-switching example to use
LLMTextProcessor with PatternPairAggregator.
Change the default from 10s to None so deferred function calls can run
indefinitely when no timeout is configured. Only create the timeout
task when a timeout is actually provided (per-call or service-level).
Add SmallestSTTService using the Pulse WebSocket API for real-time
transcription. Includes SmallestSTTSettings dataclass, 32-language
support with resolve_language fallback, VAD-driven finalize signal,
and SMALLEST_TTFS_P99 latency constant.
Also adds X-Source and X-Pipecat-Version headers to Smallest STT
and TTS WebSocket connections.
Example files like openai.py shadow installed packages when Python adds the
script directory to sys.path. Prepend the parent folder name to each example
file (e.g. openai.py -> function-calling-openai.py). Also split
thinking-and-mcp/ into separate mcp/ and thinking/ directories.
Now that LLMContextFrame is the only frame that provides a context,
remove the intermediate `context = None` / `if context:` pattern
and handle context processing directly in the isinstance branch.
Replace the nested services/speech/ and services/function-calling/ with
top-level voice/ and function-calling/ directories. Update eval script
paths and README to match.
Move 304 examples from a flat numbered directory into 14 descriptive
subfolders: getting-started, services (speech + function-calling),
transcription, vision, realtime, persistent-context,
context-summarization, update-settings (stt/tts/llm), turn-management,
thinking-and-mcp, transports, video-avatar, video-processing, and
features.
Strip numbered prefixes from filenames (e.g. 07c-interruptible-deepgram.py
becomes services/speech/deepgram.py) since the folder context makes them
redundant. Keep numbered prefixes only in getting-started/ where ordering
matters.
Update eval script paths and README to match the new structure.
Add WebsocketLLMService as a base class for WebSocket-based LLM services,
parallel to WebsocketTTSService/WebsocketSTTService but codifying a
transactional request-response model rather than a continuous background
receive loop.
WebsocketLLMService provides:
- Connection lifecycle (start/stop/cancel → connect/disconnect)
- _ws_send/_ws_recv with transparent ConnectionClosed handling
(auto-reconnect via exponential backoff → WebsocketReconnectedError)
- _ensure_connected with retry via _try_reconnect
OpenAIResponsesLLMService now inherits from WebsocketLLMService, removing
duplicated connection management code (_connect, _disconnect, _reconnect,
_ensure_connected, _ws_send, start, stop, cancel) and simplifying
_process_context from a loop with attempt tracking to a flat try/except
with a single retry.
When a user interruption causes the LLM chunk stream to exit early,
function call arguments may be incomplete JSON. Wrap json.loads() in
try/except JSONDecodeError to skip malformed function calls with a
warning instead of crashing. Fixes#2461.
Buffer raw bytes and only decode after splitting on newline boundaries,
preventing multi-byte UTF-8 characters from being split at chunk edges.
Fixes#3538
A single service failing to reconnect should not kill the entire
pipeline. Non-fatal errors flow through the pipeline so application
code (e.g. ServiceSwitcher) can handle failover to a backup service.
When a WebSocket server accepts the handshake but immediately closes the
connection (e.g. invalid API key returning close code 1008), the existing
exponential backoff does not help because the handshake keeps succeeding.
This tracks how long each connection survives and emits a non-fatal
ErrorFrame after 3 consecutive sub-5s failures, allowing ServiceSwitcher
failover instead of killing the pipeline.
Fixes#3711
- Use finally block in _disconnect to ensure state is always cleaned
up, even if websocket.close() throws — prevents stale cancellation
state (e.g. _cancel_pending_response) from polluting a new connection
- Catch ConnectionClosed in _drain_cancelled_response alongside
TimeoutError — prevents _needs_drain from staying True and bricking
the service on every subsequent inference attempt
- Fall back to OPENAI_API_KEY env var when api_key is not passed,
since the WebSocket connection uses raw websockets (not the
AsyncOpenAI client which handles this automatically)
- Use _clear_cancellation_state() instead of piecemeal resets where
appropriate
Instead of trying to filter stale events inline (unreliable — the API
doesn't provide a way to correlate events to a specific response),
drain remaining events from a cancelled response before starting the
next one. On cancellation, send response.cancel and set a drain flag.
At the start of the next _process_context, read and discard events
until a terminal event arrives, ensuring a clean connection. Falls
back to reconnecting if draining times out.
Over HTTP, previous_response_id requires store=True (30-day OpenAI-side
conversation storage). The WebSocket variant avoids this via a
connection-local in-memory cache that works with store=False. Add
comments explaining this in both class docstrings, at the store=False
parameter, and in the adapter's previous_response_id note.
Add detailed trace-level logging to _apply_previous_response_optimization
showing why the optimization was applied or fell back to full context,
including the relevant data for debugging.
Use append_to_context=False for the filler TTSSpeakFrame in the
function-calling example to avoid altering the conversation history
and breaking the previous_response_id prefix match.
When using previous_response_id, the server already knows its own
output from the previous response. Store the raw response output and,
on the next call, compare it against the items following the matched
input prefix — checking role and text content for messages, and call_id
for function calls. If the items match, skip them and send only truly
new input (user messages, tool results). Falls back to full context if
either the prefix or the output comparison fails.
Introduce a WebSocket variant of the OpenAI Responses API service that
maintains a persistent connection to wss://api.openai.com/v1/responses
for lower-latency inference. The WebSocket variant automatically uses
previous_response_id to send only incremental context when possible,
falling back to full context on reconnection or cache miss.
The WebSocket variant becomes the new default OpenAIResponsesLLMService,
and the HTTP variant is renamed to OpenAIResponsesHttpLLMService. Both
share a private base class with common settings, parameter building,
and run_inference (always HTTP) logic.
Update langchain 0.3→1.2, langchain-community 0.3→0.4, and
langchain-openai 0.3→1.1. This also unblocks openai>=2.26 which
was previously constrained by the now-removed openpipe package.
OpenPipe was acquired by CoreWeave in September 2025. The Python package
hasn't been updated since June 2025 and the repo since 2024. The openpipe
package caps openai<=1.97.1, creating dependency conflicts with other
extras. Remove the dead integration to clean up the codebase.
- Add Nebius LLM service wrapping OpenAI-compatible Token Factory API
- Set supports_developer_role = False (Nebius rejects developer role)
- Default to openai/gpt-oss-120b model (supports function calling)
- Add Nebius function-calling example and env.example entry
- Fix Sarvam developer role support
- Update examples to use developer role for intro messages
Adds an OpenAI-compatible LLM service for Nebius Token Factory, supporting
open-source models (Meta Llama, Qwen, DeepSeek) via their OpenAI-compatible
REST API at https://api.tokenfactory.nebius.com/v1/.
When the remote side disconnects while send() is in flight, send() was
setting _closing=True. This prevented the receive loop from firing
on_client_disconnected, causing the pipeline to hang waiting for a
disconnect signal that never came.
The fix removes _closing from send() (that flag means we initiated the
close) and instead checks Starlette application_state in _can_send()
to suppress subsequent sends after a failure.
Fixes#3912
Add `await asyncio.sleep(0)` after `create_task()` calls in
UserIdleController, SpeechTimeoutUserTurnStopStrategy,
TurnAnalyzerUserTurnStopStrategy, and UserTurnCompletionLLMServiceMixin
so the event loop schedules the newly created timer tasks before the
caller continues.
The heartbeat monitor timeout (`HEARTBEAT_MONITOR_SECS`) was a static
module-level constant that never derived from the user-configurable
`heartbeats_period_secs`. This meant overriding the heartbeat interval
had no effect on the monitor window, causing spurious warnings or
delayed detection depending on the configured interval.
Add a new `heartbeats_monitor_secs` parameter to `PipelineParams` so
the monitor timeout is independently configurable (defaults to 10s).
The monitor handler now reads from the instance param instead of the
hard-coded constant.
Made-with: Cursor
@@ -23,7 +23,7 @@ Create your integration following the patterns and examples shown in the "Integr
Your repository must contain these components:
- **Source code** - Complete implementation following Pipecat patterns
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational))
- **Foundational example** - Single file example showing basic usage (see [Pipecat examples](https://github.com/pipecat-ai/pipecat/tree/main/examples))
- **README.md** - Must include:
- Introduction and explanation of your integration
- Installation instructions
@@ -225,6 +225,17 @@ Vision services process images and provide analysis such as descriptions, object
### Naming Conventions
#### Package and Repository Naming
Use the `pipecat-{vendor}` naming convention for your PyPI package and repository:
-`pipecat-{vendor}` — for single-service integrations (e.g., `pipecat-deepdub`)
-`pipecat-{vendor}-{type}` — when a vendor offers multiple service types (e.g., `pipecat-upliftai-stt`, `pipecat-upliftai-tts`)
This convention makes community packages easily discoverable via PyPI search and clearly identifies them as part of the Pipecat ecosystem.
#### Class Naming
- **STT:** `VendorSTTService`
- **LLM:** `VendorLLMService`
- **TTS:**
@@ -406,8 +417,9 @@ Use Pipecat's tracing decorators:
### Packaging and Distribution
- Name your package `pipecat-{vendor}` (see [Naming Conventions](#naming-conventions))
- Use [uv](https://docs.astral.sh/uv/) for packaging (encouraged)
-Consider releasing to PyPI for easier installation
**Pipecat** is an open-source Python framework for building real-time voice and multimodal conversational agents. Orchestrate audio and video, AI services, different transports, and conversation pipelines effortlessly—so you can focus on what makes your agent unique.
> Want to dive right in? Try the [quickstart](https://docs.pipecat.ai/getting-started/quickstart).
> Want to dive right in? Run `pipecat init quickstart` or follow the [quickstart guide](https://docs.pipecat.ai/getting-started/quickstart).
## 🚀 What You Can Build
@@ -28,6 +28,10 @@
## 🌐 Pipecat Ecosystem
### 🧩 Multi-agent systems
Need multiple AI agents working together? [Pipecat Subagents](https://github.com/pipecat-ai/pipecat-subagents) lets you build distributed multi-agent systems where each agent runs its own pipeline and communicates through a shared message bus. Hand off conversations between specialists, dispatch background tasks, and scale agents across processes or machines.
### 📱 Client SDKs
Building client applications? You can connect to Pipecat from any platform using our official SDKs:
@@ -67,7 +71,7 @@ and install any of the available plugins.
### 🧩 Community Integrations
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
Build and share your own Pipecat service integrations! Browse existing [community integrations](https://docs.pipecat.ai/api-reference/server/services/community-integrations) or check out our [guide](COMMUNITY_INTEGRATIONS.md) to create your own.
### 📺️ Pipecat TV Channel
@@ -79,28 +83,28 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| Community | [Browse community integrations →](https://docs.pipecat.ai/api-reference/server/services/community-integrations) |
📚 [View full services documentation →](https://docs.pipecat.ai/server/services/supported-services)
📚 [View full services documentation →](https://docs.pipecat.ai/api-reference/server/services/supported-services)
## ⚡ Getting started
@@ -142,15 +146,15 @@ You can get started with Pipecat running on your local machine, then move your a
## 🧪 Code examples
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples/foundational) — small snippets that build on each other, introducing one or two concepts at a time
- [Foundational](https://github.com/pipecat-ai/pipecat/tree/main/examples) — small snippets that build on each other, introducing one or two concepts at a time
- [Example apps](https://github.com/pipecat-ai/pipecat-examples) — complete applications that you can use as starting points for development
## 🛠️ Contributing to the framework
### Prerequisites
**Minimum Python Version:** 3.10
**Recommended Python Version:** 3.12
**Minimum Python Version:** 3.11
**Recommended Python Version:** >= 3.12
### Setup Steps
@@ -166,7 +170,6 @@ You can get started with Pipecat running on your local machine, then move your a
- Added `on_turn_context_created(context_id)` hook to `TTSService`. Override this to perform provider-specific setup (e.g. eagerly opening a server-side context) before text starts flowing. Called each time a new turn context ID is created.
-`TTSService`: the default `stop_frame_timeout_s` (idle time before an automatic `TTSStoppedFrame` is pushed when `push_stop_frames=True`) has changed from `2.0` to `3.0` seconds.
- Added support for "developer" role messages in conversation context across all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock), "developer" messages are converted to "user" messages (use `system_instruction` to set the system instruction). For OpenAI services, "developer" messages pass through in conversation history. For the Responses API, they are kept as "developer" role (matching the existing "system" → "developer" conversion).
- ⚠️ `GeminiLLMAdapter` now only treats `messages[0]` as the initial system message, matching all other adapters. Previously it searched for the first "system" message anywhere in the conversation history. A "system" message appearing later in the list will now be converted to "user" instead of being extracted as the system instruction.
- Fixed Gemini Live (`GoogleGeminiLiveLLMService`) not honoring `settings.system_instruction`. The system instruction was being read from a deprecated constructor parameter instead of the settings object, causing it to be silently ignored.
- Fixed `AWSBedrockLLMAdapter` sending an empty message list to the API when the only message in context was a system message. The lone system message is now converted to "user" role instead of being extracted, matching the existing Anthropic adapter behavior.
- Added `SmallestTTSService`, a WebSocket-based TTS service integration with Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with configurable voice, language, speed, consistency, similarity, and enhancement settings.
- Added warnings in turn stop strategies when `VADParams.stop_secs` differs from the recommended default (0.2s) or when `stop_secs >= STT p99 latency`, which collapses the STT wait timeout to 0s and may cause delayed turn detection. The warnings guide developers to re-run the [stt-benchmark](https://github.com/pipecat-ai/stt-benchmark) with their VAD settings.
- Added `cleanup()` method to `VADAnalyzer` and `VADController` so VAD analyzer resources are properly released when no longer needed. Custom `VADAnalyzer` subclasses can override `cleanup()` to free any held resources.
- Fixed Gemini Live pipeline hanging indefinitely when an `EndFrame` was deferred while waiting for the bot to finish responding and `turn_complete` never arrived. As a possible root-cause fix, `turn_complete` messages are now handled even if they lack `usage_metadata`. As a fallback, the deferred `EndFrame` now has a 30-second safety timeout.
- Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous contexts exceeded") caused by rapid user interruptions. When interruptions arrived before any TTS text was generated, phantom contexts were created on the ElevenLabs server that were never closed, eventually exceeding the 5-context limit.
- Fixed the final sentence being dropped from the conversation context when using RTVI text input with non-word-timestamp TTS services. The `LLMFullResponseEndFrame` was racing ahead of the last `TTSTextFrame`, causing the `LLMAssistantAggregator` to finalize the context before the final sentence arrived.
- Added `on_end_of_turn` event handler to `AssemblyAISTTService`. This fires after the final transcript is pushed, providing a reliable hook for end-of-turn logic that doesn't race with `TranscriptionFrame`. Works in both Pipecat and AssemblyAI turn detection modes.
- ⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova Sonic) now prefer `system_instruction` from service settings over an initial system message in the LLM context, matching the behavior of non-realtime services. Previously, context-provided system instructions took precedence. A warning is now logged when both are set.
- Fixed audio crackling and popping in recordings when both user and bot are speaking. `AudioBufferProcessor` no longer injects silence into a track's buffer while that track is actively producing audio, preventing mid-utterance interruptions in the recorded output.
- Unrecognized language strings (e.g. Deepgram's `"multi"`) no longer produce a warning at startup. The log message has been downgraded to debug level since these are valid service-specific values that are passed through correctly.
-`GrokLLMService` and `GrokRealtimeLLMService` now live in the `pipecat.services.xai` module alongside `XAIHttpTTSService`, since all three use the same xAI API. Update imports from `pipecat.services.grok.*` to `pipecat.services.xai.*` (e.g. `from pipecat.services.xai.llm import GrokLLMService`).
-`pipecat.services.grok.llm`, `pipecat.services.grok.realtime.llm`, and `pipecat.services.grok.realtime.events` are deprecated. The old import paths still work but emit a `DeprecationWarning`; use `pipecat.services.xai.llm`, `pipecat.services.xai.realtime.llm`, and `pipecat.services.xai.realtime.events` instead.
- Added `DeepgramFluxSageMakerSTTService` for running Deepgram Flux speech-to-text on AWS SageMaker endpoints. Use with `ExternalUserTurnStrategies` to take advantage of Flux's turn detection.
- Fixed a race condition in `InterruptibleTTSService` where, if `run_tts` had been invoked but `BotStartedSpeakingFrame` had not yet been received, a user interruption could allow stale audio to leak through.
- ⚠️ `TTSService.add_word_timestamps()` no longer supports the `"Reset"` and `"TTSStoppedFrame"` sentinel strings. If you have a custom TTS service that called `await self.add_word_timestamps([("Reset", 0)])` or `await self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id)`, replace them with `await self.append_to_audio_context(ctx_id, TTSStoppedFrame(context_id=ctx_id))` and let `_handle_audio_context` manage the word-timestamp reset automatically.
- Fixed Gemini Live local VAD mode (`GeminiVADParams(disabled=True)` with external VAD) not working. The bot now correctly detects user speech and signals turn boundaries to the Gemini API.
- Fixed Gemini Live message handling to process all `server_content` fields independently. Gemini 3.x can bundle multiple fields (e.g. `model_turn` and `output_transcription`) on the same message, but the previous `elif` chain only processed the first match, silently dropping the rest.
- Fixed `ServiceSwitcher` with `ServiceSwitcherStrategyFailover` incorrectly triggering failover when `ErrorFrame`s from other pipeline stages (e.g. TTS) propagated upstream through the switcher. Previously, any non-fatal error passing through would be misattributed to the active service and trigger an unwanted service switch. Now only errors originating from the switcher's own managed services trigger failover.
- Fixed `LiveKitOutputTransport` not clearing the `rtc.AudioSource` internal buffer on interruption, causing the bot to continue speaking for several seconds after being interrupted.
- Fixed a crash in OpenAI LLM processing when the provider returns `chunk.choices[0].delta.audio = None`, which caused `'NoneType' object has no attribute 'get'` errors during audio transcript handling.
- Fixed error floods in `DeepgramSTTService` when the WebSocket connection drops. With Deepgram SDK 6.x, `send_media()` raises exceptions on a dead connection instead of silently failing, causing every queued audio frame to log an error. Now `send_media()` failures are caught gracefully — a single warning is logged and audio frames are skipped until the existing reconnection logic restores the connection.
- Added `Mem0MemoryService.get_memories()` convenience method for retrieving all stored memories outside the pipeline (e.g. to build a personalized greeting at connection time). This avoids the need to manually handle client type branching, filter construction, and async wrapping.
- Fixed `Mem0MemoryService` failing to store messages when the context contained system or developer role messages. The Mem0 API only accepts user and assistant roles, so other roles are now filtered out before storing.
-`Mem0MemoryService` no longer blocks the event loop during memory storage and retrieval. All Mem0 API calls now run in a background thread, and message storage is fire-and-forget so it doesn't delay downstream processing.
- Added missing `on_dtmf_event` callback to `LemonSliceTransportClient.setup()``DailyCallbacks` construction, fixing a `ValidationError` at pipeline setup time.
- Fixed duplicate `TTSStoppedFrame` being pushed in TTS services using `push_stop_frames=True`. When the stop-frame timeout fired, a second `TTSStoppedFrame` could be pushed after the normal one at context completion.
-`RimeTTSService` now handles Rime's `done` WebSocket message to complete audio contexts immediately, eliminating the 3-second idle timeout that previously added latency at the end of each utterance.
- ⚠️ Fixed `DeepgramSTTService` compatibility with deepgram-sdk 6.1.0. The SDK now requires explicit message objects for `send_keep_alive()`, `send_close_stream()`, and `send_finalize()`. The minimum deepgram-sdk version is now 6.1.0.
- Fixed RTVI events not being delivered to clients when using WebSocket transports. `ProtobufFrameSerializer` now sets `ignore_rtvi_messages=False` by default.
- Added a `session_id` field to `RunnerArguments` so bots can log or trace a per-session identifier in local development the same way they can in Pipecat Cloud. The development runner now mints a UUID at every construction site, and paths that already returned a `sessionId` to the caller (Daily `/start`, dial-in webhook) share that same UUID with the runner args instead of generating two. The SmallWebRTC `/api/offer` endpoint also accepts an optional `session_id` query parameter so the `/sessions/{session_id}/...` proxy can thread it through.
- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService` for controlling Cartesia's server-side text buffering. When unset, Pipecat picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE` mode (custom buffering — avoids stacking client-side aggregation on top of Cartesia's default 3000ms server buffer) and unset in `TOKEN` mode (Cartesia's managed buffering applies). Pass an explicit value (0–5000ms) to override.
- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16` to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the `use_normalized_timestamps` and `max_buffer_delay_ms` fields.
- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead of the deprecated `use_original_timestamps` field. Word timestamps now reflect what was actually spoken (post text-normalization and pronunciation-dictionary substitution), matching the convention Pipecat uses for ElevenLabs. This is a behavior change for `sonic-3` users, who were previously receiving timestamps tied to the input transcript.
- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200 response — one with the API's error text and a second, less informative "Unknown error" frame from the outer exception handler. It now pushes a single frame that includes the HTTP status code and returns cleanly.
- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`, `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both the class and an instance.
- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as `ErrorFrame`s. The latest API emits a `flush_done` per transcript when server-side buffering is disabled; Pipecat now consumes them silently since each turn already has its own `context_id`.
- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally for user turn stop strategies. It is now only imported when `default_user_turn_stop_strategies()` is called. This improves startup time and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning when the default stop strategies are not used.
- Broadened `tool_resources` to `app_resources` for easy access not just in tool handlers but in other places like custom `FrameProcessor`s. Three changes: a rename (`tool_resources` → `app_resources`), a new `app_resources` property on `PipelineTask`, and a new `pipeline_task` property on `FrameProcessor`. Tool handlers now read `params.app_resources`; custom processors read `self.pipeline_task.app_resources`. The previous `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit `DeprecationWarning`s.
- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model Improvement Program. When set, the value is forwarded to Deepgram as a query parameter on the speak request. Defaults to `None`, which preserves the existing behavior. See https://dpgr.am/deepgram-mip for pricing implications before enabling.
- Added support to `AWSNovaSonicLLMService` for the new "async tool call" mechanism activated by `cancel_on_interruption=False`, which includes delivering results asynchronously, delivering result streams, and cancelling running async tools. Support for the other major realtime services (`GeminiLiveLLMService`, `OpenAIRealtimeLLMService`) will be added in a follow-up PR.
- Fixed a regression in `AWSNovaSonicLLMService` where `cancel_on_interruption=False` (which previously worked under the old async-tool-call mechanism, by simply avoiding discarding tool calls on interruptions) stopped working after the introduction of the new "async tool call" mechanism.
This directory contains examples to help you learn how to build with Pipecat.
This directory contains examples showing how to build voice and multimodal agents with Pipecat.
## Getting Started
## Setup
New to Pipecat? Start here:
1. Follow the [README](https://github.com/pipecat-ai/pipecat/blob/main/README.md#%EF%B8%8F-contributing-to-the-framework) steps to get your local environment configured.
- **[Quickstart](quickstart/)** - Get your first voice AI bot running in 5 minutes _(coming soon)_
- **[Client/Server Web](client-server-web/)** - Learn to build web applications with Pipecat's client SDKs _(coming soon)_
- **[Phone Bot with Twilio](phone-bot-twilio/)** - Connect your bot to a phone number _(coming soon)_
> **Run from root directory**: Make sure you are running the steps from the root directory.
## Foundational Examples
> **Using local audio?**: The `LocalAudioTransport` requires a system dependency for `portaudio`. Install the dependency to use the transport.
Single-file examples that introduce core Pipecat concepts one at a time. These examples:
2. Copy the [`env.example`](../env.example) file and add API keys for services you plan to use:
- Build on each other progressively
- Focus on specific features or integrations
- Are used for testing with every Pipecat release
```bash
cp env.example .env
# Edit .env with your API keys
```
See the **[Foundational Examples README](foundational/)** for the complete list.
3. Run any example:
## More Advanced Examples
```bash
uv run python getting-started/01-say-one-thing.py
```
Ready to explore complex use cases? Visit **[pipecat-examples](https://github.com/pipecat-ai/pipecat-examples)** for:
4. Open the web interface at http://localhost:7860/client/ and click "Connect"
- Production-ready applications
- Multi-platform client implementations
- Telephony integrations
- Multimodal and creative applications
- Deployment and monitoring examples
## Running examples with other transports
Most examples support running with other transports, like Twilio or Daily.
### Daily
You need to create a Daily account at https://dashboard.daily.co/u/signup. Once signed up, you can create your own room from the dashboard and set the environment variables `DAILY_ROOM_URL` and `DAILY_API_KEY`. Alternatively, you can let the example create a room for you (still needs `DAILY_API_KEY` environment variable). Then, start any example with `-t daily`:
```bash
uv run getting-started/06-voice-agent.py -t daily
```
### Twilio
It is also possible to run the example through a Twilio phone number. You will need to setup a few things:
1. Install and run [ngrok](https://ngrok.com/download).
```bash
ngrok http 7860
```
2. Configure your Twilio phone number. One way is to setup a TwiML app and set the request URL to the ngrok URL from step (1). Then, set your phone number to use the new TwiML app.
Then, run the example with:
```bash
uv run getting-started/06-voice-agent.py -t twilio -x NGROK_HOST_NAME
```
## Directory Structure
### [`getting-started/`](./getting-started/)
Progressive introduction to Pipecat, from minimal TTS to a full voice agent with function calling.
### [`voice/`](./voice/)
Full STT + LLM + TTS voice agent pipelines showcasing different speech service providers (Deepgram, ElevenLabs, Cartesia, etc.)
### [`function-calling/`](./function-calling/)
Function calling with different LLM providers (OpenAI, Anthropic, Google, etc.)
### [`transcription/`](./transcription/)
Speech-to-text examples with various STT providers.
### [`vision/`](./vision/)
Image description and vision capabilities with different multimodal LLMs.
### [`realtime/`](./realtime/)
Realtime and multimodal live APIs (OpenAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox, Grok).
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.