Commit Graph

9179 Commits

Author SHA1 Message Date
Gökmen Görgen
e25dccfc6b update aic-sdk to ~=2.2.0 and rename AICOUSTICS_LICENSE_KEY to AIC_LICENSE_KEY. 2026-04-25 10:13:06 +02:00
Gökmen Görgen
3bbfc42854 remove adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 10:05:47 +02:00
Gökmen Görgen
3b2127f912 rename environment variables and references from AICOUSTICS to AIC. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
ea12b10742 rename mcp-aic-adaptive.py to mcp-aicoustics-adaptive.py. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
a2fbed86cf add adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
f75f361629 bump aic-sdk to 2.2.0 and update AICFilter with model_id and enhancement_level changes. 2026-04-25 09:51:23 +02:00
Filipi da Silva Fuchter
38a02271c5 Merge pull request #4368 from pipecat-ai/filipi/stt_service
Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService.
2026-04-24 14:31:36 -03:00
filipi87
2ce203aeb8 Renaming the method to _maybe_reconnect_on_user_stopped_speaking. 2026-04-24 13:08:32 -03:00
filipi87
b30df95f13 Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService. 2026-04-24 13:00:38 -03:00
kompfner
6be8deee2a Merge pull request #4361 from pipecat-ai/pk/pyright-fixes
Some pyright fixes
2026-04-24 11:58:28 -04:00
Paul Kompfner
c113cacd59 refactor(types): name the LLMContext/OpenAI boundary with explicit cast helpers
LLMContext's NotGiven, LLMContextToolChoice, and LLMStandardMessage are
currently aliased to their OpenAI equivalents, so passing values
between the two sides type-checks implicitly. That works today but
obscures the fact that these are meant to be conceptually distinct —
if LLMContext ever diverges from OpenAI's types, every implicit
crossing would silently break.

Introduce two module-private cast helpers in open_ai_adapter.py:

- _openai_from_llm_context_tool_choice(tool_choice)
- _openai_from_llm_standard_message(message)

Both are typed no-ops today (implemented with typing.cast) but each
carries a docstring explaining why the cast is present, and every
boundary crossing now routes through a named function. Future readers
(and future greps) can find the crossings; a later divergence becomes
a mechanical find-and-update rather than hunting through adapter code.

No behavior change, no pyright error delta.
2026-04-24 10:10:03 -04:00
Paul Kompfner
d0495eeef6 fix(types): narrow voice in SpeechmaticsTTSSettings to disallow None
After widening TTSSettings.voice to str | None | _NotGiven (so other
TTS services can opt into None as a valid "no voice" state), pyright
flagged Speechmatics' URL builder receiving str | None where it
required str.

Speechmatics has no "no voice" mode (the URL path includes the voice
name), so override the inherited field in SpeechmaticsTTSSettings to
str | _NotGiven. The call site stays as a plain assert_given(...)
without an extra None check.
2026-04-23 21:08:47 -04:00
Paul Kompfner
c3eb69165c fix(types): accept SDK NotGiven in LLM Settings fields used for passthrough
Three LLM services initialize certain Settings fields with the SDK's
NOT_GIVEN (openai.NOT_GIVEN or anthropic.NOT_GIVEN) so the value
flows unmodified into SDK API calls. The inherited field types from
LLMSettings only admit pipecat's _NotGiven, so pyright flagged each
constructor call as a flavor mismatch.

Widen the field types in each service-specific Settings subclass so
they accept both pipecat's _NotGiven (for delta-mode defaults) and
the corresponding SDK NotGiven (for store-mode passthrough):

- OpenAILLMSettings: frequency_penalty, presence_penalty, seed,
  temperature, top_p, max_tokens, max_completion_tokens.
- OpenAIResponsesLLMSettings: temperature, top_p,
  max_completion_tokens.
- AnthropicLLMSettings: temperature, top_k, top_p, thinking.

Every overridden field is genuinely read from self._settings and
passed directly to the SDK, so none of the overrides are vestigial.

Clears 21 pyright errors and restores test_service_settings_complete
parity with the pre-NOT_GIVEN-swap state.
2026-04-23 18:32:46 -04:00
Paul Kompfner
0302f6d05c chore(pyright): drop newly-clean files from ignore list
asyncai/tts and google/vertex/llm are now clean after the missing-None
sweep (both benefited from the TTSSettings.voice / LLMSettings
cascades).

- src/pipecat/services/asyncai/tts.py
- src/pipecat/services/google/vertex/llm.py
2026-04-23 18:18:00 -04:00
Paul Kompfner
b9ff333654 fix(types): admit None on settings fields that accept it as a default
Service-specific Settings subclasses declared fields as T | _NotGiven
(no None), but the services routinely pass None to those fields during
init to mean "don't override — use the vendor's default". The field
type just didn't reflect that a None value is valid, so pyright
flagged every None at the call sites.

Change the declarations to T | None | _NotGiven, matching the pattern
already used by ServiceSettings.model and TTSSettings.language. No
constructor-call changes; the default_factory stays NOT_GIVEN.

Fields touched across 11 files:

- services/settings.py: TTSSettings.voice (base class; covers
  asyncai, cartesia, elevenlabs, fish, hume, kokoro, lmnt, mistral,
  neuphonic, piper, resembleai, rime, xtts TTS services).
- services/aws/llm.py: latency.
- services/aws/tts.py: engine, pitch, rate, volume, lexicon_names.
- services/azure/tts.py: emphasis, pitch, rate, role, style,
  style_degree, volume.
- services/google/gemini_live/llm.py: vad.
- services/google/llm.py: thinking.
- services/google/stt.py: language_codes.
- services/inworld/tts.py: speaking_rate, temperature.
- services/openai/tts.py: instructions, speed.
- services/speechmatics/stt.py: 13 fields (domain, operating_point,
  max_delay, end_of_utterance_*, punctuation_overrides, *_partials,
  split_sentences, enable_diarization, speaker_*, max_speakers,
  prefer_current_speaker, extra_params).
- services/ultravox/llm.py: output_medium.

Clears 94 pyright errors (1035 -> 941).
2026-04-23 18:18:00 -04:00
Paul Kompfner
92610944af chore(pyright): drop newly-clean files from ignore list
Three files no longer have pyright errors after the is_given /
assert_given sweep — remove them from the ignore list (which serves as
a live todo of files with remaining type errors).

- src/pipecat/processors/gstreamer/pipeline_source.py
- src/pipecat/services/camb/tts.py
- src/pipecat/services/speechmatics/tts.py
2026-04-23 17:44:17 -04:00
Paul Kompfner
6a337f1bc6 fix(types): assert_given at store-mode settings read sites
Apply assert_given across service modules to narrow reads from
store-mode settings fields (self._settings.X, default_settings.X),
where _NotGiven is declared in the field type but should never appear
at runtime (enforced by validate_complete()).

Two idioms used:

- Inline wrap for single uses:
    func(assert_given(self._settings.enable_prompt_caching), ...)

- Extract-and-reuse when the same value is used multiple times:
    thinking = assert_given(self._settings.thinking)
    if thinking:
        params["thinking"] = thinking.model_dump(...)

43 service files touched. Cleared ~172 pyright errors; remaining
_NotGiven-related errors are in adjacent categories (flavor mismatch
between openai/anthropic NotGiven and pipecat _NotGiven, settings
field types that should allow None but don't) that need different
fixes.
2026-04-23 17:39:17 -04:00
Filipi da Silva Fuchter
ef7fa07bf7 Merge pull request #4358 from pipecat-ai/filipi/fix_aiortc_sctp
Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU
2026-04-23 17:49:18 -03:00
filipi87
ce1506792e Linking to the docs instead of full explanation. 2026-04-23 17:46:54 -03:00
Paul Kompfner
70f3d32734 feat(types): add assert_given for narrowing store-mode settings reads
In store-mode settings objects, _NotGiven should never appear (the
invariant enforced by validate_complete). But the declared field types
still include _NotGiven because the same class doubles as delta mode,
so every field read is typed X | None | _NotGiven and pyright flags
operations that assume X | None.

assert_given is a one-line extractor that narrows away _NotGiven and
raises loudly if the invariant is violated — preferable to scattering
is_given guards that defend against something that can't occur in
practice.

    resolved_model = assert_given(self._settings.model)  # str | None
2026-04-23 16:40:07 -04:00
Paul Kompfner
356618b448 fix(types): use is_given at call sites pyright flagged
Replace direct identity checks against NOT_GIVEN with is_given() at
sites where pyright's inability to narrow on non-singleton sentinels
was causing type errors.

- adapters/services/anthropic_adapter.py: narrow converted.system for
  _resolve_system_instruction.
- services/openai/llm.py: narrow params.service_tier using OpenAI's
  is_given.
- services/sarvam/llm.py: narrow tools / tool_choice using OpenAI's
  is_given (aliased as openai_is_given alongside the existing
  settings.is_given import).
- services/sarvam/tts.py: narrow settings.voice using settings.is_given.
2026-04-23 16:15:07 -04:00
Paul Kompfner
1624d7a474 feat(types): add is_given TypeGuard helpers for NotGiven sentinels
Pyright can't narrow identity checks against module-level NotGiven
sentinels (they aren't typed as singletons), which leaves many
NotGiven-bearing unions stuck as unnarrowed types throughout the
codebase. Introduce is_given TypeGuard helpers so narrowing works via
isinstance under the hood.

Each helper is co-located with the NotGiven flavor it guards:

- services/settings.py: upgrade the existing is_given to a TypeGuard.
- processors/aggregators/llm_context.py: add an is_given for
  LLMContext's NotGiven. Treat LLMContext's re-exported types
  (LLMStandardMessage, LLMContextToolChoice, NOT_GIVEN, NotGiven) as
  LLMContext's own — independent definitions that happen to coincide
  with OpenAI's as an implementation detail.
- adapters/services/anthropic_adapter.py: add is_given for anthropic's
  NotGiven.
- adapters/services/open_ai_adapter.py: add is_given for openai's
  NotGiven.
2026-04-23 15:33:43 -04:00
Paul Kompfner
092b1dcb0f fix(types): widen TLLMInvocationParams bound to Mapping[str, Any]
TypedDict types are not subtypes of dict[...] in the type system
(per PEP 589), so TypedDict-based invocation param classes could not
satisfy the TypeVar bound. Mapping[str, Any] accepts TypedDicts while
preserving the "string-keyed mapping" constraint.
2026-04-23 14:35:59 -04:00
Mark Backman
b90ea9bf6a Merge pull request #4352 from pipecat-ai/mb/pyright-fixes-1-per-file
More pyright fixes
2026-04-23 14:14:36 -04:00
kompfner
05c97804d5 Merge pull request #4359 from pipecat-ai/pk/changelog-4355-rename
chore: rebind Gemini Live reconnect changelog fragment to PR #4355
2026-04-23 14:10:36 -04:00
Paul Kompfner
7a8357a569 chore: rebind Gemini Live reconnect changelog fragment to PR #4355
The original contributor's PR (#4328) landed as #4355. Rename the fragment
so the rendered changelog links to the merged PR, and add the leading `- `
bullet prefix that towncrier expects.
2026-04-23 12:00:56 -04:00
filipi87
44756de15a Adding changelog for the SmallWebRTC fix. 2026-04-23 12:19:56 -03:00
filipi87
94304ec74e Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU. 2026-04-23 12:18:33 -03:00
kompfner
a3fe34f4a2 Merge pull request #4355 from pipecat-ai/pk/gemini-live-context-reseed-on-reconnect
Re-seed Gemini Live context on reconnect without session resumption
2026-04-23 11:00:22 -04:00
Sathwika Reddy Geereddy
21f6c2afa5 Update NVIDIA STT services for Nemotron Speech defaults and config parity (#4269)
* Update NVIDIA STT services for Nemotron Speech defaults and config parity

* Add changelog entry for PR #4269

* initialize boosted LM settings defaults in streaming STT

* Align NVIDIA STT language handling with other STT services

* add finalised flag to Nvidia stt final transcripts, remove processing latency logs

* Changing interim transcription logging to tracing.

---------

Co-authored-by: sathwika <geereddysath@nvidia.com>
Co-authored-by: filipi87 <filipi87@gmail.com>
2026-04-23 09:01:27 -04:00
Filipi da Silva Fuchter
4d14251f4a Merge pull request #4354 from pipecat-ai/filipi/includes_inter_frame_spaces
feat(tts): add includes_inter_frame_spaces flag to word-timestamp API - follow-up
2026-04-23 08:49:26 -03:00
Paul Kompfner
1421c4ba22 fix: handle Gemini Live 2.5 quirks when re-seeding context on reconnect
Extends the reconnect re-seeding fix to work cleanly on Gemini Live 2.5,
which has stricter seed requirements than 3.x and a documented audio-input /
history-recall limitation. Both initial connection and reconnect now share a
single code path (`_create_initial_response(for_reconnect=...)`), with four
well-documented cases.

On Gemini 2.5 reconnect, `turn_complete=True` is now forced on the seed so
the model produces a recap-style response immediately instead of briefly
acting "forgetful" on the user's next utterance — the latter being
especially jarring mid-conversation. When a 2.5 seed doesn't already end
with a user turn (e.g. the bot had finished speaking before the disconnect),
a blank user turn is appended to satisfy the server's seed-shape
requirement. Gemini 3.x needs neither workaround.
2026-04-22 15:58:54 -04:00
filipi87
6b1d8d9fa5 Fixing merge conflicts. 2026-04-22 15:22:32 -03:00
filipi87
ac810e57ed Merge branch 'main' into filipi/includes_inter_frame_spaces
# Conflicts:
#	uv.lock
2026-04-22 15:22:06 -03:00
filipi87
bba7ca80e3 Bumping to small-webrtc-prebuilt 2.5.0 to fix karaoke highlighting. 2026-04-22 15:20:37 -03:00
filipi87
79250f1fe0 Making includes_inter_frame_spaces optional for word-timestamp. 2026-04-22 14:20:30 -03:00
Mark Backman
4f6e76e6fd Add changelog entries for #4352 2026-04-22 12:23:33 -04:00
Mark Backman
b0962861c8 Acknowledge Tkinter's GC-reference idiom with a scoped type ignore
Tkinter's `Label` only stores `PhotoImage` references at the C level, so
Python GC eats them unless something on the Python side keeps a
reference. The canonical fix is to stash the reference on the widget
itself: `label.image = photo`. Tkinter widgets are plain Python objects,
so the assignment works at runtime, but the stub declares no `image`
attribute (correctly — there isn't one; we're adding it).

Narrow the suppression to `# type: ignore[attr-defined]` on the one
line. The existing comment above the assignment already documents why.
2026-04-22 12:19:16 -04:00
Mark Backman
ec7c35fe98 Move Mistral message fixups into MistralLLMAdapter
Mistral imposes three conversation-history quirks on top of the
OpenAI-compatible wire format: tool messages must be followed by an
assistant message; non-initial system messages are rejected; trailing
assistant messages require `prefix=True`. These rules were applied
inline in `MistralLLMService.build_chat_completion_params`, which is the
wrong layer — every other provider with OpenAI-compatible-but-quirky
shape (Perplexity, etc.) owns its transformations in a
`BaseLLMAdapter` subclass that runs during `get_llm_invocation_params`.

Create `MistralLLMAdapter(OpenAILLMAdapter)` on the Perplexity template
and wire it in via the existing `adapter_class` dispatch. The service
now only handles Mistral-specific request-level mapping (`random_seed`
in place of `seed`), and the message shape concerns live with other
provider format logic.

No behavior change. The transform function casts to `list[dict[str,
Any]]` internally because mutating `role` and attaching Mistral's
non-standard `prefix` field both step outside OpenAI's TypedDict
contract; the cast at the return boundary encodes that we're emitting
Mistral's extended schema, not OpenAI's.
2026-04-22 12:17:46 -04:00
Mark Backman
10b86b4bbe Coerce inspect.getdoc() None to empty string before parsing
`inspect.getdoc()` returns `str | None`, but `docstring_parser.parse()`
requires `str`. Functions without a docstring produced `None`, which
the type checker correctly flagged.

Coerce to `""` at the call site. `docstring_parser.parse("")` returns
an empty docstring whose `.description` and `.params` are already
handled by the surrounding `or ""` fallbacks, so runtime behavior is
unchanged.
2026-04-22 12:01:00 -04:00
Mark Backman
8ec56092c0 Remove duplicate ResponseCreated type 2026-04-22 11:58:15 -04:00
Mark Backman
0c3c5e5c7d Widen ToolsSchema.standard_tools to Sequence for covariance
`ToolsSchema.__init__` declared `standard_tools: list[FunctionSchema |
DirectFunction]`. Callers (`BaseLLMAdapter`, `MCPService`) pass in
`list[FunctionSchema]`, which is not assignable to the union list
because `list` is invariant in its element type.

Widen the parameter to `Sequence[...]` (covariant) so `list[X]` and
`list[X | Y]` both fit. A narrower `list[FunctionSchema]` is still
accepted, and nothing in this class mutates the argument — the
constructor immediately copies it via `_map_standard_tools`.

Also correct the `custom_tools` property return type to include
`None`, matching the stored `_custom_tools` field.

This single edit clears the pyright errors for three ignore-list
entries: `tools_schema.py`, `base_llm_adapter.py`, and `mcp_service.py`.
2026-04-22 11:54:20 -04:00
Mark Backman
b64ed3f9e2 Narrow settings.model at service boundaries, not via truthiness
Two services were reading `_settings.model` (typed `str | _NotGiven |
None` because NOT_GIVEN is the default) and coercing it with `or ""`
or similar. `_NotGiven.__bool__` returns False, so the runtime
behavior happened to work, but the type was a lie — pyright saw
`str | _NotGiven` flowing into APIs that required `str` or `str | None`.

- `AIService._sync_model_name_to_metrics`: use `isinstance(model, str)`
  narrowing with an empty-string fallback. Equivalent runtime behavior,
  honest type, no truthiness dependency on a sentinel.
- `SarvamLLMService.__init__`: validate the model is a real string
  before handing it to `_validate_model(str)`. A non-string model at
  this point is a configuration bug; raise `ValueError` so the error
  is clear and survives `python -O` (unlike an assert).
2026-04-22 11:52:20 -04:00
Mark Backman
5872006d6b Encode lazy-init invariants at the right site, not at read sites
Three spots had the same shape: a field starts None, a later method
populates it, a read site later reads it. Pyright can't track the
cross-method invariant. Rather than spray assertions at the read
sites, fix each site at the structural level:

- `FastAPIWebsocketInputTransport._monitor_websocket` now takes the
  session timeout as an argument. The task-creation site already
  guards on truthiness, so the call can pass the non-None value
  directly and the method's signature tells the truth.
- `FrameProcessorMetrics.task_manager` raises `RuntimeError` instead
  of asserting. Asserts are stripped under `python -O`; a real raise
  keeps the runtime safety net and still narrows the type for pyright.
- `SOXRStreamAudioResampler._maybe_initialize_sox_stream` returns the
  initialized stream. Callers use the return value and never touch
  the Optional `_soxr_stream` attribute, so narrowing stays inside
  the init method where the invariant is established.
2026-04-22 11:45:18 -04:00
Mark Backman
457eb7aa92 Mark abstract image/vision generators as real async generators
`ImageGenService.run_image_gen` and `VisionService.run_vision` were
declared `async def ... -> AsyncGenerator[Frame, None]` with `pass`
bodies. Without a `yield` anywhere in the body, Python treats the
function as a coroutine returning an `AsyncGenerator`, not as an async
generator itself, so callers got a coroutine where they expected an
iterator.

Add `raise NotImplementedError; yield` so the body contains a yield
(making this a real async generator) while still raising cleanly if a
subclass ever calls `super().run_*` by mistake.
2026-04-22 11:19:23 -04:00
Mark Backman
14cd476b20 Drop pyright ignores for services fixed by run_stt/run_tts widening
Deepgram STT, Gradium TTS, Smallest STT, and xAI STT/TTS had exactly
one pyright error each, all of them the AsyncGenerator return-type
mismatch resolved in 08fe9157c. Remove them from the ignore list.
2026-04-22 11:09:27 -04:00
Mark Backman
3b0affe5b4 Guard run_stt WebSocket sends with try/except
AssemblyAI, Cartesia, Gradium, and Soniox STT services sent audio over
the WebSocket without catching transient send failures, so a single
network hiccup could propagate an exception up through process_frame
and end the pipeline. Other push-based STT services (Deepgram, xAI,
Azure, Smallest, etc.) already guard their sends.

Follow the deepgram/stt.py pattern: log a warning and continue. The
existing connection-state check at the top of each call handles
recovery on the next invocation.
2026-04-22 11:03:41 -04:00
Mark Backman
08fe9157cc Widen run_stt/run_tts return type to AsyncGenerator[Frame | None, None]
The push-based STT/TTS implementations send audio/text over a socket and
receive results via a separate receive task, so there is nothing to
yield inline. They yield `None` by design. The previous declaration of
`AsyncGenerator[Frame, None]` disagreed with that, while the consumer
(`AIService.process_generator`) already accepted `Frame | None`. Widen
the producer side (abstract base and every subclass) so the type honestly
describes the contract.

Pure annotation change; no runtime behavior difference.
2026-04-22 11:01:50 -04:00
Mark Backman
3f3d3c9203 Merge pull request #4337 from pipecat-ai/mb/fix-speech-stop-strategy
Split user-turn stop timeout into independent speech and STT timers
2026-04-22 10:23:03 -04:00
Mark Backman
6b6896a543 Merge pull request #4350 from pipecat-ai/mb/pyright-precise-ignore-list
Expand pyright coverage to full src/pipecat with per-file ignores
2026-04-22 09:56:59 -04:00