Commit Graph

9191 Commits

Author SHA1 Message Date
Paul Kompfner
005fe33b25 Update docs URLs in README to reflect new docs site structure and avoid redirects 2026-04-27 10:22:49 -04:00
Paul Kompfner
24154474c9 Add OpenAI Responses to the README's list of LLM services 2026-04-27 10:19:13 -04:00
kompfner
86effc4d10 Merge pull request #4015 from prettyprettyprettygood/feat/nova-sonic-session-continuation
feat(nova-sonic): add proactive session continuation for conversation…
2026-04-27 09:36:48 -04:00
Mark Backman
58e50882d8 Merge pull request #4374 from pipecat-ai/mb/fix-daily-runner-room-props
Expire runner-created Daily rooms after 4h
2026-04-27 09:07:31 -04:00
Mark Backman
ef183d0c96 Add changelog for #4374 2026-04-27 09:00:17 -04:00
Mark Backman
f078df7805 runner: expire Daily rooms after 4h to mirror Pipecat Cloud session limit
Runner-created Daily rooms previously had no expiration when callers
posted partial `dailyRoomProperties` (e.g. `{"start_video_off": true}`).
The model-default `exp=None` and `eject_at_room_exp=False` meant Daily's
cron never cleaned them up, so rooms accumulated indefinitely.

Encode the policy in the runner: define `PIPECAT_CLOUD_ROOM_EXP_HOURS=4.0`,
inject `exp` and `eject_at_room_exp=True` into user-supplied properties via
`setdefault` (so explicit caller values still win), and pass
`room_exp_duration` to all four `configure()` call sites.
2026-04-27 09:00:17 -04:00
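Note on f078df7805: the `setdefault` policy described in this commit can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code. The helper name `apply_room_expiry_defaults` and the `now` parameter are hypothetical, and it assumes `exp` is expressed as a Unix timestamp in seconds.

```python
from __future__ import annotations

import time

# Constant name and value taken from the commit message.
PIPECAT_CLOUD_ROOM_EXP_HOURS = 4.0


def apply_room_expiry_defaults(properties: dict, now: float | None = None) -> dict:
    """Return a copy of caller-supplied room properties with expiry defaults.

    Hypothetical helper: the real runner may structure this differently.
    """
    now = time.time() if now is None else now
    merged = dict(properties)
    # setdefault only fills keys the caller did not provide,
    # so explicit caller values still win.
    merged.setdefault("exp", now + PIPECAT_CLOUD_ROOM_EXP_HOURS * 3600)
    merged.setdefault("eject_at_room_exp", True)
    return merged
```

With a partial payload such as `{"start_video_off": True}`, the result gains both expiry keys, while a payload that already sets `exp` keeps its own value.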
Mark Backman
815cd44c2a Merge pull request #4372 from pipecat-ai/mb/relax-frames-proto-5x
Relax protobuf pin to support both 5.x and 6.x runtimes
2026-04-27 08:58:23 -04:00
Garegin Harutyunyan
e5941926be Krisp tt demo tool (#4335)
* VIVA SDK TT v3 support

* Format fix.

* Renamed the API, removing '3' from the name.

* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.

* TT demo tool

* Some improvements for demo scripts, audio recording, etc.

* Enhance demo scripts with VAD selection and audio embedding features. Updated HTML report to include annotated audio players and improved response time metrics in summary formatting. Added README for setup and usage instructions.

* Refactor interrupt prediction demo to compare multiple interruption strategies (Krisp IP vs VAD). Updated README with usage instructions and output details. Enhanced audio processing with new helper functions for generating beeps and mixing audio.

* Refactor demo scripts to improve latency metrics by introducing total_delay property in TurnEvent. Update formatting in reports and visualizations to reflect accurate speech end times, including VAD wait times. Enhance HTML report with detailed latency information and adjust audio processing to account for VAD stop seconds.

* Add audio resampling functionality and update demo scripts for improved audio processing

- Introduced `resample_audio` function to handle audio resampling with linear interpolation.
- Updated `demo_audio_recorder.py` to utilize the new resampling feature, ensuring audio is saved at the requested sample rate.
- Modified `demo_interrupt_prediction.py` and `demo_turn_taking.py` to resample audio to 16 kHz for compatibility with Silero VAD.
- Adjusted imports in demo scripts to include the new resampling function.
- Enhanced error handling for sample rate discrepancies in audio recording.

* Enhance demo_interrupt_prediction.py with VAD type selection and improved processing logic

- Added support for selecting between "silero" and "krisp" VAD engines in the demo script.
- Introduced a new create_vad function to configure VAD analyzers based on the selected type.
- Updated audio processing logic to handle VAD type-specific resampling and state management.
- Modified the KrispVivaIPUserTurnStartStrategy to utilize a separate vad_flag for per-frame VAD input, improving interruption detection accuracy.

* Refactor audio processing scripts for improved readability and consistency

- Updated type hinting in `resample_audio` function to use `tuple` instead of `Tuple`.
- Simplified print statements in `demo_audio_recorder.py`, `demo_formatting.py`, and `demo_interrupt_prediction.py` for better readability.
- Adjusted argument formatting in `demo_audio_recorder.py` and `demo_formatting.py` for consistency.
- Cleaned up list comprehensions in `demo_formatting.py`, `demo_html_report.py`, and `demo_interrupt_prediction.py` for clarity.
- Enhanced error handling in `__init__.py` for the KrispVivaIPUserTurnStartStrategy import.

* Refactor VAD handling in KrispVivaIPUserTurnStartStrategy and update tests for clarity

- Simplified the argument formatting in the _handle_vad_started method for improved readability.
- Updated test assertions to reflect changes in VAD processing logic, ensuring that the vad_flag is correctly set to False during continuous state processing.
- Enhanced test cases to verify that the process method is called appropriately under different conditions.

* more format fixes.

* removed demo scripts.

* reverted wrongly removed file.

* Corrected the IP integration logic.

* style fix.

* Refactor audio processing and state management in KrispVivaIPUserTurnStartStrategy

- Removed the unused _vad_flag attribute to streamline state tracking.
- Updated the reset method to clear the audio buffer instead of resetting the vad_flag.
- Adjusted the process_frame method to use _speech_active for VAD input, enhancing clarity in the logic.
- Modified tests to reflect changes in state management and ensure proper functionality of the reset method and audio buffer handling.

* Fixed formatting

---------

Co-authored-by: Aram Poghosyan <apoghosyan@krisp.ai>
2026-04-27 08:14:00 -04:00
Mark Backman
6266c026a6 Merge pull request #4362 from ai-coustics/ai-coustics/aic-sdk-py-v2.2.0-update
Update aic-sdk to v2.2.0
2026-04-25 06:51:41 -04:00
Gökmen Görgen
e25dccfc6b update aic-sdk to ~=2.2.0 and rename AICOUSTICS_LICENSE_KEY to AIC_LICENSE_KEY. 2026-04-25 10:13:06 +02:00
Gökmen Görgen
3bbfc42854 remove adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 10:05:47 +02:00
Gökmen Görgen
3b2127f912 rename environment variables and references from AICOUSTICS to AIC. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
ea12b10742 rename mcp-aic-adaptive.py to mcp-aicoustics-adaptive.py. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
a2fbed86cf add adaptive audio enhancement example and support for runtime enhancement level updates in AICFilter. 2026-04-25 09:51:23 +02:00
Gökmen Görgen
f75f361629 bump aic-sdk to 2.2.0 and update AICFilter with model_id and enhancement_level changes. 2026-04-25 09:51:23 +02:00
Mark Backman
4c153e5d3c Add changelog for #4372 2026-04-24 21:20:46 -04:00
Mark Backman
4088992d97 Relax protobuf pin to support both 5.x and 6.x runtimes
Pipecat 1.0.8 hard-required protobuf 6.x via the base `protobuf>=6.31.1,<7`
pin, blocking users whose dependency graph already constrains protobuf to
the 5.x line. The original bump (PR #4136) was only needed because
`nvidia-riva-client>=2.25.1` ships gencode compiled with protoc 6.31.1.

Changes:

- Widen base pin to `protobuf>=5.29.6,<7`.
- Regenerate `frames_pb2.py` with `grpcio-tools~=1.67.1` (protoc 5.x). Per
  Google's cross-version runtime guarantee, 5.x gencode runs on both 5.x
  and 6.x runtimes, so this single artifact serves all users.
- Loosen the dev pin `grpcio-tools` to `>=1.67.1,<2` so contributors can
  install `pipecat[dev,nvidia]` without resolver conflict. Comment in
  `frames.proto` documents the 1.67.x requirement for regeneration.
- Add an explicit `protobuf>=6.31.1,<7` to the `nvidia` extra. This
  compensates for nvidia-riva-client's missing `protobuf` install
  requirement (upstream packaging gap, see
  https://github.com/nvidia-riva/python-clients/issues/172). When that
  issue is resolved, the explicit protobuf entry in the `nvidia` extra
  can be removed.

Verified: pipecat imports cleanly on both protobuf 5.29.6 and 6.33.6;
`tests/test_protobuf_serializer.py` passes; `import riva.client` succeeds
when `pipecat[nvidia]` is installed.
2026-04-24 21:15:32 -04:00
Osman Ipek
f1b16a672a feat(nova-sonic): add proactive session continuation for conversations >8min
Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds
transparent session continuation that rotates sessions in the background
before the limit is reached, preserving conversation context with no
user-perceptible interruption.

Implementation follows the AWS reference architecture:
- Monitor loop detects when session age exceeds threshold
- On assistant AUDIO contentStart: start buffering user audio, create next
  session (sessionStart + promptStart + system instruction)
- Track SPECULATIVE/FINAL text counts as completion signal
- On completion signal: send conversation history + audioInputStart +
  buffered audio to next session, then promote immediately
- Close old session in background (non-blocking)
- Dead session detection: recreate next session if idle >30s

Key design decisions:
- Session continuation enabled by default (fundamental for long conversations)
- Conversation history tracked in real-time via _sc_conversation_history
  (independent of pipeline context aggregator which updates asynchronously)
- Completion signal check in _handle_content_end_event (after history update)
  to ensure latest text is included in handoff
- Rolling audio buffer (default 3s) captures user audio during transition
- transition_threshold_seconds capped at 420s (7min) for safety margin
- Unified event methods (_send_text_event, _send_client_event, etc.) accept
  optional stream/prompt_name params, eliminating duplicate SC methods

Also adds:
- SessionContinuationParams config (enabled, threshold, buffer, timeout)
2026-04-24 14:55:55 -07:00
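Note on f1b16a672a: two of the design decisions above, the rolling audio buffer and the capped transition threshold, can be sketched as below. This is only an illustration under stated assumptions: the class and function names are hypothetical, and the buffer assumes 16-bit mono PCM at 16 kHz, which the commit message does not specify.

```python
# Cap from the commit message: 420 s (7 min) leaves a margin before the ~8 min limit.
MAX_TRANSITION_THRESHOLD_SECONDS = 420


def clamp_threshold(requested_seconds: float) -> float:
    """Limit a requested transition threshold to the safety cap."""
    return min(requested_seconds, MAX_TRANSITION_THRESHOLD_SECONDS)


class RollingAudioBuffer:
    """Keeps only the most recent `max_seconds` of audio (hypothetical helper).

    Assumes 16-bit mono PCM, so each sample occupies two bytes.
    """

    def __init__(self, max_seconds: float = 3.0, sample_rate: int = 16000):
        # Multiply by 2 last so the limit is always a whole number of samples.
        self._max_bytes = int(max_seconds * sample_rate) * 2
        self._buf = bytearray()

    def append(self, chunk: bytes) -> None:
        """Add audio and discard the oldest bytes beyond the window."""
        self._buf.extend(chunk)
        overflow = len(self._buf) - self._max_bytes
        if overflow > 0:
            del self._buf[:overflow]

    def drain(self) -> bytes:
        """Return the buffered audio and leave the buffer empty."""
        data = bytes(self._buf)
        self._buf.clear()
        return data
```

During a transition, user audio would be appended to the buffer and then drained once into the next session after the completion signal.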
Filipi da Silva Fuchter
38a02271c5 Merge pull request #4368 from pipecat-ai/filipi/stt_service
Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService.
2026-04-24 14:31:36 -03:00
filipi87
2ce203aeb8 Renaming the method to _maybe_reconnect_on_user_stopped_speaking. 2026-04-24 13:08:32 -03:00
filipi87
b30df95f13 Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService. 2026-04-24 13:00:38 -03:00
kompfner
6be8deee2a Merge pull request #4361 from pipecat-ai/pk/pyright-fixes
Some pyright fixes
2026-04-24 11:58:28 -04:00
Paul Kompfner
c113cacd59 refactor(types): name the LLMContext/OpenAI boundary with explicit cast helpers
LLMContext's NotGiven, LLMContextToolChoice, and LLMStandardMessage are
currently aliased to their OpenAI equivalents, so passing values
between the two sides type-checks implicitly. That works today but
obscures the fact that these are meant to be conceptually distinct —
if LLMContext ever diverges from OpenAI's types, every implicit
crossing would silently break.

Introduce two module-private cast helpers in open_ai_adapter.py:

- _openai_from_llm_context_tool_choice(tool_choice)
- _openai_from_llm_standard_message(message)

Both are typed no-ops today (implemented with typing.cast) but each
carries a docstring explaining why the cast is present, and every
boundary crossing now routes through a named function. Future readers
(and future greps) can find the crossings; a later divergence becomes
a mechanical find-and-update rather than hunting through adapter code.

No behavior change, no pyright error delta.
2026-04-24 10:10:03 -04:00
Paul Kompfner
d0495eeef6 fix(types): narrow voice in SpeechmaticsTTSSettings to disallow None
After widening TTSSettings.voice to str | None | _NotGiven (so other
TTS services can opt into None as a valid "no voice" state), pyright
flagged Speechmatics' URL builder receiving str | None where it
required str.

Speechmatics has no "no voice" mode (the URL path includes the voice
name), so override the inherited field in SpeechmaticsTTSSettings to
str | _NotGiven. The call site stays as a plain assert_given(...)
without an extra None check.
2026-04-23 21:08:47 -04:00
Paul Kompfner
c3eb69165c fix(types): accept SDK NotGiven in LLM Settings fields used for passthrough
Three LLM services initialize certain Settings fields with the SDK's
NOT_GIVEN (openai.NOT_GIVEN or anthropic.NOT_GIVEN) so the value
flows unmodified into SDK API calls. The inherited field types from
LLMSettings only admit pipecat's _NotGiven, so pyright flagged each
constructor call as a flavor mismatch.

Widen the field types in each service-specific Settings subclass so
they accept both pipecat's _NotGiven (for delta-mode defaults) and
the corresponding SDK NotGiven (for store-mode passthrough):

- OpenAILLMSettings: frequency_penalty, presence_penalty, seed,
  temperature, top_p, max_tokens, max_completion_tokens.
- OpenAIResponsesLLMSettings: temperature, top_p,
  max_completion_tokens.
- AnthropicLLMSettings: temperature, top_k, top_p, thinking.

Every overridden field is genuinely read from self._settings and
passed directly to the SDK, so none of the overrides are vestigial.

Clears 21 pyright errors and restores test_service_settings_complete
parity with the pre-NOT_GIVEN-swap state.
2026-04-23 18:32:46 -04:00
Paul Kompfner
0302f6d05c chore(pyright): drop newly-clean files from ignore list
asyncai/tts and google/vertex/llm are now clean after the missing-None
sweep (both benefited from the TTSSettings.voice / LLMSettings
cascades).

- src/pipecat/services/asyncai/tts.py
- src/pipecat/services/google/vertex/llm.py
2026-04-23 18:18:00 -04:00
Paul Kompfner
b9ff333654 fix(types): admit None on settings fields that accept it as a default
Service-specific Settings subclasses declared fields as T | _NotGiven
(no None), but the services routinely pass None to those fields during
init to mean "don't override — use the vendor's default". The field
type just didn't reflect that a None value is valid, so pyright
flagged every None at the call sites.

Change the declarations to T | None | _NotGiven, matching the pattern
already used by ServiceSettings.model and TTSSettings.language. No
constructor-call changes; the default_factory stays NOT_GIVEN.

Fields touched across 11 files:

- services/settings.py: TTSSettings.voice (base class; covers
  asyncai, cartesia, elevenlabs, fish, hume, kokoro, lmnt, mistral,
  neuphonic, piper, resembleai, rime, xtts TTS services).
- services/aws/llm.py: latency.
- services/aws/tts.py: engine, pitch, rate, volume, lexicon_names.
- services/azure/tts.py: emphasis, pitch, rate, role, style,
  style_degree, volume.
- services/google/gemini_live/llm.py: vad.
- services/google/llm.py: thinking.
- services/google/stt.py: language_codes.
- services/inworld/tts.py: speaking_rate, temperature.
- services/openai/tts.py: instructions, speed.
- services/speechmatics/stt.py: 13 fields (domain, operating_point,
  max_delay, end_of_utterance_*, punctuation_overrides, *_partials,
  split_sentences, enable_diarization, speaker_*, max_speakers,
  prefer_current_speaker, extra_params).
- services/ultravox/llm.py: output_medium.

Clears 94 pyright errors (1035 -> 941).
2026-04-23 18:18:00 -04:00
Paul Kompfner
92610944af chore(pyright): drop newly-clean files from ignore list
Three files no longer have pyright errors after the is_given /
assert_given sweep — remove them from the ignore list (which serves as
a live todo of files with remaining type errors).

- src/pipecat/processors/gstreamer/pipeline_source.py
- src/pipecat/services/camb/tts.py
- src/pipecat/services/speechmatics/tts.py
2026-04-23 17:44:17 -04:00
Paul Kompfner
6a337f1bc6 fix(types): assert_given at store-mode settings read sites
Apply assert_given across service modules to narrow reads from
store-mode settings fields (self._settings.X, default_settings.X),
where _NotGiven is declared in the field type but should never appear
at runtime (enforced by validate_complete()).

Two idioms used:

- Inline wrap for single uses:
    func(assert_given(self._settings.enable_prompt_caching), ...)

- Extract-and-reuse when the same value is used multiple times:
    thinking = assert_given(self._settings.thinking)
    if thinking:
        params["thinking"] = thinking.model_dump(...)

43 service files touched. Cleared ~172 pyright errors; remaining
_NotGiven-related errors are in adjacent categories (flavor mismatch
between openai/anthropic NotGiven and pipecat _NotGiven, settings
field types that should allow None but don't) that need different
fixes.
2026-04-23 17:39:17 -04:00
Filipi da Silva Fuchter
ef7fa07bf7 Merge pull request #4358 from pipecat-ai/filipi/fix_aiortc_sctp
Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU
2026-04-23 17:49:18 -03:00
filipi87
ce1506792e Linking to the docs instead of full explanation. 2026-04-23 17:46:54 -03:00
Paul Kompfner
70f3d32734 feat(types): add assert_given for narrowing store-mode settings reads
In store-mode settings objects, _NotGiven should never appear (the
invariant enforced by validate_complete). But the declared field types
still include _NotGiven because the same class doubles as delta mode,
so every field read is typed X | None | _NotGiven and pyright flags
operations that assume X | None.

assert_given is a one-line extractor that narrows away _NotGiven and
raises loudly if the invariant is violated — preferable to scattering
is_given guards that defend against something that can't occur in
practice.

    resolved_model = assert_given(self._settings.model)  # str | None
2026-04-23 16:40:07 -04:00
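Note on 70f3d32734: a minimal sketch of an extractor like `assert_given`, assuming a sentinel class shaped like the `_NotGiven` the commit refers to. The sentinel definition, the exception type, and the error message here are assumptions; the actual implementation may differ.

```python
from typing import TypeVar, Union


class _NotGiven:
    """Stand-in sentinel type (assumption; the real class lives in pipecat)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()

T = TypeVar("T")


def assert_given(value: Union[T, _NotGiven]) -> T:
    """Narrow away _NotGiven, raising if the store-mode invariant is violated."""
    if isinstance(value, _NotGiven):
        # Exception type is an assumption made for this sketch.
        raise ValueError("Expected a resolved settings value, got NOT_GIVEN")
    return value
```

Note that `None` passes through untouched: only the sentinel is rejected, so a field typed `str | None | _NotGiven` narrows to `str | None`.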
Paul Kompfner
356618b448 fix(types): use is_given at call sites pyright flagged
Replace direct identity checks against NOT_GIVEN with is_given() at
sites where pyright's inability to narrow on non-singleton sentinels
was causing type errors.

- adapters/services/anthropic_adapter.py: narrow converted.system for
  _resolve_system_instruction.
- services/openai/llm.py: narrow params.service_tier using OpenAI's
  is_given.
- services/sarvam/llm.py: narrow tools / tool_choice using OpenAI's
  is_given (aliased as openai_is_given alongside the existing
  settings.is_given import).
- services/sarvam/tts.py: narrow settings.voice using settings.is_given.
2026-04-23 16:15:07 -04:00
Paul Kompfner
1624d7a474 feat(types): add is_given TypeGuard helpers for NotGiven sentinels
Pyright can't narrow identity checks against module-level NotGiven
sentinels (they aren't typed as singletons), which leaves many
NotGiven-bearing unions stuck as unnarrowed types throughout the
codebase. Introduce is_given TypeGuard helpers so narrowing works via
isinstance under the hood.

Each helper is co-located with the NotGiven flavor it guards:

- services/settings.py: upgrade the existing is_given to a TypeGuard.
- processors/aggregators/llm_context.py: add an is_given for
  LLMContext's NotGiven. Treat LLMContext's re-exported types
  (LLMStandardMessage, LLMContextToolChoice, NOT_GIVEN, NotGiven) as
  LLMContext's own — independent definitions that happen to coincide
  with OpenAI's as an implementation detail.
- adapters/services/anthropic_adapter.py: add is_given for anthropic's
  NotGiven.
- adapters/services/open_ai_adapter.py: add is_given for openai's
  NotGiven.
2026-04-23 15:33:43 -04:00
Paul Kompfner
092b1dcb0f fix(types): widen TLLMInvocationParams bound to Mapping[str, Any]
TypedDict types are not subtypes of dict[...] in the type system
(per PEP 589), so TypedDict-based invocation param classes could not
satisfy the TypeVar bound. Mapping[str, Any] accepts TypedDicts while
preserving the "string-keyed mapping" constraint.
2026-04-23 14:35:59 -04:00
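Note on 092b1dcb0f: the effect of the wider bound can be illustrated with the sketch below. `ExampleInvocationParams` and `ExampleAdapter` are hypothetical names made up for this example; only the `TLLMInvocationParams` name and the `Mapping[str, Any]` bound come from the commit.

```python
from typing import Any, Generic, Mapping, TypedDict, TypeVar


class ExampleInvocationParams(TypedDict):
    """Hypothetical TypedDict-based invocation params."""

    messages: list
    temperature: float


# With bound=dict[str, Any], a type checker rejects the TypedDict below,
# because TypedDicts are not subtypes of dict. Mapping[str, Any] accepts it.
TLLMInvocationParams = TypeVar("TLLMInvocationParams", bound=Mapping[str, Any])


class ExampleAdapter(Generic[TLLMInvocationParams]):
    """Hypothetical generic consumer of invocation params."""

    def param_names(self, params: TLLMInvocationParams) -> list[str]:
        # Only read-only Mapping operations are available under this bound.
        return sorted(params.keys())


adapter: ExampleAdapter[ExampleInvocationParams] = ExampleAdapter()
```

The trade-off is that code written against the bound can read from the params but cannot assume mutating `dict` methods exist.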
Mark Backman
b90ea9bf6a Merge pull request #4352 from pipecat-ai/mb/pyright-fixes-1-per-file
More pyright fixes
2026-04-23 14:14:36 -04:00
kompfner
05c97804d5 Merge pull request #4359 from pipecat-ai/pk/changelog-4355-rename
chore: rebind Gemini Live reconnect changelog fragment to PR #4355
2026-04-23 14:10:36 -04:00
Paul Kompfner
7a8357a569 chore: rebind Gemini Live reconnect changelog fragment to PR #4355
The original contributor's PR (#4328) landed as #4355. Rename the fragment
so the rendered changelog links to the merged PR, and add the leading `- `
bullet prefix that towncrier expects.
2026-04-23 12:00:56 -04:00
filipi87
44756de15a Adding changelog for the SmallWebRTC fix. 2026-04-23 12:19:56 -03:00
filipi87
94304ec74e Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU. 2026-04-23 12:18:33 -03:00
kompfner
a3fe34f4a2 Merge pull request #4355 from pipecat-ai/pk/gemini-live-context-reseed-on-reconnect
Re-seed Gemini Live context on reconnect without session resumption
2026-04-23 11:00:22 -04:00
Sathwika Reddy Geereddy
21f6c2afa5 Update NVIDIA STT services for Nemotron Speech defaults and config parity (#4269)
* Update NVIDIA STT services for Nemotron Speech defaults and config parity

* Add changelog entry for PR #4269

* initialize boosted LM settings defaults in streaming STT

* Align NVIDIA STT language handling with other STT services

* add finalised flag to Nvidia stt final transcripts, remove processing latency logs

* Changing interim transcription logging to tracing.

---------

Co-authored-by: sathwika <geereddysath@nvidia.com>
Co-authored-by: filipi87 <filipi87@gmail.com>
2026-04-23 09:01:27 -04:00
Filipi da Silva Fuchter
4d14251f4a Merge pull request #4354 from pipecat-ai/filipi/includes_inter_frame_spaces
feat(tts): add includes_inter_frame_spaces flag to word-timestamp API - follow-up
2026-04-23 08:49:26 -03:00
Paul Kompfner
1421c4ba22 fix: handle Gemini Live 2.5 quirks when re-seeding context on reconnect
Extends the reconnect re-seeding fix to work cleanly on Gemini Live 2.5,
which has stricter seed requirements than 3.x and a documented audio-input /
history-recall limitation. Both initial connection and reconnect now share a
single code path (`_create_initial_response(for_reconnect=...)`), with four
well-documented cases.

On Gemini 2.5 reconnect, `turn_complete=True` is now forced on the seed so
the model produces a recap-style response immediately instead of briefly
acting "forgetful" on the user's next utterance — the latter being
especially jarring mid-conversation. When a 2.5 seed doesn't already end
with a user turn (e.g. the bot had finished speaking before the disconnect),
a blank user turn is appended to satisfy the server's seed-shape
requirement. Gemini 3.x needs neither workaround.
2026-04-22 15:58:54 -04:00
filipi87
6b1d8d9fa5 Fixing merge conflicts. 2026-04-22 15:22:32 -03:00
filipi87
ac810e57ed Merge branch 'main' into filipi/includes_inter_frame_spaces
# Conflicts:
#	uv.lock
2026-04-22 15:22:06 -03:00
filipi87
bba7ca80e3 Bumping to small-webrtc-prebuilt 2.5.0 to fix karaoke highlighting. 2026-04-22 15:20:37 -03:00
filipi87
79250f1fe0 Making includes_inter_frame_spaces optional for word-timestamp. 2026-04-22 14:20:30 -03:00
Mark Backman
4f6e76e6fd Add changelog entries for #4352 2026-04-22 12:23:33 -04:00
Mark Backman
b0962861c8 Acknowledge Tkinter's GC-reference idiom with a scoped type ignore
Tkinter's `Label` only stores `PhotoImage` references at the C level, so
Python GC eats them unless something on the Python side keeps a
reference. The canonical fix is to stash the reference on the widget
itself: `label.image = photo`. Tkinter widgets are plain Python objects,
so the assignment works at runtime, but the stub declares no `image`
attribute (correctly — there isn't one; we're adding it).

Narrow the suppression to `# type: ignore[attr-defined]` on the one
line. The existing comment above the assignment already documents why.
2026-04-22 12:19:16 -04:00