pipecat

Author	SHA1	Message	Date
Paul Kompfner	005fe33b25	Update docs URLs in README to reflect new docs site structure and avoid redirects	2026-04-27 10:22:49 -04:00
Paul Kompfner	24154474c9	Add OpenAI Responses to the README's list of LLM services	2026-04-27 10:19:13 -04:00
kompfner	86effc4d10	Merge pull request #4015 from prettyprettyprettygood/feat/nova-sonic-session-continuation feat(nova-sonic): add proactive session continuation for conversation…	2026-04-27 09:36:48 -04:00
Mark Backman	58e50882d8	Merge pull request #4374 from pipecat-ai/mb/fix-daily-runner-room-props Expire runner-created Daily rooms after 4h	2026-04-27 09:07:31 -04:00
Mark Backman	ef183d0c96	Add changelog for #4374	2026-04-27 09:00:17 -04:00
Mark Backman	f078df7805	runner: expire Daily rooms after 4h to mirror Pipecat Cloud session limit Runner-created Daily rooms previously had no expiration when callers posted partial `dailyRoomProperties` (e.g. `{"start_video_off": true}`). The model-default `exp=None` and `eject_at_room_exp=False` meant Daily's cron never cleaned them up, so rooms accumulated indefinitely. Encode the policy in the runner: define `PIPECAT_CLOUD_ROOM_EXP_HOURS=4.0`, inject `exp` and `eject_at_room_exp=True` into user-supplied properties via `setdefault` (so explicit caller values still win), and pass `room_exp_duration` to all four `configure()` call sites.	2026-04-27 09:00:17 -04:00
Mark Backman	815cd44c2a	Merge pull request #4372 from pipecat-ai/mb/relax-frames-proto-5x Relax protobuf pin to support both 5.x and 6.x runtimes	2026-04-27 08:58:23 -04:00
Garegin Harutyunyan	e5941926be	Krisp tt demo tool (#4335) * VIVA SDK TT v3 support * Format fix. * Renamed the API naming, removed '3' from the name. * Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support. * TT demo tool * Some improvements for demo scripts, audio recording, etc. * Enhance demo scripts with VAD selection and audio embedding features. Updated HTML report to include annotated audio players and improved response time metrics in summary formatting. Added README for setup and usage instructions. * Refactor interrupt prediction demo to compare multiple interruption strategies (Krisp IP vs VAD). Updated README with usage instructions and output details. Enhanced audio processing with new helper functions for generating beeps and mixing audio. * Refactor demo scripts to improve latency metrics by introducing total_delay property in TurnEvent. Update formatting in reports and visualizations to reflect accurate speech end times, including VAD wait times. Enhance HTML report with detailed latency information and adjust audio processing to account for VAD stop seconds. * Add audio resampling functionality and update demo scripts for improved audio processing - Introduced `resample_audio` function to handle audio resampling with linear interpolation. - Updated `demo_audio_recorder.py` to utilize the new resampling feature, ensuring audio is saved at the requested sample rate. - Modified `demo_interrupt_prediction.py` and `demo_turn_taking.py` to resample audio to 16 kHz for compatibility with Silero VAD. - Adjusted imports in demo scripts to include the new resampling function. - Enhanced error handling for sample rate discrepancies in audio recording. * Enhance demo_interrupt_prediction.py with VAD type selection and improved processing logic - Added support for selecting between "silero" and "krisp" VAD engines in the demo script. - Introduced a new create_vad function to configure VAD analyzers based on the selected type. - Updated audio processing logic to handle VAD type-specific resampling and state management. - Modified the KrispVivaIPUserTurnStartStrategy to utilize a separate vad_flag for per-frame VAD input, improving interruption detection accuracy. * Refactor audio processing scripts for improved readability and consistency - Updated type hinting in `resample_audio` function to use `tuple` instead of `Tuple`. - Simplified print statements in `demo_audio_recorder.py`, `demo_formatting.py`, and `demo_interrupt_prediction.py` for better readability. - Adjusted argument formatting in `demo_audio_recorder.py` and `demo_formatting.py` for consistency. - Cleaned up list comprehensions in `demo_formatting.py`, `demo_html_report.py`, and `demo_interrupt_prediction.py` for clarity. - Enhanced error handling in `__init__.py` for the KrispVivaIPUserTurnStartStrategy import. * Refactor VAD handling in KrispVivaIPUserTurnStartStrategy and update tests for clarity - Simplified the argument formatting in the _handle_vad_started method for improved readability. - Updated test assertions to reflect changes in VAD processing logic, ensuring that the vad_flag is correctly set to False during continuous state processing. - Enhanced test cases to verify that the process method is called appropriately under different conditions. * more format fixes. * removed demo scripts. * reverted wrongly removed file. * Corrected the IP integration logic. * style fix. * Refactor audio processing and state management in KrispVivaIPUserTurnStartStrategy - Removed the unused _vad_flag attribute to streamline state tracking. - Updated the reset method to clear the audio buffer instead of resetting the vad_flag. - Adjusted the process_frame method to use _speech_active for VAD input, enhancing clarity in the logic. - Modified tests to reflect changes in state management and ensure proper functionality of the reset method and audio buffer handling. * Fixed formatting --------- Co-authored-by: Aram Poghosyan <apoghosyan@krisp.ai>	2026-04-27 08:14:00 -04:00
Mark Backman	6266c026a6	Merge pull request #4362 from ai-coustics/ai-coustics/aic-sdk-py-v2.2.0-update Update aic-sdk to v2.2.0	2026-04-25 06:51:41 -04:00
Gökmen Görgen	e25dccfc6b	update `aic-sdk` to `~=2.2.0` and rename `AICOUSTICS_LICENSE_KEY` to `AIC_LICENSE_KEY`.	2026-04-25 10:13:06 +02:00
Gökmen Görgen	3bbfc42854	remove adaptive audio enhancement example and support for runtime enhancement level updates in `AICFilter`.	2026-04-25 10:05:47 +02:00
Gökmen Görgen	3b2127f912	rename environment variables and references from `AICOUSTICS` to `AIC`.	2026-04-25 09:51:23 +02:00
Gökmen Görgen	ea12b10742	rename `mcp-aic-adaptive.py` to `mcp-aicoustics-adaptive.py`.	2026-04-25 09:51:23 +02:00
Gökmen Görgen	a2fbed86cf	add adaptive audio enhancement example and support for runtime enhancement level updates in `AICFilter`.	2026-04-25 09:51:23 +02:00
Gökmen Görgen	f75f361629	bump `aic-sdk` to 2.2.0 and update `AICFilter` with `model_id` and `enhancement_level` changes.	2026-04-25 09:51:23 +02:00
Mark Backman	4c153e5d3c	Add changelog for #4372	2026-04-24 21:20:46 -04:00
Mark Backman	4088992d97	Relax protobuf pin to support both 5.x and 6.x runtimes Pipecat 1.0.8 hard-required protobuf 6.x via the base `protobuf>=6.31.1,<7` pin, blocking users whose dependency graph already constrains protobuf to the 5.x line. The original bump (PR #4136) was only needed because `nvidia-riva-client>=2.25.1` ships gencode compiled with protoc 6.31.1. Changes: - Widen base pin to `protobuf>=5.29.6,<7`. - Regenerate `frames_pb2.py` with `grpcio-tools~=1.67.1` (protoc 5.x). Per Google's cross-version runtime guarantee, 5.x gencode runs on both 5.x and 6.x runtimes, so this single artifact serves all users. - Loosen the dev pin `grpcio-tools` to `>=1.67.1,<2` so contributors can install `pipecat[dev,nvidia]` without resolver conflict. Comment in `frames.proto` documents the 1.67.x requirement for regeneration. - Add an explicit `protobuf>=6.31.1,<7` to the `nvidia` extra. This compensates for nvidia-riva-client's missing `protobuf` install requirement (upstream packaging gap, see https://github.com/nvidia-riva/python-clients/issues/172). When that issue is resolved, the explicit protobuf entry in the `nvidia` extra can be removed. Verified: pipecat imports cleanly on both protobuf 5.29.6 and 6.33.6; `tests/test_protobuf_serializer.py` passes; `import riva.client` succeeds when `pipecat[nvidia]` is installed.	2026-04-24 21:15:32 -04:00
Osman Ipek	f1b16a672a	feat(nova-sonic): add proactive session continuation for conversations >8min Nova Sonic sessions have an AWS-imposed ~8-minute time limit. This adds transparent session continuation that rotates sessions in the background before the limit is reached, preserving conversation context with no user-perceptible interruption. Implementation follows the AWS reference architecture: - Monitor loop detects when session age exceeds threshold - On assistant AUDIO contentStart: start buffering user audio, create next session (sessionStart + promptStart + system instruction) - Track SPECULATIVE/FINAL text counts as completion signal - On completion signal: send conversation history + audioInputStart + buffered audio to next session, then promote immediately - Close old session in background (non-blocking) - Dead session detection: recreate next session if idle >30s Key design decisions: - Session continuation enabled by default (fundamental for long conversations) - Conversation history tracked in real-time via _sc_conversation_history (independent of pipeline context aggregator which updates asynchronously) - Completion signal check in _handle_content_end_event (after history update) to ensure latest text is included in handoff - Rolling audio buffer (default 3s) captures user audio during transition - transition_threshold_seconds capped at 420s (7min) for safety margin - Unified event methods (_send_text_event, _send_client_event, etc.) accept optional stream/prompt_name params, eliminating duplicate SC methods Also adds: - SessionContinuationParams config (enabled, threshold, buffer, timeout)	2026-04-24 14:55:55 -07:00
Filipi da Silva Fuchter	38a02271c5	Merge pull request #4368 from pipecat-ai/filipi/stt_service Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService.	2026-04-24 14:31:36 -03:00
filipi87	2ce203aeb8	Renaming the method to _maybe_reconnect_on_user_stopped_speaking.	2026-04-24 13:08:32 -03:00
filipi87	b30df95f13	Fix issue where STTService unintentionally created a method with the same name as SegmentedSTTService.	2026-04-24 13:00:38 -03:00
kompfner	6be8deee2a	Merge pull request #4361 from pipecat-ai/pk/pyright-fixes Some pyright fixes	2026-04-24 11:58:28 -04:00
Paul Kompfner	c113cacd59	refactor(types): name the LLMContext/OpenAI boundary with explicit cast helpers LLMContext's NotGiven, LLMContextToolChoice, and LLMStandardMessage are currently aliased to their OpenAI equivalents, so passing values between the two sides type-checks implicitly. That works today but obscures the fact that these are meant to be conceptually distinct — if LLMContext ever diverges from OpenAI's types, every implicit crossing would silently break. Introduce two module-private cast helpers in open_ai_adapter.py: - _openai_from_llm_context_tool_choice(tool_choice) - _openai_from_llm_standard_message(message) Both are typed no-ops today (implemented with typing.cast) but each carries a docstring explaining why the cast is present, and every boundary crossing now routes through a named function. Future readers (and future greps) can find the crossings; a later divergence becomes a mechanical find-and-update rather than hunting through adapter code. No behavior change, no pyright error delta.	2026-04-24 10:10:03 -04:00
Paul Kompfner	d0495eeef6	fix(types): narrow voice in SpeechmaticsTTSSettings to disallow None After widening TTSSettings.voice to str | None | _NotGiven (so other TTS services can opt into None as a valid "no voice" state), pyright flagged Speechmatics' URL builder receiving str | None where it required str. Speechmatics has no "no voice" mode (the URL path includes the voice name), so override the inherited field in SpeechmaticsTTSSettings to str | _NotGiven. The call site stays as a plain assert_given(...) without an extra None check.	2026-04-23 21:08:47 -04:00
Paul Kompfner	c3eb69165c	fix(types): accept SDK NotGiven in LLM Settings fields used for passthrough Three LLM services initialize certain Settings fields with the SDK's NOT_GIVEN (openai.NOT_GIVEN or anthropic.NOT_GIVEN) so the value flows unmodified into SDK API calls. The inherited field types from LLMSettings only admit pipecat's _NotGiven, so pyright flagged each constructor call as a flavor mismatch. Widen the field types in each service-specific Settings subclass so they accept both pipecat's _NotGiven (for delta-mode defaults) and the corresponding SDK NotGiven (for store-mode passthrough): - OpenAILLMSettings: frequency_penalty, presence_penalty, seed, temperature, top_p, max_tokens, max_completion_tokens. - OpenAIResponsesLLMSettings: temperature, top_p, max_completion_tokens. - AnthropicLLMSettings: temperature, top_k, top_p, thinking. Every overridden field is genuinely read from self._settings and passed directly to the SDK, so none of the overrides are vestigial. Clears 21 pyright errors and restores test_service_settings_complete parity with the pre-NOT_GIVEN-swap state.	2026-04-23 18:32:46 -04:00
Paul Kompfner	0302f6d05c	chore(pyright): drop newly-clean files from ignore list asyncai/tts and google/vertex/llm are now clean after the missing-None sweep (both benefited from the TTSSettings.voice / LLMSettings cascades). - src/pipecat/services/asyncai/tts.py - src/pipecat/services/google/vertex/llm.py	2026-04-23 18:18:00 -04:00
Paul Kompfner	b9ff333654	fix(types): admit None on settings fields that accept it as a default Service-specific Settings subclasses declared fields as T | _NotGiven (no None), but the services routinely pass None to those fields during init to mean "don't override — use the vendor's default". The field type just didn't reflect that a None value is valid, so pyright flagged every None at the call sites. Change the declarations to T | None | _NotGiven, matching the pattern already used by ServiceSettings.model and TTSSettings.language. No constructor-call changes; the default_factory stays NOT_GIVEN. Fields touched across 11 files: - services/settings.py: TTSSettings.voice (base class; covers asyncai, cartesia, elevenlabs, fish, hume, kokoro, lmnt, mistral, neuphonic, piper, resembleai, rime, xtts TTS services). - services/aws/llm.py: latency. - services/aws/tts.py: engine, pitch, rate, volume, lexicon_names. - services/azure/tts.py: emphasis, pitch, rate, role, style, style_degree, volume. - services/google/gemini_live/llm.py: vad. - services/google/llm.py: thinking. - services/google/stt.py: language_codes. - services/inworld/tts.py: speaking_rate, temperature. - services/openai/tts.py: instructions, speed. - services/speechmatics/stt.py: 13 fields (domain, operating_point, max_delay, end_of_utterance_*, punctuation_overrides, *_partials, split_sentences, enable_diarization, speaker_*, max_speakers, prefer_current_speaker, extra_params). - services/ultravox/llm.py: output_medium. Clears 94 pyright errors (1035 -> 941).	2026-04-23 18:18:00 -04:00
Paul Kompfner	92610944af	chore(pyright): drop newly-clean files from ignore list Three files no longer have pyright errors after the is_given / assert_given sweep — remove them from the ignore list (which serves as a live todo of files with remaining type errors). - src/pipecat/processors/gstreamer/pipeline_source.py - src/pipecat/services/camb/tts.py - src/pipecat/services/speechmatics/tts.py	2026-04-23 17:44:17 -04:00
Paul Kompfner	6a337f1bc6	fix(types): assert_given at store-mode settings read sites Apply assert_given across service modules to narrow reads from store-mode settings fields (self._settings.X, default_settings.X), where _NotGiven is declared in the field type but should never appear at runtime (enforced by validate_complete()). Two idioms used: - Inline wrap for single uses: func(assert_given(self._settings.enable_prompt_caching), ...) - Extract-and-reuse when the same value is used multiple times: thinking = assert_given(self._settings.thinking) if thinking: params["thinking"] = thinking.model_dump(...) 43 service files touched. Cleared ~172 pyright errors; remaining _NotGiven-related errors are in adjacent categories (flavor mismatch between openai/anthropic NotGiven and pipecat _NotGiven, settings field types that should allow None but don't) that need different fixes.	2026-04-23 17:39:17 -04:00
Filipi da Silva Fuchter	ef7fa07bf7	Merge pull request #4358 from pipecat-ai/filipi/fix_aiortc_sctp Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU	2026-04-23 17:49:18 -03:00
filipi87	ce1506792e	Linking to the docs instead of full explanation.	2026-04-23 17:46:54 -03:00
Paul Kompfner	70f3d32734	feat(types): add assert_given for narrowing store-mode settings reads In store-mode settings objects, _NotGiven should never appear (the invariant enforced by validate_complete). But the declared field types still include _NotGiven because the same class doubles as delta mode, so every field read is typed X | None | _NotGiven and pyright flags operations that assume X | None. assert_given is a one-line extractor that narrows away _NotGiven and raises loudly if the invariant is violated — preferable to scattering is_given guards that defend against something that can't occur in practice. resolved_model = assert_given(self._settings.model) # str | None	2026-04-23 16:40:07 -04:00
Paul Kompfner	356618b448	fix(types): use is_given at call sites pyright flagged Replace direct identity checks against NOT_GIVEN with is_given() at sites where pyright's inability to narrow on non-singleton sentinels was causing type errors. - adapters/services/anthropic_adapter.py: narrow converted.system for _resolve_system_instruction. - services/openai/llm.py: narrow params.service_tier using OpenAI's is_given. - services/sarvam/llm.py: narrow tools / tool_choice using OpenAI's is_given (aliased as openai_is_given alongside the existing settings.is_given import). - services/sarvam/tts.py: narrow settings.voice using settings.is_given.	2026-04-23 16:15:07 -04:00
Paul Kompfner	1624d7a474	feat(types): add is_given TypeGuard helpers for NotGiven sentinels Pyright can't narrow identity checks against module-level NotGiven sentinels (they aren't typed as singletons), which leaves many NotGiven-bearing unions stuck as unnarrowed types throughout the codebase. Introduce is_given TypeGuard helpers so narrowing works via isinstance under the hood. Each helper is co-located with the NotGiven flavor it guards: - services/settings.py: upgrade the existing is_given to a TypeGuard. - processors/aggregators/llm_context.py: add an is_given for LLMContext's NotGiven. Treat LLMContext's re-exported types (LLMStandardMessage, LLMContextToolChoice, NOT_GIVEN, NotGiven) as LLMContext's own — independent definitions that happen to coincide with OpenAI's as an implementation detail. - adapters/services/anthropic_adapter.py: add is_given for anthropic's NotGiven. - adapters/services/open_ai_adapter.py: add is_given for openai's NotGiven.	2026-04-23 15:33:43 -04:00
Paul Kompfner	092b1dcb0f	fix(types): widen TLLMInvocationParams bound to Mapping[str, Any] TypedDict types are not subtypes of dict[...] in the type system (per PEP 589), so TypedDict-based invocation param classes could not satisfy the TypeVar bound. Mapping[str, Any] accepts TypedDicts while preserving the "string-keyed mapping" constraint.	2026-04-23 14:35:59 -04:00
Mark Backman	b90ea9bf6a	Merge pull request #4352 from pipecat-ai/mb/pyright-fixes-1-per-file More pyright fixes	2026-04-23 14:14:36 -04:00
kompfner	05c97804d5	Merge pull request #4359 from pipecat-ai/pk/changelog-4355-rename chore: rebind Gemini Live reconnect changelog fragment to PR #4355	2026-04-23 14:10:36 -04:00
Paul Kompfner	7a8357a569	chore: rebind Gemini Live reconnect changelog fragment to PR #4355 The original contributor's PR (#4328) landed as #4355. Rename the fragment so the rendered changelog links to the merged PR, and add the leading `- ` bullet prefix that towncrier expects.	2026-04-23 12:00:56 -04:00
filipi87	44756de15a	Adding changelog for the SmallWebRTC fix.	2026-04-23 12:19:56 -03:00
filipi87	94304ec74e	Fixed SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU.	2026-04-23 12:18:33 -03:00
kompfner	a3fe34f4a2	Merge pull request #4355 from pipecat-ai/pk/gemini-live-context-reseed-on-reconnect Re-seed Gemini Live context on reconnect without session resumption	2026-04-23 11:00:22 -04:00
Sathwika Reddy Geereddy	21f6c2afa5	Update NVIDIA STT services for Nemotron Speech defaults and config parity (#4269) * Update NVIDIA STT services for Nemotron Speech defaults and config parity * Add changelog entry for PR #4269 * initialize boosted LM settings defaults in streaming STT * Align NVIDIA STT language handling with other STT services * add finalised flag to Nvidia stt final transcripts, remove processing latency logs * Changing interim transcription logging to tracing. --------- Co-authored-by: sathwika <geereddysath@nvidia.com> Co-authored-by: filipi87 <filipi87@gmail.com>	2026-04-23 09:01:27 -04:00
Filipi da Silva Fuchter	4d14251f4a	Merge pull request #4354 from pipecat-ai/filipi/includes_inter_frame_spaces feat(tts): add includes_inter_frame_spaces flag to word-timestamp API - follow-up	2026-04-23 08:49:26 -03:00
Paul Kompfner	1421c4ba22	fix: handle Gemini Live 2.5 quirks when re-seeding context on reconnect Extends the reconnect re-seeding fix to work cleanly on Gemini Live 2.5, which has stricter seed requirements than 3.x and a documented audio-input / history-recall limitation. Both initial connection and reconnect now share a single code path (`_create_initial_response(for_reconnect=...)`), with four well-documented cases. On Gemini 2.5 reconnect, `turn_complete=True` is now forced on the seed so the model produces a recap-style response immediately instead of briefly acting "forgetful" on the user's next utterance — the latter being especially jarring mid-conversation. When a 2.5 seed doesn't already end with a user turn (e.g. the bot had finished speaking before the disconnect), a blank user turn is appended to satisfy the server's seed-shape requirement. Gemini 3.x needs neither workaround.	2026-04-22 15:58:54 -04:00
filipi87	6b1d8d9fa5	Fixing merge conflicts.	2026-04-22 15:22:32 -03:00
filipi87	ac810e57ed	Merge branch 'main' into filipi/includes_inter_frame_spaces # Conflicts: # uv.lock	2026-04-22 15:22:06 -03:00
filipi87	bba7ca80e3	Bumping to small-webrtc-prebuilt 2.5.0 to fix karaoke highlighting.	2026-04-22 15:20:37 -03:00
filipi87	79250f1fe0	Making includes_inter_frame_spaces optional for word-timestamp.	2026-04-22 14:20:30 -03:00
Mark Backman	4f6e76e6fd	Add changelog entries for #4352	2026-04-22 12:23:33 -04:00
Mark Backman	b0962861c8	Acknowledge Tkinter's GC-reference idiom with a scoped type ignore Tkinter's `Label` only stores `PhotoImage` references at the C level, so Python GC eats them unless something on the Python side keeps a reference. The canonical fix is to stash the reference on the widget itself: `label.image = photo`. Tkinter widgets are plain Python objects, so the assignment works at runtime, but the stub declares no `image` attribute (correctly — there isn't one; we're adding it). Narrow the suppression to `# type: ignore[attr-defined]` on the one line. The existing comment above the assignment already documents why.	2026-04-22 12:19:16 -04:00
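Commit 092b1dcb0f widens the `TLLMInvocationParams` bound because, per PEP 589, a TypedDict is not a subtype of `dict[...]` but is compatible with a read-only `Mapping`. The sketch below shows the shape of such a bound; `ExampleInvocationParams` and `param_keys` are hypothetical illustrations, not pipecat types.

```python
from collections.abc import Mapping
from typing import Any, TypedDict, TypeVar

# TypeVar name taken from the commit; the bound mirrors its new value.
TLLMInvocationParams = TypeVar("TLLMInvocationParams", bound=Mapping[str, Any])


class ExampleInvocationParams(TypedDict):
    """Hypothetical TypedDict of per-call LLM parameters."""

    messages: list[dict[str, Any]]
    temperature: float


def param_keys(params: TLLMInvocationParams) -> list[str]:
    # Only read-only Mapping operations are used, which is all the
    # Mapping[str, Any] bound promises.
    return sorted(params.keys())
```

With `bound=dict[str, Any]` instead, a type checker such as pyright would reject passing an `ExampleInvocationParams` value to `param_keys`; the runtime behavior is identical either way.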
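The Nova Sonic session-continuation commit (f1b16a672a) keeps a rolling buffer (default 3 s) of user audio while the next session is prepared, then sends that audio to the new session. One way to build such a buffer is sketched below, assuming 16-bit mono PCM chunks of even length; the class and its parameters are hypothetical and not taken from the Nova Sonic service.

```python
from collections import deque


class RollingAudioBuffer:
    """Hypothetical buffer holding at most `max_seconds` of 16-bit mono PCM."""

    def __init__(self, sample_rate: int = 16000, max_seconds: float = 3.0) -> None:
        # Whole samples times 2 bytes per sample keeps the budget sample-aligned.
        self._max_bytes = int(sample_rate * max_seconds) * 2
        self._chunks: deque[bytes] = deque()
        self._size = 0

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        overflow = self._size - self._max_bytes
        while overflow > 0 and self._chunks:
            head = self._chunks[0]
            if len(head) <= overflow:
                # The whole oldest chunk is outside the window: drop it.
                self._chunks.popleft()
                self._size -= len(head)
                overflow -= len(head)
            else:
                # Trim the oldest bytes of the head chunk, rounding up to a
                # whole 2-byte sample so sample boundaries stay aligned.
                cut = overflow + (overflow % 2)
                self._chunks[0] = head[cut:]
                self._size -= cut
                overflow = 0

    def drain(self) -> bytes:
        """Return the buffered audio, oldest first, and clear the buffer."""
        data = b"".join(self._chunks)
        self._chunks.clear()
        self._size = 0
        return data
```

Trimming the head chunk rather than discarding it means a single oversized chunk still leaves the most recent `max_seconds` of audio in the buffer.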
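Commit 1421c4ba22 describes two Gemini Live 2.5 reconnect workarounds: forcing `turn_complete=True` on the seed, and appending a blank user turn when the seed does not already end with one, while 3.x needs neither. A minimal sketch of that decision follows; `Turn`, `prepare_reconnect_seed`, and the `"user"`/`"model"` role strings are assumptions, not the service's real code path (which the commit places in `_create_initial_response(for_reconnect=...)`).

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical conversation turn; role is "user" or "model"."""

    role: str
    text: str


def prepare_reconnect_seed(
    history: list[Turn], is_gemini_2_5: bool, turn_complete: bool
) -> tuple[list[Turn], bool]:
    """Hypothetical: return (seed_turns, turn_complete) for a reconnect seed."""
    seed = list(history)  # never mutate the caller's history
    if not is_gemini_2_5:
        # 3.x: neither workaround applies; keep the caller's turn_complete.
        return seed, turn_complete
    if not seed or seed[-1].role != "user":
        # 2.5 requires the seed to end with a user turn.
        seed.append(Turn(role="user", text=""))
    # Force a recap-style response right away instead of a "forgetful" one later.
    return seed, True
```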

9191 Commits