pipecat

Author	SHA1	Message	Date
Mark Backman	83190d38e9	Merge pull request #4414 from pipecat-ai/mb/fix-ttsspeakframe-assistant-turn-stopped	2026-05-04 18:12:33 -04:00
Mark Backman	7519c26ac5	Merge pull request #4417 from pipecat-ai/mb/resolve-runner-filepath	2026-05-04 18:09:34 -04:00
Mark Backman	89f10dd9a1	test: drop webrtc-dependent test, remove webrtc extra from CI	2026-05-04 16:42:05 -04:00
Mark Backman	e780f759d0	fix: validate download path containment in runner Resolve and contain the user-supplied filename before serving it from the runner's /files endpoint. Also raise a 404 (instead of returning None) when the downloads folder is unset, and use the resolved basename for Content-Disposition.	2026-05-04 16:20:27 -04:00
Mark Backman	90e6b51acd	Fix ElevenLabs alignment chunk spacing	2026-05-04 15:15:37 -04:00
Mark Backman	f1a3ee97de	fix: surface TTSSpeakFrame greetings in on_assistant_turn_stopped Two issues were causing TTSSpeakFrame(append_to_context=True) greetings to silently lose their trailing words and never fire on_assistant_turn_stopped: - LLMAssistantPushAggregationFrame was emitted without a PTS, so the transport routed it through the audio (sync) queue while word-level TTSTextFrames travel through the clock queue. The aggregation could reach the assistant aggregator before the final words, leaving them orphaned in the buffer. Stamp the frame with `_word_last_pts + 1` when there are word timestamps so it can't overtake them. - The aggregator's LLMAssistantPushAggregationFrame handler called push_aggregation() directly, bypassing _trigger_assistant_turn_stopped. For TTS-only flows there is no LLMFullResponseStartFrame, so the turn start timestamp was never set and on_assistant_turn_stopped never fired. Open a turn (if needed) and trigger stopped from the handler. Fixes #4264.	2026-05-04 10:41:22 -04:00
Paul Kompfner	c4f5f1ebbb	test, refactor: follow-ups to LLMService generic refactor Two follow-ups now that LLMService is generic over its adapter: - Add an explicit backward-compat test verifying that an LLMService subclass with no generic parameter (the third-party-provider pattern) instantiates and returns a usable adapter. The existing MockLLMService (declared without brackets) already exercised this implicitly, but it's worth a named assertion. - Drop the now-redundant `params: SomeLLMInvocationParams = ...` variable annotations on `adapter.get_llm_invocation_params()` results. Since `get_llm_adapter()` now returns the precise adapter type, and `BaseLLMAdapter` is generic in its invocation-params type, the call already infers the right TypedDict.	2026-05-01 09:36:14 -04:00
kompfner	6d66bbceeb	Merge pull request #4395 from pipecat-ai/pk/app-resources-api-updates Broaden tool_resources to app_resources	2026-04-30 21:19:05 -04:00
Paul Kompfner	1b5c4cfa2a	feat: broaden tool_resources to app_resources Broaden `tool_resources` to `app_resources` for easy access not just in tool handlers but in other places like custom `FrameProcessor`s. Involves 3 changes: - A rename: `tool_resources` -> `app_resources` - A new property on `PipelineTask`: `app_resources` - A new property on `FrameProcessor`: `pipeline_task` Usage in tool handler: async def get_weather(params: FunctionCallParams): resources = cast(MyAppResources, params.app_resources) ... Usage in custom `FrameProcessor`: class MyProcessor(FrameProcessor): async def process_frame(self, frame, direction): await super().process_frame(frame, direction) if self.pipeline_task is not None: resources = cast(MyAppResources, self.pipeline_task.app_resources) ... The previous `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit `DeprecationWarning`s.	2026-04-30 16:16:17 -04:00
Mark Backman	351105a975	test(krisp): scope importlib.metadata.version mock to imports only The four krisp test files installed a process-wide mock of importlib.metadata.version with `patch(...).start()` at module level and never called .stop(). Once any of these files was collected, the mock leaked across the rest of the test session, returning '0.0.0-dev' for every version check. This corrupted unrelated tests that triggered transformers' import-time dependency check (e.g. lazy imports of LocalSmartTurnAnalyzerV3) — transformers saw tqdm=='0.0.0-dev' and refused to load. Wrap the pipecat imports in `with patch(...)` so the mock is active during import (where pipecat's krisp version check needs it) and torn down before any tests run.	2026-04-30 14:16:54 -04:00
kompfner	ce1311f6ba	Merge pull request #4301 from bnovik0v/fix-4300-missing-tool-lifecycle Fail missing tool calls cleanly	2026-04-27 11:54:43 -04:00
borislav	8869e25142	fix: compare bound method by equality, not identity Bound methods are created fresh on each attribute access, so 'self._missing_function_call_handler is self._missing_function_call_handler' is always False. Using 'is' meant the placeholder branch never fired and both warnings logged when a function was missing at queue time. Switch to == so equality compares the underlying function and instance. Strengthen the missing-at-queue-time test to assert the second warning does not fire.	2026-04-27 17:34:31 +02:00
borislav	822392b0d4	fix: re-resolve registry item at execution time Address review feedback: a function may be unregistered between when run_function_calls queues it and when _run_function_call executes it. Restore the live lookup, falling back to the missing-function handler when the entry is gone, so the call still terminates with a normal tool result. Factor the missing-handler item construction into a helper since it's now built in two places.	2026-04-27 17:22:30 +02:00
kompfner	bc29bdb95e	Merge pull request #4371 from Stoic-Angel/feat-global-context Add a global context for tool calls: tool_resources	2026-04-27 10:55:03 -04:00
Garegin Harutyunyan	e5941926be	Krisp tt demo tool (#4335) * VIVA SDK TT v3 support * Format fix. * Renamed the API naming, removed '3' from the name. * Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support. * TT demo tool * Some improvements for demo scripts, audio recording, etc. * Enhance demo scripts with VAD selection and audio embedding features. Updated HTML report to include annotated audio players and improved response time metrics in summary formatting. Added README for setup and usage instructions. * Refactor interrupt prediction demo to compare multiple interruption strategies (Krisp IP vs VAD). Updated README with usage instructions and output details. Enhanced audio processing with new helper functions for generating beeps and mixing audio. * Refactor demo scripts to improve latency metrics by introducing total_delay property in TurnEvent. Update formatting in reports and visualizations to reflect accurate speech end times, including VAD wait times. Enhance HTML report with detailed latency information and adjust audio processing to account for VAD stop seconds. * Add audio resampling functionality and update demo scripts for improved audio processing - Introduced `resample_audio` function to handle audio resampling with linear interpolation. - Updated `demo_audio_recorder.py` to utilize the new resampling feature, ensuring audio is saved at the requested sample rate. - Modified `demo_interrupt_prediction.py` and `demo_turn_taking.py` to resample audio to 16 kHz for compatibility with Silero VAD. - Adjusted imports in demo scripts to include the new resampling function. - Enhanced error handling for sample rate discrepancies in audio recording. * Enhance demo_interrupt_prediction.py with VAD type selection and improved processing logic - Added support for selecting between "silero" and "krisp" VAD engines in the demo script. 
- Introduced a new create_vad function to configure VAD analyzers based on the selected type. - Updated audio processing logic to handle VAD type-specific resampling and state management. - Modified the KrispVivaIPUserTurnStartStrategy to utilize a separate vad_flag for per-frame VAD input, improving interruption detection accuracy. * Refactor audio processing scripts for improved readability and consistency - Updated type hinting in `resample_audio` function to use `tuple` instead of `Tuple`. - Simplified print statements in `demo_audio_recorder.py`, `demo_formatting.py`, and `demo_interrupt_prediction.py` for better readability. - Adjusted argument formatting in `demo_audio_recorder.py` and `demo_formatting.py` for consistency. - Cleaned up list comprehensions in `demo_formatting.py`, `demo_html_report.py`, and `demo_interrupt_prediction.py` for clarity. - Enhanced error handling in `__init__.py` for the KrispVivaIPUserTurnStartStrategy import. * Refactor VAD handling in KrispVivaIPUserTurnStartStrategy and update tests for clarity - Simplified the argument formatting in the _handle_vad_started method for improved readability. - Updated test assertions to reflect changes in VAD processing logic, ensuring that the vad_flag is correctly set to False during continuous state processing. - Enhanced test cases to verify that the process method is called appropriately under different conditions. * more format fixes. * removed demo scripts. * reverted wrongly removed file. * Corrected the IP integration logic. * style fix. * Refactor audio processing and state management in KrispVivaIPUserTurnStartStrategy - Removed the unused _vad_flag attribute to streamline state tracking. - Updated the reset method to clear the audio buffer instead of resetting the vad_flag. - Adjusted the process_frame method to use _speech_active for VAD input, enhancing clarity in the logic. 
- Modified tests to reflect changes in state management and ensure proper functionality of the reset method and audio buffer handling. * Fixed formatting --------- Co-authored-by: Aram Poghosyan <apoghosyan@krisp.ai>	2026-04-27 08:14:00 -04:00
Aayush Jain	108e32eb72	Add a global context for tool calls - tool_resources, as a parameter to PipelineTask and FrameProcessorSetup	2026-04-25 02:12:40 +05:30
filipi87	ac810e57ed	Merge branch 'main' into filipi/includes_inter_frame_spaces # Conflicts: # uv.lock	2026-04-22 15:22:06 -03:00
Mark Backman	3f3d3c9203	Merge pull request #4337 from pipecat-ai/mb/fix-speech-stop-strategy Split user-turn stop timeout into independent speech and STT timers	2026-04-22 10:23:03 -04:00
Mark Backman	d8f5c0be71	Add XAITTSService for xAI streaming WebSocket TTS Adds XAITTSService in the existing xai/tts.py module, alongside the existing XAIHttpTTSService. Connects to xAI's streaming endpoint at wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta chunks down on the same connection so audio starts flowing before the full utterance is synthesized. Extends InterruptibleTTSService since xAI's protocol is strictly sequential per connection and exposes neither a cancel verb nor a context ID — the only way to stop an in-flight utterance is to tear down the WebSocket, which is exactly what InterruptibleTTSService does on interruption when the bot is speaking. Voice, language, codec, and sample_rate are passed as query-string params at connect time; runtime setting changes reconnect the socket. Defaults to raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream. Splits the existing example into voice-xai.py (WebSocket) and voice-xai-http.py (batch HTTP) so each variant has its own entry point. Promotes the xai extra to depend on pipecat-ai[websockets-base] since the new service imports the websockets library.	2026-04-21 15:48:26 -04:00
Mark Backman	b59c4775da	Split user-turn stop timeout into independent speech and STT timers SpeechTimeoutUserTurnStopStrategy previously collapsed two waits into max(stt_timeout, user_speech_timeout), which over-waited for finalizing STT services and could also end the turn early in a legacy code path. Run them as independent timers instead: - user_speech_timeout: policy floor, always runs to completion. - stt_timeout: latency safety net, short-circuited by a finalized transcript since STT has signaled it has nothing more to send. The no-VAD fallback now waits only user_speech_timeout rather than max(stt_timeout, user_speech_timeout); stt_timeout is defined relative to VAD stop and has no meaning when no VAD event occurred. This shortens the fallback wait for users who set stt_timeout greater than user_speech_timeout.	2026-04-20 11:55:09 -04:00
Ian Lee	b435ddfa44	feat(tts): add includes_inter_frame_spaces flag to word-timestamp API Some TTS providers (e.g. Inworld) return verbatim tokens where spaces and punctuation are already embedded in the token text. When downstream consumers join these tokens with an extra space they produce "hello , world" instead of "hello, world". Add an opt-in `includes_inter_frame_spaces: bool = False` parameter to `add_word_timestamps` / `_add_word_timestamps`. The flag is threaded through `_WordTimestampEntry` and stamped onto every emitted `TTSTextFrame`. Defaults to `False` — no behaviour change for existing services. `InworldTTSService` passes `includes_inter_frame_spaces=True` and stops pre-processing tokens in `_calculate_word_times`, returning them verbatim. Tests added to `test_tts_frame_ordering.py` covering both HTTP and WebSocket delivery paths: verbatim text preservation, PTS ordering, text-before-audio ordering, and the Inworld punctuation-token scenario. Made-with: Cursor	2026-04-18 12:03:32 -07:00
Garegin Harutyunyan	4c19f5584c	VIVA SDK TT v3 support (#4252) * VIVA SDK TT v3 support * Format fix. * Renamed the API naming, removed '3' from the name. * Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support. * Typo fix in voice-krisp-viva example to use KrispVivaFilter class * style fix. * test run error fixes. * some test related changes. * Fixed tests * Style fixes.	2026-04-17 07:53:41 -04:00
Aleix Conchillo Flaqué	b3bb6fdaa5	Modernize Python typing across the codebase Automated via ruff UP006, UP007, UP035, UP045 rules (target: py311): - Replace `typing.List`, `Dict`, `Tuple`, `Set`, `FrozenSet`, `Type` with their built-in equivalents (`list`, `dict`, `tuple`, etc.) - Replace `typing.Optional[X]` with `X | None` - Replace `typing.Union[X, Y]` with `X | Y` - Move `Mapping`, `Sequence`, `Callable`, `Awaitable`, `MutableMapping`, `MutableSequence`, `Iterator`, `AsyncIterator`, `AsyncGenerator` imports from `typing` to `collections.abc` - Remove now-unused `typing` imports - Add `from __future__ import annotations` to 5 files that use forward-reference strings in `X | "Y"` annotations	2026-04-16 09:28:23 -07:00
borislav	86e726107f	fix: fail missing tool calls cleanly	2026-04-14 22:40:45 +02:00
Aleix Conchillo Flaqué	958d90819f	Merge pull request #4294 from pipecat-ai/ac/fix-assistant-turn-stopped-event Fix on_assistant_turn_stopped not firing for tool-call-only responses	2026-04-14 10:09:55 -07:00
Aleix Conchillo Flaqué	698c2ba92e	Fix on_assistant_turn_stopped not firing for empty LLM responses When the LLM returned zero text tokens (e.g. it was interrupted before producing tokens or about to push tokens), push_aggregation() returned an empty string and on_assistant_turn_stopped was never emitted. This left consumers waiting for an event that would never arrive. Now on_assistant_turn_stopped always fires, with an empty content string when the LLM produced no text tokens. Fixes #4292	2026-04-14 10:07:19 -07:00
Mark Backman	989fb4deaa	Fix context summarization failing with mid-conversation system messages Only treat messages[0] as the initial system prompt when determining the summarization range. Previously, the code scanned the entire context for the first system-role message, which caused failures when the only system message was a mid-conversation injection (e.g. "The user has been quiet"). In that case summary_start exceeded summary_end, producing an empty range and "No messages to summarize" errors. Fixes #4286	2026-04-14 11:48:50 -04:00
Mark Backman	d1f7af0330	Merge pull request #4283 from pipecat-ai/mb/user-stop-transcript-improvements	2026-04-13 19:27:05 -04:00
Mark Backman	804e3ea9ec	Trigger turn stop immediately when transcript arrives after p99 timeout When the STT p99 timeout fires without a transcript, the turn stop strategy previously did nothing — falling through to the 5-second user_turn_stop_timeout. Now, a _timeout_expired flag tracks when the timeout has elapsed so that a late transcript triggers the turn stop immediately instead of waiting for the fallback.	2026-04-13 18:11:32 -04:00
Aleix Conchillo Flaqué	7dc763d512	Merge pull request #4272 from pipecat-ai/pk/llm-context-get-messages-elide-large-values Add truncate_large_values to LLMContext.get_messages()	2026-04-13 15:04:41 -07:00
Paul Kompfner	1a02b5d61a	Rename elide_large_values to truncate_large_values	2026-04-11 14:29:05 -04:00
Aleix Conchillo Flaqué	f91a113de7	tests: yield in wake phrase strategy setup to let tasks start The strategy schedules background tasks during setup. Fast-running tests could observe state before those tasks had a chance to run; yielding once via asyncio.sleep(0) ensures they do.	2026-04-10 17:37:50 -07:00
Aleix Conchillo Flaqué	e553bb010f	tests: migrate LLM tests to Settings-based constructor API Replace the old `model=` / `params=InputParams(...)` style with the new `settings=<Service>.Settings(...)` form across LLM service tests.	2026-04-10 17:37:49 -07:00
Paul Kompfner	812cdc6822	Add elide_large_values to LLMContext.get_messages() Enable callers to get a compact version of context messages suitable for serialization, logging, and debugging tools. For standard messages, known binary data (base64 images, audio) is fully elided. For LLM-specific messages, long string values are recursively truncated. Adapter get_messages_for_logging() methods now use this.	2026-04-10 16:35:36 -04:00
Aleix Conchillo Flaqué	dcd21e7ff4	Rework audio idle detection with timestamp-based adaptive sleep Replaces the per-frame asyncio.Event signaling with a monotonic timestamp updated on each audio frame. The handler sleeps until the next deadline (last_audio_time + timeout), recomputing on each wake-up to account for audio arriving during sleep. This avoids waking the handler on every audio frame (~50/s at 20ms chunks), and guarantees detection latency is bounded by timeout rather than 2 * timeout. Also renames audio_starvation_timeout to audio_idle_timeout and associated identifiers for consistency with existing pipecat naming (user_idle_timeout, etc.).	2026-04-10 10:35:18 -07:00
Om Chauhan	cb2c1868b0	fix VAD stuck in SPEAKING state when audio stops mid-speech	2026-04-10 09:54:48 -07:00
kompfner	d07eebff20	Merge pull request #4248 from omChauhanDev/add-openai-custom-tools-support Add custom_tools support for OpenAI adapters	2026-04-10 10:27:28 -04:00
Paul Kompfner	fc3307bc63	Use OpenAI SDK types for tool params in adapters and tests These are TypedDicts (plain dicts at runtime), so no behavioral change — just more descriptive type hints for readers. Use ToolParam instead of FunctionToolParam for the Responses adapter to reflect that custom non-function tools are supported. Use ChatCompletionToolParam instead of Any for the completions adapter return type. Update tests to use typed params in expected values.	2026-04-10 10:15:39 -04:00
Aleix Conchillo Flaqué	43ddbdf1ec	Merge pull request #3797 from iamjr15/fix/idle-processor-event-race Fix asyncio.Event race conditions in idle processors	2026-04-09 16:04:03 -07:00
iamjr15	565349d332	Fix asyncio.Event race conditions in idle processors Move event.clear() from finally block to success path in IdleFrameProcessor and UserIdleProcessor._idle_task_handler(). The finally block unconditionally cleared signals set during async timeout callbacks, causing false-positive idle detection. Closes #3402	2026-04-09 13:41:01 -07:00
Cale Shapera	ec574edd53	Add Inworld Realtime Service (#4140 ) * Add Inworld Realtime LLM service Adds a WebSocket-based realtime service for Inworld's cascade STT/LLM/TTS API with semantic VAD, function calling, and streaming transcription support. New files: - src/pipecat/services/inworld/realtime/ (service, events) - src/pipecat/adapters/services/inworld_realtime_adapter.py - examples/foundational/19zb-inworld-realtime.py Also includes: - websockets dependency for inworld extra in pyproject.toml - Adapter and settings tests matching OpenAI/Grok realtime patterns - Fix for double-response when server-side VAD is enabled * Prefer init-provided system instruction in Inworld Realtime Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup branch. * Update changelog entry with PR number * Fix changelog format to use bullet point * Polish PR: default model, example cleanup, changelog update - Change default model from gpt-4.1-nano to gpt-4.1-mini - Add function calling demo to example - Remove demo-testing artifact from system instruction - Mention Router support in changelog * Address PR review feedback for Inworld Realtime - Move example to examples/realtime/realtime-inworld.py - Change initial context role from "user" to "developer" - Remove explicit sample rates from example; sync them in _ensure_audio_config so Inworld gets the transport's actual rates - Add audio race condition guard in _handle_evt_audio_delta (matches OpenAI realtime pattern) - Convert remaining "system"/"developer" messages to "user" in adapter - Add clarifying comment for local-VAD vs server-VAD metrics paths * Simplify example, add provider tracking, remove local VAD path - Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning - Add pipecat-realtime session key prefix and provider_data metadata for 
Inworld traffic attribution - Remove local VAD code path (Inworld only supports server-side VAD) - Use typed InputAudioBufferAppendEvent for audio sends * Default TTS model to inworld-tts-1.5-max * Remove dead shimmed tools code, set STT/VAD defaults - Remove non-functional AdapterType.SHIM custom tools code from adapter - Default STT model to assemblyai/u3-rt-pro - Default VAD eagerness to low	2026-04-09 13:04:17 -04:00
Om Chauhan	1443dfb070	added changelog	2026-04-08 08:48:26 +05:30
Om Chauhan	4bef85e363	added custom_tools support for OpenAI adapters	2026-04-08 08:40:03 +05:30
Filipi da Silva Fuchter	27a8a973b1	Merge pull request #4201 from pipecat-ai/mb/handle-recurring-disconnects Fix WebsocketService infinite reconnection loop	2026-04-07 11:02:24 -03:00
Filipi da Silva Fuchter	6eccd16543	Merge pull request #4217 from pipecat-ai/filipi/async_tools Supporting async function calls.	2026-04-07 09:35:03 -03:00
Paul Kompfner	70469e3c0c	Assert no LLMContextFrame when run_llm is not set in message frame tests	2026-04-03 11:34:58 -04:00
Paul Kompfner	6111df947e	Test LLMAssistantAggregator handling of upstream message frames Add tests for LLMRunFrame, LLMMessagesAppendFrame, LLMMessagesUpdateFrame, and LLMMessagesTransformFrame sent upstream to LLMAssistantAggregator, mirroring the existing LLMUserAggregator downstream tests. Add frames_to_send_direction param to run_test helper to support this.	2026-04-03 11:34:58 -04:00
Paul Kompfner	4eebfd65d9	Add a `LLMMessagesTransformFrame` to facilitate programmatically editing context in a frame-based way. The previous approach required the caller to directly grab a reference to the context object, grab a "snapshot" of its messages at that point in time, transform the messages, and then push an `LLMMessagesUpdateFrame` with the transformed messages. This approach can lead to problems: what if there had already been a change to the context queued in the pipeline? The transformed messages would simply overwrite it without consideration.	2026-04-03 11:34:50 -04:00
Mark Backman	fbb49ffc8d	Merge pull request #4233 from pipecat-ai/mb/remove-unused-imports-2026-04-02 Remove unused imports across codebase	2026-04-03 07:26:13 -04:00
Mark Backman	8adb38f87c	Remove unused imports across codebase	2026-04-02 22:21:16 -04:00
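
The bound-method fix in 8869e25142 relies on general Python behaviour that can be reproduced in isolation. The following is a minimal sketch, not pipecat code: the `Registry` class and its `_missing_handler` method are hypothetical stand-ins for the object that owns the missing-function handler.

```python
class Registry:
    """Hypothetical stand-in for a class that owns a fallback handler."""

    def _missing_handler(self):
        return "missing"


registry = Registry()

# Each attribute access builds a new bound-method object, so an identity
# check between two accesses does not succeed.
same_identity = registry._missing_handler is registry._missing_handler

# Equality compares the underlying function and the bound instance.
same_equality = registry._missing_handler == registry._missing_handler

# The same method bound to a different instance compares unequal.
other_equality = registry._missing_handler == Registry()._missing_handler

print(same_identity, same_equality, other_equality)
```

This is why a placeholder check written with `is` can never take its branch, while the same check written with `==` behaves as intended.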


489 Commits
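
The path-containment check described in e780f759d0 can be illustrated with a small standalone sketch. It assumes a plain `pathlib` approach and does not reproduce the runner's actual code: the function name `resolve_download` and the return-`None` convention are hypothetical, and a real endpoint would presumably map the `None` case to a 404 response.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_download(downloads_dir: Path, filename: str) -> Path | None:
    """Return the resolved path if it stays inside downloads_dir, else None.

    Hypothetical helper for illustration only.
    """
    base = downloads_dir.resolve()
    # Resolve after joining so that ".." segments and symlinks are collapsed
    # before the containment check runs.
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):
        return None
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.txt").write_text("ok")

    inside = resolve_download(root, "report.txt")
    escaped = resolve_download(root, "../secret.txt")
    absolute = resolve_download(root, "/etc/passwd")

    # Use the resolved basename rather than the raw user input, e.g. when
    # building a Content-Disposition header.
    inside_name = inside.name if inside else None
```

Resolving both the base folder and the candidate before comparing them is the key design choice here: comparing unresolved strings would let `..` segments or an absolute filename slip past the check.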