Commit Graph

483 Commits

Author SHA1 Message Date
Paul Kompfner
c4f5f1ebbb test, refactor: follow-ups to LLMService generic refactor
Two follow-ups now that LLMService is generic over its adapter:

- Add an explicit backward-compat test verifying that an LLMService
  subclass with no generic parameter (the third-party-provider
  pattern) instantiates and returns a usable adapter. The existing
  MockLLMService (declared without brackets) already exercised this
  implicitly, but it's worth a named assertion.

- Drop the now-redundant `params: SomeLLMInvocationParams = ...`
  variable annotations on `adapter.get_llm_invocation_params()`
  results. Since `get_llm_adapter()` now returns the precise adapter
  type, and `BaseLLMAdapter` is generic in its invocation-params
  type, the call already infers the right TypedDict.
2026-05-01 09:36:14 -04:00
kompfner
6d66bbceeb Merge pull request #4395 from pipecat-ai/pk/app-resources-api-updates
Broaden tool_resources to app_resources
2026-04-30 21:19:05 -04:00
Paul Kompfner
1b5c4cfa2a feat: broaden tool_resources to app_resources
Broaden `tool_resources` to `app_resources` for easy access not just in
tool handlers but in other places like custom `FrameProcessor`s.

Involves 3 changes:

- A rename: `tool_resources` -> `app_resources`
- A new property on `PipelineTask`: `app_resources`
- A new property on `FrameProcessor`: `pipeline_task`

Usage in tool handler:

    async def get_weather(params: FunctionCallParams):
        resources = cast(MyAppResources, params.app_resources)
        ...

Usage in custom `FrameProcessor`:

    class MyProcessor(FrameProcessor):
        async def process_frame(self, frame, direction):
            await super().process_frame(frame, direction)
            if self.pipeline_task is not None:
                resources = cast(MyAppResources, self.pipeline_task.app_resources)
                ...

The previous `tool_resources` aliases (on `PipelineTask`,
`FunctionCallParams`, and `FrameProcessorSetup`) keep working but are
deprecated as of 1.2.0 and emit `DeprecationWarning`s.
2026-04-30 16:16:17 -04:00
Mark Backman
351105a975 test(krisp): scope importlib.metadata.version mock to imports only
The four krisp test files installed a process-wide mock of
importlib.metadata.version with `patch(...).start()` at module level and
never called .stop(). Once any of these files was collected, the mock
leaked across the rest of the test session, returning '0.0.0-dev' for
every version check. This corrupted unrelated tests that triggered
transformers' import-time dependency check (e.g. lazy imports of
LocalSmartTurnAnalyzerV3) — transformers saw tqdm=='0.0.0-dev' and
refused to load.

Wrap the pipecat imports in `with patch(...)` so the mock is active
during import (where pipecat's krisp version check needs it) and torn
down before any tests run.
2026-04-30 14:16:54 -04:00
kompfner
ce1311f6ba Merge pull request #4301 from bnovik0v/fix-4300-missing-tool-lifecycle
Fail missing tool calls cleanly
2026-04-27 11:54:43 -04:00
borislav
8869e25142 fix: compare bound method by equality, not identity
Bound methods are created fresh on each attribute access, so
'self._missing_function_call_handler is self._missing_function_call_handler'
is always False. Using 'is' meant the placeholder branch never fired and
both warnings logged when a function was missing at queue time.

Switch to == so equality compares the underlying function and instance.
Strengthen the missing-at-queue-time test to assert the second warning
does not fire.
2026-04-27 17:34:31 +02:00
borislav
822392b0d4 fix: re-resolve registry item at execution time
Address review feedback: a function may be unregistered between when
run_function_calls queues it and when _run_function_call executes it.
Restore the live lookup, falling back to the missing-function handler
when the entry is gone, so the call still terminates with a normal
tool result. Factor the missing-handler item construction into a
helper since it's now built in two places.
2026-04-27 17:22:30 +02:00
kompfner
bc29bdb95e Merge pull request #4371 from Stoic-Angel/feat-global-context
Add a global context for tool calls: tool_resources
2026-04-27 10:55:03 -04:00
Garegin Harutyunyan
e5941926be Krisp tt demo tool (#4335)
* VIVA SDK TT v3 support

* Format fix.

* Renamed the API naming, removed '3' from the name.

* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.

* TT demo tool

* Some improvements for demo scripts, audio recordin, etc.

* Enhance demo scripts with VAD selection and audio embedding features. Updated HTML report to include annotated audio players and improved response time metrics in summary formatting. Added README for setup and usage instructions.

* Refactor interrupt prediction demo to compare multiple interruption strategies (Krisp IP vs VAD). Updated README with usage instructions and output details. Enhanced audio processing with new helper functions for generating beeps and mixing audio.

* Refactor demo scripts to improve latency metrics by introducing total_delay property in TurnEvent. Update formatting in reports and visualizations to reflect accurate speech end times, including VAD wait times. Enhance HTML report with detailed latency information and adjust audio processing to account for VAD stop seconds.

* Add audio resampling functionality and update demo scripts for improved audio processing

- Introduced `resample_audio` function to handle audio resampling with linear interpolation.
- Updated `demo_audio_recorder.py` to utilize the new resampling feature, ensuring audio is saved at the requested sample rate.
- Modified `demo_interrupt_prediction.py` and `demo_turn_taking.py` to resample audio to 16 kHz for compatibility with Silero VAD.
- Adjusted imports in demo scripts to include the new resampling function.
- Enhanced error handling for sample rate discrepancies in audio recording.

* Enhance demo_interrupt_prediction.py with VAD type selection and improved processing logic

- Added support for selecting between "silero" and "krisp" VAD engines in the demo script.
- Introduced a new create_vad function to configure VAD analyzers based on the selected type.
- Updated audio processing logic to handle VAD type-specific resampling and state management.
- Modified the KrispVivaIPUserTurnStartStrategy to utilize a separate vad_flag for per-frame VAD input, improving interruption detection accuracy.

* Refactor audio processing scripts for improved readability and consistency

- Updated type hinting in `resample_audio` function to use `tuple` instead of `Tuple`.
- Simplified print statements in `demo_audio_recorder.py`, `demo_formatting.py`, and `demo_interrupt_prediction.py` for better readability.
- Adjusted argument formatting in `demo_audio_recorder.py` and `demo_formatting.py` for consistency.
- Cleaned up list comprehensions in `demo_formatting.py`, `demo_html_report.py`, and `demo_interrupt_prediction.py` for clarity.
- Enhanced error handling in `__init__.py` for the KrispVivaIPUserTurnStartStrategy import.

* Refactor VAD handling in KrispVivaIPUserTurnStartStrategy and update tests for clarity

- Simplified the argument formatting in the _handle_vad_started method for improved readability.
- Updated test assertions to reflect changes in VAD processing logic, ensuring that the vad_flag is correctly set to False during continuous state processing.
- Enhanced test cases to verify that the process method is called appropriately under different conditions.

* more format fixes.

* removed demo scripts.

* reverted wrongly removed file.

* Corrected the IP integration logic.

* style fix.

* Refactor audio processing and state management in KrispVivaIPUserTurnStartStrategy

- Removed the unused _vad_flag attribute to streamline state tracking.
- Updated the reset method to clear the audio buffer instead of resetting the vad_flag.
- Adjusted the process_frame method to use _speech_active for VAD input, enhancing clarity in the logic.
- Modified tests to reflect changes in state management and ensure proper functionality of the reset method and audio buffer handling.

* FIxed formatting

---------

Co-authored-by: Aram Poghosyan <apoghosyan@krisp.ai>
2026-04-27 08:14:00 -04:00
Aayush Jain
108e32eb72 Add a global context for tool calls - tool_resources, as a parameter to PipelineTask and FrameProcessorSetup 2026-04-25 02:12:40 +05:30
filipi87
ac810e57ed Merge branch 'main' into filipi/includes_inter_frame_spaces
# Conflicts:
#	uv.lock
2026-04-22 15:22:06 -03:00
Mark Backman
3f3d3c9203 Merge pull request #4337 from pipecat-ai/mb/fix-speech-stop-strategy
Split user-turn stop timeout into independent speech and STT timers
2026-04-22 10:23:03 -04:00
Mark Backman
d8f5c0be71 Add XAITTSService for xAI streaming WebSocket TTS
Adds XAITTSService in the existing xai/tts.py module, alongside the
existing XAIHttpTTSService. Connects to xAI's streaming endpoint at
wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta
chunks down on the same connection so audio starts flowing before the full
utterance is synthesized.

Extends InterruptibleTTSService since xAI's protocol is strictly sequential
per connection and exposes neither a cancel verb nor a context ID — the
only way to stop an in-flight utterance is to tear down the WebSocket,
which is exactly what InterruptibleTTSService does on interruption when
the bot is speaking.

Voice, language, codec, and sample_rate are passed as query-string params
at connect time; runtime setting changes reconnect the socket. Defaults to
raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream.

Splits the existing example into voice-xai.py (WebSocket) and
voice-xai-http.py (batch HTTP) so each variant has its own entry point.
Promotes the xai extra to depend on pipecat-ai[websockets-base] since the
new service imports the websockets library.
2026-04-21 15:48:26 -04:00
Mark Backman
b59c4775da Split user-turn stop timeout into independent speech and STT timers
SpeechTimeoutUserTurnStopStrategy previously collapsed two waits into
max(stt_timeout, user_speech_timeout), which over-waited for finalizing
STT services and could also end the turn early in a legacy code path.
Run them as independent timers instead:

- user_speech_timeout: policy floor, always runs to completion.
- stt_timeout: latency safety net, short-circuited by a finalized
  transcript since STT has signaled it has nothing more to send.

The no-VAD fallback now waits only user_speech_timeout rather than
max(stt_timeout, user_speech_timeout); stt_timeout is defined relative
to VAD stop and has no meaning when no VAD event occurred. This
shortens the fallback wait for users who set stt_timeout greater than
user_speech_timeout.
2026-04-20 11:55:09 -04:00
Ian Lee
b435ddfa44 feat(tts): add includes_inter_frame_spaces flag to word-timestamp API
Some TTS providers (e.g. Inworld) return verbatim tokens where spaces and
punctuation are already embedded in the token text. When downstream consumers
join these tokens with an extra space they produce "hello , world" instead of
"hello, world".

Add an opt-in `includes_inter_frame_spaces: bool = False` parameter to
`add_word_timestamps` / `_add_word_timestamps`. The flag is threaded through
`_WordTimestampEntry` and stamped onto every emitted `TTSTextFrame`.
Defaults to `False` — no behaviour change for existing services.

`InworldTTSService` passes `includes_inter_frame_spaces=True` and stops
pre-processing tokens in `_calculate_word_times`, returning them verbatim.

Tests added to `test_tts_frame_ordering.py` covering both HTTP and WebSocket
delivery paths: verbatim text preservation, PTS ordering, text-before-audio
ordering, and the Inworld punctuation-token scenario.

Made-with: Cursor
2026-04-18 12:03:32 -07:00
Garegin Harutyunyan
4c19f5584c VIVA SDK TT v3 support (#4252)
* VIVA SDK TT v3 support

* Format fix.

* Renamed the API naming, removed '3' from the name.

* Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support.

* Typo fix in voice-krisp-viva example to use KrispVivaFilter class

* style fix.

* test run error fixes.

* some test related changes.

* Fixed tests

* Stule fixes.
2026-04-17 07:53:41 -04:00
Aleix Conchillo Flaqué
b3bb6fdaa5 Modernize Python typing across the codebase
Automated via ruff UP006, UP007, UP035, UP045 rules (target: py311):

- Replace `typing.List`, `Dict`, `Tuple`, `Set`, `FrozenSet`, `Type`
  with their built-in equivalents (`list`, `dict`, `tuple`, etc.)
- Replace `typing.Optional[X]` with `X | None`
- Replace `typing.Union[X, Y]` with `X | Y`
- Move `Mapping`, `Sequence`, `Callable`, `Awaitable`,
  `MutableMapping`, `MutableSequence`, `Iterator`, `AsyncIterator`,
  `AsyncGenerator` imports from `typing` to `collections.abc`
- Remove now-unused `typing` imports
- Add `from __future__ import annotations` to 5 files that use
  forward-reference strings in `X | "Y"` annotations
2026-04-16 09:28:23 -07:00
borislav
86e726107f fix: fail missing tool calls cleanly 2026-04-14 22:40:45 +02:00
Aleix Conchillo Flaqué
958d90819f Merge pull request #4294 from pipecat-ai/ac/fix-assistant-turn-stopped-event
Fix on_assistant_turn_stopped not firing for tool-call-only responses
2026-04-14 10:09:55 -07:00
Aleix Conchillo Flaqué
698c2ba92e Fix on_assistant_turn_stopped not firing for empty LLM responses
When the LLM returned zero text tokens (e.g. it was interrupted before producing
tokens or about to push tokens), push_aggregation() returned an empty string and
on_assistant_turn_stopped was never emitted. This left consumers waiting for an
event that would never arrive.

Now on_assistant_turn_stopped always fires, with an empty content string when
the LLM produced no text tokens.

Fixes #4292
2026-04-14 10:07:19 -07:00
Mark Backman
989fb4deaa Fix context summarization failing with mid-conversation system messages
Only treat messages[0] as the initial system prompt when determining the
summarization range. Previously, the code scanned the entire context for
the first system-role message, which caused failures when the only system
message was a mid-conversation injection (e.g. "The user has been quiet").
In that case summary_start exceeded summary_end, producing an empty range
and "No messages to summarize" errors.

Fixes #4286
2026-04-14 11:48:50 -04:00
Mark Backman
d1f7af0330 Merge pull request #4283 from pipecat-ai/mb/user-stop-transcript-improvements 2026-04-13 19:27:05 -04:00
Mark Backman
804e3ea9ec Trigger turn stop immediately when transcript arrives after p99 timeout
When the STT p99 timeout fires without a transcript, the turn stop
strategy previously did nothing — falling through to the 5-second
user_turn_stop_timeout. Now, a _timeout_expired flag tracks when the
timeout has elapsed so that a late transcript triggers the turn stop
immediately instead of waiting for the fallback.
2026-04-13 18:11:32 -04:00
Aleix Conchillo Flaqué
7dc763d512 Merge pull request #4272 from pipecat-ai/pk/llm-context-get-messages-elide-large-values
Add truncate_large_values to LLMContext.get_messages()
2026-04-13 15:04:41 -07:00
Paul Kompfner
1a02b5d61a Rename elide_large_values to truncate_large_values 2026-04-11 14:29:05 -04:00
Aleix Conchillo Flaqué
f91a113de7 tests: yield in wake phrase strategy setup to let tasks start
The strategy schedules background tasks during setup. Fast-running
tests could observe state before those tasks had a chance to run;
yielding once via asyncio.sleep(0) ensures they do.
2026-04-10 17:37:50 -07:00
Aleix Conchillo Flaqué
e553bb010f tests: migrate LLM tests to Settings-based constructor API
Replace the old `model=` / `params=InputParams(...)` style with the
new `settings=<Service>.Settings(...)` form across LLM service tests.
2026-04-10 17:37:49 -07:00
Paul Kompfner
812cdc6822 Add elide_large_values to LLMContext.get_messages()
Enable callers to get a compact version of context messages suitable
for serialization, logging, and debugging tools. For standard
messages, known binary data (base64 images, audio) is fully elided.
For LLM-specific messages, long string values are recursively
truncated. Adapter get_messages_for_logging() methods now use this.
2026-04-10 16:35:36 -04:00
Aleix Conchillo Flaqué
dcd21e7ff4 Rework audio idle detection with timestamp-based adaptive sleep
Replaces the per-frame asyncio.Event signaling with a monotonic
timestamp updated on each audio frame. The handler sleeps until the
next deadline (last_audio_time + timeout), recomputing on each wake-up
to account for audio arriving during sleep.

This avoids waking the handler on every audio frame (~50/s at 20ms
chunks), and guarantees detection latency is bounded by timeout rather
than 2 * timeout.

Also renames audio_starvation_timeout to audio_idle_timeout and
associated identifiers for consistency with existing pipecat naming
(user_idle_timeout, etc.).
2026-04-10 10:35:18 -07:00
Om Chauhan
cb2c1868b0 fix VAD stuck in SPEAKING state when audio stops mid-speech 2026-04-10 09:54:48 -07:00
kompfner
d07eebff20 Merge pull request #4248 from omChauhanDev/add-openai-custom-tools-support
Add custom_tools support for OpenAI adapters
2026-04-10 10:27:28 -04:00
Paul Kompfner
fc3307bc63 Use OpenAI SDK types for tool params in adapters and tests
These are TypedDicts (plain dicts at runtime), so no behavioral change
— just more descriptive type hints for readers. Use ToolParam instead
of FunctionToolParam for the Responses adapter to reflect that custom
non-function tools are supported. Use ChatCompletionToolParam instead
of Any for the completions adapter return type. Update tests to use
typed params in expected values.
2026-04-10 10:15:39 -04:00
Aleix Conchillo Flaqué
43ddbdf1ec Merge pull request #3797 from iamjr15/fix/idle-processor-event-race
Fix asyncio.Event race conditions in idle processors
2026-04-09 16:04:03 -07:00
iamjr15
565349d332 Fix asyncio.Event race conditions in idle processors
Move event.clear() from finally block to success path in
IdleFrameProcessor and UserIdleProcessor._idle_task_handler().
The finally block unconditionally cleared signals set during
async timeout callbacks, causing false-positive idle detection.

Closes #3402
2026-04-09 13:41:01 -07:00
Cale Shapera
ec574edd53 Add Inworld Realtime Service (#4140)
* Add Inworld Realtime LLM service

Adds a WebSocket-based realtime service for Inworld's cascade
STT/LLM/TTS API with semantic VAD, function calling, and streaming
transcription support.

New files:
- src/pipecat/services/inworld/realtime/ (service, events)
- src/pipecat/adapters/services/inworld_realtime_adapter.py
- examples/foundational/19zb-inworld-realtime.py

Also includes:
- websockets dependency for inworld extra in pyproject.toml
- Adapter and settings tests matching OpenAI/Grok realtime patterns
- Fix for double-response when server-side VAD is enabled

* Prefer init-provided system instruction in Inworld Realtime

Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the
pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and
Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup
branch.

* Update changelog entry with PR number

* Fix changelog format to use bullet point

* Polish PR: default model, example cleanup, changelog update

- Change default model from gpt-4.1-nano to gpt-4.1-mini
- Add function calling demo to example
- Remove demo-testing artifact from system instruction
- Mention Router support in changelog

* Address PR review feedback for Inworld Realtime

- Move example to examples/realtime/realtime-inworld.py
- Change initial context role from "user" to "developer"
- Remove explicit sample rates from example; sync them in
  _ensure_audio_config so Inworld gets the transport's actual rates
- Add audio race condition guard in _handle_evt_audio_delta (matches
  OpenAI realtime pattern)
- Convert remaining "system"/"developer" messages to "user" in adapter
- Add clarifying comment for local-VAD vs server-VAD metrics paths

* Simplify example, add provider tracking, remove local VAD path

- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning
- Add pipecat-realtime session key prefix and provider_data metadata
  for Inworld traffic attribution
- Remove local VAD code path (Inworld only supports server-side VAD)
- Use typed InputAudioBufferAppendEvent for audio sends

* Default TTS model to inworld-tts-1.5-max

* Remove dead shimmed tools code, set STT/VAD defaults

- Remove non-functional AdapterType.SHIM custom tools code from adapter
- Default STT model to assemblyai/u3-rt-pro
- Default VAD eagerness to low
2026-04-09 13:04:17 -04:00
Om Chauhan
1443dfb070 added changelog 2026-04-08 08:48:26 +05:30
Om Chauhan
4bef85e363 added custom_tools support for OpenAI adapters 2026-04-08 08:40:03 +05:30
Filipi da Silva Fuchter
27a8a973b1 Merge pull request #4201 from pipecat-ai/mb/handle-recurring-disconnects
Fix WebsocketService infinite reconnection loop
2026-04-07 11:02:24 -03:00
Filipi da Silva Fuchter
6eccd16543 Merge pull request #4217 from pipecat-ai/filipi/async_tools
Supporting async function calls.
2026-04-07 09:35:03 -03:00
Paul Kompfner
70469e3c0c Assert no LLMContextFrame when run_llm is not set in message frame tests 2026-04-03 11:34:58 -04:00
Paul Kompfner
6111df947e Test LLMAssistantAggregator handling of upstream message frames
Add tests for LLMRunFrame, LLMMessagesAppendFrame, LLMMessagesUpdateFrame,
and LLMMessagesTransformFrame sent upstream to LLMAssistantAggregator,
mirroring the existing LLMUserAggregator downstream tests. Add
frames_to_send_direction param to run_test helper to support this.
2026-04-03 11:34:58 -04:00
Paul Kompfner
4eebfd65d9 Add a LLMMessagesTransformFrame to facilitate programmatically editing context in a frame-based way.
The previous approach required the caller to directly grab a reference to the context object, grab a "snapshot" of its messages *at that point in time*, transform the messages, and then push an `LLMMessagesUpdateFrame` with the transformed messages. This approach can lead to problems: what if there had already been a change to the context queued in the pipeline? The transformed messages would simply overwrite it without consideration.
2026-04-03 11:34:50 -04:00
Mark Backman
fbb49ffc8d Merge pull request #4233 from pipecat-ai/mb/remove-unused-imports-2026-04-02
Remove unused imports across codebase
2026-04-03 07:26:13 -04:00
Mark Backman
8adb38f87c Remove unused imports across codebase 2026-04-02 22:21:16 -04:00
Mark Backman
41e46ee69e Remove deprecated vad_events and should_interrupt from DeepgramSTTService
Deepgram's built-in VAD events were deprecated in 0.0.99 in favor of
Silero VAD. This removes vad_events from settings and LiveOptions,
the should_interrupt parameter, the vad_enabled property,
_on_speech_started/_on_utterance_end handlers, and simplifies
_on_message and process_frame accordingly.
2026-04-02 22:05:49 -04:00
Mark Backman
793ed8f9e3 Remove deprecated UserBotLatencyLogObserver and UserIdleProcessor
UserBotLatencyLogObserver (deprecated 0.0.102) is replaced by
UserBotLatencyObserver. UserIdleProcessor (deprecated 0.0.100) is
replaced by LLMUserAggregator with user_idle_timeout.
2026-04-02 21:54:36 -04:00
filipi87
929a0e33f4 Fixing the automated tests. 2026-04-02 16:58:28 -03:00
Aleix Conchillo Flaqué
976c644f90 Fix tests to expect SpeechControlParamsFrame from default turn strategy 2026-04-02 12:42:06 -07:00
Mark Backman
d503383c23 Remove deprecated interruption_strategies plumbing
The interruption_strategies mechanism was deprecated in v0.0.99 in favor
of LLMUserAggregator's user_turn_strategies. All evaluation logic was
already removed — this removes the remaining field definitions, property,
StartFrame propagation, conditional check in base_input.py, strategy
files, and test.
2026-04-02 11:19:17 -04:00
Mark Backman
2a118084bd Remove deprecated transcript_processor module 2026-04-02 10:57:05 -04:00