pipecat

Author	SHA1	Message	Date
filipi87	ac810e57ed	Merge branch 'main' into filipi/includes_inter_frame_spaces # Conflicts: # uv.lock	2026-04-22 15:22:06 -03:00
Mark Backman	3f3d3c9203	Merge pull request #4337 from pipecat-ai/mb/fix-speech-stop-strategy Split user-turn stop timeout into independent speech and STT timers	2026-04-22 10:23:03 -04:00
Mark Backman	d8f5c0be71	Add XAITTSService for xAI streaming WebSocket TTS Adds XAITTSService in the existing xai/tts.py module, alongside the existing XAIHttpTTSService. Connects to xAI's streaming endpoint at wss://api.x.ai/v1/tts, streams text.delta chunks up and base64 audio.delta chunks down on the same connection so audio starts flowing before the full utterance is synthesized. Extends InterruptibleTTSService since xAI's protocol is strictly sequential per connection and exposes neither a cancel verb nor a context ID — the only way to stop an in-flight utterance is to tear down the WebSocket, which is exactly what InterruptibleTTSService does on interruption when the bot is speaking. Voice, language, codec, and sample_rate are passed as query-string params at connect time; runtime setting changes reconnect the socket. Defaults to raw PCM so emitted TTSAudioRawFrame objects need no decoding downstream. Splits the existing example into voice-xai.py (WebSocket) and voice-xai-http.py (batch HTTP) so each variant has its own entry point. Promotes the xai extra to depend on pipecat-ai[websockets-base] since the new service imports the websockets library.	2026-04-21 15:48:26 -04:00
Mark Backman	b59c4775da	Split user-turn stop timeout into independent speech and STT timers SpeechTimeoutUserTurnStopStrategy previously collapsed two waits into max(stt_timeout, user_speech_timeout), which over-waited for finalizing STT services and could also end the turn early in a legacy code path. Run them as independent timers instead: - user_speech_timeout: policy floor, always runs to completion. - stt_timeout: latency safety net, short-circuited by a finalized transcript since STT has signaled it has nothing more to send. The no-VAD fallback now waits only user_speech_timeout rather than max(stt_timeout, user_speech_timeout); stt_timeout is defined relative to VAD stop and has no meaning when no VAD event occurred. This shortens the fallback wait for users who set stt_timeout greater than user_speech_timeout.	2026-04-20 11:55:09 -04:00
Ian Lee	b435ddfa44	feat(tts): add includes_inter_frame_spaces flag to word-timestamp API Some TTS providers (e.g. Inworld) return verbatim tokens where spaces and punctuation are already embedded in the token text. When downstream consumers join these tokens with an extra space they produce "hello , world" instead of "hello, world". Add an opt-in `includes_inter_frame_spaces: bool = False` parameter to `add_word_timestamps` / `_add_word_timestamps`. The flag is threaded through `_WordTimestampEntry` and stamped onto every emitted `TTSTextFrame`. Defaults to `False` — no behaviour change for existing services. `InworldTTSService` passes `includes_inter_frame_spaces=True` and stops pre-processing tokens in `_calculate_word_times`, returning them verbatim. Tests added to `test_tts_frame_ordering.py` covering both HTTP and WebSocket delivery paths: verbatim text preservation, PTS ordering, text-before-audio ordering, and the Inworld punctuation-token scenario. Made-with: Cursor	2026-04-18 12:03:32 -07:00
Garegin Harutyunyan	4c19f5584c	VIVA SDK TT v3 support (#4252) * VIVA SDK TT v3 support * Format fix. * Renamed the API naming, removed '3' from the name. * Implementation of User turn start strategy using Krisp VIVA Interruption Prediction in scope of TT v3 support. * Typo fix in voice-krisp-viva example to use KrispVivaFilter class * style fix. * test run error fixes. * some test related changes. * Fixed tests * Stule fixes.	2026-04-17 07:53:41 -04:00
Aleix Conchillo Flaqué	b3bb6fdaa5	Modernize Python typing across the codebase Automated via ruff UP006, UP007, UP035, UP045 rules (target: py311): - Replace `typing.List`, `Dict`, `Tuple`, `Set`, `FrozenSet`, `Type` with their built-in equivalents (`list`, `dict`, `tuple`, etc.) - Replace `typing.Optional[X]` with `X | None` - Replace `typing.Union[X, Y]` with `X | Y` - Move `Mapping`, `Sequence`, `Callable`, `Awaitable`, `MutableMapping`, `MutableSequence`, `Iterator`, `AsyncIterator`, `AsyncGenerator` imports from `typing` to `collections.abc` - Remove now-unused `typing` imports - Add `from __future__ import annotations` to 5 files that use forward-reference strings in `X | "Y"` annotations	2026-04-16 09:28:23 -07:00
Aleix Conchillo Flaqué	958d90819f	Merge pull request #4294 from pipecat-ai/ac/fix-assistant-turn-stopped-event Fix on_assistant_turn_stopped not firing for tool-call-only responses	2026-04-14 10:09:55 -07:00
Aleix Conchillo Flaqué	698c2ba92e	Fix on_assistant_turn_stopped not firing for empty LLM responses When the LLM returned zero text tokens (e.g. it was interrupted before producing tokens or about to push tokens), push_aggregation() returned an empty string and on_assistant_turn_stopped was never emitted. This left consumers waiting for an event that would never arrive. Now on_assistant_turn_stopped always fires, with an empty content string when the LLM produced no text tokens. Fixes #4292	2026-04-14 10:07:19 -07:00
Mark Backman	989fb4deaa	Fix context summarization failing with mid-conversation system messages Only treat messages[0] as the initial system prompt when determining the summarization range. Previously, the code scanned the entire context for the first system-role message, which caused failures when the only system message was a mid-conversation injection (e.g. "The user has been quiet"). In that case summary_start exceeded summary_end, producing an empty range and "No messages to summarize" errors. Fixes #4286	2026-04-14 11:48:50 -04:00
Mark Backman	d1f7af0330	Merge pull request #4283 from pipecat-ai/mb/user-stop-transcript-improvements	2026-04-13 19:27:05 -04:00
Mark Backman	804e3ea9ec	Trigger turn stop immediately when transcript arrives after p99 timeout When the STT p99 timeout fires without a transcript, the turn stop strategy previously did nothing — falling through to the 5-second user_turn_stop_timeout. Now, a _timeout_expired flag tracks when the timeout has elapsed so that a late transcript triggers the turn stop immediately instead of waiting for the fallback.	2026-04-13 18:11:32 -04:00
Aleix Conchillo Flaqué	7dc763d512	Merge pull request #4272 from pipecat-ai/pk/llm-context-get-messages-elide-large-values Add truncate_large_values to LLMContext.get_messages()	2026-04-13 15:04:41 -07:00
Paul Kompfner	1a02b5d61a	Rename elide_large_values to truncate_large_values	2026-04-11 14:29:05 -04:00
Aleix Conchillo Flaqué	f91a113de7	tests: yield in wake phrase strategy setup to let tasks start The strategy schedules background tasks during setup. Fast-running tests could observe state before those tasks had a chance to run; yielding once via asyncio.sleep(0) ensures they do.	2026-04-10 17:37:50 -07:00
Aleix Conchillo Flaqué	e553bb010f	tests: migrate LLM tests to Settings-based constructor API Replace the old `model=` / `params=InputParams(...)` style with the new `settings=<Service>.Settings(...)` form across LLM service tests.	2026-04-10 17:37:49 -07:00
Paul Kompfner	812cdc6822	Add elide_large_values to LLMContext.get_messages() Enable callers to get a compact version of context messages suitable for serialization, logging, and debugging tools. For standard messages, known binary data (base64 images, audio) is fully elided. For LLM-specific messages, long string values are recursively truncated. Adapter get_messages_for_logging() methods now use this.	2026-04-10 16:35:36 -04:00
Aleix Conchillo Flaqué	dcd21e7ff4	Rework audio idle detection with timestamp-based adaptive sleep Replaces the per-frame asyncio.Event signaling with a monotonic timestamp updated on each audio frame. The handler sleeps until the next deadline (last_audio_time + timeout), recomputing on each wake-up to account for audio arriving during sleep. This avoids waking the handler on every audio frame (~50/s at 20ms chunks), and guarantees detection latency is bounded by timeout rather than 2 * timeout. Also renames audio_starvation_timeout to audio_idle_timeout and associated identifiers for consistency with existing pipecat naming (user_idle_timeout, etc.).	2026-04-10 10:35:18 -07:00
Om Chauhan	cb2c1868b0	fix VAD stuck in SPEAKING state when audio stops mid-speech	2026-04-10 09:54:48 -07:00
kompfner	d07eebff20	Merge pull request #4248 from omChauhanDev/add-openai-custom-tools-support Add custom_tools support for OpenAI adapters	2026-04-10 10:27:28 -04:00
Paul Kompfner	fc3307bc63	Use OpenAI SDK types for tool params in adapters and tests These are TypedDicts (plain dicts at runtime), so no behavioral change — just more descriptive type hints for readers. Use ToolParam instead of FunctionToolParam for the Responses adapter to reflect that custom non-function tools are supported. Use ChatCompletionToolParam instead of Any for the completions adapter return type. Update tests to use typed params in expected values.	2026-04-10 10:15:39 -04:00
Aleix Conchillo Flaqué	43ddbdf1ec	Merge pull request #3797 from iamjr15/fix/idle-processor-event-race Fix asyncio.Event race conditions in idle processors	2026-04-09 16:04:03 -07:00
iamjr15	565349d332	Fix asyncio.Event race conditions in idle processors Move event.clear() from finally block to success path in IdleFrameProcessor and UserIdleProcessor._idle_task_handler(). The finally block unconditionally cleared signals set during async timeout callbacks, causing false-positive idle detection. Closes #3402	2026-04-09 13:41:01 -07:00
Cale Shapera	ec574edd53	Add Inworld Realtime Service (#4140) * Add Inworld Realtime LLM service Adds a WebSocket-based realtime service for Inworld's cascade STT/LLM/TTS API with semantic VAD, function calling, and streaming transcription support. New files: - src/pipecat/services/inworld/realtime/ (service, events) - src/pipecat/adapters/services/inworld_realtime_adapter.py - examples/foundational/19zb-inworld-realtime.py Also includes: - websockets dependency for inworld extra in pyproject.toml - Adapter and settings tests matching OpenAI/Grok realtime patterns - Fix for double-response when server-side VAD is enabled * Prefer init-provided system instruction in Inworld Realtime Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup branch. * Update changelog entry with PR number * Fix changelog format to use bullet point * Polish PR: default model, example cleanup, changelog update - Change default model from gpt-4.1-nano to gpt-4.1-mini - Add function calling demo to example - Remove demo-testing artifact from system instruction - Mention Router support in changelog * Address PR review feedback for Inworld Realtime - Move example to examples/realtime/realtime-inworld.py - Change initial context role from "user" to "developer" - Remove explicit sample rates from example; sync them in _ensure_audio_config so Inworld gets the transport's actual rates - Add audio race condition guard in _handle_evt_audio_delta (matches OpenAI realtime pattern) - Convert remaining "system"/"developer" messages to "user" in adapter - Add clarifying comment for local-VAD vs server-VAD metrics paths * Simplify example, add provider tracking, remove local VAD path - Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning - Add pipecat-realtime session key prefix and provider_data metadata for Inworld traffic attribution - Remove local VAD code path (Inworld only supports server-side VAD) - Use typed InputAudioBufferAppendEvent for audio sends * Default TTS model to inworld-tts-1.5-max * Remove dead shimmed tools code, set STT/VAD defaults - Remove non-functional AdapterType.SHIM custom tools code from adapter - Default STT model to assemblyai/u3-rt-pro - Default VAD eagerness to low	2026-04-09 13:04:17 -04:00
Om Chauhan	1443dfb070	added changelog	2026-04-08 08:48:26 +05:30
Om Chauhan	4bef85e363	added custom_tools support for OpenAI adapters	2026-04-08 08:40:03 +05:30
Filipi da Silva Fuchter	27a8a973b1	Merge pull request #4201 from pipecat-ai/mb/handle-recurring-disconnects Fix WebsocketService infinite reconnection loop	2026-04-07 11:02:24 -03:00
Filipi da Silva Fuchter	6eccd16543	Merge pull request #4217 from pipecat-ai/filipi/async_tools Supporting async function calls.	2026-04-07 09:35:03 -03:00
Paul Kompfner	70469e3c0c	Assert no LLMContextFrame when run_llm is not set in message frame tests	2026-04-03 11:34:58 -04:00
Paul Kompfner	6111df947e	Test LLMAssistantAggregator handling of upstream message frames Add tests for LLMRunFrame, LLMMessagesAppendFrame, LLMMessagesUpdateFrame, and LLMMessagesTransformFrame sent upstream to LLMAssistantAggregator, mirroring the existing LLMUserAggregator downstream tests. Add frames_to_send_direction param to run_test helper to support this.	2026-04-03 11:34:58 -04:00
Paul Kompfner	4eebfd65d9	Add a `LLMMessagesTransformFrame` to facilitate programmatically editing context in a frame-based way. The previous approach required the caller to directly grab a reference to the context object, grab a "snapshot" of its messages at that point in time, transform the messages, and then push an `LLMMessagesUpdateFrame` with the transformed messages. This approach can lead to problems: what if there had already been a change to the context queued in the pipeline? The transformed messages would simply overwrite it without consideration.	2026-04-03 11:34:50 -04:00
Mark Backman	fbb49ffc8d	Merge pull request #4233 from pipecat-ai/mb/remove-unused-imports-2026-04-02 Remove unused imports across codebase	2026-04-03 07:26:13 -04:00
Mark Backman	8adb38f87c	Remove unused imports across codebase	2026-04-02 22:21:16 -04:00
Mark Backman	41e46ee69e	Remove deprecated vad_events and should_interrupt from DeepgramSTTService Deepgram's built-in VAD events were deprecated in 0.0.99 in favor of Silero VAD. This removes vad_events from settings and LiveOptions, the should_interrupt parameter, the vad_enabled property, _on_speech_started/_on_utterance_end handlers, and simplifies _on_message and process_frame accordingly.	2026-04-02 22:05:49 -04:00
Mark Backman	793ed8f9e3	Remove deprecated UserBotLatencyLogObserver and UserIdleProcessor UserBotLatencyLogObserver (deprecated 0.0.102) is replaced by UserBotLatencyObserver. UserIdleProcessor (deprecated 0.0.100) is replaced by LLMUserAggregator with user_idle_timeout.	2026-04-02 21:54:36 -04:00
filipi87	929a0e33f4	Fixing the automated tests.	2026-04-02 16:58:28 -03:00
Aleix Conchillo Flaqué	976c644f90	Fix tests to expect SpeechControlParamsFrame from default turn strategy	2026-04-02 12:42:06 -07:00
Mark Backman	d503383c23	Remove deprecated interruption_strategies plumbing The interruption_strategies mechanism was deprecated in v0.0.99 in favor of LLMUserAggregator's user_turn_strategies. All evaluation logic was already removed — this removes the remaining field definitions, property, StartFrame propagation, conditional check in base_input.py, strategy files, and test.	2026-04-02 11:19:17 -04:00
Mark Backman	2a118084bd	Remove deprecated transcript_processor module	2026-04-02 10:57:05 -04:00
Mark Backman	87e8ed109a	Remove deprecated STTMuteFilter, STTMuteConfig, and STTMuteStrategy	2026-04-02 10:52:41 -04:00
Mark Backman	41e3afbc2f	Remove deprecated add_pattern_pair method from PatternPairAggregator	2026-04-02 10:28:01 -04:00
kompfner	a3c7f6c2af	Merge pull request #4215 from pipecat-ai/pk/remove-openaillmcontext Remove deprecated `OpenAILLMContext` as well as everything (code path…	2026-04-01 14:03:35 -04:00
Paul Kompfner	ebab75765d	Fix stream cancellation tests to mock get_chat_completions The tests were mocking the removed _stream_chat_completions_*_context methods. Update them to mock get_chat_completions instead.	2026-03-31 18:54:23 -04:00
Paul Kompfner	394599d031	Remove deprecated `OpenAILLMContext` as well as everything (code paths or whole types) dependent on it (all of which were also deprecated)	2026-03-31 18:15:25 -04:00
mattie ruth backman	0f47076703	More RTVI version parsing improvements	2026-03-31 16:05:53 -04:00
mattie ruth backman	3e255f3d21	improve version format check	2026-03-31 16:05:53 -04:00
mattie ruth backman	565b9b961d	add tests for rtvi versioning	2026-03-31 16:05:53 -04:00
Mark Backman	7501effad5	Remove deprecated service module shims and old implementations Delete deprecated import shims that only re-export from new locations: - services/ai_services.py - services/gemini_multimodal_live/ - services/aws_nova_sonic/ - services/openai_realtime/ - services/deepgram/{stt,tts}_sagemaker.py - services/google/{llm_openai,llm_vertex,google}.py - services/google/gemini_live/llm_vertex.py - services/riva/ - services/nim/ Remove deprecated implementations replaced by newer services: - services/openai_realtime_beta/ (use openai.realtime) - services/google/openai/ (use google.llm) Also removes associated examples and tests for deleted services.	2026-03-31 15:34:14 -04:00
Paul Kompfner	712e42533d	Introduce WebsocketLLMService and refactor OpenAIResponsesLLMService to use it Add WebsocketLLMService as a base class for WebSocket-based LLM services, parallel to WebsocketTTSService/WebsocketSTTService but codifying a transactional request-response model rather than a continuous background receive loop. WebsocketLLMService provides: - Connection lifecycle (start/stop/cancel → connect/disconnect) - _ws_send/_ws_recv with transparent ConnectionClosed handling (auto-reconnect via exponential backoff → WebsocketReconnectedError) - _ensure_connected with retry via _try_reconnect OpenAIResponsesLLMService now inherits from WebsocketLLMService, removing duplicated connection management code (_connect, _disconnect, _reconnect, _ensure_connected, _ws_send, start, stop, cancel) and simplifying _process_context from a loop with attempt tracking to a flat try/except with a single retry.	2026-03-30 22:26:31 -04:00
Mark Backman	f6a3678f93	Improve tests	2026-03-30 12:46:30 -04:00

472 Commits
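
Note on dcd21e7ff4 (timestamp-based adaptive sleep): the commit message describes replacing per-frame event signaling with a monotonic timestamp and a handler that sleeps until last_audio_time + timeout, recomputing on each wake-up. The following is only a rough sketch of that idea under stated assumptions, not the code from the commit. The names AudioIdleWatcher, on_audio_frame, wait_until_idle and seconds_until_deadline are hypothetical and chosen for illustration.

```python
import asyncio
import time


def seconds_until_deadline(last_audio_time: float, timeout: float, now: float) -> float:
    """Time left before the idle deadline (last_audio_time + timeout), floored at 0."""
    return max(0.0, last_audio_time + timeout - now)


class AudioIdleWatcher:
    """Hypothetical illustration of timestamp-based idle detection."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_audio_time = time.monotonic()

    def on_audio_frame(self) -> None:
        # Only a timestamp is updated; the sleeping handler is not woken.
        self.last_audio_time = time.monotonic()

    async def wait_until_idle(self) -> int:
        """Return once no audio has arrived for `timeout` seconds.

        The return value is the number of wake-ups, for illustration only.
        """
        wakeups = 0
        while True:
            remaining = seconds_until_deadline(
                self.last_audio_time, self.timeout, time.monotonic()
            )
            if remaining <= 0:
                return wakeups
            # Sleep until the current deadline, then recompute, since audio
            # may have arrived (and moved the deadline) during the sleep.
            await asyncio.sleep(remaining)
            wakeups += 1
```

In this sketch the handler wakes roughly once per timeout period while audio is flowing, rather than once per frame, and idle is reported as soon as the most recent deadline has passed.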