Commit Graph

8403 Commits

Author SHA1 Message Date
Paul Kompfner
ea89819ece chore: update previous_response_id comment 2026-03-20 21:34:22 -04:00
Paul Kompfner
c66a5a8ede feat: set store=False and add run_inference tests
Set store=False in Responses API calls since we send full conversation
history as input items and don't use previous_response_id.

Add 5 run_inference tests for OpenAIResponsesLLMService using real
LLMContext and adapter (only HTTP client mocked).
2026-03-20 21:34:22 -04:00
Paul Kompfner
cd2886a4a8 chore: add note about previous_response_id and empty input handling 2026-03-20 21:34:22 -04:00
Paul Kompfner
312837a1d4 test: add run_inference tests for OpenAIResponsesLLMService
Uses real LLMContext and adapter (only HTTP client is mocked) to test
basic inference, client exception propagation, system_instruction
override, empty context fallback, and max_tokens override.
2026-03-20 21:34:22 -04:00
Paul Kompfner
4d4e56cfef test: add run_inference tests for OpenAIResponsesLLMService
Tests cover basic inference, client exception propagation,
system_instruction override, and max_tokens override.
2026-03-20 21:34:22 -04:00
Paul Kompfner
05e1d9f514 docs: add changelog for OpenAI Responses API service 2026-03-20 21:34:22 -04:00
Paul Kompfner
4d548117fa feat: add OpenAI Responses API LLM service
Add OpenAIResponsesLLMService using the Responses API, with a dedicated
adapter that converts LLMContext messages to Responses API input items
(system→developer, tool_calls→function_call, tool→function_call_output,
multimodal content conversion, and tools schema flattening).

- New adapter: open_ai_responses_adapter.py
- New service: openai/responses/llm.py
- Examples: 07-interruptible and 14-function-calling variants
- 19 unit tests for adapter conversion logic
- Eval entries for both examples
2026-03-20 21:34:22 -04:00
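The message mapping listed in commit 4d548117fa (system→developer, tool_calls→function_call, tool→function_call_output, tools schema flattening) could look roughly like the sketch below. It is not the adapter from the commit: the function names `to_responses_input` and `flatten_tool` are hypothetical, the dict shapes assume the commonly documented Chat Completions and Responses formats, and multimodal content conversion is left out.

```python
def to_responses_input(messages):
    """Convert Chat Completions-style messages into Responses-style input items.

    Hypothetical helper for illustration only; not pipecat's adapter.
    """
    items = []
    for msg in messages:
        role = msg["role"]
        if role == "system":
            # system -> developer
            items.append({"role": "developer", "content": msg["content"]})
        elif role == "tool":
            # tool result -> function_call_output, linked by call_id
            items.append(
                {
                    "type": "function_call_output",
                    "call_id": msg["tool_call_id"],
                    "output": msg["content"],
                }
            )
        else:
            # Plain user/assistant text passes through unchanged.
            if msg.get("content"):
                items.append({"role": role, "content": msg["content"]})
            # Each assistant tool call becomes its own function_call item.
            for call in msg.get("tool_calls", []):
                items.append(
                    {
                        "type": "function_call",
                        "call_id": call["id"],
                        "name": call["function"]["name"],
                        "arguments": call["function"]["arguments"],
                    }
                )
    return items


def flatten_tool(tool):
    """Lift the nested "function" object to the top level of the tool schema."""
    return {"type": "function", **tool["function"]}
```

Emitting one item per tool call keeps each call paired with its later output through `call_id`.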
Paul Kompfner
b5c2d41ba3 Remove changelog fragment that no longer applies after a rebase 2026-03-20 21:34:22 -04:00
Paul Kompfner
dba2fc5451 Clarify SyncParallelPipeline docstrings
Rewrite docstrings to more clearly explain what SyncParallelPipeline
does: hold all output until every parallel branch finishes, so frames
produced in response to a single input are released together.
2026-03-20 21:34:22 -04:00
Paul Kompfner
0a4acfa294 Add frame_order parameter to SyncParallelPipeline
Adds a FrameOrder enum with ARRIVAL (default, existing behavior) and
PIPELINE (pushes frames in pipeline definition order). This lets callers
guarantee output ordering between parallel pipelines — e.g. ensuring
image frames precede audio frames — without needing a separate reordering
processor downstream.

Updates the 05-sync-speech-and-image example to use FrameOrder.PIPELINE,
removing the ImageBeforeAudioReorderer class entirely.
2026-03-20 21:34:22 -04:00
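The difference between the two orderings in commit 0a4acfa294 can be illustrated with a small standalone sketch. Only the names `FrameOrder`, `ARRIVAL`, and `PIPELINE` come from the commit message; the enum values, the `order_frames` helper, and the `(pipeline_index, frame)` tuples are assumptions made for the example and do not reflect the actual SyncParallelPipeline internals.

```python
from enum import Enum


class FrameOrder(Enum):
    # Member values are assumed; the commit only names the members.
    ARRIVAL = "arrival"
    PIPELINE = "pipeline"


def order_frames(arrivals, frame_order):
    """Order buffered frames before releasing them together.

    arrivals: list of (pipeline_index, frame) tuples, in arrival order.
    """
    if frame_order is FrameOrder.PIPELINE:
        # sorted() is stable, so frames from the same pipeline keep
        # their relative arrival order.
        arrivals = sorted(arrivals, key=lambda item: item[0])
    return [frame for _, frame in arrivals]


# Pipeline 0 produces the image, pipeline 1 produces audio.
buffered = [(1, "audio-1"), (0, "image"), (1, "audio-2")]
by_arrival = order_frames(buffered, FrameOrder.ARRIVAL)
by_pipeline = order_frames(buffered, FrameOrder.PIPELINE)
```

With `PIPELINE`, the image from the first-defined pipeline is released before the audio even though it arrived later.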
Paul Kompfner
ffdf629535 Add changelog entry for Whisker debugger fix 2026-03-20 21:34:22 -04:00
Paul Kompfner
a6b94c7424 Add changelog entries for PR #4029 2026-03-20 21:34:22 -04:00
Paul Kompfner
d2341e0199 Add ImageBeforeAudioReorderer to sync-speech-and-image example
Add a processor after SyncParallelPipeline that ensures each image frame
precedes its corresponding TTS audio frames. SyncParallelPipeline batches
them together but doesn't guarantee branch ordering. The reorderer detects
when TTS frames arrive before their image (via context_id tracking) and
holds them until the image arrives.

Also rename ImageAudioSync to MarkImageForPlaybackSync for clarity.
2026-03-20 21:34:22 -04:00
Paul Kompfner
4b66dd444b Revert a couple of logs that were changed from trace to debug just for debugging 2026-03-20 21:34:22 -04:00
Paul Kompfner
7b859423ab Use TextAggregationMode.TOKEN in the 05-sync-speech-and-image
example since the SentenceAggregator already provides complete sentences.
2026-03-20 21:34:22 -04:00
Paul Kompfner
b68495ce0a Add sync_with_audio support for OutputImageRawFrame
Add a `sync_with_audio` field to `OutputImageRawFrame` that routes image
frames through the audio queue in the output transport, ensuring images
are only displayed after all preceding audio has been sent. This enables
proper audio/image synchronization in pipelines like the calendar month
narration example.

Update the 05-sync-speech-and-image example to use an `ImageAudioSync`
processor that sets this flag on image frames.
2026-03-20 21:34:22 -04:00
Paul Kompfner
f39472b150 Fix SyncParallelPipeline race condition with concurrent SystemFrame processing
The FrameProcessor two-queue architecture processes SystemFrames and
non-SystemFrames on separate concurrent async tasks. Both paths called
SyncParallelPipeline.process_frame(), which used the same per-pipeline
sink queues. A SystemFrame's wait_for_sync could steal frames from a
concurrent non-SystemFrame's wait_for_sync, corrupting synchronization
and stalling the pipeline.

This was triggered by the auto-embedded RTVI processor (added in
v0.0.101) which floods OutputTransportMessageUrgentFrame SystemFrames
through the pipeline during LLM responses.

Fix: SystemFrames (except EndFrame) now take a fast path — passed
through internal pipelines and pushed downstream directly without
touching the sink queues or drain logic. EndFrame retains the full
drain behavior as a lifecycle frame.
2026-03-20 21:34:21 -04:00
Paul Kompfner
a8ea176ea3 Minor comment typo fix 2026-03-20 21:34:21 -04:00
Paul Kompfner
12cb9599ad Fix bug resulting in SyncParallelPipeline breaking the Whisker debugger 2026-03-20 21:34:21 -04:00
filipi87
167f008e47 Mentioning the frame order fix in the changelog. 2026-03-20 21:34:21 -04:00
filipi87
fe8cb2f4e0 Always appending TTSTextFrame to the audio context. 2026-03-20 21:34:21 -04:00
filipi87
cdf44f7a3f Fixing the frame ordering of the AggregatedTextFrame. 2026-03-20 21:34:21 -04:00
filipi87
d32a8a9ee2 Fixing TTS frame order. 2026-03-20 21:34:21 -04:00
joachimchauvet
ed160fd2e0 fix(livekit): suppress InvalidState log spam from audio mixer during interruptions 2026-03-20 21:34:21 -04:00
aconchillo
84eddb64d5 Update changelog for version 0.0.106 2026-03-20 21:34:21 -04:00
Aleix Conchillo Flaqué
189249caec Add missing on_dtmf_event callback to Tavus transport
The on_dtmf_event callback was added to DailyCallbacks in #4047 but
the Tavus transport was not updated, causing a missing argument error.
2026-03-20 21:34:21 -04:00
Filipi da Silva Fuchter
3c90468e03 Fixed the ordering of _maybe_pause_frame_processing call in TTSService (#4071)
* Fixing the invocation of pause_frame_processing at the correct time when receiving LLMFullResponseEndFrame and EndFrame.
2026-03-20 21:34:21 -04:00
Mark Backman
98d3f697f1 Add WakePhraseUserTurnStartStrategy (#4064)
- Add WakePhraseUserTurnStartStrategy for gating interaction behind wake
  phrase detection, with timeout and single_activation modes
- Add default_user_turn_start_strategies() and
  default_user_turn_stop_strategies() helper functions
- Deprecate WakeCheckFilter in favor of the new strategy
- Extend ProcessFrameResult to stop strategies for short-circuit evaluation
- Fix MinWordsUserTurnStartStrategy including filtered text in output
2026-03-20 21:34:21 -04:00
Mark Backman
b9d996ff41 Improvements for Nova Sonic LLM and TTS output frames (#4042)
* Fix empty user transcription causing spurious interruption in Nova Sonic

Skip _report_user_transcription_ended() when _user_text_buffer is empty,
which happens when the initial prompt is text-only. Previously, an empty
TranscriptionFrame was pushed upstream, triggering a chain reaction:
on_user_turn_stopped → UserStartedSpeakingFrame → interruption →
premature BotStoppedSpeaking → multiple response start/stop cycles.

* Improve TextFrame and assistant end of turn logic

Now, SPECULATIVE text results are used to push the LLMTextFrame,
AggregatedTextFrame, and TTSTextFrame. Additionally, the TTSTextFrames
are pushed at the end of the corresponding audio segment.

* Remove BotStoppedSpeakingFrame fallback from Nova Sonic

Now that assistant response end is detected directly from Nova Sonic
contentEnd events (END_TURN and INTERRUPTED), the BotStoppedSpeakingFrame
handler is no longer needed. Inline the cleanup logic in reset_conversation.
2026-03-20 21:34:21 -04:00
Mark Backman
5de4256ab1 GradiumSTTService improvements (#4066)
* Remove duplicate reconnection logic from Gradium STT

The _receive_messages method had its own while-True reconnect loop,
duplicating the reconnection handling already provided by
WebsocketService._receive_task_handler (exponential backoff, max
retries, error reporting). Flatten to just the inner message loop
and let the base class handle reconnection.

* Align Gradium STT VAD handling with base class patterns

Replace the process_frame override with a _handle_vad_user_stopped_speaking
override, which is the proper hook provided by STTService. Move
start_processing_metrics() into run_stt (matching Gladia's pattern).
Remove unused FrameDirection and VADUserStartedSpeakingFrame imports.

* Add transcript aggregation delay after flushed to capture trailing tokens

Gradium's flushed response can arrive before all text tokens have been
delivered. Instead of finalizing immediately on flushed, start a short
timer (100ms) that allows trailing tokens to accumulate before pushing
the final TranscriptionFrame.

* Add changelog for PR #4066

* Change default encoding to pcm_16000

* Decouple encoding from sample_rate in Gradium STT

The encoding parameter now takes just the base type (pcm, wav, opus)
and the sample rate is derived from the pipeline audio_in_sample_rate,
assembled dynamically via input_format_from_encoding(). This fixes the
mismatch where SAMPLE_RATE=24000 was passed to the base class while
encoding defaulted to pcm_16000.
2026-03-20 21:34:21 -04:00
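The transcript aggregation delay described in commit 5de4256ab1 (wait briefly after flushed so trailing tokens are included) might be sketched with plain asyncio as follows. The class `TranscriptAggregator` and its methods are hypothetical and stand in for the service's real handlers; only the idea of a short timer (100ms in the commit) comes from the source.

```python
import asyncio


class TranscriptAggregator:
    """Hypothetical stand-in for the STT service's transcript handling."""

    def __init__(self, delay: float = 0.1):
        self._delay = delay  # the commit mentions a 100ms timer
        self._tokens = []
        self._timer = None
        self.finalized = []

    def on_text(self, token: str) -> None:
        self._tokens.append(token)

    def on_flushed(self) -> None:
        # Start the timer instead of finalizing immediately.
        if self._timer is None:
            self._timer = asyncio.create_task(self._finalize_after_delay())

    async def _finalize_after_delay(self) -> None:
        await asyncio.sleep(self._delay)
        # Tokens that arrived during the delay are part of the final text.
        self.finalized.append(" ".join(self._tokens))
        self._tokens = []
        self._timer = None


async def demo():
    agg = TranscriptAggregator(delay=0.05)
    agg.on_text("hello")
    agg.on_flushed()
    agg.on_text("world")  # trailing token delivered after flushed
    await asyncio.sleep(0.2)
    return agg.finalized


result = asyncio.run(demo())
```

Finalizing immediately on flushed would have produced only "hello" here; the delay lets "world" join the same transcript.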
Mark Backman
e2e0d9f8c4 fix: pass list-type Deepgram settings as lists instead of stringifying
List-valued settings like keyterm, keywords, search, redact, and replace
were being converted to strings before being passed to the SDK connect()
method. The SDK expects lists so its encode_query can produce repeated
query params (keyterm=a&keyterm=b).
2026-03-20 21:34:21 -04:00
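The effect described in commit e2e0d9f8c4 can be reproduced with the standard library. This sketch uses `urllib.parse.urlencode` rather than the Deepgram SDK's own encode_query, and the settings dict and model name are made up for illustration; it only shows why a stringified list cannot become repeated query params.

```python
from urllib.parse import urlencode

# Hypothetical settings; "keyterm" is one of the list-valued keys named in the commit.
settings = {"model": "example-model", "keyterm": ["a", "b"]}

# Stringifying first collapses the list into a single encoded value.
stringified = urlencode({key: str(value) for key, value in settings.items()})

# Keeping the list and using doseq=True emits one query param per item.
repeated = urlencode(settings, doseq=True)
```

Here `repeated` contains `keyterm=a&keyterm=b`, while `stringified` carries a single `keyterm` whose value is the percent-encoded text of the Python list.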
Mark Backman
4c10fab0c9 Add changelog for #4046 2026-03-20 21:34:21 -04:00
Mark Backman
b610ba0aa5 Fix OpenAI STT crash when language is a plain string instead of Language enum 2026-03-20 21:34:21 -04:00
Mark Backman
d7d6ad6e96 Fix SonioxSTTService crash when language_hints contains plain strings (#4045)
Refactor language_to_soniox_language to use resolve_language + LANGUAGE_MAP
pattern consistent with other services. Fix resolve_language fallback to use
str(language) instead of language.value so plain strings don't crash.
2026-03-20 21:34:21 -04:00
Mark Backman
7eedd5929d Add changelog for #4026 2026-03-20 21:34:21 -04:00
Mark Backman
490e460c4b Fix DeepgramSTTService base_url forcing HTTPS/WSS schemes
The base_url parameter previously forced wss:// and https:// schemes,
breaking air-gapped or private deployments that need ws:// or http://.
Extract URL derivation into _derive_deepgram_urls() helper that respects
the developer's scheme choice while deriving the paired WebSocket and
HTTP URLs the Deepgram SDK requires.

Closes #4019
2026-03-20 21:34:21 -04:00
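One way to derive a paired WebSocket and HTTP URL while respecting the caller's scheme, as commit 490e460c4b describes, is sketched below. This is not the `_derive_deepgram_urls()` helper from the commit: the name `derive_url_pair`, the return shape, and the choice to default a bare host to the secure schemes are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

_SECURE = {"wss", "https"}
_INSECURE = {"ws", "http"}


def derive_url_pair(base_url: str):
    """Return (websocket_url, http_url), keeping the secure/insecure choice."""
    if "://" not in base_url:
        # Assumption: a bare host defaults to the secure schemes.
        base_url = "wss://" + base_url
    parts = urlsplit(base_url)
    if parts.scheme in _SECURE:
        ws_scheme, http_scheme = "wss", "https"
    elif parts.scheme in _INSECURE:
        ws_scheme, http_scheme = "ws", "http"
    else:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Swap only the scheme; host, port, and path stay untouched.
    ws_url = urlunsplit(parts._replace(scheme=ws_scheme))
    http_url = urlunsplit(parts._replace(scheme=http_scheme))
    return ws_url, http_url


private_pair = derive_url_pair("ws://localhost:8080")
hosted_pair = derive_url_pair("https://api.example.com/v1")
```

An air-gapped deployment that passes `ws://` or `http://` therefore gets an unencrypted pair back instead of being forced onto `wss://` and `https://`.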
Mark Backman
e1ce74c7a5 Fix deprecation warning when using filter_incomplete_user_turns 2026-03-20 21:34:21 -04:00
Mark Backman
5faac08d36 docs: add changelog for #4058 2026-03-20 21:34:21 -04:00
Mark Backman
4171a75f79 fix: resolve raw language strings through Language enum for proper service conversion
Raw strings like "de-DE" passed as the language parameter to TTS/STT services
were bypassing the Language enum resolution logic, causing silent failures
(e.g. ElevenLabs expects "de" not "de-DE"). Now raw strings are first converted
to Language enums so they go through the same resolve_language() path, with a
warning logged for unrecognized strings.
2026-03-20 21:34:21 -04:00
Mark Backman
fa345a510f Add changelog for #4057 2026-03-20 21:34:21 -04:00
Mark Backman
55fb274d5a Fix stale state in user turn stop strategies between turns
Reset stop strategies at turn start (not just turn stop) so that late
transcriptions arriving between turns do not leave stale _text that
causes premature stops on the next turn. Also cancel pending timeout
tasks in reset() for both SpeechTimeout and TurnAnalyzer strategies.
2026-03-20 21:34:21 -04:00
Mark Backman
fffb16ad39 Update uv.lock with pyasn1 v0.6.3 2026-03-20 21:34:20 -04:00
Mark Backman
9a32364b34 feat: add enable_dialout parameter to configure() for dial-out rooms
Expose enable_dialout as a configure() parameter (default False) so
dial-out examples can opt in without needing to build DailyRoomProperties
manually.
2026-03-20 21:34:20 -04:00
Mark Backman
732afde3ea fix: clean up configure() type hints, deduplicate token expiry, and improve comment
Narrow misleading Optional type hints on parameters that never accept
None, extract the duplicated token_exp_duration * 60 * 60 calculation,
remove unnecessary forward-reference quotes on DailyMeetingTokenProperties,
and clarify why enable_dialout is explicitly set to False.
2026-03-20 21:34:20 -04:00
copilot-swe-agent[bot]
e5215a636f fix: set enable_dialout to False in PSTN runner to prevent room creation failures
Co-authored-by: jamsea <614910+jamsea@users.noreply.github.com>
2026-03-20 21:34:20 -04:00
copilot-swe-agent[bot]
c0bc94a9ce Initial plan 2026-03-20 21:34:20 -04:00
Julien Vantyghem
d26f512ba3 update docstring following https://github.com/pipecat-ai/pipecat/pull/3916 2026-03-20 21:34:20 -04:00
Blaine Kasten
fe84a881dd turn off server vad 2026-03-20 11:17:38 -05:00
Blaine Kasten
591c02fb0e a few updates 2026-03-19 13:37:21 -05:00
Blaine Kasten
077610184d Add together STT and TTS services 2026-03-17 07:24:02 -05:00