Reframe comments, docstrings, identifiers, changelog, and example
around a single explanation of the option: (1) turn strategies do not
consider user transcripts, letting the user turn end sooner, and (2)
the aggregator gathers user transcripts on its own after the turn
ends via a simple timer, then emits `on_user_turn_message_finalized`
with the new user context message.
The mechanism is generic, so internal aggregator vocabulary stays
generic ("transcript-gather", "after the user turn ends"); the
public-facing param docstring is the one place that explains the
"local turn detection drives a realtime service" use case. The stop
strategies' `wait_for_transcript` flag is described as something
that's "usually flipped indirectly" by the aggregator param rather
than something to pair with it.
Renames internal state to match: `_expect_delayed_transcripts` →
`_aggregator_gathers_transcripts`, `_pending_finalization_*` →
`_transcript_gather_*`, `_finalize_delayed_user_message` →
`_finalize_user_message`, etc.
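
For reviewers who want the timing in concrete form, here is a minimal,
self-contained sketch of the "gather after the user turn ends" flow
described above. It does not use Pipecat's real classes:
`ToyUserAggregator`, `gather_timeout`, `on_transcript`, and
`wait_finalized` are hypothetical names invented for illustration, and
the real aggregator may differ in how it schedules and cancels the
timer.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class ToyUserAggregator:
    """Toy stand-in for the aggregator; NOT Pipecat's actual class."""

    def __init__(
        self,
        gather_timeout: float,
        on_finalized: Callable[[str], Awaitable[None]],
    ) -> None:
        # Hypothetical timer length (seconds) for the transcript-gather window.
        self._gather_timeout = gather_timeout
        self._on_finalized = on_finalized
        self._transcripts: List[str] = []
        self._gather_task: Optional[asyncio.Task] = None

    async def on_transcript(self, text: str) -> None:
        # Transcripts can arrive after the turn has ended; just buffer them.
        self._transcripts.append(text)

    async def on_user_turn_stopped(self) -> None:
        # The turn ended without waiting on transcripts, so start the timer.
        self._gather_task = asyncio.create_task(self._gather_then_finalize())

    async def _gather_then_finalize(self) -> None:
        # Give late transcripts time to arrive, then emit one finalized message.
        await asyncio.sleep(self._gather_timeout)
        message = " ".join(self._transcripts)
        self._transcripts.clear()
        await self._on_finalized(message)

    async def wait_finalized(self) -> None:
        # Helper for the demo only: block until the gather timer has fired.
        if self._gather_task is not None:
            await self._gather_task


async def demo() -> List[str]:
    events: List[str] = []

    async def on_finalized(message: str) -> None:
        events.append(f"finalized: {message}")

    agg = ToyUserAggregator(gather_timeout=0.05, on_finalized=on_finalized)
    await agg.on_user_turn_stopped()   # turn ends first
    events.append("turn stopped")
    await agg.on_transcript("hello")   # transcripts trickle in afterwards
    await agg.on_transcript("world")
    await agg.wait_finalized()
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the sketch is the ordering: the turn-stopped signal is
recorded before any transcript exists, and the finalized message is
emitted only once the timer expires.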
- Added `wait_for_transcript_to_end_user_turn` on `LLMUserAggregatorParams` for pipelines where local turn detection drives a realtime service like Gemini Live. Set it to `False` to avoid unnecessary latency from transcript delay: the realtime service consumes user audio directly, so we don't need user transcripts in context before it can respond. The option makes it so that (1) turn strategies do not consider user transcripts, letting the user turn end sooner, and (2) user transcripts are then handled by the aggregator: a simple timer gives it time to gather those transcripts after the user turn ends, and once gathered, the aggregator emits a new `on_user_turn_message_finalized` event with the new user context message. The new event also fires in the default mode (coinciding with `on_user_turn_stopped`), so consumers that want the populated user transcript can subscribe to it uniformly. See `examples/realtime/realtime-gemini-live-local-vad.py` for the full pattern.