Decouple context management from turn frames and transcripts when a
realtime LLM service drives the conversation. Three problems with today's
behavior:
- Some realtime services (Gemini Live, AWS Nova Sonic, Ultravox) emit
no UserStarted/StoppedSpeakingFrame at all, so the aggregator — which
writes user messages on those frames — doesn't write to context
correctly without them.
- The workaround (local VAD on the aggregator) generates turn
boundaries that don't match the provider's server-side ground truth,
and the per-service "do I need it?" rule is hard to keep straight.
- When local turn detection is the intended driver, turn-end strategies
still wait for transcripts on the latency critical path.
Add a realtime_service_mode: RealtimeServiceModeConfig | None = None
kwarg on LLMContextAggregatorPair. When set, the pair switches both
halves to trailing context writes: user messages are flushed on the first
assistant content frame, assistant messages on the next user transcript,
both halves on EndFrame. Turn-end strategies stop waiting for transcripts
by default. Two fine-grained boolean fields (context_writes_await_turns,
turns_await_transcripts) let callers dial back to cascade-style behavior
selectively; their invalid combination is rejected in __post_init__.
The bifurcation is dispatch-only: seven branch points across the two
halves, each at method entry, each delegating to a mode-pure private
method. Cross-half coordination uses an asyncio.Lock and a back-reference
shared by both halves; the assistant signals user.flush() on
LLMFullResponseStartFrame, and the user signals assistant.flush() on the
first new transcript after the assistant turn. The mechanism reuses the
existing push_aggregation() — no parallel write path.
Two new events fire when messages are flushed to context:
on_user_message_added and on_assistant_message_added. In cascade mode
they coincide with the existing turn-stopped events; in realtime mode
(where the turn-stopped event fires before the message is finalized)
they're the canonical way to subscribe to "context just updated, here's
the text."
UserTurnStoppedMessage.content is now typed str | None to reflect that
realtime mode fires the event with None.
When a RealtimeServiceMetadataFrame arrives and realtime_service_mode is
None, the aggregator logs a one-time INFO recommendation pointing users
at the option.