Replaces the prior "log a warning and skip" approach with actual handling
of async-tool messages on Ultravox.
The catch with Ultravox is that its API freezes the conversation between
client_tool_invocation and the matching client_tool_result. There's no
"keep talking while the tool runs" channel like NON_BLOCKING on Gemini
or a non-blocking function_call_output on OpenAI Realtime. So:
- When the model invokes an async-registered function
(cancel_on_interruption=False), the service immediately ships a placeholder
client_tool_result that tells the model "the actual result isn't
ready yet; a follow-up will arrive shortly; keep the conversation
going". This unfreezes the conversation. The placeholder is sent
from _handle_tool_invocation, since the started async-tool message
doesn't reach the context-frame path until later.
- When the real tool finishes, the final async-tool message lands in
the context. _handle_context now iterates forward over the context and
routes async-tool messages: started is a no-op (placeholder already
sent), intermediate is logged as an error and dropped (matching the
other realtime services), and final is injected as user-side text via
user_text_message with bracketed framing, the only mechanism Ultravox
offers for adding non-tool input mid-conversation.
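
For reviewers, the two paths above can be sketched roughly as follows. This is a simplified illustration, not the code in this change: AsyncToolMessage, FakeUltravoxSession, the payload field names, the placeholder wording, and the exact bracketed format are all assumptions made for the sketch. Only the message type names client_tool_result and user_text_message come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical placeholder wording; the real text may differ.
PLACEHOLDER_RESULT = (
    "The actual result isn't ready yet; a follow-up will arrive shortly. "
    "Keep the conversation going."
)


@dataclass
class AsyncToolMessage:
    """Hypothetical stand-in for an async-tool message found in the context."""

    phase: str  # "started", "intermediate", or "final"
    tool_name: str
    result: Optional[str] = None


@dataclass
class FakeUltravoxSession:
    """Records outgoing messages instead of sending them to Ultravox."""

    sent: List[dict] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    def handle_tool_invocation(self, invocation_id: str, tool_name: str, is_async: bool) -> None:
        # Async tools get an immediate placeholder result so the
        # conversation is unfrozen while the real tool keeps running.
        if is_async:
            self.sent.append(
                {
                    "type": "client_tool_result",
                    "invocationId": invocation_id,  # assumed field name
                    "result": PLACEHOLDER_RESULT,
                }
            )

    def handle_context(self, messages: List[AsyncToolMessage]) -> None:
        # Walk the messages in order and route each one by phase.
        for msg in messages:
            if msg.phase == "started":
                # Placeholder was already sent at invocation time.
                continue
            if msg.phase == "intermediate":
                # Not supported: record an error and drop the message.
                self.errors.append(f"dropping intermediate result for {msg.tool_name}")
                continue
            if msg.phase == "final":
                # Inject the real result as bracketed user-side text.
                self.sent.append(
                    {
                        "type": "user_text_message",
                        "text": f"[Async result for {msg.tool_name}: {msg.result}]",
                    }
                )
```

The sketch keeps the placeholder in the invocation handler and the final-result injection in the context handler, mirroring the split described in the two bullets.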
Hoists the registry-lookup helper to LLMService as
_function_is_async(name) so future services can use the same pattern
without re-implementing it.
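
A minimal sketch of what such a helper might look like, assuming the registry maps function names to entries that record the cancel_on_interruption flag. LLMServiceSketch, the registry layout, and the default value of the flag are illustrative assumptions, not the actual LLMService internals.

```python
from typing import Callable, Dict


class LLMServiceSketch:
    """Hypothetical stand-in for LLMService, reduced to the function registry."""

    def __init__(self) -> None:
        # name -> registration entry (assumed layout)
        self._functions: Dict[str, dict] = {}

    def register_function(
        self, name: str, handler: Callable, *, cancel_on_interruption: bool = True
    ) -> None:
        self._functions[name] = {
            "handler": handler,
            "cancel_on_interruption": cancel_on_interruption,
        }

    def _function_is_async(self, name: str) -> bool:
        # A function counts as async when it was registered with
        # cancel_on_interruption=False; unknown names are not async.
        entry = self._functions.get(name)
        return entry is not None and not entry["cancel_on_interruption"]
```

Treating unregistered names as not async is a choice made for this sketch; it keeps callers on the ordinary blocking path when the lookup fails.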
Adds an async-tool example file for Ultravox modeled on the existing
ones for the other realtime services.