Lays groundwork for cancel_on_interruption=False support on Gemini Live by restructuring _process_completed_function_calls to match the shape used by AWSNovaSonicLLMService and OpenAIRealtimeLLMService in #4441: a single-pass forward iteration over raw context messages that detects async-tool messages via async_tool_messages.parse_message and routes them by kind:

- started: skipped silently.
- intermediate: logged as an error and surfaced via push_error.
- final: delivered via the formal FunctionResponse channel.

This replaces the prior two-pass structure that went through the adapter for sync results. The service now uses a lightweight self._tool_call_id_to_name map (populated when the model issues tool calls) for the name lookup the adapter used to provide.

Extracts a new GeminiLLMAdapter.to_function_response_dict static method for the dict-coercion logic that wraps non-dict tool returns as {value: <result>} for Gemini's FunctionResponse.response field; the adapter's existing inline copy in _from_standard_message uses it too.

Example consolidation:

- Folds realtime-gemini-live-function-calling.py into the base realtime-gemini-live.py example so the base exercises function calling out of the box (matching realtime-openai.py and realtime-aws-nova-sonic.py).
- Renames realtime-gemini-live-vertex-function-calling.py to realtime-gemini-live-vertex.py, mirroring the consolidation.
- Adds realtime-gemini-live-async-tool.py.
- Updates scripts/evals/run-release-evals.py for the renames.

This commit alone doesn't make cancel_on_interruption=False fully work on Gemini Live; additional investigation is pending. This is foundational work to be built on.
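
For illustration only, the single-pass routing and the dict coercion described above might look roughly like the sketch below. It is not the code from this commit: the message shape, the parse_async_tool_message helper (a hypothetical stand-in for async_tool_messages.parse_message, whose real signature is not shown here), the route_messages function, and the returned dict layout are all assumptions. Sync results are omitted, and push_error is modeled as a plain list of error strings.

```python
from typing import Any, Dict, List, Optional, Tuple


def to_function_response_dict(result: Any) -> Dict[str, Any]:
    """Pass dicts through unchanged; wrap anything else as {"value": result}."""
    if isinstance(result, dict):
        return result
    return {"value": result}


def parse_async_tool_message(
    message: Dict[str, Any],
) -> Optional[Tuple[str, str, Any]]:
    """Hypothetical stand-in for async_tool_messages.parse_message.

    Returns (kind, tool_call_id, result) for an async-tool message,
    or None for any other message. The "async_tool" key is an assumption.
    """
    marker = message.get("async_tool")
    if marker is None:
        return None
    return marker["kind"], marker["tool_call_id"], marker.get("result")


def route_messages(
    messages: List[Dict[str, Any]],
    tool_call_id_to_name: Dict[str, str],
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Single forward pass: route each async-tool message by its kind."""
    responses: List[Dict[str, Any]] = []
    errors: List[str] = []  # stands in for logging plus push_error
    for message in messages:
        parsed = parse_async_tool_message(message)
        if parsed is None:
            continue  # not an async-tool message
        kind, tool_call_id, result = parsed
        if kind == "started":
            continue  # skipped silently
        if kind == "intermediate":
            errors.append(f"unexpected intermediate result for {tool_call_id}")
            continue
        if kind == "final":
            responses.append(
                {
                    "id": tool_call_id,
                    # Name lookup via the map filled when the model issued the call.
                    "name": tool_call_id_to_name.get(tool_call_id),
                    "response": to_function_response_dict(result),
                }
            )
    return responses, errors
```

Keeping the name lookup in a plain dict keyed by tool call ID means the loop does not need a second pass through the adapter just to recover function names.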
Pipecat Evals
This directory contains a set of utilities to help test Pipecat, specifically its examples.
Release Evals
Before any Pipecat release, we make sure that all (or most) of the examples work flawlessly. We have 100+ examples, and checking each one manually was very time-consuming (and painful!), especially because we aim to release often.
To make this process easier, we designed these "release evals," which do the following:
- Start one of the foundational examples (the user bot)
- Start an eval bot
The user bot (i.e. the example) introduces itself, and the eval bot then asks a question. The user bot replies, and the eval bot verifies the response.
For example, the eval bot might ask:
"What's 2 plus 2?"
The user bot replies:
"2 plus 2 is 4."
The eval bot (powered by an LLM) evaluates the response and emits a result. It also explains why it thinks the answer is valid or invalid.
To run the release evals:
uv run run-release-evals.py -a -v
This runs all the evals and stores logs and audio (-a) for each test.
You can also specify which tests to run. For example, to run all 07 series tests:
uv run run-release-evals.py -p 07 -a -v
Script Evals
You can also run evals for a single example (not part of the release set):
uv run run-eval.py -p "A simple math addition" -a -v YOUR_EXAMPLE_SCRIPT
Your script needs to follow the pattern used by the foundational examples.