Files

Paul Kompfner 272532a3ea Update examples, wherever possible, to use LLMContext and associated machinery instead of OpenAILLMContext and associated machinery.

With all these examples updated, we no longer need dedicated examples illustrating `LLMContext`, so they're removed.

Here’s where we *don’t* yet use `LLMContext` and associated machinery:
- Realtime services: OpenAI Realtime, Gemini Live, and AWS Nova Sonic (support coming soon)
- `GoogleLLMOpenAIBetaService` (it’s deprecated, so we didn’t bother adding support)
- `LLMLogObserver` (support coming soon)
- `GatedOpenAILLMContextAggregator` (support coming soon)
- `LangchainProcessor` (support coming soon)
- `Mem0MemoryService` (support coming soon)
- Examples that use LLM-specific tools definitions as opposed to `ToolsSchema` (these will be updated soon)
- Examples that rely on `GoogleLLMContext.upgrade_to_google` (TBD what to do with these)

Examples that use `LLMLogObserver`:
- 30-

Examples that use `GatedOpenAILLMContextAggregator`:
- 22-

Examples that use `LangchainProcessor`:
- 07b-

Examples that use `Mem0MemoryService`:
- 37-

Examples that need updating to use `ToolsSchema`:
- 15-
- 15a-
- 20a-
- 20c-
- 20d-
- 22b-
- 22c-
- 33-
- 36-

Examples that use `GoogleLLMContext.upgrade_to_google`:
- 22d-
- 25-

2025-09-22 16:21:35 -04:00

assets

scripts(evals): add vision support

2025-08-11 20:06:24 -07:00

eval.py

scripts(evals): allow user to talk and only eval when needed

2025-09-06 19:19:08 -07:00

README.md

scripts(evals): update to use new runner function

2025-08-05 11:46:28 -07:00

run-eval.py

evals: allow running a single eval

2025-05-30 16:55:55 -07:00

run-release-evals.py

Update examples, wherever possible, to use LLMContext and associated machinery instead of OpenAILLMContext and associated machinery.

2025-09-22 16:21:35 -04:00

utils.py

evals: move scripts/release to script/evals and add README

2025-05-30 15:04:05 -07:00

README.md

Pipecat Evals

This directory contains a set of utilities to help test Pipecat, specifically its examples.

Release Evals

Before any Pipecat release, we make sure that all (or most) of the examples work flawlessly. We have 100+ examples, and checking each one manually was very time-consuming (and painful!), especially because we aim to release often.

To make this process easier, we designed these "release evals," which do the following:

- Start one of the foundational examples (the user bot)
- Start an eval bot

The user bot (i.e. the example) introduces itself, and the eval bot then asks a question. The user bot replies, and the eval bot verifies the response.

For example, the eval bot might ask:

"What's 2 plus 2?"

The user bot replies:

"2 plus 2 is 4."

The eval bot (powered by an LLM) evaluates the response and emits a result. It also explains why it thinks the answer is valid or invalid.
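The question, answer, and verdict exchange described above can be sketched in a few lines. This is only an illustration: `EvalCase`, `keyword_judge`, and `run_case` are hypothetical names, not part of Pipecat, and the keyword check stands in for the LLM that the real eval bot uses to judge a response.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical types for illustration only; this is not Pipecat's eval API.
Judge = Callable[[str, str], Tuple[bool, str]]


@dataclass
class EvalCase:
    question: str  # what the eval bot asks the user bot
    criteria: str  # what a valid answer is expected to contain


def keyword_judge(criteria: str, answer: str) -> Tuple[bool, str]:
    """Stand-in for the LLM judge: pass if the expected text appears in the answer."""
    ok = criteria.lower() in answer.lower()
    reason = f"answer {'contains' if ok else 'does not contain'} {criteria!r}"
    return ok, reason


def run_case(case: EvalCase, user_bot: Callable[[str], str], judge: Judge) -> Tuple[bool, str]:
    """Ask the question, collect the user bot's reply, and return (passed, reason)."""
    answer = user_bot(case.question)
    return judge(case.criteria, answer)


case = EvalCase(question="What's 2 plus 2?", criteria="4")
passed, reason = run_case(case, lambda q: "2 plus 2 is 4.", keyword_judge)
print(passed, reason)
```

Returning a reason alongside the pass/fail flag mirrors the behaviour described above, where the eval bot explains why it considers an answer valid or invalid.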

To run the release evals:

uv run run-release-evals.py -a -v

This runs all the evals and stores logs and audio (-a) for each test.

You can also specify which tests to run. For example, to run only the tests in the 07 series:

uv run run-release-evals.py -p 07 -a -v
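As a rough sketch of what selecting a series could look like, the snippet below filters example file names by prefix. The matching rule is an assumption (the actual logic in `run-release-evals.py` may use a different scheme), and both `select_examples` and the file names are hypothetical.

```python
from typing import Iterable, List


def select_examples(names: Iterable[str], pattern: str) -> List[str]:
    """Keep examples whose file name starts with the pattern (assumed matching rule)."""
    return sorted(n for n in names if n.startswith(pattern))


# Hypothetical file names, used only to illustrate the filter.
examples = ["07-demo.py", "07b-demo.py", "22-demo.py", "30-demo.py"]
print(select_examples(examples, "07"))
```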

Script Evals

You can also run evals for a single example (not part of the release set):

uv run run-eval.py -p "A simple math addition" -a -v YOUR_EXAMPLE_SCRIPT

Your script needs to follow the pattern used by the foundational examples.
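If you want to evaluate several scripts in a row, one option is to build the same command line programmatically. The helper below is a minimal sketch: `build_eval_command` and `my_bot.py` are hypothetical, and it only reuses the `-p`, `-a`, and `-v` flags shown in the command above without assuming anything further about what they do.

```python
import shlex
from typing import List


def build_eval_command(script: str, prompt: str, audio: bool = True, v_flag: bool = True) -> List[str]:
    """Assemble the run-eval.py invocation as an argument list (hypothetical helper)."""
    cmd = ["uv", "run", "run-eval.py", "-p", prompt]
    if audio:
        cmd.append("-a")  # same -a flag as in the commands above
    if v_flag:
        cmd.append("-v")  # same -v flag as in the commands above
    cmd.append(script)
    return cmd


# "my_bot.py" is a placeholder for your own example script.
cmd = build_eval_command("my_bot.py", "A simple math addition")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the evals directory to launch the eval.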