pipecat/examples/multi-worker/ui-worker/pointing/README.md
Mark Backman 81b956d963 Add pointing UIWorker example
The voice LLM delegates to a ReplyToolMixin UIWorker that scrolls offscreen
items into view and highlights the phones it names — exercising the scroll_to /
highlight UI commands and the [offscreen] state tag.
2026-05-21 23:20:40 -04:00


# pointing
The UIWorker finds items on the page and points at them. The page is a
grid of phone listings tall enough that several rows sit below the
fold. The user asks for a phone by name, and the worker scrolls it into
view and flashes it.
## What it shows
- The `scroll_to` and `highlight` UI commands round-tripping
end-to-end: the `UIWorker` emits them, the native bridge in
`PipelineWorker` translates them to RTVI frames, and the client
handler resolves the snapshot ref and acts on the live DOM.
- `ReplyToolMixin`'s visual fields — `reply(answer, scroll_to=...,
highlight=[...])`. One tool call per turn; `answer` is required so
the model can't forget the spoken reply.
- The `[offscreen]` state tag the client emits, and the LLM reading it
to decide whether a scroll is needed before highlighting.
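
The reply contract and the `[offscreen]` check described in this list can be sketched in plain Python. This is an illustration only and does not use Pipecat's classes: the `Reply` dataclass, the `plan_reply` and `to_commands` helpers, the command dictionaries, and the snapshot line format are all assumptions made for the example, not the library's actual API or wire format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reply:
    """Hypothetical stand-in for the arguments of one `reply` tool call."""

    answer: str  # required, so a reply without spoken text cannot be built
    scroll_to: Optional[str] = None  # snapshot ref to scroll into view
    highlight: list = field(default_factory=list)  # snapshot refs to flash


def is_offscreen(snapshot_line: str) -> bool:
    """Assumed format: the client appends `[offscreen]` to hidden items."""
    return "[offscreen]" in snapshot_line


def plan_reply(answer: str, ref: str, snapshot: dict) -> Reply:
    """Scroll only when the target is tagged offscreen; always highlight it."""
    needs_scroll = is_offscreen(snapshot[ref])
    return Reply(
        answer=answer,
        scroll_to=ref if needs_scroll else None,
        highlight=[ref],
    )


def to_commands(reply: Reply) -> list:
    """Flatten a reply into ordered UI commands: scroll first, then highlights."""
    commands = []
    if reply.scroll_to is not None:
        commands.append({"command": "scroll_to", "ref": reply.scroll_to})
    for ref in reply.highlight:
        commands.append({"command": "highlight", "ref": ref})
    return commands


# Made-up snapshot: ref -> rendered line (refs and layout are illustrative).
snapshot = {
    "e12": 'card "Nothing Phone"',
    "e31": 'card "iPhone 17" [offscreen]',
}

visible = plan_reply("The Nothing phone is near the top.", "e12", snapshot)
hidden = plan_reply("The iPhone 17 is further down.", "e31", snapshot)
```

Here `to_commands(visible)` yields a single highlight command, while `to_commands(hidden)` yields a scroll followed by a highlight. Because `answer` has no default, constructing a `Reply` without it raises a `TypeError`, which mirrors the idea that the spoken reply cannot be omitted.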
## What it adds vs. `hello-snapshot`
`hello-snapshot` proved the worker can *read* the page. This one proves
it can *act* on the page. Same skeleton (voice LLM in the main pipeline
delegating to a `UIWorker` via a `respond` job); the new parts are the
`scroll_to` / `highlight` commands and the client handlers for them.
## Run
Two terminals.
**Terminal 1 — bot:**
```bash
cd examples/multi-worker/ui-worker/pointing
uv run python bot.py
```
The bot starts on `http://localhost:7860`.
**Terminal 2 — client:**
```bash
cd examples/multi-worker/ui-worker/pointing/client
npm install # one-time
npm run dev
```
Open `http://localhost:5173` and click **Connect**.
## What to try
The page renders 20 phone cards in a responsive grid; the bottom rows
usually land below the fold. Try:
- _"Where's the iPhone 17?"_ — the worker scrolls the card into view and
flashes it.
- _"Scroll to the Pixel 9 Pro."_ — same flow, different ref.
- _"Which one is the Nothing phone?"_ — if it's already visible, the
worker just highlights without scrolling.
- _"Which phones are from Google?"_ — a descriptive question; the worker
highlights each phone it names.
- _"What's the cheapest one?"_ — the worker names and highlights it.

Watch the bot logs: each turn shows the main LLM calling
`answer_about_screen`, then the UIWorker's LLM emitting a single `reply`
that carries the scroll/highlight fields and the spoken answer.
## Requirements
- `OPENAI_API_KEY`
- `DEEPGRAM_API_KEY`
- `CARTESIA_API_KEY`

A `.env` in the example folder is the easiest way to set these (see
`examples/multi-worker/env.example`).
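
As a quick pre-flight check, the three keys listed above can be verified before starting the bot. The snippet below is a minimal standalone sketch, not part of the example: `missing_keys` is a hypothetical helper, and it inspects only the process environment, so values kept in a `.env` file need to be loaded or exported first.

```python
import os

# The three keys listed under Requirements.
REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPGRAM_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required keys are set.")
```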
## What this example _doesn't_ show
This example does not cover form filling (see `form-fill/`),
selection-based deixis (see `deixis/`), async task cards (see
`async-tasks/`), or custom command handlers beyond the standard
`scroll_to` / `highlight`.