Add document-review UIWorker example

Synthesis example: a ReplyToolMixin UIWorker adds a start_review tool that fans
out to clarity/tone peers via start_user_job_group, translates each reviewer
response into an add_note command in on_job_response, handles a client
note_click event via @on_ui_event, and keeps history across turns.
Mark Backman
2026-05-21 17:21:08 -04:00
parent 07725429b2
commit 950fc10f05
9 changed files with 2790 additions and 0 deletions


@@ -0,0 +1,119 @@
# document-review

The synthesis demo: a voice-driven workspace where the user reviews a
draft article. It combines the patterns from every prior demo into one
application: snapshot reading, deixis (read + write), form-fill
state-changing actions, and async job-group fan-out with progress
streaming, plus one custom command and one client-emitted event.

## What it shows
- **Read-side deixis**: select a paragraph, ask "review this", and the
worker grounds in the selected text.
- **Async fan-out**: a paragraph review spawns two peer workers (clarity
+ tone) in parallel via `start_user_job_group`. The in-flight card
streams each worker's progress.
- **Custom UI command**: as each worker completes, `on_job_response`
emits an `add_note` command with the worker's feedback; the client
renders a note attached to the reviewed paragraph.
- **State-changing actions**: dictating a note fills the textarea and
clicks Save (`fills` + `click` from the bundled `reply` tool).
- **Write-side deixis**: "where does it talk about rhythms?" → the worker
finds the paragraph and uses `select_text` to put the page selection
on it.
- **Client-emitted UI event**: clicking a note sends a `note_click` event
back; the worker's `@on_ui_event("note_click")` handler dispatches
`select_text` to jump to the paragraph. The round-trip event/command
pattern.
- **Two LLM tools coexisting**: `ReplyToolMixin`'s `reply` handles normal
turns; a custom `start_review` tool handles review kick-off. The prompt
steers the model to pick one (single tool call per turn).
- **`on_job_response` interception**: the worker overrides this hook to
translate reviewer responses into `add_note` commands — the peers don't
know they're driving a UI; the worker mediates.
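
The mediation step in the last bullet can be sketched without Pipecat. This is a minimal illustration, not the example's code: `FakeResponse` and `to_add_note` are hypothetical names, and the plain status strings are assumed stand-ins for the real job status values. It shows a job-id lookup table turning a completed reviewer response into an `add_note` payload while dropping cancelled or empty responses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a reviewer's job response message."""

    job_id: str
    source: str
    status: str  # assumed values: "completed", "cancelled", ...
    response: Optional[dict]


def to_add_note(reviews: dict, msg: FakeResponse) -> Optional[dict]:
    """Return an add_note payload, or None when there is nothing to show."""
    review = reviews.get(msg.job_id)
    if review is None or msg.status != "completed":
        return None
    feedback = ((msg.response or {}).get("feedback") or "").strip()
    if not feedback:
        return None
    return {"source": msg.source, "ref": review["paragraph_ref"], "text": feedback}


# job_id -> paragraph under review, recorded when the review starts.
reviews = {"job-1": {"paragraph_ref": "e8"}}

note = to_add_note(
    reviews, FakeResponse("job-1", "clarity", "completed", {"feedback": " Too dense. "})
)
print(note)
```

The reviewers only ever return `{"feedback": ...}`; the paragraph ref comes from the table the UI worker keeps, which is why the peers need no knowledge of the page.
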

## What's new vs. the prior demos

| Prior demo | Pattern |
|---|---|
| hello-snapshot | snapshot streaming, voice/UI delegation |
| pointing | scroll + multi-highlight |
| deixis | bidirectional text selection |
| form-fill | fills + click |
| async-tasks | job-group fan-out + cancel |

This one stitches all five together, plus the two patterns no prior demo
touched: a **custom UI command** (`add_note`) and a **custom
client-emitted event** (`note_click`).

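
The round trip between a client-emitted event and the commands sent back can be modelled in a few lines. This is a sketch under stated assumptions, not Pipecat's `@on_ui_event` implementation: the `ui_event` decorator, the `HANDLERS` registry, `dispatch`, and the `(command, payload)` tuples are all hypothetical names chosen for illustration.

```python
import asyncio

# Hypothetical registry: event name -> coroutine handler.
HANDLERS = {}


def ui_event(name):
    """Register a handler for a client-emitted event (illustrative decorator)."""

    def register(fn):
        HANDLERS[name] = fn
        return fn

    return register


@ui_event("note_click")
async def on_note_click(payload):
    """Answer a note click with the commands that jump to its paragraph."""
    ref = (payload or {}).get("ref")
    if not isinstance(ref, str) or not ref:
        return []  # ignore malformed events
    return [("scroll_to", {"ref": ref}), ("select_text", {"ref": ref})]


async def dispatch(name, payload):
    """Route an incoming event to its handler; unknown events yield no commands."""
    handler = HANDLERS.get(name)
    return await handler(payload) if handler else []


commands = asyncio.run(dispatch("note_click", {"ref": "e14"}))
print(commands)
```

The client never decides what a note click means; it only reports the click, and the server side chooses which commands to send back.
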
## Run

Two terminals.

**Terminal 1 — bot:**

```bash
cd examples/multi-worker/ui-worker/document-review
uv run python bot.py
```

The bot starts on `http://localhost:7860`.

**Terminal 2 — client:**

```bash
cd examples/multi-worker/ui-worker/document-review/client
npm install   # one-time
npm run dev
```

Open `http://localhost:5173` and click **Connect**.

## What to try

The article is a 6-paragraph draft seeded with one too-dense paragraph,
one too-vague one, and one with absolutist tone problems.

**Review flow (the centerpiece):**

- Select the run-on paragraph, say _"review this."_ — the worker
  acknowledges, the in-flight card appears, both reviewers tick through
  progress, and two notes attach to the paragraph (clarity flags the
  density).
- Select the absolutist paragraph, say _"give me feedback."_ — tone
  flags the strong words.

**Notes flow:**

- _"Add a note that this paragraph is too jargony."_ (with a paragraph
  selected) — the worker fills the textarea and clicks Save.
- Click any note in the panel — the page scrolls and selects the
  paragraph it was attached to.

**Navigation:**

- _"Where does it talk about structured rhythms?"_ — the worker jumps to
  the paragraph by selecting it.

**Cancellation:**

- During a review, click Cancel on the in-flight card. The reviewers'
  responses come back as `cancelled`; feedback that already arrived stays
  as a note.

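
The cancellation behaviour can be modelled with plain `asyncio` tasks. This is only a rough analogy, not how Pipecat job groups are implemented: `reviewer`, `run_group`, the delays, and the status strings are assumptions made for the sketch. One simulated reviewer finishes before the cancel and keeps its note; the other is still running and ends up cancelled.

```python
import asyncio


async def reviewer(name, delay, notes):
    """Simulated reviewer: wait, then record one note (hypothetical helper)."""
    await asyncio.sleep(delay)
    notes.append({"source": name, "text": f"{name} feedback"})
    return "completed"


async def run_group(cancel_after):
    """Run two reviewers in parallel and cancel the group after a delay."""
    notes = []
    tasks = {
        "clarity": asyncio.create_task(reviewer("clarity", 0.01, notes)),
        "tone": asyncio.create_task(reviewer("tone", 5.0, notes)),
    }
    await asyncio.sleep(cancel_after)  # the user clicks Cancel here
    for task in tasks.values():
        task.cancel()  # has no effect on tasks that already finished
    await asyncio.wait(list(tasks.values()))
    statuses = {
        name: "cancelled" if task.cancelled() else task.result()
        for name, task in tasks.items()
    }
    return statuses, notes


statuses, notes = asyncio.run(run_group(cancel_after=0.2))
print(statuses)
print(notes)
```
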
## Requirements

- `OPENAI_API_KEY`
- `DEEPGRAM_API_KEY`
- `CARTESIA_API_KEY`

A `.env` in the example folder is the easiest way to set these (see
`examples/multi-worker/env.example`).

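
A quick pre-flight check for the three keys might look like the sketch below. It is not part of the example: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the function only inspects the process environment (or a mapping passed in), so it does not read a `.env` file by itself.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPGRAM_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or empty in env."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All API keys are set.")
```
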
## What this example does _not_ show

Real reviewer integrations, note persistence, and multi-document /
multi-page flows. The reviewers here compute simple text metrics. For
real LLM reviewers, swap them for `LLMWorker` subclasses whose
`on_job_request` runs the LLM with the paragraph text and a critique
prompt; everything else stays the same.
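
As an illustration of the kind of simple text metric the simulated reviewers rely on, the sketch below flags absolutist wording. It is a variant, not the example's code: `bot.py`'s `ToneReviewer` uses plain substring tests, whereas this hypothetical `flagged_absolutes` helper assumes whole-word matching so that "nevertheless" does not count as "never".

```python
import re

# Same word list as the ToneReviewer in bot.py.
ABSOLUTIST = (
    "simply",
    "anyone who",
    "unanimous",
    "always",
    "never",
    "obviously",
    "comprehensively",
)


def flagged_absolutes(text):
    """Return the absolutist terms that appear as whole words or phrases."""
    lower = text.lower()
    return [w for w in ABSOLUTIST if re.search(rf"\b{re.escape(w)}\b", lower)]


sample = "Anyone who claims that in-person work is more productive is simply wrong."
print(flagged_absolutes(sample))
print(flagged_absolutes("Nevertheless, results vary."))
```
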


@@ -0,0 +1,509 @@
#
# Copyright (c) 2026, Daily
#
# SPDX-License-Identifier: BSD-2-Clause
#
"""Document review — the synthesis demo.
A single workspace combining everything from the prior demos. The user
reviews a draft article. They can:
- Select a paragraph and ask for review. The UIWorker fans out to two
peer reviewers (clarity, tone) in parallel. Their progress streams to
an in-flight card, and each worker's feedback becomes a note attached
to the paragraph (a custom ``add_note`` command).
- Dictate their own notes by voice. The worker fills the notes textarea
and clicks Save (``fills`` + ``click`` via the bundled ``reply`` tool).
- Ask "where does it talk about X" and the worker uses ``select_text`` to
navigate.
- Click an existing note; the client emits a ``note_click`` UI event, and
the worker's ``@on_ui_event("note_click")`` handler jumps to the related
paragraph — the round-trip event/command pattern.

Architecture::

    Main worker (PipelineWorker, owns transport + RTVI):
      transport.in → STT → user_agg → LLM → TTS → transport.out → assistant_agg
                                       └── answer_about_screen(query) tool
                                             └── params.pipeline_worker.job("ui", name="respond", payload={query})

    ReviewWorker (ReplyToolMixin + UIWorker, keep_history=True):
      ├── inherited reply (scroll_to, highlight, select_text, fills, click)
      ├── @tool start_review(answer, paragraph_ref, paragraph_text)
      │     └── start_user_job_group("clarity", "tone", ...)
      ├── @on_ui_event("note_click") → select_text(ref)
      └── on_job_response → emit add_note for each reviewer that completes

    Two peer workers (BaseWorker each):
      ClarityReviewer · ToneReviewer

The reviewers are simulated, like async-tasks: a few ``send_job_update``
progress lines, then a ``send_job_response`` with a final analysis
computed from simple text metrics (word/sentence counts, absolutist /
hedging words) so different paragraphs get different feedback without
real NLP.

Run::

    uv run python bot.py

Then open the client at ``http://localhost:5173`` (see ``README.md``).

Requirements:

- OPENAI_API_KEY
- DEEPGRAM_API_KEY
- CARTESIA_API_KEY
"""
import asyncio
import os
import random
from dotenv import load_dotenv
from loguru import logger
from pipecat.adapters.schemas.tools_schema import ToolsSchema
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.bus.messages import BusJobRequestMessage, BusJobResponseMessage
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.base_worker import BaseWorker
from pipecat.pipeline.job_context import JobError, JobStatus
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.worker import PipelineParams, PipelineWorker
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
LLMContextAggregatorPair,
LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.llm_service import FunctionCallParams
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.workers.llm import tool
from pipecat.workers.ui import ReplyToolMixin, UIWorker, on_ui_event
load_dotenv(override=True)
MAIN_NAME = "main"
transport_params = {
"daily": lambda: DailyParams(audio_in_enabled=True, audio_out_enabled=True),
"webrtc": lambda: TransportParams(audio_in_enabled=True, audio_out_enabled=True),
}
VOICE_PROMPT = """\
You are the voice layer of a document review assistant. A separate \
UI layer sees the page (the article and the notes panel) and writes \
the spoken reply.
For every user utterance about the document or the review (selecting \
paragraphs, asking for feedback, dictating notes, navigating), call \
``answer_about_screen`` with the user's request verbatim. The \
tool's response is the spoken reply, already TTS-ready.
Only respond directly for pure pleasantries (greetings, thanks, \
goodbyes). Keep direct replies to one short spoken sentence."""
# The UI wire-format guide (UI_STATE_PROMPT_GUIDE) is appended to the LLM's
# system instruction automatically by UIWorker, so this prompt only needs the
# app-specific behavior.
UI_PROMPT = """\
You are reviewing a draft article with the user. The current \
``<ui_state>`` block is in your context, and may contain a \
``<selection>`` block when the user has highlighted text.
## The hard rule
**Every turn MUST call exactly one tool: either ``reply`` or \
``start_review``.** Never respond with plain text. If the user \
asks something that doesn't need a visual action — including \
open questions like "how can we improve it?", "what do you think?", \
"any suggestions?" — call ``reply`` with the answer in the \
``answer`` field. The spoken response is whatever you put there. \
If you forget to call a tool, the user hears nothing and the turn \
times out.
You have two LLM tools:
## Tool: reply
For most turns. ``reply(answer, scroll_to=None, highlight=None, \
select_text=None, fills=None, click=None)``:
- ``answer`` (REQUIRED): the spoken reply, plain language, one or \
two short sentences.
- ``scroll_to`` (OPTIONAL): a snapshot ref. Scroll the element into \
view.
- ``select_text`` (OPTIONAL): a snapshot ref. Place the page's text \
selection on a paragraph (use this for "this paragraph" / "the \
section about X").
- ``highlight`` (OPTIONAL): list of refs. Brief flash. Rarely used \
here; ``select_text`` is usually better for paragraphs.
- ``fills`` (OPTIONAL): list of ``{"ref", "value"}`` objects. Fill \
the notes textarea (ref is in ``<ui_state>`` as the ``textbox``).
- ``click`` (OPTIONAL): list of refs to click. Use to click the \
Save button after filling the notes textarea.
## Tool: start_review
For "review this paragraph" / "give me feedback on this" requests. \
``start_review(answer, paragraph_ref, paragraph_text)``:
- ``answer`` (REQUIRED): brief acknowledgement spoken right away \
("Reviewing this paragraph").
- ``paragraph_ref`` (REQUIRED): the snapshot ref of the paragraph \
under review. When the user has a selection, use the selection's \
ref. Otherwise pick the right paragraph from ``<ui_state>``.
- ``paragraph_text`` (REQUIRED): the full paragraph text. Read it \
from the ``<selection>`` block when present, or from the ``name`` \
attribute on the paragraph node in ``<ui_state>``.
The server fans out two worker reviewers (clarity, tone) in \
parallel and streams progress to the page. As each worker finishes, \
their feedback becomes a note attached to the paragraph. You do NOT \
wait for results.
## Decision rules
- **"Review this", "give me feedback on this paragraph", "what do \
you think of this"** with a selection → ``start_review``.
- **"Review the third paragraph"** with no selection → use \
``<ui_state>`` to find the ref + text, call ``start_review``.
- **"Add a note: …"** or any dictated note content → use ``reply`` \
with ``fills`` for the notes textarea and ``click`` on the Save \
button. The note will automatically attach to whichever article \
paragraph the user last selected.
- **"Where does it talk about X"** → ``reply`` with ``scroll_to`` + \
``select_text`` to navigate to the matching paragraph.
- **"Read me back the notes"** / **"What did you say about \
paragraph 3"** → ``reply`` with answer text only; the notes panel \
is in ``<ui_state>`` so you can summarize from it.
- **General questions about the draft** ("how can we improve it?", \
"what do you think?", "any suggestions?", "what's missing?") → \
``reply`` with the answer text only. Put your suggestions / \
opinions / analysis directly in the ``answer`` field; that becomes \
the spoken reply.
## Examples
(refs are illustrative; use actual refs from the current snapshot)
- User has selected paragraph e8, says "Review this." → \
``start_review(answer="Reviewing this paragraph.", paragraph_ref="e8", paragraph_text="The asynchronous-first model that emerged...")``
- "Add a note that this is too dense" with paragraph e8 selected → \
``reply(answer="Noted.", fills=[{"ref": "<textarea_ref>", "value": "This paragraph is too dense."}], click=["<save_button_ref>"])``
- "Where does it talk about rhythms?" → \
``reply(answer="Here, in this paragraph.", scroll_to="e14", select_text="e14")``"""
# ─────────────────────────────────────────────────────────────────────
# Peer workers: simulated reviewers that compute simple text metrics and
# send back a plausible-sounding review. The analysis is canned but
# varies per paragraph based on actual properties of the text.
# ─────────────────────────────────────────────────────────────────────
class _SimulatedReviewer(BaseWorker):
    """Base for the two simulated reviewers."""

    source_name: str = "reviewer"

    def review(self, text: str) -> str:
        return ""

    async def on_job_request(self, message: BusJobRequestMessage) -> None:
        await super().on_job_request(message)
        job_id = message.job_id
        text = str((message.payload or {}).get("text", "")).strip()
        try:
            await asyncio.sleep(random.uniform(0.4, 0.9))
            await self.send_job_update(job_id, {"text": f"reading {len(text.split())} words"})
            await asyncio.sleep(random.uniform(0.5, 1.1))
            await self.send_job_update(job_id, {"text": f"checking {self.source_name}"})
            await asyncio.sleep(random.uniform(0.4, 0.9))
            feedback = self.review(text) or "(no notes)"
            await self.send_job_response(job_id, response={"feedback": feedback})
        except asyncio.CancelledError:
            raise


class ClarityReviewer(_SimulatedReviewer):
    """Comments on density, sentence length, and structural issues."""

    source_name = "clarity"

    def review(self, text: str) -> str:
        words = len(text.split())
        # Cheap sentence count: terminal punctuation.
        sentences = max(1, sum(1 for ch in text if ch in ".!?"))
        avg = words / sentences
        if avg > 35:
            return (
                f"This passage runs {words} words across just {sentences} "
                f"sentence(s) (~{avg:.0f} words each). Consider breaking "
                "it into smaller units; the reader is asked to hold a lot "
                "in working memory."
            )
        if words < 25:
            return (
                f"Brief at {words} words. If this is a key idea, consider "
                "expanding with one concrete example."
            )
        if avg < 12:
            return (
                f"Sentences average {avg:.0f} words. This is fine, "
                "sometimes preferable, but watch for choppiness if "
                "several short ones run in a row."
            )
        return (
            f"Density is reasonable at ~{avg:.0f} words per sentence across {sentences} sentences."
        )


class ToneReviewer(_SimulatedReviewer):
    """Comments on hedging, overstatement, and word choice."""

    source_name = "tone"

    ABSOLUTIST = (
        "simply",
        "anyone who",
        "unanimous",
        "always",
        "never",
        "obviously",
        "comprehensively",
    )
    HEDGES = ("might", "perhaps", "seems", "appears", "could", "may")

    def review(self, text: str) -> str:
        lower = text.lower()
        absolutes = [w for w in self.ABSOLUTIST if w in lower]
        hedges = [w for w in self.HEDGES if w in lower]
        if absolutes:
            sample = ", ".join(repr(w) for w in absolutes[:3])
            return (
                f"Strong words flagged: {sample}. If the claim is contested "
                "or the evidence is mixed, some hedging would read as more "
                "credible."
            )
        if len(hedges) >= 4:
            return (
                f"Heavy hedging — I count {len(hedges)} hedge words. Fine "
                "for an exploratory section, but if you mean to commit to "
                "a claim, the hedges weaken it."
            )
        return "Tone reads as measured. No flags."


# ─────────────────────────────────────────────────────────────────────
# Review UI worker.
# ─────────────────────────────────────────────────────────────────────
class ReviewWorker(ReplyToolMixin, UIWorker):
    """UIWorker that drives the document review workspace.

    Composes ``ReplyToolMixin`` for the bundled reply tool and adds a
    ``start_review`` tool for kicking off paragraph review. A
    ``@on_ui_event("note_click")`` handler converts client-side note
    clicks into ``select_text`` navigation. ``on_job_response`` is
    overridden to translate each reviewer's response into an ``add_note``
    UI command so feedback shows up in the notes panel as it lands.

    ``keep_history=True`` so the worker can resolve deixis like "can we
    add a note for that?" against its own prior replies.
    """

    def __init__(self):
        llm = OpenAILLMService(
            api_key=os.environ["OPENAI_API_KEY"],
            settings=OpenAILLMService.Settings(system_instruction=UI_PROMPT),
        )
        super().__init__("ui", llm=llm, keep_history=True)
        # job_id -> {"paragraph_ref": "..."}; lets on_job_response know
        # which paragraph a reviewer's feedback belongs to.
        self._reviews: dict[str, dict] = {}

    @tool
    async def start_review(
        self,
        params: FunctionCallParams,
        answer: str,
        paragraph_ref: str,
        paragraph_text: str,
    ):
        """Kick off a parallel review of one paragraph.

        Spawns the clarity and tone workers via ``start_user_job_group``.
        Workers run in the background; their progress is forwarded to the
        page automatically. As each completes, ``on_job_response``
        translates the response into an ``add_note`` UI command.

        Args:
            answer: A short spoken acknowledgement ("Reviewing this
                paragraph").
            paragraph_ref: The snapshot ref of the paragraph under
                review.
            paragraph_text: The paragraph's text content. Workers analyze
                this directly.
        """
        logger.info(f"{self}: start_review(ref={paragraph_ref!r})")
        job_id = await self.start_user_job_group(
            "clarity",
            "tone",
            payload={"ref": paragraph_ref, "text": paragraph_text},
            label=f"Reviewing ¶ {paragraph_ref}",
        )
        # Remember which paragraph this review is for so we can attach
        # each worker's response to the right note.
        self._reviews[job_id] = {"paragraph_ref": paragraph_ref}
        await self.respond_to_job(speak=answer)
        await params.result_callback(None)

    async def on_job_response(self, message: BusJobResponseMessage) -> None:
        """Turn reviewer responses into ``add_note`` UI commands."""
        await super().on_job_response(message)
        review = self._reviews.get(message.job_id)
        if not review:
            return
        if message.status != JobStatus.COMPLETED:
            return
        feedback = ((message.response or {}).get("feedback") or "").strip()
        if not feedback:
            return
        await self.send_command(
            "add_note",
            {
                "source": message.source,
                "ref": review["paragraph_ref"],
                "text": feedback,
            },
        )

    @on_ui_event("note_click")
    async def on_note_click(self, message) -> None:
        """User clicked a note in the panel; jump to its paragraph."""
        ref = (message.payload or {}).get("ref")
        if not isinstance(ref, str) or not ref:
            return
        logger.info(f"{self}: note_click → select_text({ref!r})")
        await self.scroll_to(ref)
        await self.select_text(ref)


async def answer_about_screen(params: FunctionCallParams, query: str):
    """Forward the user's request to the screen-aware review worker.

    Args:
        query (str): The user's request, passed verbatim.
    """
    logger.info(f"answer_about_screen('{query}')")
    try:
        async with params.pipeline_worker.job(
            "ui", name="respond", payload={"query": query}, timeout=10
        ) as t:
            pass
    except JobError as e:
        logger.warning(f"ui job failed: {e}")
        await params.result_callback("Something went wrong on my side.")
        return
    speak = (t.response or {}).get("speak")
    await params.result_callback(speak or "I'm not sure how to answer that.")


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info("Starting document-review bot")
    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        settings=CartesiaTTSService.Settings(
            voice=os.getenv("CARTESIA_VOICE_ID", "71a7ad14-091c-4e8e-a314-022ece01c121"),
        ),
    )
    llm = OpenAILLMService(
        api_key=os.environ["OPENAI_API_KEY"],
        settings=OpenAILLMService.Settings(system_instruction=VOICE_PROMPT),
    )
    llm.register_direct_function(answer_about_screen, cancel_on_interruption=False, timeout_secs=30)

    context = LLMContext(tools=ToolsSchema(standard_tools=[answer_about_screen]))
    aggregators = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            aggregators.user(),
            llm,
            tts,
            transport.output(),
            aggregators.assistant(),
        ]
    )

    worker = PipelineWorker(
        pipeline,
        name=MAIN_NAME,
        params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Client connected")
        context.add_message(
            {
                "role": "developer",
                "content": (
                    "Greet the user briefly. Tell them they can select any "
                    "paragraph and ask you to review it, dictate notes, or "
                    "navigate the draft. One short sentence."
                ),
            }
        )
        await worker.queue_frame(LLMRunFrame())

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info("Client disconnected")
        await runner.cancel()

    await runner.launch_worker(ReviewWorker())
    await runner.launch_worker(ClarityReviewer("clarity"))
    await runner.launch_worker(ToneReviewer("tone"))
    await runner.launch_worker(worker)
    await runner.run()


async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()


@@ -0,0 +1,26 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*
node_modules
dist
dist-ssr
*.local
# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
.vite


@@ -0,0 +1,102 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Document review — UIAgent demo</title>
<link rel="stylesheet" href="./styles.css" />
</head>
<body>
<header>
<h1>Manuscript review</h1>
<button id="connect" type="button">Connect</button>
</header>
<main>
<section class="document" aria-label="Draft">
<article>
<h2>The quiet revolution of asynchronous work</h2>
<p class="lede">A draft. Select any paragraph and ask for review.</p>
<p>
Five years ago, working remotely was the unusual choice.
Today it is the unremarkable one. The shift happened too
quickly for most organizations to absorb its second-order
effects, and the conversation has barely caught up to where
work actually is.
</p>
<p>
The asynchronous-first model that emerged out of necessity in
2020 and has since become structural across knowledge work
has produced a strange paradox in which workers report higher
autonomy and satisfaction than at any point in the postwar
era while simultaneously reporting greater isolation, lower
trust in leadership, and a measurable decline in the quality
of collaboration on novel problems, which suggests not that
remote work is good or bad but that we have not yet learned
to use the tool we built ourselves.
</p>
<p>
There are real benefits and real costs. Different people
experience it differently. Each company has to find its own
way.
</p>
<p>
Anyone who claims that in-person work is more productive is
simply wrong. The research is unanimous. Decades of management
orthodoxy were built on assumptions that have been
comprehensively disproven.
</p>
<p>
What seems to work best are deliberately structured rhythms:
weekly all-hands video calls for shared context, written
async updates for status, in-person quarterly gatherings for
relationship building, and clear synchronous-only windows for
high-stakes decision making. Each rhythm has a different
cost; none is free.
</p>
<p>
The fight over whether remote work is here to stay has ended.
The interesting question now is what we have lost in the
transition that nobody noticed at the time, and what we are
about to lose if we do not redesign for the new shape of
work.
</p>
</article>
</section>
<aside class="notes" aria-label="Notes">
<h2>Notes</h2>
<form id="note-form">
<label for="note-input">Add a note</label>
<textarea
id="note-input"
name="note"
rows="3"
placeholder="Type or dictate a note…"
></textarea>
<button id="note-save" type="submit">Save</button>
</form>
<div id="notes-empty" class="empty-state">
No notes yet. Select a paragraph and ask the assistant to
review it, or dictate a note of your own.
</div>
<ol id="notes-list" aria-label="Existing notes"></ol>
</aside>
</main>
<div id="status" aria-live="polite"></div>
<audio id="bot-audio" autoplay data-a11y-exclude></audio>
<script type="module" src="./main.js"></script>
</body>
</html>


@@ -0,0 +1,475 @@
/**
* Document review — vanilla JS client.
*
* Combines the patterns from every prior demo into one workspace:
*
* - Snapshot streaming (every demo).
* - ``scroll_to`` and ``select_text`` for the agent to point back at
* paragraphs (pointing + deixis).
* - ``set_input_value`` and ``click`` for dictating notes (form-fill).
* - ``ui-task`` envelopes for the in-flight review card with
* per-worker progress and a Cancel button (async-tasks).
* - One **custom command**, ``add_note``, registered locally.
* - One **client-emitted event**, ``note_click``, sent when the user
* clicks a note in the panel. The agent's
* ``@on_ui_event("note_click")`` handler drives ``select_text`` to
* navigate.
*/
import {
PipecatClient,
RTVIEvent,
findElementByRef,
findRefForElement,
} from "@pipecat-ai/client-js";
import { SmallWebRTCTransport } from "@pipecat-ai/small-webrtc-transport";
const BOT_URL = "http://localhost:7860/api/offer";
const connectButton = document.getElementById("connect");
const status = document.getElementById("status");
const botAudio = document.getElementById("bot-audio");
const noteInput = document.getElementById("note-input");
const noteForm = document.getElementById("note-form");
const notesList = document.getElementById("notes-list");
const notesEmpty = document.getElementById("notes-empty");
const articleEl = document.querySelector("article");
let client;
let unsubscribes = [];
// In-flight review groups, keyed by task_id. Rendered as cards above
// the notes list while running.
const reviewGroups = new Map();
// All notes ever added in this session (transient — not persisted).
// We use refs to find them and to drive the has-notes paragraph styling.
const notes = [];
// The last article paragraph the user selected. Tracked separately
// from window.getSelection() because the textarea steals selection
// focus when the user (or the agent) types into it. Updated only
// when the selection lands inside the article.
let lastArticleRef = null;
// Walk up from a node looking for the first ancestor that has a
// snapshot ref assigned. Used both at submit time and from the
// selection-tracker below.
function findRefForAncestor(node) {
let el = node && node.nodeType === 1 ? node : node?.parentElement ?? null;
while (el && el !== document.body) {
const ref = findRefForElement(el);
if (ref) return { ref, element: el };
el = el.parentElement;
}
return null;
}
document.addEventListener("selectionchange", () => {
const sel = document.getSelection();
if (!sel || sel.isCollapsed || !sel.anchorNode) return;
const found = findRefForAncestor(sel.anchorNode);
if (!found) return;
// Only remember selections inside the article column. Textarea /
// notes-pane selections shouldn't override it.
if (articleEl && articleEl.contains(found.element)) {
lastArticleRef = found.ref;
}
});
function setStatus(text, autoHideMs = 0) {
status.textContent = text;
status.dataset.show = text ? "1" : "0";
if (text && autoHideMs > 0) {
setTimeout(() => {
if (status.textContent === text) status.dataset.show = "0";
}, autoHideMs);
}
}
function refreshEmptyState() {
notesEmpty.hidden = notesList.children.length > 0 || reviewGroups.size > 0;
}
function resolveTarget(payload) {
if (payload?.ref) {
const el = findElementByRef(payload.ref);
if (el) return el;
}
if (payload?.target_id) {
return document.getElementById(payload.target_id);
}
return null;
}
// ─────────────────────────────────────────────
// Standard command handlers
// ─────────────────────────────────────────────
function handleScrollTo(payload) {
const el = resolveTarget(payload);
if (!el) return;
el.scrollIntoView({ behavior: "smooth", block: "center" });
}
function handleSelectText(payload) {
const el = resolveTarget(payload);
if (!el) return;
const range = document.createRange();
range.selectNodeContents(el);
const sel = window.getSelection();
if (!sel) return;
sel.removeAllRanges();
sel.addRange(range);
el.scrollIntoView({ behavior: "smooth", block: "center" });
}
function handleSetInputValue(payload) {
const el = resolveTarget(payload);
if (!el) return;
if (!(el instanceof HTMLInputElement || el instanceof HTMLTextAreaElement))
return;
if (el.disabled || el.readOnly || el.type === "hidden") return;
const value = String(payload?.value ?? "");
const replace = payload?.replace !== false;
el.value = replace ? value : (el.value || "") + value;
el.dispatchEvent(new Event("input", { bubbles: true }));
el.dispatchEvent(new Event("change", { bubbles: true }));
el.classList.remove("fill-flash");
void el.offsetWidth;
el.classList.add("fill-flash");
setTimeout(() => el.classList.remove("fill-flash"), 1200);
}
function handleClick(payload) {
const el = resolveTarget(payload);
if (!el) return;
if ("disabled" in el && el.disabled) return;
el.click();
}
// ─────────────────────────────────────────────
// Custom command: add_note
//
// Server emits this when a worker produces feedback, when the user's
// dictated note is committed, etc. Payload: {source, ref?, text}.
// We render a clickable card that — when clicked — sends a note_click
// UI event back to the server so the agent can respond by selecting
// the related paragraph.
// ─────────────────────────────────────────────
function handleAddNote(payload) {
const source = payload?.source ?? "me";
const ref = payload?.ref ?? null;
const text = String(payload?.text ?? "").trim();
if (!text) return;
const note = { source, ref, text };
notes.push(note);
const li = document.createElement("li");
li.className = "note";
if (ref) {
li.dataset.ref = ref;
li.tabIndex = 0;
li.title = "Click to jump to the paragraph";
}
const meta = document.createElement("div");
meta.className = "note-meta";
const sourceEl = document.createElement("span");
sourceEl.className = "note-source";
sourceEl.dataset.source = source;
sourceEl.textContent = source;
meta.appendChild(sourceEl);
if (ref) {
const refEl = document.createElement("span");
refEl.className = "note-ref";
refEl.textContent = `${ref}`;
meta.appendChild(refEl);
}
li.appendChild(meta);
const body = document.createElement("div");
body.className = "note-text";
body.textContent = text;
li.appendChild(body);
// Send a UI event when the user clicks the note. The server's
// @on_ui_event("note_click") handler turns it into a select_text
// command back to us — full round-trip, agent-driven.
if (ref) {
li.addEventListener("click", () => {
client?.sendUIEvent("note_click", { ref });
});
}
notesList.prepend(li);
refreshEmptyState();
// Mark the paragraph as having notes so it stands out in the
// document column.
if (ref) {
const para = findElementByRef(ref);
if (para) para.classList.add("has-notes");
}
}
// ─────────────────────────────────────────────
// In-flight review card (ui-task envelopes)
// ─────────────────────────────────────────────
function renderReviewCard(group) {
  const card = document.createElement("div");
  card.className = "review-card";
  card.dataset.taskId = group.task_id;
  const header = document.createElement("div");
  header.className = "review-card-header";
  const label = document.createElement("div");
  label.className = "review-card-label";
  label.textContent = group.label ?? `Review ${group.task_id.slice(0, 6)}`;
  header.appendChild(label);
  if (group.cancellable) {
    const cancel = document.createElement("button");
    cancel.type = "button";
    cancel.className = "review-card-cancel";
    cancel.textContent = "Cancel";
    cancel.addEventListener("click", () => {
      cancel.disabled = true;
      cancel.textContent = "Cancelling…";
      client?.cancelUITask(group.task_id, "user requested");
    });
    group.cancelButton = cancel;
    header.appendChild(cancel);
  }
  card.appendChild(header);
  const ul = document.createElement("ul");
  ul.className = "review-workers";
  for (const agent of group.agents) {
    const li = document.createElement("li");
    li.dataset.agent = agent;
    const name = document.createElement("span");
    name.className = "review-worker-name";
    name.textContent = agent;
    li.appendChild(name);
    const update = document.createElement("span");
    update.className = "review-worker-update";
    update.textContent = "starting…";
    li.appendChild(update);
    const stat = document.createElement("span");
    stat.className = "review-worker-status";
    stat.dataset.status = "running";
    stat.textContent = "running";
    li.appendChild(stat);
    ul.appendChild(li);
  }
  card.appendChild(ul);
  group.cardEl = card;
  group.listEl = ul;
  return card;
}
function updateWorkerRow(group, agentName, { update, statusValue }) {
  const li = group.listEl.querySelector(
    `li[data-agent="${CSS.escape(agentName)}"]`,
  );
  if (!li) return;
  if (update !== undefined) {
    li.querySelector(".review-worker-update").textContent = update;
  }
  if (statusValue !== undefined) {
    const stat = li.querySelector(".review-worker-status");
    stat.dataset.status = statusValue;
    stat.textContent = statusValue;
  }
}
function handleTaskEnvelope(env) {
  switch (env.kind) {
    case "group_started": {
      const group = {
        task_id: env.task_id,
        label: env.label,
        cancellable: env.cancellable,
        agents: env.agents,
        ref: extractRefFromLabel(env.label),
      };
      reviewGroups.set(env.task_id, group);
      // Place the in-flight card just below the new-note form so it
      // sits visibly above the existing notes.
      noteForm.insertAdjacentElement("afterend", renderReviewCard(group));
      // Mark the paragraph as under review.
      if (group.ref) {
        const para = findElementByRef(group.ref);
        if (para) para.classList.add("under-review");
      }
      refreshEmptyState();
      break;
    }
    case "task_update": {
      const group = reviewGroups.get(env.task_id);
      if (!group) break;
      const text = env.data?.text ?? JSON.stringify(env.data);
      updateWorkerRow(group, env.agent_name, { update: text });
      break;
    }
    case "task_completed": {
      const group = reviewGroups.get(env.task_id);
      if (!group) break;
      updateWorkerRow(group, env.agent_name, {
        update: env.status === "completed" ? "✓ done" : env.status,
        statusValue: env.status,
      });
      break;
    }
    case "group_completed": {
      const group = reviewGroups.get(env.task_id);
      if (!group) break;
      // Drop the in-flight card; the notes that arrived via add_note
      // remain in the list.
      group.cardEl.remove();
      reviewGroups.delete(env.task_id);
      if (group.ref) {
        const para = findElementByRef(group.ref);
        if (para) para.classList.remove("under-review");
      }
      refreshEmptyState();
      break;
    }
  }
}
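// Illustrative sketch, not part of the original commit and not wired
// into the client: the envelope lifecycle that handleTaskEnvelope
// implements, restated as a DOM-free function so the state transitions
// are easier to follow or unit-test. `applyTaskEnvelope` and the
// `workers` map are hypothetical names; the envelope fields (kind,
// task_id, label, agents, agent_name, data, status) are only the ones
// handleTaskEnvelope reads above, and the real wire format may carry more.
function applyTaskEnvelope(groups, env) {
  switch (env.kind) {
    case "group_started": {
      // Every worker starts out "running" with no progress text yet.
      const workers = new Map(
        env.agents.map((agent) => [agent, { status: "running", update: null }]),
      );
      groups.set(env.task_id, { label: env.label, workers });
      break;
    }
    case "task_update": {
      const worker = groups.get(env.task_id)?.workers.get(env.agent_name);
      if (worker) worker.update = env.data?.text ?? JSON.stringify(env.data);
      break;
    }
    case "task_completed": {
      const worker = groups.get(env.task_id)?.workers.get(env.agent_name);
      if (worker) worker.status = env.status;
      break;
    }
    case "group_completed":
      // The in-flight entry goes away; notes delivered through add_note
      // are tracked separately and are not touched here.
      groups.delete(env.task_id);
      break;
  }
  return groups;
}
// Usage (hypothetical values): feeding group_started, task_update,
// task_completed and group_completed for one task_id into the same Map
// adds an entry, updates one worker, then removes the entry again.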
function extractRefFromLabel(label) {
  // The server sends labels like "Reviewing ¶ e5". Extract the ref so
  // we can mark the paragraph as under-review while workers run.
  const m = (label ?? "").match(/¶\s+(\S+)/);
  return m ? m[1] : null;
}
function onUICommand(command, handler) {
  const listener = (data) => {
    if (data.command !== command) return;
    handler(data.payload);
  };
  client.on(RTVIEvent.UICommand, listener);
  return () => client.off(RTVIEvent.UICommand, listener);
}
function onUITask(handler) {
  client.on(RTVIEvent.UITask, handler);
  return () => client.off(RTVIEvent.UITask, handler);
}
// ─────────────────────────────────────────────
// Form behavior
// ─────────────────────────────────────────────
// The user (or the agent, via fills + click) submits a note. Pass the
// textarea content to handleAddNote as a synthetic add_note so it shows
// up in the list, then clear the textarea. The note attaches to
// whichever article paragraph the user last selected (tracked via
// selectionchange above). Focusing or selecting inside the textarea
// does not overwrite lastArticleRef, so this works for both the manual
// and the agent-driven flow.
noteForm.addEventListener("submit", (e) => {
  e.preventDefault();
  const text = noteInput.value.trim();
  if (!text) return;
  handleAddNote({ source: "me", ref: lastArticleRef, text });
  noteInput.value = "";
});
// ─────────────────────────────────────────────
// Connection lifecycle
// ─────────────────────────────────────────────
async function connect() {
  connectButton.disabled = true;
  setStatus("Connecting…");
  client = new PipecatClient({
    transport: new SmallWebRTCTransport(),
    enableMic: true,
    enableCam: false,
  });
  client.on(RTVIEvent.BotConnected, () => setStatus("Bot connected", 1500));
  client.on(RTVIEvent.Disconnected, () => {
    setStatus("Disconnected", 2000);
    connectButton.dataset.state = "";
    connectButton.textContent = "Connect";
    connectButton.disabled = false;
    teardownUI();
  });
  client.on(RTVIEvent.TrackStarted, (track, participant) => {
    if (track.kind !== "audio") return;
    if (participant?.local) return;
    botAudio.srcObject = new MediaStream([track]);
  });
  unsubscribes = [
    onUICommand("scroll_to", handleScrollTo),
    onUICommand("select_text", handleSelectText),
    onUICommand("set_input_value", handleSetInputValue),
    onUICommand("click", handleClick),
    onUICommand("add_note", handleAddNote),
    onUITask(handleTaskEnvelope),
  ];
  try {
    await client.connect({ webrtcUrl: BOT_URL });
    client.startUISnapshotStream();
    connectButton.dataset.state = "connected";
    connectButton.textContent = "Disconnect";
    connectButton.disabled = false;
    setStatus("Connected. Select a paragraph and ask 'review this'.", 5000);
  } catch (err) {
    console.error("Connect failed:", err);
    setStatus(`Connect failed: ${err.message ?? err}`, 4000);
    teardownUI();
    connectButton.disabled = false;
  }
}
async function disconnect() {
  connectButton.disabled = true;
  setStatus("Disconnecting…");
  try {
    await client?.disconnect();
  } finally {
    teardownUI();
    connectButton.dataset.state = "";
    connectButton.textContent = "Connect";
    connectButton.disabled = false;
  }
}
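function teardownUI() {
  client?.stopUISnapshotStream();
  unsubscribes.forEach((unsubscribe) => unsubscribe());
  unsubscribes = [];
  // Drop in-flight review cards; no more envelopes will arrive for them.
  for (const group of reviewGroups.values()) {
    group.cardEl?.remove();
    if (group.ref) findElementByRef(group.ref)?.classList.remove("under-review");
  }
  reviewGroups.clear();
  refreshEmptyState();
  if (botAudio.srcObject) botAudio.srcObject = null;
  client = undefined;
}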
connectButton.addEventListener("click", () => {
  if (connectButton.dataset.state === "connected") {
    disconnect();
  } else {
    connect();
  }
});
refreshEmptyState();

File diff suppressed because it is too large

@@ -0,0 +1,18 @@
{
  "name": "document-review-client",
  "private": true,
  "version": "0.1.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "preview": "vite preview"
  },
  "dependencies": {
    "@pipecat-ai/client-js": "1.9.0",
    "@pipecat-ai/small-webrtc-transport": "^1.10.2"
  },
  "devDependencies": {
    "vite": "^8"
  }
}

@@ -0,0 +1,406 @@
:root {
  color-scheme: light;
  font-family: system-ui, -apple-system, sans-serif;
  --border: #d4d4d8;
  --muted: #71717a;
  --selection: #fde68a;
  --note-bg: #fff;
  --note-border: #e4e4e7;
  --accent: #3b82f6;
  --running: #3b82f6;
  --completed: #16a34a;
  --error: #dc2626;
  --cancelled: #71717a;
}
* {
  box-sizing: border-box;
}
body {
  margin: 0;
  background: #fafafa;
  color: #18181b;
}
header {
  position: sticky;
  top: 0;
  z-index: 10;
  display: flex;
  align-items: center;
  justify-content: space-between;
  padding: 1rem 1.5rem;
  border-bottom: 1px solid var(--border);
  background: #fff;
}
header h1 {
  font-size: 1.125rem;
  margin: 0;
}
#connect {
  padding: 0.5rem 1rem;
  border: 1px solid var(--border);
  background: #fff;
  border-radius: 6px;
  cursor: pointer;
  font-size: 0.875rem;
}
#connect:hover {
  background: #f4f4f5;
}
#connect[data-state="connected"] {
  background: #ef4444;
  color: white;
  border-color: #ef4444;
}
main {
  display: grid;
  grid-template-columns: minmax(0, 1fr) 24rem;
  gap: 1.5rem;
  max-width: 1200px;
  margin: 0 auto;
  padding: 1.5rem;
}
@media (max-width: 900px) {
  main {
    grid-template-columns: 1fr;
  }
}
/* ─────────────────────────────────────────────
   Document pane
   ───────────────────────────────────────────── */
.document {
  background: #fff;
  border: 1px solid var(--border);
  border-radius: 8px;
  padding: 2rem 2.25rem;
}
article h2 {
  margin: 0 0 0.5rem;
  font-size: 1.5rem;
  font-family: Charter, Georgia, serif;
  letter-spacing: -0.01em;
}
article .lede {
  margin: 0 0 1.5rem;
  font-size: 0.9375rem;
  color: var(--muted);
  font-style: italic;
}
article p {
  margin: 0 0 1rem;
  font-size: 1rem;
  line-height: 1.65;
  color: #27272a;
  font-family: Charter, Georgia, serif;
  scroll-margin-top: 6rem;
  border-left: 2px solid transparent;
  padding-left: 0.5rem;
  margin-left: -0.5rem;
  transition: border-color 0.4s;
}
article p.lede {
  font-family: system-ui, -apple-system, sans-serif;
}
article p.under-review {
  border-left-color: var(--running);
}
article p.has-notes {
  border-left-color: #c7d2fe;
}
::selection {
  background: var(--selection);
  color: #18181b;
}
/* ─────────────────────────────────────────────
   Notes pane
   ───────────────────────────────────────────── */
.notes {
  position: sticky;
  top: 5rem;
  align-self: start;
  background: #fff;
  border: 1px solid var(--border);
  border-radius: 8px;
  padding: 1.25rem;
  max-height: calc(100vh - 6rem);
  overflow-y: auto;
}
.notes h2 {
  margin: 0 0 1rem;
  font-size: 0.8125rem;
  font-weight: 600;
  text-transform: uppercase;
  letter-spacing: 0.04em;
  color: var(--muted);
}
#note-form {
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
  margin-bottom: 1rem;
}
#note-form label {
  font-size: 0.8125rem;
  font-weight: 500;
  color: #3f3f46;
}
#note-form textarea {
  font: inherit;
  font-size: 0.9375rem;
  padding: 0.5rem 0.625rem;
  border: 1px solid var(--border);
  border-radius: 6px;
  background: #fff;
  width: 100%;
  resize: vertical;
  scroll-margin-top: 6rem;
  transition:
    border-color 0.15s,
    box-shadow 0.15s,
    background 0.4s;
}
#note-form textarea:focus {
  outline: none;
  border-color: var(--accent);
  box-shadow: 0 0 0 3px rgba(59, 130, 246, 0.2);
}
#note-form textarea.fill-flash {
  animation: field-fill-flash 1.2s ease-out;
}
@keyframes field-fill-flash {
  0% {
    background: var(--selection);
  }
  100% {
    background: #fff;
  }
}
#note-save {
  align-self: flex-start;
  padding: 0.4375rem 0.875rem;
  font: inherit;
  font-size: 0.875rem;
  font-weight: 500;
  background: #18181b;
  color: white;
  border: 1px solid #18181b;
  border-radius: 6px;
  cursor: pointer;
}
#note-save:hover {
  background: #27272a;
}
.empty-state {
  font-size: 0.8125rem;
  color: var(--muted);
  font-style: italic;
  padding: 1rem;
  border: 1px dashed var(--border);
  border-radius: 6px;
  text-align: center;
  margin-bottom: 1rem;
}
.empty-state[hidden] {
  display: none;
}
#notes-list {
  list-style: none;
  margin: 0;
  padding: 0;
  display: flex;
  flex-direction: column;
  gap: 0.625rem;
}
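.note {
  background: var(--note-bg);
  border: 1px solid var(--note-border);
  border-radius: 6px;
  padding: 0.625rem 0.75rem;
  transition: border-color 0.15s;
}
/* Only notes that carry a ref are clickable; handleAddNote sets
   data-ref (and the click handler) only when a ref is present. */
.note[data-ref] {
  cursor: pointer;
}
.note[data-ref]:hover,
.note[data-ref]:focus-visible {
  border-color: var(--accent);
}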
.note-meta {
  display: flex;
  justify-content: space-between;
  margin-bottom: 0.25rem;
  font-size: 0.6875rem;
  text-transform: uppercase;
  letter-spacing: 0.05em;
  color: var(--muted);
}
.note-source {
  font-weight: 600;
}
.note-source[data-source="clarity"] {
  color: #2563eb;
}
.note-source[data-source="tone"] {
  color: #7c3aed;
}
.note-source[data-source="me"] {
  color: #16a34a;
}
.note-text {
  font-size: 0.875rem;
  line-height: 1.45;
  color: #3f3f46;
}
/* ─────────────────────────────────────────────
   In-flight review card
   ───────────────────────────────────────────── */
.review-card {
  background: #f9fafb;
  border: 1px dashed var(--accent);
  border-radius: 6px;
  padding: 0.75rem 0.875rem;
  margin-bottom: 0.75rem;
}
.review-card-header {
  display: flex;
  align-items: center;
  justify-content: space-between;
  gap: 0.75rem;
  margin-bottom: 0.5rem;
}
.review-card-label {
  font-size: 0.8125rem;
  font-weight: 500;
  color: #3f3f46;
}
.review-card-cancel {
  padding: 0.1875rem 0.5rem;
  border: 1px solid var(--border);
  background: #fff;
  border-radius: 4px;
  cursor: pointer;
  font-size: 0.6875rem;
  color: var(--muted);
}
.review-card-cancel:hover {
  background: #f4f4f5;
  color: #18181b;
}
.review-card-cancel[disabled] {
  opacity: 0.4;
  cursor: not-allowed;
}
.review-workers {
  list-style: none;
  margin: 0;
  padding: 0;
  display: flex;
  flex-direction: column;
  gap: 0.25rem;
}
.review-workers li {
  display: flex;
  align-items: baseline;
  gap: 0.5rem;
  font-size: 0.75rem;
}
.review-worker-name {
  font-family: ui-monospace, "SF Mono", Menlo, monospace;
  font-weight: 500;
  min-width: 4.5rem;
  color: #52525b;
}
.review-worker-update {
  font-style: italic;
  color: #52525b;
  flex: 1;
}
.review-worker-status {
  font-size: 0.6875rem;
  font-weight: 500;
  text-transform: uppercase;
  letter-spacing: 0.04em;
}
.review-worker-status[data-status="running"] {
  color: var(--running);
}
.review-worker-status[data-status="completed"] {
  color: var(--completed);
}
.review-worker-status[data-status="cancelled"] {
  color: var(--cancelled);
}
.review-worker-status[data-status="failed"],
.review-worker-status[data-status="error"] {
  color: var(--error);
}
#status {
  position: fixed;
  bottom: 1rem;
  right: 1rem;
  padding: 0.5rem 0.75rem;
  border-radius: 6px;
  font-size: 0.8125rem;
  background: #18181b;
  color: white;
  opacity: 0;
  transition: opacity 0.2s;
  pointer-events: none;
}
#status[data-show="1"] {
  opacity: 1;
}

@@ -0,0 +1,7 @@
import { defineConfig } from "vite";
export default defineConfig({
  server: {
    port: 5173,
  },
});