# Webpage Example — Realtime Voice Chat
A self-contained browser client for the engine's product websocket
(`/ws-product`, protocol `va.ws.v1`).
## Features
- **Connect / Disconnect** to any `ws://` or `wss://` URL.
- **Microphone selector + mic on/off toggle** — available input devices
are listed with `enumerateDevices`, and `getUserMedia` is called with
`echoCancellation`, `noiseSuppression`, and `autoGainControl` enabled so
the browser handles echo cancellation (AEC) against the bot's voice.
- **Text composer** — type a message and press <kbd>Enter</kbd> to send
an `input.text` event (Shift+Enter for newline). Sending interrupts
any in-flight bot audio so the next reply is heard cleanly.
- **Chat history** rendered from three sources: `input.transcript.final`
(your speech), streamed `response.text.delta` / `response.text.final`
(the assistant; deltas arrive ahead of the synthesized audio), and a
local echo of text you submit (the engine doesn't echo text input back
as a transcript).
- **WebSocket log** panel for connection state and compact send/receive
events. Audio chunks are summarized so the UI does not flood.
- **Gapless TTS playback** by scheduling each `response.audio.delta`
chunk to start exactly when the previous one ends on the `AudioContext`
timeline.
- **Live VU meter** + mic and bot activity indicators.
- **Clear** button to reset history.

No build step, no dependencies: just three files plus an AudioWorklet.
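
The microphone bullet above comes down to a constraints object passed to `navigator.mediaDevices.getUserMedia`. A minimal sketch, assuming a hypothetical helper named `buildMicConstraints` (not taken from `app.js`) and assuming an empty device id means "use the browser default":

```javascript
// Hypothetical helper; the real app.js may structure this differently.
// Builds the constraints object for navigator.mediaDevices.getUserMedia().
function buildMicConstraints(deviceId) {
  const audio = {
    echoCancellation: true, // let the browser cancel the bot's voice (AEC)
    noiseSuppression: true,
    autoGainControl: true,
  };
  // Pin a specific input only when the user picked one in the selector.
  if (deviceId) {
    audio.deviceId = { exact: deviceId };
  }
  return { audio, video: false };
}
```

In the browser this would be used roughly as `await navigator.mediaDevices.getUserMedia(buildMicConstraints(select.value))`, where `select` is an assumed `<select>` element filled from `enumerateDevices()` entries whose `kind` is `"audioinput"`.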
## Layout
```text
examples/webpage/
├── index.html
├── styles.css
├── app.js
└── pcm-recorder.worklet.js
```
## Run
1. Start the engine (default port `8000`):
```bash
cd AI-VideoAssistant-engine-v5-pipecat-minimal
source .venv/bin/activate
export OPENAI_API_KEY=...
uvicorn engine.main:app --host 127.0.0.1 --port 8000
```
2. Open the demo page served by the same process:
```text
http://127.0.0.1:8000/demo/
```
The default websocket URL is derived from the page host
(`ws://127.0.0.1:8000/ws-product`). Click **Connect**, pick a
microphone if needed, click **Enable mic**, and start speaking.
Whether the page is served, and at which path, is controlled by the
`server` section of `config.json`:
```json
{
  "server": {
    "serve_webpage": true,
    "webpage_mount": "/demo"
  }
}
```
Set `"serve_webpage": false` in production if you serve the UI elsewhere.
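
The default websocket URL mentioned in step 2 can be derived from the page location along these lines. This is a sketch under assumptions: the helper name `defaultWsUrl` is hypothetical, and only the `/ws-product` path comes from this README.

```javascript
// Hypothetical helper: derive the default websocket URL from a Location-like
// object ({ protocol, host }). Not necessarily how app.js does it.
function defaultWsUrl(loc, path = "/ws-product") {
  // Pages loaded over https should use wss://, since browsers generally
  // block insecure ws:// connections from secure pages.
  const scheme = loc.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${loc.host}${path}`;
}
```

In the page itself the call would be `defaultWsUrl(window.location)`. Deriving the URL from the page host means it follows whichever host and port the engine is bound to.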
### Standalone static server (optional)
You can also serve the files from another port for UI-only iteration,
using any static file server. If needed, add that origin
(`http://localhost:8080` in the example below) to `server.cors_origins`
in `config.json`.
```bash
cd AI-VideoAssistant-engine-v5-pipecat-minimal/examples/webpage
python -m http.server 8080
```
Then open <http://localhost:8080> and point the URL field at
`ws://127.0.0.1:8000/ws-product`.
> The browser's mic API requires a secure context. `http://localhost`
> qualifies; if you serve from another host, use HTTPS and a `wss://`
> URL.
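
A page can check the secure-context requirement before it offers the mic toggle. A minimal sketch, assuming a hypothetical helper `canUseMic` that takes the `window` object as a parameter so it can be exercised outside a browser:

```javascript
// Hypothetical pre-flight check; pass `window` in the browser.
// navigator.mediaDevices is not exposed on insecure origins, so both
// conditions are checked.
function canUseMic(win) {
  const md = win.navigator && win.navigator.mediaDevices;
  return Boolean(
    win.isSecureContext && md && typeof md.getUserMedia === "function"
  );
}
```

If `canUseMic(window)` returns `false`, the UI could disable the mic button and point the user at an HTTPS or `http://localhost` origin.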
## Audio details
- Input: mono Float32 audio from `getUserMedia` is resampled in the
AudioWorklet to 16 kHz, converted to PCM16, framed into 20 ms chunks
(320 samples, 640 bytes), and sent as **binary** websocket messages (the
server accepts either binary or the JSON+base64 form).
- Output: each `response.audio.delta` carries base64-encoded PCM16 @
16 kHz; chunks are decoded and scheduled back-to-back through Web
Audio. The browser handles resampling to the device rate.
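
The two bullets above can be sketched as four small helpers. All names here are hypothetical and not taken from `app.js` or the worklet. The sketch assumes the input has already been resampled to 16 kHz and that the PCM16 samples are little-endian, which this README does not state.

```javascript
// Sketch only: every helper name below is hypothetical.
const SAMPLE_RATE = 16000;                       // PCM16 mono @ 16 kHz
const FRAME_SAMPLES = (SAMPLE_RATE * 20) / 1000; // 20 ms -> 320 samples

// Input side: clamp Float32 samples to [-1, 1] and scale to signed 16-bit.
function floatToPcm16(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}

// Input side: buffer incoming samples and emit only complete 20 ms frames.
function createFramer(frameSamples = FRAME_SAMPLES) {
  let pending = new Int16Array(0);
  return function push(samples) {
    const merged = new Int16Array(pending.length + samples.length);
    merged.set(pending);
    merged.set(samples, pending.length);
    const frames = [];
    let offset = 0;
    while (merged.length - offset >= frameSamples) {
      frames.push(merged.slice(offset, offset + frameSamples));
      offset += frameSamples;
    }
    pending = merged.slice(offset); // keep the remainder for the next call
    return frames;
  };
}

// Output side: decode base64 PCM16 (assumed little-endian) into Float32.
function base64Pcm16ToFloat32(b64) {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  const view = new DataView(bytes.buffer);
  const out = new Float32Array(bytes.length >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}

// Output side: pick start times so consecutive chunks play back-to-back.
function createScheduler() {
  let nextStart = 0;
  return function schedule(now, durationSec) {
    const startAt = Math.max(now, nextStart); // never schedule in the past
    nextStart = startAt + durationSec;
    return startAt;
  };
}
```

In the browser, each frame could be sent with `ws.send(frame.buffer)`. For playback, the decoded samples could be copied into a buffer from `ctx.createBuffer(1, samples.length, 16000)` via `copyToChannel`, and the source node started at `schedule(ctx.currentTime, buffer.duration)`. Tracking the next start time is what keeps playback gapless: a chunk that arrives early queues behind the previous one, and a chunk that arrives late starts immediately.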
## Notes
- Use headphones if you still hear echo despite browser AEC; the bot's
voice leaking back into the open mic is the most common cause of
feedback loops.
- The engine's session has an inactivity timeout
(`session.inactivity_timeout_sec` in `config.json`). If the bot stops
responding after a long silence, the session has likely timed out;
disconnect and connect again.