# Webpage Example — Realtime Voice Chat

A self-contained browser client for the engine's product websocket
(`/ws-product`, protocol `va.ws.v1`).

## Features

- **Connect / Disconnect** to any `ws://` or `wss://` URL.
- **Microphone selector + mic on/off toggle**: available input devices
  are listed with `enumerateDevices`, and `getUserMedia` is requested with
  `echoCancellation`, `noiseSuppression`, and `autoGainControl` so the
  browser performs acoustic echo cancellation (AEC) against the bot's voice.
- **Text composer**: type a message and press <kbd>Enter</kbd> to send
  an `input.text` event (Shift+Enter for newline). Sending interrupts
  any in-flight bot audio so the next reply is heard cleanly.
- **Chat history** built from three sources: `input.transcript.final`
  (your speech), streamed `response.text.delta` / `response.text.final`
  (the assistant; deltas arrive ahead of the synthesized audio), and a
  local echo of text you submit (the engine doesn't echo text input back
  as a transcript).
- **WebSocket log** panel for connection state and compact send/receive
  events. Audio chunks are summarized so the log stays readable.
- **Gapless TTS playback** by scheduling each `response.audio.delta`
  chunk back-to-back on the `AudioContext`.
- **Live VU meter** + mic and bot activity indicators.
- **Clear** button to reset history.

No build step, no dependencies: just three files plus an AudioWorklet.
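
As an illustration of the microphone feature listed above, the following is a minimal sketch of how the capture constraints could be assembled. The function names (`buildMicConstraints`, `audioInputs`, `openMic`) are hypothetical and are not taken from `app.js`; only the constraint keys and the `enumerateDevices` / `getUserMedia` calls come from the browser API.

```javascript
// Sketch only: these helper names are illustrative, not from app.js.

// Build the MediaStreamConstraints object passed to getUserMedia.
// deviceId comes from enumerateDevices(); pass "" or null for the default input.
function buildMicConstraints(deviceId) {
  const audio = {
    echoCancellation: true, // lets the browser cancel the bot's voice
    noiseSuppression: true,
    autoGainControl: true,
  };
  if (deviceId) {
    // "exact" makes the request fail instead of silently using another mic.
    audio.deviceId = { exact: deviceId };
  }
  return { audio, video: false };
}

// Keep only microphones from an enumerateDevices() result.
function audioInputs(devices) {
  return devices.filter((d) => d.kind === "audioinput");
}

// Browser-only: request the selected microphone.
async function openMic(deviceId) {
  return navigator.mediaDevices.getUserMedia(buildMicConstraints(deviceId));
}
```

Device labels are typically empty until the user has granted microphone permission, so a selector built from `audioInputs(await navigator.mediaDevices.enumerateDevices())` may need to be refreshed after the first successful `openMic` call.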

## Layout

```text
examples/webpage/
├── index.html
├── styles.css
├── app.js
└── pcm-recorder.worklet.js
```

## Run

1. Start the engine (default port `8000`):

```bash
cd AI-VideoAssistant-engine-v5-pipecat-minimal
source .venv/bin/activate
export OPENAI_API_KEY=...
uvicorn engine.main:app --host 127.0.0.1 --port 8000
```

2. Open the demo page served by the same process:

```text
http://127.0.0.1:8000/demo/
```

The default websocket URL is derived from the page host
(`ws://127.0.0.1:8000/ws-product`). Click **Connect**, pick a
microphone if needed, click **Enable mic**, and start speaking.
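
Deriving the default URL from the page host could look roughly like the sketch below. `defaultWsUrl` is a hypothetical name, and the mapping from `https:` to `wss:` is an assumption about how the page behaves when served over TLS; only the `/ws-product` path comes from this README.

```javascript
// Sketch only: defaultWsUrl is an illustrative name, not from app.js.
// `loc` is any object shaped like window.location ({ protocol, host }).
function defaultWsUrl(loc, path = "/ws-product") {
  // A page loaded over https must use wss; plain http maps to ws.
  const scheme = loc.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${loc.host}${path}`;
}
```

In the browser this would be called as `defaultWsUrl(window.location)`. For the page at `http://127.0.0.1:8000/demo/` it yields `ws://127.0.0.1:8000/ws-product`.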

Whether the page is served, and at which path, is set in `config.json`:

```json
"server": {
  "serve_webpage": true,
  "webpage_mount": "/demo"
}
```

Set `"serve_webpage": false` in production if you serve the UI elsewhere.

### Standalone static server (optional)

For UI-only iteration you can still serve the files from another port:

```bash
cd AI-VideoAssistant-engine-v5-pipecat-minimal/examples/webpage
python -m http.server 8080
```

Then open <http://localhost:8080> and point the URL field at
`ws://127.0.0.1:8000/ws-product`. Add `http://localhost:8080` to
`server.cors_origins` in `config.json` if needed.

> The browser's mic API requires a secure context. `http://localhost`
> and `http://127.0.0.1` qualify; if you serve from another host, use
> HTTPS and a `wss://` URL.

## Audio details

- Input: mono Float32 from `getUserMedia` is resampled in the
  AudioWorklet to PCM16 mono @ 16 kHz, framed into 20 ms chunks
  (320 samples, 640 bytes each), and sent as **binary** websocket
  messages (the server accepts either binary or the JSON+base64 form).
- Output: each `response.audio.delta` carries base64-encoded PCM16 @
  16 kHz; chunks are decoded and scheduled back-to-back through Web
  Audio. The browser handles resampling to the device rate.
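
The two paths above can be sketched as a handful of small functions. This is an illustration under stated assumptions, not the code in `app.js` or `pcm-recorder.worklet.js`: all names are hypothetical, samples are assumed to be little-endian, the input is assumed to be already resampled to 16 kHz, and the 20 ms scheduling lead is an arbitrary choice.

```javascript
// Sketch only: every name here is illustrative, not from the example's source.
const SAMPLE_RATE = 16000;                             // Hz
const FRAME_MS = 20;
const FRAME_SAMPLES = (SAMPLE_RATE * FRAME_MS) / 1000; // 320 samples = 640 bytes

// Input: clamp Float32 samples to [-1, 1] and convert to signed 16-bit PCM.
function floatToPcm16(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Input: cut PCM16 into complete 20 ms frames. `rest` holds leftover
// samples that the caller prepends to the next batch.
function splitFrames(pcm16) {
  const frames = [];
  let offset = 0;
  while (offset + FRAME_SAMPLES <= pcm16.length) {
    frames.push(pcm16.slice(offset, offset + FRAME_SAMPLES));
    offset += FRAME_SAMPLES;
  }
  return { frames, rest: pcm16.slice(offset) };
}

// Output: decode a base64 PCM16 payload into Float32 samples.
function base64Pcm16ToFloat32(b64) {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  const view = new DataView(bytes.buffer);
  const out = new Float32Array(bytes.length >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000; // little-endian assumed
  }
  return out;
}

// Output: each chunk starts where the previous one ends, or slightly
// ahead of "now" if playback has fallen behind.
class Playhead {
  constructor() {
    this.next = 0;
  }
  schedule(now, samples, lead = 0.02) {
    const start = Math.max(this.next, now + lead);
    this.next = start + samples / SAMPLE_RATE;
    return start;
  }
}

// Browser-only glue: ctx is an AudioContext.
function playChunk(ctx, playhead, b64) {
  const samples = base64Pcm16ToFloat32(b64);
  if (samples.length === 0) return null; // createBuffer rejects length 0
  const buffer = ctx.createBuffer(1, samples.length, SAMPLE_RATE);
  buffer.copyToChannel(samples, 0);
  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(ctx.destination);
  src.start(playhead.schedule(ctx.currentTime, samples.length));
  return src;
}
```

On the sending side, each frame returned by `splitFrames` owns a 640-byte buffer, so a call such as `ws.send(frame.buffer)` (with `ws` being the open `WebSocket`) would transmit it as one binary message. Creating the `AudioBuffer` at 16 kHz leaves the conversion to the device rate to the browser, as described above.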

## Notes

- Use headphones if you still hear echo despite browser AEC; the bot's
  voice leaking back into the open mic is the most common cause of
  feedback loops.
- The engine's session has an inactivity timeout
  (`session.inactivity_timeout_sec` in `config.json`). If the bot
  stops responding after a long silence, the session has likely
  expired; reconnect.