Webpage Example — Realtime Voice Chat

A self-contained browser client for the engine's product websocket (/ws-product, protocol va.ws.v1).

Features

  • Connect to / disconnect from any ws:// or wss:// URL.
  • Microphone selector + mic on/off toggle — available input devices are listed with enumerateDevices, and getUserMedia is requested with echoCancellation, noiseSuppression, and autoGainControl so the browser handles AEC against the bot's voice.
  • Text composer — type a message and press Enter to send an input.text event (Shift+Enter for newline). Sending interrupts any in-flight bot audio so the next reply is heard cleanly.
  • Chat history built from three sources: input.transcript.final (your speech), streamed response.text.delta / response.text.final (the assistant; deltas arrive ahead of the synthesized audio), and a local entry for text you submit, since the engine doesn't echo text input back as a transcript.
  • WebSocket log panel for connection state and compact send/receive events. Audio chunks are summarized so the UI does not flood.
  • Gapless TTS playback by scheduling each response.audio.delta chunk back-to-back on the AudioContext.
  • Live VU meter + mic and bot activity indicators.
  • Clear button to reset history.
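
The gapless playback feature can be sketched as follows. This is a minimal illustration, not the code in app.js: createPlayer and decodePcm16Base64 are hypothetical names, little-endian sample order is an assumption, and the 16 kHz rate is taken from the "Audio details" section. Each chunk starts exactly where the previous one ends, unless the queue has run dry, in which case it starts at the context's current time.

```javascript
const SAMPLE_RATE = 16000; // PCM16 mono @ 16 kHz, per "Audio details"

// Decode base64-encoded PCM16 (assumed little-endian) into Float32 samples.
function decodePcm16Base64(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const view = new DataView(bytes.buffer);
  const samples = new Float32Array(bytes.length >> 1);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Hypothetical helper: tracks the time at which the next chunk should start.
function createPlayer(audioCtx) {
  let nextStart = 0;
  const active = new Set();
  return {
    // Schedule one chunk; returns the start time, or null for an empty chunk.
    enqueue(b64) {
      const samples = decodePcm16Base64(b64);
      if (samples.length === 0) return null;
      const buffer = audioCtx.createBuffer(1, samples.length, SAMPLE_RATE);
      buffer.getChannelData(0).set(samples);
      const source = audioCtx.createBufferSource();
      source.buffer = buffer;
      source.connect(audioCtx.destination);
      // Never schedule in the past; otherwise continue where the last chunk ends.
      const startAt = Math.max(nextStart, audioCtx.currentTime);
      source.start(startAt);
      nextStart = startAt + buffer.duration;
      active.add(source);
      source.onended = () => active.delete(source);
      return startAt;
    },
    // Stop all scheduled audio, e.g. when a text message interrupts the bot.
    interrupt() {
      for (const source of active) source.stop();
      active.clear();
      nextStart = 0;
    },
  };
}
```

In the browser this would be used as `const player = createPlayer(new AudioContext())`, calling `player.enqueue(...)` with the base64 audio from each response.audio.delta; the name of the field that carries the audio is not stated here.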

No build step, no dependencies — just three files plus an AudioWorklet.

Layout

examples/webpage/
├── index.html
├── styles.css
├── app.js
└── pcm-recorder.worklet.js

Run

  1. Start the engine (default port 8000):

    cd AI-VideoAssistant-engine-v5-pipecat-minimal
    source .venv/bin/activate
    export OPENAI_API_KEY=...
    uvicorn engine.main:app --host 127.0.0.1 --port 8000
    
  2. Open the demo page served by the same process:

    http://127.0.0.1:8000/voice-demo/
    

    The default websocket URL is derived from the page host (ws://127.0.0.1:8000/ws-product). Click Connect, pick a microphone if needed, click Enable mic, and start speaking.

    Whether the page is served, and at which path, is controlled in config.json:

    "server": {
      "serve_webpage": true,
      "webpage_mount": "/voice-demo"
    }
    

    Set "serve_webpage": false in production if you serve the UI elsewhere.
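
The default websocket URL mentioned in step 2 is derived from the page host. A minimal sketch of that derivation, assuming a hypothetical helper named defaultWsUrl (the actual function in app.js may differ): an https: page maps to wss:, anything else to ws:, and the /ws-product path is appended to the page's host and port.

```javascript
// Hypothetical helper: map a page location to the engine's websocket URL.
// `loc` is anything with `protocol` and `host`, e.g. window.location or a URL.
function defaultWsUrl(loc, path = "/ws-product") {
  // A page loaded over HTTPS must use wss://; browsers block ws:// from it.
  const scheme = loc.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${loc.host}${path}`;
}

// In the browser: urlInput.value = defaultWsUrl(window.location);
// (urlInput is a placeholder for the page's URL field.)
```

When the page is opened at http://127.0.0.1:8000/voice-demo/, this yields ws://127.0.0.1:8000/ws-product.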

Standalone static server (optional)

You can still serve the files from another port for UI-only iteration (add that origin to server.cors_origins in config.json if needed):

cd AI-VideoAssistant-engine-v5-pipecat-minimal/examples/webpage
python -m http.server 8080

Then open http://localhost:8080 and point the URL field at ws://127.0.0.1:8000/ws-product.

The browser's mic API requires a secure context. http://localhost qualifies; if you serve from another host, use HTTPS and a wss:// URL.
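
A minimal sketch of a microphone request that respects the secure-context rule, using the three processing constraints listed under Features. buildMicConstraints and openMic are hypothetical names, and the `env` parameter exists only so the logic can be exercised outside a browser; it is not taken from app.js.

```javascript
// Build getUserMedia constraints. `deviceId` is optional and would come from
// the microphone selector (populated via enumerateDevices).
function buildMicConstraints(deviceId) {
  const audio = {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  };
  if (deviceId) audio.deviceId = { exact: deviceId };
  return { audio, video: false };
}

// Request the microphone, failing early with a readable message when the page
// is not in a secure context (navigator.mediaDevices is undefined there).
async function openMic(deviceId, env = globalThis) {
  if (!env.isSecureContext || !env.navigator?.mediaDevices?.getUserMedia) {
    throw new Error(
      "Microphone access needs a secure context (HTTPS or http://localhost)."
    );
  }
  return env.navigator.mediaDevices.getUserMedia(buildMicConstraints(deviceId));
}
```

In the browser, `await openMic(selectedDeviceId)` resolves to a MediaStream; omitting the device id lets the browser pick its default input.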

Audio details

  • Input: mono Float32 from getUserMedia is resampled in the AudioWorklet to PCM16 mono @ 16 kHz, framed into 20 ms chunks, and sent as binary websocket messages (the server accepts either binary or the JSON+base64 form).
  • Output: each response.audio.delta carries base64-encoded PCM16 @ 16 kHz; chunks are decoded and scheduled back-to-back through Web Audio. The browser handles resampling to the device rate.
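
The input path above can be sketched as three small steps: resample to 16 kHz, convert Float32 to PCM16, and cut the stream into 20 ms frames (320 samples, 640 bytes). This is an illustration under assumptions, not the contents of pcm-recorder.worklet.js: the function names are hypothetical, and linear interpolation is a stand-in because the worklet's actual resampling method is not described here.

```javascript
const TARGET_RATE = 16000;
const FRAME_SAMPLES = (TARGET_RATE * 20) / 1000; // 320 samples per 20 ms frame

// Assumed method: linear-interpolation resampling (no anti-aliasing filter).
function resampleLinear(input, inputRate, outputRate = TARGET_RATE) {
  if (inputRate === outputRate) return Float32Array.from(input);
  const ratio = inputRate / outputRate;
  const out = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < out.length; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// Clamp to [-1, 1] and scale to signed 16-bit integers.
function floatToPcm16(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = Math.round(s < 0 ? s * 32768 : s * 32767);
  }
  return out;
}

// Accumulate PCM16 samples and emit fixed-size frames; leftovers are kept
// until the next push. Each frame owns its buffer, so frame.buffer is 640 bytes.
function createFramer(onFrame) {
  let pending = new Int16Array(0);
  return function push(pcm) {
    const merged = new Int16Array(pending.length + pcm.length);
    merged.set(pending);
    merged.set(pcm, pending.length);
    let offset = 0;
    while (merged.length - offset >= FRAME_SAMPLES) {
      onFrame(merged.slice(offset, offset + FRAME_SAMPLES));
      offset += FRAME_SAMPLES;
    }
    pending = merged.slice(offset);
  };
}
```

A possible wiring: `const push = createFramer((frame) => ws.send(frame.buffer))`, then `push(floatToPcm16(resampleLinear(block, sampleRate)))` for each captured block. Int16Array uses the platform's byte order, which is little-endian on essentially every device that runs a browser; whether the engine expects little-endian is an assumption.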

Notes

  • Use headphones if you still hear echo despite browser AEC; the bot's voice leaking back into the open mic is the most common cause of feedback loops.
  • The engine's session has an inactivity timeout (session.inactivity_timeout_sec in config.json). If the bot stops responding after a long silence, the session has probably timed out; reconnect.