# OmniSense: 12-Week Sprint Board + Tech Stack (Python Backend) — TODO

## Scope

- [ ] Build a realtime AI SaaS (OmniSense) focused on web-first audio + video with WebSocket + WebRTC endpoints
- [ ] Deliver assistant builder, tool execution, observability, evals, optional telephony later
- [ ] Keep scope aligned to 2-person team, self-hosted services

---

## Sprint Board (12 weeks, 2-week sprints)

Team assumption: 2 engineers. Scope prioritized to web-first audio + video, with BYO-SFU adapters.

### Sprint 1 (Weeks 1–2) — Realtime Core MVP (WebSocket + WebRTC Audio)

- Deliverables
  - [ ] WebSocket transport: audio in/out streaming (1:1)
  - [ ] WebRTC transport: audio in/out streaming (1:1)
  - [ ] Adapter contract wired into runtime (transport-agnostic session core)
  - [ ] ASR → LLM → TTS pipeline, streaming both directions
  - [ ] Basic session state (start/stop, silence timeout)
  - [ ] Transcript persistence
- Acceptance criteria
  - [ ] < 1.5s median round-trip for short responses
  - [ ] Stable streaming for 10+ minute session

### Sprint 2 (Weeks 3–4) — Video + Realtime UX

- Deliverables
  - [ ] WebRTC video capture + streaming (assistant can “see” frames)
  - [ ] WebSocket video streaming for local/dev mode
  - [ ] Low-latency UI: push-to-talk, live captions, speaking indicator
  - [ ] Recording + transcript storage (web sessions)
- Acceptance criteria
  - [ ] Video < 2.5s end-to-end latency for analysis
  - [ ] Audio quality acceptable (no clipping, jitter handling)

### Sprint 3 (Weeks 5–6) — Assistant Builder v1

- Deliverables
  - [ ] Assistant schema + versioning
  - [ ] UI: Model/Voice/Transcriber/Tools/Video/Transport tabs
  - [ ] “Test/Chat/Talk to Assistant” (web)
- Acceptance criteria
  - [ ] Create/publish assistant and run a live web session
  - [ ] All config changes tracked by version

### Sprint 4 (Weeks 7–8) — Tooling + Structured Outputs

- Deliverables
  - [ ] Tool registry + custom HTTP tools
  - [ ] Tool auth secrets management
  - [ ] Structured outputs (JSON extraction)
- Acceptance criteria
  - [ ] Tool calls executed with retries/timeouts
  - [ ] Structured JSON stored per call/session

### Sprint 5 (Weeks 9–10) — Observability + QA + Dev Platform

- Deliverables
  - [ ] Session logs + chat logs + media logs
  - [ ] Evals engine + test suites
  - [ ] Basic analytics dashboard
  - [ ] Public WebSocket API spec + message schema
  - [ ] JS/TS SDK (connect, send audio/video, receive transcripts)
- Acceptance criteria
  - [ ] Reproducible test suite runs
  - [ ] Log filters by assistant/time/status
  - [ ] SDK demo app runs end-to-end

### Sprint 6 (Weeks 11–12) — SaaS Hardening

- Deliverables
  - [ ] Org/RBAC + API keys + rate limits
  - [ ] Usage metering + credits
  - [ ] Stripe billing integration
  - [ ] Self-hosted DB ops (migrations, backup/restore, monitoring)
- Acceptance criteria
  - [ ] Metered usage per org
  - [ ] Credits decrement correctly
  - [ ] Optional telephony spike documented (defer build)
  - [ ] Enterprise adapter guide published (BYO-SFU)

---

## Tech Stack by Service (Self-Hosted, Web-First)

### 1) Transport Gateway (Realtime)

- [ ] WebRTC (browser) + WebSocket (lightweight/dev) protocols
- [ ] BYO-SFU adapter (enterprise) + LiveKit optional adapter + WS transport server
- [ ] Python core (FastAPI + asyncio) + Node.js mediasoup adapters when needed
- [ ] Media: Opus/VP8, jitter buffer, VAD, echo cancellation
- [ ] Storage: S3-compatible (MinIO) for recordings

### 2) ASR Service

- [ ] Whisper (self-hosted) baseline
- [ ] gRPC/WebSocket streaming transport
- [ ] Python native service
- [ ] Optional cloud provider fallback (later)

### 3) TTS Service

- [ ] Piper or Coqui TTS (self-hosted)
- [ ] gRPC/WebSocket streaming transport
- [ ] Python native service
- [ ] Redis cache for common phrases

### 4) LLM Orchestrator

- [ ] Self-hosted (vLLM + open model)
- [ ] Python (FastAPI + asyncio)
- [ ] Streaming, tool calling, JSON mode
- [ ] Safety filters + prompt templates

### 5) Assistant Config Service

- [ ] PostgreSQL
- [ ] Python (SQLAlchemy or SQLModel)
- [ ] Versioning, publish/rollback

### 6) Session Service

- [ ] PostgreSQL + Redis
- [ ] Python
- [ ] State machine, timeouts, events

### 7) Tool Execution Layer

- [ ] PostgreSQL
- [ ] Python
- [ ] Auth secret vault, retry policies, tool schemas

### 8) Observability + Logs

- [ ] Postgres (metadata), ClickHouse (logs/metrics)
- [ ] OpenSearch for search
- [ ] Prometheus + Grafana metrics
- [ ] OpenTelemetry tracing

### 9) Billing + Usage Metering

- [ ] Stripe billing
- [ ] PostgreSQL
- [ ] NATS JetStream (events) + Redis counters

### 10) Web App (Dashboard)

- [ ] React + Next.js
- [ ] Tailwind or Radix UI
- [ ] WebRTC client + WS client; adapter-based RTC integration
- [ ] ECharts/Recharts

### 11) Auth + RBAC

- [ ] Keycloak (self-hosted) or custom JWT
- [ ] Org/user/role tables in Postgres

### 12) Public WebSocket API + SDK

- [ ] WS API: versioned schema, binary audio frames + JSON control messages
- [ ] SDKs: JS/TS first, optional Python/Go clients
- [ ] Docs: quickstart, auth flow, session lifecycle, examples

---

## Infrastructure (Self-Hosted)

- [ ] Docker Compose → k3s (later)
- [ ] Redis Streams or NATS
- [ ] MinIO object store
- [ ] GitHub Actions + Helm or kustomize
- [ ] Self-hosted Postgres + pgbackrest backups
- [ ] Vault for secrets

---

## Suggested MVP Sequence

- [ ] WebRTC demo + ASR/LLM/TTS streaming
- [ ] Assistant schema + versioning (web-first)
- [ ] Video capture + multimodal analysis
- [ ] Tool execution + structured outputs
- [ ] Logs + evals + public WS API + SDK
- [ ] Telephony (optional, later)

---

## Public WebSocket API (Minimum Spec)

- [ ] Auth: API key or JWT in initial `hello` message
- [ ] Core messages: `session.start`, `session.stop`, `audio.append`, `audio.commit`, `video.append`, `transcript.delta`, `assistant.response`, `tool.call`, `tool.result`, `error`
- [ ] Binary payloads: PCM/Opus frames with metadata in control channel
- [ ] Versioning: `v1` schema with backward compatibility rules

---

## Self-Hosted DB Ops Checklist

- [ ] Postgres in Docker/k3s with persistent volumes
- [ ] Migrations: `alembic` or `atlas`
- [ ] Backups: `pgbackrest` nightly + on-demand
- [ ] Monitoring: postgres_exporter + alerts

---

## RTC Adapter Contract (BYO-SFU First)

- [ ] Keep RTC pluggable; LiveKit optional, not core dependency
- [ ] Define adapter interface (TypeScript sketch)
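
One possible shape for the adapter interface named in the last item is sketched here. It is illustrative only: every name in it (`RtcAdapter`, `MediaFrame`, `FrameHandler`, `LoopbackAdapter`) is a hypothetical placeholder, and none of it comes from LiveKit, mediasoup, or any other SFU SDK. The sketch assumes the session core only needs four operations from a transport: connect, publish a frame, subscribe to incoming frames, and disconnect.

```typescript
// Hypothetical adapter contract; all names are placeholders for illustration.

type MediaKind = "audio" | "video";

interface MediaFrame {
  kind: MediaKind;
  codec: "opus" | "pcm" | "vp8"; // codecs listed in the transport gateway section
  timestampMs: number;
  payload: Uint8Array;
}

type FrameHandler = (frame: MediaFrame) => void;

interface RtcAdapter {
  readonly name: string;
  // Join the media session identified by sessionId.
  connect(sessionId: string): Promise<void>;
  // Send one frame toward the remote peer(s).
  publish(frame: MediaFrame): Promise<void>;
  // Register a handler for incoming frames; returns an unsubscribe function.
  onFrame(handler: FrameHandler): () => void;
  // Leave the session and drop all handlers.
  disconnect(): Promise<void>;
}

// In-memory adapter that echoes published frames back to local subscribers.
// Intended for local/dev tests of the session core, not for real media.
class LoopbackAdapter implements RtcAdapter {
  readonly name = "loopback";
  private handlers: FrameHandler[] = [];
  private connected = false;

  async connect(_sessionId: string): Promise<void> {
    this.connected = true;
  }

  async publish(frame: MediaFrame): Promise<void> {
    if (!this.connected) {
      throw new Error("adapter is not connected");
    }
    this.handlers.forEach((handler) => handler(frame));
  }

  onFrame(handler: FrameHandler): () => void {
    this.handlers.push(handler);
    return () => {
      this.handlers = this.handlers.filter((h) => h !== handler);
    };
  }

  async disconnect(): Promise<void> {
    this.connected = false;
    this.handlers = [];
  }
}
```

Under this assumption, a LiveKit or BYO-SFU adapter would implement the same four methods, so the session core depends only on `RtcAdapter` and never imports an SFU SDK directly.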