OmniSense: 12-Week Sprint Board + Tech Stack (Python Backend) — TODO
Scope
- Build a realtime AI SaaS (OmniSense) focused on web-first audio + video with WebSocket + WebRTC endpoints
- Deliver the assistant builder, tool execution, observability, and evals, with telephony as an optional later addition
- Keep scope aligned to a 2-person team and self-hosted services
Sprint Board (12 weeks, 2-week sprints)
Team assumption: 2 engineers. Scope prioritizes web-first audio + video, with BYO-SFU adapters.
Sprint 1 (Weeks 1–2) — Realtime Core MVP (WebSocket + WebRTC Audio)
- Deliverables
- WebSocket transport: audio in/out streaming (1:1)
- WebRTC transport: audio in/out streaming (1:1)
- Adapter contract wired into runtime (transport-agnostic session core)
- ASR → LLM → TTS pipeline, streaming both directions
- Basic session state (start/stop, silence timeout)
- Transcript persistence
- Acceptance criteria
- < 1.5s median round-trip for short responses
- Stable streaming throughout sessions of 10+ minutes
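A minimal sketch of the streaming ASR → LLM → TTS chain, assuming each stage is an async generator that consumes the previous stage's stream, so partial output flows downstream without waiting for a stage to finish. fake_asr, fake_llm and fake_tts are hypothetical toy stand-ins for the Whisper, vLLM and Piper/Coqui services; a real LLM stage would wait for an end-of-utterance signal (VAD or audio.commit) before generating. Because the pipeline only sees an async iterator of bytes, the same core can sit behind either the WebSocket or the WebRTC transport.

```python
import asyncio
from typing import AsyncIterator, Callable, List

# Hypothetical stage contract: a stage maps an input stream to an output stream.
Stage = Callable[[AsyncIterator], AsyncIterator]


async def fake_asr(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    # Stand-in for streaming ASR: one "transcribed word" per audio chunk.
    async for chunk in audio:
        yield chunk.decode()


async def fake_llm(words: AsyncIterator[str]) -> AsyncIterator[str]:
    # Stand-in for a streaming LLM: one response token per input word.
    async for word in words:
        yield word.upper()


async def fake_tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    # Stand-in for streaming TTS: "synthesizes" each token into audio bytes.
    async for token in tokens:
        yield token.encode()


def build_pipeline(*stages: Stage) -> Stage:
    """Chain stages so each one lazily pulls from the previous one."""
    def run(source: AsyncIterator) -> AsyncIterator:
        stream = source
        for stage in stages:
            stream = stage(stream)
        return stream
    return run


async def transport_source(chunks: List[bytes]) -> AsyncIterator[bytes]:
    # Stand-in for audio arriving from the WS or WebRTC adapter.
    for chunk in chunks:
        yield chunk
        await asyncio.sleep(0)  # yield control, as a real network read would


async def demo() -> List[bytes]:
    pipeline = build_pipeline(fake_asr, fake_llm, fake_tts)
    return [out async for out in pipeline(transport_source([b"hi", b"there"]))]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [b'HI', b'THERE']
```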
Sprint 2 (Weeks 3–4) — Video + Realtime UX
- Deliverables
- WebRTC video capture + streaming (assistant can “see” frames)
- WebSocket video streaming for local/dev mode
- Low-latency UI: push-to-talk, live captions, speaking indicator
- Recording + transcript storage (web sessions)
- Acceptance criteria
- < 2.5s end-to-end latency for video analysis
- Audio quality acceptable (no clipping, jitter handling)
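One way to keep video analysis under the 2.5s target is to analyze only the newest frame at a capped rate and drop stale frames instead of queueing them. A sketch under that assumption; the sampling policy, LatestFrameSampler, max_fps and max_age_s are illustrative and not part of the plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    ts: float    # capture timestamp in seconds
    data: bytes  # encoded frame handed to the vision model


class LatestFrameSampler:
    """Keeps only the newest frame and releases at most max_fps frames per second.

    Frames are overwritten rather than queued, so analysis latency does not grow
    when the vision model is slower than the camera.
    """

    def __init__(self, max_fps: float = 1.0, max_age_s: float = 1.0):
        self.min_interval = 1.0 / max_fps
        self.max_age_s = max_age_s
        self._latest: Optional[Frame] = None
        self._last_emit = float("-inf")

    def push(self, frame: Frame) -> None:
        self._latest = frame  # overwrite: anything not yet analyzed is dropped

    def pop(self, now: float) -> Optional[Frame]:
        frame = self._latest
        if frame is None or now - self._last_emit < self.min_interval:
            return None  # nothing new, or rate limit not yet elapsed
        self._latest = None
        if now - frame.ts > self.max_age_s:
            return None  # too stale to be worth analyzing
        self._last_emit = now
        return frame
```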
Sprint 3 (Weeks 5–6) — Assistant Builder v1
- Deliverables
- Assistant schema + versioning
- UI: Model/Voice/Transcriber/Tools/Video/Transport tabs
- “Test/Chat/Talk to Assistant” (web)
- Acceptance criteria
- Create and publish an assistant, then run a live web session with it
- All config changes tracked by version
Sprint 4 (Weeks 7–8) — Tooling + Structured Outputs
- Deliverables
- Tool registry + custom HTTP tools
- Tool auth secrets management
- Structured outputs (JSON extraction)
- Acceptance criteria
- Tool calls executed with retries/timeouts
- Structured JSON stored per call/session
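The retries/timeouts criterion can be met with a per-attempt timeout plus exponential backoff with jitter. A minimal asyncio sketch; call_with_retries, ToolCallFailed and the default limits are hypothetical, and call stands for whatever coroutine issues the custom HTTP tool request.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class ToolCallFailed(Exception):
    pass


async def call_with_retries(
    call: Callable[[], Awaitable[T]],
    *,
    timeout_s: float = 5.0,
    max_attempts: int = 3,
    base_delay_s: float = 0.2,
    retry_on: Tuple[Type[BaseException], ...] = (asyncio.TimeoutError, ConnectionError),
) -> T:
    """Run call() with a per-attempt timeout and exponential backoff between attempts."""
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(call(), timeout=timeout_s)
        except retry_on as exc:
            last_exc = exc
            if attempt == max_attempts:
                break
            # Delays of base, 2*base, 4*base ..., each stretched by up to 50% random jitter.
            delay = base_delay_s * 2 ** (attempt - 1)
            await asyncio.sleep(delay * (1 + random.random() * 0.5))
    raise ToolCallFailed(f"tool call failed after {max_attempts} attempts") from last_exc
```

Retrying is only safe for idempotent tools; a tool with side effects would need an idempotency key or max_attempts=1.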
Sprint 5 (Weeks 9–10) — Observability + QA + Dev Platform
- Deliverables
- Session logs + chat logs + media logs
- Evals engine + test suites
- Basic analytics dashboard
- Public WebSocket API spec + message schema
- JS/TS SDK (connect, send audio/video, receive transcripts)
- Acceptance criteria
- Reproducible test suite runs
- Logs filterable by assistant, time, and status
- SDK demo app runs end-to-end
Sprint 6 (Weeks 11–12) — SaaS Hardening
- Deliverables
- Org/RBAC + API keys + rate limits
- Usage metering + credits
- Stripe billing integration
- Self-hosted DB ops (migrations, backup/restore, monitoring)
- Acceptance criteria
- Metered usage per org
- Credits decrement correctly
- Optional telephony spike documented (defer build)
- Enterprise adapter guide published (BYO-SFU)
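For "credits decrement correctly", the key property is that recording a usage event and decrementing credits happen in one transaction, with a conditional UPDATE so the balance never goes negative and a unique idempotency key so a replayed meter event is not charged twice. The sketch uses stdlib sqlite3 only so it runs standalone; the table and function names are hypothetical, and the real target is PostgreSQL per the tech stack.

```python
import sqlite3


class InsufficientCredits(Exception):
    pass


def create_tables(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE org_credits (org_id TEXT PRIMARY KEY,"
        " credits INTEGER NOT NULL CHECK (credits >= 0))"
    )
    conn.execute(
        "CREATE TABLE usage_events (id INTEGER PRIMARY KEY, org_id TEXT NOT NULL,"
        " kind TEXT NOT NULL, amount INTEGER NOT NULL, idempotency_key TEXT NOT NULL UNIQUE)"
    )


def balance(conn: sqlite3.Connection, org_id: str) -> int:
    row = conn.execute("SELECT credits FROM org_credits WHERE org_id = ?", (org_id,)).fetchone()
    if row is None:
        raise KeyError(org_id)
    return row[0]


def charge(conn: sqlite3.Connection, org_id: str, kind: str, amount: int, idempotency_key: str) -> int:
    """Record a usage event and decrement credits atomically; return the new balance."""
    with conn:  # one transaction: commit on success, roll back on exception
        try:
            conn.execute(
                "INSERT INTO usage_events (org_id, kind, amount, idempotency_key) VALUES (?, ?, ?, ?)",
                (org_id, kind, amount, idempotency_key),
            )
        except sqlite3.IntegrityError:
            return balance(conn, org_id)  # event already recorded: do not charge twice
        cur = conn.execute(
            "UPDATE org_credits SET credits = credits - ? WHERE org_id = ? AND credits >= ?",
            (amount, org_id, amount),
        )
        if cur.rowcount != 1:
            raise InsufficientCredits(org_id)  # propagating rolls back the INSERT above
    return balance(conn, org_id)
```

On PostgreSQL a failed INSERT aborts the whole transaction, so the duplicate check there would use INSERT ... ON CONFLICT (idempotency_key) DO NOTHING and inspect the affected row count instead of catching the error.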
Tech Stack by Service (Self-Hosted, Web-First)
1) Transport Gateway (Realtime)
- WebRTC (browser) + WebSocket (lightweight/dev) protocols
- BYO-SFU adapter (enterprise) + optional LiveKit adapter + WS transport server
- Python core (FastAPI + asyncio) + Node.js mediasoup adapters when needed
- Media: Opus/VP8, jitter buffer, VAD, echo cancellation
- Storage: S3-compatible (MinIO) for recordings
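WebRTC stacks typically include their own jitter buffer, while raw audio frames on the WS transport do not. A minimal fixed-depth reorder buffer keyed by sequence number might look like this; it assumes monotonically increasing sequence numbers (no 16-bit RTP wraparound), and JitterBuffer and the depth value are illustrative.

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Fixed-depth reorder buffer for audio frames keyed by sequence number."""

    def __init__(self, depth: int = 3):
        self.depth = depth                    # frames held before playout
        self._heap: List[Tuple[int, bytes]] = []
        self._next_seq: Optional[int] = None  # next sequence number to play

    def put(self, seq: int, payload: bytes) -> None:
        if self._next_seq is not None and seq < self._next_seq:
            return  # arrived after its playout slot: drop it
        heapq.heappush(self._heap, (seq, payload))

    def pop(self) -> Optional[Tuple[int, Optional[bytes]]]:
        """Return (seq, payload); payload is None for a lost frame the decoder should conceal."""
        # Discard duplicates of frames that were already played.
        while self._heap and self._next_seq is not None and self._heap[0][0] < self._next_seq:
            heapq.heappop(self._heap)
        if len(self._heap) < self.depth:
            return None  # not enough frames buffered yet
        seq, payload = self._heap[0]
        if self._next_seq is not None and seq > self._next_seq:
            missing = self._next_seq  # gap: report loss and advance one slot
            self._next_seq += 1
            return missing, None
        heapq.heappop(self._heap)
        self._next_seq = seq + 1
        return seq, payload
```

A production buffer would also flush on session end and adapt its depth to measured jitter.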
2) ASR Service
- Whisper (self-hosted) baseline
- gRPC/WebSocket streaming transport
- Python native service
- Optional cloud provider fallback (later)
3) TTS Service
- Piper or Coqui TTS (self-hosted)
- gRPC/WebSocket streaming transport
- Python native service
- Redis cache for common phrases
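A sketch of the phrase cache, assuming the key must cover everything that changes the audio (TTS model version, voice, sample rate, text). Text is only NFKC-normalized and whitespace-collapsed, not lowercased, because casing can change pronunciation (e.g. "US" vs "us"). The cache parameter matches the get and set(..., ex=...) subset of a redis-py client; tts_cache_key, synthesize_cached, the key prefix and the TTL are hypothetical.

```python
import hashlib
import unicodedata
from typing import Callable, Optional, Protocol


class BytesCache(Protocol):
    """The subset of a redis-py client used here: get, and set with ex=TTL seconds."""
    def get(self, name: str) -> Optional[bytes]: ...
    def set(self, name: str, value: bytes, ex: Optional[int] = None) -> object: ...


def tts_cache_key(text: str, voice: str, model_version: str, sample_rate: int) -> str:
    # NFKC + collapsed whitespace; casing is kept because it can change pronunciation.
    norm = " ".join(unicodedata.normalize("NFKC", text).split())
    raw = f"{model_version}|{voice}|{sample_rate}|{norm}"
    return "tts:v1:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


def synthesize_cached(
    cache: BytesCache,
    synthesize: Callable[[str, str], bytes],  # hypothetical (text, voice) -> audio bytes
    text: str,
    voice: str,
    model_version: str,
    sample_rate: int,
    ttl_s: int = 7 * 24 * 3600,
) -> bytes:
    key = tts_cache_key(text, voice, model_version, sample_rate)
    cached = cache.get(key)
    if cached is not None:
        return cached
    audio = synthesize(text, voice)
    cache.set(key, audio, ex=ttl_s)  # entries expire instead of living forever
    return audio
```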
4) LLM Orchestrator
- Self-hosted (vLLM + open model)
- Python (FastAPI + asyncio)
- Streaming, tool calling, JSON mode
- Safety filters + prompt templates
5) Assistant Config Service
- PostgreSQL
- Python (SQLAlchemy or SQLModel)
- Versioning, publish/rollback
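Versioning with publish/rollback can be modeled as append-only versions plus a stack of published version numbers. An in-memory sketch (in PostgreSQL this would map to something like an assistant_versions table and a publish log; all names here are hypothetical), using a checksum of canonical JSON to skip no-op saves:

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class AssistantVersion:
    version: int
    config: Dict[str, Any]  # model/voice/transcriber/tools/video/transport settings
    checksum: str           # sha256 of canonical JSON


@dataclass
class Assistant:
    assistant_id: str
    versions: List[AssistantVersion] = field(default_factory=list)
    publish_history: List[int] = field(default_factory=list)  # stack of published versions

    @property
    def published(self) -> Optional[int]:
        return self.publish_history[-1] if self.publish_history else None

    def save_draft(self, config: Dict[str, Any]) -> AssistantVersion:
        canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
        checksum = hashlib.sha256(canonical.encode()).hexdigest()
        if self.versions and self.versions[-1].checksum == checksum:
            return self.versions[-1]  # unchanged config: no new version
        version = AssistantVersion(len(self.versions) + 1, copy.deepcopy(config), checksum)
        self.versions.append(version)  # append-only: existing versions are never edited
        return version

    def publish(self, version: int) -> None:
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.publish_history.append(version)

    def rollback(self) -> int:
        """Return to the previously published version."""
        if len(self.publish_history) < 2:
            raise ValueError("no earlier published version")
        self.publish_history.pop()
        return self.publish_history[-1]

    def live_config(self) -> Dict[str, Any]:
        if self.published is None:
            raise ValueError("assistant has no published version")
        return self.versions[self.published - 1].config
```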
6) Session Service
- PostgreSQL + Redis
- Python
- State machine, timeouts, events
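The session state machine can be an explicit transition table, with the silence timeout checked against the last activity time. A sketch with an injected clock (now) so it is testable; session.start and session.stop come from the WS API, while silence.timeout, the state names and the 30s default are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class State(str, Enum):
    CREATED = "created"
    ACTIVE = "active"
    ENDED = "ended"


# (current state, event) -> next state; any pair not listed is rejected.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.CREATED, "session.start"): State.ACTIVE,
    (State.CREATED, "session.stop"): State.ENDED,
    (State.ACTIVE, "session.stop"): State.ENDED,
    (State.ACTIVE, "silence.timeout"): State.ENDED,
}


@dataclass
class Session:
    session_id: str
    silence_timeout_s: float = 30.0
    state: State = State.CREATED
    last_activity: float = 0.0
    events: List[Tuple[float, str, str]] = field(default_factory=list)  # (ts, event, new state)

    def fire(self, event: str, now: float) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state.value}")
        self.state = TRANSITIONS[key]
        self.last_activity = now
        self.events.append((now, event, self.state.value))  # e.g. publish to Redis/NATS
        return self.state

    def on_speech(self, now: float) -> None:
        if self.state is State.ACTIVE:
            self.last_activity = now  # VAD detected speech: reset the silence timer

    def check_timeout(self, now: float) -> bool:
        if self.state is State.ACTIVE and now - self.last_activity >= self.silence_timeout_s:
            self.fire("silence.timeout", now)
            return True
        return False
```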
7) Tool Execution Layer
- PostgreSQL
- Python
- Auth secret vault, retry policies, tool schemas
8) Observability + Logs
- Postgres (metadata), ClickHouse (logs/metrics)
- OpenSearch for search
- Prometheus + Grafana metrics
- OpenTelemetry tracing
9) Billing + Usage Metering
- Stripe billing
- PostgreSQL
- NATS JetStream (events) + Redis counters
10) Web App (Dashboard)
- React + Next.js
- Tailwind or Radix UI
- WebRTC client + WS client; adapter-based RTC integration
- ECharts/Recharts
11) Auth + RBAC
- Keycloak (self-hosted) or custom JWT
- Org/user/role tables in Postgres
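A deny-by-default permission check over org memberships, as a sketch: the role names, permission strings, and the idea of carrying memberships in the JWT are assumptions, not decisions from this plan.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical role -> permission mapping; it would be loaded from the role tables.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "owner": frozenset({"assistant:read", "assistant:write", "assistant:publish",
                        "apikey:manage", "billing:manage"}),
    "admin": frozenset({"assistant:read", "assistant:write", "assistant:publish",
                        "apikey:manage"}),
    "developer": frozenset({"assistant:read", "assistant:write"}),
    "viewer": frozenset({"assistant:read"}),
}


@dataclass(frozen=True)
class Principal:
    user_id: str
    memberships: Dict[str, str]  # org_id -> role, e.g. taken from verified token claims


def is_allowed(principal: Principal, org_id: str, permission: str) -> bool:
    role = principal.memberships.get(org_id)
    if role is None:
        return False  # not a member of this org: deny by default
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```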
12) Public WebSocket API + SDK
- WS API: versioned schema, binary audio frames + JSON control messages
- SDKs: JS/TS first, optional Python/Go clients
- Docs: quickstart, auth flow, session lifecycle, examples
Infrastructure (Self-Hosted)
- Docker Compose → k3s (later)
- Redis Streams or NATS
- MinIO object store
- GitHub Actions + Helm or kustomize
- Self-hosted Postgres + pgbackrest backups
- Vault for secrets
Suggested MVP Sequence
- WebRTC demo + ASR/LLM/TTS streaming
- Assistant schema + versioning (web-first)
- Video capture + multimodal analysis
- Tool execution + structured outputs
- Logs + evals + public WS API + SDK
- Telephony (optional, later)
Public WebSocket API (Minimum Spec)
- Auth: API key or JWT in the initial hello message
- Core messages: session.start, session.stop, audio.append, audio.commit, video.append, transcript.delta, assistant.response, tool.call, tool.result, error
- Binary payloads: PCM/Opus frames, with metadata in the control channel
- Versioning: v1 schema with backward-compatibility rules
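A sketch of server-side frame parsing for this minimum spec: text frames are JSON control messages and binary frames are audio payloads. The envelope fields (v, type, data), the per-message version check, and the rule that unknown top-level fields are ignored are assumptions about the v1 schema, not part of the spec above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Union

PROTOCOL_VERSION = "v1"
MESSAGE_TYPES = {
    "session.start", "session.stop", "audio.append", "audio.commit", "video.append",
    "transcript.delta", "assistant.response", "tool.call", "tool.result", "error",
}


class ProtocolError(ValueError):
    pass


@dataclass
class ControlMessage:
    type: str
    data: Dict[str, Any] = field(default_factory=dict)


def parse_frame(raw: Union[str, bytes]) -> Union[ControlMessage, bytes]:
    """Text frames carry JSON control messages; binary frames carry raw audio."""
    if isinstance(raw, (bytes, bytearray)):
        return bytes(raw)  # PCM/Opus payload; its metadata arrived via a control message
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ProtocolError(f"invalid JSON: {exc.msg}") from None
    if not isinstance(obj, dict):
        raise ProtocolError("control message must be a JSON object")
    if obj.get("v") != PROTOCOL_VERSION:
        raise ProtocolError(f"unsupported protocol version {obj.get('v')!r}")
    mtype = obj.get("type")
    if not isinstance(mtype, str) or mtype not in MESSAGE_TYPES:
        raise ProtocolError(f"unknown message type {mtype!r}")
    data = obj.get("data", {})
    if not isinstance(data, dict):
        raise ProtocolError("'data' must be an object")
    # Unknown top-level fields are ignored so v1 peers tolerate additive changes.
    return ControlMessage(mtype, data)
```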
Self-Hosted DB Ops Checklist
- Postgres in Docker/k3s with persistent volumes
- Migrations: alembic or atlas
- Backups: pgbackrest nightly + on-demand
- Monitoring: postgres_exporter + alerts
RTC Adapter Contract (BYO-SFU First)
- Keep RTC pluggable; LiveKit is optional, not a core dependency
- Define adapter interface (TypeScript sketch)