Add backend api and engine
187
engine/docs/proejct_todo.md
Normal file
@@ -0,0 +1,187 @@
# OmniSense: 12-Week Sprint Board + Tech Stack (Python Backend) — TODO

## Scope

- [ ] Build a realtime AI SaaS (OmniSense) focused on web-first audio + video with WebSocket + WebRTC endpoints
- [ ] Deliver assistant builder, tool execution, observability, and evals; telephony is optional and comes later
- [ ] Keep scope aligned to a 2-person team and self-hosted services

---

## Sprint Board (12 weeks, 2-week sprints)

Team assumption: 2 engineers. Scope is prioritized to web-first audio + video, with BYO-SFU adapters.

### Sprint 1 (Weeks 1–2) — Realtime Core MVP (WebSocket + WebRTC Audio)

- Deliverables
  - [ ] WebSocket transport: audio in/out streaming (1:1)
  - [ ] WebRTC transport: audio in/out streaming (1:1)
  - [ ] Adapter contract wired into runtime (transport-agnostic session core)
  - [ ] ASR → LLM → TTS pipeline, streaming in both directions
  - [ ] Basic session state (start/stop, silence timeout)
  - [ ] Transcript persistence
- Acceptance criteria
  - [ ] < 1.5s median round-trip latency for short responses
  - [ ] Stable streaming for a 10+ minute session

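
The median round-trip criterion for Sprint 1 could be checked with a small helper like the sketch below. The function names, the `MEDIAN_BUDGET_S` constant, and the sample values are illustrative assumptions, not part of the plan; only the 1.5s budget comes from the acceptance criteria.

```python
import statistics

# Budget taken from the Sprint 1 acceptance criterion; the name is hypothetical.
MEDIAN_BUDGET_S = 1.5


def median_round_trip(samples_s: list[float]) -> float:
    """Return the median of the round-trip samples, in seconds."""
    if not samples_s:
        raise ValueError("need at least one latency sample")
    return statistics.median(samples_s)


def meets_latency_budget(samples_s: list[float], budget_s: float = MEDIAN_BUDGET_S) -> bool:
    """True when the median round-trip is strictly under the budget."""
    return median_round_trip(samples_s) < budget_s


# Made-up sample data: a single slow outlier does not fail a median-based check.
samples = [0.9, 1.1, 1.2, 1.4, 4.0]
print(meets_latency_budget(samples))
```

Using the median rather than the mean keeps one slow response from failing the check, which matches how the criterion is worded.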
### Sprint 2 (Weeks 3–4) — Video + Realtime UX

- Deliverables
  - [ ] WebRTC video capture + streaming (assistant can “see” frames)
  - [ ] WebSocket video streaming for local/dev mode
  - [ ] Low-latency UI: push-to-talk, live captions, speaking indicator
  - [ ] Recording + transcript storage (web sessions)
- Acceptance criteria
  - [ ] Video analysis: < 2.5s end-to-end latency
  - [ ] Audio quality acceptable (no clipping, jitter handled)

### Sprint 3 (Weeks 5–6) — Assistant Builder v1

- Deliverables
  - [ ] Assistant schema + versioning
  - [ ] UI: Model/Voice/Transcriber/Tools/Video/Transport tabs
  - [ ] “Test/Chat/Talk to Assistant” (web)
- Acceptance criteria
  - [ ] Create/publish assistant and run a live web session
  - [ ] All config changes tracked by version

### Sprint 4 (Weeks 7–8) — Tooling + Structured Outputs

- Deliverables
  - [ ] Tool registry + custom HTTP tools
  - [ ] Tool auth secrets management
  - [ ] Structured outputs (JSON extraction)
- Acceptance criteria
  - [ ] Tool calls executed with retries/timeouts
  - [ ] Structured JSON stored per call/session

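
One way to meet the "retries/timeouts" criterion is a per-attempt timeout with exponential backoff, sketched below with plain `asyncio`. `call_with_retries`, its parameters, and the `flaky_tool` stand-in are hypothetical names chosen for illustration; the real tool layer may use a different policy.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_retries(
    tool: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    timeout_s: float = 5.0,
    backoff_s: float = 0.5,
) -> T:
    """Run `tool` with a per-attempt timeout, retrying on transient failures."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(tool(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff between attempts: 1x, 2x, 4x, ...
                await asyncio.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError(f"tool failed after {attempts} attempts") from last_error


async def _demo() -> str:
    calls = {"n": 0}

    async def flaky_tool() -> str:
        # Stand-in for an HTTP tool: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("simulated transient failure")
        return "ok"

    return await call_with_retries(flaky_tool, backoff_s=0.0)


result = asyncio.run(_demo())
print(result)
```

Only timeouts and connection errors are retried here; whether other failures (for example HTTP 5xx responses) should be retried is a policy decision this sketch leaves open.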
### Sprint 5 (Weeks 9–10) — Observability + QA + Dev Platform

- Deliverables
  - [ ] Session logs + chat logs + media logs
  - [ ] Evals engine + test suites
  - [ ] Basic analytics dashboard
  - [ ] Public WebSocket API spec + message schema
  - [ ] JS/TS SDK (connect, send audio/video, receive transcripts)
- Acceptance criteria
  - [ ] Reproducible test suite runs
  - [ ] Log filters by assistant/time/status
  - [ ] SDK demo app runs end-to-end

### Sprint 6 (Weeks 11–12) — SaaS Hardening

- Deliverables
  - [ ] Org/RBAC + API keys + rate limits
  - [ ] Usage metering + credits
  - [ ] Stripe billing integration
  - [ ] Self-hosted DB ops (migrations, backup/restore, monitoring)
- Acceptance criteria
  - [ ] Metered usage per org
  - [ ] Credits decrement correctly
  - [ ] Optional telephony spike documented (build deferred)
  - [ ] Enterprise adapter guide published (BYO-SFU)

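
"Credits decrement correctly" mostly means two things: a charge never drives a balance negative, and a rejected charge leaves the balance untouched. The in-memory sketch below illustrates that invariant. `CreditLedger`, `InsufficientCredits`, and the integer credit unit are assumptions; a production version would presumably enforce the same rule inside a Postgres transaction rather than with a thread lock.

```python
import threading
from dataclasses import dataclass, field


class InsufficientCredits(Exception):
    """Raised when an org tries to spend more credits than it holds."""


@dataclass
class CreditLedger:
    balances: dict[str, int] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def grant(self, org_id: str, credits: int) -> int:
        """Add credits to an org and return the new balance."""
        if credits < 0:
            raise ValueError("credits must be non-negative")
        with self._lock:
            self.balances[org_id] = self.balances.get(org_id, 0) + credits
            return self.balances[org_id]

    def charge(self, org_id: str, credits: int) -> int:
        """Deduct credits atomically; the balance never goes below zero."""
        if credits < 0:
            raise ValueError("credits must be non-negative")
        with self._lock:
            balance = self.balances.get(org_id, 0)
            if balance < credits:
                raise InsufficientCredits(f"{org_id} has {balance}, needs {credits}")
            self.balances[org_id] = balance - credits
            return self.balances[org_id]


ledger = CreditLedger()
ledger.grant("org-demo", 100)
remaining = ledger.charge("org-demo", 30)
print(remaining)
```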
---

## Tech Stack by Service (Self-Hosted, Web-First)

### 1) Transport Gateway (Realtime)

- [ ] WebRTC (browser) + WebSocket (lightweight/dev) protocols
- [ ] BYO-SFU adapter (enterprise) + LiveKit optional adapter + WS transport server
- [ ] Python core (FastAPI + asyncio) + Node.js mediasoup adapters when needed
- [ ] Media: Opus/VP8, jitter buffer, VAD, echo cancellation
- [ ] Storage: S3-compatible (MinIO) for recordings

### 2) ASR Service

- [ ] Whisper (self-hosted) baseline
- [ ] gRPC/WebSocket streaming transport
- [ ] Python native service
- [ ] Optional cloud provider fallback (later)

### 3) TTS Service

- [ ] Piper or Coqui TTS (self-hosted)
- [ ] gRPC/WebSocket streaming transport
- [ ] Python native service
- [ ] Redis cache for common phrases

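
The phrase cache for TTS needs a key that is stable across trivial text differences but distinct per voice and sample rate. The sketch below shows one possible keying scheme. The `tts:` prefix, the normalization rule, and the function names are assumptions, and a plain `dict` stands in for Redis so the example stays self-contained.

```python
import hashlib
from typing import Callable, MutableMapping


def cache_key(text: str, voice: str, sample_rate_hz: int) -> str:
    """Build a cache key from normalized text plus the synthesis settings."""
    normalized = " ".join(text.lower().split())  # collapse case and whitespace
    raw = f"{voice}|{sample_rate_hz}|{normalized}".encode("utf-8")
    return "tts:" + hashlib.sha256(raw).hexdigest()


def synthesize_cached(
    text: str,
    voice: str,
    sample_rate_hz: int,
    cache: MutableMapping[str, bytes],
    synthesize: Callable[[str, str, int], bytes],
) -> bytes:
    """Return cached audio when present; otherwise synthesize and store it."""
    key = cache_key(text, voice, sample_rate_hz)
    audio = cache.get(key)
    if audio is None:
        audio = synthesize(text, voice, sample_rate_hz)
        cache[key] = audio
    return audio


# Demo with a fake synthesizer that counts how often it is invoked.
synth_calls = []


def fake_synthesize(text: str, voice: str, sample_rate_hz: int) -> bytes:
    synth_calls.append(text)
    return b"fake-audio:" + text.lower().encode("utf-8")


cache: dict[str, bytes] = {}
first = synthesize_cached("Hello   there", "demo-voice", 22050, cache, fake_synthesize)
second = synthesize_cached("hello there", "demo-voice", 22050, cache, fake_synthesize)
print(len(synth_calls))
```

Lowercasing before hashing is a trade-off: it raises the hit rate but assumes the voice renders "Hello" and "hello" identically, which may not hold for every TTS model.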
### 4) LLM Orchestrator

- [ ] Self-hosted (vLLM + open model)
- [ ] Python (FastAPI + asyncio)
- [ ] Streaming, tool calling, JSON mode
- [ ] Safety filters + prompt templates

### 5) Assistant Config Service

- [ ] PostgreSQL
- [ ] Python (SQLAlchemy or SQLModel)
- [ ] Versioning, publish/rollback

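
A minimal sketch of the versioning and publish/rollback behaviour, assuming append-only versions and a single "published" pointer. `AssistantConfigStore` and its methods are hypothetical, and treating rollback as "step back one version" is an assumption; the real service would persist this in PostgreSQL.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AssistantConfigStore:
    versions: list[dict[str, Any]] = field(default_factory=list)
    published: Optional[int] = None  # 1-based version number, None if unpublished

    def save(self, config: dict[str, Any]) -> int:
        """Append a new version (never edit in place) and return its number."""
        self.versions.append(dict(config))
        return len(self.versions)

    def publish(self, version: int) -> None:
        """Point the live config at an existing version."""
        if not 1 <= version <= len(self.versions):
            raise KeyError(f"unknown version {version}")
        self.published = version

    def rollback(self) -> int:
        """Move the published pointer back one version; history is kept."""
        if self.published is None or self.published <= 1:
            raise ValueError("nothing to roll back to")
        self.published -= 1
        return self.published

    def live_config(self) -> dict[str, Any]:
        """Return a copy of the currently published config."""
        if self.published is None:
            raise LookupError("no version has been published")
        return dict(self.versions[self.published - 1])


store = AssistantConfigStore()
v1 = store.save({"model": "model-a", "voice": "voice-a"})
v2 = store.save({"model": "model-b", "voice": "voice-a"})
store.publish(v2)
store.rollback()
print(store.live_config()["model"])
```

Because versions are only ever appended, every config change stays traceable by version number, which is what the Sprint 3 acceptance criterion asks for.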
### 6) Session Service

- [ ] PostgreSQL + Redis
- [ ] Python
- [ ] State machine, timeouts, events

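
The session state machine with a silence timeout could look roughly like the sketch below. The three states, the `tick` method, and the event strings are assumptions made for illustration. Timestamps are passed in explicitly so the logic is testable; a real service would likely feed it `time.monotonic()` values.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "idle"
    ACTIVE = "active"
    ENDED = "ended"


class Session:
    def __init__(self, silence_timeout_s: float = 10.0) -> None:
        self.state = SessionState.IDLE
        self.silence_timeout_s = silence_timeout_s
        self._last_audio_at = 0.0
        self.events: list[str] = []

    def start(self, now: float) -> None:
        """IDLE -> ACTIVE; the silence window starts counting from `now`."""
        if self.state is not SessionState.IDLE:
            raise RuntimeError(f"cannot start from {self.state.value}")
        self.state = SessionState.ACTIVE
        self._last_audio_at = now
        self.events.append("session.start")

    def on_audio(self, now: float) -> None:
        """Any inbound audio resets the silence window."""
        if self.state is SessionState.ACTIVE:
            self._last_audio_at = now

    def tick(self, now: float) -> None:
        """Called periodically; ends the session after prolonged silence."""
        if self.state is SessionState.ACTIVE:
            if now - self._last_audio_at >= self.silence_timeout_s:
                self.stop("silence_timeout")

    def stop(self, reason: str = "client") -> None:
        """ACTIVE -> ENDED; stopping a non-active session is a no-op."""
        if self.state is SessionState.ACTIVE:
            self.state = SessionState.ENDED
            self.events.append(f"session.stop:{reason}")


session = Session(silence_timeout_s=10.0)
session.start(now=0.0)
session.on_audio(now=5.0)
session.tick(now=12.0)   # 7s of silence: still active
session.tick(now=15.0)   # 10s of silence: timeout fires
print(session.state.value, session.events)
```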
### 7) Tool Execution Layer

- [ ] PostgreSQL
- [ ] Python
- [ ] Auth secret vault, retry policies, tool schemas

### 8) Observability + Logs

- [ ] Postgres (metadata), ClickHouse (logs/metrics)
- [ ] OpenSearch for search
- [ ] Prometheus + Grafana metrics
- [ ] OpenTelemetry tracing

### 9) Billing + Usage Metering

- [ ] Stripe billing
- [ ] PostgreSQL
- [ ] NATS JetStream (events) + Redis counters

### 10) Web App (Dashboard)

- [ ] React + Next.js
- [ ] Tailwind or Radix UI
- [ ] WebRTC client + WS client; adapter-based RTC integration
- [ ] ECharts/Recharts

### 11) Auth + RBAC

- [ ] Keycloak (self-hosted) or custom JWT
- [ ] Org/user/role tables in Postgres

### 12) Public WebSocket API + SDK

- [ ] WS API: versioned schema, binary audio frames + JSON control messages
- [ ] SDKs: JS/TS first, optional Python/Go clients
- [ ] Docs: quickstart, auth flow, session lifecycle, examples

---

## Infrastructure (Self-Hosted)

- [ ] Docker Compose → k3s (later)
- [ ] Redis Streams or NATS
- [ ] MinIO object store
- [ ] GitHub Actions + Helm or kustomize
- [ ] Self-hosted Postgres + pgbackrest backups
- [ ] Vault for secrets

---

## Suggested MVP Sequence

- [ ] WebRTC demo + ASR/LLM/TTS streaming
- [ ] Assistant schema + versioning (web-first)
- [ ] Video capture + multimodal analysis
- [ ] Tool execution + structured outputs
- [ ] Logs + evals + public WS API + SDK
- [ ] Telephony (optional, later)

---

## Public WebSocket API (Minimum Spec)

- [ ] Auth: API key or JWT in initial `hello` message
- [ ] Core messages: `session.start`, `session.stop`, `audio.append`, `audio.commit`, `video.append`, `transcript.delta`, `assistant.response`, `tool.call`, `tool.result`, `error`
- [ ] Binary payloads: PCM/Opus frames with metadata in control channel
- [ ] Versioning: `v1` schema with backward compatibility rules

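
The JSON control messages could share a small versioned envelope, sketched below. The message type names come from the list above; the envelope field names (`version`, `type`, `payload`), the helper names, and the strict version check are assumptions rather than a settled wire format.

```python
import json
from typing import Any

SCHEMA_VERSION = "v1"

# Message types listed in the minimum spec, plus the initial `hello`.
MESSAGE_TYPES = frozenset({
    "hello", "session.start", "session.stop", "audio.append", "audio.commit",
    "video.append", "transcript.delta", "assistant.response",
    "tool.call", "tool.result", "error",
})


def encode_control(msg_type: str, payload: dict[str, Any]) -> str:
    """Serialize a control message into the (assumed) versioned envelope."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    return json.dumps({"version": SCHEMA_VERSION, "type": msg_type, "payload": payload})


def decode_control(raw: str) -> tuple[str, dict[str, Any]]:
    """Parse and validate a control message; return (type, payload)."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("control message must be a JSON object")
    if msg.get("version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema version: {msg.get('version')!r}")
    msg_type = msg.get("type")
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type!r}")
    return msg_type, msg.get("payload", {})


wire = encode_control("session.start", {"assistant_id": "demo-assistant"})
decoded_type, decoded_payload = decode_control(wire)
print(decoded_type, decoded_payload)
```

Rejecting unknown versions outright is the simplest rule to start from; the "backward compatibility rules" item would eventually replace it with something more permissive.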
---

## Self-Hosted DB Ops Checklist

- [ ] Postgres in Docker/k3s with persistent volumes
- [ ] Migrations: `alembic` or `atlas`
- [ ] Backups: `pgbackrest` nightly + on-demand
- [ ] Monitoring: `postgres_exporter` + alerts

---

## RTC Adapter Contract (BYO-SFU First)

- [ ] Keep RTC pluggable; LiveKit optional, not core dependency
- [ ] Define adapter interface (TypeScript sketch)