py-active-call/docs/proejct_todo.md
2026-02-03 12:05:09 +08:00

OmniSense: 12-Week Sprint Board + Tech Stack (Python Backend) — TODO

Scope

  • Build a realtime AI SaaS (OmniSense) focused on web-first audio + video with WebSocket + WebRTC endpoints
  • Deliver an assistant builder, tool execution, observability, and evals; telephony is optional and comes later
  • Keep scope aligned to a 2-person team and self-hosted services

Sprint Board (12 weeks, 2-week sprints)

Team assumption: 2 engineers. Scope prioritized to web-first audio + video, with BYO-SFU adapters.

Sprint 1 (Weeks 1-2) — Realtime Core MVP (WebSocket + WebRTC Audio)

  • Deliverables
    • WebSocket transport: audio in/out streaming (1:1)
    • WebRTC transport: audio in/out streaming (1:1)
    • Adapter contract wired into runtime (transport-agnostic session core)
    • ASR → LLM → TTS pipeline, streaming both directions
    • Basic session state (start/stop, silence timeout)
    • Transcript persistence
  • Acceptance criteria
    • < 1.5s median round-trip for short responses
    • Stable streaming for 10+ minute session
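
One way to picture the ASR → LLM → TTS deliverable is as chained async generators, where each stage starts emitting before the previous one has finished. The sketch below is a minimal illustration with stand-in stages; `mic`, `fake_asr`, `fake_llm`, `fake_tts`, and `run_pipeline` are hypothetical names, not part of the project.

```python
import asyncio
from typing import AsyncIterator, Iterable, List


async def mic(chunks: Iterable[bytes]) -> AsyncIterator[bytes]:
    """Stand-in audio source: yields chunks, ceding control between them."""
    for chunk in chunks:
        yield chunk
        await asyncio.sleep(0)


async def fake_asr(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in ASR: emits one partial transcript per audio chunk."""
    async for chunk in audio:
        yield f"word{len(chunk)}"


async def fake_llm(transcript: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in LLM: streams one response token per transcript piece."""
    async for text in transcript:
        yield text.upper()


async def fake_tts(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stand-in TTS: turns each token into an 'audio' frame."""
    async for token in tokens:
        yield token.encode("utf-8")


async def run_pipeline(chunks: Iterable[bytes]) -> List[bytes]:
    """Chain the stages; frames flow out as soon as each stage yields."""
    frames = []
    async for frame in fake_tts(fake_llm(fake_asr(mic(chunks)))):
        frames.append(frame)
    return frames
```

Because every stage is a generator, the real ASR, LLM, and TTS services could be swapped in behind the same signatures without changing `run_pipeline`.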

Sprint 2 (Weeks 3-4) — Video + Realtime UX

  • Deliverables
    • WebRTC video capture + streaming (assistant can “see” frames)
    • WebSocket video streaming for local/dev mode
    • Low-latency UI: push-to-talk, live captions, speaking indicator
    • Recording + transcript storage (web sessions)
  • Acceptance criteria
    • Video analysis: < 2.5s end-to-end latency
    • Audio quality acceptable (no clipping, jitter handled)

Sprint 3 (Weeks 5-6) — Assistant Builder v1

  • Deliverables
    • Assistant schema + versioning
    • UI: Model/Voice/Transcriber/Tools/Video/Transport tabs
    • “Test/Chat/Talk to Assistant” (web)
  • Acceptance criteria
    • Create/publish assistant and run a live web session
    • All config changes tracked by version

Sprint 4 (Weeks 7-8) — Tooling + Structured Outputs

  • Deliverables
    • Tool registry + custom HTTP tools
    • Tool auth secrets management
    • Structured outputs (JSON extraction)
  • Acceptance criteria
    • Tool calls executed with retries/timeouts
    • Structured JSON stored per call/session
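
The "retries/timeouts" criterion could be met with a small wrapper around each tool call. The following is a minimal sketch using only `asyncio`; `call_with_retries` and its parameters (`attempts`, `timeout_s`, `backoff_s`) are assumed names and defaults, not an existing API.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_retries(
    fn: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    timeout_s: float = 5.0,
    backoff_s: float = 0.1,
) -> T:
    """Run fn with a per-attempt timeout, retrying with exponential backoff."""
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            # wait_for cancels the call and raises TimeoutError on expiry.
            return await asyncio.wait_for(fn(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(backoff_s * 2 ** attempt)
    raise RuntimeError(f"tool call failed after {attempts} attempts") from last_error
```

Only timeouts and connection errors are retried here; whether other failures (for example HTTP 5xx responses) should also be retried is a policy decision left open.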

Sprint 5 (Weeks 9-10) — Observability + QA + Dev Platform

  • Deliverables
    • Session logs + chat logs + media logs
    • Evals engine + test suites
    • Basic analytics dashboard
    • Public WebSocket API spec + message schema
    • JS/TS SDK (connect, send audio/video, receive transcripts)
  • Acceptance criteria
    • Reproducible test suite runs
    • Log filters by assistant/time/status
    • SDK demo app runs end-to-end

Sprint 6 (Weeks 11-12) — SaaS Hardening

  • Deliverables
    • Org/RBAC + API keys + rate limits
    • Usage metering + credits
    • Stripe billing integration
    • Self-hosted DB ops (migrations, backup/restore, monitoring)
  • Acceptance criteria
    • Metered usage per org
    • Credits decrement correctly
    • Optional telephony spike documented (defer build)
    • Enterprise adapter guide published (BYO-SFU)
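
To make "credits decrement correctly" concrete, here is a minimal in-memory sketch of per-org metering. The names (`OrgLedger`, `charge`, `InsufficientCredits`) and the one-credit-per-second rate are assumptions for illustration; a real implementation would need an atomic update in PostgreSQL or Redis.

```python
from dataclasses import dataclass


class InsufficientCredits(Exception):
    """Raised when a charge would take the balance below zero."""


@dataclass
class OrgLedger:
    credits: int            # integer units avoid floating-point drift
    used_seconds: int = 0   # metered usage for this org


def charge(ledger: OrgLedger, seconds: int, credits_per_second: int = 1) -> int:
    """Meter usage and decrement credits; returns the remaining balance."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    cost = seconds * credits_per_second
    if cost > ledger.credits:
        # Reject the whole charge rather than going negative.
        raise InsufficientCredits(f"need {cost}, have {ledger.credits}")
    ledger.credits -= cost
    ledger.used_seconds += seconds
    return ledger.credits
```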

Tech Stack by Service (Self-Hosted, Web-First)

1) Transport Gateway (Realtime)

  • WebRTC (browser) + WebSocket (lightweight/dev) protocols
  • BYO-SFU adapter (enterprise) + LiveKit optional adapter + WS transport server
  • Python core (FastAPI + asyncio) + Node.js mediasoup adapters when needed
  • Media: Opus/VP8, jitter buffer, VAD, echo cancellation
  • Storage: S3-compatible (MinIO) for recordings

2) ASR Service

  • Whisper (self-hosted) baseline
  • gRPC/WebSocket streaming transport
  • Python native service
  • Optional cloud provider fallback (later)

3) TTS Service

  • Piper or Coqui TTS (self-hosted)
  • gRPC/WebSocket streaming transport
  • Python native service
  • Redis cache for common phrases

4) LLM Orchestrator

  • Self-hosted (vLLM + open model)
  • Python (FastAPI + asyncio)
  • Streaming, tool calling, JSON mode
  • Safety filters + prompt templates

5) Assistant Config Service

  • PostgreSQL
  • Python (SQLAlchemy or SQLModel)
  • Versioning, publish/rollback
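
Versioning with publish/rollback can be modelled as an append-only list of configs plus a history of published version numbers. This is a sketch under that assumption; `AssistantVersions` and its methods are hypothetical, and persistence in PostgreSQL is left out.

```python
from typing import Dict, List, Optional


class AssistantVersions:
    """Append-only config versions with a publish history for rollback."""

    def __init__(self) -> None:
        self._versions: List[Dict] = []
        self._publish_history: List[int] = []

    def save(self, config: Dict) -> int:
        """Store a shallow copy of the config; returns its 1-based version."""
        self._versions.append(dict(config))
        return len(self._versions)

    def publish(self, version: int) -> None:
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"unknown version {version}")
        self._publish_history.append(version)

    def rollback(self) -> int:
        """Re-publish the previously published version; returns its number."""
        if len(self._publish_history) < 2:
            raise ValueError("nothing to roll back to")
        self._publish_history.pop()
        return self._publish_history[-1]

    @property
    def published(self) -> Optional[Dict]:
        if not self._publish_history:
            return None
        return self._versions[self._publish_history[-1] - 1]
```

Saved versions are never mutated or deleted, which is what makes "all config changes tracked by version" straightforward to audit.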

6) Session Service

  • PostgreSQL + Redis
  • Python
  • State machine, timeouts, events
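
A minimal sketch of the session state machine with a silence timeout follows. The three states, the event names, and the `Session` class are assumptions chosen for illustration; timestamps are passed in explicitly so the logic can be tested without a real clock.

```python
from enum import Enum
from typing import Optional


class SessionState(Enum):
    IDLE = "idle"
    ACTIVE = "active"
    ENDED = "ended"


# Allowed (state, event) -> next-state transitions; anything else is rejected.
TRANSITIONS = {
    (SessionState.IDLE, "start"): SessionState.ACTIVE,
    (SessionState.ACTIVE, "stop"): SessionState.ENDED,
    (SessionState.ACTIVE, "silence_timeout"): SessionState.ENDED,
}


class Session:
    def __init__(self, silence_timeout_s: float = 10.0) -> None:
        self.state = SessionState.IDLE
        self.silence_timeout_s = silence_timeout_s
        self.last_audio_at: Optional[float] = None

    def handle(self, event: str) -> SessionState:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state

    def start(self, now: float) -> SessionState:
        state = self.handle("start")
        self.last_audio_at = now  # the silence window starts at session start
        return state

    def on_audio(self, now: float) -> None:
        if self.state is SessionState.ACTIVE:
            self.last_audio_at = now

    def tick(self, now: float) -> SessionState:
        """Call periodically; ends the session once silence exceeds the limit."""
        if (
            self.state is SessionState.ACTIVE
            and self.last_audio_at is not None
            and now - self.last_audio_at >= self.silence_timeout_s
        ):
            self.handle("silence_timeout")
        return self.state
```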

7) Tool Execution Layer

  • PostgreSQL
  • Python
  • Auth secret vault, retry policies, tool schemas

8) Observability + Logs

  • Postgres (metadata), ClickHouse (logs/metrics)
  • OpenSearch for search
  • Prometheus + Grafana metrics
  • OpenTelemetry tracing

9) Billing + Usage Metering

  • Stripe billing
  • PostgreSQL
  • NATS JetStream (events) + Redis counters

10) Web App (Dashboard)

  • React + Next.js
  • Tailwind or Radix UI
  • WebRTC client + WS client; adapter-based RTC integration
  • ECharts/Recharts

11) Auth + RBAC

  • Keycloak (self-hosted) or custom JWT
  • Org/user/role tables in Postgres

12) Public WebSocket API + SDK

  • WS API: versioned schema, binary audio frames + JSON control messages
  • SDKs: JS/TS first, optional Python/Go clients
  • Docs: quickstart, auth flow, session lifecycle, examples

Infrastructure (Self-Hosted)

  • Docker Compose → k3s (later)
  • Redis Streams or NATS
  • MinIO object store
  • GitHub Actions + Helm or kustomize
  • Self-hosted Postgres + pgbackrest backups
  • Vault for secrets

Suggested MVP Sequence

  • WebRTC demo + ASR/LLM/TTS streaming
  • Assistant schema + versioning (web-first)
  • Video capture + multimodal analysis
  • Tool execution + structured outputs
  • Logs + evals + public WS API + SDK
  • Telephony (optional, later)

Public WebSocket API (Minimum Spec)

  • Auth: API key or JWT in initial hello message
  • Core messages: session.start, session.stop, audio.append, audio.commit, video.append, transcript.delta, assistant.response, tool.call, tool.result, error
  • Binary payloads: PCM/Opus frames with metadata in control channel
  • Versioning: v1 schema with backward compatibility rules
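
The JSON control messages could share a small versioned envelope. The sketch below uses the core message types listed above; the envelope field names (`v`, `type`, `payload`) and the helper functions are assumptions, since the spec does not define them yet.

```python
import json
from typing import Any, Dict

SCHEMA_VERSION = "v1"

# Core message types from the minimum spec.
MESSAGE_TYPES = frozenset({
    "session.start", "session.stop",
    "audio.append", "audio.commit", "video.append",
    "transcript.delta", "assistant.response",
    "tool.call", "tool.result", "error",
})


def encode_control(msg_type: str, payload: Dict[str, Any]) -> str:
    """Serialize a control message into the (assumed) versioned envelope."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    return json.dumps({"v": SCHEMA_VERSION, "type": msg_type, "payload": payload})


def decode_control(raw: str) -> Dict[str, Any]:
    """Parse and validate a control message; raises ValueError if invalid."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("control message must be a JSON object")
    if msg.get("v") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema version: {msg.get('v')!r}")
    if msg.get("type") not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg.get('type')!r}")
    return msg
```

Binary audio and video frames would travel separately from these text messages, with their metadata (codec, sample rate) carried in a control message such as `session.start`.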

Self-Hosted DB Ops Checklist

  • Postgres in Docker/k3s with persistent volumes
  • Migrations: alembic or atlas
  • Backups: pgbackrest nightly + on-demand
  • Monitoring: postgres_exporter + alerts

RTC Adapter Contract (BYO-SFU First)

  • Keep RTC pluggable; LiveKit optional, not core dependency
  • Define adapter interface (TypeScript sketch)
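
As an illustration of what a pluggable RTC contract might look like on the Python runtime side (this is not the TypeScript interface referred to above; every method name here is an assumption), a structural `Protocol` lets any SFU adapter plug in without inheriting from a base class. `LoopbackAdapter` is a hypothetical in-memory implementation for tests.

```python
import asyncio
from typing import AsyncIterator, Optional, Protocol, runtime_checkable


@runtime_checkable
class RtcAdapter(Protocol):
    """Hypothetical transport contract the session core would depend on."""

    async def connect(self, session_id: str) -> None: ...
    def incoming_audio(self) -> AsyncIterator[bytes]: ...
    async def send_audio(self, frame: bytes) -> None: ...
    async def close(self) -> None: ...


class LoopbackAdapter:
    """In-memory adapter that echoes sent frames back as incoming audio."""

    def __init__(self) -> None:
        self._queue: Optional[asyncio.Queue] = None

    async def connect(self, session_id: str) -> None:
        # Create the queue inside the running event loop.
        self._queue = asyncio.Queue()

    async def send_audio(self, frame: bytes) -> None:
        assert self._queue is not None, "connect() must be called first"
        await self._queue.put(frame)

    async def close(self) -> None:
        assert self._queue is not None, "connect() must be called first"
        await self._queue.put(None)  # sentinel marks end of stream

    async def incoming_audio(self) -> AsyncIterator[bytes]:
        assert self._queue is not None, "connect() must be called first"
        while True:
            frame = await self._queue.get()
            if frame is None:
                return
            yield frame
```

A LiveKit or mediasoup adapter would satisfy the same contract, which keeps either one optional rather than a core dependency.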