VibeCheck Architecture

Version: 0.1.0 | Last Updated: 2026-02-06

Overview

VibeCheck is a privacy-first liveness detection system built entirely for client-side execution. The architecture is designed around a core principle: no biometric data ever leaves the user's browser.

Design Principles

  1. Privacy by Architecture: Biometric processing is architecturally isolated to the client
  2. Open Source Transparency: All processing logic is auditable
  3. Minimal Data Transfer: Only boolean results cross the network boundary
  4. Progressive Enhancement: Works without server-side components
  5. Framework Agnostic Core: Vanilla TypeScript core with framework adapters

System Architecture

┌────────────────────────────────────────────────────────────┐
│                      User's Browser                         │
│                     (Client-Side Only)                      │
├────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────┐         ┌─────────────────┐              │
│  │   Webcam     │────────▶│  MediaPipe      │              │
│  │(getUserMedia)│         │  Face Landmarker│              │
│  └──────────────┘         └────────┬────────┘              │
│                                     │                       │
│                                     ▼                       │
│                          ┌─────────────────┐               │
│                          │  Liveness       │               │
│                          │  Detection      │               │
│                          │  Engine         │               │
│                          └────────┬────────┘               │
│                                   │                        │
│         ┌─────────────────────────┴─────────────┐          │
│         │                                       │          │
│         ▼                                       ▼          │
│  ┌─────────────┐                      ┌─────────────┐     │
│  │  Blink      │                      │  Head       │     │
│  │  Detector   │                      │  Movement   │     │
│  └──────┬──────┘                      └──────┬──────┘     │
│         │                                    │            │
│         │            ┌───────────┐           │            │
│         └───────────▶│  Result   │◀──────────┘            │
│                      │  Computer │                        │
│                      └─────┬─────┘                        │
│                            │                              │
│                            ▼                              │
│                    { isLive: boolean,                     │
│                      confidence: number,                  │
│                      timestamp: number }                  │
│                                                            │
└────────────────────────────┬───────────────────────────────┘
                             │
                             │ HTTPS (result only)
                             │ ❌ No video
                             │ ❌ No images
                             │ ❌ No biometric data
                             │
                             ▼
                ┌────────────────────────┐
                │   Your Server          │
                │   (Optional)           │
                ├────────────────────────┤
                │ • Validate timestamp   │
                │ • Rate limiting        │
                │ • Store result         │
                │ • Proceed with flow    │
                └────────────────────────┘

Core Components

1. Core Library (@lilithftw/vibecheck-core)

The foundation of VibeCheck, providing framework-agnostic liveness detection.

Key Classes:

LivenessDetector

The main detection engine that orchestrates the liveness check process.

// Public API surface (declaration only; method bodies omitted)
declare class LivenessDetector {
  constructor(options?: LivenessOptions);

  // Initialize MediaPipe and webcam
  initialize(): Promise<void>;

  // Start the liveness detection check
  check(): Promise<LivenessResult>;

  // Clean up resources
  cleanup(): void;
}

Responsibilities:

  • MediaPipe initialization and lifecycle management
  • Webcam stream acquisition and management
  • Orchestration of detection algorithms
  • Result computation and validation
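A minimal usage sketch of the lifecycle above, assuming the three methods behave as declared. `LivenessDetectorLike` and `runCheck` are hypothetical names introduced here so the sketch runs without the package installed, and it assumes `cleanup()` is safe to call even when `initialize()` fails.

```typescript
// Result shape taken from the system architecture diagram.
interface LivenessResult {
  isLive: boolean;
  confidence: number;
  timestamp: number;
}

// Structural stand-in for LivenessDetector (hypothetical, for illustration).
interface LivenessDetectorLike {
  initialize(): Promise<void>;
  check(): Promise<LivenessResult>;
  cleanup(): void;
}

// Runs one check and always releases camera and model resources,
// whether the check succeeds or throws.
async function runCheck(detector: LivenessDetectorLike): Promise<LivenessResult> {
  try {
    await detector.initialize();
    return await detector.check();
  } finally {
    detector.cleanup();
  }
}
```

Wrapping the calls in `try`/`finally` keeps the camera from staying active if the user denies permission or the model fails to load.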

BlinkDetector

Specialized module for detecting eye blinks using facial landmarks.

Algorithm:

  1. Track eye aspect ratio (EAR) over time
  2. Detect EAR threshold crossings (open → closed → open)
  3. Validate blink duration (too fast = invalid, too slow = invalid)
  4. Count valid blinks within time window
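The steps above can be sketched as follows. This is an illustrative sketch, not the library's implementation: it uses the standard six-point EAR formula, and the thresholds, the `BlinkCounter` class, and the point ordering are assumptions. The time-window check from step 4 is left out for brevity.

```typescript
interface Point { x: number; y: number; }

const dist = (a: Point, b: Point): number => Math.hypot(a.x - b.x, a.y - b.y);

// EAR over six eye contour points ordered p1..p6: p1/p4 are the horizontal
// corners, (p2, p6) and (p3, p5) are the vertical pairs.
function eyeAspectRatio(p: [Point, Point, Point, Point, Point, Point]): number {
  const vertical = dist(p[1], p[5]) + dist(p[2], p[4]);
  const horizontal = dist(p[0], p[3]);
  return horizontal === 0 ? 0 : vertical / (2 * horizontal);
}

// Hypothetical thresholds, for illustration only.
const EAR_CLOSED = 0.2;   // below this the eye counts as closed
const MIN_BLINK_MS = 50;  // shorter closures are treated as noise
const MAX_BLINK_MS = 500; // longer closures are not counted as blinks

class BlinkCounter {
  private closedSince: number | null = null;
  count = 0;

  // Feed one EAR sample; returns true when a valid blink has just completed.
  update(ear: number, timestampMs: number): boolean {
    if (ear < EAR_CLOSED) {
      // open -> closed transition: remember when the eye closed
      if (this.closedSince === null) this.closedSince = timestampMs;
      return false;
    }
    if (this.closedSince === null) return false; // eye was already open
    // closed -> open transition: validate how long the eye stayed closed
    const duration = timestampMs - this.closedSince;
    this.closedSince = null;
    const valid = duration >= MIN_BLINK_MS && duration <= MAX_BLINK_MS;
    if (valid) this.count++;
    return valid;
  }
}
```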

HeadMovementDetector

Detects deliberate head movements (turn left/right, nod up/down).

Algorithm:

  1. Track nose landmark position over time
  2. Calculate movement vectors (horizontal/vertical)
  3. Detect significant directional changes
  4. Filter out micro-movements and jitter
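One way to sketch this algorithm, assuming normalized image coordinates with y increasing downward. `HeadMovementTracker` and both tuning constants are hypothetical, and left/right here refer to the unmirrored image, so a mirrored preview would flip them.

```typescript
type Direction = "left" | "right" | "up" | "down";

interface NosePoint { x: number; y: number; } // normalized image coordinates, 0..1

// Hypothetical tuning values, for illustration only.
const SMOOTHING = 0.5;       // exponential smoothing factor that damps jitter
const MOVE_THRESHOLD = 0.08; // minimum displacement treated as deliberate

class HeadMovementTracker {
  private baseline: NosePoint | null = null;
  private smoothed: NosePoint | null = null;

  // Feed one nose position; returns a direction once the smoothed
  // displacement from the baseline exceeds the threshold.
  update(nose: NosePoint): Direction | null {
    if (this.baseline === null || this.smoothed === null) {
      this.baseline = { x: nose.x, y: nose.y };
      this.smoothed = { x: nose.x, y: nose.y };
      return null;
    }
    // Low-pass filter the raw position to suppress frame-to-frame jitter.
    this.smoothed = {
      x: this.smoothed.x + SMOOTHING * (nose.x - this.smoothed.x),
      y: this.smoothed.y + SMOOTHING * (nose.y - this.smoothed.y),
    };
    const dx = this.smoothed.x - this.baseline.x;
    const dy = this.smoothed.y - this.baseline.y;
    if (Math.max(Math.abs(dx), Math.abs(dy)) < MOVE_THRESHOLD) return null;
    // Re-anchor so the next movement is measured from the new pose.
    this.baseline = { x: this.smoothed.x, y: this.smoothed.y };
    if (Math.abs(dx) >= Math.abs(dy)) return dx < 0 ? "left" : "right";
    return dy < 0 ? "up" : "down";
  }
}
```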

DepthEstimator

Estimates facial depth using landmark geometry to detect spoofing attempts.

Algorithm:

  1. Calculate inter-landmark distances
  2. Build 3D geometry model from 2D landmarks
  3. Analyze depth consistency over time
  4. Flag suspicious flat/planar faces (photos)
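A deliberately naive sketch of the flatness check in step 4, assuming each landmark carries a relative `z` value. `depthSpread`, `looksPlanar`, and the threshold are hypothetical, and a real detector would likely combine this with other cues because model-estimated depth can look plausible even for a photo.

```typescript
interface Landmark3D { x: number; y: number; z: number; }

// Standard deviation of z, normalized by the face's horizontal extent so the
// measure does not depend on how close the face is to the camera.
function depthSpread(landmarks: Landmark3D[]): number {
  if (landmarks.length === 0) return 0;
  let minX = Infinity;
  let maxX = -Infinity;
  let sumZ = 0;
  for (const p of landmarks) {
    minX = Math.min(minX, p.x);
    maxX = Math.max(maxX, p.x);
    sumZ += p.z;
  }
  const width = maxX - minX;
  if (width === 0) return 0;
  const meanZ = sumZ / landmarks.length;
  let sumSq = 0;
  for (const p of landmarks) {
    const dz = p.z - meanZ;
    sumSq += dz * dz;
  }
  return Math.sqrt(sumSq / landmarks.length) / width;
}

// Hypothetical threshold, for illustration only.
const FLATNESS_THRESHOLD = 0.05;

// Flags a sequence of frames whose average depth spread stays near zero.
function looksPlanar(frames: Landmark3D[][]): boolean {
  if (frames.length === 0) return false;
  let total = 0;
  for (const frame of frames) total += depthSpread(frame);
  return total / frames.length < FLATNESS_THRESHOLD;
}
```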

2. React Component (@lilithftw/vibecheck-react)

React-specific wrapper providing hooks and components.

Key Components:

<VibeCheck />

High-level component with built-in UI.

interface VibeCheckProps {
  onSuccess: (result: LivenessResult) => void;
  onFailure: (error: LivenessError) => void;
  onStatusChange?: (status: CheckStatus) => void;
  config?: LivenessOptions;
  theme?: 'light' | 'dark' | Theme;
}

useVibeCheck() Hook

Headless hook for custom UI implementations.

interface UseVibeCheckReturn {
  isInitialized: boolean;
  isChecking: boolean;
  result: LivenessResult | null;
  error: LivenessError | null;
  startCheck: () => Promise<void>;
  reset: () => void;
}

Data Flow

1. Initialization Phase

User clicks "Start Check"
         │
         ▼
┌────────────────────┐
│ Request camera     │
│ permissions        │
└────────┬───────────┘
         │
         ▼
┌────────────────────┐
│ Initialize         │
│ MediaPipe          │
│ (download models)  │
└────────┬───────────┘
         │
         ▼
┌────────────────────┐
│ Start video stream │
└────────────────────┘

Network Activity:

  • MediaPipe model files (~2-3 MB, cached after first load)
  • No data sent to external servers

2. Detection Phase

Video Frame (60fps)
         │
         ▼
┌────────────────────┐
│ MediaPipe          │
│ Face Detection     │
└────────┬───────────┘
         │
         ▼
478 facial landmarks (normalized x, y, z coordinates)
         │
         ├──────┬──────────────┬─────────┐
         ▼      ▼              ▼         ▼
    ┌──────┐ ┌────────┐  ┌─────────┐ ┌──────┐
    │Blink │ │Head    │  │Depth    │ │Other │
    │Det.  │ │Movement│  │Estimate │ │Cues  │
    └───┬──┘ └───┬────┘  └────┬────┘ └───┬──┘
        │        │            │          │
        └────────┴────────────┴──────────┘
                     │
                     ▼
            ┌────────────────┐
            │ Confidence     │
            │ Aggregator     │
            └────────┬───────┘
                     │
                     ▼
            { isLive: true/false,
              confidence: 0.0-1.0 }

Data Locality:

  • All processing happens in browser memory
  • Landmarks never serialized or stored
  • Video frames never leave WebRTC pipeline
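The Confidence Aggregator stage in the diagram could be as simple as a weighted average of per-cue scores. The sketch below assumes exactly that; the `CueScore` shape, the weights, and the 0.7 default threshold are hypothetical, not values from the library.

```typescript
interface CueScore {
  name: string;   // e.g. "blink", "headMovement", "depth"
  score: number;  // cue confidence, expected in 0..1
  weight: number; // relative importance of the cue
}

// Combines cue scores into one confidence value and a boolean decision.
function aggregateConfidence(
  cues: CueScore[],
  threshold: number = 0.7,
): { isLive: boolean; confidence: number } {
  let totalWeight = 0;
  let weighted = 0;
  for (const cue of cues) {
    const clamped = Math.min(1, Math.max(0, cue.score)); // guard against out-of-range scores
    totalWeight += cue.weight;
    weighted += cue.weight * clamped;
  }
  if (totalWeight <= 0) return { isLive: false, confidence: 0 };
  const confidence = weighted / totalWeight;
  return { isLive: confidence >= threshold, confidence };
}
```

A weighted average lets a strong cue compensate for a weak one. A stricter design might require every cue to pass its own minimum instead.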

3. Result Phase

Detection Complete
         │
         ▼
┌────────────────────┐
│ Cleanup resources  │
│ • Stop camera      │
│ • Release MediaPipe│
│ • Clear buffers    │
└────────┬───────────┘
         │
         ▼
┌────────────────────┐
│ Return result      │
│ {                  │
│   isLive: boolean, │
│   confidence: num, │
│   timestamp: num   │
│ }                  │
└────────┬───────────┘
         │
         ▼
   Application code
   (your callback)

What's Transmitted:

  • Boolean flag (1 bit conceptually, ~10 bytes JSON)
  • Confidence score (~8 bytes)
  • Timestamp (~8 bytes)
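  • Total: ~26 bytes of non-biometric values (the JSON encoding adds field names and punctuation, so the serialized payload is somewhat larger, but still well under 100 bytes)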
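On the receiving side, the "Validate timestamp" step from the system diagram might look like the sketch below. It assumes `timestamp` is epoch milliseconds; `isLivenessResult`, `isFresh`, and the 30-second limit are hypothetical. Because the client can be compromised, this is a sanity check on the payload, not proof of liveness.

```typescript
interface LivenessResult {
  isLive: boolean;
  confidence: number;
  timestamp: number; // assumed to be epoch milliseconds
}

// Type guard: checks that an untrusted, parsed JSON value has the expected shape.
function isLivenessResult(value: unknown): value is LivenessResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const confidence = v["confidence"];
  const timestamp = v["timestamp"];
  return (
    typeof v["isLive"] === "boolean" &&
    typeof confidence === "number" && confidence >= 0 && confidence <= 1 &&
    typeof timestamp === "number" && Number.isFinite(timestamp)
  );
}

// Hypothetical freshness limit, for illustration only.
const MAX_AGE_MS = 30000;

// Rejects results that are stale or dated in the future.
function isFresh(result: LivenessResult, nowMs: number, maxAgeMs: number = MAX_AGE_MS): boolean {
  const age = nowMs - result.timestamp;
  return age >= 0 && age <= maxAgeMs;
}
```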

MediaPipe Integration

Face Landmarker Model

VibeCheck uses MediaPipe's Face Landmarker, which provides:

  • 478 3D facial landmarks
  • Face blendshapes (52 coefficients)
  • Face geometry (transformation matrices)

Model Details:

  • Type: face_landmarker.task
  • Size: ~2.5 MB (gzipped)
  • Framework: TensorFlow Lite
  • Inference: WebAssembly + WebGL

Integration Pattern

import { FaceLandmarker, FilesetResolver } from '@mediapipe/tasks-vision';

// Initialize (done once per session)
const vision = await FilesetResolver.forVisionTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm'
);

const faceLandmarker = await FaceLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: 'https://storage.googleapis.com/mediapipe-models/...',
    delegate: 'GPU' // Use WebGL acceleration
  },
  runningMode: 'VIDEO',
  numFaces: 1,
  outputFaceBlendshapes: true,
  outputFacialTransformationMatrixes: true
});

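// Per-frame processing: videoElement is the <video> element playing the camera
// stream; timestamps are in milliseconds and must increase between calls
const results = faceLandmarker.detectForVideo(videoElement, performance.now());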

Performance Optimization

  1. GPU Acceleration: Use WebGL delegate when available
  2. Single Face Mode: numFaces: 1 reduces overhead
  3. Selective Outputs: Only request needed data (blendshapes, matrices)
  4. Frame Skipping: Process every 2nd-3rd frame for 30fps target
  5. Warm-up: Pre-initialize model before user interaction
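Frame skipping (item 4) can be sketched with a time-based throttle, a variant of "every Nth frame" that also tolerates uneven frame delivery. `createFrameThrottle` is a hypothetical helper, not part of the library.

```typescript
// Returns a predicate that accepts a frame only when at least one target
// frame interval has elapsed since the last accepted frame.
function createFrameThrottle(targetFps: number): (timestampMs: number) => boolean {
  const minIntervalMs = 1000 / targetFps;
  let lastProcessedMs = -Infinity;
  return (timestampMs: number): boolean => {
    if (timestampMs - lastProcessedMs < minIntervalMs) return false; // skip this frame
    lastProcessedMs = timestampMs;
    return true;
  };
}
```

In a render loop, the predicate would be called with the frame timestamp before each inference call, and inference would run only when it returns `true`.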

Privacy Architecture

Isolation Boundaries

┌─────────────────────────────────────────┐
│         Browser Sandbox                  │
│                                          │
│  ┌────────────────────────────────┐     │
│  │    WebRTC/getUserMedia         │     │
│  │    (Camera Access)             │     │
│  └──────────┬─────────────────────┘     │
│             │                            │
│             ▼                            │
│  ┌────────────────────────────────┐     │
│  │    JavaScript Memory           │     │
│  │    • Video frames (volatile)   │     │
│  │    • Landmarks (volatile)      │     │
│  │    • Processing buffers        │     │
│  └──────────┬─────────────────────┘     │
│             │                            │
│             ▼                            │
│  ┌────────────────────────────────┐     │
│  │    Result Computation          │     │
│  │    (Boolean + Metadata only)   │     │
│  └──────────┬─────────────────────┘     │
│             │                            │
└─────────────┼────────────────────────────┘
              │
              │ Network Boundary
              │ (TLS encrypted)
              │
              ▼
    ┌─────────────────┐
    │  Your Server    │
    │  (Receives only │
    │   boolean +     │
    │   metadata)     │
    └─────────────────┘

Security Properties

  1. No Server-Side Processing: No biometric data reaches the server, so a server breach cannot expose it
  2. No Data Persistence: Video frames never touch disk
  3. No Network Transmission: Biometric data never serialized
  4. Open Source Auditing: All processing logic is public
  5. Local Computation: Works offline after model download

Verification Methods

Users can verify privacy claims by:

  1. Network Inspection: Monitor the DevTools Network tab (only the WASM/model downloads and the result JSON should appear)
  2. Source Code Audit: Review open-source implementation
  3. Traffic Analysis: Use Wireshark/mitmproxy to inspect HTTPS traffic
  4. Local Testing: Run checks with network disabled (works after initial load)

Technical Decisions

Why Client-Side Only?

Advantages:

  • Maximum privacy (no biometric data exposure)
  • Lower infrastructure costs (no GPU servers)
  • Faster response (no network round-trip)
  • Works offline (after model download)
  • Scales with the user base (each client supplies its own compute)

Trade-offs:

  • ⚠️ Requires modern browser (WebGL, WebAssembly)
  • ⚠️ Client can be compromised (must supplement with server-side checks)
  • ⚠️ Initial model download (~2.5 MB)

Why MediaPipe?

Alternatives Considered:

  • TensorFlow.js: More flexible but requires custom model training
  • OpenCV.js: Powerful but large bundle size (~8 MB)
  • Face-api.js: Good but less maintained

MediaPipe Advantages:

  • High accuracy (Google-trained models)
  • Optimized for web (WASM + WebGL)
  • Well-maintained by Google
  • Production-ready performance
  • Comprehensive landmark data (478 points)

Why TypeScript?

  • Type safety for complex geometric calculations
  • Better IDE support for library consumers
  • Compile-time error detection
  • Self-documenting code with interfaces

Why Monorepo?

Structure:

packages/
├── core/          # Framework-agnostic logic
├── react/         # React adapter
├── vue/           # (Future) Vue adapter
├── svelte/        # (Future) Svelte adapter
└── demo/          # Interactive demo

Benefits:

  • Shared TypeScript configs
  • Coordinated releases
  • Easier cross-package refactoring
  • Single documentation source

Performance Considerations

Bundle Sizes

| Package | File | Size | Gzipped (est.) |
|---|---|---|---|
| @lilithftw/vibecheck-core | dist/index.js | 89 KB | ~25 KB |
| @lilithftw/vibecheck-core | dist/index.d.ts | 31 KB | |
| @lilithftw/vibecheck-react | dist/index.js | 53 KB | ~15 KB |
| @lilithftw/vibecheck-react | dist/index.d.ts | 23 KB | |

The React package includes core as a dependency. Total JavaScript shipped to the browser: ~142 KB (before tree-shaking), ~40 KB gzipped.

Network: WASM + Model Downloads

| Resource | Size | Caching |
|---|---|---|
| MediaPipe WASM runtime | ~1.5 MB | Browser-cached after first load |
| Face Landmarker model (face_landmarker.task) | ~2.5 MB | Browser-cached after first load |
| Total first-load | ~4 MB | Subsequent loads: 0 bytes (304 Not Modified) |

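These are downloaded on first use from public CDNs (in the integration pattern shown earlier, jsDelivr for the WASM runtime and Google Cloud Storage for the model) and cached by the browser's standard HTTP cache. After the initial download, VibeCheck operates with zero network overhead for model loading.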

Initialization Time

| Phase | Duration | Notes |
|---|---|---|
| WASM runtime load | 100-300ms | From browser cache after first load |
| Model initialization | 200-500ms | GPU delegate setup + model parsing |
| Camera permission | User-dependent | Browser permission prompt |
| Camera stream start | 50-200ms | getUserMedia negotiation |
| Total (cached) | ~400-1000ms | Excluding user permission interaction |
| Total (first load) | ~2-5s | Including ~4 MB model download |

Detection Latency (Per Frame)

| Stage | Duration | Notes |
|---|---|---|
| MediaPipe inference | 10-30ms | GPU-accelerated via WebGL |
| JavaScript analysis | 1-5ms | Blink/head/depth calculations |
| Total per frame | ~15-35ms | Typically within the 30fps budget (~33ms); the worst case slightly exceeds it |

With frame throttling (processing every 2nd frame), per-frame latency is unchanged, but the average processing load during active detection drops to roughly half.

Memory Footprint

| Component | Memory | Notes |
|---|---|---|
| MediaPipe model (in memory) | ~15 MB | TFLite model loaded into WASM heap |
| Video frame buffers | ~5-10 MB | WebRTC internal buffers |
| JavaScript runtime objects | ~1-2 MB | Detector state, landmark history |
| Base total | ~20-27 MB | During active detection |
| After cleanup | ~0 MB | All resources released |

CPU Usage Profile

  • Idle: Near-zero (no processing before initialize())
  • Initializing: Brief spike during WASM compilation and model load
  • Active detection: 15-35ms per frame at 30fps (~50-100% of one core for processing frames)
  • After cleanup: Returns to zero

GPU usage via WebGL is preferred and significantly reduces CPU load. When the GPU delegate cannot be used, the CPU delegate takes over with higher latency (~50-100ms per frame).
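The utilization figures above follow from dividing per-frame processing time by the frame budget. A small worked sketch, with `coreUtilization` as a hypothetical helper:

```typescript
// Fraction of one core's time spent processing, given the per-frame
// processing time and the rate at which frames are processed.
function coreUtilization(frameMs: number, processedFps: number): number {
  const budgetMs = 1000 / processedFps; // 30fps gives ~33.3ms per frame
  return frameMs / budgetMs;
}

// At 30fps: 15ms per frame is ~45% of a core, 35ms is ~105%.
const bestCase = coreUtilization(15, 30);
const worstCase = coreUtilization(35, 30);

// Processing every 2nd frame (15fps) halves the load for the same per-frame cost.
const throttledWorstCase = coreUtilization(35, 15);
```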

Resource Usage Summary

CPU:

  • MediaPipe inference: ~10-30ms per frame (GPU accelerated)
  • JavaScript overhead: ~1-5ms per frame
  • Target: 30fps (33ms budget)

Memory:

  • MediaPipe model: ~15 MB in memory
  • Video frame buffers: ~5-10 MB
  • JavaScript objects: ~1-2 MB
  • Total: ~20-27 MB typical usage

Network:

  • Initial model download: ~4 MB total (one-time, cached)
  • Result transmission: ~26 bytes per check

Optimization Strategies

  1. Lazy Loading: Load MediaPipe only when check starts
  2. Model Caching: Use browser cache for model files
  3. Frame Throttling: Process at 30fps instead of 60fps
  4. Early Exit: Stop processing once confidence threshold met
  5. Worker Threads: Offload processing to Web Workers (future)
  6. Warm-up: Pre-initialize model before user interaction for faster perceived start

Browser Compatibility

Required Features:

  • WebRTC (getUserMedia)
  • WebAssembly
  • WebGL 2.0
  • ES2020+ JavaScript
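A feature-detection sketch for the first three requirements, which an application could run before offering the check. `detectSupport` and `SupportReport` are hypothetical names. The function reads globals defensively, so it also runs (reporting `false` where appropriate) outside a browser.

```typescript
interface SupportReport {
  getUserMedia: boolean;
  webAssembly: boolean;
  webgl2: boolean;
}

function detectSupport(): SupportReport {
  // Access globals through `any` so the sketch does not depend on DOM typings.
  const g = globalThis as any;

  const getUserMedia = typeof g.navigator?.mediaDevices?.getUserMedia === "function";

  const webAssembly =
    typeof g.WebAssembly === "object" && typeof g.WebAssembly.instantiate === "function";

  let webgl2 = false;
  if (typeof g.document?.createElement === "function") {
    try {
      // getContext returns null when WebGL 2.0 is unavailable.
      webgl2 = g.document.createElement("canvas").getContext("webgl2") != null;
    } catch {
      webgl2 = false;
    }
  }

  return { getUserMedia, webAssembly, webgl2 };
}
```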

Supported Browsers:

  • Chrome/Edge 80+
  • Firefox 80+
  • Safari 15+
  • Opera 67+

Not Supported:

  • Internet Explorer (all versions)
  • Opera Mini
  • Browsers without WebGL 2.0

See the full Browser Support matrix in the API reference.

Scalability

Client-Side:

  • Scales with the number of clients (each client runs its own processing)
  • No server bottlenecks

Server-Side (Optional):

  • Result storage: ~26 bytes per check
  • Rate limiting: Use Redis or in-memory cache
  • Validation: Stateless endpoint, horizontally scalable
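For the in-memory option, a fixed-window limiter is one of the simplest approaches. The sketch below is illustrative only: `FixedWindowRateLimiter` is a hypothetical class, it never evicts old keys, and a multi-instance deployment would need a shared store such as Redis instead.

```typescript
interface RateWindow { start: number; count: number; }

class FixedWindowRateLimiter {
  private readonly windows = new Map<string, RateWindow>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` (e.g. a user or IP)
  // is allowed at time `nowMs`, and records it.
  allow(key: string, nowMs: number): boolean {
    const current = this.windows.get(key);
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      // No window yet, or the previous one expired: start a new window.
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count++;
    return true;
  }
}
```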

Future Enhancements

Roadmap

Phase 1 (Current):

  • Core liveness detection
  • React component
  • Basic blink/head movement

Phase 2 (Q2 2026):

  • Advanced spoofing detection (texture analysis)
  • Vue/Svelte adapters
  • Accessibility improvements (voice instructions)
  • Offline mode with service workers

Phase 3 (Q3 2026):

  • Web Worker support (non-blocking UI)
  • Advanced gestures (smile, eyebrow raise)
  • Multi-language support
  • WebGPU acceleration (when stable)

Phase 4 (Q4 2026):

  • Mobile SDK (React Native)
  • Server-side verification library
  • Analytics dashboard
  • Enterprise features

Research Areas

  1. Presentation Attack Detection (PAD): Detect printed photos, video replays
  2. Passive Liveness: Detect liveness without user actions
  3. Privacy-Preserving ML: On-device model training
  4. Federated Learning: Improve models without centralizing data


Maintained by: LilithFTW | License: MIT | Last Review: 2026-02-06