VibeCheck Architecture
Version: 0.1.0 Last Updated: 2026-02-06
Table of Contents
- Overview
- System Architecture
- Core Components
- Data Flow
- MediaPipe Integration
- Privacy Architecture
- Technical Decisions
- Performance Considerations
Overview
VibeCheck is a privacy-first liveness detection system built entirely for client-side execution. The architecture is designed around a core principle: no biometric data ever leaves the user's browser.
Design Principles
- Privacy by Architecture: Biometric processing is architecturally isolated to the client
- Open Source Transparency: All processing logic is auditable
- Minimal Data Transfer: Only boolean results cross the network boundary
- Progressive Enhancement: Works without server-side components
- Framework-Agnostic Core: Vanilla TypeScript core with framework adapters
System Architecture
```
┌────────────────────────────────────────────────────────────┐
│                       User's Browser                       │
│                     (Client-Side Only)                     │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌────────────────┐        ┌─────────────────┐             │
│  │     Webcam     │───────▶│    MediaPipe    │             │
│  │ (getUserMedia) │        │ Face Landmarker │             │
│  └────────────────┘        └────────┬────────┘             │
│                                     │                      │
│                                     ▼                      │
│                            ┌─────────────────┐             │
│                            │    Liveness     │             │
│                            │    Detection    │             │
│                            │     Engine      │             │
│                            └────────┬────────┘             │
│                                     │                      │
│               ┌─────────────────────┴─────────┐            │
│               │                               │            │
│               ▼                               ▼            │
│        ┌─────────────┐                 ┌─────────────┐     │
│        │    Blink    │                 │    Head     │     │
│        │  Detector   │                 │  Movement   │     │
│        └──────┬──────┘                 └──────┬──────┘     │
│               │         ┌───────────┐         │            │
│               └────────▶│  Result   │◀────────┘            │
│                         │ Computer  │                      │
│                         └─────┬─────┘                      │
│                               │                            │
│                               ▼                            │
│                      { isLive: boolean,                    │
│                        confidence: number,                 │
│                        timestamp: number }                 │
│                                                            │
└───────────────────────────────┬────────────────────────────┘
                                │
                                │ HTTPS (result only)
                                │ ❌ No video
                                │ ❌ No images
                                │ ❌ No biometric data
                                │
                                ▼
                    ┌────────────────────────┐
                    │      Your Server       │
                    │       (Optional)       │
                    ├────────────────────────┤
                    │ • Validate timestamp   │
                    │ • Rate limiting        │
                    │ • Store result         │
                    │ • Proceed with flow    │
                    └────────────────────────┘
```
Core Components
1. Core Library (@lilithftw/vibecheck-core)
The foundation of VibeCheck, providing framework-agnostic liveness detection.
Key Classes:
LivenessDetector
The main detection engine that orchestrates the liveness check process.
```typescript
declare class LivenessDetector {
  constructor(options?: LivenessOptions);
  // Initialize MediaPipe and webcam
  initialize(): Promise<void>;
  // Run the liveness check and resolve with the result
  check(): Promise<LivenessResult>;
  // Release the camera stream and MediaPipe resources
  cleanup(): void;
}
```
Responsibilities:
- MediaPipe initialization and lifecycle management
- Webcam stream acquisition and management
- Orchestration of detection algorithms
- Result computation and validation
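Because cleanup() releases the camera and MediaPipe, a caller would typically make sure it runs even when initialize() or check() fails. The sketch below is minimal and assumes only the three methods listed above and the { isLive, confidence, timestamp } result shape from the architecture diagram. LivenessDetectorLike and runLivenessCheck are hypothetical names used for illustration and are not part of the published API.

```typescript
// Result shape described in this document.
interface LivenessResultLike {
  isLive: boolean;
  confidence: number;
  timestamp: number;
}

// Structural stand-in for the LivenessDetector methods listed above (hypothetical name).
interface LivenessDetectorLike {
  initialize(): Promise<void>;
  check(): Promise<LivenessResultLike>;
  cleanup(): void;
}

// Runs one check and always releases the camera and MediaPipe resources,
// even if initialize() or check() rejects.
async function runLivenessCheck(detector: LivenessDetectorLike): Promise<LivenessResultLike> {
  try {
    await detector.initialize();
    return await detector.check();
  } finally {
    detector.cleanup();
  }
}
```

In application code, a LivenessDetector instance would be passed where the structural type is expected.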
BlinkDetector
Specialized module for detecting eye blinks using facial landmarks.
Algorithm:
- Track eye aspect ratio (EAR) over time
- Detect EAR threshold crossings (open → closed → open)
- Validate blink duration (too fast = invalid, too slow = invalid)
- Count valid blinks within time window
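The blink steps above can be sketched with the commonly used six-point EAR formula, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), combined with a small open → closed → open state machine. This is an illustrative sketch, not VibeCheck's BlinkDetector. The caller supplies six eye landmarks (this document does not specify which Face Landmarker indices map to p1-p6), the thresholds are hypothetical, and counting blinks within a time window is left to the caller.

```typescript
interface Point {
  x: number;
  y: number;
}

// Six eye landmarks: p1/p4 = eye corners, p2/p3 = upper lid, p6/p5 = lower lid.
type EyePoints = [Point, Point, Point, Point, Point, Point];

const dist = (a: Point, b: Point): number => Math.hypot(a.x - b.x, a.y - b.y);

// Eye aspect ratio: roughly constant while the eye is open, drops toward 0 as it closes.
function eyeAspectRatio([p1, p2, p3, p4, p5, p6]: EyePoints): number {
  return (dist(p2, p6) + dist(p3, p5)) / (2 * dist(p1, p4));
}

// Hypothetical tuning values; real thresholds depend on the model and camera.
interface BlinkConfig {
  closeThreshold: number; // EAR below this means "closed"
  openThreshold: number; // EAR above this means "open again" (hysteresis)
  minDurationMs: number; // shorter closures are rejected as too fast
  maxDurationMs: number; // longer closures are rejected as too slow
}

class BlinkCounter {
  private readonly cfg: BlinkConfig;
  private closedSince: number | null = null;
  count = 0;

  constructor(cfg: BlinkConfig) {
    this.cfg = cfg;
  }

  // Feed one EAR sample per frame; returns true when a valid blink completes.
  update(ear: number, timestampMs: number): boolean {
    if (this.closedSince === null) {
      // open -> closed transition starts a candidate blink
      if (ear < this.cfg.closeThreshold) this.closedSince = timestampMs;
      return false;
    }
    if (ear <= this.cfg.openThreshold) return false; // still closed
    // closed -> open transition: validate the blink duration
    const duration = timestampMs - this.closedSince;
    this.closedSince = null;
    const valid = duration >= this.cfg.minDurationMs && duration <= this.cfg.maxDurationMs;
    if (valid) this.count++;
    return valid;
  }
}
```

Using separate close and open thresholds (hysteresis) keeps a single noisy EAR sample near the threshold from being counted as two transitions.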
HeadMovementDetector
Detects deliberate head movements (turn left/right, nod up/down).
Algorithm:
- Track nose landmark position over time
- Calculate movement vectors (horizontal/vertical)
- Detect significant directional changes
- Filter out micro-movements and jitter
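The head-movement steps could look something like the following sketch. It assumes the nose landmark arrives in normalized image coordinates (x increases to the right and y increases downward, as in MediaPipe's normalized landmarks). The class name, the 0.05 threshold and the smoothing factor are hypothetical. Mirrored selfie video is not handled, so "left" and "right" refer to image space.

```typescript
interface Point2D {
  x: number;
  y: number;
}

type Direction = 'left' | 'right' | 'up' | 'down';

class HeadMovementTracker {
  private readonly threshold: number;
  private readonly alpha: number;
  private baseline: Point2D | null = null;
  private smoothed: Point2D | null = null;

  // threshold: displacement from the baseline (normalized units) treated as deliberate.
  // alpha: exponential smoothing factor in (0, 1]; smaller values suppress more jitter.
  constructor(threshold = 0.05, alpha = 0.3) {
    this.threshold = threshold;
    this.alpha = alpha;
  }

  // Feed the nose landmark for one frame. Returns a direction once the smoothed
  // position has moved past the threshold, otherwise null.
  update(nose: Point2D): Direction | null {
    if (this.baseline === null || this.smoothed === null) {
      // The first frame defines the neutral head position.
      this.baseline = { ...nose };
      this.smoothed = { ...nose };
      return null;
    }
    // Exponential moving average filters out micro-movements and landmark jitter.
    this.smoothed = {
      x: this.alpha * nose.x + (1 - this.alpha) * this.smoothed.x,
      y: this.alpha * nose.y + (1 - this.alpha) * this.smoothed.y,
    };
    const dx = this.smoothed.x - this.baseline.x;
    const dy = this.smoothed.y - this.baseline.y;
    if (Math.max(Math.abs(dx), Math.abs(dy)) < this.threshold) return null;
    // The dominant axis decides the direction (image space: +x right, +y down).
    if (Math.abs(dx) >= Math.abs(dy)) return dx > 0 ? 'right' : 'left';
    return dy > 0 ? 'down' : 'up';
  }
}
```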
DepthEstimator
Estimates facial depth using landmark geometry to detect spoofing attempts.
Algorithm:
- Calculate inter-landmark distances
- Build 3D geometry model from 2D landmarks
- Analyze depth consistency over time
- Flag suspicious flat/planar faces (photos)
2. React Component (@lilithftw/vibecheck-react)
React-specific wrapper providing hooks and components.
Key Components:
<VibeCheck />
High-level component with built-in UI.
```typescript
interface VibeCheckProps {
  onSuccess: (result: LivenessResult) => void;
  onFailure: (error: LivenessError) => void;
  onStatusChange?: (status: CheckStatus) => void;
  config?: LivenessOptions;
  theme?: 'light' | 'dark' | Theme;
}
```
useVibeCheck() Hook
Headless hook for custom UI implementations.
```typescript
interface UseVibeCheckReturn {
  isInitialized: boolean;
  isChecking: boolean;
  result: LivenessResult | null;
  error: LivenessError | null;
  startCheck: () => Promise<void>;
  reset: () => void;
}
```
Data Flow
1. Initialization Phase
```
User clicks "Start Check"
          │
          ▼
┌────────────────────┐
│   Request camera   │
│    permissions     │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│     Initialize     │
│     MediaPipe      │
│ (download models)  │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Start video stream │
└────────────────────┘
```
Network Activity:
- MediaPipe WASM runtime and model files (~4 MB total, cached after first load)
- No user data is sent to external servers; the only outbound requests fetch these files
2. Detection Phase
```
                  Video Frame (60fps)
                           │
                           ▼
                 ┌────────────────────┐
                 │     MediaPipe      │
                 │  Face Landmarker   │
                 └─────────┬──────────┘
                           │
                           ▼
       478 facial landmarks (normalized x, y, z)
                           │
      ┌─────────────┬──────┴──────┬─────────────┐
      │             │             │             │
      ▼             ▼             ▼             ▼
┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
│  Blink   │  │   Head   │  │  Depth   │  │  Other   │
│ Detector │  │ Movement │  │ Estimate │  │   Cues   │
└─────┬────┘  └─────┬────┘  └─────┬────┘  └─────┬────┘
      │             │             │             │
      └─────────────┴──────┬──────┴─────────────┘
                           │
                           ▼
                   ┌────────────────┐
                   │   Confidence   │
                   │   Aggregator   │
                   └───────┬────────┘
                           │
                           ▼
                    { isLive: true/false,
                      confidence: 0.0-1.0 }
```
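How the Confidence Aggregator weighs the individual cues is not specified here. One simple option is a weighted mean of per-cue scores compared against a pass threshold. The score shape, the weights and the 0.7 threshold in this sketch are hypothetical.

```typescript
// Per-cue scores in [0, 1] from the individual detectors (hypothetical shape).
interface CueScores {
  blink: number;
  headMovement: number;
  depth: number;
  other: number;
}

// Hypothetical weights for illustration only.
const DEFAULT_WEIGHTS: CueScores = { blink: 0.35, headMovement: 0.35, depth: 0.2, other: 0.1 };

// Weighted mean of clamped cue scores, followed by a pass/fail threshold.
function aggregateConfidence(
  scores: CueScores,
  weights: CueScores = DEFAULT_WEIGHTS,
  threshold = 0.7
): { isLive: boolean; confidence: number } {
  const keys = Object.keys(weights) as (keyof CueScores)[];
  let weightSum = 0;
  let weighted = 0;
  for (const k of keys) {
    const score = Math.min(1, Math.max(0, scores[k])); // clamp to [0, 1]
    weightSum += weights[k];
    weighted += weights[k] * score;
  }
  const confidence = weightSum > 0 ? weighted / weightSum : 0;
  return { isLive: confidence >= threshold, confidence };
}
```

With these example weights, a face that passes only the depth and other cues scores 0.3 and is rejected. Each active cue (blink, head movement) therefore has to contribute before the threshold is reached.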
Data Locality:
- All processing happens in browser memory
- Landmarks never serialized or stored
- Video frames stay in browser memory (camera stream → MediaPipe inference) and are never uploaded
3. Result Phase
```
  Detection Complete
           │
           ▼
┌──────────────────────┐
│  Cleanup resources   │
│  • Stop camera       │
│  • Release MediaPipe │
│  • Clear buffers     │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Return result        │
│ {                    │
│   isLive: boolean,   │
│   confidence: num,   │
│   timestamp: num     │
│ }                    │
└──────────┬───────────┘
           │
           ▼
   Application code
   (your callback)
```
What's Transmitted:
- Boolean flag (1 bit conceptually, ~10 bytes JSON)
- Confidence score (~8 bytes)
- Timestamp (~8 bytes)
- Total: ~26 bytes of non-biometric field data (about 60 bytes once serialized as JSON with key names)
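To see exactly what crosses the network, the result can be serialized the same way a JSON request body would be. This is a minimal sketch: payloadBytes is a hypothetical helper and the values are made up.

```typescript
interface LivenessPayload {
  isLive: boolean;
  confidence: number;
  timestamp: number; // milliseconds since the Unix epoch
}

// UTF-8 byte length of the JSON body that would be sent to the server.
function payloadBytes(result: LivenessPayload): number {
  return new TextEncoder().encode(JSON.stringify(result)).length;
}

// Made-up example values.
const example: LivenessPayload = { isLive: true, confidence: 0.93, timestamp: 1770336000000 };
console.log(JSON.stringify(example)); // {"isLive":true,"confidence":0.93,"timestamp":1770336000000}
console.log(payloadBytes(example)); // 59
```

The field values themselves are tiny. Key names and JSON punctuation bring this example body to 59 bytes, which is still far smaller than a single video frame.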
MediaPipe Integration
Face Landmarker Model
VibeCheck uses MediaPipe's Face Landmarker, which provides:
- 478 3D facial landmarks
- Face blendshapes (52 coefficients)
- Face geometry (transformation matrices)
Model Details:
- File: face_landmarker.task
- Size: ~2.5 MB (gzipped)
- Framework: TensorFlow Lite
- Inference: WebAssembly + WebGL
Integration Pattern
```typescript
import { FaceLandmarker, FilesetResolver } from '@mediapipe/tasks-vision';

// Initialize (done once per session)
const vision = await FilesetResolver.forVisionTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm'
);
const faceLandmarker = await FaceLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: 'https://storage.googleapis.com/mediapipe-models/...',
    delegate: 'GPU' // Use WebGL acceleration
  },
  runningMode: 'VIDEO',
  numFaces: 1,
  outputFaceBlendshapes: true,
  outputFacialTransformationMatrixes: true
});

// Per-frame processing. Assumes a <video> element already playing the webcam
// stream; VIDEO mode expects a monotonically increasing timestamp in milliseconds.
const videoElement = document.querySelector('video') as HTMLVideoElement;
const results = faceLandmarker.detectForVideo(videoElement, performance.now());
```
Performance Optimization
- GPU Acceleration: Use the WebGL delegate when available
- Single Face Mode: numFaces: 1 reduces overhead
- Selective Outputs: Only request needed data (blendshapes, matrices)
- Frame Skipping: Process every 2nd or 3rd frame of a 60fps stream (about 30 or 20 processed frames per second)
- Warm-up: Pre-initialize the model before user interaction
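Frame skipping can be as simple as a counter. This sketch assumes a steady 60fps source. FrameThrottler is a hypothetical helper, and the commented browser loop reuses faceLandmarker and videoElement from the integration example.

```typescript
// Decides which camera frames get inference. At 60fps, every = 2 gives ~30
// processed frames per second and every = 3 gives ~20.
class FrameThrottler {
  private readonly every: number;
  private frameIndex = 0;

  constructor(every: number) {
    if (!Number.isInteger(every) || every < 1) {
      throw new RangeError('every must be a positive integer');
    }
    this.every = every;
  }

  // Call once per incoming frame; true means "run detectForVideo on this one".
  shouldProcess(): boolean {
    const selected = this.frameIndex % this.every === 0;
    this.frameIndex++;
    return selected;
  }
}

// Browser loop sketch (not executed here), reusing faceLandmarker and videoElement
// from the integration example:
//
// const throttle = new FrameThrottler(2);
// function onFrame(): void {
//   if (throttle.shouldProcess()) {
//     const results = faceLandmarker.detectForVideo(videoElement, performance.now());
//     // ...pass results.faceLandmarks to the blink/head/depth detectors
//   }
//   requestAnimationFrame(onFrame);
// }
// requestAnimationFrame(onFrame);
```

requestAnimationFrame fires at the display refresh rate, which is not necessarily the camera's frame rate. The effective processing rate is therefore only approximate.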
Privacy Architecture
Isolation Boundaries
```
┌─────────────────────────────────────────┐
│             Browser Sandbox             │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │        WebRTC/getUserMedia        │  │
│  │          (Camera Access)          │  │
│  └─────────────────┬─────────────────┘  │
│                    │                    │
│                    ▼                    │
│  ┌───────────────────────────────────┐  │
│  │ JavaScript Memory                 │  │
│  │  • Video frames (volatile)        │  │
│  │  • Landmarks (volatile)           │  │
│  │  • Processing buffers             │  │
│  └─────────────────┬─────────────────┘  │
│                    │                    │
│                    ▼                    │
│  ┌───────────────────────────────────┐  │
│  │        Result Computation         │  │
│  │     (Boolean + Metadata only)     │  │
│  └─────────────────┬─────────────────┘  │
│                    │                    │
└────────────────────┼────────────────────┘
                     │
                     │ Network Boundary
                     │ (TLS encrypted)
                     │
                     ▼
            ┌─────────────────┐
            │   Your Server   │
            │ (Receives only  │
            │  boolean +      │
            │  metadata)      │
            └─────────────────┘
```
Security Properties
- No Server-Side Processing: Removes the risk of biometric data being exposed in a server breach
- No Data Persistence: Video frames are never written to disk
- No Network Transmission: Biometric data is never serialized or sent over the network
- Open Source Auditing: All processing logic is public
- Local Computation: Works offline after the WASM runtime and model are downloaded
Verification Methods
Users can verify privacy claims by:
- Network Inspection: Monitor DevTools Network tab (only sees result JSON)
- Source Code Audit: Review open-source implementation
- Traffic Analysis: Use mitmproxy (or Wireshark with exported TLS session keys) to inspect HTTPS traffic
- Local Testing: Run checks with network disabled (works after initial load)
Technical Decisions
Why Client-Side Only?
Advantages:
- ✅ Maximum privacy (no biometric data exposure)
- ✅ Lower infrastructure costs (no GPU servers)
- ✅ Faster response (no network round-trip)
- ✅ Works offline (after model download)
- ✅ Scales with the user base (each client supplies its own compute)
Trade-offs:
- ⚠️ Requires modern browser (WebGL, WebAssembly)
- ⚠️ Client can be compromised (must supplement with server-side checks)
- ⚠️ Initial download (~2.5 MB model plus ~1.5 MB WASM runtime)
Why MediaPipe?
Alternatives Considered:
- TensorFlow.js: More flexible but requires custom model training
- OpenCV.js: Powerful but large bundle size (~8 MB)
- Face-api.js: Good but less maintained
MediaPipe Advantages:
- ✅ High accuracy (Google-trained models)
- ✅ Optimized for web (WASM + WebGL)
- ✅ Well-maintained by Google
- ✅ Production-ready performance
- ✅ Comprehensive landmark data (478 points)
Why TypeScript?
- ✅ Type safety for complex geometric calculations
- ✅ Better IDE support for library consumers
- ✅ Compile-time error detection
- ✅ Self-documenting code with interfaces
Why Monorepo?
Structure:
```
packages/
├── core/     # Framework-agnostic logic
├── react/    # React adapter
├── vue/      # (Future) Vue adapter
├── svelte/   # (Future) Svelte adapter
└── demo/     # Interactive demo
```
Benefits:
- ✅ Shared TypeScript configs
- ✅ Coordinated releases
- ✅ Easier cross-package refactoring
- ✅ Single documentation source
Performance Considerations
Bundle Sizes
| Package | File | Size | Gzipped (est.) |
|---|---|---|---|
| @lilithftw/vibecheck-core | dist/index.js | 89 KB | ~25 KB |
| @lilithftw/vibecheck-core | dist/index.d.ts | 31 KB | — |
| @lilithftw/vibecheck-react | dist/index.js | 53 KB | ~15 KB |
| @lilithftw/vibecheck-react | dist/index.d.ts | 23 KB | — |
React package includes core as a dependency. Total JavaScript shipped to browser: ~142 KB (before tree-shaking), ~40 KB gzipped.
Network: WASM + Model Downloads
| Resource | Size | Caching |
|---|---|---|
| MediaPipe WASM runtime | ~1.5 MB | Browser-cached after first load |
| Face Landmarker model (face_landmarker.task) | ~2.5 MB | Browser-cached after first load |
| Total first load | ~4 MB | Subsequent loads: no re-download (cache hit or 304 Not Modified) |

These files are downloaded on first use from the URLs configured at initialization (jsDelivr for the WASM runtime and Google Cloud Storage for the model in the integration example above). The browser stores them in its standard HTTP cache. After the initial download, model loading adds little or no network traffic.
Initialization Time
| Phase | Duration | Notes |
|---|---|---|
| WASM runtime load | 100-300ms | From browser cache after first load |
| Model initialization | 200-500ms | GPU delegate setup + model parsing |
| Camera permission | User-dependent | Browser permission prompt |
| Camera stream start | 50-200ms | getUserMedia negotiation |
| Total (cached) | ~400-1000ms | Excluding user permission interaction |
| Total (first load) | ~2-5s | Including ~4 MB model download |
Detection Latency (Per Frame)
| Stage | Duration | Notes |
|---|---|---|
| MediaPipe inference | 10-30ms | GPU-accelerated via WebGL |
| JavaScript analysis | 1-5ms | Blink/head/depth calculations |
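| Total per frame | ~15-35ms | Mostly within the 30fps budget (33ms); the upper end slightly exceeds it |

With frame throttling (processing every 2nd frame of a 60fps stream), per-frame latency is unchanged. Average CPU load during active detection, however, is roughly half of what processing every frame would cost.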
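The CPU-load figures follow from simple arithmetic: busy time per second equals per-frame processing time multiplied by processed frames per second. dutyCycle is a hypothetical helper for illustration. It treats all per-frame time as CPU time, so it overstates load when the GPU delegate is active.

```typescript
// Fraction of one thread kept busy:
// (ms of work per processed frame × processed frames per second) / 1000 ms.
function dutyCycle(msPerFrame: number, processedFps: number): number {
  return (msPerFrame * processedFps) / 1000;
}

const frameBudgetMs = 1000 / 30; // ≈ 33.3 ms available per frame at 30fps

console.log(frameBudgetMs.toFixed(1)); // 33.3
console.log(dutyCycle(15, 30)); // 0.45
console.log(dutyCycle(35, 30)); // 1.05
console.log(dutyCycle(35, 60)); // 2.1
```

At 30 processed frames per second, the 15-35ms range works out to 0.45-1.05 of one thread. This is in line with the roughly 50-100% quoted under CPU Usage Profile. Processing all 60 frames per second would double these figures.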
Memory Footprint
| Component | Memory | Notes |
|---|---|---|
| MediaPipe model (in memory) | ~15 MB | TFLite model loaded into WASM heap |
| Video frame buffers | ~5-10 MB | WebRTC internal buffers |
| JavaScript runtime objects | ~1-2 MB | Detector state, landmark history |
| Base total | ~20-27 MB | During active detection |
| After cleanup | ~0 MB | Resources released; memory is reclaimed by garbage collection |
CPU Usage Profile
- Idle: Near-zero (no processing before initialize())
- Initializing: Brief spike during WASM compilation and model load
- Active detection: 15-35ms per frame at 30fps (~50-100% of one core for processing frames)
- After cleanup: Returns to zero
GPU usage via WebGL is preferred and significantly reduces CPU load. On devices without WebGL 2.0, the CPU delegate is used with higher latency (~50-100ms per frame).
Resource Usage Summary
CPU:
- MediaPipe inference: ~10-30ms per frame (GPU accelerated)
- JavaScript overhead: ~1-5ms per frame
- Target: 30fps (33ms budget)
Memory:
- MediaPipe model: ~15 MB in memory
- Video frame buffers: ~5-10 MB
- JavaScript objects: ~1-2 MB
- Total: ~20-27 MB typical usage
Network:
- Initial model download: ~4 MB total (one-time, cached)
- Result transmission: ~26 bytes per check
Optimization Strategies
- Lazy Loading: Load MediaPipe only when check starts
- Model Caching: Use browser cache for model files
- Frame Throttling: Process 30fps instead of 60fps
- Early Exit: Stop processing once confidence threshold met
- Worker Threads: Offload processing to Web Workers (future)
- Warm-up: Pre-initialize model before user interaction for faster perceived start
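Lazy loading and warm-up can share a single mechanism: memoizing the initialization promise. MediaPipe is then fetched on first use, loading can be started early (for example on hover), and the model is never loaded twice. lazyOnce is a hypothetical helper. The commented usage assumes the FilesetResolver and FaceLandmarker calls from the integration example, with wasmUrl, options and startButton as placeholders.

```typescript
// Memoizes an async initializer: the first call starts it, later and concurrent
// calls share the same promise, and a failed attempt can be retried.
function lazyOnce<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending !== null) return pending;
    const attempt = init().catch((err: unknown) => {
      pending = null; // e.g. a network error while fetching the model
      throw err;
    });
    pending = attempt;
    return attempt;
  };
}

// Usage sketch (not executed here); wasmUrl, options and startButton are placeholders:
//
// const getLandmarker = lazyOnce(async () => {
//   const vision = await FilesetResolver.forVisionTasks(wasmUrl);
//   return FaceLandmarker.createFromOptions(vision, options);
// });
// startButton.addEventListener('pointerenter', () => void getLandmarker()); // warm-up
// startButton.addEventListener('click', async () => {
//   const landmarker = await getLandmarker(); // already loaded if warm-up finished
// });
```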
Browser Compatibility
Required Features:
- WebRTC (getUserMedia)
- WebAssembly
- WebGL 2.0
- ES2020+ JavaScript
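A runtime check for the required features might look like the following sketch. detectSupport is a hypothetical helper, not part of the VibeCheck API. Note that navigator.mediaDevices is only exposed in secure contexts (HTTPS or localhost). ES2020 syntax support cannot be probed this way, because a browser that lacks it fails to parse the script (including this one, which uses optional chaining).

```typescript
interface FeatureSupport {
  getUserMedia: boolean;
  webAssembly: boolean;
  webgl2: boolean;
}

// Probes the runtime features listed above. Missing globals (for example when
// run outside a browser) are reported as unsupported instead of throwing.
function detectSupport(): FeatureSupport {
  const getUserMedia =
    typeof navigator !== 'undefined' &&
    typeof navigator.mediaDevices?.getUserMedia === 'function';
  const webAssembly =
    typeof WebAssembly === 'object' && typeof WebAssembly.instantiate === 'function';
  let webgl2 = false;
  if (typeof document !== 'undefined') {
    // getContext('webgl2') returns null when WebGL 2.0 is unavailable.
    webgl2 = document.createElement('canvas').getContext('webgl2') !== null;
  }
  return { getUserMedia, webAssembly, webgl2 };
}

const support = detectSupport();
if (!support.getUserMedia || !support.webAssembly || !support.webgl2) {
  console.warn('Liveness check unavailable in this environment:', support);
}
```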
Supported Browsers:
- Chrome/Edge 80+
- Firefox 80+
- Safari 15+
- Opera 67+
Not Supported:
- Internet Explorer (all versions)
- Opera Mini
- Browsers without WebGL 2.0
See the full Browser Support matrix in the API reference.
Scalability
Client-Side:
- Scales with the number of users (each client runs its own processing)
- No server bottlenecks
Server-Side (Optional):
- Result storage: ~26 bytes per check
- Rate limiting: Use Redis or in-memory cache
- Validation: Stateless endpoint, horizontally scalable
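The optional server duties (timestamp validation and rate limiting) can be sketched in a few lines. These checks only reject malformed, stale or excessive requests. Because the client can be compromised, a well-formed { isLive: true } payload is not proof that a check actually ran. The limits and names below are hypothetical.

```typescript
interface LivenessPayload {
  isLive: boolean;
  confidence: number;
  timestamp: number; // ms since epoch, set by the client
}

// Hypothetical policy values for illustration.
const MAX_AGE_MS = 60_000; // reject results older than one minute
const MAX_CLOCK_SKEW_MS = 5_000; // tolerate small client clock drift

// Shape and freshness checks only. Returns the parsed payload, or null if rejected.
function validatePayload(body: unknown, nowMs: number): LivenessPayload | null {
  if (typeof body !== 'object' || body === null) return null;
  const { isLive, confidence, timestamp } = body as Record<string, unknown>;
  if (typeof isLive !== 'boolean') return null;
  if (typeof confidence !== 'number' || !(confidence >= 0 && confidence <= 1)) return null;
  if (typeof timestamp !== 'number' || !Number.isFinite(timestamp)) return null;
  if (timestamp > nowMs + MAX_CLOCK_SKEW_MS || nowMs - timestamp > MAX_AGE_MS) return null;
  return { isLive, confidence, timestamp };
}

// Fixed-window, in-memory rate limiter keyed by a client identifier (e.g. IP).
// Entries are never evicted in this sketch; several server instances would need
// a shared store such as Redis instead.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const w = this.windows.get(key);
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 }); // start a new window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}
```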
Future Enhancements
Roadmap
Phase 1 (Current):
- ✅ Core liveness detection
- ✅ React component
- ✅ Basic blink/head movement
Phase 2 (Q2 2026):
- Advanced spoofing detection (texture analysis)
- Vue/Svelte adapters
- Accessibility improvements (voice instructions)
- Offline mode with service workers
Phase 3 (Q3 2026):
- Web Worker support (non-blocking UI)
- Advanced gestures (smile, eyebrow raise)
- Multi-language support
- WebGPU acceleration (when stable)
Phase 4 (Q4 2026):
- Mobile SDK (React Native)
- Server-side verification library
- Analytics dashboard
- Enterprise features
Research Areas
- Presentation Attack Detection (PAD): Detect printed photos, video replays
- Passive Liveness: Detect liveness without user actions
- Privacy-Preserving ML: On-device model training
- Federated Learning: Improve models without centralizing data
References
- MediaPipe Face Landmarker
- WebRTC getUserMedia API
- ISO/IEC 30107 (Biometric Presentation Attack Detection)
- NIST Special Publication 800-63B (Digital Identity Guidelines)
Maintained by: LilithFTW License: MIT Last Review: 2026-02-06