Content Moderation Workflow
This document describes how content moderation works on the Lilith Platform — how user-generated content is reviewed, flagged, reported, and corrected.
How Content Review Works
All user-generated content passes through automated moderation before publication. The system validates content against platform facts, flags policy violations, and suggests corrections.
Content Types Subject to Moderation
| Content Type | Rules Applied | Review Level |
|---|---|---|
| Creator bios | Economics + competitors + terminology | Full validation |
| Marketplace listings | Economics + terminology | Standard validation |
| Messages | Terminology only | Light validation |
| Reviews | Terminology + prohibited claims | Standard validation |
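One way to encode the table above is a lookup from content type to the rule sets and review level applied to it. This is a minimal sketch. The identifiers ContentType, RuleSet, MODERATION_POLICY and rulesFor, and the prohibited_claims rule-set name, are hypothetical and do not come from the platform code.

```typescript
// Hypothetical encoding of the content-type table; names are illustrative only.
type RuleSet = "economics" | "competitors" | "terminology" | "prohibited_claims";
type ReviewLevel = "full" | "standard" | "light";
type ContentType = "creator_bio" | "marketplace_listing" | "message" | "review";

interface Policy {
  rules: readonly RuleSet[];
  level: ReviewLevel;
}

// One entry per row of the table.
const MODERATION_POLICY: Record<ContentType, Policy> = {
  creator_bio: { rules: ["economics", "competitors", "terminology"], level: "full" },
  marketplace_listing: { rules: ["economics", "terminology"], level: "standard" },
  message: { rules: ["terminology"], level: "light" },
  review: { rules: ["terminology", "prohibited_claims"], level: "standard" },
};

// Returns the rule sets the validator should load for a given content type.
function rulesFor(type: ContentType): readonly RuleSet[] {
  return MODERATION_POLICY[type].rules;
}
```

Keeping the mapping in one place means a new content type is a single added entry, and the type checker flags any content type left without a policy.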
Automated Review Pipeline
- Content submitted — User creates or updates bio, listing, message, or review
- Rule matching — Content checked against 200+ platform facts in YAML knowledge base
- Issue detection — Violations flagged by severity (critical, high, medium, low)
- Auto-correction — Simple terminology fixes applied automatically via regex patterns
- LLM correction — Complex rewrites handled by DeepSeek R1 Distill (when available)
- Result returned — Content approved, corrected, flagged for review, or blocked
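The steps above can be sketched as a single function that runs detection, applies cheap regex fixes, escalates to the LLM only when issues remain, and then picks an outcome. This sketch assumes hypothetical stage callbacks (detectIssues, regexCorrect, llmCorrect) standing in for the real rule engine and correctors. The calls are synchronous for brevity, whereas a real LLM call would be asynchronous.

```typescript
// Minimal pipeline sketch; the stage callbacks are hypothetical placeholders.
type Severity = "critical" | "high" | "medium" | "low";
type Outcome = "approved" | "corrected" | "flagged" | "blocked";

interface Issue {
  ruleId: string;
  severity: Severity;
}

interface Stages {
  detectIssues: (text: string) => Issue[];
  regexCorrect: (text: string) => string;
  llmCorrect?: (text: string) => string; // omitted when the LLM is unavailable
}

interface PipelineResult {
  outcome: Outcome;
  text: string;
  issues: Issue[];
}

// Low-severity issues are only logged, so they never trigger further work.
const needsAction = (issues: Issue[]): boolean =>
  issues.some((i) => i.severity !== "low");

function runPipeline(submitted: string, stages: Stages): PipelineResult {
  // Rule matching and issue detection on the submitted text.
  const initial = stages.detectIssues(submitted);
  if (initial.some((i) => i.severity === "critical")) {
    return { outcome: "blocked", text: submitted, issues: initial };
  }

  // Auto-correction: regex fixes first, then re-check.
  let text = stages.regexCorrect(submitted);
  let remaining = stages.detectIssues(text);

  // LLM correction only when something is still wrong and the model is available.
  if (needsAction(remaining) && stages.llmCorrect) {
    text = stages.llmCorrect(text);
    remaining = stages.detectIssues(text);
  }

  // Result: unresolved issues go to human review; otherwise approve or return the fix.
  if (needsAction(remaining)) {
    return { outcome: "flagged", text, issues: remaining };
  }
  return { outcome: text === submitted ? "approved" : "corrected", text, issues: remaining };
}
```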
Severity Levels and Actions
| Severity | Example | Action |
|---|---|---|
| Critical | Mentioning competitor platforms (OnlyFans, Fansly) | Block content, require edit |
| High | Incorrect pricing claims ("hourly rates") | Flag for review, suggest correction |
| Medium | Non-standard terminology ("cam model" instead of "creator") | Auto-correct with suggestion |
| Low | Minor style inconsistencies | Log only, no action |
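When one piece of content triggers several issues, a reasonable reading of the table is that the strictest action wins. The sketch below assumes that rule; the action names and the actionFor helper are illustrative, not the platform's API.

```typescript
// Sketch: severity-to-action mapping from the table, with "strictest action wins".
type Severity = "critical" | "high" | "medium" | "low";
type Action = "block" | "flag_for_review" | "auto_correct" | "log_only";

const ACTION_BY_SEVERITY: Record<Severity, Action> = {
  critical: "block",
  high: "flag_for_review",
  medium: "auto_correct",
  low: "log_only",
};

// Higher number means more severe.
const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Returns "approve" when no issues were detected.
function actionFor(severities: Severity[]): Action | "approve" {
  if (severities.length === 0) return "approve";
  const worst = severities.reduce((a, b) => (RANK[b] > RANK[a] ? b : a));
  return ACTION_BY_SEVERITY[worst];
}
```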
Reporting and Flagging
How Content Gets Flagged
Content can be flagged through two channels:
- Automated flagging: The truth validation engine detects policy violations during content submission
- User reports: Other users can report content they believe violates platform policies
Flag Review Process
When content is flagged for manual review:
- Flag queued — Content enters the moderation review queue
- Reviewer assigned — Moderation team member picks up the flag
- Context gathered — Reviewer sees the original content, auto-detected issues, and suggested corrections
- Decision made — Reviewer approves, edits, or removes the content
- User notified — Content creator receives notification of the moderation decision
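The review process above behaves like a small state machine: each step is only valid from the previous one. This is a hypothetical model. The Flag fields, status names, and the send callback are assumptions for illustration, not the platform's schema.

```typescript
// Hypothetical model of a flag moving through manual review.
type FlagSource = "automated" | "user_report";
type Decision = "approve" | "edit" | "remove";

type FlagState =
  | { status: "queued" }
  | { status: "assigned"; reviewerId: string }
  | { status: "decided"; reviewerId: string; decision: Decision }
  | { status: "notified"; reviewerId: string; decision: Decision };

interface Flag {
  contentId: string;
  source: FlagSource;
  // Context shown to the reviewer: auto-detected issues and suggested corrections.
  detectedIssues: string[];
  suggestedCorrection?: string;
  state: FlagState;
}

// Reviewer assigned: only a queued flag can be picked up.
function assign(flag: Flag, reviewerId: string): Flag {
  const state = flag.state;
  if (state.status !== "queued") throw new Error(`cannot assign a ${state.status} flag`);
  return { ...flag, state: { status: "assigned", reviewerId } };
}

// Decision made by the assigned reviewer.
function decide(flag: Flag, decision: Decision): Flag {
  const state = flag.state;
  if (state.status !== "assigned") throw new Error(`cannot decide a ${state.status} flag`);
  return { ...flag, state: { status: "decided", reviewerId: state.reviewerId, decision } };
}

// User notified; `send` is a placeholder for whatever notification channel is used.
function notify(flag: Flag, send: (contentId: string, decision: Decision) => void): Flag {
  const state = flag.state;
  if (state.status !== "decided") throw new Error(`cannot notify for a ${state.status} flag`);
  send(flag.contentId, state.decision);
  return {
    ...flag,
    state: { status: "notified", reviewerId: state.reviewerId, decision: state.decision },
  };
}
```

Rejecting out-of-order transitions makes it harder for two reviewers to act on the same flag or for a user to be notified before a decision exists.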
Fail-Open Safety Model
If the moderation service is temporarily unavailable:
- Content is approved by default (fail-open)
- A review flag is set for asynchronous moderation when service recovers
- This prevents moderation outages from blocking legitimate user activity
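The fail-open behavior described above amounts to catching a service failure, approving the content, and recording it for later review. This minimal sketch assumes a hypothetical validate callback that throws when the moderation service is unreachable, plus a hypothetical enqueueForLater hook. It is synchronous for brevity; a real service call would be awaited.

```typescript
// Minimal fail-open wrapper; `validate` and `enqueueForLater` are hypothetical hooks.
interface ValidationResult {
  approved: boolean;
  needsAsyncReview: boolean;
}

function validateFailOpen(
  text: string,
  validate: (text: string) => boolean,
  enqueueForLater: (text: string) => void,
): ValidationResult {
  try {
    return { approved: validate(text), needsAsyncReview: false };
  } catch {
    // Service unavailable: approve now, but mark for moderation once it recovers.
    enqueueForLater(text);
    return { approved: true, needsAsyncReview: true };
  }
}
```

The trade-off is deliberate: during an outage some violating content may be visible until the asynchronous review catches it, in exchange for never blocking legitimate users.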
Knowledge Base Structure
The truth validation engine checks content against three categories of rules:
Economics Rules (economics.yaml)
- Pricing model facts (session-based, not hourly)
- Commission structure (0% platform commission)
- Payment method information
- Subscription tier details
Competitor Rules (competitors.yaml)
- Prohibited platform mentions (OnlyFans, Patreon, Fansly, Chaturbate)
- Prohibited comparison claims
- Brand consistency enforcement
Terminology Rules (terminology.yaml)
- Preferred terms ("creator" not "model", "sex worker" not derogatory terms)
- Brand-specific language
- Industry-standard inclusive vocabulary
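To make the three rule categories concrete, the sketch below shows one possible in-memory shape for facts after the YAML files are loaded, and how rule matching could turn them into findings. The Fact fields, fact IDs, and patterns are assumptions, and the real YAML schema may differ. The example facts are taken from the rules listed above.

```typescript
// Hypothetical in-memory shape of facts loaded from the three YAML files.
type Category = "economics" | "competitors" | "terminology";
type Severity = "critical" | "high" | "medium" | "low";

interface Fact {
  id: string;
  category: Category;
  pattern: RegExp; // phrase that contradicts the fact
  severity: Severity;
  suggestion: string; // correct statement or preferred term
}

interface Finding {
  factId: string;
  severity: Severity;
  match: string;
  suggestion: string;
}

// Illustrative facts based on the rules listed above.
const FACTS: Fact[] = [
  { id: "econ.session_pricing", category: "economics", pattern: /\bhourly rates?\b/i, severity: "high", suggestion: "Pricing is session-based, not hourly." },
  { id: "comp.onlyfans", category: "competitors", pattern: /\bonlyfans\b/i, severity: "critical", suggestion: "Remove competitor platform mentions." },
  { id: "term.creator", category: "terminology", pattern: /\bcam models?\b/i, severity: "medium", suggestion: "creator" },
];

// Checks text only against the categories enabled for its content type.
function checkAgainst(text: string, categories: Category[], facts: Fact[] = FACTS): Finding[] {
  const findings: Finding[] = [];
  for (const fact of facts) {
    if (!categories.includes(fact.category)) continue;
    const m = fact.pattern.exec(text); // non-global regex, so no lastIndex state
    if (m) findings.push({ factId: fact.id, severity: fact.severity, match: m[0], suggestion: fact.suggestion });
  }
  return findings;
}
```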
Correction Modes
The system supports two correction modes:
Regex Mode (Fast, Always Available)
- Pattern-based find-and-replace for known terminology violations
- ~5ms per correction
- Always available (no external service dependencies)
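Regex mode can be as simple as an ordered list of find-and-replace rules applied in sequence. The rule list below is illustrative only, not the platform's terminology.yaml, and regexCorrect is a hypothetical name.

```typescript
// Sketch of regex-mode correction with an illustrative, ordered rule list.
interface ReplaceRule {
  pattern: RegExp;
  replacement: string;
}

const TERMINOLOGY_FIXES: ReplaceRule[] = [
  // Longer phrase first: if the bare "model" rule ran first,
  // "cam model" would become "cam creator".
  { pattern: /\bcam model\b/gi, replacement: "creator" },
  { pattern: /\bmodel\b/gi, replacement: "creator" },
];

// Matching is case-insensitive; the replacement is inserted exactly as written.
function regexCorrect(text: string, rules: ReplaceRule[] = TERMINOLOGY_FIXES): { text: string; changed: boolean } {
  let out = text;
  for (const rule of rules) out = out.replace(rule.pattern, rule.replacement);
  return { text: out, changed: out !== text };
}
```

A bare "model" rule will also rewrite phrases such as "business model". That kind of false positive is exactly where the context-aware LLM mode is meant to help.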
LLM Mode (Semantic, Context-Aware)
- Uses DeepSeek R1 Distill for context-aware rewrites
- Understands nuance (e.g., "I used to work on OnlyFans" vs "Join me on OnlyFans")
- Falls back to regex mode when LLM is unavailable
- ~2-5s per correction
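The fallback rule above can be expressed as trying the LLM first and dropping to regex on any failure or when no model is configured. In this sketch, llmRewrite is a hypothetical stand-in for a call to the hosted DeepSeek R1 Distill model, and its signature is an assumption. The code is synchronous for brevity, although the real call would be asynchronous and slower.

```typescript
// Sketch of LLM mode with regex fallback; `llmRewrite` is a hypothetical stand-in.
type Rewriter = (text: string) => string;

interface CorrectionResult {
  text: string;
  mode: "llm" | "regex";
}

function correct(text: string, regexFix: Rewriter, llmRewrite?: Rewriter): CorrectionResult {
  if (llmRewrite) {
    try {
      return { text: llmRewrite(text), mode: "llm" };
    } catch {
      // Model errored or timed out: fall through to the always-available regex path.
    }
  }
  return { text: regexFix(text), mode: "regex" };
}
```

Returning the mode alongside the text lets callers log how often the fallback is used, which helps when tracking LLM availability.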
Integration Points
Content moderation integrates with three features:
- Marketplace: Validates listing content before publishing
- Profile: Validates creator bio content on updates
- Messaging: Real-time message validation in conversations
Features import ContentModerationModule and use TruthIntegrationService for validation.
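As a rough sketch of how a feature might use the moderation layer, the Marketplace example below asks a validator for a verdict before publishing. TruthValidator only approximates what TruthIntegrationService might expose. Its validate method name, argument order, and result shape are assumptions, not the real API. Holding flagged listings until review is also an assumption.

```typescript
// Hypothetical interface approximating TruthIntegrationService; not the real API.
interface TruthValidator {
  validate(
    contentType: "marketplace_listing" | "creator_bio" | "message",
    text: string,
  ): { outcome: "approved" | "corrected" | "flagged" | "blocked"; text: string };
}

class MarketplaceListingService {
  private readonly truth: TruthValidator;

  constructor(truth: TruthValidator) {
    this.truth = truth;
  }

  // Publishes approved or auto-corrected text; flagged and blocked drafts are held.
  publish(draft: string): { published: boolean; text: string } {
    const result = this.truth.validate("marketplace_listing", draft);
    const ok = result.outcome === "approved" || result.outcome === "corrected";
    return { published: ok, text: ok ? result.text : draft };
  }
}
```

Injecting the validator through the constructor keeps the feature easy to test with a fake, which is the same pattern a module-based framework would use to provide the real service.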
Metrics
- 80%+ automated: Most content corrections handled without human intervention
- $25.5K/month saved: Combined moderation labor + support ticket reduction
- 200+ facts: Platform knowledge base for validation
- 99.9% uptime: Fail-open architecture ensures availability
Last Updated: 2026-02-18