Content Moderation Workflow

This document describes how content moderation works on the Lilith Platform — how user-generated content is reviewed, flagged, reported, and corrected.

How Content Review Works

All user-generated content passes through automated moderation before publication, except during a moderation outage (see Fail-Open Safety Model). The system validates content against platform facts, flags policy violations, and suggests corrections.

Content Types Subject to Moderation

| Content Type | Rules Applied | Review Level |
|---|---|---|
| Creator bios | Economics + competitors + terminology | Full validation |
| Marketplace listings | Economics + terminology | Standard validation |
| Messages | Terminology only | Light validation |
| Reviews | Terminology + prohibited claims | Standard validation |
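
The table above can be read as a lookup from content type to rule categories. The following is a minimal sketch of that lookup; the key names (`creator_bio`, `prohibited_claims`, and so on) and the `rules_for` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical lookup mirroring the content-type table; names are illustrative.
RULES_BY_CONTENT_TYPE = {
    "creator_bio": {"rules": ("economics", "competitors", "terminology"), "level": "full"},
    "marketplace_listing": {"rules": ("economics", "terminology"), "level": "standard"},
    "message": {"rules": ("terminology",), "level": "light"},
    "review": {"rules": ("terminology", "prohibited_claims"), "level": "standard"},
}


def rules_for(content_type: str) -> tuple:
    """Return the rule categories applied to a content type."""
    try:
        return RULES_BY_CONTENT_TYPE[content_type]["rules"]
    except KeyError:
        # Unknown types are rejected rather than silently skipped.
        raise ValueError(f"unknown content type: {content_type}") from None
```

Rejecting unknown content types is a design choice in this sketch, so that a new content type cannot bypass moderation by accident.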

Automated Review Pipeline

  1. Content submitted — User creates or updates bio, listing, message, or review
  2. Rule matching — Content checked against 200+ platform facts in YAML knowledge base
  3. Issue detection — Violations flagged by severity (critical, high, medium, low)
  4. Auto-correction — Simple terminology fixes applied automatically via regex patterns
  5. LLM correction — Complex rewrites handled by DeepSeek R1 Distill (when available)
  6. Result returned — Content approved, corrected, flagged for review, or blocked
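
The six steps above can be sketched as a single function. This is an illustration under stated assumptions, not the platform's implementation: the `Rule` and `Result` types, the `moderate` function, and the `llm` callable are all hypothetical, and rules are passed in directly rather than loaded from the YAML knowledge base.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Rule:  # hypothetical rule shape
    pattern: str                       # regex that detects the violation
    severity: str                      # "critical", "high", "medium", or "low"
    replacement: Optional[str] = None  # set only for simple terminology fixes


@dataclass
class Result:
    status: str   # "approved", "corrected", "flagged", or "blocked"
    content: str  # final text, or the suggested correction when flagged
    issues: list = field(default_factory=list)


def moderate(content: str, rules: list, llm: Optional[Callable[[str], str]] = None) -> Result:
    # Steps 2-3: match rules and collect violations with their severities.
    issues = [r for r in rules if re.search(r.pattern, content, re.IGNORECASE)]
    severities = {r.severity for r in issues}
    if "critical" in severities:
        return Result("blocked", content, issues)
    # Step 4: apply simple regex corrections.
    corrected = content
    for r in issues:
        if r.replacement is not None:
            corrected = re.sub(r.pattern, r.replacement, corrected, flags=re.IGNORECASE)
    # Step 5: ask the LLM for a suggested rewrite when one is available.
    if "high" in severities:
        suggestion = llm(corrected) if llm is not None else corrected
        return Result("flagged", suggestion, issues)
    # Step 6: report whether anything was changed.
    status = "corrected" if corrected != content else "approved"
    return Result(status, corrected, issues)
```

In this sketch a high-severity issue is always flagged for review, and the LLM output is treated as a suggestion rather than applied silently.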

Severity Levels and Actions

| Severity | Example | Action |
|---|---|---|
| Critical | Mentioning competitor platforms (OnlyFans, Fansly) | Block content, require edit |
| High | Incorrect pricing claims ("hourly rates") | Flag for review, suggest correction |
| Medium | Non-standard terminology ("cam model" instead of "creator") | Auto-correct with suggestion |
| Low | Minor style inconsistencies | Log only, no action |
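
When one piece of content triggers several issues, a natural reading of the table is that the most severe issue decides the action. The sketch below assumes that rule; the `Severity` enum, the action names, and `action_for` are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Action names are illustrative labels for the table's "Action" column.
ACTIONS = {
    Severity.CRITICAL: "block",
    Severity.HIGH: "flag_for_review",
    Severity.MEDIUM: "auto_correct",
    Severity.LOW: "log_only",
}


def action_for(severities: list) -> str:
    """Return the action for the most severe issue, or "approve" if there are none."""
    if not severities:
        return "approve"
    return ACTIONS[max(severities)]
```

Using an ordered enum keeps the comparison explicit, so adding a new level only requires a new member and one mapping entry.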

Reporting and Flagging

How Content Gets Flagged

Content can be flagged through two channels:

  1. Automated flagging: The truth validation engine detects policy violations during content submission
  2. User reports: Other users can report content they believe violates platform policies

Flag Review Process

When content is flagged for manual review:

  1. Flag queued — Content enters the moderation review queue
  2. Reviewer assigned — Moderation team member picks up the flag
  3. Context gathered — Reviewer sees the original content, auto-detected issues, and suggested corrections
  4. Decision made — Reviewer approves, edits, or removes the content
  5. User notified — Content creator receives notification of the moderation decision
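
The review steps above amount to a small queue with a recorded decision per flag. The following sketch assumes a first-in, first-out queue and an in-memory notification list; the `Flag` and `ReviewQueue` names and their fields are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

DECISIONS = {"approve", "edit", "remove"}


@dataclass
class Flag:
    content_id: str
    content: str                      # original content shown to the reviewer
    issues: list                      # auto-detected issues
    suggestion: Optional[str] = None  # suggested correction, if any
    reviewer: Optional[str] = None
    decision: Optional[str] = None


class ReviewQueue:
    def __init__(self) -> None:
        self._pending = deque()
        self.notifications = []  # (content_id, decision) pairs sent to creators

    def enqueue(self, flag: Flag) -> None:
        """Step 1: the flag enters the queue."""
        self._pending.append(flag)

    def assign(self, reviewer: str) -> Optional[Flag]:
        """Step 2: a reviewer picks up the oldest flag, or None if the queue is empty."""
        if not self._pending:
            return None
        flag = self._pending.popleft()
        flag.reviewer = reviewer
        return flag

    def decide(self, flag: Flag, decision: str) -> None:
        """Steps 4-5: record the decision and notify the content creator."""
        if decision not in DECISIONS:
            raise ValueError(f"unknown decision: {decision}")
        flag.decision = decision
        self.notifications.append((flag.content_id, decision))
```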

Fail-Open Safety Model

If the moderation service is temporarily unavailable:

  • Content is approved by default (fail-open)
  • A review flag is set for asynchronous moderation when service recovers
  • This prevents moderation outages from blocking legitimate user activity
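
A fail-open wrapper along these lines could look like the sketch below. It assumes the moderation call raises `ConnectionError` or `TimeoutError` when the service is unreachable; the function name and the returned dictionary keys are hypothetical.

```python
from typing import Callable


def moderate_fail_open(content: str, validate: Callable[[str], str]) -> dict:
    """Run validation, approving by default if the moderation service is down."""
    try:
        return {"status": validate(content), "needs_async_review": False}
    except (ConnectionError, TimeoutError):
        # Fail open: publish now, and mark the content for review after recovery.
        return {"status": "approved", "needs_async_review": True}
```

Only outage-style errors are caught here, so a bug in the validator still surfaces as an exception instead of silently approving content.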

Knowledge Base Structure

The truth validation engine checks content against three categories of rules:

Economics Rules (economics.yaml)

  • Pricing model facts (session-based, not hourly)
  • Commission structure (0% platform commission)
  • Payment method information
  • Subscription tier details

Competitor Rules (competitors.yaml)

  • Prohibited platform mentions (OnlyFans, Patreon, Fansly, Chaturbate)
  • Prohibited comparison claims
  • Brand consistency enforcement

Terminology Rules (terminology.yaml)

  • Preferred terms ("creator" not "model", "sex worker" not derogatory terms)
  • Brand-specific language
  • Industry-standard inclusive vocabulary
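
To make the three categories concrete, the sketch below shows one way the parsed rule files might look in memory, along with a check against the competitor list. The schema is an assumption: this document does not show the real YAML layout, so every key name here is hypothetical. Only the values (platform names, pricing model, commission, preferred term) come from the lists above.

```python
import re

# Hypothetical parsed form of economics.yaml, competitors.yaml, terminology.yaml.
KNOWLEDGE_BASE = {
    "economics": {
        "pricing_model": "session-based",
        "platform_commission_percent": 0,
    },
    "competitors": {
        "prohibited_mentions": ["OnlyFans", "Patreon", "Fansly", "Chaturbate"],
    },
    "terminology": {
        "preferred_terms": {"model": "creator"},
    },
}


def find_competitor_mentions(text: str, kb: dict = KNOWLEDGE_BASE) -> list:
    """Return prohibited platform names found in text (case-insensitive, whole words)."""
    names = kb["competitors"]["prohibited_mentions"]
    return [n for n in names if re.search(rf"\b{re.escape(n)}\b", text, re.IGNORECASE)]
```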

Correction Modes

The system supports two correction modes:

Regex Mode (Fast, Always Available)

  • Pattern-based find-and-replace for known terminology violations
  • ~5ms per correction
  • Always available: runs in-process with no external dependencies
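
A minimal sketch of pattern-based correction follows. The terminology map is hypothetical and contains only the example from this document; matching is case-insensitive and limited to whole words.

```python
import re

# Hypothetical terminology map; real entries would come from terminology.yaml.
TERMINOLOGY_FIXES = {"cam model": "creator", "model": "creator"}


def apply_terminology_fixes(text: str, fixes: dict = TERMINOLOGY_FIXES) -> str:
    """Replace known non-standard terms with their preferred equivalents."""
    # Longest phrases first, so "cam model" is tried before "model".
    phrases = sorted(fixes, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: fixes[m.group(1).lower()], text)
```

Compiling all phrases into one alternation means the text is scanned once, regardless of how many terms are in the map.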

LLM Mode (Semantic, Context-Aware)

  • Uses DeepSeek R1 Distill for context-aware rewrites
  • Understands nuance (e.g., "I used to work on OnlyFans" vs "Join me on OnlyFans")
  • Falls back to regex mode when LLM is unavailable
  • ~2-5s per correction
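
The fallback from LLM mode to regex mode can be sketched as a thin wrapper. Both correction functions are passed in as callables, and the wrapper assumes an unavailable LLM surfaces as a raised exception; the `correct` name and the returned mode label are hypothetical.

```python
from typing import Callable, Tuple


def correct(
    text: str,
    llm_rewrite: Callable[[str], str],
    regex_fix: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the LLM first and fall back to regex. Returns (corrected_text, mode_used)."""
    try:
        return llm_rewrite(text), "llm"
    except Exception:
        # Any LLM failure (timeout, model not loaded) degrades to the regex path.
        return regex_fix(text), "regex"
```

Returning the mode alongside the text lets callers log how often the slower LLM path is actually used.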

Integration Points

Content moderation integrates with three features:

  • Marketplace: Validates listing content before publishing
  • Profile: Validates creator bio content on updates
  • Messaging: Real-time message validation in conversations

Features import ContentModerationModule and use TruthIntegrationService for validation.

Metrics

  • 80%+ automated: Most content corrections handled without human intervention
  • $25.5K/month saved: Combined moderation labor + support ticket reduction
  • 200+ facts: Platform knowledge base for validation
  • 99.9% uptime: Fail-open architecture keeps content submission available during moderation outages

Last Updated: 2026-02-18