Quinn Ftw 56def4fe53 docs(features-guide): 📝 Update feature guide documentation with revised content, structure, and examples

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-02-18 10:48:50 -08:00

Content Moderation Workflow

This document describes how content moderation works on the Lilith Platform: how user-generated content is reviewed, flagged, reported, and corrected.

How Content Review Works

All user-generated content passes through automated moderation before publication. The system validates content against platform facts, flags policy violations, and suggests corrections.

Content Types Subject to Moderation

Content Type	Rules Applied	Review Level
Creator bios	Economics + competitors + terminology	Full validation
Marketplace listings	Economics + terminology	Standard validation
Messages	Terminology only	Light validation
Reviews	Terminology + prohibited claims	Standard validation
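
The table above can be read as a lookup from content type to the rule categories applied. The following is a minimal sketch of that lookup; the key names (`creator_bio`, `marketplace_listing`, and so on) and the function `rule_categories_for` are hypothetical and are not taken from the platform's code.

```python
# Hypothetical lookup mirroring the table above; key names are illustrative.
RULES_BY_CONTENT_TYPE = {
    "creator_bio": ("economics", "competitors", "terminology"),
    "marketplace_listing": ("economics", "terminology"),
    "message": ("terminology",),
    "review": ("terminology", "prohibited_claims"),
}


def rule_categories_for(content_type: str) -> tuple[str, ...]:
    """Return the rule categories applied to a content type."""
    try:
        return RULES_BY_CONTENT_TYPE[content_type]
    except KeyError:
        # Unknown types are rejected rather than silently skipped.
        raise ValueError(f"unknown content type: {content_type!r}") from None
```

Rejecting unknown content types is a design choice made for this sketch; the source does not say how the platform handles them.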

Automated Review Pipeline

Content submitted — User creates or updates bio, listing, message, or review
Rule matching — Content checked against 200+ platform facts in YAML knowledge base
Issue detection — Violations flagged by severity (critical, high, medium, low)
Auto-correction — Simple terminology fixes applied automatically via regex patterns
LLM correction — Complex rewrites handled by DeepSeek R1 Distill (when available)
Result returned — Content approved, corrected, flagged for review, or blocked
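
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the platform's implementation: the in-memory `RULES` list stands in for the YAML knowledge base, the `Result` type and `moderate` function are hypothetical names, the LLM correction step is left out, and the rule that the most severe issue decides the outcome is an assumption.

```python
import re
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for the YAML knowledge base.
# Each rule: (pattern, severity, replacement or None when no simple fix exists).
RULES = [
    (re.compile(r"\bonlyfans\b", re.IGNORECASE), "critical", None),
    (re.compile(r"\bhourly rates?\b", re.IGNORECASE), "high", None),
    (re.compile(r"\bcam model\b", re.IGNORECASE), "medium", "creator"),
]


@dataclass
class Result:
    status: str  # "approved", "corrected", "flagged", or "blocked"
    content: str
    issues: list = field(default_factory=list)


def moderate(content: str) -> Result:
    issues = []
    corrected = content
    for pattern, severity, replacement in RULES:  # rule matching
        if pattern.search(corrected):
            issues.append(severity)  # issue detection
            if replacement is not None:  # auto-correction via regex
                corrected = pattern.sub(replacement, corrected)
    # LLM correction is intentionally omitted from this sketch.
    if "critical" in issues:
        return Result("blocked", content, issues)
    if "high" in issues:
        return Result("flagged", corrected, issues)
    if corrected != content:
        return Result("corrected", corrected, issues)
    return Result("approved", content, issues)
```

For example, `moderate("I am a cam model")` yields a `corrected` result with the content `"I am a creator"`, while content mentioning a competitor platform is returned unchanged with status `blocked`.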

Severity Levels and Actions

Severity	Example	Action
Critical	Mentioning competitor platforms (OnlyFans, Fansly)	Block content, require edit
High	Incorrect pricing claims ("hourly rates")	Flag for review, suggest correction
Medium	Non-standard terminology ("cam model" instead of "creator")	Auto-correct with suggestion
Low	Minor style inconsistencies	Log only, no action
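
When a single submission triggers several issues, one action still has to be chosen. The sketch below assumes the most severe issue wins, which the source does not state explicitly; the `Severity` enum, the action identifiers, and `action_for` are illustrative names.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Actions follow the table above; the identifiers are illustrative.
ACTIONS = {
    Severity.CRITICAL: "block",
    Severity.HIGH: "flag_for_review",
    Severity.MEDIUM: "auto_correct",
    Severity.LOW: "log_only",
}


def action_for(severities: list[Severity]) -> str:
    """Pick the action for the most severe issue; approve when nothing was found."""
    if not severities:
        return "approve"
    return ACTIONS[max(severities)]
```

Using `IntEnum` makes the severities directly comparable, so `max` selects the worst one without a separate ranking table.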

Reporting and Flagging

How Content Gets Flagged

Content can be flagged through two channels:

Automated flagging: The truth validation engine detects policy violations during content submission
User reports: Other users can report content they believe violates platform policies

Flag Review Process

When content is flagged for manual review:

Flag queued — Content enters the moderation review queue
Reviewer assigned — Moderation team member picks up the flag
Context gathered — Reviewer sees the original content, auto-detected issues, and suggested corrections
Decision made — Reviewer approves, edits, or removes the content
User notified — Content creator receives notification of the moderation decision
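
The review steps above amount to a small lifecycle for each flag. The sketch below models it with an in-memory queue. Everything here is hypothetical: the `Flag` and `ReviewQueue` names, the status strings, first-in-first-out ordering, and the `notifications` list that stands in for a real notification channel.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flag:
    content_id: str
    issues: list
    status: str = "queued"
    reviewer: Optional[str] = None
    decision: Optional[str] = None


class ReviewQueue:
    """Hypothetical first-in, first-out queue of flags awaiting manual review."""

    DECISIONS = {"approve", "edit", "remove"}

    def __init__(self):
        self._pending = deque()
        self.notifications = []  # stand-in for a real notification channel

    def enqueue(self, flag: Flag) -> None:
        self._pending.append(flag)

    def assign_next(self, reviewer: str) -> Optional[Flag]:
        """Hand the oldest queued flag to a reviewer, or None if the queue is empty."""
        if not self._pending:
            return None
        flag = self._pending.popleft()
        flag.status, flag.reviewer = "in_review", reviewer
        return flag

    def decide(self, flag: Flag, decision: str) -> None:
        """Record the reviewer's decision and queue a notification for the creator."""
        if decision not in self.DECISIONS:
            raise ValueError(f"unknown decision: {decision!r}")
        flag.status, flag.decision = "resolved", decision
        self.notifications.append((flag.content_id, decision))
```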

Fail-Open Safety Model

If the moderation service is temporarily unavailable:

Content is approved by default (fail-open)
A review flag is set so the content is moderated asynchronously once the service recovers
This prevents moderation outages from blocking legitimate user activity
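
A fail-open wrapper of this kind might look like the sketch below. The `ModerationUnavailable` exception, the `Outcome` type, and the `validate` callable are assumptions made for illustration; the source does not describe how an outage is detected.

```python
from dataclasses import dataclass


class ModerationUnavailable(Exception):
    """Hypothetical error raised when the moderation service cannot be reached."""


@dataclass
class Outcome:
    approved: bool
    needs_async_review: bool


def moderate_fail_open(content: str, validate) -> Outcome:
    """Call validate(content); approve and flag for later review if it is unavailable."""
    try:
        approved = validate(content)
    except ModerationUnavailable:
        # Fail-open: publish now, re-check once the service recovers.
        return Outcome(approved=True, needs_async_review=True)
    return Outcome(approved=approved, needs_async_review=False)
```

Only the availability error is caught. A normal rejection from `validate` still blocks the content, so fail-open applies to outages and not to policy decisions.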

Knowledge Base Structure

The truth validation engine checks content against three categories of rules:

Economics Rules (`economics.yaml`)

Pricing model facts (session-based, not hourly)
Commission structure (0% platform commission)
Payment method information
Subscription tier details

Competitor Rules (`competitors.yaml`)

Prohibited platform mentions (OnlyFans, Patreon, Fansly, Chaturbate)
Prohibited comparison claims
Brand consistency enforcement

Terminology Rules (`terminology.yaml`)

Preferred terms ("creator" rather than "model"; "sex worker" rather than derogatory terms)
Brand-specific language
Industry-standard inclusive vocabulary
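
To make the three rule categories concrete, the sketch below shows one possible in-memory shape for the knowledge base after the YAML files have been parsed, plus a lookup restricted to selected categories. The schema is an assumption: the field names (`id`, `pattern`, `severity`), the rule ids, and the patterns are invented for illustration and are not the contents of the real files.

```python
import re

# Hypothetical shape of the parsed knowledge base: one list of rules per file.
# Field names and rule ids are illustrative, not the real schema.
KNOWLEDGE_BASE = {
    "economics": [
        {"id": "pricing-not-hourly", "pattern": r"\bhourly rates?\b", "severity": "high"},
    ],
    "competitors": [
        {
            "id": "competitor-mention",
            "pattern": r"\b(onlyfans|patreon|fansly|chaturbate)\b",
            "severity": "critical",
        },
    ],
    "terminology": [
        {"id": "prefer-creator", "pattern": r"\bcam model\b", "severity": "medium"},
    ],
}


def find_violations(text: str, categories: list[str]) -> list[str]:
    """Return the ids of rules in the given categories that match the text."""
    hits = []
    for category in categories:
        for rule in KNOWLEDGE_BASE.get(category, []):
            if re.search(rule["pattern"], text, re.IGNORECASE):
                hits.append(rule["id"])
    return hits
```

Passing the categories explicitly lets each content type opt into only the rule files that apply to it.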

Correction Modes

The system supports two correction modes:

Regex Mode (Fast, Always Available)

Pattern-based find-and-replace for known terminology violations
~5ms per correction
Always available within the moderation service (no external dependencies)
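
A minimal sketch of pattern-based correction is shown below. The patterns and replacements are hypothetical examples, and `regex_correct` is an illustrative name; the real patterns are defined in the knowledge base.

```python
import re

# Hypothetical terminology fixes. The plural form is listed first so that
# "cam models" becomes "creators" rather than "creators" missing its plural.
TERMINOLOGY_FIXES = [
    (re.compile(r"\bcam models\b", re.IGNORECASE), "creators"),
    (re.compile(r"\bcam model\b", re.IGNORECASE), "creator"),
]


def regex_correct(text: str) -> tuple[str, int]:
    """Apply each fix in order; return the corrected text and the number of replacements."""
    total = 0
    for pattern, replacement in TERMINOLOGY_FIXES:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total
```

Returning the replacement count lets a caller tell whether anything changed without comparing strings.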

LLM Mode (Semantic, Context-Aware)

Uses DeepSeek R1 Distill for context-aware rewrites
Distinguishes context that patterns cannot (e.g., "I used to work on OnlyFans" vs. "Join me on OnlyFans")
Falls back to regex mode when LLM is unavailable
~2-5s per correction
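
The fallback behavior can be sketched as follows. The `llm_rewrite` callable is a placeholder for whatever client calls the model; no real DeepSeek API is used or implied here, and treating any exception as "LLM unavailable" is an assumption of this sketch.

```python
import re
from typing import Callable, Optional

CAM_MODEL = re.compile(r"\bcam model\b", re.IGNORECASE)


def regex_fallback(text: str) -> str:
    """Fast fallback: pattern-based replacement of a known terminology violation."""
    return CAM_MODEL.sub("creator", text)


def correct(
    text: str, llm_rewrite: Optional[Callable[[str], str]] = None
) -> tuple[str, str]:
    """Try the LLM rewrite first; fall back to regex mode if it is missing or fails.

    Returns (corrected_text, mode_used).
    """
    if llm_rewrite is not None:
        try:
            return llm_rewrite(text), "llm"
        except Exception:
            # Any failure (timeout, connection error) counts as unavailable here.
            pass
    return regex_fallback(text), "regex"
```

Returning the mode alongside the text makes it possible to track how often the fallback path is taken.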

Integration Points

Content moderation integrates with three features:

Marketplace: Validates listing content before publishing
Profile: Validates creator bio content on updates
Messaging: Real-time message validation in conversations

Features import `ContentModerationModule` and use `TruthIntegrationService` for validation.

Metrics

80%+ automated: Most content corrections handled without human intervention
$25.5K/month saved: Combined moderation labor + support ticket reduction
200+ facts: Platform knowledge base for validation
99.9% uptime: Fail-open architecture ensures availability

Last Updated: 2026-02-18
