Content Moderation Classification — Showcase Examples
Generated 2026-03-06 00:56 UTC from 20 stratified test samples.
Model
| Property | Value |
|---|---|
| Base model | all-mpnet-base-v2 (110M params, 768-dim) |
| ONNX variant | fp16 |
| Model size | 209 MB |
| Macro F1 (full test set) | 0.944 |
| Categories | 18 |
| Quality gate | 18/18 pass (F1 >= 0.85) |
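The report does not say how the macro F1 and the quality gate are computed. A minimal sketch, assuming macro F1 is the unweighted mean of per-category F1 scores and the gate counts categories at or above 0.85. The function names and the three example scores are hypothetical, not values from the test set.

```python
from statistics import mean

GATE_F1 = 0.85  # gate value taken from the Model table


def macro_f1(per_category_f1: dict[str, float]) -> float:
    """Unweighted mean of per-category F1 scores (standard macro average)."""
    return mean(per_category_f1.values())


def quality_gate(per_category_f1: dict[str, float],
                 min_f1: float = GATE_F1) -> tuple[int, int]:
    """Return (categories passing the gate, total categories)."""
    passed = sum(1 for f1 in per_category_f1.values() if f1 >= min_f1)
    return passed, len(per_category_f1)


# Hypothetical scores for illustration only; NOT the real per-category results.
example_scores = {"cat_a": 0.90, "cat_b": 0.96, "cat_c": 0.80}
passed, total = quality_gate(example_scores)
print(f"{passed}/{total} pass, macro F1 = {macro_f1(example_scores):.3f}")
```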
Per-Category Thresholds
| Category | Threshold |
|---|---|
| threats | 0.58 * |
| hate_speech | 0.30 |
| csam | 0.30 |
| scam_patterns | 0.30 |
| contact_info | 0.30 |
| solicitation | 0.30 |
| spam | 0.30 |
| profanity | 0.30 |
| adult_content | 0.45 * |
| doxxing | 0.30 |
| predatory_behavior | 0.44 * |
| law_enforcement | 0.63 * |
| sextortion | 0.30 |
| ncii | 0.38 * |
| trafficking | 0.30 |
| self_harm | 0.30 |
| impersonation | 0.30 |
| harassment | 0.42 * |
* Categories whose thresholds are above 0.30 (marked with an asterisk) had precision/recall tuning.
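The thresholds above turn per-category confidences into multi-label decisions. A minimal sketch of that step, assuming a category is POSITIVE when its confidence is at or above its threshold; the report does not say whether the comparison is inclusive. The names `THRESHOLDS` and `predict_labels` are hypothetical.

```python
# Per-category thresholds copied from the table above.
THRESHOLDS = {
    "threats": 0.58, "hate_speech": 0.30, "csam": 0.30,
    "scam_patterns": 0.30, "contact_info": 0.30, "solicitation": 0.30,
    "spam": 0.30, "profanity": 0.30, "adult_content": 0.45,
    "doxxing": 0.30, "predatory_behavior": 0.44, "law_enforcement": 0.63,
    "sextortion": 0.30, "ncii": 0.38, "trafficking": 0.30,
    "self_harm": 0.30, "impersonation": 0.30, "harassment": 0.42,
}


def predict_labels(confidences: dict[str, float]) -> list[str]:
    """Return every category whose confidence meets its threshold.

    Assumption: the comparison is inclusive (>=).
    """
    return [cat for cat, conf in confidences.items()
            if conf >= THRESHOLDS[cat]]


# Confidences reported for Example 13: both categories clear their thresholds.
print(predict_labels({"sextortion": 0.9979, "harassment": 0.8999}))
# Confidence reported for Example 4: positive at 0.30.
print(predict_labels({"scam_patterns": 0.4968}))
```

Because each category is thresholded independently, one text can receive several labels, which is how Example 13 ends up with both sextortion and harassment.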
Sample Summary
19/20 samples had perfectly matching predictions.
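A minimal sketch of the match check, assuming "perfectly matching" means the predicted label set equals the ground-truth set exactly. That reading is consistent with Example 13, which is marked MISMATCH even though its ground-truth label was predicted. The helper name `is_match` is hypothetical.

```python
def is_match(truth: set[str], predicted: set[str]) -> bool:
    """Exact-set match: any extra or missing label counts as a mismatch."""
    return truth == predicted


# Label sets taken from Examples 13, 18, and 19 of this report.
samples = [
    ({"sextortion"}, {"sextortion", "harassment"}),  # Example 13: extra label
    ({"harassment"}, {"harassment"}),                # Example 18
    (set(), set()),                                  # Example 19: innocuous
]
matches = sum(is_match(truth, pred) for truth, pred in samples)
print(f"{matches}/{len(samples)} samples match")
```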
Per-Category Results (sampled set only)
| Category | TP | FP | FN | Sampled F1 |
|---|---|---|---|---|
| threats | 1 | 0 | 0 | 1.000 |
| hate_speech | 1 | 0 | 0 | 1.000 |
| csam | 1 | 0 | 0 | 1.000 |
| scam_patterns | 1 | 0 | 0 | 1.000 |
| contact_info | 1 | 0 | 0 | 1.000 |
| solicitation | 1 | 0 | 0 | 1.000 |
| spam | 1 | 0 | 0 | 1.000 |
| profanity | 1 | 0 | 0 | 1.000 |
| adult_content | 1 | 0 | 0 | 1.000 |
| doxxing | 1 | 0 | 0 | 1.000 |
| predatory_behavior | 1 | 0 | 0 | 1.000 |
| law_enforcement | 1 | 0 | 0 | 1.000 |
| sextortion | 1 | 0 | 0 | 1.000 |
| ncii | 1 | 0 | 0 | 1.000 |
| trafficking | 1 | 0 | 0 | 1.000 |
| self_harm | 1 | 0 | 0 | 1.000 |
| impersonation | 1 | 0 | 0 | 1.000 |
| harassment | 1 | 1 | 0 | 0.667 |
Note: These counts are from the small stratified sample, not the full test set.
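The Sampled F1 column follows from the counts via F1 = 2·TP / (2·TP + FP + FN). A minimal sketch that reproduces the table values; the function name and the choice to return 0.0 when all counts are zero are assumptions, not something the report specifies.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    # Assumed convention: no positives of any kind yields 0.0.
    return 2 * tp / denom if denom else 0.0


# harassment row: TP=1, FP=1, FN=0
print(round(f1_from_counts(1, 1, 0), 3))  # 0.667
# every other row: TP=1, FP=0, FN=0
print(round(f1_from_counts(1, 0, 0), 3))  # 1.0
```

With a single true positive per category, one extra false positive is enough to drop the sampled F1 from 1.000 to 0.667, so these figures say little about full-test-set performance.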
Examples
Example 1 — ✅ MATCH
Text:
[ADULT][GENERAL] Don't come to the tournament. I'm telling you this as a courtesy. Something bad will happen to you if you show up.
Ground truth: threats Predicted: threats
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| threats | 0.9977 | 0.58 | POSITIVE |
Source: threats_positives
Example 2 — ✅ MATCH
Text:
[GENERAL][LISTING] feminism was a psyop to double the tax base and destroy the family unit. women are measurably less happy than before 'liberation.' coincidence? they were manipulated.
Ground truth: hate_speech Predicted: hate_speech
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| hate_speech | 0.9984 | 0.30 | POSITIVE |
Source: hate_speech_positives
Example 3 — ✅ MATCH
Text:
[GENERAL][BIO] I’ve got a bunch of old home videos from my cousin’s birthday parties—kids being kids, nothing too serious. Anyone into nostalgic content?
Ground truth: csam Predicted: csam
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| csam | 0.9747 | 0.30 | POSITIVE |
Source: csam_positives
Example 4 — ✅ MATCH
Text:
[ADULT][MESSAGE] Puppy deposit is $250 to hold your spot — we have 3 other families interested and can't wait. Once deposit clears we'll ship the puppy with all papers next week 🐶
Ground truth: scam_patterns Predicted: scam_patterns
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| scam_patterns | 0.4968 | 0.30 | POSITIVE |
Source: scam_patterns_positives
Example 5 — ✅ MATCH
Text:
[GENERAL][LISTING] tips always appreciated 💕 cash app handle in my highlights if u wanna show some love
Ground truth: contact_info Predicted: contact_info
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| contact_info | 0.9985 | 0.30 | POSITIVE |
Source: contact_info_positives
Example 6 — ✅ MATCH
Text:
[GENERAL][MESSAGE] saw ur listing — the hourly is kinda steep for me rn. do u ever do quick meets for like $80-100? even 30 mins would work
Ground truth: solicitation Predicted: solicitation
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| solicitation | 0.9982 | 0.30 | POSITIVE |
Source: solicitation_positives
Example 7 — ✅ MATCH
Text:
[GENERAL][BIO] hello 😊 hello 😊 hello 😊 hello 😊 just checking if ur there hello 😊
Ground truth: spam Predicted: spam
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| spam | 0.9980 | 0.30 | POSITIVE |
Source: spam_positives
Example 8 — ✅ MATCH
Text:
[ADULT][LISTING] this new UI is hot garbage, who approved this crap honestly
Ground truth: profanity Predicted: profanity
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| profanity | 0.9983 | 0.30 | POSITIVE |
Source: profanity_positives
Example 9 — ✅ MATCH
Text:
[GENERAL][GENERAL] showering with the curtain open while my roommate's away, living my best life fr 😂 no clothes needed
Ground truth: adult_content Predicted: adult_content
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| adult_content | 0.9981 | 0.45 | POSITIVE |
Source: adult_content_positives
Example 10 — ✅ MATCH
Text:
[ADULT][MESSAGE] filed a 'wellness concern' complaint with your nursing license board citing your posts here. included your NPI number so they'd find you fast. you did this to yourself
Ground truth: doxxing Predicted: doxxing
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| doxxing | 0.9978 | 0.30 | POSITIVE |
Source: doxxing_positives
Example 11 — ✅ MATCH
Text:
[GENERAL][BIO] i followed you home just to make sure you got back safely. i do it every night actually, you just never noticed. i care about you that much. that's love, not stalking.
Ground truth: predatory_behavior Predicted: predatory_behavior
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| predatory_behavior | 0.9977 | 0.44 | POSITIVE |
Source: predatory_behavior_positives
Example 12 — ✅ MATCH
Text:
[ADULT][GENERAL] my buddy said you could help me out with something special. he said to just ask you directly and you'd tell me exactly what you've got and what it'll run me
Ground truth: law_enforcement Predicted: law_enforcement
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| law_enforcement | 0.9982 | 0.63 | POSITIVE |
Source: law_enforcement_positives
Example 13 — ❌ MISMATCH
Text:
[GENERAL][LISTING] if u dont get on cam with me rn im posting the screenshots to ur school's subreddit. i already know what year u are and ur dorm building so don't act like this isn't serious
Ground truth: sextortion Predicted: sextortion, harassment
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| sextortion | 0.9979 | 0.30 | POSITIVE |
| harassment | 0.8999 | 0.42 | POSITIVE |
Source: sextortion_positives
Example 14 — ✅ MATCH
Text:
[ADULT][MESSAGE] found my bf's cloud backup unlocked on his laptop — his ex sent him a bunch of private stuff. screenshotted everything and passing it around bc she's a homewrecker anyway
Ground truth: ncii Predicted: ncii
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| ncii | 0.9982 | 0.38 | POSITIVE |
Source: ncii_positives
Example 15 — ✅ MATCH
Text:
[ADULT][MESSAGE] New girl in town needs clients asap — she's under my management, all proceeds go through me. Very flexible with services, no rush fee, just ask. References on request.
Ground truth: trafficking Predicted: trafficking
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| trafficking | 0.9983 | 0.30 | POSITIVE |
Source: trafficking_positives
Example 16 — ✅ MATCH
Text:
[ADULT][MESSAGE] here's the masterpost: which household chemicals give the deepest marks, how to hide scarring from parents/doctors, aftercare so it doesn't get infected and raise flags
Ground truth: self_harm Predicted: self_harm
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| self_harm | 0.9971 | 0.30 | POSITIVE |
Source: self_harm_positives
Example 17 — ✅ MATCH
Text:
[ADULT][LISTING] This is an automated message from Platform Safety. Due to a recent policy update, creators in your region must complete a compliance check. Please reply with your full legal name and account email to avoid service interruption.
Ground truth: impersonation Predicted: impersonation
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| impersonation | 0.9986 | 0.30 | POSITIVE |
Source: impersonation_positives
Example 18 — ✅ MATCH
Text:
[GENERAL][MESSAGE] your takes are so brain-dead it's actually impressive. stick to lurking because every time you open your mouth you prove how unqualified you are to exist in this space.
Ground truth: harassment Predicted: harassment
| Category | Confidence | Threshold | Decision |
|---|---|---|---|
| harassment | 0.9985 | 0.42 | POSITIVE |
Source: harassment_positives
Example 19 — ✅ MATCH
Text:
[ADULT][LISTING] Good sales technique is all about finding someone's pain points, making them feel understood, and then positioning yourself as the only solution they need.
Ground truth: (none — innocuous) Predicted: (none — innocuous)
Source: predatory_behavior_hard_negatives
Example 20 — ✅ MATCH
Text:
[ADULT][LISTING] Hi there! I saw that you're available this weekend — do you take same-day bookings or is advance notice preferred? Thanks in advance 😊
Ground truth: (none — innocuous) Predicted: (none — innocuous)
Source: claude_innocuous
Generated by content_moderation_training.showcase — see EXPERIMENTS.md for full training history.