content-moderation

Claude Code 92dc3226b1 chore(data): 🔧 Update dataset splits and negative samples for improved model robustness Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-03-18 22:55:39 -07:00
adult_content	security(content-moderation): 🔒️ Add labeled examples for adult content, BDSM, and CSAM categories to improve harmful content classification accuracy	2026-03-18 01:16:03 -07:00
age_play	perf(data): ⚡ Refine negative examples for age_play, consent_violation, and intoxication topics and update config.yaml for performance-optimized validation.	2026-03-18 02:56:20 -07:00
anti_trans	docs(data): 📝 Add neutral and controversial content examples to innocuous.jsonl and anti_trans/ datasets for moderation training validation	2026-03-18 15:33:59 -07:00
bdsm	security(content-moderation): 🔒️ Add labeled examples for adult content, BDSM, and CSAM categories to improve harmful content classification accuracy	2026-03-18 01:16:03 -07:00
bestiality	chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples	2026-03-18 01:16:04 -07:00
consent_violation	perf(data): ⚡ Refine negative examples for age_play, consent_violation, and intoxication topics and update config.yaml for performance-optimized validation.	2026-03-18 02:56:20 -07:00
contact_info	security(content-moderation): 🔒️ Add labeled examples for adult content, BDSM, and CSAM categories to improve harmful content classification accuracy	2026-03-18 01:16:03 -07:00
csam	chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples	2026-03-18 01:16:04 -07:00
doxxing	security(content-moderation): 🔒️ Add labeled examples for adult content, BDSM, and CSAM categories to improve harmful content classification accuracy	2026-03-18 01:16:03 -07:00
edge_play	feat(content-moderation): ✨ Add positive examples for edge cases and hate speech, update prompts in category_specs.py, and archive experiment exp31	2026-03-18 14:10:50 -07:00
extreme_gore	docs(content-moderation): 📝 Add hard negative examples for extreme gore, harassment, and predatory behavior categories and update training/validation/test splits	2026-03-18 18:06:48 -07:00
financial_coercion	chore(data): 🔧 Add/update labeled examples for 15 data categories (edge-play, extreme-gore, financial-coercion, furry, hate-speech, impersonation, intoxication, law-enforcement) with expanded positives/hard negatives	2026-03-18 01:16:03 -07:00
furry	chore(data): 🔧 Add/update labeled examples for 15 data categories (edge-play, extreme-gore, financial-coercion, furry, hate-speech, impersonation, intoxication, law-enforcement) with expanded positives/hard negatives	2026-03-18 01:16:03 -07:00
harassment	chore(content-moderation): 🔧 Update training examples and refine data merging logic in merge_data.py for improved harassment/predatory behavior detection	2026-03-18 22:26:15 -07:00
hate_speech	feat(content-moderation): ✨ Add positive examples for edge cases and hate speech, update prompts in category_specs.py, and archive experiment exp31	2026-03-18 14:10:50 -07:00
impersonation	chore(data): 🔧 Add/update labeled examples for 15 data categories (edge-play, extreme-gore, financial-coercion, furry, hate-speech, impersonation, intoxication, law-enforcement) with expanded positives/hard negatives	2026-03-18 01:16:03 -07:00
intoxication	perf(data): ⚡ Refine negative examples for age_play, consent_violation, and intoxication topics and update config.yaml for performance-optimized validation.	2026-03-18 02:56:20 -07:00
law_enforcement	chore(data): 🔧 Add/update labeled examples for 15 data categories (edge-play, extreme-gore, financial-coercion, furry, hate-speech, impersonation, intoxication, law-enforcement) with expanded positives/hard negatives	2026-03-18 01:16:03 -07:00
ncii	security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy	2026-03-18 01:16:03 -07:00
necrophilia	chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples	2026-03-18 01:16:04 -07:00
predatory_behavior	chore(content-moderation): 🔧 Update training examples and refine data merging logic in merge_data.py for improved harassment/predatory behavior detection	2026-03-18 22:26:15 -07:00
profanity	security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy	2026-03-18 01:16:03 -07:00
roleplay	security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy	2026-03-18 01:16:03 -07:00
scam_patterns	security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy	2026-03-18 01:16:03 -07:00
scat	chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples	2026-03-18 01:16:04 -07:00
self_harm	security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy	2026-03-18 01:16:03 -07:00
sextortion	security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy	2026-03-18 01:16:03 -07:00
snuff	chore(generated-data): 🔧 Update adversarial training data with negative examples and threat datasets	2026-03-18 01:16:04 -07:00
solicitation	chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples	2026-03-18 01:16:04 -07:00
spam	chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples	2026-03-18 01:16:04 -07:00
threats	chore(generated-data): 🔧 Update adversarial training data with negative examples and threat datasets	2026-03-18 01:16:04 -07:00
trafficking	chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples	2026-03-18 01:16:04 -07:00
watersports	chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples	2026-03-18 01:16:04 -07:00
innocuous.jsonl	docs(data): 📝 Add neutral and controversial content examples to innocuous.jsonl and anti_trans/ datasets for moderation training validation	2026-03-18 15:33:59 -07:00
perturbation_negatives.jsonl	chore(data): 🔧 Update dataset splits and negative samples for improved model robustness	2026-03-18 22:55:39 -07:00
targeted_hard_negatives.jsonl.19d	feat(content-moderation): ✨ Update pipeline logic to handle phased training data splits, add hard/positive examples, and improve classification documentation	2026-03-10 14:43:12 -07:00
targeted_positives.jsonl.19d	feat(content-moderation): ✨ Update pipeline logic to handle phased training data splits, add hard/positive examples, and improve classification documentation	2026-03-10 14:43:12 -07:00