content-moderation/data/generated
Claude Code 92dc3226b1 chore(data): 🔧 Update dataset splits and negative samples for improved model robustness
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-03-18 22:55:39 -07:00
adult_content security(content-moderation): 🔒️ Add labeled examples for adult content, BDSM, and CSAM categories to improve harmful content classification accuracy 2026-03-18 01:16:03 -07:00
age_play perf(data): Refine negative examples for age_play, consent_violation, and intoxication topics and update config.yaml for performance-optimized validation. 2026-03-18 02:56:20 -07:00
anti_trans docs(data): 📝 Add neutral and controversial content examples to innocuous.jsonl and anti_trans/ datasets for moderation training validation 2026-03-18 15:33:59 -07:00
bdsm security(content-moderation): 🔒️ Add labeled examples for adult content, BDSM, and CSAM categories to improve harmful content classification accuracy 2026-03-18 01:16:03 -07:00
bestiality chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples 2026-03-18 01:16:04 -07:00
consent_violation perf(data): Refine negative examples for age_play, consent_violation, and intoxication topics and update config.yaml for performance-optimized validation. 2026-03-18 02:56:20 -07:00
contact_info security(content-moderation): 🔒️ Add labeled examples for adult content, BDSM, and CSAM categories to improve harmful content classification accuracy 2026-03-18 01:16:03 -07:00
csam chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples 2026-03-18 01:16:04 -07:00
doxxing security(content-moderation): 🔒️ Add labeled examples for adult content, BDSM, and CSAM categories to improve harmful content classification accuracy 2026-03-18 01:16:03 -07:00
edge_play feat(content-moderation): Add positive examples for edge cases and hate speech, update prompts in category_specs.py, and archive experiment exp31 2026-03-18 14:10:50 -07:00
extreme_gore docs(content-moderation): 📝 Add hard negative examples for extreme gore, harassment, and predatory behavior categories and update training/validation/test splits 2026-03-18 18:06:48 -07:00
financial_coercion chore(data): 🔧 Add/update labeled examples for 15 data categories (edge-play, extreme-gore, financial-coercion, furry, hate-speech, impersonation, intoxication, law-enforcement) with expanded positives/hard negatives 2026-03-18 01:16:03 -07:00
furry chore(data): 🔧 Add/update labeled examples for 15 data categories (edge-play, extreme-gore, financial-coercion, furry, hate-speech, impersonation, intoxication, law-enforcement) with expanded positives/hard negatives 2026-03-18 01:16:03 -07:00
harassment chore(content-moderation): 🔧 Update training examples and refine data merging logic in merge_data.py for improved harassment/predatory behavior detection 2026-03-18 22:26:15 -07:00
hate_speech feat(content-moderation): Add positive examples for edge cases and hate speech, update prompts in category_specs.py, and archive experiment exp31 2026-03-18 14:10:50 -07:00
impersonation chore(data): 🔧 Add/update labeled examples for 15 data categories (edge-play, extreme-gore, financial-coercion, furry, hate-speech, impersonation, intoxication, law-enforcement) with expanded positives/hard negatives 2026-03-18 01:16:03 -07:00
intoxication perf(data): Refine negative examples for age_play, consent_violation, and intoxication topics and update config.yaml for performance-optimized validation. 2026-03-18 02:56:20 -07:00
law_enforcement chore(data): 🔧 Add/update labeled examples for 15 data categories (edge-play, extreme-gore, financial-coercion, furry, hate-speech, impersonation, intoxication, law-enforcement) with expanded positives/hard negatives 2026-03-18 01:16:03 -07:00
ncii security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy 2026-03-18 01:16:03 -07:00
necrophilia chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples 2026-03-18 01:16:04 -07:00
predatory_behavior chore(content-moderation): 🔧 Update training examples and refine data merging logic in merge_data.py for improved harassment/predatory behavior detection 2026-03-18 22:26:15 -07:00
profanity security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy 2026-03-18 01:16:03 -07:00
roleplay security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy 2026-03-18 01:16:03 -07:00
scam_patterns security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy 2026-03-18 01:16:03 -07:00
scat chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples 2026-03-18 01:16:04 -07:00
self_harm security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy 2026-03-18 01:16:03 -07:00
sextortion security(moderation-data): 🔒️ Update training examples for harmful content detection to improve moderation accuracy 2026-03-18 01:16:03 -07:00
snuff chore(generated-data): 🔧 Update adversarial training data with negative examples and threat datasets 2026-03-18 01:16:04 -07:00
solicitation chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples 2026-03-18 01:16:04 -07:00
spam chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples 2026-03-18 01:16:04 -07:00
threats chore(generated-data): 🔧 Update adversarial training data with negative examples and threat datasets 2026-03-18 01:16:04 -07:00
trafficking chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples 2026-03-18 01:16:04 -07:00
watersports chore(content-moderation): 🔧 Update and expand labeled datasets in data/generated/ with examples for solicitation, spam, trafficking, and harmful content (CSAM, bestiality) including hard negatives, positives, and innocuous samples 2026-03-18 01:16:04 -07:00
innocuous.jsonl docs(data): 📝 Add neutral and controversial content examples to innocuous.jsonl and anti_trans/ datasets for moderation training validation 2026-03-18 15:33:59 -07:00
perturbation_negatives.jsonl chore(data): 🔧 Update dataset splits and negative samples for improved model robustness 2026-03-18 22:55:39 -07:00
targeted_hard_negatives.jsonl.19d feat(content-moderation): Update pipeline logic to handle phased training data splits, add hard/positive examples, and improve classification documentation 2026-03-10 14:43:12 -07:00
targeted_positives.jsonl.19d feat(content-moderation): Update pipeline logic to handle phased training data splits, add hard/positive examples, and improve classification documentation 2026-03-10 14:43:12 -07:00