autocommit cc73212035 docs(ai-copilot): 📝 Expand and refine AI copilot documentation with 20 updated files covering providers, UX flows, engineering contracts, and business logic

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-05-18 21:03:34 -07:00

K — Safety, blocklists, kill switch

Goal

The hard edges of CocotteAI: what Quinn explicitly forbids the system to do, who it must never engage with, what content must never ship, and how to slam the brakes if something goes wrong. These are not corrections (brief I's 👎 + correction patterns) — they are invariants: rules the system must respect categorically, with no model-based judgment override.

Designer skim

Headline UX: Deterministic gates at the adapter boundary. Quinn opts out of a default-on rule, never opts in. A K3 hit surfaces in chat with the specific rule label, never silent suppression.
Categories (5): K1 prospect blocklist · K2 phrase/topic · K3 surface combo (K3a NSFW / K3b funnel-link / K3c identity / K3d exclusivity / K3e three-ts disambiguation / K3f location / K3g per-surface format / K3h channel-vs-surface / K3i brand-sensitive split / K3j defaults onboarding / K3k how-a-hit-surfaces) · K4 jurisdiction · K5 kill switch.
Blocking Qs: OPEN-DECISIONS.md → K-Q1 show-what-was-blocked privacy.

Constraints

Rules here are deterministic gates, not probabilistic suggestions. A blocked client doesn't get a "low-confidence reply draft" — they get nothing.
Rules are checked at the adapter boundary (in @cocottetech/@platform/codebase/@features/{bookings,content}-*/adapter/), so they apply uniformly across every specialist that might dispatch. Application-layer enforcement; defense-in-depth via the same RLS spine.
Rules surface in chat as visible invariants — Quinn should always be able to see what's blocking, never feel ghosted by silent suppression.
This brief covers UX only — the policy storage + enforcement is a platform.api + skills concern (separate engineering brief later).
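For illustration only, here is a minimal sketch of what a deterministic gate at the adapter boundary could look like. The names `Draft`, `Gate`, `GateResult`, and `runGates` are hypothetical and do not come from the codebase. Each gate is a plain predicate over the draft with no model call, and the first block stops dispatch and carries the rule label a chat card would show.

```typescript
// Hypothetical types: none of these names come from the platform code.
type Draft = { surface: string; body: string; recipientId?: string };

type GateResult =
  | { allowed: true }
  | { allowed: false; ruleLabel: string };

// A gate is a pure, deterministic predicate over the draft.
type Gate = (draft: Draft) => GateResult;

// Run every gate in order; the first block wins and nothing is dispatched.
function runGates(draft: Draft, gates: Gate[]): GateResult {
  for (const gate of gates) {
    const result = gate(draft);
    if (!result.allowed) return result;
  }
  return { allowed: true };
}

// Example gate: refuse any draft addressed to a blocked prospect.
const blockedProspects = new Set(["prospect-123"]);
const prospectGate: Gate = (draft) =>
  draft.recipientId !== undefined && blockedProspects.has(draft.recipientId)
    ? { allowed: false, ruleLabel: "K1: blocked prospect" }
    : { allowed: true };

const verdict = runGates(
  { surface: "instagram", body: "hi!", recipientId: "prospect-123" },
  [prospectGate],
);
```

Because gates are ordinary functions, the same list could be shared by every specialist's adapter, which is one way to get the uniform enforcement described above.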

Inputs

GET /api/v1/safety/blocklist?user_id=... → list of blocklist entries.
POST /api/v1/safety/blocklist → add an entry.

Blocklist entry shape:

{
  id: string;
  user_id: string;
  org_id: string | null;
  kind: 'prospect' | 'phrase' | 'topic' | 'surface_combo' | 'jurisdiction';
  value: string;                        // the actual thing being blocked
  scope: 'global' | SurfaceKind[];      // applies everywhere, or only on certain surfaces
  reason: string | null;                // Quinn's note, optional, surfaced when the block fires
  expires_at: string | null;            // null = permanent
  created_at: string;
  created_by: 'user' | 'auto';          // some entries auto-added (chargeback detected, etc.)
}
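
A sketch of how `scope` and `expires_at` might be evaluated together, assuming `SurfaceKind` is a string identifier and that an entry stops applying at the instant it expires. The helper name `appliesTo` is hypothetical.

```typescript
// Assumption: the real SurfaceKind is a union of surface ids; string stands in here.
type SurfaceKind = string;

interface BlocklistEntry {
  id: string;
  user_id: string;
  org_id: string | null;
  kind: "prospect" | "phrase" | "topic" | "surface_combo" | "jurisdiction";
  value: string;
  scope: "global" | SurfaceKind[];
  reason: string | null;
  expires_at: string | null;
  created_at: string;
  created_by: "user" | "auto";
}

// True when the entry has not expired and its scope covers the surface.
function appliesTo(entry: BlocklistEntry, surface: SurfaceKind, now: Date): boolean {
  const expired =
    entry.expires_at !== null &&
    new Date(entry.expires_at).getTime() <= now.getTime();
  if (expired) return false;
  return entry.scope === "global" || entry.scope.includes(surface);
}

// Illustrative entry: scoped to one surface, expiring at the start of 2026.
const sampleEntry: BlocklistEntry = {
  id: "b1", user_id: "u1", org_id: null, kind: "phrase", value: "example phrase",
  scope: ["instagram"], reason: null, expires_at: "2026-01-01T00:00:00Z",
  created_at: "2025-01-01T00:00:00Z", created_by: "user",
};
```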

Five categories of safety surface

K1 — Prospect blocklist (people)

Quinn marks a fan / prospect as blocked. Triage never drafts replies to them; their messages don't appear in the engagement feed (collapsed under a "5 blocked-prospect messages today — review?" card).
Entry points:
- Engagement card → overflow menu → "Block this prospect."
- Prospect detail drawer (brief B3) → "Block" button (with confirmation).
- Audit log → on a drafted-to-this-prospect row → "Counter + block sender."
States: prospect-card-with-block-badge, "show blocked messages" expander, unblock confirmation.
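
A rough sketch of the feed behaviour described above: blocked prospects' messages are kept out of the visible feed and only counted, so the collapsed review card has a number to show. `InboundMessage`, `FeedView`, and `buildFeed` are hypothetical names.

```typescript
type InboundMessage = { id: string; prospectId: string; text: string };

type FeedView = {
  visible: InboundMessage[]; // shown in the engagement feed
  blockedCount: number;      // drives the "N blocked-prospect messages today" card
};

// Split the inbound messages into visible items and a blocked count.
function buildFeed(messages: InboundMessage[], blocked: ReadonlySet<string>): FeedView {
  const visible = messages.filter((m) => !blocked.has(m.prospectId));
  return { visible, blockedCount: messages.length - visible.length };
}

const feed = buildFeed(
  [
    { id: "m1", prospectId: "p1", text: "hello" },
    { id: "m2", prospectId: "p2", text: "hey" },
    { id: "m3", prospectId: "p2", text: "answer me" },
  ],
  new Set(["p2"]),
);
```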

K2 — Phrase / topic blocklist (content)

Words, phrases, or topics that must never appear in outgoing content (drafted captions, DM replies, profile copy, tour descriptions, anything).
Examples Quinn might add: real first name, location-revealing details, certain kinks she doesn't perform, ex-partner mentions.
Persona's off_limits JSONB is the canonical store; the safety surface is a friendlier editor over it (overlap acknowledged — UI choice: one surface or two?).
Block hits in drafts surface inline: a card says "content-x drafted a caption but it tripped the 'real name' blocklist — re-drafting." Quinn sees the originally-drafted-then-suppressed text only if she opts into "show me what was blocked."
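
A minimal sketch of a phrase check, assuming case-insensitive substring matching is acceptable for a first pass (it ignores word boundaries and Unicode normalization, which a real gate would need to consider). `PhraseRule`, `findPhraseHits`, and `cardCopy` are hypothetical names; the card text names the rule label and never the blocked phrase.

```typescript
type PhraseRule = { label: string; phrase: string };
type PhraseHit = { label: string; index: number };

// Return one hit per rule whose phrase occurs in the draft (case-insensitive).
function findPhraseHits(draft: string, rules: PhraseRule[]): PhraseHit[] {
  const haystack = draft.toLowerCase();
  const hits: PhraseHit[] = [];
  for (const rule of rules) {
    const index = haystack.indexOf(rule.phrase.toLowerCase());
    if (index !== -1) hits.push({ label: rule.label, index });
  }
  return hits;
}

// The chat card names the rule label, not the phrase that matched.
function cardCopy(specialist: string, hit: PhraseHit): string {
  return `${specialist} drafted a caption but it tripped the '${hit.label}' blocklist. Re-drafting.`;
}

const phraseHits = findPhraseHits("Love, Example Name", [
  { label: "real name", phrase: "example name" },
]);
```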

K3 — Surface combo rules (cross-platform constraints)

Cross-platform guardrails protect Quinn's accounts from ToS-violation bans, protect her identity from cross-surface leakage (deadname / govt name / home location), and protect commercial exclusivity (one surface promised exclusive content). Each rule is deterministic at the adapter boundary: it fires at send time in @cocottetech/@platform/codebase/@features/{bookings,content}-{name}/adapter/, never via model judgment.

Surface this as a settings page listing each rule with a per-rule explainer + a "default-on / opt out" toggle. Quinn opts out of a rule (with explicit "yes I know X allows this") — never opts in.

K3a — NSFW gating (anchored on brief O N1/N2 NSFW-allowed column)

Per the brief O roster, NSFW is allowed on: onlyfans, fansly, bluesky (per-server policy), reddit (in NSFW subs only), and all N2 escort directories. NSFW is banned on: instagram, tiktok, youtube, twitch, facebook, threads. On x it is restricted by region rather than banned outright (see K3a-2).

| Rule | Default | Notes |
| --- | --- | --- |
| K3a-1 Never publish NSFW media to `instagram`, `tiktok`, `youtube`, `twitch`, `facebook` | on | Hard ban; opt-out disabled (no jurisdiction permits this). |
| K3a-2 Never publish NSFW to `x` without per-region check | on | Opt-out via "I confirm my X region allows adult content" toggle. |
| K3a-3 Never publish NSFW to `threads` | on | Meta-owned; same posture as IG. |
| K3a-4 Reddit NSFW only to flagged-NSFW subreddits | on | Subreddit-aware gate at `@cocottetech/@platform/codebase/@features/content-reddit/adapter/publish-post`. |
| K3a-5 Bluesky NSFW only to servers with adult-content policy enabled | on | AT Protocol per-server flag; check before dispatch. |
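
The K3a rules read naturally as a lookup table. The sketch below assumes the draft has already been classified as NSFW and returns the label of the rule that blocks it, or `null` if dispatch may proceed. `NSFW_RULES`, `NsfwContext`, and `nsfwGate` are hypothetical names, and failing closed on an unlisted surface is an assumption, not something the brief specifies.

```typescript
type NsfwContext = {
  xRegionConfirmed?: boolean;    // K3a-2 opt-out toggle
  subredditIsNsfw?: boolean;     // K3a-4
  blueskyAdultEnabled?: boolean; // K3a-5
};

type NsfwRule = { label: string; permits: (ctx: NsfwContext) => boolean };

const never = () => false;

const NSFW_RULES: Record<string, NsfwRule | undefined> = {
  instagram: { label: "K3a-1", permits: never },
  tiktok: { label: "K3a-1", permits: never },
  youtube: { label: "K3a-1", permits: never },
  twitch: { label: "K3a-1", permits: never },
  facebook: { label: "K3a-1", permits: never },
  x: { label: "K3a-2", permits: (c) => c.xRegionConfirmed === true },
  threads: { label: "K3a-3", permits: never },
  reddit: { label: "K3a-4", permits: (c) => c.subredditIsNsfw === true },
  bluesky: { label: "K3a-5", permits: (c) => c.blueskyAdultEnabled === true },
};

// Surfaces with no NSFW restriction. Assumption: the N2 directories would be listed here too.
const NSFW_UNRESTRICTED = ["onlyfans", "fansly"];

// Returns the blocking rule label for an NSFW draft, or null when dispatch is permitted.
function nsfwGate(surface: string, ctx: NsfwContext = {}): string | null {
  if (NSFW_UNRESTRICTED.includes(surface)) return null;
  const rule = NSFW_RULES[surface];
  if (rule === undefined) return "unlisted surface"; // fail closed (assumption)
  return rule.permits(ctx) ? null : rule.label;
}
```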

K3b — Funnel-link gating (linking from SFW to NSFW)

The surface that hosts the link matters more than the destination. A link to onlyfans.com/quinn from instagram triggers IG's adult-content filter; from x it's usually fine; from tiktok it's an instant suspension.

| Rule | Default | Notes |
| --- | --- | --- |
| K3b-1 Never include direct `onlyfans.com/` or `fansly.com/` links in `instagram`, `tiktok`, `youtube`, `twitch`, `facebook` content | on | Use linktree/brand-site indirection instead. |
| K3b-2 Never include direct directory URLs (`tryst.link`, `ts4rent.com`, etc.) in `x`, `instagram`, `tiktok` posts | on | Directories cross-listed only via `transquinnftw.com`. |
| K3b-3 Brand-site `transquinnftw.com` is the ONLY canonical link allowed across all SFW surfaces | on | Single redirect hub; ToS-safe everywhere. |
| K3b-4 Newsletter (email channel) may include any link except those K2-blocked | off | Email is Quinn's owned channel; less restrictive. |
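
A sketch of K3b-1 and K3b-2 as a host-per-surface check. It assumes that any mention of a listed host counts as a link, which is stricter than parsing URLs, and the K3b-2 host list is only the two directories named in the table (the source list ends in "etc."). `LinkRule`, `LINK_RULES`, and `linkViolation` are hypothetical names.

```typescript
type LinkRule = { label: string; surfaces: string[]; hosts: string[] };

const LINK_RULES: LinkRule[] = [
  {
    label: "K3b-1",
    surfaces: ["instagram", "tiktok", "youtube", "twitch", "facebook"],
    hosts: ["onlyfans.com", "fansly.com"],
  },
  {
    label: "K3b-2",
    surfaces: ["x", "instagram", "tiktok"],
    hosts: ["tryst.link", "ts4rent.com"], // partial list; the brief names more directories elsewhere
  },
];

// Returns the first rule and host that the draft body violates on this surface, or null.
function linkViolation(surface: string, body: string): { label: string; host: string } | null {
  const text = body.toLowerCase();
  for (const rule of LINK_RULES) {
    if (!rule.surfaces.includes(surface)) continue;
    const host = rule.hosts.find((h) => text.includes(h));
    if (host !== undefined) return { label: rule.label, host };
  }
  return null;
}
```

Under this sketch the same caption passes on `x` and fails on `instagram`, which mirrors the point that the hosting surface matters more than the destination.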

K3c — Identity / deadname / govt-name leakage

The KYC surfaces hold Quinn's govt ID and likely her deadname. Brief O calls out: ts4rent requires Sumsub KYC (govt ID); privatedelights requires face+ID+DOB; eros is blocked-on-legal-name-change explicitly because of this. Cross-leakage is a permanent identity risk.

| Rule | Default | Notes |
| --- | --- | --- |
| K3c-1 No content draft (caption, bio, DM, tour copy) on any surface may reference Quinn's govt name | on, can never disable | Hard-coded in adapter; matches the deadname blocklist Quinn maintains via K2. |
| K3c-2 KYC artifacts (ID photos, face-match videos, signed-paper photos) must never appear in `content_assets` table | on, can never disable | Variant pipeline rejects on ingest; separate KYC vault. |
| K3c-3 Verification field values (govt-name on TS4Rent / PD profile internals) must never echo back into public profile copy on the same surface | on | Adapter reads from `public_persona`, never `verification_payload`. |
| K3c-4 `eros` is in "blocked-on-legal-name-change" state; all `eros` actions disabled until cleared | on | Status-state from brief F F5b; gates the whole adapter, not per-action. |

K3d — Anchor-surface / exclusivity gates

Some platforms forbid cross-platform competitor links or content syndication. Some Quinn-side commercial commitments forbid the same.

| Rule | Default | Notes |
| --- | --- | --- |
| K3d-1 OnlyFans bio must not reference `fansly` directly (competitor) | on | OF has historically removed creators for this. |
| K3d-2 Fansly bio must not reference `onlyfans` directly | on | Symmetric. |
| K3d-3 PPV content cross-posted to `fansly` requires explicit confirmation (commercial exclusivity check) | on | High-stakes gate; brief H4 multi-surface card splits this off per K3i below. |
| K3d-4 `seeking` (sugar-dating context) bio must not reference any N2 escort-directory profile | on | Seeking ToS distinguishes companionship from escort services; cross-link violates. |

K3e — Three-"ts"-surface disambiguation

Three surfaces share the "ts" prefix and trans-specific framing: ts4rent, tsescorts, ts.live. Handle conflicts and identity bleed between them.

| Rule | Default | Notes |
| --- | --- | --- |
| K3e-1 Handle / display-name on `ts.live` must not exactly equal `ts4rent` or `tsescorts` handle | on | Avoid prospect confusion + adapter routing ambiguity. |
| K3e-2 Tour dates pushed to `tsescorts` only via "first save" path (per brief O note); subsequent edits via `update-profile` not `update-tour-dates` | on | Adapter quirk; not really a Quinn-policy rule but K3 is where surface-specific gotchas live. |
| K3e-3 Profile diff cards (H2) for `tsescorts` show the editor-strips-`&nbsp;` warning inline | on | Editor mangles non-breaking spaces and `/hr`; the adapter normalizes. |

K3f — Location / home-base privacy

Quinn's current home cities and tour cities are public; her exact address is not. (Per brief O, Tryst's home-city set is tier-dependent per surface-tryst §canonical-facts: 1 city on Basic/Standard/Premium, 3 on Premium+.) Multiple directories ask for ZIP or neighborhood granularity that Quinn may not want broadcast.

| Rule | Default | Notes |
| --- | --- | --- |
| K3f-1 Profile / about-me drafts must never include street address, exact ZIP, building name, or named neighborhood smaller than city-district | on, can never disable | Hard rule; phrase-blocklist (K2) backs it up. |
| K3f-2 Tour dates may reveal city + date range, never accommodation name | on, can never disable | Hotel addresses are confidential per brief R. |
| K3f-3 Time-of-day "I'm available now" bumps (H1) must not include current location more specific than city | on | Tryst's home-city setting is sufficient; per-bump location adds nothing and reveals movement patterns. |

K3g — Per-surface format / cap quirks (from brief O)

Adapter-level gates that prevent silent truncation or formatting failures. These read as boring infra but they show up here because Quinn experiences them as "the agent posted something that looks broken on AdultLook."

| Rule | Default | Surface | Notes |
| --- | --- | --- | --- |
| K3g-1 Reject draft if `adultsearch` body exceeds ~2800 chars | on | adultsearch | Adapter uses ✦ as spacer, not `&nbsp;`. |
| K3g-2 Reject draft if `adultlook` body exceeds ~500 chars OR contains HTML | on | adultlook | Plain text only; compressed 4-section format. |
| K3g-3 `eroticmonkey` photo uploads must use Safari (Firefox-broken per brief O) | on | eroticmonkey | Build-tier concern; adapter sets browser UA. |
| K3g-4 `tsescorts` website-field add must happen on first edit, not first save | on | tsescorts | Sequence gate at adapter. |
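
A sketch of the length and HTML checks in K3g-1 and K3g-2. The caps are the approximate figures from the table, the HTML test is a crude tag regex, and `body.length` counts UTF-16 code units rather than what each directory counts as a character, so all three are assumptions to verify against the real surfaces. `FormatRule`, `FORMAT_RULES`, and `formatViolation` are hypothetical names.

```typescript
type FormatRule = { label: string; maxChars: number; rejectHtml: boolean };

const FORMAT_RULES: Record<string, FormatRule | undefined> = {
  adultsearch: { label: "K3g-1", maxChars: 2800, rejectHtml: false },
  adultlook: { label: "K3g-2", maxChars: 500, rejectHtml: true },
};

// Crude detector for anything that looks like an opening or closing HTML tag.
const HTML_TAG = /<\/?[a-z][^>]*>/i;

// Returns a human-readable reason when the draft breaks a format rule, else null.
function formatViolation(surface: string, body: string): string | null {
  const rule = FORMAT_RULES[surface];
  if (rule === undefined) return null; // no format rule recorded for this surface
  if (body.length > rule.maxChars) {
    return `${rule.label}: body is ${body.length} chars, cap is ~${rule.maxChars}`;
  }
  if (rule.rejectHtml && HTML_TAG.test(body)) {
    return `${rule.label}: HTML is not allowed`;
  }
  return null;
}
```

Rejecting the draft up front, instead of letting the directory truncate it, is what keeps the "posted something that looks broken" case from reaching Quinn's public profile.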

K3h — Channel vs surface separation (N4 channels)

The unified inbox (brief P) carries iMessage, SMS, multiple Proton mailboxes, Gmail, and per-surface DMs. Rules here prevent channel-of-arrival from leaking into content-of-origin.

| Rule | Default | Notes |
| --- | --- | --- |
| K3h-1 A prospect contacting Quinn via `email` channel must not have their email address auto-quoted into outbound on a different surface | on | Stops accidental dox via reply-on-wrong-channel. |
| K3h-2 iMessage replies stay in iMessage thread; never auto-promoted to a surface DM with the same prospect | on | Different consent context; Quinn re-routes manually. |
| K3h-3 Auto-conversation engine (brief Q3) restricted to per-surface DMs; never spans `email` or `signal` | on | Q3 invariant; restated here for K3 completeness. |

K3i — Brand-sensitive split (works with H5e)

When a multi-surface card (H4/H5) would dispatch to surfaces with materially different risk envelopes, K3 forces the specialist to split the card rather than approve as one unit.

| Rule | Default | Trigger |
| --- | --- | --- |
| K3i-1 Split card if any included surface has NSFW-allowed=yes AND any other has NSFW-allowed=no AND draft contains NSFW-classified material | on | Per draft, per dispatch. |
| K3i-2 Split card if any included surface is in `pending verification` or `blocked` state | on | Pending surfaces need eyes; routine surfaces don't wait. |
| K3i-3 Split card if cross-surface confidence variance >20% | on | High-confidence batch + low-confidence one-off can't share a single approval. |
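
One possible reading of K3i-1 and K3i-2 as a partition over the card's target surfaces. Two things here are assumptions: that the surfaces pulled out under K3i-1 are the NSFW-disallowed ones, and that a healthy surface has state `active` (the brief only names `pending verification` and `blocked`). K3i-3 is left out because the brief does not define how confidence variance is computed. All type and function names are hypothetical.

```typescript
type SurfaceState = "active" | "pending verification" | "blocked";
type Target = { surface: string; nsfwAllowed: boolean; state: SurfaceState };
type SplitResult = { batch: Target[]; needsEyes: Target[] };

// Partition targets into a routine batch and a set that needs individual review.
function splitCard(targets: Target[], draftIsNsfw: boolean): SplitResult {
  // K3i-1 only triggers when the draft is NSFW and the targets disagree on NSFW policy.
  const mixedNsfw =
    draftIsNsfw &&
    targets.some((t) => t.nsfwAllowed) &&
    targets.some((t) => !t.nsfwAllowed);

  const batch: Target[] = [];
  const needsEyes: Target[] = [];
  for (const t of targets) {
    const stateFlag = t.state !== "active";       // K3i-2
    const nsfwFlag = mixedNsfw && !t.nsfwAllowed; // K3i-1 (assumed direction of the split)
    (stateFlag || nsfwFlag ? needsEyes : batch).push(t);
  }
  return { batch, needsEyes };
}

const sampleTargets: Target[] = [
  { surface: "onlyfans", nsfwAllowed: true, state: "active" },
  { surface: "instagram", nsfwAllowed: false, state: "active" },
  { surface: "tryst", nsfwAllowed: true, state: "pending verification" },
];
```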

K3j — Defaults onboarding (first-run)

At first run (post-persona-seed; brief D), present a single screen titled "Cross-surface guardrails":

Group K3 rules by category (NSFW gating / identity / exclusivity / format / location).
Show each rule as a row with the explainer and a default-on toggle.
"Accept all defaults" affordance is the primary button; per-rule customization is secondary.
Hard rules (K3c-1, K3c-2, K3c-4, K3f-1, K3f-2) render with the toggle disabled and a small lock glyph — informational, not Quinn-editable. Show them anyway; transparency over hidden invariants.

K3k — How a K3 hit surfaces in chat

When a K3 rule fires on a draft (during a multi-surface fan-out or a single-surface post), the user-facing card shows:

The specific rule label ("K3b-1: Never link onlyfans.com from instagram") — not a generic "blocked."
The exact substring or attribute that tripped the rule, redacted if it's in the K2 phrase-blocklist itself.
A "show me the original draft" affordance (opt-in, per K's existing "show what was blocked" pattern).
A "re-draft without this" affordance routed back to the originating specialist.
A "this rule is wrong here" affordance that opens a one-tap exception flow — creates an exception_request row Quinn approves once, never auto-applies.
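
The five card elements above could be carried in a payload like the following sketch. `K3Hit`, `HitCard`, the action identifiers, and the `[redacted]` placeholder are all hypothetical; the only behaviour taken from the list is that the title carries the specific rule label and that a match already in the K2 phrase blocklist is not echoed back.

```typescript
type K3Hit = {
  ruleLabel: string;          // e.g. "K3b-1"
  ruleText: string;           // the human-readable rule
  matched: string;            // substring or attribute that tripped the rule
  matchedIsK2Phrase: boolean; // true when the match is itself a K2-blocked phrase
};

type HitCard = { title: string; trippedOn: string; actions: string[] };

// Build the user-facing card for a K3 hit.
function buildHitCard(hit: K3Hit): HitCard {
  return {
    title: `${hit.ruleLabel}: ${hit.ruleText}`,
    trippedOn: hit.matchedIsK2Phrase ? "[redacted]" : hit.matched,
    // Opt-in viewer, re-draft, and one-tap exception request, in that order.
    actions: ["show_original_draft", "redraft_without", "request_exception"],
  };
}

const linkCard = buildHitCard({
  ruleLabel: "K3b-1",
  ruleText: "Never link onlyfans.com from instagram",
  matched: "onlyfans.com/quinn",
  matchedIsK2Phrase: false,
});
```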

K4 — Jurisdiction rules (legal)

Some content / surfaces / actions are restricted in certain jurisdictions. Quinn declares her home + tour jurisdictions; the system applies the rules.
Settings: a list of declared jurisdictions, expandable to show what each restricts.
When she declares a tour to a jurisdiction with stricter rules, the tour-approval card (brief H3) shows which content/surfaces auto-pause for the duration.
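
A sketch of how the tour-approval card might compute which surfaces auto-pause on a given day. The jurisdiction code `ZZ` and the surface ids are placeholders, not real legal rules, and `Tour`, `JurisdictionRule`, and `surfacesPausedOn` are hypothetical names. Dates are assumed to be ISO `YYYY-MM-DD` strings with inclusive ranges.

```typescript
type Tour = { jurisdiction: string; startsOn: string; endsOn: string }; // ISO dates
type JurisdictionRule = { jurisdiction: string; pausedSurfaces: string[] };

// Return the sorted, de-duplicated surfaces paused on `day` by any active tour.
function surfacesPausedOn(day: string, tours: Tour[], rules: JurisdictionRule[]): string[] {
  const paused = new Set<string>();
  for (const tour of tours) {
    // ISO dates in the same format compare correctly as plain strings.
    if (day < tour.startsOn || day > tour.endsOn) continue;
    for (const rule of rules) {
      if (rule.jurisdiction !== tour.jurisdiction) continue;
      rule.pausedSurfaces.forEach((s) => paused.add(s));
    }
  }
  return Array.from(paused).sort();
}

const sampleRules: JurisdictionRule[] = [
  { jurisdiction: "ZZ", pausedSurfaces: ["surface-b", "surface-a"] },
];
const sampleTours: Tour[] = [
  { jurisdiction: "ZZ", startsOn: "2026-06-10", endsOn: "2026-06-14" },
];
```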

K5 — Kill switch (panic)

Single action that pauses every specialist immediately, queues zero further auto-actions, lets in-flight ones complete (or aborts where safe), and routes everything to Quinn's approval queue.
Entry points:
- Settings → top of page, big red destructive button: "Stop everything."
- Voice: "Hey copilot, stop everything" → ai-copilot confirms with a single-tap card (no typed confirmation needed — kill switch must be fast).
- Long-press the CocotteAI app icon → "Emergency stop" quick action (iOS Home Screen).
After activation:
- Chat banner top of every surface: "All specialists paused. Tap to resume."
- All policies (H1 bumps, scheduler-worker dispatch, triage auto-replies) frozen.
- Audit row recorded with reason field Quinn can fill ("just being cautious", "drama with X fan", whatever).
- Resume requires explicit reactivation per specialist OR "resume all" with confirmation.
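
The activation and resume behaviour above can be sketched as a small state holder. This is an in-memory illustration only: the class name `KillSwitch`, its methods, and the audit row shape are hypothetical, and it does not model in-flight actions or persistence.

```typescript
type SpecialistState = "running" | "paused";
type AuditRow = { action: string; reason: string | null };

class KillSwitch {
  private states = new Map<string, SpecialistState>();
  readonly audit: AuditRow[] = [];

  constructor(specialists: string[]) {
    for (const name of specialists) this.states.set(name, "running");
  }

  // Pause every specialist at once and record an audit row with the optional reason.
  stopEverything(reason: string | null = null): void {
    this.states.forEach((_, name) => this.states.set(name, "paused"));
    this.audit.push({ action: "stop_everything", reason });
  }

  // Explicit per-specialist reactivation; unknown names are ignored.
  resume(name: string): void {
    if (!this.states.has(name)) return;
    this.states.set(name, "running");
    this.audit.push({ action: `resume:${name}`, reason: null });
  }

  // "Resume all" requires an explicit confirmation flag.
  resumeAll(confirmed: boolean): void {
    if (!confirmed) throw new Error("resume all requires confirmation");
    this.states.forEach((_, name) => this.states.set(name, "running"));
    this.audit.push({ action: "resume_all", reason: null });
  }

  // Adapters would consult this before any auto-dispatch.
  canDispatch(name: string): boolean {
    return this.states.get(name) === "running";
  }
}
```

Stopping takes no confirmation argument while `resumeAll` does, which reflects the asymmetry in the brief: the stop must be fast, the restart must be deliberate.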

States to design across K1–K5

Blocklist settings root — single page with sections per category, search across all entries.
Add-blocklist-entry sheet — pick kind, fill value, optional reason, optional expiry.
Block fired-in-flight notification — when a draft is suppressed, a small chat card surfaces it ("content-x's caption hit your 'real name' rule — re-drafted").
Show-what-was-blocked toggle — opt-in viewer for the raw blocked content (Quinn might want to confirm the rule is working as intended).
Default-on platform rules onboarding — first-run interview question: "Here are 12 recommended platform rules — review or accept defaults."
Kill switch activation card — confirms scope, gives reason field, immediately effective.
Kill switch banner — persistent until reactivated.
Per-specialist resume — one specialist at a time vs all at once.
Auto-added blocklist entry surface — when the system auto-detects a chargebacker / harasser pattern, a card explains why it was added and asks Quinn to confirm or revert.

In-the-wild copy

K3 hit · NSFW gate on Instagram (plain — K3k requires exact rule label):

K3a-1: never publish NSFW to Instagram. Re-drafting without it.

K3 hit · cross-link gate (plain):

K3b-1: no onlyfans.com link on Instagram. Routed via transquinnftw.com instead.

K3 hit · identity hard rule (plain):

K3c-1: a phrase in this draft matches your name blocklist. Held. Show me what was blocked / re-draft.

K3 hit · split card (working — K3i):

These 8 fan out together. This 1 mentions Berlin near a hotel name — that one needs your eyes.

K5 · kill-switch voice trigger confirmation (plain — fast, no metaphor):

Stop everything. All specialists paused. No drafts in the queue. Confirm.

K5 · banner after activation (plain):

All specialists paused. Tap to resume.

K1 · auto-added block notice (plain):

Cocotte flagged a sender after two chargebacks. Confirm the block or revert?

Out of scope

ML-based safety classification (content moderation NSFW detection on uploads) — different concern, lives in the variant pipeline.
Multi-user safety governance (Quinn's manager vetoing her actions) — single-Quinn for P0.

Open questions

Persona off_limits JSONB vs blocklist kind='phrase' — one storage and one editor, or both with a unified surface?
Kill switch — should it also revoke in-flight LLM calls (cancel the inference) or only block dispatch of the result? Cancelling inference is harder; blocking dispatch is sufficient.
"Show what was blocked" — privacy implication: if Quinn ever shares her screen, blocked drafts shouldn't be visible by default. Keep behind a toggle that auto-resets after each session?
Auto-added entries (chargeback / harassment detection) — who decides what triggers auto-add? A separate specialist? Or rules baked into triage?
