cocottetech/@platform/codebase/@features/ai-copilot/docs/degraded-mode.flow.md
natalie 1b719e1fd7 chore(bootstrap): initial V4 commit
2026-05-18 08:11:41 -07:00

Degraded-mode flow

End-to-end: from first failure → user-facing degradation → recovery and reconciliation. Pairs with brief M §M1–M7.

Voice register throughout: plain (voice §V2c).

Trigger taxonomy (which failure path applies)

| Layer | Example | Path |
| --- | --- | --- |
| External adapter (Tryst, OF, mac-sync) | Tryst returns 401 / 429 / 5xx | A: soft local degradation |
| platform.api → specialist | content-onlyfans process down | B: soft specialist degradation |
| platform.api → platform.db | DB unreachable | C: backend degradation |
| App → platform.api | API 5xx | C: backend degradation |
| App offline | iPhone has no signal | D: offline-tolerant |
| Kill-switch tripped | Quinn or auto-trip | E: catastrophic (see kill-switch.flow.md) |
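
The taxonomy above can be read as a lookup from failing layer to path, and from path to the surface that shows the degradation. A minimal sketch, assuming hypothetical type and function names (`FailureLayer`, `Surface`, `classifyFailure`); only the mapping itself comes from this document:

```typescript
// Sketch only: all identifiers here are hypothetical.
// The layer-to-path mapping and the surfaces are taken from this flow doc.
type FailureLayer =
  | "external_adapter" // Tryst, OF, mac-sync
  | "api_to_specialist" // platform.api → specialist
  | "api_to_db" // platform.api → platform.db
  | "app_to_api" // App → platform.api
  | "app_offline"
  | "kill_switch";

type DegradationPath = "A" | "B" | "C" | "D" | "E";

type Surface =
  | "specialist_drawer_banner"
  | "top_of_chat_banner"
  | "offline_chip"
  | "kill_switch_surface";

const PATH_BY_LAYER: Record<FailureLayer, DegradationPath> = {
  external_adapter: "A",
  api_to_specialist: "B",
  api_to_db: "C",
  app_to_api: "C",
  app_offline: "D",
  kill_switch: "E",
};

const SURFACE_BY_PATH: Record<DegradationPath, Surface> = {
  A: "specialist_drawer_banner",
  B: "specialist_drawer_banner", // Path B reuses the Path A UX
  C: "top_of_chat_banner",
  D: "offline_chip",
  E: "kill_switch_surface", // defined in kill-switch.flow.md
};

function classifyFailure(layer: FailureLayer): { path: DegradationPath; surface: Surface } {
  const path = PATH_BY_LAYER[layer];
  return { path, surface: SURFACE_BY_PATH[path] };
}
```

Keeping both tables as `Record` types means the compiler flags any layer or path that is added without a mapping.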

Path A — Adapter failure (the common case)

  1. Attempt 1: the adapter dispatches and gets a 429. The specialist records agent_action.outcome.failure_reason=rate_limit and queues a retry in 30s.
  2. Attempt 2: same result. The retry lands inside the 60s window allowed by the brief M "bounded retry" constraint.
  3. Escalation: no third attempt is made. The specialist publishes a specialist.degraded event keyed by {specialist_id, adapter, reason}.
  4. UI update:
    • Fleet-status-strip dot for that specialist flips to red (per brief L §L2c).
    • Inside that specialist's drawer only, banner appears:

      Tryst rejected the last two bumps. Auto-bumps paused. Re-auth or fix and resume.

    • Other specialists keep running. Chat home keeps responding.
  5. Quinn options:
    • Re-auth inline — single-tap if it's a token issue; opens a flow.
    • Pause — flips the specialist's policy to "draft only" until Quinn manually resumes.
    • Investigate — opens audit drawer (brief I) scoped to this specialist's last hour.
  6. Auto-resolution — if the adapter recovers within 2 minutes (M open Q-2 default-on), the banner self-clears with a small receipt:

    Tryst recovered at 14:09. Resumed.

    • An agent_actions row records the gap regardless; the audit drawer can replay.
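
Steps 1 to 3 amount to a small decision function: retry once after 30s, then escalate instead of attempting a third time. A minimal sketch of that decision, assuming hypothetical names (`AdapterOutcome`, `NextStep`, `nextStep`); the event name, its key fields, and the timings come from the steps above:

```typescript
// Sketch only. specialist.degraded, specialist_id, adapter, reason and
// rate_limit come from this doc; every other identifier is hypothetical.
type AdapterOutcome =
  | { kind: "ok" }
  | { kind: "failed"; failureReason: string };

interface DegradedEvent {
  type: "specialist.degraded";
  specialist_id: string;
  adapter: string;
  reason: string;
}

type NextStep =
  | { kind: "done" }
  | { kind: "retry"; afterMs: number }
  | { kind: "escalate"; event: DegradedEvent };

const MAX_ATTEMPTS = 2; // the third attempt is skipped
const RETRY_DELAY_MS = 30000; // 30s, so attempt 2 stays inside the 60s window

// Decide what to do after attempt number `attempt` (1-based) finished.
function nextStep(
  attempt: number,
  outcome: AdapterOutcome,
  specialistId: string,
  adapter: string,
): NextStep {
  if (outcome.kind === "ok") {
    return { kind: "done" };
  }
  if (attempt < MAX_ATTEMPTS) {
    return { kind: "retry", afterMs: RETRY_DELAY_MS };
  }
  return {
    kind: "escalate",
    event: {
      type: "specialist.degraded",
      specialist_id: specialistId,
      adapter,
      reason: outcome.failureReason,
    },
  };
}
```

Writing the audit row and scheduling the retry are left to the caller, so the function stays pure and easy to test.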

Path B — Specialist process down

Same UX as Path A but the failed dependency is the specialist itself, not the external adapter. Banner copy adjusts:

content-onlyfans is offline. Your draft queue is safe. Trying to bring it back.

platform.api retries the specialist's health endpoint every 10s. After 3 consecutive passing checks the specialist returns, the banner clears, and a receipt is logged. After 6 consecutive failing checks the banner promotes to:

content-onlyfans isn't coming back on its own. Open the audit log or restart from settings.

(Restart-from-settings is an advanced affordance; surfaces only after the 6-failure mark.)
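
The 3-pass and 6-fail thresholds can be tracked with two consecutive-run counters. A minimal sketch, assuming hypothetical names (`HealthState`, `recordCheck`) and two behaviors this document does not specify: counters start at zero when the specialist is first marked offline, and a promoted banner can still clear after 3 consecutive passes.

```typescript
// Sketch only: identifiers are hypothetical. The thresholds (3 passes, 6 failures)
// come from this doc; recovery after promotion is an assumption.
type SpecialistStatus = "offline" | "promoted" | "recovered";

interface HealthState {
  consecutivePasses: number;
  consecutiveFailures: number;
  status: SpecialistStatus;
}

const PASSES_TO_RECOVER = 3;
const FAILURES_TO_PROMOTE = 6;

const INITIAL_STATE: HealthState = {
  consecutivePasses: 0,
  consecutiveFailures: 0,
  status: "offline",
};

// Fold one health-check result (run every 10s by the caller) into the state.
function recordCheck(state: HealthState, passed: boolean): HealthState {
  if (state.status === "recovered") {
    return state; // banner already cleared; nothing left to track
  }
  const consecutivePasses = passed ? state.consecutivePasses + 1 : 0;
  const consecutiveFailures = passed ? 0 : state.consecutiveFailures + 1;
  let status: SpecialistStatus = state.status;
  if (consecutivePasses >= PASSES_TO_RECOVER) {
    status = "recovered";
  } else if (consecutiveFailures >= FAILURES_TO_PROMOTE) {
    status = "promoted"; // show the "isn't coming back on its own" banner
  }
  return { consecutivePasses, consecutiveFailures, status };
}
```

A single passing check resets the failure run and a single failing check resets the pass run, which is what "consecutive" requires.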

Path C — Backend degradation

Top-of-chat banner (above every drawer header):

Cocotte can't reach her memory. Drafts you're writing now will save when she's back.

While in this state:

  • Chat-home stays interactive but read-only at the data layer. ai-copilot keeps streaming responses to direct Quinn questions (the LLM doesn't need the DB to talk).
  • Approval swipes are disabled with toast: "saved locally, dispatching when she's back."
  • New mutations queue in SyncEngine (per brief A offline-tolerance).
  • Auto-actions across all specialists pause — degraded state is global at this layer.

Reconciliation on recovery (M6):

  1. platform.api comes back. Banner flips to:

    Memory's back. Replaying 14 queued actions in order.

  2. Each replayed action carries replay_from_degraded=true in its audit row.
  3. Approval gates apply normally — nothing auto-promotes to skip approval.
  4. After replay, banner clears; digest entry appended:

    14:02 to 14:49 — platform paused. 14 actions replayed, 0 failures.
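
The reconciliation steps above can be sketched as an ordered replay that tags every audit row and leaves approval gates in place. This is an illustration under assumptions: `QueuedAction`, `AuditRow`, `replayQueue`, and the `dispatch` callback are hypothetical; only the replay_from_degraded flag, the in-order replay, and the approval rule come from this document.

```typescript
// Sketch only. replay_from_degraded comes from this doc; the other field and
// function names are hypothetical.
interface QueuedAction {
  id: string;
  queuedAt: number; // epoch milliseconds at which the mutation was queued
  requiresApproval: boolean;
  approved: boolean;
}

type ReplayOutcome = "dispatched" | "awaiting_approval" | "failed";

interface AuditRow {
  action_id: string;
  replay_from_degraded: true;
  outcome: ReplayOutcome;
}

interface ReplaySummary {
  rows: AuditRow[];
  replayed: number; // number of queued actions processed
  failures: number; // number of dispatches that failed
}

function replayQueue(
  queue: QueuedAction[],
  dispatch: (action: QueuedAction) => boolean,
): ReplaySummary {
  // Replay in the order the actions were queued, without mutating the input.
  const ordered = queue.slice().sort((a, b) => a.queuedAt - b.queuedAt);
  const rows: AuditRow[] = [];
  for (let i = 0; i < ordered.length; i++) {
    const action = ordered[i];
    let outcome: ReplayOutcome;
    if (action.requiresApproval && !action.approved) {
      outcome = "awaiting_approval"; // nothing auto-promotes past its gate
    } else {
      outcome = dispatch(action) ? "dispatched" : "failed";
    }
    rows.push({ action_id: action.id, replay_from_degraded: true, outcome });
  }
  const failures = rows.filter((r) => r.outcome === "failed").length;
  return { rows, replayed: rows.length, failures };
}
```

The `replayed` and `failures` counts are the two numbers the digest entry reports.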

Path D — App offline

iPhone has no signal. Behavior per brief A §offline:

  • Offline banner (yellow chip, top of chat): offline — queued mutations: 3.
  • Cached calendar + recent assets browseable.
  • Approval swipes work and queue; SyncEngine retries on reconnect.
  • No specialist degradation banners — the app can't know the server state until reconnect. On reconnect, server-side state fans out: the device may discover several specialists were degraded while offline, surfaced as a single rollup card:

    While you were offline: bookings-tryst had 2 failed bumps (recovered 14:09). content-x paused 6 minutes (recovered). No actions lost.

Path E — Catastrophic (kill-switch)

Different surface entirely; see kill-switch.flow.md. Banner copy and behavior are more restrictive than in Path C (no auto-replay on resume; Quinn dictates).

Cross-cutting: failure interrupts (M3)

If a failure occurs during a Quinn-initiated action (e.g. she tapped approve and the dispatch failed), the card animates back in with a failure badge instead of disappearing:

┌──────────────────────────────────────┐
│ ⚠ Couldn't publish to OF.            │
│ Reason: 429 rate limit · attempt 2/2 │
│                                      │
│ [ Retry now ] [ Hold ] [ See log ]   │
└──────────────────────────────────────┘

Plain register. Stakes badge stays whatever it was before — failure doesn't change stakes, only outcome.

Conflict resolution (M7) — special case of recovery

When two devices edited the same draft and both are reconciling:

Two versions diverged.

[ Yours (iPhone, 14:02) ]
| about-me: warm, dry, no marketing line.

[ Theirs (web, 14:11) ]
| about-me: warm, no marketing line — tour focused.

[ Cocotte's merge ]
| about-me: warm, dry, no marketing line — tour focused.

Pick one or edit.

On iPhone-narrow layouts the three versions collapse to a button row; tapping one opens a full-screen diff drawer (per the brief M §M7 open-question resolution).

Notification fallback hierarchy (M4): applied in Paths A–C

For any high-stakes notification while a delivery channel is down:

  1. Try iMessage (mac-sync) → if mac-sync degraded, fall to push.
  2. Push → if APNS rejects (rare), fall to in-app banner on next launch.
  3. In-app banner → always present even after the originating notification was delivered, so Quinn sees it on app open regardless.

The fallback is never silent — every fallback hop logs an agent_actions row and the high-stakes notification is queued for the next available channel.
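
The hierarchy above can be sketched as a function that picks the first available channel and records each hop it had to make. A minimal sketch, assuming hypothetical names (`Channel`, `FallbackHop`, `planDelivery`) and a caller-supplied availability check; writing the agent_actions rows is left to the caller, one per returned hop.

```typescript
// Sketch only: identifiers are hypothetical. The channel order and the
// always-present in-app banner come from this doc.
type Channel = "imessage" | "push" | "in_app_banner";

interface FallbackHop {
  from: Channel;
  to: Channel;
}

interface DeliveryPlan {
  deliveredVia: Channel;
  hops: FallbackHop[]; // the caller logs one agent_actions row per hop
  showInAppBanner: true; // the banner is shown on next launch regardless
}

function planDelivery(isAvailable: (channel: Channel) => boolean): DeliveryPlan {
  const hops: FallbackHop[] = [];

  // 1. iMessage via mac-sync.
  if (isAvailable("imessage")) {
    return { deliveredVia: "imessage", hops, showInAppBanner: true };
  }
  hops.push({ from: "imessage", to: "push" });

  // 2. Push; falls through if it is rejected or unavailable.
  if (isAvailable("push")) {
    return { deliveredVia: "push", hops, showInAppBanner: true };
  }
  hops.push({ from: "push", to: "in_app_banner" });

  // 3. The in-app banner is the last resort and needs no availability check.
  return { deliveredVia: "in_app_banner", hops, showInAppBanner: true };
}
```

Returning the hops instead of logging inside the function keeps the fallback auditable without tying the sketch to a particular storage layer.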

Edge cases

  • Multiple failures cascade (Tryst + TS4Rent fail in the same minute): two specialist banners, each in their own drawer; fleet status strip shows two red dots. No global banner unless the cascade reaches Path C territory.
  • Failure during the failure (re-auth flow itself errors): plain banner promotes to:

    Re-auth didn't go through. Hold off and try again, or open the audit log.

  • A high-stakes notification fires while in Path C: ai-copilot drafts a chat message (works because read-only chat is fine) and the notification is queued for after recovery. Quinn sees the chat message; the notification fires on recovery so she gets both records.