Clean successor to V3 (forge: lilith/atlilith). Seeded from local Mac working tree at ~/Code/@projects/@cocottetech/. node_modules and build artifacts excluded via .gitignore. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
143 lines
7.4 KiB
Markdown
# Degraded-mode flow

End-to-end: from first failure → user-facing degradation → recovery + reconciliation. Pairs with [brief M](./M-error-degraded-modes.brief.md) §M1–M7.

Voice register throughout: **plain** ([voice](./00-system-voice.md) §V2c).

## Trigger taxonomy (which failure path applies)

| Layer | Example | Path |
|---|---|---|
| External adapter (Tryst, OF, mac-sync) | Tryst returns 401 / 429 / 5xx | A — soft local degradation |
| platform.api → specialist | content-onlyfans process down | B — soft specialist degradation |
| platform.api → platform.db | DB unreachable | C — backend degradation |
| App → platform.api | API 5xx | C — backend degradation |
| App offline | iPhone has no signal | D — offline-tolerant |
| Kill-switch tripped | Quinn or auto-trip | E — catastrophic (see [`kill-switch.flow.md`](./kill-switch.flow.md)) |

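
The table above can be read as a lookup from failing layer to path. A minimal sketch, assuming hypothetical names (`FailureLayer`, `pathFor`, `isSpecialistScoped`) that do not come from the briefs; only the layer-to-path mapping is taken from the table.

```typescript
// Hypothetical type and function names; the mapping mirrors the taxonomy table.
type FailureLayer =
  | "external_adapter" // Tryst, OF, mac-sync
  | "specialist"       // platform.api → specialist
  | "database"         // platform.api → platform.db
  | "api"              // App → platform.api
  | "app_offline"
  | "kill_switch";

type DegradedPath = "A" | "B" | "C" | "D" | "E";

const PATH_BY_LAYER: Record<FailureLayer, DegradedPath> = {
  external_adapter: "A", // soft local degradation
  specialist: "B",       // soft specialist degradation
  database: "C",         // backend degradation
  api: "C",              // backend degradation
  app_offline: "D",      // offline-tolerant
  kill_switch: "E",      // catastrophic
};

function pathFor(layer: FailureLayer): DegradedPath {
  return PATH_BY_LAYER[layer];
}

// Paths A and B surface inside one specialist's drawer; the rest use other surfaces.
function isSpecialistScoped(path: DegradedPath): boolean {
  return path === "A" || path === "B";
}
```

Keeping the mapping in one `Record` means a new layer fails type-checking until someone assigns it a path.
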
## Path A — Adapter failure (the common case)

1. **Attempt 1** — adapter dispatches, gets `429`. Specialist records `agent_action.outcome.failure_reason=rate_limit` and queues a 30s retry.
2. **Attempt 2** — same result, still inside the 60s window set by brief M's "bounded retry" constraint.
3. **Escalation** — third attempt skipped. Specialist publishes `specialist.degraded` event keyed by `{specialist_id, adapter, reason}`.
4. **UI update**:
   - Fleet-status-strip dot for that specialist flips to red (per brief L §L2c).
   - **Inside that specialist's drawer only**, banner appears:
     > Tryst rejected the last two bumps. Auto-bumps paused. Re-auth or fix and resume.
   - Other specialists keep running. Chat home keeps responding.
5. **Quinn options**:
   - **Re-auth inline** — single-tap if it's a token issue; opens a flow.
   - **Pause** — flips the specialist's policy to "draft only" until Quinn manually resumes.
   - **Investigate** — opens audit drawer (brief I) scoped to this specialist's last hour.
6. **Auto-resolution** — if the adapter recovers within 2 minutes (M open Q-2 default-on), the banner self-clears with a small receipt:
   > Tryst recovered at 14:09. Resumed.
   - An `agent_actions` row records the gap regardless; the audit drawer can replay.

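
Steps 1 to 3 amount to a bounded retry followed by escalation. A minimal sketch under stated assumptions: `nextStep`, `RetryDecision`, and the constant names are hypothetical, and the rule "retry only if the next attempt still fits in the window" is one reading of the 60s bound, not a confirmed implementation.

```typescript
// Values taken from the steps above; names are illustrative only.
const MAX_ATTEMPTS = 2;             // the third attempt is skipped
const RETRY_DELAY_MS = 30 * 1000;   // 30s retry after a failure
const RETRY_WINDOW_MS = 60 * 1000;  // "bounded retry" window

type RetryDecision =
  | { kind: "retry"; delayMs: number }
  | {
      kind: "escalate";
      event: "specialist.degraded";
      key: { specialist_id: string; adapter: string; reason: string };
    };

// Decide what to do after an attempt has failed.
// failedAttempts counts failures so far; elapsedMs is time since attempt 1.
function nextStep(
  specialistId: string,
  adapter: string,
  reason: string,
  failedAttempts: number,
  elapsedMs: number,
): RetryDecision {
  const fitsInWindow = elapsedMs + RETRY_DELAY_MS <= RETRY_WINDOW_MS;
  if (failedAttempts < MAX_ATTEMPTS && fitsInWindow) {
    return { kind: "retry", delayMs: RETRY_DELAY_MS };
  }
  // Out of attempts or out of window: publish the degraded event instead.
  return {
    kind: "escalate",
    event: "specialist.degraded",
    key: { specialist_id: specialistId, adapter, reason },
  };
}
```

With these numbers, a first failure at 0s yields a 30s retry and a second failure at 30s escalates, which matches the two-attempt sequence described in the steps.
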
## Path B — Specialist process down

Same UX as Path A but the failed dependency is the specialist itself, not the external adapter. Banner copy adjusts:

> content-onlyfans is offline. Your draft queue is safe. Trying to bring it back.

platform.api retries the specialist's health endpoint every 10s. If 3 consecutive checks pass, the specialist returns; banner clears; receipt logged. If 6 consecutive checks fail, the banner promotes:

> content-onlyfans isn't coming back on its own. Open the audit log or restart from settings.

(Restart-from-settings is an advanced affordance; surfaces only after the 6-failure mark.)

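
The health-check rule above is a pair of consecutive-result counters. A minimal sketch, assuming hypothetical names (`HealthTracker`, `recordCheck`) and one assumption the text does not settle: a promoted banner can still clear if 3 consecutive checks later pass. The 10s polling interval is left to the caller.

```typescript
// Thresholds from the paragraph above; everything else is illustrative.
const PASSES_TO_RECOVER = 3;
const FAILURES_TO_PROMOTE = 6;

type HealthState = "degraded" | "promoted" | "recovered";

interface HealthTracker {
  passes: number;   // consecutive passing checks
  failures: number; // consecutive failing checks
  state: HealthState;
}

// Fold one health-check result into the tracker and return the new tracker.
function recordCheck(tracker: HealthTracker, ok: boolean): HealthTracker {
  if (tracker.state === "recovered") return tracker;
  // A pass resets the failure streak and vice versa.
  const passes = ok ? tracker.passes + 1 : 0;
  const failures = ok ? 0 : tracker.failures + 1;
  if (passes >= PASSES_TO_RECOVER) {
    return { passes, failures, state: "recovered" };
  }
  if (failures >= FAILURES_TO_PROMOTE) {
    // Assumption: this is where restart-from-settings would surface.
    return { passes, failures, state: "promoted" };
  }
  return { passes, failures, state: tracker.state };
}
```
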
## Path C — Backend degradation

Top-of-chat banner (above every drawer header):

> Cocotte can't reach her memory. Drafts you're writing now will save when she's back.

While in this state:

- Chat-home stays interactive but **read-only** at the data layer. ai-copilot keeps streaming responses to direct Quinn questions (the LLM doesn't need the DB to talk).
- Approval swipes are disabled with toast: "saved locally, dispatching when she's back."
- New mutations queue in `SyncEngine` (per brief A offline-tolerance).
- Auto-actions across all specialists pause — degraded state is global at this layer.

**Reconciliation on recovery** (M6):

1. platform.api comes back. Banner flips to:
   > Memory's back. Replaying 14 queued actions in order.
2. Each replayed action carries `replay_from_degraded=true` in its audit row.
3. Approval gates apply normally — nothing auto-promotes to skip approval.
4. After replay, banner clears; digest entry appended:
   > 14:02 to 14:49 — platform paused. 14 actions replayed, 0 failures.

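
The reconciliation steps can be sketched as an ordered replay that tags each audit row and leaves approval gates in place. This is a sketch under assumptions: `QueuedMutation`, `AuditRow`, `replayQueue`, and the `status` values are hypothetical and are not the `SyncEngine` API; only the `replay_from_degraded` flag and the banner copy come from the steps above.

```typescript
// Hypothetical shapes for illustration.
interface QueuedMutation {
  id: string;
  queuedAt: number;       // ms timestamp when the mutation was queued
  needsApproval: boolean; // whether an approval gate applies
}

interface AuditRow {
  mutationId: string;
  replay_from_degraded: true;
  status: "dispatched" | "awaiting_approval";
}

function replayQueue(queue: QueuedMutation[]): { banner: string; rows: AuditRow[] } {
  // Replay in the order the actions were queued; copy so the input is untouched.
  const ordered = queue.slice().sort((a, b) => a.queuedAt - b.queuedAt);
  const rows = ordered.map((m): AuditRow => ({
    mutationId: m.id,
    replay_from_degraded: true,
    // Approval gates apply normally: gated actions wait, they are not auto-promoted.
    status: m.needsApproval ? "awaiting_approval" : "dispatched",
  }));
  const banner = `Memory's back. Replaying ${rows.length} queued actions in order.`;
  return { banner, rows };
}
```
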
## Path D — App offline

iPhone has no signal. Behavior per brief A §offline:

- Offline banner (yellow chip, top of chat): `offline — queued mutations: 3`.
- Cached calendar + recent assets browseable.
- Approval swipes work and queue; SyncEngine retries on reconnect.
- No specialist degradation banners — the app can't know the server state until reconnect. On reconnect, server-side state fans out: the device may discover several specialists were degraded while offline, surfaced as a single rollup card:
  > While you were offline: bookings-tryst had 2 failed bumps (recovered 14:09). content-x paused 6 minutes (recovered). No actions lost.

## Path E — Catastrophic (kill-switch)

Different surface entirely — see [`kill-switch.flow.md`](./kill-switch.flow.md). Banner copy + behavior is more restrictive than Path C (no auto-replay on resume; Quinn dictates).

## Cross-cutting: failure interrupts (M3)

If a failure occurs **during** a Quinn-initiated action (e.g. she tapped approve and the dispatch failed), the card animates back in with a failure badge instead of disappearing:

```
┌──────────────────────────────────────┐
│ ⚠ Couldn't publish to OF.            │
│ Reason: 429 rate limit · attempt 2/2 │
│                                      │
│ [ Retry now ]  [ Hold ]  [ See log ] │
└──────────────────────────────────────┘
```

Plain register. Stakes badge stays whatever it was before — failure doesn't change stakes, only outcome.

## Conflict resolution (M7) — special case of recovery

When two devices edited the same draft and both are reconciling:

```
Two versions diverged.

[ Yours (iPhone, 14:02) ]
| about-me: warm, dry, no marketing line.

[ Theirs (web, 14:11) ]
| about-me: warm, no marketing line — tour focused.

[ Cocotte's merge ]
| about-me: warm, dry, no marketing line — tour focused.

Pick one or edit.
```

iPhone-narrow collapses the three to a button row → tap opens full-screen diff drawer (per [brief M](./M-error-degraded-modes.brief.md) M7 open Q resolution).

## Notification fallback hierarchy (M4) — applied in Paths A–C

For any high-stakes notification while a delivery channel is down:

1. Try iMessage (mac-sync) → if mac-sync degraded, fall to push.
2. Push → if APNs rejects (rare), fall to in-app banner on next launch.
3. In-app banner → always present even after the originating notification was delivered, so Quinn sees it on app open regardless.

The fallback is **never silent** — every fallback hop logs an `agent_actions` row and the high-stakes notification is queued for the next available channel.

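
The hierarchy above can be sketched as a walk down an ordered channel list that records every hop. A minimal sketch, assuming hypothetical names (`Channel`, `HopLog`, `deliver`) and a caller-supplied `isUp` probe; `HopLog` only stands in for the `agent_actions` row, whose real schema is not shown here.

```typescript
type Channel = "imessage" | "push" | "in_app_banner";

// Stand-in for the agent_actions row logged on each fallback hop.
interface HopLog {
  from: Channel;
  to: Channel;
}

// Walk iMessage → push → in-app banner, logging each hop so no fallback is silent.
function deliver(isUp: (channel: Channel) => boolean): { deliveredVia: Channel; hops: HopLog[] } {
  const hops: HopLog[] = [];
  let current: Channel = "imessage";
  const fallbacks: Channel[] = ["push", "in_app_banner"];
  for (const next of fallbacks) {
    if (isUp(current)) break;
    hops.push({ from: current, to: next });
    current = next;
  }
  // The in-app banner is the last resort and is treated as always available.
  return { deliveredVia: current, hops };
}
```

For example, `deliver((c) => c === "push")` models mac-sync being degraded while push works: it reports delivery via `push` with a single logged hop.
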
## Edge cases

- **Multiple failures cascade** (Tryst + TS4Rent fail in the same minute): two specialist banners, each in their own drawer; fleet status strip shows two red dots. No global banner unless the cascade reaches Path C territory.
- **Failure during the failure** (re-auth flow itself errors): plain banner promotes to:
  > Re-auth didn't go through. Hold off and try again, or open the audit log.
- **A high-stakes notification fires while in Path C**: ai-copilot drafts a chat message (works because read-only chat is fine) and the notification is queued for after recovery. Quinn sees the chat message; the notification fires on recovery so she gets both records.

## Related

- [brief M](./M-error-degraded-modes.brief.md) — full design.
- [`kill-switch.flow.md`](./kill-switch.flow.md) — Path E.
- [brief A](./A-chat-surface.brief.md) §offline-tolerant + State 9.
- [brief I](./I-audit-trust-replay.brief.md) — digest entries render the gap.
- [voice](./00-system-voice.md) §V2c plain register.