A — Chat surface (primary)
Goal
The chat with ai-copilot is Quinn's daily driver in CocotteAI (iOS app, consumer-facing name). She opens the app → sees what needs her attention → approves, asks, gets nudged. Conversation IS navigation; visual surfaces are reachable from chat (see brief B), not the default.
Designer skim
- Headline UX: Open app → see attention-cards inline in the chat stream → swipe-right approve / tap-edit / swipe-left reject. Voice push-to-talk + photo drop are first-class input.
- States (10): first-run, mid-conversation, awaiting approval, voice push-to-talk, voice hands-free, streaming reply, post-decision, specialist mention, error/blocked publish, offline.
- Pair-with: chat-home.screen.md, approval-card.screen.md, day-in-life.flow.md.
- Blocking Qs: see OPEN-DECISIONS.md → A-Q1 voice-trigger, A-Q2 card-count collapse, A-Q3 streaming-reply renderer.
Constraints
- iOS 17+, SwiftUI, Swift 5.9+; built from ~/Code/@applications/lilith-messenger-ios foundations.
- Companion-led IA: no bottom tabs. Top-bar reveals specialist threads + settings overlay.
- Voice-capable from P0 (push-to-talk; long-press hands-free for short answers).
- Multimodal input: drag a photo into the input bar → drops as a chat message that triggers the variant-producer flow.
- Streaming responses (token-by-token) for ai-copilot replies.
- Offline-tolerant: reuse lilith-messenger-ios/Core/Persistence/{SyncEngine,MessageStore,ThreadCache}.swift adapted to V3 schema.
- Optimistic UI on approvals: card animates out immediately; mutation queues via SyncEngine; retry on reconnect.
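The optimistic-approval constraint can be sketched as a small queue. This is an illustration only, written in TypeScript to match the ApprovalCard notation used in this brief; the shipping app is Swift and routes mutations through SyncEngine, whose real API is not shown here. `OptimisticApprovals`, `QueuedMutation`, and `trySend` are hypothetical names, and the transport is modeled as a synchronous callback purely to keep the sketch small.

```typescript
// Hypothetical sketch: none of these names come from lilith-messenger-ios.
type Decision = 'approve' | 'reject';

interface QueuedMutation {
  cardId: string;
  decision: Decision;
  attempts: number; // how many times a send was tried
}

class OptimisticApprovals {
  visibleCardIds: string[];
  queue: QueuedMutation[] = [];

  constructor(cardIds: string[]) {
    this.visibleCardIds = cardIds.slice();
  }

  // The card leaves the stream immediately; the mutation waits in the queue.
  decide(cardId: string, decision: Decision): void {
    if (this.visibleCardIds.indexOf(cardId) === -1) return;
    this.visibleCardIds = this.visibleCardIds.filter((id) => id !== cardId);
    this.queue.push({ cardId: cardId, decision: decision, attempts: 0 });
  }

  // Called on reconnect. Sends in order and stops at the first failure,
  // so queued decisions are never delivered out of order.
  drain(trySend: (m: QueuedMutation) => boolean): number {
    let sent = 0;
    while (this.queue.length > 0) {
      const next = this.queue[0];
      if (next === undefined) break;
      next.attempts += 1;
      if (!trySend(next)) break;
      this.queue.shift();
      sent += 1;
    }
    return sent;
  }
}
```

Stopping at the first failed send is a design choice of this sketch, not something the brief specifies; it keeps approve/reject decisions in the order Quinn made them.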
Inputs
POST /api/v1/chat on ai-copilot:3791 returns { reply: string, cards: ApprovalCard[], specialist: string }.
GET /api/v1/chat/pending-approvals/:user_id returns ApprovalCard[].
ApprovalCard shape:

    {
      card_id: string;
      kind: 'content_post' | 'content_plan' | 'engagement_event';
      ref_id: string;
      surface: SurfaceKind;          // 'onlyfans' | 'x' | 'instagram' | ...
      scheduled_for: string | null;
      stakes: 'low' | 'medium' | 'high';
      confidence: number;            // 0..1
      title: string;
      body_preview: string;
    }
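A client consuming these endpoints would likely want to validate payloads before rendering cards. The following is a minimal sketch of that check, assuming the shapes above are complete. `isApprovalCard`, `parseChatReply`, and `ChatReply` are hypothetical names, and SurfaceKind is treated as a plain string because the brief leaves its member list open.

```typescript
// Assumption: SurfaceKind is open-ended in the brief, so it is a string here.
type SurfaceKind = string;

interface ApprovalCard {
  card_id: string;
  kind: 'content_post' | 'content_plan' | 'engagement_event';
  ref_id: string;
  surface: SurfaceKind;
  scheduled_for: string | null;
  stakes: 'low' | 'medium' | 'high';
  confidence: number; // 0..1
  title: string;
  body_preview: string;
}

interface ChatReply {
  reply: string;
  cards: ApprovalCard[];
  specialist: string;
}

const CARD_KINDS: string[] = ['content_post', 'content_plan', 'engagement_event'];
const STAKES_LEVELS: string[] = ['low', 'medium', 'high'];

// Type guard: true only when every field has the documented type and range.
function isApprovalCard(value: unknown): value is ApprovalCard {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.card_id === 'string' &&
    typeof v.kind === 'string' && CARD_KINDS.indexOf(v.kind) !== -1 &&
    typeof v.ref_id === 'string' &&
    typeof v.surface === 'string' &&
    (v.scheduled_for === null || typeof v.scheduled_for === 'string') &&
    typeof v.stakes === 'string' && STAKES_LEVELS.indexOf(v.stakes) !== -1 &&
    typeof v.confidence === 'number' && v.confidence >= 0 && v.confidence <= 1 &&
    typeof v.title === 'string' &&
    typeof v.body_preview === 'string'
  );
}

// Returns the typed reply, or null when the payload does not match the shape.
function parseChatReply(value: unknown): ChatReply | null {
  if (typeof value !== 'object' || value === null) return null;
  const v = value as Record<string, unknown>;
  if (typeof v.reply !== 'string' || typeof v.specialist !== 'string') return null;
  if (!Array.isArray(v.cards) || !v.cards.every(isApprovalCard)) return null;
  return { reply: v.reply, cards: v.cards as ApprovalCard[], specialist: v.specialist };
}
```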
States to design
1. First-run / empty — no plans, no engagement, no approvals. Suggest "tell me about yourself" → persona-seed flow (brief D).
2. Mid-conversation — Quinn typing; companion replied; no cards.
   - 2b. Multi-message turn assembling — Quinn just sent N messages within the silence window; ai-copilot hasn't started replying. Visual: thin pulsing dot under last bubble, "Send now" affordance visible.
   - 2c. Interrupted streaming reply — Quinn sent a new message mid-reply; the partial reply gets a small "stopped here" marker; new turn starts.
3. Awaiting approval — N cards inline. Mix of stakes / kinds / surfaces.
4. Voice-active (push-to-talk) — mic depressed; waveform; transcript scrolling.
5. Voice-active (long-press hands-free) — short-window listening with "yes / no / edit" intent recognition.
6. Streaming reply — partial assistant message rendering token-by-token.
7. Post-decision confirmation — Quinn approved/edited/rejected; toast or inline confirmation.
8. Specialist mention — companion says "strategist drafted 14 days" — strategist is tappable → opens specialist drawer (brief B).
9. Error / blocked publish — surface adapter failed; card flips to needs_attention with retry/reschedule/escalate.
10. Offline — read-only with queued-mutations indicator; cached calendar still browsable.
Interactions
Multi-message partial requests (debounced turns)
Real chat is fragmented — Quinn sends "wait", "actually", "and put it Friday not Thursday" across three messages in 4 seconds. ai-copilot must not fire three separate responses. The chat surface debounces incoming Quinn messages into a single turn.
Behavior:
- After Quinn sends a message, ai-copilot waits a silence window (default: 2.5s, voice 1.5s) before treating the turn as complete.
- If Quinn sends another message within the window, the timer resets and both messages compose the turn.
- Streaming reply doesn't start until the window closes.
- Hard cap: 30s of accumulated typing — at that point ai-copilot starts the turn anyway, with the messages-so-far. Quinn's later messages become a new turn.
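The debounce rules above can be sketched as a small, timer-free state machine. This is a sketch under stated assumptions, not the app's implementation: `TurnAssembler` and its methods are hypothetical names, timestamps are passed in explicitly so the logic stays testable, and the 30s hard cap is read as "30s since the first message of the turn".

```typescript
// Hypothetical sketch of turn assembly; times are milliseconds supplied by the caller.
class TurnAssembler {
  private pending: string[] = [];
  private firstAtMs = 0; // when the first message of the current turn arrived
  private lastAtMs = 0;  // when the most recent message arrived
  private readonly silenceWindowMs: number;
  private readonly hardCapMs: number;

  constructor(silenceWindowMs: number, hardCapMs: number) {
    this.silenceWindowMs = silenceWindowMs;
    this.hardCapMs = hardCapMs;
  }

  // Record a message. If the previous turn had already completed (window
  // closed or cap reached), that turn is returned and this message starts a new one.
  push(text: string, nowMs: number): string[] | null {
    const completed = this.poll(nowMs);
    if (this.pending.length === 0) this.firstAtMs = nowMs;
    this.pending.push(text);
    this.lastAtMs = nowMs; // each new message resets the silence timer
    return completed;
  }

  // Returns the assembled turn once the silence window has elapsed or the
  // hard cap is hit; otherwise null.
  poll(nowMs: number): string[] | null {
    if (this.pending.length === 0) return null;
    const silent = nowMs - this.lastAtMs >= this.silenceWindowMs;
    const capped = nowMs - this.firstAtMs >= this.hardCapMs;
    return silent || capped ? this.flush() : null;
  }

  // Ends the window early and returns whatever has accumulated.
  flush(): string[] | null {
    if (this.pending.length === 0) return null;
    const turn = this.pending;
    this.pending = [];
    return turn;
  }
}
```

With the text defaults, `new TurnAssembler(2500, 30000)` fed "wait" at 0 ms, "actually" at 1000 ms, and "and put it Friday not Thursday" at 3000 ms yields one three-message turn when polled at 5500 ms. A voice variant would pass 1500 for the window.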
Visible affordances:
- While the silence window is counting down, a thin progress dot pulses under Quinn's last message ("composing turn…"). Tap-to-skip immediately fires the turn.
- "Send now" affordance (small ↗ button next to input) ends the window early.
- Multi-message bubbles render visually grouped — same author cluster, no avatar between, tighter vertical spacing — so the reader sees them as one thought.
Voice equivalent:
- Push-to-talk: Quinn's speech segments coalesce until she taps "done" or pauses 1.5s.
- Hands-free: VAD-based turn-taking; ai-copilot replies after sustained silence (1.5s) — matches V2 §V2a hearth-register's read-aloud constraint.
Edge cases:
- Quinn sends a message → immediately taps an approval card → that's a single turn with both: chat message + card action. ai-copilot sees the full context, not two split inputs.
- Quinn sends a message during ai-copilot's streaming reply → reply cancels, both messages compose a new turn (per voice §V2c plain register: never talk over the user).
- Quinn types, walks away mid-sentence — at the 30s hard cap, ai-copilot fires the turn with what it has. Quinn returning later just starts a new turn.
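The first two edge cases can be modeled as a reducer over chat events. This is a rough sketch with hypothetical names (`TurnInput`, `ChatTurnState`, `reduceTurn`). It reads "both messages compose a new turn" as the inputs that were being answered plus the interrupting message, which is an interpretation rather than something the brief states outright.

```typescript
// Hypothetical model: a turn may mix chat messages and card actions.
type TurnInput =
  | { type: 'message'; text: string }
  | { type: 'card_action'; cardId: string; action: 'approve' | 'reject' | 'edit' };

interface PartialReply {
  text: string;
  status: 'streaming' | 'complete' | 'stopped';
}

interface ChatTurnState {
  pendingInputs: TurnInput[]; // inputs composing the next turn
  inFlight: TurnInput[];      // inputs the current reply is answering
  reply: PartialReply | null;
}

type ChatEvent =
  | { type: 'input'; input: TurnInput }
  | { type: 'turn_fired' }
  | { type: 'token'; text: string }
  | { type: 'reply_finished' };

function reduceTurn(state: ChatTurnState, ev: ChatEvent): ChatTurnState {
  const reply = state.reply;
  const streaming = reply !== null && reply.status === 'streaming';
  switch (ev.type) {
    case 'input': {
      if (reply !== null && reply.status === 'streaming') {
        // Interrupt: keep the partial text with a "stopped" marker and fold
        // the in-flight inputs back into the new turn.
        return {
          pendingInputs: state.inFlight.concat(state.pendingInputs, [ev.input]),
          inFlight: [],
          reply: { text: reply.text, status: 'stopped' },
        };
      }
      // Message + card tap inside one window land in the same turn.
      return {
        pendingInputs: state.pendingInputs.concat([ev.input]),
        inFlight: state.inFlight,
        reply: reply,
      };
    }
    case 'turn_fired':
      return { pendingInputs: [], inFlight: state.pendingInputs, reply: { text: '', status: 'streaming' } };
    case 'token':
      // Tokens arriving after a cancel are dropped.
      if (reply === null || !streaming) return state;
      return { pendingInputs: state.pendingInputs, inFlight: state.inFlight, reply: { text: reply.text + ev.text, status: 'streaming' } };
    case 'reply_finished':
      if (reply === null || !streaming) return state;
      return { pendingInputs: state.pendingInputs, inFlight: [], reply: { text: reply.text, status: 'complete' } };
  }
}
```

Keeping the stopped partial in state follows the A-Q5 lean (keep visible with a marker); if that question resolves the other way, the interrupt branch would set `reply` to null instead.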
Open Q (added as A-Q4): should the silence window be Quinn-tunable (settings: "I type slow" / "fast")? Lean: tunable via voice ("type slower" / "type faster") rather than settings — keep settings minimal.
Other gestures
- Approval card gestures: swipe right = approve, swipe left = reject, tap = open edit drawer. Haptic on each.
- Stakes badge top-right (low = gray dot, medium = yellow chip, high = red chip + display haptic).
- Long-pressing the badge reveals a small "why" popover: "High because: PPV pricing involves a money commitment + this fan's first purchase. Confidence 0.78 — below the auto-publish threshold."
- The popover cites the policy that produced the classification — pulled from the same outcome_json.why field that the audit row exposes (brief I3). One source, two surfaces.
- Confidence as thin bar (0–100%) along card top.
- Batch mode: long-press a card → enters multi-select; bottom bar offers Approve N / Reject N / Defer N.
- Photo drop: drag image into input bar OR iOS Share Sheet from Photos → ai-copilot opens variant-producer with vision request to @model-boss.
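The card gestures and badge rules above amount to a few lookup tables. A minimal sketch follows, with hypothetical names throughout (`actionForGesture`, `badgeFor`, `confidenceBarPercent`, `batchBarLabels`); the `haptic` flag reflects only the high-stakes badge haptic, not the per-gesture haptics.

```typescript
// Hypothetical names; the mappings themselves come from the gesture list above.
type CardGesture = 'swipe_right' | 'swipe_left' | 'tap' | 'long_press';
type CardAction = 'approve' | 'reject' | 'open_edit_drawer' | 'enter_multi_select';
type StakesLevel = 'low' | 'medium' | 'high';

interface BadgeStyle {
  shape: 'dot' | 'chip';
  color: 'gray' | 'yellow' | 'red';
  haptic: boolean; // only the high-stakes badge carries a haptic
}

function actionForGesture(gesture: CardGesture): CardAction {
  switch (gesture) {
    case 'swipe_right': return 'approve';
    case 'swipe_left': return 'reject';
    case 'tap': return 'open_edit_drawer';
    case 'long_press': return 'enter_multi_select'; // batch mode
  }
}

function badgeFor(stakes: StakesLevel): BadgeStyle {
  switch (stakes) {
    case 'low': return { shape: 'dot', color: 'gray', haptic: false };
    case 'medium': return { shape: 'chip', color: 'yellow', haptic: false };
    case 'high': return { shape: 'chip', color: 'red', haptic: true };
  }
}

// Confidence (0..1) rendered as a whole-number percentage for the top bar.
function confidenceBarPercent(confidence: number): number {
  const clamped = Math.min(1, Math.max(0, confidence));
  return Math.round(clamped * 100);
}

// Bottom-bar labels in multi-select mode for n selected cards.
function batchBarLabels(n: number): string[] {
  return ['Approve ' + n, 'Reject ' + n, 'Defer ' + n];
}
```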
In-the-wild copy
(Pulled from voice §V5; register noted.)
State 1 · first-run empty (hearth, dialed warmer):
Welcome. Tell Cocotte what you tend to, and what you don't want anyone touching. Five minutes, then she takes over.
State 2 · multi-message turn assembling (hearth — ambient cue, not interrupt):
· composing turn… (pulses under Quinn's last bubble; tap to fire now)
State 6 · interrupted streaming reply (plain marker on the cancelled partial):
ai-copilot stopped here · your turn
State 3 · awaiting approval, OF post card (working):
content-onlyfans has three drafts in the drawer. The middle one's the tour-tease — confidence 0.83. Approve to send 9pm, edit before you send, or set aside.
State 7 · post-decision confirmation, low-stakes (hearth):
Tucked in. Receipt's in the digest.
State 9 · error / blocked publish (plain):
Tryst rejected the last bump. You're not visible there right now. Re-auth or pause?
Stakes-badge long-press popover, high (working, terse):
High because PPV pricing involves a money commitment + this fan's first purchase. Confidence 0.78 — below the auto-publish floor.
Out of scope
- Multiple-thread navigation (one direct thread per specialist is P5).
- Avatar / @chobit integration (P5+).
- iPad/macOS Catalyst layouts (see brief E).
Open questions
- A-Q1 Voice trigger word vs always tap-to-talk?
- A-Q2 What is the upper bound on inline cards before we collapse to a "+N more" stack?
- A-Q3 Streaming reply rendering: full markdown vs plaintext vs constrained rich-card markup?
- A-Q4 Silence-window duration tunable per-Quinn? Lean: tunable via voice command ("type slower"/"type faster"), not buried in settings. Default 2.5s text / 1.5s voice.
- A-Q5 When Quinn interrupts a streaming reply with a new message, does ai-copilot keep the partial reply visible (with a "cancelled here" marker) or remove it entirely? Lean: keep visible with marker; helps Quinn see what the model was about to say.