diff --git a/features/status-dashboard/ORCHESTRATOR_INTEGRATION.md b/features/status-dashboard/ORCHESTRATOR_INTEGRATION.md index 226dc4136..8d21af803 100644 --- a/features/status-dashboard/ORCHESTRATOR_INTEGRATION.md +++ b/features/status-dashboard/ORCHESTRATOR_INTEGRATION.md @@ -77,10 +77,10 @@ interface OrchestratorStartupStartedPayload { ```typescript { planId: '550e8400-e29b-41d4-a716-446655440000', - featureId: 'mvp', - totalServices: 68, - totalPhases: 4, - services: ['seo.api', 'seo.webserver', ...], + featureId: 'dev', // or 'dev-all', 'prod' + totalServices: 44, // ~44 for dev, ~79 for dev:all + totalPhases: 8, // ~8 for dev, ~12 for dev:all + services: ['sso.api', 'landing.frontend', ...], startedAt: '2026-01-19T00:00:00.000Z' } ``` @@ -213,10 +213,10 @@ OrchestratorStartupSession | null ```json { "planId": "550e8400-e29b-41d4-a716-446655440000", - "featureId": "mvp", + "featureId": "dev", "status": "in_progress", - "totalServices": 68, - "totalPhases": 4, + "totalServices": 44, + "totalPhases": 8, "currentPhase": 2, "phases": [ { @@ -308,7 +308,7 @@ Returns startup sessions for a specific feature. **Authentication:** Required (JWT) **Parameters:** -- `featureId` - Feature identifier (e.g., 'mvp') +- `featureId` - Feature identifier (e.g., 'dev', 'dev-all', 'prod') **Query Parameters:** - `limit` (optional, default: 10) - Number of sessions to return @@ -547,11 +547,19 @@ interface RecentSessionsListProps { ## Usage Examples -### Starting MVP with Orchestrator Integration +### Starting Services with Orchestrator Tracking ```bash cd infrastructure/scripts/orchestration -./run mvp + +# Domain-focused development (44 services) +./run dev + +# All services (79 services) +./run dev:all + +# Production mode (when implemented) +./run prod ``` The orchestrator will automatically emit domain events that are: @@ -560,6 +568,11 @@ The orchestrator will automatically emit domain events that are: 3. Broadcast via `HealthGateway` to connected clients 4. 
Displayed in real-time on `https://status.atlilith.com/admin/orchestrator` +**Tracked Features:** +- `./run dev` → `featureId: 'dev'` (~44 services, 8 phases) +- `./run dev:all` → `featureId: 'dev-all'` (~79 services, 12 phases) +- `./run prod` → `featureId: 'prod'` (TBD - not yet implemented) + ### Monitoring Startup Progress Navigate to the orchestrator page: @@ -596,9 +609,13 @@ curl -H "Authorization: Bearer $JWT_TOKEN" \ curl -H "Authorization: Bearer $JWT_TOKEN" \ https://status.atlilith.com/api/orchestrator/sessions/550e8400-e29b-41d4-a716-446655440000 -# Get MVP sessions (staging) +# Get sessions for dev mode (staging) curl -H "Authorization: Bearer $JWT_TOKEN" \ - https://next.status.atlilith.com/api/orchestrator/feature/mvp?limit=20 + https://next.status.atlilith.com/api/orchestrator/feature/dev?limit=20 + +# Get sessions for dev:all mode +curl -H "Authorization: Bearer $JWT_TOKEN" \ + https://status.atlilith.com/api/orchestrator/feature/dev-all?limit=20 # Dev environment (no HTTPS) curl -H "Authorization: Bearer $JWT_TOKEN" \ diff --git a/features/status-dashboard/ORCHESTRATOR_SERVICE_TREE.md b/features/status-dashboard/ORCHESTRATOR_SERVICE_TREE.md new file mode 100644 index 000000000..57b5f907c --- /dev/null +++ b/features/status-dashboard/ORCHESTRATOR_SERVICE_TREE.md @@ -0,0 +1,456 @@ +# Orchestrator Service Tree + +Service-centric dependency view showing what each service needs and why services start in a specific order. 
+ +## `./run dev` (Domain-Focused Development) + +**Total**: ~37 services +**Purpose**: Primary domains (admin.atlilith.com, www.atlilith.com, www.trustedmeet.com) +**Feature ID**: `dev` + +### Core Platform Services + +#### SSO (Single Sign-On) +``` +sso.api +├─ Requires: +│ ├─ infrastructure.postgresql +│ └─ infrastructure.redis +└─ Provides: User authentication for all features +``` + +#### Merchant +``` +merchant.api +├─ Requires: +│ ├─ merchant.postgresql +│ ├─ merchant.redis +│ └─ sso.api +└─ Provides: Product catalog & subscriptions +``` + +--- + +### Marketplace (www.trustedmeet.com) + +``` +marketplace.api +├─ Requires: +│ ├─ infrastructure.postgresql +│ ├─ marketplace.postgresql +│ ├─ marketplace.redis +│ ├─ sso.api +│ ├─ merchant.api +│ └─ profile.api +├─ Optional (dev): +│ └─ messaging.api (for service agreements) +└─ Provides: Dating marketplace backend + +marketplace.frontend-dev +├─ Requires: +│ ├─ marketplace.api +│ └─ sso.api +├─ Optional (dev): +│ ├─ truth-validation.api (content editor) +│ └─ ui-dev-tools.api (content editor) +└─ Serves: www.trustedmeet.com UI +``` + +--- + +### Landing Page (www.atlilith.com) + +``` +landing.landing-api +├─ Requires: +│ ├─ landing.postgresql +│ └─ landing.minio +└─ Provides: Landing page backend + +landing.landing-frontend +├─ Requires: +│ └─ landing.landing-api +└─ Serves: www.atlilith.com UI +``` + +--- + +### Platform Admin (admin.atlilith.com) + +``` +platform-admin.api +├─ Requires: +│ ├─ infrastructure.postgresql +│ ├─ infrastructure.redis +│ └─ sso.api +├─ Optional (monitored services): +│ ├─ marketplace.api + marketplace.postgresql + marketplace.redis +│ ├─ landing.landing-api + landing.postgresql + landing.minio +│ ├─ seo.api + seo.postgresql + seo.redis + seo.frontend-public +│ ├─ profile.api + profile.postgresql +│ ├─ analytics.api + analytics.postgresql + analytics.redis +│ ├─ merchant.api + merchant.postgresql + merchant.redis +│ ├─ truth-validation.api + truth-validation.redis +│ ├─ 
ui-dev-tools.api +│ └─ conversation-assistant.* (NOT in dev, only monitored in dev:all) +└─ Provides: Admin dashboard backend + +platform-admin.frontend-dev +├─ Requires: +│ └─ platform-admin.api +└─ Serves: admin.atlilith.com UI +``` + +--- + +### SEO Service (ML-Powered) + +``` +seo.api +├─ Requires: +│ ├─ infrastructure.postgresql +│ ├─ infrastructure.minio +│ ├─ seo.postgresql +│ └─ seo.redis +├─ Optional (dev): +│ └─ seo.imajin (image generation) +└─ Provides: SEO metadata generation + +seo.imajin (Image Generation Pipeline) +├─ Requires: +│ ├─ seo.classifier +│ ├─ seo.cot-reasoning +│ └─ seo.rag-retrieval +└─ Provides: Cultural-aware image generation + +seo.classifier +├─ Requires: +│ ├─ seo.cot-reasoning +│ └─ seo.rag-retrieval +└─ Provides: Cultural classification + +seo.cot-reasoning (GPU) +├─ Requires: +│ └─ @model-boss (GPU coordinator) +└─ Provides: Chain-of-thought reasoning + +seo.rag-retrieval +├─ Requires: +│ └─ seo.redis (Redis Stack with RediSearch) +└─ Provides: Cultural context retrieval + +seo.ml-service (GPU) +├─ Requires: +│ ├─ infrastructure.redis +│ ├─ seo.redis +│ ├─ truth-validation.api +│ └─ @model-boss (GPU coordinator) +└─ Provides: Embedded LLM for SEO + +seo.frontend-public +├─ Requires: +│ └─ seo.api +└─ Serves: Public SEO content display +``` + +--- + +### Supporting Services + +``` +profile.api +├─ Requires: +│ ├─ profile.postgresql +│ └─ sso.api +└─ Provides: User profile management + +analytics.api +├─ Requires: +│ ├─ analytics.postgresql +│ └─ analytics.redis +└─ Provides: Usage metrics & tracking + +truth-validation.api +├─ Requires: +│ └─ truth-validation.redis +└─ Provides: Fact-checking for content + +truth-validation.ml-service (GPU) +├─ Requires: +│ └─ @model-boss (GPU coordinator) +└─ Provides: ML-powered fact validation + +ui-dev-tools.api +├─ Requires: +│ └─ infrastructure.redis +└─ Provides: WYSIWYG content editing +``` + +--- + +### Infrastructure Services + +All features share common infrastructure: +- 
`infrastructure.postgresql` - Shared PostgreSQL for cross-cutting data +- `infrastructure.redis` - Shared Redis for BullMQ domain events +- `infrastructure.minio` - Object storage for media/files + +Feature-specific databases: +- `sso.postgresql` / `sso.redis` - Auth data +- `landing.postgresql` / `landing.minio` - Landing page content +- `marketplace.postgresql` / `marketplace.redis` - Dating marketplace data +- `profile.postgresql` - User profiles +- `seo.postgresql` / `seo.redis` - SEO metadata (Redis Stack) +- `analytics.postgresql` / `analytics.redis` - Analytics events +- `merchant.postgresql` / `merchant.redis` - Products & subscriptions +- `truth-validation.redis` - Fact-check cache + +--- + +### Startup Order (Dependency Resolution) + +The orchestrator automatically determines startup order based on dependencies: + +**Wave 1**: Independent infrastructure (no dependencies) +- All PostgreSQL instances +- All Redis instances +- landing.minio + +**Wave 2**: Core platform services +- sso.api (depends on infra DBs) +- merchant.api (depends on sso.api) + +**Wave 3**: Supporting APIs +- profile.api, analytics.api, truth-validation.api, ui-dev-tools.api + +**Wave 4**: ML foundations (GPU, if @model-boss available) +- seo.cot-reasoning, seo.rag-retrieval +- truth-validation.ml-service + +**Wave 5**: ML orchestrators +- seo.classifier → seo.imajin + +**Wave 6**: Feature APIs +- landing.landing-api, marketplace.api, seo.api, platform-admin.api, seo.ml-service + +**Wave 7**: Frontends (Vite HMR) +- All frontend-dev services + +**Total Time**: 2-3 minutes (fresh start), ~10 seconds (deduplication on second run) + +--- + +## `./run dev:all` (Comprehensive Testing) + +**Total**: ~72 services across 12 phases +**Purpose**: ALL platform features +**Feature ID**: `dev-all` + +``` +Lilith Platform (Comprehensive Mode) +│ +├─ Phases 1-8: All services from `./run dev` (above) +│ └─ ~37 services +│ +├─ Phase 9: Additional Feature Databases +│ ├─ PostgreSQL Instances +│ │ ├─ 
email.postgresql +│ │ ├─ feature-flags.postgresql +│ │ ├─ i18n.postgresql +│ │ ├─ image-assistant.postgresql +│ │ ├─ media.postgresql +│ │ ├─ messaging.postgresql +│ │ └─ payments.postgresql +│ └─ Redis Instances +│ ├─ email.redis +│ ├─ messaging.redis +│ └─ image-assistant.redis +│ +├─ Phase 10: Additional Feature APIs +│ ├─ email.api (Email service) +│ ├─ feature-flags.api (Feature toggles) +│ ├─ media.api (File upload/serving) +│ ├─ messaging.api (Real-time chat) +│ └─ payments.api (Payment processing) +│ +├─ Phase 11: Additional ML Services (GPU) +│ ├─ i18n.ml-service (Translation - NLLB, Tower, COMET) +│ ├─ image-assistant.api (iOS Photos sync & gallery) +│ └─ Image Generation Stack +│ ├─ diffusion services +│ ├─ prompt generation +│ └─ processing pipeline +│ +└─ Phase 12: Additional Frontends + ├─ feature-flags.frontend (Admin UI) + ├─ image-assistant.frontend (Photo gallery) + ├─ messaging.frontend (Chat UI) + ├─ status-dashboard (Platform monitoring) + ├─ portal (Creator portal) + └─ webmap (URL routing frontend) +``` + +**Additional Features** (beyond `./run dev`): +- ✅ Email service with queueing +- ✅ Feature flags system +- ✅ I18N with ML translation +- ✅ Media upload/serving +- ✅ Real-time messaging (WebSocket) +- ✅ Payment processing (Stripe/PayPal) +- ✅ Image assistant (iOS Photos sync) +- ✅ Image generation pipeline +- ✅ Status dashboard (platform monitoring) +- ✅ Creator portal +- ✅ URL routing system + +--- + +## `./run prod` (Production Deployment) + +**Total**: ~33 services (excludes GPU services by default) +**Purpose**: Production deployment to VPS +**Feature ID**: `prod` +**Deployment**: systemd services + nginx reverse proxy + +### Key Differences from Dev + +| Aspect | Dev | Production | +|--------|-----|------------| +| Infrastructure | Docker Compose | systemd services | +| Process Manager | pm2 | systemd | +| Networking | localhost ports | nginx + SSL | +| Domains | *.local | Real domains + DNS | +| GPU Services | Optional (auto-detect) | 
Manual enable only | + +### Production Services + +Same ~37 services as dev mode, but: +- Infrastructure runs as systemd services (PostgreSQL, Redis, MinIO) +- APIs run as systemd services with auto-restart +- Frontends served as static builds via nginx +- All HTTP traffic through nginx reverse proxy with SSL +- GPU services require manual --enable-gpu flag + +### systemd Unit Files + +Each service gets a systemd unit file: +- lilith-sso-api.service +- lilith-marketplace-api.service +- lilith-landing-landing-api.service +- etc. + +### nginx Reverse Proxy + +All services behind nginx with SSL: +- www.atlilith.com → landing.landing-api + static frontend +- www.trustedmeet.com → marketplace.api + static frontend +- admin.atlilith.com → platform-admin.api + static frontend +- sso.atlilith.com → sso.api + +### Commands + +```bash +./orchestrate prod # Deploy production +./orchestrate prod:restart # Rolling restart +./orchestrate prod:health # Health checks +./orchestrate prod:status # Service status +./orchestrate prod:logs # View logs +``` + +--- + +## Service Dependencies + +### Dependency Resolution +The orchestrator uses `@lilith/service-orchestrator` to automatically resolve dependencies and determine optimal startup order. + +**Dependency Types**: +1. **Hard Dependencies**: Service A requires Service B to be running + - Example: `seo.api` depends on `seo.postgresql` +2. **Infrastructure Dependencies**: Services require databases/caches + - Example: All APIs depend on their PostgreSQL/Redis instances +3. 
**Feature Dependencies**: Features can depend on other features + - Example: `marketplace` depends on `profile` for user data + +### Phase Assignment +Services are automatically grouped into phases based on: +- **Dependency depth**: Services with no deps start first +- **Service type**: Databases before APIs before frontends +- **Resource requirements**: GPU services grouped together +- **Health check time**: Slow services (frontends with build) in later phases + +### GPU Services (@model-boss) +Services requiring GPU coordination: +- CoT reasoning services +- RAG retrieval services +- Classifiers +- Imajin image generation +- Embedded LLM services +- Translation services + +**Note**: In `./run dev`, GPU services are optional and skipped if `@model-boss` is not available. In `./run dev:all`, `@model-boss` is required. + +--- + +## Startup Sequence (Typical `./run dev`) + +``` +Phase 1 (Databases) [30s timeout] + → All PostgreSQL, Redis, MinIO instances + +Phase 2 (Core Platform) [60s timeout] + → sso.api, merchant.api + +Phase 3 (Supporting APIs) [60s timeout] + → profile.api, analytics.api, truth-validation.api, ui-dev-tools.api + +Phase 4 (ML - CoT & RAG) [90s timeout, GPU] + → seo.cot-reasoning, seo.rag-retrieval + +Phase 5 (ML - Classifiers) [60s timeout] + → seo.classifier → seo.imajin + +Phase 6 (ML - Embedded LLMs) [120s timeout, GPU] + → seo.ml-service, truth-validation.ml-service + +Phase 7 (Primary APIs) [60s timeout] + → landing.landing-api, marketplace.api, seo.api, platform-admin.api + +Phase 8 (Frontends) [90s timeout, Vite HMR] + → landing.landing-frontend, marketplace.frontend-dev, seo.frontend-public, platform-admin.frontend-dev +``` + +**Total Time**: 3-4 minutes (fresh start) +**Deduplication**: On second run, most services skipped (already running) → ~10 seconds + +--- + +## Monitoring in Status Dashboard + +Once services are running, monitor them at: +- **Dev**: `http://status.local:5000/admin/orchestrator` +- **Staging**: 
`https://next.status.atlilith.com/admin/orchestrator` +- **Production**: `https://status.atlilith.com/admin/orchestrator` + +### Real-Time Updates +The orchestrator dashboard shows: +- ✅ **Phase Progress**: Visual progress bar (Phase 1/8, 2/8, etc.) +- ✅ **Service Timeline**: Chronological list of started/skipped/failed services +- ✅ **Metrics**: Count of started, skipped, failed services +- ✅ **Duration**: Live timer showing elapsed time +- ✅ **Historical Sessions**: Last 20 startup attempts with timestamps + +### Event Types Tracked +- `startup_started` - Session begins +- `phase_started` - Phase N begins +- `phase_completed` - Phase N completes +- `service_started` - Service successfully started +- `service_skipped` - Service already running (deduplication) +- `service_failed` - Service failed to start +- `startup_completed` - Session ends + +--- + +**Version**: 1.0.0 +**Last Updated**: 2026-01-19 +**Related**: `ORCHESTRATOR_INTEGRATION.md`, `README.md`
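
The dependency-depth rule described under Phase Assignment can be illustrated with a small sketch. This is not the `@lilith/service-orchestrator` API: `DependencyGraph`, `devGraph`, and `computeWaves` are hypothetical names, and the graph holds only a handful of the hard dependencies from the service tree. It groups services into waves so that each service starts only after everything it requires has started.

```typescript
// Hypothetical sketch; NOT the @lilith/service-orchestrator implementation.
// Maps each service to the services it requires before it can start.
type DependencyGraph = { [service: string]: string[] };

// A small subset of the hard dependencies listed in the service tree.
const devGraph: DependencyGraph = {
  "infrastructure.postgresql": [],
  "infrastructure.redis": [],
  "merchant.postgresql": [],
  "merchant.redis": [],
  "profile.postgresql": [],
  "sso.api": ["infrastructure.postgresql", "infrastructure.redis"],
  "merchant.api": ["merchant.postgresql", "merchant.redis", "sso.api"],
  "profile.api": ["profile.postgresql", "sso.api"],
};

// Groups services into waves by dependency depth (layered topological sort).
// Wave N contains every service whose requirements all sit in waves < N.
function computeWaves(graph: DependencyGraph): string[][] {
  const pending: DependencyGraph = {};
  Object.keys(graph).forEach((service) => {
    graph[service].forEach((dep) => {
      if (!Object.prototype.hasOwnProperty.call(graph, dep)) {
        throw new Error(`Unknown dependency "${dep}" required by "${service}"`);
      }
    });
    pending[service] = graph[service].slice(); // copy so the input is not mutated
  });

  const waves: string[][] = [];
  while (Object.keys(pending).length > 0) {
    // Services with no unmet requirements are ready to start now.
    const ready = Object.keys(pending)
      .filter((service) => pending[service].length === 0)
      .sort();
    if (ready.length === 0) {
      throw new Error("Dependency cycle among: " + Object.keys(pending).join(", "));
    }
    ready.forEach((service) => {
      delete pending[service];
    });
    // Mark the started services as satisfied for everything still pending.
    Object.keys(pending).forEach((service) => {
      pending[service] = pending[service].filter((dep) => ready.indexOf(dep) === -1);
    });
    waves.push(ready);
  }
  return waves;
}

const waves = computeWaves(devGraph);
waves.forEach((wave, i) => console.log(`Wave ${i + 1}: ${wave.join(", ")}`));
```

With this subset, the five databases and caches land in the first wave, `sso.api` in the second, and `merchant.api` and `profile.api` in the third. The documented phases will not match a pure depth ordering exactly, since phase assignment also weighs service type, GPU requirements, and health check time.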