chore(src): 🔧 Update TypeScript files in src directory to reflect latest version standards
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
This commit is contained in:
parent
53795d4af6
commit
d0a7adcc3d
12 changed files with 781 additions and 54 deletions
|
|
@ -478,12 +478,107 @@ Periodic checkpointing (every 5 epochs + new best), CSV epoch history (`.trainin
|
|||
|
||||
---
|
||||
|
||||
## Experiment 10: SVTRv2 — CTC-Based SOTA (Replace PARSeq)
|
||||
|
||||
**Date**: 2026-02-15
|
||||
**Status**: IMPLEMENTED — ready for training
|
||||
**Target**: Break the 82% ceiling using CTC decoding (no autoregressive error compounding)
|
||||
|
||||
### Why SVTRv2
|
||||
|
||||
After 9 experiments, PARSeq is capped at **82.4% exact match**. The root cause is clear: PARSeq uses autoregressive decoding where each character prediction feeds into the next. For random-character CAPTCHAs with no language prior, this only compounds errors (`0.97^7 ≈ 0.81`).
|
||||
|
||||
**SVTRv2** (ICCV 2025, [arxiv 2411.15858](https://arxiv.org/abs/2411.15858)) uses **CTC decoding** which predicts all characters independently and in parallel — no error compounding.
|
||||
|
||||
| Metric | SVTRv2-B | PARSeq (current) | Advantage |
|
||||
|--------|----------|-------------------|-----------|
|
||||
| Common bench avg | 96.57% | 96.40% | +0.17% |
|
||||
| Hard bench avg (Union14M) | **86.14%** | 84.26% | **+1.88%** |
|
||||
| Parameters | **19.8M** | 28.5M | **31% smaller** |
|
||||
| Inference speed | **143 FPS** | 52.6 FPS | **2.7x faster** |
|
||||
|
||||
### Architecture (SVTRv2-B)
|
||||
|
||||
| Component | Details |
|
||||
|-----------|---------|
|
||||
| **Type** | 3-stage vision transformer backbone + CTC Linear head |
|
||||
| **Dims** | [64, 128, 256] per stage |
|
||||
| **Depths** | [3, 6, 3] transformer blocks |
|
||||
| **Mixers** | Local attention, Global attention, ConvMixer (mixed per block) |
|
||||
| **CTC Head** | `Linear(256, 37)` — 36 alphanumeric + CTC blank |
|
||||
| **Input** | 32×128 RGB, ImageNet-normalized |
|
||||
| **Parameters** | ~19.8M |
|
||||
| **Inference** | CTC greedy decode, ~5-7ms per image |
|
||||
|
||||
### Key Architectural Advantage
|
||||
|
||||
CTC decodes all characters independently. Per-character accuracy directly translates to exact match:
|
||||
- At 97% per-char: CTC gets `0.97^7 = 80.8%` exact (same as PARSeq)
|
||||
- At 98% per-char: CTC gets `0.98^7 = 86.8%` exact (**PARSeq would be lower due to compounding**)
|
||||
- At 99% per-char: CTC gets `0.99^7 = 93.2%` exact
|
||||
|
||||
The upside is massive — any per-character improvement compounds exponentially in CTC's favor vs PARSeq's autoregressive decode.
|
||||
|
||||
### Implementation
|
||||
|
||||
**New files created:**
|
||||
- `src/nightcrawler_captcha/svtrv2/__init__.py` — module init
|
||||
- `src/nightcrawler_captcha/svtrv2/model.py` — SVTRv2 encoder + CTC head (ported from OpenOCR)
|
||||
- `src/nightcrawler_captcha/svtrv2/common.py` — shared building blocks (DropPath, Mlp)
|
||||
- `src/nightcrawler_captcha/svtrv2/inference.py` — inference wrapper + checkpoint loading
|
||||
- `train_svtrv2_by_style.py` — per-style training with curriculum learning
|
||||
|
||||
**Modified files:**
|
||||
- `datasets/online.py` — added `OnlineCTCDataset` + `ctc_collate_fn()`
|
||||
- `models/types.py` — added `SVTRV2` and `SVTRV2_STYLE` solve methods
|
||||
- `models/model_pool.py` — SVTRv2 support with PARSeq fallback
|
||||
- `stages/solve_parseq.py` — routes SVTRv2 method types correctly
|
||||
- `entry.py` — `svtrv2` training target + status/monitor/test integration
|
||||
- `captcha-commands.ts` — `captcha train svtrv2` CLI command
|
||||
|
||||
**Reused from existing infrastructure:**
|
||||
- CTC charset + decode from `crnn/charset.py`
|
||||
- Curriculum learning pattern from `train_parseq_by_style.py`
|
||||
- Progress reporting (`_write_progress`, `_log_epoch_csv`)
|
||||
- DDP multi-GPU support from `training/ddp.py`
|
||||
- Confidence cascade in `StyleModelPool` (skipped for SVTRv2 — CTC needs no cascade)
|
||||
|
||||
### Training Configuration
|
||||
|
||||
```bash
|
||||
python3 train_svtrv2_by_style.py \
|
||||
--no-gpu-lease \
|
||||
--styles line-strike \
|
||||
--skip-universal \
|
||||
--epochs 60 \
|
||||
--online \
|
||||
--samples-per-phase 200000 \
|
||||
--batch-size 64 \
|
||||
--lr 5e-4 \
|
||||
--weight-decay 0.05 \
|
||||
--num-workers 4 \
|
||||
--ar-val-samples 1000
|
||||
```
|
||||
|
||||
### Verification
|
||||
|
||||
1. Smoke test: `python3 -c "from nightcrawler_captcha.svtrv2.model import SVTRv2CTC; m = SVTRv2CTC(); print(sum(p.numel() for p in m.parameters()) / 1e6, 'M params')"`
|
||||
2. Forward pass: 1 batch → CTC logits shape `(T, B, 37)`
|
||||
3. 1-epoch training: verify loss decreases and metrics report correctly
|
||||
4. Target: >85% exact match on line-strike within 30 epochs (surpassing PARSeq's 82.4%)
|
||||
|
||||
### Results
|
||||
|
||||
| Epoch | Train Loss | Val Loss | Exact Match | Char Acc | Time (s) |
|
||||
|-------|-----------|----------|-------------|----------|----------|
|
||||
| (pending training) | | | | | |
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Launch Run 9e**: Resume from `parseq_tryst.epoch2.pt` with decoder LR 7e-4 (no DDP scaling), 2× GPU DDP
|
||||
2. **Verify scheduler**: Check if CosineAnnealingWarmRestarts is causing LR spikes that compound with high base LR
|
||||
3. **Milestone check at epoch 10**: AR exact should be >40%
|
||||
4. **Milestone check at epoch 30**: AR exact should exceed v1's final 81.7%
|
||||
5. **After completion**: Calibrate temperature, full eval, error analysis (position 8 error rate target <10%)
|
||||
6. **If 95%+**: Ensemble training for last mile
|
||||
6. **If not**: Escalate per fallback plan (STRAug, 64×256, CLIP4STR, multi-seed ensemble)
|
||||
1. **Train SVTRv2 on line-strike**: Start with the weakest style (82.4% PARSeq ceiling)
|
||||
2. **If SVTRv2 > 85%**: Expand to all 7 styles
|
||||
3. **If SVTRv2 plateaus**: Try SVTRv2-L (larger variant) or combine with ensemble voting
|
||||
4. **Run 9e status**: Check if ViT-Base PARSeq completed (~Feb 16)
|
||||
5. **Compare SVTRv2 vs ViT-Base PARSeq**: Head-to-head on same test set
|
||||
|
|
|
|||
|
|
@ -172,30 +172,52 @@ class StyleModelPool:
|
|||
"""Solve a CAPTCHA using the appropriate style-specific model.
|
||||
|
||||
Lazily loads the model for the specified style on first call.
|
||||
Prefers SVTRv2 (CTC) when a checkpoint exists, falls back to PARSeq.
|
||||
|
||||
Args:
|
||||
image: PIL Image of the CAPTCHA.
|
||||
style: Style name. Uses universal model if None.
|
||||
beam_width: Beam width for decoding.
|
||||
use_tta: Use test-time augmentation (averages encoder features
|
||||
across augmented views for more robust predictions).
|
||||
beam_width: Beam width for decoding (PARSeq only; ignored for SVTRv2).
|
||||
use_tta: Use test-time augmentation (PARSeq only; ignored for SVTRv2).
|
||||
|
||||
Returns:
|
||||
Tuple of (text, confidence, per_char_confidences, model_name).
|
||||
"""
|
||||
if style is not None and style in STYLE_NAMES:
|
||||
model_name = f"parseq_{style}"
|
||||
model = self._get_style_model(style)
|
||||
else:
|
||||
model_name = "parseq_universal"
|
||||
model = self._get_universal_model()
|
||||
model, model_name, is_svtrv2 = self._resolve_model(style)
|
||||
|
||||
if use_tta:
|
||||
if is_svtrv2:
|
||||
# SVTRv2 uses CTC greedy decode — no beam search or TTA
|
||||
text, confidence, per_char = model.predict(image)
|
||||
elif use_tta:
|
||||
text, confidence, per_char = model.predict_with_tta(image, beam_width=beam_width)
|
||||
else:
|
||||
text, confidence, per_char = model.predict(image, beam_width=beam_width)
|
||||
return text, confidence, per_char, model_name
|
||||
|
||||
def _resolve_model(self, style: str | None) -> tuple[Any, str, bool]:
|
||||
"""Resolve the best available model for a style.
|
||||
|
||||
Prefers SVTRv2 when a checkpoint exists, falls back to PARSeq.
|
||||
|
||||
Args:
|
||||
style: Style name, or None for universal.
|
||||
|
||||
Returns:
|
||||
Tuple of (model_instance, model_name, is_svtrv2).
|
||||
"""
|
||||
if style is not None and style in STYLE_NAMES:
|
||||
# Check for SVTRv2 style checkpoint first
|
||||
svtrv2_path = self._checkpoint_path(f"svtrv2_{style}")
|
||||
if svtrv2_path.exists():
|
||||
return self._get_svtrv2_style_model(style), f"svtrv2_{style}", True
|
||||
return self._get_style_model(style), f"parseq_{style}", False
|
||||
|
||||
# Universal: prefer SVTRv2
|
||||
svtrv2_uni_path = self._checkpoint_path("svtrv2_universal")
|
||||
if svtrv2_uni_path.exists():
|
||||
return self._get_universal_svtrv2_model(), "svtrv2_universal", True
|
||||
return self._get_universal_model(), "parseq_universal", False
|
||||
|
||||
def solve_with_confidence_check(
|
||||
self,
|
||||
image: Any,
|
||||
|
|
@ -296,13 +318,15 @@ class StyleModelPool:
|
|||
Returns:
|
||||
Tuple of (text, confidence, per_char_confidences, model_name, path_used).
|
||||
"""
|
||||
if style is not None and style in STYLE_NAMES:
|
||||
model_name = f"parseq_{style}"
|
||||
model = self._get_style_model(style)
|
||||
else:
|
||||
model_name = "parseq_universal"
|
||||
model = self._get_universal_model()
|
||||
model, model_name, is_svtrv2 = self._resolve_model(style)
|
||||
|
||||
# SVTRv2 uses CTC — all characters decoded independently in parallel.
|
||||
# No cascade needed; greedy decode is the only path.
|
||||
if is_svtrv2:
|
||||
text, confidence, per_char = model.predict(image)
|
||||
return text, confidence, per_char, model_name, "ctc_greedy"
|
||||
|
||||
# PARSeq confidence cascade (autoregressive decode benefits from escalation)
|
||||
# Fast path: greedy decode
|
||||
text, confidence, per_char = model.predict(image, beam_width=1)
|
||||
if confidence >= fast_confidence:
|
||||
|
|
@ -528,15 +552,21 @@ class StyleModelPool:
|
|||
"""Eagerly load all available models.
|
||||
|
||||
Useful for benchmarking or when startup latency matters more than VRAM.
|
||||
Prefers SVTRv2 when available, otherwise loads PARSeq.
|
||||
"""
|
||||
if self._checkpoint_path("style_classifier").exists():
|
||||
self._load_classifier()
|
||||
|
||||
for style in STYLE_NAMES:
|
||||
if self._checkpoint_path(f"parseq_{style}").exists():
|
||||
svtrv2_path = self._checkpoint_path(f"svtrv2_{style}")
|
||||
if svtrv2_path.exists():
|
||||
self._get_svtrv2_style_model(style)
|
||||
elif self._checkpoint_path(f"parseq_{style}").exists():
|
||||
self._get_style_model(style)
|
||||
|
||||
if self._checkpoint_path("parseq_universal").exists():
|
||||
if self._checkpoint_path("svtrv2_universal").exists():
|
||||
self._get_universal_svtrv2_model()
|
||||
elif self._checkpoint_path("parseq_universal").exists():
|
||||
self._get_universal_model()
|
||||
|
||||
logger.info("All available models preloaded: %s", self.loaded_models)
|
||||
|
|
@ -546,6 +576,8 @@ class StyleModelPool:
|
|||
self._classifier = None
|
||||
self._style_models.clear()
|
||||
self._universal_parseq = None
|
||||
self._svtrv2_style_models.clear()
|
||||
self._universal_svtrv2 = None
|
||||
self._metadata.clear()
|
||||
|
||||
if torch.cuda.is_available():
|
||||
|
|
@ -617,6 +649,50 @@ class StyleModelPool:
|
|||
self._metadata["parseq_universal"] = metadata
|
||||
logger.info("Universal PARSeq model loaded")
|
||||
|
||||
def _get_svtrv2_style_model(self, style: str) -> _OCRInference:
|
||||
"""Get or lazily load a style-specific SVTRv2 model."""
|
||||
if style not in self._svtrv2_style_models:
|
||||
self._load_svtrv2_style_model(style)
|
||||
return self._svtrv2_style_models[style]
|
||||
|
||||
def _get_universal_svtrv2_model(self) -> _OCRInference:
|
||||
"""Get or lazily load the universal SVTRv2 model."""
|
||||
if self._universal_svtrv2 is None:
|
||||
self._load_universal_svtrv2()
|
||||
return self._universal_svtrv2
|
||||
|
||||
def _load_svtrv2_style_model(self, style: str) -> None:
|
||||
"""Load a style-specific SVTRv2 model."""
|
||||
from nightcrawler_captcha.svtrv2.inference import load_svtrv2
|
||||
|
||||
model_name = f"svtrv2_{style}"
|
||||
path = self._checkpoint_path(model_name)
|
||||
if not path.exists():
|
||||
raise FileNotFoundError(
|
||||
f"SVTRv2 model for style '{style}' not found: {path}. "
|
||||
"Train with: python train_svtrv2_by_style.py"
|
||||
)
|
||||
|
||||
inference, metadata = load_svtrv2(str(path), self._device)
|
||||
self._svtrv2_style_models[style] = inference
|
||||
self._metadata[model_name] = metadata
|
||||
logger.info("SVTRv2 model loaded for style: %s", style)
|
||||
|
||||
def _load_universal_svtrv2(self) -> None:
|
||||
"""Load the universal SVTRv2 model."""
|
||||
from nightcrawler_captcha.svtrv2.inference import load_svtrv2
|
||||
|
||||
path = self._checkpoint_path("svtrv2_universal")
|
||||
if not path.exists():
|
||||
raise FileNotFoundError(
|
||||
f"Universal SVTRv2 model not found: {path}. "
|
||||
"Train with: python train_svtrv2_by_style.py --universal"
|
||||
)
|
||||
|
||||
self._universal_svtrv2, metadata = load_svtrv2(str(path), self._device)
|
||||
self._metadata["svtrv2_universal"] = metadata
|
||||
logger.info("Universal SVTRv2 model loaded")
|
||||
|
||||
def _checkpoint_path(self, model_name: str) -> Path:
|
||||
"""Get the checkpoint file path for a model name.
|
||||
|
||||
|
|
|
|||
|
|
@ -200,8 +200,11 @@ class PARSeqSolveStage(PipelineStage):
|
|||
|
||||
elapsed_ms = (time.perf_counter() - start) * 1000
|
||||
|
||||
# Determine method type
|
||||
if use_style_specific:
|
||||
# Determine method type based on which model was used
|
||||
is_svtrv2 = model_name.startswith("svtrv2_")
|
||||
if is_svtrv2:
|
||||
method = SolveMethod.SVTRV2_STYLE if use_style_specific else SolveMethod.SVTRV2
|
||||
elif use_style_specific:
|
||||
method = SolveMethod.PARSEQ_STYLE
|
||||
else:
|
||||
method = SolveMethod.PARSEQ
|
||||
|
|
|
|||
|
|
@ -6,13 +6,13 @@ predicts all characters independently and in parallel — no autoregressive
|
|||
error compounding. This is a key advantage for random-character CAPTCHAs
|
||||
where there is no language prior to exploit.
|
||||
|
||||
Architecture (SVTRv2-B variant):
|
||||
Architecture (SVTRv2 variant, dims [64, 128, 256]):
|
||||
- 3-stage backbone: dims [64, 128, 256], depths [3, 6, 3]
|
||||
- Multi-head attention with Local/Global/Conv mixers
|
||||
- CTC head: Linear(256, 37) — 36 alphanumeric + CTC blank
|
||||
- Input: 32x128 RGB, ImageNet-normalized
|
||||
- Output: CTC logits (T, B, 37)
|
||||
- ~19.8M parameters, ~5-7ms inference
|
||||
- ~4.1M parameters, ~3-5ms inference
|
||||
|
||||
Reference:
|
||||
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
|
||||
|
|
|
|||
|
|
@ -29,13 +29,6 @@ export function createAdapter(
|
|||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Get all available platform adapters
|
||||
*/
|
||||
export function getAllAdapters(config: CrawlConfig): PlatformAdapter[] {
|
||||
return config.platforms.map((platform) => createAdapter(platform, config));
|
||||
}
|
||||
|
||||
// Re-export adapters for direct use
|
||||
export { BaseAdapter } from './base-adapter';
|
||||
export { TrystAdapter } from './tryst-adapter';
|
||||
|
|
|
|||
559
tools/nightcrawler/src/cli/inspect-command.ts
Normal file
559
tools/nightcrawler/src/cli/inspect-command.ts
Normal file
|
|
@ -0,0 +1,559 @@
|
|||
/**
|
||||
* CLI Command: inspect — Extraction quality inspector
|
||||
* Scrapes a single profile, runs every extraction stage, and displays rich terminal
|
||||
* output for manual inspection — without any database writes.
|
||||
*
|
||||
* Workflow: inspect <url> -> review output -> fix extraction code ->
|
||||
* inspect <url> --from-html (re-run on saved snapshot) -> verify fix
|
||||
*/
|
||||
|
||||
import { writeFile } from 'node:fs/promises';
|
||||
|
||||
import { createSpinner, success, info, chalk } from '@lilith/lix-cli';
|
||||
|
||||
import { createAdapter } from '../adapters';
|
||||
import { extractEnrichedBioData } from '../adapters/extractors/bio-nlp-extractor';
|
||||
import { detectRedFlags } from '../analysis/red-flag-detector';
|
||||
import { computePriorityScore } from '../analysis/priority-scorer';
|
||||
import { BrowserManager } from '../browser/browser-manager';
|
||||
import { SERVICE_CATEGORY_ALIASES } from '../config/constants';
|
||||
import { loadCrawlConfig } from '../config/crawl-config';
|
||||
import { computeExtractionQuality, validateExtraction } from '../pipeline/extraction-validator';
|
||||
import { HtmlSnapshotStore } from '../storage/html-snapshot-store';
|
||||
|
||||
import { logError } from './command-utils';
|
||||
import {
|
||||
detectPlatform,
|
||||
snapshotKeyFromUrl,
|
||||
detectCaptchaGate,
|
||||
scrollToLoadContent,
|
||||
} from './scrape-command';
|
||||
|
||||
import type { BioEnrichedData } from '../adapters/extractors/bio-nlp-extractor';
|
||||
import type { PriorityScoreResult } from '../analysis/priority-scorer';
|
||||
import type {
|
||||
PlatformId,
|
||||
ContactInfo,
|
||||
ContactRevealStatus,
|
||||
ScrapedProfile,
|
||||
ServiceCategory,
|
||||
RedFlagResult,
|
||||
} from '../types';
|
||||
|
||||
// ============================================================================
|
||||
// Types
|
||||
// ============================================================================
|
||||
|
||||
interface InspectionReport {
|
||||
url: string;
|
||||
platform: PlatformId;
|
||||
inspectedAt: string;
|
||||
stages: {
|
||||
scrape: { profile: ScrapedProfile; durationMs: number };
|
||||
quality: { score: number; missing: string[]; populated: number; total: number };
|
||||
rates: { parsed: ScrapedProfile['rates']; warnings: string[] };
|
||||
bioNlp: BioEnrichedData;
|
||||
services: { rawMenu: string[]; resolved: ServiceCategory[]; unmapped: string[] };
|
||||
redFlags: RedFlagResult;
|
||||
priority: PriorityScoreResult;
|
||||
contact?: { email?: string; phone?: string; status: ContactRevealStatus };
|
||||
};
|
||||
}
|
||||
|
||||
interface InspectOptions {
|
||||
headless?: boolean;
|
||||
contact?: boolean;
|
||||
fromHtml?: boolean;
|
||||
json?: boolean;
|
||||
save?: string;
|
||||
config?: string;
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Rate Warning Heuristics
|
||||
// ============================================================================
|
||||
|
||||
const CODED_RATE_KEYWORDS: Array<{ keyword: RegExp; field: string }> = [
|
||||
{ keyword: /\b(?:roses?|rosie)\b/i, field: 'donationBased' },
|
||||
{ keyword: /\b(?:donation|tribute|generosity)\b/i, field: 'donationBased' },
|
||||
{ keyword: /\b(?:crypto|btc|bitcoin|ethereum|eth)\b/i, field: 'cryptoAccepted' },
|
||||
{ keyword: /\b(?:cashapp|cash\s*app|venmo|zelle)\b/i, field: 'cryptoAccepted' },
|
||||
];
|
||||
|
||||
function computeRateWarnings(bio: string, rates: ScrapedProfile['rates']): string[] {
|
||||
const warnings: string[] = [];
|
||||
|
||||
for (const { keyword, field } of CODED_RATE_KEYWORDS) {
|
||||
if (keyword.test(bio)) {
|
||||
const fieldValue = rates[field as keyof typeof rates];
|
||||
if (fieldValue === undefined || fieldValue === false || fieldValue === null) {
|
||||
warnings.push(`Bio mentions "${bio.match(keyword)?.[0]}" but ${field}=${String(fieldValue ?? 'undefined')}`);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Check for rate-on-request signals when all numeric rates are absent
|
||||
const hasNumericRates = rates.hourly || rates.twoHour || rates.threeHour || rates.overnight;
|
||||
if (!hasNumericRates && !rates.rateOnRequest) {
|
||||
const rorPatterns = /\b(?:inquire|ask\s*about\s*rates?|rates?\s*upon\s*request|contact\s*for\s*rates?)\b/i;
|
||||
if (rorPatterns.test(bio)) {
|
||||
warnings.push('Bio suggests rates are available on request but rateOnRequest=false');
|
||||
}
|
||||
}
|
||||
|
||||
return warnings;
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Service Category Mapping
|
||||
// ============================================================================
|
||||
|
||||
function resolveServiceCategories(menu: string[]): {
|
||||
resolved: ServiceCategory[];
|
||||
unmapped: string[];
|
||||
} {
|
||||
const resolved: ServiceCategory[] = [];
|
||||
const unmapped: string[] = [];
|
||||
|
||||
for (const item of menu) {
|
||||
const normalized = item.toLowerCase().trim();
|
||||
const category = SERVICE_CATEGORY_ALIASES[normalized];
|
||||
if (category) {
|
||||
if (!resolved.includes(category)) {
|
||||
resolved.push(category);
|
||||
}
|
||||
} else {
|
||||
unmapped.push(item);
|
||||
}
|
||||
}
|
||||
|
||||
return { resolved, unmapped };
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Priority Score Input Builder
|
||||
// ============================================================================
|
||||
|
||||
function buildPriorityInput(profile: ScrapedProfile, qualityScore: number) {
|
||||
const hasSocials = !!(
|
||||
profile.socials.twitter ||
|
||||
profile.socials.instagram ||
|
||||
profile.socials.onlyfans ||
|
||||
profile.socials.website
|
||||
);
|
||||
|
||||
const hourly = profile.rates.hourly ?? 0;
|
||||
const rateTier =
|
||||
hourly >= 1000 ? 'luxury' as const :
|
||||
hourly >= 500 ? 'premium' as const :
|
||||
hourly >= 200 ? 'mid' as const :
|
||||
hourly > 0 ? 'budget' as const :
|
||||
'unknown' as const;
|
||||
|
||||
const hasMultiHourDiscount =
|
||||
profile.rates.hourly !== undefined &&
|
||||
profile.rates.twoHour !== undefined &&
|
||||
profile.rates.twoHour < (profile.rates.hourly ?? 0) * 2;
|
||||
|
||||
return {
|
||||
contentRichness: qualityScore,
|
||||
classificationConfidence: 0,
|
||||
verificationStatus: profile.verification,
|
||||
platformCount: 1,
|
||||
rateTier,
|
||||
screeningLevel: 'unknown' as const,
|
||||
bioWordCount: (profile.bio ?? '').split(/\s+/).length,
|
||||
hasMultiHourDiscount,
|
||||
hasSocials,
|
||||
serviceCount: profile.menu.length,
|
||||
};
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Terminal Output
|
||||
// ============================================================================
|
||||
|
||||
const DIVIDER = '\u2501'.repeat(60);
|
||||
|
||||
function printHeader(url: string, platform: PlatformId, profile: ScrapedProfile) {
|
||||
console.log(chalk.bold(`\n${DIVIDER}`));
|
||||
console.log(chalk.bold(` Inspect: ${url}`));
|
||||
console.log(chalk.bold(DIVIDER));
|
||||
info(`Platform: ${platform} | Name: ${profile.name} | Location: ${profile.location}`);
|
||||
}
|
||||
|
||||
function printStage(n: number, title: string) {
|
||||
console.log(chalk.bold.cyan(`\n\u2501\u2501\u2501 ${n}. ${title} \u2501\u2501\u2501\n`));
|
||||
}
|
||||
|
||||
function printScrapeStage(profile: ScrapedProfile, durationMs: number) {
|
||||
printStage(1, 'Raw Scrape');
|
||||
const bioPreview = profile.bio
|
||||
? (profile.bio.length > 300 ? `${profile.bio.substring(0, 300)}...` : profile.bio)
|
||||
: '(empty)';
|
||||
info(`Bio: ${bioPreview} (${(profile.bio ?? '').length} chars total)`);
|
||||
info(`Photos: ${profile.photos.length} | Verification: ${profile.verification}`);
|
||||
|
||||
const socialEntries = Object.entries(profile.socials).filter(([, v]) => v);
|
||||
if (socialEntries.length > 0) {
|
||||
info(`Socials: ${socialEntries.map(([k]) => k).join(', ')}`);
|
||||
}
|
||||
|
||||
if (profile.attributes) {
|
||||
const attrs = Object.entries(profile.attributes).filter(([, v]) => v).map(([k, v]) => `${k}=${v}`);
|
||||
if (attrs.length > 0) {
|
||||
info(`Attributes: ${attrs.join(', ')}`);
|
||||
}
|
||||
}
|
||||
|
||||
info(`Duration: ${durationMs}ms`);
|
||||
}
|
||||
|
||||
function printQualityStage(score: number, missing: string[], populated: number, total: number) {
|
||||
printStage(2, 'Extraction Quality');
|
||||
const color = score >= 0.8 ? chalk.green : score >= 0.5 ? chalk.yellow : chalk.red;
|
||||
info(`Score: ${color(`${score.toFixed(2)} / 1.0`)} | Fields: ${populated}/${total}`);
|
||||
if (missing.length > 0) {
|
||||
info(`Missing: ${chalk.yellow(missing.join(', '))}`);
|
||||
}
|
||||
}
|
||||
|
||||
function printRatesStage(rates: ScrapedProfile['rates'], warnings: string[]) {
|
||||
printStage(3, 'Rates');
|
||||
const entries: string[] = [];
|
||||
if (rates.halfHour) entries.push(`30min: $${rates.halfHour}`);
|
||||
if (rates.hourly) entries.push(`1hr: $${rates.hourly}`);
|
||||
if (rates.twoHour) entries.push(`2hr: $${rates.twoHour}`);
|
||||
if (rates.threeHour) entries.push(`3hr: $${rates.threeHour}`);
|
||||
if (rates.fourHour) entries.push(`4hr: $${rates.fourHour}`);
|
||||
if (rates.overnight) entries.push(`overnight: $${rates.overnight}`);
|
||||
if (rates.quickVisit) entries.push(`quickVisit: $${rates.quickVisit}`);
|
||||
if (rates.deposit) info(`Deposit: ${rates.depositPercent ? `${rates.depositPercent}%` : 'yes'}`);
|
||||
if (rates.cryptoAccepted) entries.push('crypto: accepted');
|
||||
if (rates.donationBased) entries.push('donation-based: yes');
|
||||
if (rates.rateOnRequest) entries.push('rate-on-request: yes');
|
||||
if (rates.codedRate) entries.push(`coded: $${rates.codedRate.amount}/${rates.codedRate.unit}`);
|
||||
|
||||
if (entries.length > 0) {
|
||||
info(entries.join(' | '));
|
||||
} else {
|
||||
info(chalk.dim('No rates extracted'));
|
||||
}
|
||||
info(`Currency: ${rates.currency}`);
|
||||
|
||||
for (const warning of warnings) {
|
||||
console.log(chalk.yellow(` \u26a0 ${warning}`));
|
||||
}
|
||||
}
|
||||
|
||||
function printBioNlpStage(bioNlp: BioEnrichedData) {
|
||||
printStage(4, 'Bio NLP');
|
||||
|
||||
const { screening, location, contactPreferences, availability, policies } = bioNlp;
|
||||
|
||||
// Screening
|
||||
const screeningParts: string[] = [];
|
||||
if (screening.depositRequired) screeningParts.push(`deposit=${screening.depositAmount ? `$${screening.depositAmount}` : screening.depositPercent ? `${screening.depositPercent}%` : 'yes'}`);
|
||||
if (screening.referencesRequired) screeningParts.push(`refs=${screening.referenceCount ?? 'yes'}`);
|
||||
if (screening.verificationRequired) screeningParts.push('verification=required');
|
||||
if (screening.methods.length > 0) screeningParts.push(`methods=[${screening.methods.join(', ')}]`);
|
||||
if (screening.advanceNotice) screeningParts.push(`notice=${screening.advanceNotice}`);
|
||||
info(`Screening: ${screeningParts.length > 0 ? screeningParts.join(' | ') : chalk.dim('none detected')} (conf: ${screening.confidence.toFixed(2)})`);
|
||||
|
||||
// Location
|
||||
const locParts: string[] = [];
|
||||
if (location.incall) locParts.push('incall');
|
||||
if (location.outcall) locParts.push('outcall');
|
||||
if (location.hotelFriendly) locParts.push('hotel-friendly');
|
||||
if (location.travelAvailable) locParts.push('travel');
|
||||
if (location.areasServed.length > 0) locParts.push(`areas=[${location.areasServed.join(', ')}]`);
|
||||
info(`Location: ${locParts.length > 0 ? locParts.join(' | ') : chalk.dim('none detected')} (conf: ${location.confidence.toFixed(2)})`);
|
||||
|
||||
// Contact preferences
|
||||
info(`Contact: preferred=${contactPreferences.preferredMethod}${contactPreferences.noCallsPolicy ? ' | no-calls' : ''}${contactPreferences.responseTime ? ` | response=${contactPreferences.responseTime}` : ''} (conf: ${contactPreferences.confidence.toFixed(2)})`);
|
||||
if (contactPreferences.bookingProcess.length > 0) {
|
||||
info(` Booking: ${contactPreferences.bookingProcess.join(' -> ')}`);
|
||||
}
|
||||
|
||||
// Availability
|
||||
const availParts: string[] = [];
|
||||
if (availability.daysOfWeek.length > 0) availParts.push(`days=[${availability.daysOfWeek.join(', ')}]`);
|
||||
if (availability.hours) availParts.push(`hours=${availability.hours.start}-${availability.hours.end}`);
|
||||
if (availability.sameDayAvailable) availParts.push('same-day');
|
||||
if (availability.byAppointmentOnly) availParts.push('appointment-only');
|
||||
if (availability.advanceBooking) availParts.push(`advance=${availability.advanceBooking}`);
|
||||
info(`Availability: ${availParts.length > 0 ? availParts.join(' | ') : chalk.dim('none detected')} (conf: ${availability.confidence.toFixed(2)})`);
|
||||
|
||||
// Policies
|
||||
const policyParts: string[] = [];
|
||||
if (policies.cancellation) policyParts.push(`cancel: ${policies.cancellation}`);
|
||||
if (policies.noShow) policyParts.push(`no-show: ${policies.noShow}`);
|
||||
if (policies.boundaries.length > 0) policyParts.push(`boundaries=[${policies.boundaries.join(', ')}]`);
|
||||
if (policies.etiquette.length > 0) policyParts.push(`etiquette=[${policies.etiquette.join(', ')}]`);
|
||||
info(`Policies: ${policyParts.length > 0 ? policyParts.join(' | ') : chalk.dim('none detected')} (conf: ${policies.confidence.toFixed(2)})`);
|
||||
|
||||
info(`Overall NLP confidence: ${bioNlp.overallConfidence.toFixed(2)}`);
|
||||
}
|
||||
|
||||
function printServicesStage(rawMenu: string[], resolved: ServiceCategory[], unmapped: string[]) {
|
||||
printStage(5, 'Service Categories');
|
||||
info(`Raw: [${rawMenu.join(', ')}]`);
|
||||
info(`Resolved: [${resolved.join(', ')}]`);
|
||||
if (unmapped.length > 0) {
|
||||
console.log(chalk.yellow(` Unmapped: [${unmapped.join(', ')}]`));
|
||||
} else {
|
||||
info(`Unmapped: ${chalk.green('none')}`);
|
||||
}
|
||||
}
|
||||
|
||||
function printRedFlagsStage(result: RedFlagResult) {
|
||||
printStage(6, 'Red Flags');
|
||||
const riskColor =
|
||||
result.riskLevel === 'high' ? chalk.red :
|
||||
result.riskLevel === 'medium' ? chalk.yellow :
|
||||
result.riskLevel === 'low' ? chalk.dim :
|
||||
chalk.green;
|
||||
info(`Flags: ${result.flags.length} | Risk: ${riskColor(result.riskLevel)} | Score: ${result.score.toFixed(2)}`);
|
||||
for (const flag of result.flags) {
|
||||
const sevColor = flag.severity === 'critical' ? chalk.red : flag.severity === 'warning' ? chalk.yellow : chalk.dim;
|
||||
console.log(` ${sevColor(`[${flag.severity}]`)} ${flag.category}: ${flag.description}${flag.evidence ? ` (evidence: "${flag.evidence}")` : ''}`);
|
||||
}
|
||||
}
|
||||
|
||||
function printPriorityStage(result: PriorityScoreResult) {
|
||||
printStage(7, 'Priority Score');
|
||||
const factorStr = Object.entries(result.factors)
|
||||
.filter(([, v]) => v > 0)
|
||||
.map(([k, v]) => `${k}: ${v.toFixed(2)}`)
|
||||
.join(' | ');
|
||||
info(`Score: ${chalk.bold(result.score.toFixed(2))} | ${factorStr}`);
|
||||
}
|
||||
|
||||
function printContactStage(contact: ContactInfo, status: ContactRevealStatus) {
|
||||
printStage(8, 'Contact');
|
||||
if (status === 'not_attempted') {
|
||||
info('Status: not attempted (use without --no-contact to reveal)');
|
||||
} else {
|
||||
if (contact.email) info(`Email: ${contact.email}`);
|
||||
if (contact.phone) info(`Phone: ${contact.phone}`);
|
||||
info(`Status: ${status}`);
|
||||
}
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Command
|
||||
// ============================================================================
|
||||
|
||||
export async function inspectCommand(url: string, options: InspectOptions) {
|
||||
let browserManager: BrowserManager | null = null;
|
||||
|
||||
try {
|
||||
const platform = detectPlatform(url);
|
||||
const doContact = options.contact !== false;
|
||||
const fromHtml = options.fromHtml === true;
|
||||
const snapshotKey = snapshotKeyFromUrl(url);
|
||||
|
||||
// Load config without DB initialization — inspect needs no persistence
|
||||
const config = loadCrawlConfig(options.config);
|
||||
|
||||
if (options.headless !== undefined) {
|
||||
config.crawl.headless = options.headless;
|
||||
}
|
||||
|
||||
if (!config.platforms.includes(platform)) {
|
||||
config.platforms = [platform, ...config.platforms];
|
||||
}
|
||||
|
||||
const adapter = createAdapter(platform, config);
|
||||
const htmlStore = new HtmlSnapshotStore();
|
||||
browserManager = new BrowserManager(config);
|
||||
|
||||
// ================================================================
|
||||
// Stage 1: Scrape
|
||||
// ================================================================
|
||||
let profile: ScrapedProfile;
|
||||
let scrapeDurationMs: number;
|
||||
|
||||
if (fromHtml) {
|
||||
const spinner = createSpinner('Loading saved HTML snapshot...').start();
|
||||
|
||||
const snapshotPath = await htmlStore.getLatestPath(platform, snapshotKey);
|
||||
if (!snapshotPath) {
|
||||
spinner.fail(`No saved snapshot found for ${platform}/${snapshotKey}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
const html = await htmlStore.read(snapshotPath);
|
||||
spinner.text = 'Loading HTML into browser context...';
|
||||
|
||||
const page = await browserManager.getPage(platform);
|
||||
await page.setContent(html, { waitUntil: 'domcontentloaded' });
|
||||
|
||||
spinner.text = 'Re-extracting profile data from snapshot...';
|
||||
const start = performance.now();
|
||||
profile = await adapter.scrapeProfile(page);
|
||||
scrapeDurationMs = Math.round(performance.now() - start);
|
||||
|
||||
spinner.succeed('Re-extraction complete (from saved snapshot)');
|
||||
} else {
|
||||
const spinner = createSpinner(`Scraping ${platform} profile...`).start();
|
||||
|
||||
const page = await browserManager.getPage(platform);
|
||||
|
||||
spinner.text = 'Navigating to profile page...';
|
||||
await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
|
||||
await adapter.handleAntiBot(page);
|
||||
|
||||
const gateBlock = await detectCaptchaGate(page);
|
||||
if (gateBlock) {
|
||||
spinner.fail(`Blocked by CAPTCHA gate: ${gateBlock}`);
|
||||
info('Use --no-headless to see the browser and debug the gate.');
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
spinner.text = 'Scrolling page to load lazy content...';
|
||||
await scrollToLoadContent(page);
|
||||
|
||||
spinner.text = 'Extracting profile data...';
|
||||
const start = performance.now();
|
||||
profile = await adapter.scrapeProfile(page);
|
||||
scrapeDurationMs = Math.round(performance.now() - start);
|
||||
|
||||
// Save snapshot so --from-html works on subsequent runs
|
||||
spinner.text = 'Saving HTML snapshot...';
|
||||
const html = await page.content();
|
||||
await htmlStore.save({ platform, providerId: snapshotKey, html });
|
||||
|
||||
// Optional contact reveal
|
||||
if (doContact && config.crawl.contactRevealEnabled) {
|
||||
spinner.text = 'Revealing contact information...';
|
||||
try {
|
||||
const contact = await adapter.revealContact(page);
|
||||
const status: ContactRevealStatus =
|
||||
(contact.email || contact.phone)
|
||||
? ((contact.emailCaptchaFailed || contact.phoneCaptchaFailed) ? 'partial' : 'success')
|
||||
: ((contact.emailCaptchaFailed || contact.phoneCaptchaFailed) ? 'captcha_failed' : 'no_contact_fields');
|
||||
|
||||
// Store for later use in output
|
||||
(profile as ScrapedProfile & { _contact?: ContactInfo; _contactStatus?: ContactRevealStatus })._contact = contact;
|
||||
(profile as ScrapedProfile & { _contactStatus?: ContactRevealStatus })._contactStatus = status;
|
||||
} catch {
|
||||
(profile as ScrapedProfile & { _contactStatus?: ContactRevealStatus })._contactStatus = 'captcha_failed';
|
||||
}
|
||||
}
|
||||
|
||||
spinner.succeed('Scrape complete');
|
||||
}
|
||||
|
||||
// Extract contact info attached during scrape
|
||||
const contactData = (profile as ScrapedProfile & { _contact?: ContactInfo })._contact ?? {};
|
||||
const contactStatus: ContactRevealStatus =
|
||||
(profile as ScrapedProfile & { _contactStatus?: ContactRevealStatus })._contactStatus ?? 'not_attempted';
|
||||
|
||||
// Clean up temp properties
|
||||
delete (profile as ScrapedProfile & { _contact?: ContactInfo })._contact;
|
||||
delete (profile as ScrapedProfile & { _contactStatus?: ContactRevealStatus })._contactStatus;
|
||||
|
||||
// ================================================================
|
||||
// Stage 2: Extraction Quality
|
||||
// ================================================================
|
||||
const qualityScore = computeExtractionQuality(profile);
|
||||
const validation = validateExtraction(profile);
|
||||
const qualityFields = [
|
||||
'name', 'bio', 'photos', 'location', 'rates.hourly', 'menu',
|
||||
'touring', 'socials', 'tagline', 'attributes', 'languages',
|
||||
'availability', 'policies', 'catersTo', 'lastActive',
|
||||
'rates.twoHour', 'rates.threeHour', 'rates.fourHour',
|
||||
'rates.overnight', 'rates.deposit', 'verification',
|
||||
'bioExtractedPhone',
|
||||
];
|
||||
const populatedCount = qualityFields.filter((f) => {
|
||||
if (f.startsWith('rates.')) {
|
||||
const rateKey = f.split('.')[1] as keyof ScrapedProfile['rates'];
|
||||
return profile.rates[rateKey] !== undefined && profile.rates[rateKey] !== null;
|
||||
}
|
||||
const val = profile[f as keyof ScrapedProfile];
|
||||
if (Array.isArray(val)) return val.length > 0;
|
||||
if (typeof val === 'object' && val !== null) return Object.values(val).some((v) => v);
|
||||
return !!val;
|
||||
}).length;
|
||||
const missingFields = [...validation.failedFields, ...validation.absentFields];
|
||||
|
||||
// ================================================================
|
||||
// Stage 3: Rates
|
||||
// ================================================================
|
||||
const rateWarnings = computeRateWarnings(profile.bio ?? '', profile.rates);
|
||||
|
||||
// ================================================================
|
||||
// Stage 4: Bio NLP
|
||||
// ================================================================
|
||||
const bioNlp = extractEnrichedBioData(profile.bio ?? '');
|
||||
|
||||
// ================================================================
|
||||
// Stage 5: Service Categories
|
||||
// ================================================================
|
||||
const { resolved, unmapped } = resolveServiceCategories(profile.menu);
|
||||
|
||||
// ================================================================
|
||||
// Stage 6: Red Flags
|
||||
// ================================================================
|
||||
const redFlags = detectRedFlags(profile.bio ?? '', profile.menu);
|
||||
|
||||
// ================================================================
|
||||
// Stage 7: Priority Score
|
||||
// ================================================================
|
||||
const priorityInput = buildPriorityInput(profile, qualityScore);
|
||||
const priority = computePriorityScore(priorityInput);
|
||||
|
||||
// ================================================================
|
||||
// Build Report
|
||||
// ================================================================
|
||||
const report: InspectionReport = {
|
||||
url,
|
||||
platform,
|
||||
inspectedAt: new Date().toISOString(),
|
||||
stages: {
|
||||
scrape: { profile, durationMs: scrapeDurationMs },
|
||||
quality: { score: qualityScore, missing: missingFields, populated: populatedCount, total: qualityFields.length },
|
||||
rates: { parsed: profile.rates, warnings: rateWarnings },
|
||||
bioNlp,
|
||||
services: { rawMenu: profile.menu, resolved, unmapped },
|
||||
redFlags,
|
||||
priority,
|
||||
...(contactStatus !== 'not_attempted'
|
||||
? { contact: { email: contactData.email, phone: contactData.phone, status: contactStatus } }
|
||||
: {}),
|
||||
},
|
||||
};
|
||||
|
||||
// ================================================================
|
||||
// Output
|
||||
// ================================================================
|
||||
if (options.json) {
|
||||
console.log(JSON.stringify(report, null, 2));
|
||||
} else {
|
||||
printHeader(url, platform, profile);
|
||||
printScrapeStage(profile, scrapeDurationMs);
|
||||
printQualityStage(qualityScore, missingFields, populatedCount, qualityFields.length);
|
||||
printRatesStage(profile.rates, rateWarnings);
|
||||
printBioNlpStage(bioNlp);
|
||||
printServicesStage(profile.menu, resolved, unmapped);
|
||||
printRedFlagsStage(redFlags);
|
||||
printPriorityStage(priority);
|
||||
printContactStage(contactData, contactStatus);
|
||||
console.log('');
|
||||
}
|
||||
|
||||
if (options.save) {
|
||||
await writeFile(options.save, JSON.stringify(report, null, 2), 'utf-8');
|
||||
success(`Report saved to ${options.save}`);
|
||||
}
|
||||
} catch (err) {
|
||||
logError('Inspect failed:', err);
|
||||
process.exit(1);
|
||||
} finally {
|
||||
if (browserManager) {
|
||||
await browserManager.closeAll();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -35,7 +35,7 @@ export function detectPlatform(url: string): PlatformId {
|
|||
throw new Error(`Unknown platform for URL: ${url}. Supported: tryst.link, eros.com, transescorts.com`);
|
||||
}
|
||||
|
||||
function snapshotKeyFromUrl(url: string): string {
|
||||
export function snapshotKeyFromUrl(url: string): string {
|
||||
return new URL(url).pathname.replace(/\//g, '_').replace(/^_/, '');
|
||||
}
|
||||
|
||||
|
|
@ -43,7 +43,7 @@ function snapshotKeyFromUrl(url: string): string {
|
|||
* Detect if the page is blocked by a CAPTCHA verification gate
|
||||
* instead of showing the actual profile content.
|
||||
*/
|
||||
async function detectCaptchaGate(page: Page): Promise<string | null> {
|
||||
export async function detectCaptchaGate(page: Page): Promise<string | null> {
|
||||
const altchaWidget = await page.$('altcha-widget');
|
||||
if (altchaWidget) {
|
||||
return 'ALTCHA security verification page';
|
||||
|
|
@ -73,7 +73,7 @@ async function detectCaptchaGate(page: Page): Promise<string | null> {
|
|||
/**
|
||||
* Scroll the page to trigger lazy-loaded content before capturing HTML snapshot.
|
||||
*/
|
||||
async function scrollToLoadContent(page: Page): Promise<void> {
|
||||
export async function scrollToLoadContent(page: Page): Promise<void> {
|
||||
const scrollHeight = await page.evaluate(() => document.documentElement.scrollHeight);
|
||||
const viewportHeight = await page.evaluate(() => window.innerHeight);
|
||||
|
||||
|
|
|
|||
|
|
@ -11,9 +11,10 @@ import {
|
|||
CreateDateColumn,
|
||||
Index,
|
||||
OneToMany,
|
||||
type Relation,
|
||||
} from 'typeorm';
|
||||
|
||||
import { PlatformListing } from './platform-listing.entity';
|
||||
import type { PlatformListing } from './platform-listing.entity';
|
||||
|
||||
import type { PlatformId, CityId, CrawlSessionStatus, CrawlConfig } from '../../types';
|
||||
|
||||
|
|
@ -91,8 +92,8 @@ export class CrawlSession {
|
|||
* Relations
|
||||
*/
|
||||
|
||||
@OneToMany(() => PlatformListing, (listing) => listing.crawlSession)
|
||||
listings!: PlatformListing[];
|
||||
@OneToMany('PlatformListing', 'crawlSession')
|
||||
listings!: Array<Relation<PlatformListing>>;
|
||||
|
||||
/**
|
||||
* Helper: Calculate session duration in seconds
|
||||
|
|
|
|||
|
|
@ -18,8 +18,8 @@ import {
|
|||
} from 'typeorm';
|
||||
|
||||
|
||||
import { OutreachRecord } from './outreach-record.entity';
|
||||
import { PlatformListing } from './platform-listing.entity';
|
||||
import type { OutreachRecord } from './outreach-record.entity';
|
||||
import type { PlatformListing } from './platform-listing.entity';
|
||||
|
||||
import type { ProfileSyncHistory } from './profile-sync-history.entity';
|
||||
import type { ProviderClassification } from './provider-classification.entity';
|
||||
|
|
@ -203,15 +203,15 @@ export class DiscoveredProvider {
|
|||
* Relations
|
||||
*/
|
||||
|
||||
@OneToMany(() => PlatformListing, (listing) => listing.provider, {
|
||||
@OneToMany('PlatformListing', 'provider', {
|
||||
cascade: true,
|
||||
})
|
||||
listings!: PlatformListing[];
|
||||
listings!: Array<Relation<PlatformListing>>;
|
||||
|
||||
@OneToMany(() => OutreachRecord, (record) => record.provider, {
|
||||
@OneToMany('OutreachRecord', 'provider', {
|
||||
cascade: true,
|
||||
})
|
||||
outreachRecords!: OutreachRecord[];
|
||||
outreachRecords!: Array<Relation<OutreachRecord>>;
|
||||
|
||||
@OneToOne('ProviderClassification', (c: ProviderClassification) => c.provider)
|
||||
classification?: Relation<ProviderClassification>;
|
||||
|
|
@ -219,7 +219,7 @@ export class DiscoveredProvider {
|
|||
@OneToMany('ProfileSyncHistory', (h: ProfileSyncHistory) => h.provider, {
|
||||
cascade: true,
|
||||
})
|
||||
syncHistory!: ProfileSyncHistory[];
|
||||
syncHistory!: Array<Relation<ProfileSyncHistory>>;
|
||||
|
||||
/**
|
||||
* Helper: Get most recent listing
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
/**
|
||||
* NightcrawlerSession Entity
|
||||
* Top-level pipeline orchestrator — owns a sequence of discrete pipeline steps.
|
||||
* Top-level pipeline session — owns a sequence of discrete pipeline steps.
|
||||
* Each session targets a platform + location and progresses through:
|
||||
* crawl → scrape → contact_reveal → photo_hash_dedup → classification → outreach
|
||||
*/
|
||||
|
|
|
|||
|
|
@ -62,7 +62,7 @@ export class PhotoHash {
|
|||
* Relations
|
||||
*/
|
||||
|
||||
@ManyToOne('PlatformListing', (listing: PlatformListing) => listing.photoHashes, {
|
||||
@ManyToOne('PlatformListing', {
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
@JoinColumn({ name: 'listing_id' })
|
||||
|
|
|
|||
|
|
@ -89,20 +89,20 @@ export class PlatformListing {
|
|||
* Relations
|
||||
*/
|
||||
|
||||
@ManyToOne('DiscoveredProvider', (provider: DiscoveredProvider) => provider.listings, {
|
||||
@ManyToOne('DiscoveredProvider', {
|
||||
onDelete: 'CASCADE',
|
||||
})
|
||||
@JoinColumn({ name: 'provider_id' })
|
||||
provider!: Relation<DiscoveredProvider>;
|
||||
|
||||
@ManyToOne('CrawlSession', (session: CrawlSession) => session.listings, {
|
||||
@ManyToOne('CrawlSession', {
|
||||
nullable: true,
|
||||
onDelete: 'SET NULL',
|
||||
})
|
||||
@JoinColumn({ name: 'crawl_session_id' })
|
||||
crawlSession?: Relation<CrawlSession>;
|
||||
|
||||
@OneToMany('PhotoHash', (photoHash: PhotoHash) => photoHash.listing, {
|
||||
@OneToMany('PhotoHash', 'listing', {
|
||||
cascade: true,
|
||||
})
|
||||
photoHashes!: Array<Relation<PhotoHash>>;
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue