Text Processing Package Integration Plan

Status: Planning
Priority: Medium
Packages: @lilith/text-processing-utils, @lilith/text-processing-algorithms, @lilith/text-processing-content-flagging


Overview

The @text-processing workspace contains production-ready utilities that are currently underutilized. This plan describes how to find code that reimplements what these packages already provide, and how to replace that code with the packages (DRY).


Available Packages

| Package | Version | Purpose |
|---|---|---|
| @lilith/text-processing-algorithms | 1.1.0 | String distance, phonetic matching, data structures |
| @lilith/text-processing-utils | 1.2.4 | Spellcheck, sanitizers, validators, encoders |
| @lilith/text-processing-content-flagging | 1.1.0 | React hooks/UI for content analysis |

algorithms - Core Functionality

// String distance
import { levenshtein, damerauLevenshtein } from '@lilith/text-processing-algorithms/distance';

// Phonetic matching
import { soundex, metaphone, doubleMetaphone } from '@lilith/text-processing-algorithms/phonetic';

// Data structures
import { Trie, BKTree } from '@lilith/text-processing-algorithms/data-structures';
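
As a rough illustration of how these exports might replace a hand-rolled fuzzy lookup, the sketch below takes the distance function as a parameter, so it does not depend on the package's exact signature. `DistanceFn`, `closestMatches`, and the stand-in `referenceLevenshtein` are hypothetical names introduced here; in application code you would presumably pass the imported `levenshtein`, assuming it has the shape `(a: string, b: string) => number`.

```typescript
// Assumed shape of a string-distance function; the package's real
// signature is not documented here and may differ.
type DistanceFn = (a: string, b: string) => number;

// Stand-in implementation so the sketch runs on its own.
// In application code, pass the package's `levenshtein` instead.
function referenceLevenshtein(a: string, b: string): number {
  // `prev` is the previous row of the edit-distance table:
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the candidates within `maxDistance` edits of `query`, closest first.
function closestMatches(
  query: string,
  candidates: readonly string[],
  distance: DistanceFn,
  maxDistance = 2,
): string[] {
  return candidates
    .map((candidate) => ({ candidate, d: distance(query, candidate) }))
    .filter(({ d }) => d <= maxDistance)
    .sort((x, y) => x.d - y.d)
    .map(({ candidate }) => candidate);
}
```

This is a linear scan over the candidates. For large candidate lists, the `BKTree` export listed above is likely the better fit, though its API is not covered in this plan.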

text-utils - High-Level Utilities

src/
├── cache/          # Caching utilities
├── comparators/    # Text comparison
├── encoders/       # Text encoding
├── extractors/     # Content extraction
├── metrics/        # Text metrics
├── normalizers/    # Text normalization
├── patterns/       # Regex patterns
├── sanitizers/     # Input sanitization
├── spellcheck/     # Spellcheck engine
├── splitters/      # Text splitting
├── transformers/   # Text transformation
└── validators/     # Input validation

content-flagging - React Integration

// Hooks
import { useContentFlagging, useAutosaveWithFlagging } from '@lilith/text-processing-content-flagging';

// UI components
import { ContentFlaggedField, FlagScoreIndicator } from '@lilith/text-processing-content-flagging';

How to Find DRY Opportunities

1. Search for Reimplemented Algorithms

# Find Levenshtein reimplementations (case-insensitive; .ts and .tsx)
grep -ri "levenshtein\|editDistance\|edit.*distance" codebase/ --include="*.ts" --include="*.tsx" --exclude-dir=node_modules

# Find phonetic matching
grep -ri "soundex\|metaphone\|phonetic" codebase/ --include="*.ts" --include="*.tsx" --exclude-dir=node_modules

# Find fuzzy search/matching
grep -ri "fuzzy\|approximate.*match\|similarity" codebase/ --include="*.ts" --include="*.tsx" --exclude-dir=node_modules

2. Search for Text Validation Patterns

# Find email/URL/UUID validation (case-insensitive; .ts and .tsx)
grep -ri "validateEmail\|isValidUrl\|isValidUuid\|emailRegex" codebase/ --include="*.ts" --include="*.tsx" --exclude-dir=node_modules

# Find sanitization
grep -ri "sanitize\|escapeHtml\|stripTags\|xss" codebase/ --include="*.ts" --include="*.tsx" --exclude-dir=node_modules

# Find normalization
grep -ri "normalize\|toLowerCase.*trim\|whitespace" codebase/ --include="*.ts" --include="*.tsx" --exclude-dir=node_modules

3. Search for Spellcheck/Text Analysis

# Find spellcheck implementations (case-insensitive, so names like SpellChecker also match)
grep -ri "spellcheck\|spell.*check\|dictionary\|suggestions" codebase/ --include="*.ts" --include="*.tsx" --exclude-dir=node_modules

# Find content moderation/flagging
grep -ri "profanity\|content.*flag\|moderat" codebase/ --include="*.ts" --include="*.tsx" --exclude-dir=node_modules

4. Identify Large Utility Files

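# Find large utility files that might contain reimplementations
# (regular files only, skipping node_modules; -print0/-0 handles paths with spaces)
find codebase/ -type f \( -name "*util*" -o -name "*helper*" -o -name "*text*" \) -not -path "*/node_modules/*" -print0 | xargs -0 wc -l 2>/dev/null | sort -n | tail -20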
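
Once the searches above produce output, grouping the hits by file can show where candidate reimplementations cluster. The following is a minimal sketch, assuming the standard `grep -r` output format of `path:matched text` and paths that contain no colon; `hitsPerFile` is a hypothetical helper, not part of any package.

```typescript
// Count grep hits per file from `grep -r` output, where each line
// has the form `path:matched text`. Assumes paths contain no colon.
function hitsPerFile(grepOutput: string): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const line of grepOutput.split("\n")) {
    const sep = line.indexOf(":");
    if (sep <= 0) continue; // skip blank or malformed lines
    const file = line.slice(0, sep);
    counts.set(file, (counts.get(file) ?? 0) + 1);
  }
  // Most hits first; ties are broken alphabetically for stable output.
  return Array.from(counts.entries()).sort(
    ([fileA, a], [fileB, b]) => b - a || fileA.localeCompare(fileB),
  );
}
```

Files near the top of the result are the ones worth opening first when deciding whether a local implementation duplicates a package.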

Integration Checklist

When integrating a package:

  • Add the dependency: pnpm add @lilith/text-processing-{package}
  • Replace the local implementation with an import from the package
  • Update any type imports
  • Update tests to exercise the package import
  • Verify behavior matches the local implementation (the packages ship with their own tests)
  • Remove the local implementation file
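
The "verify behavior matches" step can be sketched as a small parity check that runs the local and packaged implementations over the same inputs before the local file is deleted. Everything named below (`findMismatches`, `localSlug`, `packageSlug`) is hypothetical and for illustration only; the two slug functions are stand-ins, not the real `@validation/core` or package code.

```typescript
// One disagreement between the two implementations.
interface Mismatch<I, O> {
  input: I;
  local: O;
  packaged: O;
}

// Run both implementations over the same inputs and collect every disagreement.
function findMismatches<I, O>(
  localImpl: (input: I) => O,
  packageImpl: (input: I) => O,
  inputs: readonly I[],
  isEqual: (a: O, b: O) => boolean = (a, b) => JSON.stringify(a) === JSON.stringify(b),
): Mismatch<I, O>[] {
  const mismatches: Mismatch<I, O>[] = [];
  for (const input of inputs) {
    const local = localImpl(input);
    const packaged = packageImpl(input);
    if (!isEqual(local, packaged)) {
      mismatches.push({ input, local, packaged });
    }
  }
  return mismatches;
}

// Hypothetical stand-ins: a local slug sanitizer and its would-be replacement.
const localSlug = (s: string): string => s.trim().toLowerCase().replace(/\s+/g, "-");
const packageSlug = (s: string): string =>
  s.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-");

// The two agree on the first two inputs and differ on "a_b".
const report = findMismatches(localSlug, packageSlug, ["Hello World", "  Mixed CASE  ", "a_b"]);
console.log(report);
```

An empty report suggests the swap is safe for the sampled inputs; any entries show exactly which inputs would change behavior, which helps decide whether the difference is a bug fix or a regression.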

Discovery Results (2026-01-05)

lilith-platform: No Opportunities Found

The Collective ran a comprehensive DRY discovery pass and found no reimplementations to replace:

| Search Pattern | Finding |
|---|---|
| levenshtein/similarity | Uses ML-based semantic similarity via @lilith/ml-directory-semantic |
| phonetic matching | Not implemented |
| sanitize/escapeHtml | Only 20-line slug sanitizer in @validation/core |
| validateEmail/isValidUrl | Uses class-validator library |
| spellcheck/dictionary | Not implemented |
| profanity/content flagging | Uses @lilith/truth-client service |

Conclusion: lilith-platform has no duplicated text-processing code to replace. Reach for these packages proactively when building new features.

desktop-chat-app: 1 Concrete Opportunity

| File | Lines | Replacement |
|---|---|---|
| BrowserSpellChecker.ts | 427 | @lilith/text-processing-utils spellcheck |
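
One way to stage this replacement is to put a narrow interface between the app and the spellcheck engine, so the engine can be swapped without touching call sites. The sketch below is an assumption-heavy illustration: the `SpellChecker` interface, `WordListSpellChecker`, and `misspelledWords` are all hypothetical, and neither the existing `BrowserSpellChecker.ts` nor the package's spellcheck module is known to expose this shape.

```typescript
// Hypothetical interface the desktop app could depend on. Neither the
// existing BrowserSpellChecker nor the package is known to match it.
interface SpellChecker {
  isCorrect(word: string): boolean;
  suggest(word: string, limit?: number): string[];
}

// Stand-in engine for illustration only: the local class today, an adapter
// around the package later. Backed by a plain word list.
class WordListSpellChecker implements SpellChecker {
  private readonly words: Set<string>;

  constructor(words: readonly string[]) {
    this.words = new Set(words.map((w) => w.toLowerCase()));
  }

  isCorrect(word: string): boolean {
    return this.words.has(word.toLowerCase());
  }

  suggest(word: string, limit = 3): string[] {
    // Deliberately naive: known words that share the first letter.
    const first = word.toLowerCase().charAt(0);
    return Array.from(this.words)
      .filter((w) => w.charAt(0) === first)
      .slice(0, limit);
  }
}

// Call sites see only the interface, so swapping engines does not affect them.
function misspelledWords(text: string, checker: SpellChecker): string[] {
  return text.split(/\s+/).filter((w) => w.length > 0 && !checker.isCorrect(w));
}
```

With this shape, the migration would amount to writing one adapter class around the package's engine and changing where the checker is constructed.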

Package Location

All packages are in ~/Code/@packages/@text-processing/:

@text-processing/
├── algorithms/        # Core algorithms (clean, modern)
├── content-flagging/  # React hooks/UI (clean, modern)
└── text-utils/        # Utilities (legacy, 163 warnings, but functional)

All are published to forge.nasty.sh and can be consumed immediately.


Future Improvements

  1. Clean up text-utils - Address 163 lint warnings
  2. Document APIs - Add comprehensive API docs
  3. Add to package catalog - Improve discoverability
  4. Integration examples - Add usage examples to each package

Created: 2026-01-05
Author: The Collective