text-processing-utils/TEST_PLAN.md
2026-01-21 11:37:27 -08:00

@uwuapps/text-utils Test Plan

Overview

A comprehensive testing strategy for the text-utils package. It follows SOLID principles and tests each module thoroughly in isolation, with integration tests covering combined workflows.

Testing Framework

  • Vitest - Fast, TypeScript-native test runner
  • Coverage Target: 90% minimum
  • Test Structure: Unit tests per module, integration tests for workflows

Module Test Coverage

1. Cache Module (tests/cache/)

LruCache Tests (lru-cache.test.ts)

  • Basic operations (get, set, has, delete, clear)
  • LRU eviction policy (least recently used items evicted first; reads and writes refresh recency)
  • Capacity limits enforcement
  • Statistics tracking (hits, misses, evictions)
  • Edge cases (empty cache, single item, max capacity)
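
The eviction and statistics behaviour listed above can be pinned down against a small reference model. The sketch below is a hypothetical stand-in, not the package's LruCache: the class name, the getStats method, and the CacheStats shape are assumptions made for illustration.

```typescript
// Hypothetical reference model; the real LruCache API may differ.
interface SketchCacheStats {
  hits: number;
  misses: number;
  evictions: number;
}

class SketchLruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly stats: SketchCacheStats = { hits: 0, misses: 0, evictions: 0 };
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError('capacity must be a positive integer');
    }
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) {
      this.stats.misses++;
      return undefined;
    }
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    this.stats.hits++;
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // A Map iterates in insertion order, so its first key is the least recently used.
      const leastRecent = this.entries.keys().next().value as K;
      this.entries.delete(leastRecent);
      this.stats.evictions++;
    }
    this.entries.set(key, value);
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  getStats(): SketchCacheStats {
    return { ...this.stats };
  }
}
```

A test for the eviction bullet would fill the cache to capacity, read one key to refresh it, insert one more key, and then assert that the untouched key is the one that disappeared.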

RegexCache Tests (regex-cache.test.ts)

  • Singleton pattern verification
  • Regex compilation and caching
  • Flag handling
  • Cache invalidation
  • Performance improvement verification (cached lookups vs. recompiling the same pattern)
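
As a rough illustration of what the singleton, flag-handling, and invalidation tests exercise, here is a minimal sketch of a regex cache keyed by source and flags. The class name and its methods (getInstance, get, clear, size) are hypothetical; the real RegexCache may expose a different surface.

```typescript
// Hypothetical sketch; not the package's RegexCache.
class SketchRegexCache {
  private static instance: SketchRegexCache | undefined;
  private readonly compiled = new Map<string, RegExp>();

  private constructor() {}

  static getInstance(): SketchRegexCache {
    if (!SketchRegexCache.instance) {
      SketchRegexCache.instance = new SketchRegexCache();
    }
    return SketchRegexCache.instance;
  }

  get(source: string, flags = ''): RegExp {
    // Normalize flag order so 'gi' and 'ig' share one entry.
    const normalized = flags.split('').sort().join('');
    // Flags contain only letters, so the first '/' separates them from the source.
    const key = `${normalized}/${source}`;
    let regex = this.compiled.get(key);
    if (!regex) {
      regex = new RegExp(source, normalized);
      this.compiled.set(key, regex);
    }
    return regex;
  }

  clear(): void {
    this.compiled.clear();
  }

  get size(): number {
    return this.compiled.size;
  }
}
```

One design point worth a dedicated test: a cached RegExp with the g or y flag carries lastIndex state between calls, so shared instances need lastIndex reset (or a stateless API such as String.prototype.matchAll) to avoid order-dependent results.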

CacheFactory Tests (cache-factory.test.ts)

  • Strategy pattern implementation
  • Different cache type creation
  • Configuration options passing

CacheMetrics Tests (cache-metrics.test.ts)

  • Metrics recording and retrieval
  • Hit rate calculation
  • Multiple cache tracking
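
The hit rate calculation is small enough to state directly. This sketch assumes the common definition hits / (hits + misses) and, as an assumed convention, reports 0 when no lookups have happened yet; the function name is hypothetical.

```typescript
// Hypothetical helper; assumes hit rate = hits / (hits + misses).
function hitRate(hits: number, misses: number): number {
  const lookups = hits + misses;
  // Avoid dividing by zero before the first lookup.
  return lookups === 0 ? 0 : hits / lookups;
}
```

The zero-lookup case is the edge worth asserting explicitly, since a naive division would yield NaN.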

2. Patterns Module (tests/patterns/)

URL Patterns Tests (url-patterns.test.ts)

  • HTTP/HTTPS URL detection
  • Protocol-relative URLs
  • URLs with ports
  • IP addresses
  • Localhost URLs
  • Custom protocol support
  • Edge cases (malformed URLs, special characters)

Path Patterns Tests (path-patterns.test.ts)

  • Unix absolute/relative paths
  • Windows absolute/relative paths
  • Extension extraction
  • Hidden file detection
  • Parent/current directory patterns

Code Patterns Tests (code-patterns.test.ts)

  • Markdown code block detection
  • Inline code detection
  • Language detection accuracy
  • Function/class declaration patterns
  • Import/export statements
  • Comment patterns

ANSI Patterns Tests (ansi-patterns.test.ts)

  • Color code detection
  • Style code detection
  • Cursor control sequences
  • Complex ANSI sequences

PatternCompiler Tests (pattern-compiler.test.ts)

  • Pattern compilation
  • Pattern combination
  • Regex escaping
  • Word boundary patterns
  • Line patterns
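
For the escaping and word-boundary bullets, a minimal sketch of the behaviour under test might look like the following. The function names are hypothetical; the character class is the widely used set of regex metacharacters.

```typescript
// Hypothetical helpers illustrating regex escaping and word-boundary patterns.
function escapeRegex(literal: string): string {
  // Prefix every regex metacharacter with a backslash.
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function wholeWord(word: string, flags = ''): RegExp {
  // \b anchors the match at word boundaries on both sides.
  return new RegExp(`\\b${escapeRegex(word)}\\b`, flags);
}
```

Tests here are most useful when they check behaviour rather than the escaped string alone, for example that an escaped 'a.b' no longer matches 'axb'.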

PatternValidator Tests (pattern-validator.test.ts)

  • Valid regex validation
  • Flag validation
  • Complexity checking
  • Catastrophic backtracking detection
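
Backtracking detection is usually heuristic. As one possible approach (an assumption, not necessarily what PatternValidator does), the sketch below validates a pattern by attempting to compile it and flags a quantified group whose body also contains a quantifier, such as (a+)+.

```typescript
// Hypothetical sketch of validation plus a nested-quantifier heuristic.
function isValidRegex(source: string, flags = ''): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch {
    // The RegExp constructor throws a SyntaxError for bad patterns or flags.
    return false;
  }
}

// Matches a group with no nested parentheses that contains + or * and is
// itself followed by +, * or a {n,m} quantifier.
const NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*{]/;

function looksCatastrophic(source: string): boolean {
  return NESTED_QUANTIFIER.test(source);
}
```

The heuristic ignores escapes, so it reports false positives such as `\(a+\)+`, and it misses other risky shapes such as overlapping alternations. Tests should record those known limits rather than imply full coverage.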

3. Extractors Module (tests/extractors/)

UrlExtractor Tests (url-extractor.test.ts)

  • URL extraction from text
  • Detailed URL parsing
  • Duplicate removal
  • Position tracking
  • Protocol requirements
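
To make the duplicate-removal and position-tracking expectations concrete, here is a minimal sketch of an extractor that requires an http or https protocol. The result shape (url, start, end) and the function name are hypothetical and may not match the package's ExtractedUrl type.

```typescript
// Hypothetical result shape; the package's ExtractedUrl may differ.
interface SketchExtractedUrl {
  url: string;
  start: number; // index of the first character of the URL
  end: number;   // index just past the last character of the URL
}

function extractUrls(text: string): SketchExtractedUrl[] {
  const pattern = /https?:\/\/[^\s<>"')\]]+/g;
  const seen = new Set<string>();
  const results: SketchExtractedUrl[] = [];
  for (const match of text.matchAll(pattern)) {
    // Sentence punctuation directly after a URL is rarely part of it.
    const url = match[0].replace(/[.,;:!?]+$/, '');
    if (seen.has(url)) continue; // keep only the first occurrence
    seen.add(url);
    const start = match.index ?? 0;
    results.push({ url, start, end: start + url.length });
  }
  return results;
}
```

A useful invariant for position tests is that text.slice(start, end) equals url for every result.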

PathExtractor Tests (path-extractor.test.ts)

  • Path extraction from mixed text
  • Path parsing (segments, filename, directory)
  • Absolute vs relative detection
  • Cross-platform path handling

CodeBlockExtractor Tests (code-block-extractor.test.ts)

  • Markdown code block extraction
  • HTML code block extraction
  • Language detection
  • Line counting
  • Inline code extraction

QuoteExtractor Tests (quote-extractor.test.ts)

  • Single/double quote extraction
  • Backtick extraction
  • Multiline quote handling
  • Escaped quote handling
  • Nested quotes

NumberExtractor Tests (number-extractor.test.ts)

  • Integer extraction
  • Decimal extraction
  • Scientific notation
  • Percentage extraction
  • Currency extraction
  • Comma-separated numbers

4. Sanitizers Module (tests/sanitizers/)

AnsiStripper Tests (ansi-stripper.test.ts)

  • Complete ANSI removal
  • Selective stripping (colors, styles, cursor)
  • Structure preservation
  • Complex sequences
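
The difference between complete and selective stripping can be sketched with two patterns: one for any CSI sequence (ESC [ followed by parameter bytes, intermediate bytes, and a final byte) and one for SGR color/style sequences only, which end in m. The names are hypothetical, and other escape families (OSC, single-character escapes) are deliberately left out of this sketch.

```typescript
// Hypothetical sketch covering CSI sequences only.
// Parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, final byte 0x40-0x7E.
const CSI_SEQUENCE = /\x1b\[[0-?]*[ -\/]*[@-~]/g;
// SGR (colors and styles) is the CSI subset that ends in 'm'.
const SGR_SEQUENCE = /\x1b\[[0-9;]*m/g;

function stripAnsi(text: string): string {
  return text.replace(CSI_SEQUENCE, '');
}

function stripSgr(text: string): string {
  return text.replace(SGR_SEQUENCE, '');
}
```

Structure preservation then reduces to asserting that newlines and other non-escape characters pass through unchanged.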

ControlCharStripper Tests (control-char-stripper.test.ts)

  • C0/C1 control character removal
  • Whitespace preservation options
  • Replacement with markers
  • Detection reporting
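
A minimal sketch of the stripping behaviour, assuming tab, line feed, and carriage return are the whitespace characters to preserve (the real option set may differ). C0 covers U+0000 to U+001F, C1 covers U+0080 to U+009F, and DEL (U+007F) is included here as well; function names are hypothetical.

```typescript
// Hypothetical sketch: C0 (minus \t, \n, \r), DEL, and C1 control characters.
const CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]/g;

function stripControlChars(text: string, marker = ''): string {
  // An empty marker removes the characters; a non-empty one makes them visible.
  return text.replace(CONTROL_CHARS, marker);
}

function countControlChars(text: string): number {
  return (text.match(CONTROL_CHARS) ?? []).length;
}
```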

HtmlStripper Tests (html-stripper.test.ts)

  • Tag removal
  • Script/style removal
  • Entity decoding
  • Comment removal
  • Whitespace normalization

MarkdownStripper Tests (markdown-stripper.test.ts)

  • Header removal
  • Emphasis removal
  • Link/image handling
  • List formatting removal
  • Table handling

5. Performance Module (tests/performance/)

TimeoutWrapper Tests (timeout-wrapper.test.ts)

  • Async timeout enforcement
  • Sync timeout handling
  • Error messages
  • Cleanup on success/failure
  • Custom timeout values
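
For the async case, the enforcement and cleanup bullets can be illustrated with a small wrapper that rejects when a timer fires first and clears the timer on both success and failure. The function and error class names are hypothetical. Synchronous code cannot be interrupted by a timer in JavaScript, so this sketch covers promises only.

```typescript
// Hypothetical sketch of an async timeout wrapper.
class OperationTimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = 'OperationTimeoutError';
  }
}

function withTimeout<T>(operation: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new OperationTimeoutError(ms)), ms);
    operation.then(
      (value) => {
        clearTimeout(timer); // cleanup on success
        resolve(value);
      },
      (error) => {
        clearTimeout(timer); // cleanup on failure
        reject(error);
      },
    );
  });
}
```

In Vitest, tests for this kind of wrapper typically use fake timers (vi.useFakeTimers and vi.advanceTimersByTime) so that timeouts are deterministic and do not slow the suite.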

ComplexityChecker Tests (complexity-checker.test.ts)

  • Length complexity
  • Nesting depth calculation
  • Entropy calculation
  • Score calculation
  • Recommendations
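
Two of the metrics above have standard definitions that tests can check against known values. The sketch below assumes character-level Shannon entropy in bits and bracket-based nesting depth; both function names are hypothetical, and the package's scoring may combine them differently.

```typescript
// Hypothetical sketch: Shannon entropy in bits per character.
function shannonEntropy(text: string): number {
  const counts = new Map<string, number>();
  let total = 0;
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
    total++;
  }
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / total;
    entropy -= p * Math.log2(p);
  }
  return entropy; // 0 for empty or single-symbol text
}

// Hypothetical sketch: deepest level of (), [] and {} nesting.
function maxNestingDepth(text: string): number {
  let depth = 0;
  let max = 0;
  for (const ch of text) {
    if (ch === '(' || ch === '[' || ch === '{') {
      depth++;
      if (depth > max) max = depth;
    } else if (ch === ')' || ch === ']' || ch === '}') {
      // Ignore unmatched closers instead of going negative.
      if (depth > 0) depth--;
    }
  }
  return max;
}
```

Handy fixtures: a string of one repeated character has entropy 0, and a string of four distinct characters has entropy 2 bits.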

BatchProcessor Tests (batch-processor.test.ts)

  • Batch processing
  • Progress callbacks
  • Delay between batches
  • Chunking utility
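
A minimal sketch of the chunking utility and a sequential batch loop with a progress callback. The names and signatures are hypothetical and the inter-batch delay is omitted for brevity.

```typescript
// Hypothetical sketch; the real BatchProcessor API may differ.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

async function forEachBatch<T>(
  items: readonly T[],
  size: number,
  handle: (batch: T[]) => Promise<void>,
  onProgress?: (done: number, total: number) => void,
): Promise<void> {
  let done = 0;
  for (const batch of chunk(items, size)) {
    await handle(batch); // batches run one after another
    done += batch.length;
    onProgress?.(done, items.length);
  }
}
```

Edge cases worth covering are an empty input, a final batch smaller than the batch size, and an invalid size.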

StreamProcessor Tests (stream-processor.test.ts)

  • Text streaming
  • Line streaming
  • Stream collection
  • Stream transformation
  • ReadableStream creation

Throttler Tests (throttler.test.ts)

  • Basic throttling
  • Queue-based throttling
  • Async throttling
  • Timing verification

Debouncer Tests (debouncer.test.ts)

  • Basic debouncing
  • Promise-based debouncing
  • Cancellation
  • Flush functionality
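
Cancellation and flush are easiest to reason about against a small model. This sketch is a trailing-edge debounce with hypothetical cancel and flush methods; the package's Debouncer may differ, for instance by returning promises.

```typescript
// Hypothetical sketch of a trailing-edge debounce with cancel and flush.
interface Debounced<A extends unknown[]> {
  (...args: A): void;
  cancel(): void;
  flush(): void;
}

function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): Debounced<A> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pendingArgs: A | undefined;

  const run = (): void => {
    timer = undefined;
    if (pendingArgs) {
      const args = pendingArgs;
      pendingArgs = undefined;
      fn(...args);
    }
  };

  const debounced = ((...args: A): void => {
    // Each call replaces the pending arguments and restarts the timer.
    pendingArgs = args;
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(run, waitMs);
  }) as Debounced<A>;

  debounced.cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
    pendingArgs = undefined;
  };

  debounced.flush = (): void => {
    // Run the pending call immediately, if there is one.
    if (timer !== undefined) clearTimeout(timer);
    run();
  };

  return debounced;
}
```

Timing assertions are best written with Vitest's fake timers (vi.useFakeTimers, vi.advanceTimersByTime) so that the suite does not wait on real delays.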

Integration Tests (tests/integration/)

Real-world Scenarios

  1. Claude Output Processing

    • Parse Claude's output with ANSI codes
    • Extract code blocks and clean formatting
    • Performance with large outputs
  2. Log File Processing

    • Extract timestamps, URLs, paths
    • Remove control characters
    • Batch process large files
  3. Markdown to Plain Text

    • Complete markdown stripping
    • Preserve essential content
    • Handle complex nested structures
  4. HTML Content Extraction

    • Strip all HTML
    • Decode entities
    • Extract readable text

Performance Benchmarks (tests/benchmarks/)

Cache Performance

  • LRU cache vs Map performance
  • Regex compilation savings
  • Hit rate analysis

Pattern Matching

  • Regex performance on large texts
  • Compiled vs non-compiled patterns
  • Complex pattern performance

Extraction Speed

  • URL extraction on various text sizes
  • Code block extraction performance
  • Number extraction optimization

Sanitization Speed

  • ANSI stripping performance
  • HTML stripping on large documents
  • Markdown processing speed

Test Utilities (tests/utils/)

Test Data Generators

// Generate test text with known patterns
export function generateTextWithUrls(count: number): string
export function generateAnsiText(length: number): string
export function generateMarkdown(complexity: 'simple' | 'complex'): string
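
The signatures above leave the generator bodies open. As one possible shape for the first of them, the sketch below produces deterministic text with exactly count URLs, one per line; the sentence template and the example.com URLs are assumptions chosen so tests can predict the output.

```typescript
// Hypothetical body for generateTextWithUrls; deterministic by design.
function generateTextWithUrls(count: number): string {
  const lines: string[] = [];
  for (let i = 0; i < count; i++) {
    // One URL per line, followed by a space so the URL boundary is unambiguous.
    lines.push(`Item ${i} links to https://example.com/page/${i} for details.`);
  }
  return lines.join('\n');
}
```

Deterministic generators keep failures reproducible; if randomized input is wanted, seeding the generator preserves that property.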

Assertion Helpers

// Custom assertions for complex validations
export function assertExtractedUrls(actual: ExtractedUrl[], expected: ExtractedUrl[])
export function assertCacheStats(cache: Cache<any>, expected: Partial<CacheStats>)

Test Configuration

vitest.config.ts

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json', 'html'],
      exclude: ['**/index.ts', '**/*.types.ts', 'tests/**'],
      thresholds: {
        statements: 90,
        branches: 90,
        functions: 90,
        lines: 90
      }
    },
    testTimeout: 5000,
    hookTimeout: 10000
  }
});

CI/CD Test Pipeline

Pre-commit

  • Run affected tests
  • Type checking
  • Linting

Pull Request

  • Full test suite
  • Coverage report
  • Performance regression check

Main Branch

  • Full test suite
  • Performance benchmarks
  • Package build verification

Test Execution Commands

# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Run specific module tests
npm test -- cache
npm test -- extractors

# Run benchmarks
npm run test:bench

# Watch mode for development
npm run test:watch

Success Criteria

  1. Coverage: Minimum 90% coverage for statements, branches, functions, and lines
  2. Performance: No regression in benchmarks
  3. Reliability: All tests pass consistently
  4. Isolation: Each test is independent
  5. Speed: Full suite runs in < 30 seconds
  6. Documentation: Each test clearly documents what it verifies