autocommit a29afcea67	deps-upgrade(dependencies): ⬆️ Update all dependencies to latest stable versions Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>	2026-06-10 21:19:20 -07:00
.forgejo/workflows	chore: initial commit	2026-01-21 11:37:27 -08:00
.uwu	chore: initial commit	2026-01-21 11:37:27 -08:00
benchmarks	chore: initial commit	2026-01-21 11:37:27 -08:00
bin	security(spellcheck/dictionaries): 🔒️ Validate unsafe dictionary loading operations with URL/file path checks and input size limits to prevent SSRF/XSS/DoS attacks	2026-02-27 14:09:09 -08:00
docs	docs(docs): 📝 Introduce structured guides in the docs/ directory alongside a clear high-level overview in README.md	2026-02-26 19:27:04 -08:00
integration	chore: initial commit	2026-01-21 11:37:27 -08:00
scripts	chore: initial commit	2026-01-21 11:37:27 -08:00
src	perf(browser-stubs): ⚡ Optimize browser stubs processing in tsup config to reduce bundle size and improve API compatibility	2026-03-19 04:44:58 -07:00
.gitignore	chore: initial commit	2026-01-21 11:37:27 -08:00
eslint.config.js	chore: initial commit	2026-01-21 11:37:27 -08:00
lilith-text-processing-utils-1.3.5.tgz	feat(spellcheck): ✨ Add aggressive text normalization and mobile spell-checking support using upgraded lilith-text-processing-utils v1.3.5	2026-02-26 22:30:16 -08:00
lilith-text-processing-utils-1.3.9-dev.1772235970.tgz	wip(lilith-text): 🚧 Prepare development snapshot lilith-text v1.3.9-dev.1772235970 with internal utility refinements	2026-02-27 15:53:36 -08:00
package.json	deps-upgrade(dependencies): ⬆️ Update all dependencies to latest stable versions	2026-06-10 21:19:20 -07:00
README.md	docs(docs): 📝 Revise README.md to improve onboarding clarity and update TEST_PLAN.md with comprehensive test scenarios	2026-02-26 19:33:05 -08:00
test-custom-dict.js	chore: initial commit	2026-01-21 11:37:27 -08:00
test-debug.js	chore: initial commit	2026-01-21 11:37:27 -08:00
test-som.mjs	chore: initial commit	2026-01-21 11:37:27 -08:00
test-spellchecker.js	chore: initial commit	2026-01-21 11:37:27 -08:00
test-suggestions.js	chore: initial commit	2026-01-21 11:37:27 -08:00
tsconfig.json	chore: initial commit	2026-01-21 11:37:27 -08:00
tsup.config.ts	perf(browser-stubs): ⚡ Optimize browser stubs processing in tsup config to reduce bundle size and improve API compatibility	2026-03-19 04:44:58 -07:00
vitest.config.ts	chore: initial commit	2026-01-21 11:37:27 -08:00

README.md

@lilith/text-processing-utils

High-performance text processing utilities for deterministic text manipulation.

Installation

pnpm add @lilith/text-processing-utils

Modules

Module	Classes	Purpose
Spellcheck	`SpellChecker`, `SymSpellEngine`, `ConfidenceScorer`	Engine-based spell checking with confidence scoring
Extractors	`UrlExtractor`, `PathExtractor`, `CodeBlockExtractor`	Extract structured data from text
Sanitizers	`AnsiStripper`, `HtmlStripper`, `MarkdownStripper`, `ControlCharStripper`	Strip formatting and control characters
Splitters	`SentenceSplitter`, `ChunkSplitter`	Split text into sentences or sized chunks
Validators	`EmailValidator`, `JSONValidator`	Validate text formats
Transformers	`CaseTransformer`, `Redactor`, `TemplateEngine`, `Truncator`	Transform, redact, and template text
Normalizers	`UnicodeNormalizer`, `WhitespaceNormalizer`, `TerminalNormalizer`	Normalize text representations
Comparators	`DiffGenerator`, `FuzzyMatcher`, `SimilarityScorer`	Compare and diff text
Encoders	`Base64Encoder`, `StreamingEncoder`, `TerminalEncoder`	Encode text for transport
Metrics	`TextAnalyzer`, `ReadabilityScorer`, `CodeMetricsAnalyzer`	Analyze text statistics and readability
Performance	`withTimeout`, `BatchProcessor`, `StreamProcessor`, `Throttler`, `Debouncer`	Async control flow utilities
Errors	`ErrorHandler`, `TextProcessingError`	Structured error handling
Cache	`RegexCache`	Compiled regex caching

Spellcheck

Engine-first spell checking with multi-factor confidence scoring, bigram context rescoring, and pattern-based split/joined word detection.

Full API reference: docs/spellcheck.md

import { SpellChecker, SymSpellEngine } from '@lilith/text-processing-utils';

const engine = new SymSpellEngine({
  wasmUrl: '/spellcheck-data/spellchecker-wasm.wasm',
  dictionaryUrl: '/spellcheck-data/frequency-dictionary.txt',
  bigramUrl: '/spellcheck-data/frequency-bigrams.txt',
});
await engine.init();

const checker = new SpellChecker({ engine, autoCorrect: true });
await checker.initialize();

// Single word
const result = await checker.check('recieve');
// { word: 'recieve', correct: false, suggestions: ['receive', ...], confidence: 0.87 }

// Auto-correct (only high-confidence fixes applied)
const fixed = await checker.fix('teh quikc brwon fox');
// 'the quick brown fox'

// Full diagnostic with positions, severities, split/joined word detection
const report = await checker.checkText('teh quikc fox ist he best');
// { errors: [...], stats: { totalWords: 6, misspelledWords: 2, ... } }
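
The `confidence` field in the result above can be used to gate corrections in calling code. The following is a minimal sketch, assuming the result shape shown in the `check()` comment; `CheckResult`, `pickCorrection`, and the 0.8 threshold are illustrative names and values, not part of the library's API.

```typescript
// Hypothetical shape, mirroring the example result shown for checker.check().
interface CheckResult {
  word: string;
  correct: boolean;
  suggestions: string[];
  confidence: number;
}

// Return the top suggestion only when the word is misspelled, a suggestion
// exists, and the confidence meets the threshold; otherwise keep the word.
function pickCorrection(result: CheckResult, threshold = 0.8): string {
  const [best] = result.suggestions;
  if (result.correct || best === undefined) return result.word;
  return result.confidence >= threshold ? best : result.word;
}

const sample: CheckResult = {
  word: 'recieve',
  correct: false,
  suggestions: ['receive'],
  confidence: 0.87,
};

console.log(pickCorrection(sample));      // 'receive'
console.log(pickCorrection(sample, 0.9)); // 'recieve' (below the stricter threshold)
```

Raising the threshold trades fewer wrong corrections for more misspellings left untouched, which is presumably the same trade-off `autoCorrect` makes internally.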

Feature System

Nine pluggable detectors cover grammar, capitalization, punctuation, homophones, redundancy, and more:

import { FeatureManager, GrammarPatternFeature, CapitalizationFeature } from '@lilith/text-processing-utils';

const manager = new FeatureManager();
manager.addFeature(new GrammarPatternFeature());
manager.addFeature(new CapitalizationFeature());
await manager.initializeAll();

const results = await manager.checkText('i went too the store.');
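
To illustrate what "pluggable" can mean here, the sketch below shows a generic detector pattern: each detector implements one small contract and a runner merges their findings. The `Finding` and `Detector` interfaces, `regexDetector`, and `runDetectors` are all hypothetical and stand in for whatever contract the library's features actually implement.

```typescript
// Hypothetical contract; the library's real feature interface may differ.
interface Finding {
  rule: string;
  index: number;
  message: string;
}

interface Detector {
  name: string;
  check(text: string): Finding[];
}

// Build a detector from a global regex: one finding per match.
function regexDetector(name: string, pattern: RegExp, message: string): Detector {
  return {
    name,
    check(text: string): Finding[] {
      const findings: Finding[] = [];
      for (const match of text.matchAll(pattern)) {
        findings.push({ rule: name, index: match.index ?? 0, message });
      }
      return findings;
    },
  };
}

const detectors: Detector[] = [
  regexDetector('capitalization', /\bi\b/g, 'Capitalize the pronoun "I"'),
  regexDetector('redundancy', /\b(\w+)\s+\1\b/gi, 'Repeated word'),
];

// Run every detector and return the findings ordered by position in the text.
function runDetectors(active: Detector[], text: string): Finding[] {
  const all: Finding[] = [];
  for (const detector of active) all.push(...detector.check(text));
  return all.sort((a, b) => a.index - b.index);
}

console.log(runDetectors(detectors, 'i saw the the cat'));
// Two findings: 'capitalization' at index 0 and 'redundancy' at index 6
```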

Extractors

UrlExtractor

import { UrlExtractor } from '@lilith/text-processing-utils';

const extractor = new UrlExtractor();
const urls = extractor.extract('Check out https://example.com and http://test.org');
// ['https://example.com', 'http://test.org']

PathExtractor

import { PathExtractor } from '@lilith/text-processing-utils';

const extractor = new PathExtractor();
const paths = extractor.extract('Open /home/user/file.txt or C:\\Users\\file.txt');

CodeBlockExtractor

import { CodeBlockExtractor } from '@lilith/text-processing-utils';

const extractor = new CodeBlockExtractor();
const blocks = extractor.extract(markdown);
// [{ language: 'typescript', code: '...' }]

Sanitizers

AnsiStripper

import { AnsiStripper } from '@lilith/text-processing-utils';

const stripper = new AnsiStripper();
const clean = stripper.strip('\x1b[31mRed text\x1b[0m');
// 'Red text'
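
For reference, the color codes in this example are SGR ("Select Graphic Rendition") escape sequences, which a single regular expression can remove. This is a minimal sketch of that idea only; `stripSgr` is an illustrative helper, and `AnsiStripper` may well handle more sequence types (cursor movement, OSC, and so on) than this does.

```typescript
// Matches SGR sequences such as ESC[31m (red) and ESC[0m (reset).
const SGR_PATTERN = /\x1b\[[0-9;]*m/g;

// Remove SGR color/style sequences, leaving all other characters intact.
function stripSgr(input: string): string {
  return input.replace(SGR_PATTERN, '');
}

console.log(stripSgr('\x1b[31mRed text\x1b[0m')); // 'Red text'
```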

HtmlStripper

import { HtmlStripper } from '@lilith/text-processing-utils';

const stripper = new HtmlStripper();
const clean = stripper.strip('<p>Hello <b>world</b></p>');
// 'Hello world'

MarkdownStripper

import { MarkdownStripper } from '@lilith/text-processing-utils';

const stripper = new MarkdownStripper();
const clean = stripper.strip('# Hello **world**');
// 'Hello world'

ControlCharStripper

import { ControlCharStripper } from '@lilith/text-processing-utils';

const stripper = new ControlCharStripper();
const clean = stripper.strip('Hello\x00World\x01');
// 'HelloWorld'

SanitizerFactory

import { SanitizerFactory } from '@lilith/text-processing-utils';

const sanitizer = SanitizerFactory.create('html');

Splitters

SentenceSplitter

import { SentenceSplitter } from '@lilith/text-processing-utils';

const splitter = new SentenceSplitter();
const sentences = splitter.split('Hello world. How are you? Fine.');
// ['Hello world.', 'How are you?', 'Fine.']

ChunkSplitter

import { ChunkSplitter } from '@lilith/text-processing-utils';

const splitter = new ChunkSplitter({
  maxChunkSize: 1000,
  overlap: 100,
  splitOn: 'sentence',
});

const chunks = splitter.split(longText);
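
The interaction between `maxChunkSize` and `overlap` is easiest to see on a tiny input. The sketch below is a character-based approximation written for illustration; `chunkWithOverlap` is a hypothetical helper and ignores the `splitOn: 'sentence'` boundary alignment that `ChunkSplitter` presumably performs.

```typescript
// Split text into windows of at most maxChunkSize characters, where each
// window repeats the last `overlap` characters of the previous one.
function chunkWithOverlap(text: string, maxChunkSize: number, overlap: number): string[] {
  if (maxChunkSize <= 0) throw new RangeError('maxChunkSize must be positive');
  if (overlap < 0 || overlap >= maxChunkSize) {
    throw new RangeError('overlap must be in [0, maxChunkSize)');
  }
  const chunks: string[] = [];
  const step = maxChunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    // Stop once a window reaches the end, to avoid a trailing overlap-only chunk.
    if (start + maxChunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkWithOverlap('abcdefghij', 4, 1)); // ['abcd', 'defg', 'ghij']
```

Each chunk starts `maxChunkSize - overlap` characters after the previous one, so a larger overlap preserves more context across chunk boundaries at the cost of more chunks.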

Validators

EmailValidator

import { EmailValidator } from '@lilith/text-processing-utils';

const validator = new EmailValidator();
validator.validate('user@example.com');  // true
validator.validate('invalid-email');     // false

JSONValidator

import { JSONValidator } from '@lilith/text-processing-utils';

const validator = new JSONValidator();
validator.validate('{"key": "value"}');  // true
validator.validate('{invalid}');         // false

const json = validator.parse(text);      // parsed object or null

Transformers

CaseTransformer

import { CaseTransformer } from '@lilith/text-processing-utils';

const transformer = new CaseTransformer();
transformer.toTitleCase('hello world');  // 'Hello World'
transformer.toCamelCase('hello world');  // 'helloWorld'
transformer.toSnakeCase('helloWorld');   // 'hello_world'
transformer.toKebabCase('helloWorld');   // 'hello-world'

Redactor

import { Redactor } from '@lilith/text-processing-utils';

const redactor = new Redactor({
  patterns: {
    email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
    phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
  },
  replacement: '[REDACTED]',
});

const clean = redactor.redact('Email me at user@example.com');
// 'Email me at [REDACTED]'

TemplateEngine

import { TemplateEngine } from '@lilith/text-processing-utils';

const engine = new TemplateEngine();
const result = engine.render('Hello {{name}}!', { name: 'World' });
// 'Hello World!'
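
The `{{name}}` placeholder syntax can be approximated with one regular-expression replace. This is a minimal sketch under the assumption that unknown placeholders are left untouched; `renderTemplate` is an illustrative helper, and the real `TemplateEngine` may differ in whitespace handling, escaping, and missing-key behavior.

```typescript
// Replace each {{key}} with values[key]; leave unknown placeholders as-is.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder: string, key: string) =>
    // Own-property check avoids picking up inherited names such as "toString".
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : placeholder,
  );
}

console.log(renderTemplate('Hello {{name}}!', { name: 'World' })); // 'Hello World!'
console.log(renderTemplate('Hi {{missing}}', {}));                 // 'Hi {{missing}}'
```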

Truncator

import { Truncator } from '@lilith/text-processing-utils';

const truncator = new Truncator();
truncator.truncate('Hello world', 8);  // 'Hello...'
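
The output above implies that the ellipsis counts toward the length limit: `'Hello...'` is exactly 8 characters. A minimal sketch of that rule follows; `truncateWithEllipsis` is an illustrative helper that measures length in UTF-16 code units, which may not match how `Truncator` counts.

```typescript
// Cut text so that the result, including the ellipsis, fits within maxLength.
function truncateWithEllipsis(text: string, maxLength: number, ellipsis = '...'): string {
  if (text.length <= maxLength) return text;
  // Not enough room for any text: return as much of the ellipsis as fits.
  if (maxLength <= ellipsis.length) return ellipsis.slice(0, Math.max(0, maxLength));
  return text.slice(0, maxLength - ellipsis.length) + ellipsis;
}

console.log(truncateWithEllipsis('Hello world', 8));  // 'Hello...'
console.log(truncateWithEllipsis('Hello world', 20)); // 'Hello world'
```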

Normalizers

UnicodeNormalizer

import { UnicodeNormalizer } from '@lilith/text-processing-utils';

const normalizer = new UnicodeNormalizer();
const normalized = normalizer.normalize('caf\u00e9');  // NFC normalization
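
NFC normalization only changes strings that contain decomposed sequences, so its effect is clearer on an input that spells "é" as `e` plus a combining accent. The sketch below uses the built-in `String.prototype.normalize` rather than `UnicodeNormalizer`, purely to show what NFC does.

```typescript
// 'e' followed by U+0301 (combining acute accent): renders as "é" but is two code points.
const decomposed = 'cafe\u0301';
const composed = decomposed.normalize('NFC');

console.log(decomposed.length);            // 5
console.log(composed.length);              // 4
console.log(composed === 'caf\u00e9');     // true
console.log(decomposed === 'caf\u00e9');   // false: same rendering, different code points
```

Normalizing both sides before comparing or hashing avoids treating visually identical strings as different.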

WhitespaceNormalizer

import { WhitespaceNormalizer } from '@lilith/text-processing-utils';

const normalizer = new WhitespaceNormalizer();
const clean = normalizer.normalize('hello   world\t\n');

TerminalNormalizer

import { TerminalNormalizer } from '@lilith/text-processing-utils';

const normalizer = new TerminalNormalizer();
const clean = normalizer.normalize(terminalOutput);

Comparators

FuzzyMatcher

import { FuzzyMatcher } from '@lilith/text-processing-utils';

const matcher = new FuzzyMatcher();
const matches = matcher.match('hello', ['helo', 'world', 'help']);

SimilarityScorer

import { SimilarityScorer } from '@lilith/text-processing-utils';

const scorer = new SimilarityScorer();
const score = scorer.score('hello', 'helo');  // 0.0 - 1.0
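
One common way to obtain a score in the 0.0 to 1.0 range is normalized Levenshtein distance. The sketch below shows that technique as an assumption for illustration; the README does not state which metric `SimilarityScorer` uses, and `levenshtein` and `similarity` are hypothetical helpers.

```typescript
// Classic dynamic-programming edit distance, keeping only the previous row.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// 1.0 for identical strings, 0.0 when every character must change.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

console.log(levenshtein('hello', 'helo')); // 1
console.log(similarity('hello', 'helo'));  // 0.8
```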

DiffGenerator

import { DiffGenerator } from '@lilith/text-processing-utils';

const diff = new DiffGenerator();
const changes = diff.generate('hello world', 'hello there');

Encoders

Base64Encoder

import { Base64Encoder } from '@lilith/text-processing-utils';

const encoder = new Base64Encoder();
const encoded = encoder.encode('Hello World');
const decoded = encoder.decode(encoded);

StreamingEncoder

import { StreamingEncoder } from '@lilith/text-processing-utils';

const encoder = new StreamingEncoder();

TerminalEncoder

import { TerminalEncoder } from '@lilith/text-processing-utils';

const encoder = new TerminalEncoder();
const ansi = encoder.encode('Hello', { color: 'red', bold: true });

Metrics

TextAnalyzer

import { TextAnalyzer } from '@lilith/text-processing-utils';

const analyzer = new TextAnalyzer();
const analysis = analyzer.analyze(text);
// {
//   statistics: { characters, words, sentences, paragraphs, lines, ... },
//   averages: { wordLength, sentenceLength, paragraphLength, wordsPerLine },
//   complexity: { uniqueWords, lexicalDiversity, vocabularyRichness, typeTokenRatio },
//   frequency: { mostCommonWords, mostCommonBigrams, mostCommonTrigrams },
//   patterns: { hasNumbers, hasUrls, hasEmails, hasCamelCase, ... },
// }

ReadabilityScorer

import { ReadabilityScorer } from '@lilith/text-processing-utils';

const scorer = new ReadabilityScorer();
const scores = scorer.score(text);
// { fleschReadingEase, fleschKincaidGrade, colemanLiauIndex, ... }

CodeMetricsAnalyzer

import { CodeMetricsAnalyzer } from '@lilith/text-processing-utils';

const analyzer = new CodeMetricsAnalyzer();
const metrics = analyzer.analyze(sourceCode);
// { linesOfCode, cyclomaticComplexity, halstead, maintainabilityIndex }

Performance

withTimeout

import { withTimeout, TimeoutError } from '@lilith/text-processing-utils';

const result = await withTimeout(slowOperation(), 5000);
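
A timeout wrapper of this kind is typically built on `Promise.race`. The following is a minimal sketch of that pattern, not the library's implementation; `raceWithTimeout` and `SketchTimeoutError` are illustrative names, and whether the real `withTimeout` rejects with the exported `TimeoutError` is an assumption.

```typescript
// Stand-in error type for this sketch only.
class SketchTimeoutError extends Error {
  constructor(ms: number) {
    super(`Timed out after ${ms} ms`);
    this.name = 'SketchTimeoutError';
  }
}

// Resolve or reject with `work`, unless `ms` milliseconds elapse first.
function raceWithTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new SketchTimeoutError(ms)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

raceWithTimeout(Promise.resolve('done'), 1000).then((value) => console.log(value)); // 'done'
```

Note that racing does not cancel the underlying operation; it only stops waiting for it. Cancellation needs cooperation from the operation itself, for example through an `AbortSignal`.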

BatchProcessor

import { BatchProcessor } from '@lilith/text-processing-utils';

const processor = new BatchProcessor({ batchSize: 100 });
const results = await processor.process(items, async (batch) => {
  return batch.map(transform);
});
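
The batching step itself amounts to slicing the input and awaiting a handler per slice. This is a minimal sketch assuming sequential batches and order-preserving results; `processInBatches` is a hypothetical helper, and `BatchProcessor` may add concurrency, retries, or progress reporting on top.

```typescript
// Run `handler` on consecutive slices of `items`, one batch at a time,
// and concatenate the per-batch results in input order.
async function processInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  handler: (batch: T[]) => Promise<R[]>,
): Promise<R[]> {
  if (batchSize <= 0) throw new RangeError('batchSize must be positive');
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await handler(batch)));
  }
  return results;
}

processInBatches([1, 2, 3, 4, 5], 2, async (batch) => batch.map((n) => n * 2))
  .then((doubled) => console.log(doubled)); // [2, 4, 6, 8, 10]
```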

Throttler / Debouncer

import { Throttler, Debouncer } from '@lilith/text-processing-utils';

const throttled = new Throttler(fn, 1000);
const debounced = new Debouncer(fn, 300);

Errors

ErrorHandler

import { ErrorHandler } from '@lilith/text-processing-utils';

const handler = new ErrorHandler({ onError: (err) => console.error(err) });
handler.wrap(() => riskyOperation());

Cache

RegexCache

import { RegexCache } from '@lilith/text-processing-utils';

const cache = new RegexCache();
const regex = cache.get('\\b\\w+\\b', 'gi');
// Returns cached compiled regex on subsequent calls
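
The caching idea can be sketched with a `Map` keyed by pattern and flags. This is an illustration of the technique only; `SimpleRegexCache` is a hypothetical class, and `RegexCache` may add eviction or size limits.

```typescript
class SimpleRegexCache {
  private readonly cache = new Map<string, RegExp>();

  // Compile on first request, then reuse the same RegExp instance.
  get(pattern: string, flags = ''): RegExp {
    // Flags never contain '/', so this key is unambiguous.
    const key = `${flags}/${pattern}`;
    let regex = this.cache.get(key);
    if (regex === undefined) {
      regex = new RegExp(pattern, flags);
      this.cache.set(key, regex);
    }
    return regex;
  }
}

const sketchCache = new SimpleRegexCache();
const first = sketchCache.get('\\b\\w+\\b', 'gi');
const second = sketchCache.get('\\b\\w+\\b', 'gi');
console.log(first === second); // true: same compiled instance
```

One caveat with any shared regex: instances with the `g` or `y` flag carry `lastIndex` state between calls to `test()` and `exec()`, so reset `lastIndex` to 0 before reuse or prefer methods such as `String.prototype.matchAll`, which work on a copy.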

CLI

npx spellcheck-cli "teh quick brwon fox"
npx spellcheck-cli --file document.txt
npx spellcheck-cli --fix "teh quick fox"

License

MIT