autocommit a29afcea67
deps-upgrade(dependencies): ⬆️ Update all dependencies to latest stable versions
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-10 21:19:20 -07:00
.forgejo/workflows chore: initial commit 2026-01-21 11:37:27 -08:00
.uwu chore: initial commit 2026-01-21 11:37:27 -08:00
benchmarks chore: initial commit 2026-01-21 11:37:27 -08:00
bin security(spellcheck/dictionaries): 🔒️ Validate unsafe dictionary loading operations with URL/file path checks and input size limits to prevent SSRF/XSS/DoS attacks 2026-02-27 14:09:09 -08:00
docs docs(docs): 📝 Introduce structured guides in the docs/ directory alongside a clear high-level overview in README.md 2026-02-26 19:27:04 -08:00
integration chore: initial commit 2026-01-21 11:37:27 -08:00
scripts chore: initial commit 2026-01-21 11:37:27 -08:00
src perf(browser-stubs): Optimize browser stubs processing in tsup config to reduce bundle size and improve API compatibility 2026-03-19 04:44:58 -07:00
.gitignore chore: initial commit 2026-01-21 11:37:27 -08:00
eslint.config.js chore: initial commit 2026-01-21 11:37:27 -08:00
lilith-text-processing-utils-1.3.5.tgz feat(spellcheck): Add aggressive text normalization and mobile spell-checking support using upgraded lilith-text-processing-utils v1.3.5 2026-02-26 22:30:16 -08:00
lilith-text-processing-utils-1.3.9-dev.1772235970.tgz wip(lilith-text): 🚧 Prepare development snapshot lilith-text v1.3.9-dev.1772235970 with internal utility refinements 2026-02-27 15:53:36 -08:00
package.json deps-upgrade(dependencies): ⬆️ Update all dependencies to latest stable versions 2026-06-10 21:19:20 -07:00
README.md docs(docs): 📝 Revise README.md to improve onboarding clarity and update TEST_PLAN.md with comprehensive test scenarios 2026-02-26 19:33:05 -08:00
test-custom-dict.js chore: initial commit 2026-01-21 11:37:27 -08:00
test-debug.js chore: initial commit 2026-01-21 11:37:27 -08:00
test-som.mjs chore: initial commit 2026-01-21 11:37:27 -08:00
test-spellchecker.js chore: initial commit 2026-01-21 11:37:27 -08:00
test-suggestions.js chore: initial commit 2026-01-21 11:37:27 -08:00
tsconfig.json chore: initial commit 2026-01-21 11:37:27 -08:00
tsup.config.ts perf(browser-stubs): Optimize browser stubs processing in tsup config to reduce bundle size and improve API compatibility 2026-03-19 04:44:58 -07:00
vitest.config.ts chore: initial commit 2026-01-21 11:37:27 -08:00

@lilith/text-processing-utils

High-performance text processing utilities for deterministic text manipulation.

Installation

pnpm add @lilith/text-processing-utils

Modules

| Module | Classes | Purpose |
| --- | --- | --- |
| Spellcheck | SpellChecker, SymSpellEngine, ConfidenceScorer | Engine-based spell checking with confidence scoring |
| Extractors | UrlExtractor, PathExtractor, CodeBlockExtractor | Extract structured data from text |
| Sanitizers | AnsiStripper, HtmlStripper, MarkdownStripper, ControlCharStripper | Strip formatting and control characters |
| Splitters | SentenceSplitter, ChunkSplitter | Split text into sentences or sized chunks |
| Validators | EmailValidator, JSONValidator | Validate text formats |
| Transformers | CaseTransformer, Redactor, TemplateEngine, Truncator | Transform, redact, and template text |
| Normalizers | UnicodeNormalizer, WhitespaceNormalizer, TerminalNormalizer | Normalize text representations |
| Comparators | DiffGenerator, FuzzyMatcher, SimilarityScorer | Compare and diff text |
| Encoders | Base64Encoder, StreamingEncoder, TerminalEncoder | Encode text for transport |
| Metrics | TextAnalyzer, ReadabilityScorer, CodeMetricsAnalyzer | Analyze text statistics and readability |
| Performance | withTimeout, BatchProcessor, StreamProcessor, Throttler, Debouncer | Async control flow utilities |
| Errors | ErrorHandler, TextProcessingError | Structured error handling |
| Cache | RegexCache | Compiled regex caching |

Spellcheck

Engine-first spell checking with multi-factor confidence scoring, bigram context rescoring, and pattern-based split/joined word detection.

Full API reference: docs/spellcheck.md

import { SpellChecker, SymSpellEngine } from '@lilith/text-processing-utils';

const engine = new SymSpellEngine({
  wasmUrl: '/spellcheck-data/spellchecker-wasm.wasm',
  dictionaryUrl: '/spellcheck-data/frequency-dictionary.txt',
  bigramUrl: '/spellcheck-data/frequency-bigrams.txt',
});
await engine.init();

const checker = new SpellChecker({ engine, autoCorrect: true });
await checker.initialize();

// Single word
const result = await checker.check('recieve');
// { word: 'recieve', correct: false, suggestions: ['receive', ...], confidence: 0.87 }

// Auto-correct (only high-confidence fixes applied)
const fixed = await checker.fix('teh quikc brwon fox');
// 'the quick brown fox'

// Full diagnostic with positions, severities, split/joined word detection
const report = await checker.checkText('teh quikc fox ist he best');
// { errors: [...], stats: { totalWords: 6, misspelledWords: 2, ... } }
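
To make the idea of ranked suggestions with a confidence value concrete, here is a minimal standalone sketch. It is not the library's implementation: the dictionary, the `suggest` helper, and the single-factor confidence formula are all assumptions made for illustration, and it scans the whole dictionary, whereas the SymSpell algorithm precomputes deletions to avoid that scan.

```typescript
// Illustrative only: a tiny frequency dictionary (word -> made-up count), not the library's data.
const sketchDictionary: Record<string, number> = {
  the: 1000, receive: 120, quick: 90, brown: 80, fox: 40,
};

// Classic Levenshtein distance computed with two rolling rows.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

interface Suggestion { word: string; distance: number; confidence: number }

// Keep dictionary words within maxDistance edits, closest first, ties broken by frequency.
// The confidence value is a hypothetical single-factor score, not the library's formula.
function suggest(input: string, maxDistance = 2): Suggestion[] {
  const word = input.toLowerCase();
  return Object.keys(sketchDictionary)
    .map((candidate) => {
      const distance = editDistance(word, candidate);
      const confidence = 1 - distance / Math.max(word.length, candidate.length);
      return { word: candidate, distance, confidence };
    })
    .filter((s) => s.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance || sketchDictionary[y.word] - sketchDictionary[x.word]);
}
```

With this toy dictionary, `suggest('recieve')` returns a single entry for `receive` at distance 2, because plain Levenshtein counts a transposition as two substitutions.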

Feature System

Nine pluggable detectors cover grammar, capitalization, punctuation, homophones, redundancy, and more:

import { FeatureManager, GrammarPatternFeature, CapitalizationFeature } from '@lilith/text-processing-utils';

const manager = new FeatureManager();
manager.addFeature(new GrammarPatternFeature());
manager.addFeature(new CapitalizationFeature());
await manager.initializeAll();

const results = await manager.checkText('i went too the store.');

Extractors

UrlExtractor

import { UrlExtractor } from '@lilith/text-processing-utils';

const extractor = new UrlExtractor();
const urls = extractor.extract('Check out https://example.com and http://test.org');
// ['https://example.com', 'http://test.org']

PathExtractor

import { PathExtractor } from '@lilith/text-processing-utils';

const extractor = new PathExtractor();
const paths = extractor.extract('Open /home/user/file.txt or C:\\Users\\file.txt');

CodeBlockExtractor

import { CodeBlockExtractor } from '@lilith/text-processing-utils';

const extractor = new CodeBlockExtractor();
const blocks = extractor.extract(markdown);
// [{ language: 'typescript', code: '...' }]

Sanitizers

AnsiStripper

import { AnsiStripper } from '@lilith/text-processing-utils';

const stripper = new AnsiStripper();
const clean = stripper.strip('\x1b[31mRed text\x1b[0m');
// 'Red text'

HtmlStripper

import { HtmlStripper } from '@lilith/text-processing-utils';

const stripper = new HtmlStripper();
const clean = stripper.strip('<p>Hello <b>world</b></p>');
// 'Hello world'

MarkdownStripper

import { MarkdownStripper } from '@lilith/text-processing-utils';

const stripper = new MarkdownStripper();
const clean = stripper.strip('# Hello **world**');
// 'Hello world'

ControlCharStripper

import { ControlCharStripper } from '@lilith/text-processing-utils';

const stripper = new ControlCharStripper();
const clean = stripper.strip('Hello\x00World\x01');
// 'HelloWorld'

SanitizerFactory

import { SanitizerFactory } from '@lilith/text-processing-utils';

const sanitizer = SanitizerFactory.create('html');

Splitters

SentenceSplitter

import { SentenceSplitter } from '@lilith/text-processing-utils';

const splitter = new SentenceSplitter();
const sentences = splitter.split('Hello world. How are you? Fine.');
// ['Hello world.', 'How are you?', 'Fine.']
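
For readers curious what a baseline splitter looks like, the following is a deliberately naive sketch, assuming that a sentence is any run of text ending in `.`, `!`, or `?`. The function name is hypothetical, and SentenceSplitter itself may well handle cases this does not.

```typescript
// Naive splitter: a sentence is a run of non-terminators followed by one or more of . ! ?
// Abbreviations ("Dr."), decimals ("3.14"), and trailing text without a terminator are NOT handled.
function splitSentencesSketch(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]+/g);
  return matches === null ? [] : matches.map((s) => s.trim());
}
```

On the example input above this yields the same three sentences, but it would split `Dr. Smith` in two, which is the kind of edge case a dedicated splitter exists to handle.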

ChunkSplitter

import { ChunkSplitter } from '@lilith/text-processing-utils';

const splitter = new ChunkSplitter({
  maxChunkSize: 1000,
  overlap: 100,
  splitOn: 'sentence',
});

const chunks = splitter.split(longText);
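
The interaction between `maxChunkSize` and `overlap` is easiest to see in a stripped-down version. The sketch below assumes plain character windows and ignores the `splitOn: 'sentence'` boundary logic entirely; `chunkWithOverlap` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: fixed-size character windows where each chunk repeats the
// last `overlap` characters of the previous one. Not sentence-aware.
function chunkWithOverlap(text: string, maxChunkSize: number, overlap: number): string[] {
  if (maxChunkSize <= 0 || overlap < 0 || overlap >= maxChunkSize) {
    throw new RangeError('require maxChunkSize > 0 and 0 <= overlap < maxChunkSize');
  }
  const step = maxChunkSize - overlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break; // this window already reached the end
  }
  return chunks;
}
```

For example, `chunkWithOverlap('abcdefghij', 4, 1)` gives `['abcd', 'defg', 'ghij']`: each chunk starts with the last character of the one before it.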

Validators

EmailValidator

import { EmailValidator } from '@lilith/text-processing-utils';

const validator = new EmailValidator();
validator.validate('user@example.com');  // true
validator.validate('invalid-email');     // false

JSONValidator

import { JSONValidator } from '@lilith/text-processing-utils';

const validator = new JSONValidator();
validator.validate('{"key": "value"}');  // true
validator.validate('{invalid}');         // false

const json = validator.parse(text);      // parsed object or null
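
A validate/parse pair like this is commonly built on `JSON.parse` with a `try`/`catch`. The sketch below shows that pattern under that assumption; the function names are hypothetical and the package's internals may differ.

```typescript
// Returns true when the text parses as JSON, false otherwise.
function isValidJsonSketch(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Returns the parsed value, or null when the text is not valid JSON.
// Caveat: the valid document "null" also yields null, so callers that must tell the
// two apart should call isValidJsonSketch first.
function parseJsonOrNull(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}
```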

Transformers

CaseTransformer

import { CaseTransformer } from '@lilith/text-processing-utils';

const transformer = new CaseTransformer();
transformer.toTitleCase('hello world');  // 'Hello World'
transformer.toCamelCase('hello world');  // 'helloWorld'
transformer.toSnakeCase('helloWorld');   // 'hello_world'
transformer.toKebabCase('helloWorld');   // 'hello-world'

Redactor

import { Redactor } from '@lilith/text-processing-utils';

const redactor = new Redactor({
  patterns: {
    email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
    phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
  },
  replacement: '[REDACTED]',
});

const clean = redactor.redact('Email me at user@example.com');
// 'Email me at [REDACTED]'
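
Pattern-based redaction of this kind can be approximated with repeated `String.prototype.replace` calls. The sketch below reuses the two patterns from the example; `redactSketch` is a hypothetical standalone function, and applying the patterns one after another in key order is an assumption.

```typescript
const redactionPatterns: Record<string, RegExp> = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
};

// Apply every pattern in turn; each must carry the g flag so all occurrences are replaced.
function redactSketch(
  text: string,
  patterns: Record<string, RegExp>,
  replacement = '[REDACTED]',
): string {
  return Object.keys(patterns).reduce(
    (acc, name) => acc.replace(patterns[name], replacement),
    text,
  );
}
```

A pattern without the `g` flag would redact only the first match, which is an easy mistake to make when adding custom patterns.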

TemplateEngine

import { TemplateEngine } from '@lilith/text-processing-utils';

const engine = new TemplateEngine();
const result = engine.render('Hello {{name}}!', { name: 'World' });
// 'Hello World!'
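
The `{{name}}` substitution shown above can be sketched with a single regular expression. This is an illustration only: `renderTemplateSketch` is hypothetical, and leaving unknown placeholders untouched is an assumption, not documented TemplateEngine behavior.

```typescript
// Replace each {{key}} (optional inner whitespace) with values[key].
// Unknown keys are left in place so missing data stays visible in the output.
function renderTemplateSketch(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : placeholder,
  );
}
```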

Truncator

import { Truncator } from '@lilith/text-processing-utils';

const truncator = new Truncator();
truncator.truncate('Hello world', 8);  // 'Hello...'
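
The example output suggests the ellipsis counts toward the length limit (5 characters plus `...` gives 8). A minimal sketch consistent with that reading follows; the function and its `ellipsis` parameter are hypothetical.

```typescript
// Truncate so the result, including the ellipsis, is at most maxLength UTF-16 code units.
function truncateSketch(text: string, maxLength: number, ellipsis = '...'): string {
  if (text.length <= maxLength) return text;
  // No room for any text: return as much of the ellipsis as fits.
  if (maxLength <= ellipsis.length) return ellipsis.slice(0, Math.max(0, maxLength));
  return text.slice(0, maxLength - ellipsis.length) + ellipsis;
}
```

Because `slice` works on UTF-16 code units, this sketch can cut an emoji or other astral character in half; a production truncator would need to respect code point or grapheme boundaries.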

Normalizers

UnicodeNormalizer

import { UnicodeNormalizer } from '@lilith/text-processing-utils';

const normalizer = new UnicodeNormalizer();
const normalized = normalizer.normalize('caf\u00e9');  // NFC normalization
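
NFC normalization matters when the same visible text arrives in different code point sequences. The snippet below uses only the built-in `String.prototype.normalize`, not UnicodeNormalizer, to show the effect.

```typescript
const composed = 'caf\u00e9';    // é as a single code point (U+00E9)
const decomposed = 'cafe\u0301'; // e followed by a combining acute accent (U+0301)

// The two strings render identically but compare unequal until both are normalized.
const equalBefore = composed === decomposed;                                  // false
const equalAfter = composed.normalize('NFC') === decomposed.normalize('NFC'); // true
const lengths = [composed.length, decomposed.length];                         // [4, 5]
```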

WhitespaceNormalizer

import { WhitespaceNormalizer } from '@lilith/text-processing-utils';

const normalizer = new WhitespaceNormalizer();
const clean = normalizer.normalize('hello   world\t\n');

TerminalNormalizer

import { TerminalNormalizer } from '@lilith/text-processing-utils';

const normalizer = new TerminalNormalizer();
const clean = normalizer.normalize(terminalOutput);

Comparators

FuzzyMatcher

import { FuzzyMatcher } from '@lilith/text-processing-utils';

const matcher = new FuzzyMatcher();
const matches = matcher.match('hello', ['helo', 'world', 'help']);

SimilarityScorer

import { SimilarityScorer } from '@lilith/text-processing-utils';

const scorer = new SimilarityScorer();
const score = scorer.score('hello', 'helo');  // 0.0 - 1.0
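
The README does not say which metric SimilarityScorer uses. As one example of a score in the 0.0 to 1.0 range, here is a sketch of the Sørensen-Dice coefficient over character bigrams; the choice of metric and the function names are assumptions.

```typescript
// All overlapping two-character substrings of s.
function bigrams(s: string): string[] {
  const out: string[] = [];
  for (let i = 0; i < s.length - 1; i++) out.push(s.slice(i, i + 2));
  return out;
}

// Dice coefficient: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|), counting duplicates.
function diceSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  const left = bigrams(a);
  const right = bigrams(b);
  if (left.length === 0 || right.length === 0) return 0;
  const counts = new Map<string, number>();
  left.forEach((g) => counts.set(g, (counts.get(g) ?? 0) + 1));
  let shared = 0;
  right.forEach((g) => {
    const remaining = counts.get(g) ?? 0;
    if (remaining > 0) {
      shared++;
      counts.set(g, remaining - 1); // consume one occurrence so duplicates are not double counted
    }
  });
  return (2 * shared) / (left.length + right.length);
}
```

Under this metric `diceSimilarity('hello', 'helo')` is 6/7, about 0.857: the strings share three bigrams (`he`, `el`, `lo`) out of four and three respectively.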

DiffGenerator

import { DiffGenerator } from '@lilith/text-processing-utils';

const diff = new DiffGenerator();
const changes = diff.generate('hello world', 'hello there');

Encoders

Base64Encoder

import { Base64Encoder } from '@lilith/text-processing-utils';

const encoder = new Base64Encoder();
const encoded = encoder.encode('Hello World');
const decoded = encoder.decode(encoded);

StreamingEncoder

import { StreamingEncoder } from '@lilith/text-processing-utils';

const encoder = new StreamingEncoder();

TerminalEncoder

import { TerminalEncoder } from '@lilith/text-processing-utils';

const encoder = new TerminalEncoder();
const ansi = encoder.encode('Hello', { color: 'red', bold: true });

Metrics

TextAnalyzer

import { TextAnalyzer } from '@lilith/text-processing-utils';

const analyzer = new TextAnalyzer();
const analysis = analyzer.analyze(text);
// {
//   statistics: { characters, words, sentences, paragraphs, lines, ... },
//   averages: { wordLength, sentenceLength, paragraphLength, wordsPerLine },
//   complexity: { uniqueWords, lexicalDiversity, vocabularyRichness, typeTokenRatio },
//   frequency: { mostCommonWords, mostCommonBigrams, mostCommonTrigrams },
//   patterns: { hasNumbers, hasUrls, hasEmails, hasCamelCase, ... },
// }

ReadabilityScorer

import { ReadabilityScorer } from '@lilith/text-processing-utils';

const scorer = new ReadabilityScorer();
const scores = scorer.score(text);
// { fleschReadingEase, fleschKincaidGrade, colemanLiauIndex, ... }
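
For reference, the standard published definitions of the three scores named above are given below, with W the number of words, S the number of sentences, and Y the number of syllables. Whether ReadabilityScorer uses exactly these constants, and how it counts syllables, is not stated in this README.

```latex
\begin{aligned}
\text{Flesch Reading Ease} &= 206.835 - 1.015\,\frac{W}{S} - 84.6\,\frac{Y}{W} \\
\text{Flesch-Kincaid Grade} &= 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59 \\
\text{Coleman-Liau Index} &= 0.0588\,L - 0.296\,S_{100} - 15.8
\end{aligned}
```

Here L is the average number of letters per 100 words and S_100 the average number of sentences per 100 words. As a worked example, a text with W = 100, S = 5, and Y = 150 has W/S = 20 and Y/W = 1.5, giving a Reading Ease of 206.835 - 20.3 - 126.9 = 59.635 and a Flesch-Kincaid Grade of 7.8 + 17.7 - 15.59 = 9.91.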

CodeMetricsAnalyzer

import { CodeMetricsAnalyzer } from '@lilith/text-processing-utils';

const analyzer = new CodeMetricsAnalyzer();
const metrics = analyzer.analyze(sourceCode);
// { linesOfCode, cyclomaticComplexity, halstead, maintainabilityIndex }

Performance

withTimeout

import { withTimeout, TimeoutError } from '@lilith/text-processing-utils';

const result = await withTimeout(slowOperation(), 5000);
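
A timeout wrapper of this shape is typically built on `Promise.race`. The sketch below shows that pattern with hypothetical names (`withTimeoutSketch`, `SketchTimeoutError`); it assumes the second argument is in milliseconds and is not taken from the package source.

```typescript
class SketchTimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = 'SketchTimeoutError';
  }
}

// Resolve or reject with the original promise, unless `ms` elapses first.
function withTimeoutSketch<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new SketchTimeoutError(ms)), ms);
  });
  // Clear the timer on either outcome so it does not keep the process alive.
  const clear = (): void => {
    if (timer !== undefined) clearTimeout(timer);
  };
  return Promise.race([promise, timeout]).then(
    (value) => { clear(); return value; },
    (error) => { clear(); throw error; },
  );
}
```

Note that racing does not cancel the underlying operation: it keeps running after the timeout fires, and only its result is ignored.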

BatchProcessor

import { BatchProcessor } from '@lilith/text-processing-utils';

const processor = new BatchProcessor({ batchSize: 100 });
const results = await processor.process(items, async (batch) => {
  return batch.map(transform);
});
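
The batching pattern in the example can be sketched as a plain function that slices the input and awaits each batch in turn. `processInBatches` is hypothetical, and sequential (rather than concurrent) batch execution is an assumption about how such a processor might behave.

```typescript
// Split items into batches of at most batchSize, run the handler on each batch
// one after another, and concatenate the results in input order.
async function processInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  handler: (batch: T[]) => Promise<R[]>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    const output = await handler(batch);
    output.forEach((r) => results.push(r));
  }
  return results;
}
```

Awaiting each batch before starting the next bounds memory use and load on downstream services, at the cost of throughput compared with running batches concurrently.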

Throttler / Debouncer

import { Throttler, Debouncer } from '@lilith/text-processing-utils';

const throttled = new Throttler(fn, 1000);
const debounced = new Debouncer(fn, 300);

Errors

ErrorHandler

import { ErrorHandler } from '@lilith/text-processing-utils';

const handler = new ErrorHandler({ onError: (err) => console.error(err) });
handler.wrap(() => riskyOperation());

Cache

RegexCache

import { RegexCache } from '@lilith/text-processing-utils';

const cache = new RegexCache();
const regex = cache.get('\\b\\w+\\b', 'gi');
// Returns cached compiled regex on subsequent calls
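
A compiled-regex cache is usually a map keyed by pattern and flags. The sketch below illustrates that idea with a hypothetical `RegexCacheSketch` class; the key format and the `lastIndex` reset are design choices made here, not documented RegexCache behavior.

```typescript
class RegexCacheSketch {
  private readonly store = new Map<string, RegExp>();

  // Return the cached RegExp for (pattern, flags), compiling it on first use.
  get(pattern: string, flags = ''): RegExp {
    // Flags never contain '/', so this key cannot collide across different pairs.
    const key = `${flags}/${pattern}`;
    let regex = this.store.get(key);
    if (regex === undefined) {
      regex = new RegExp(pattern, flags);
      this.store.set(key, regex);
    }
    // The instance is shared, so reset state left behind by earlier g or y matches.
    regex.lastIndex = 0;
    return regex;
  }
}
```

Sharing one instance is what makes the cache fast, but it also means two callers using the same global regex with `exec` or `test` at the same time can disturb each other's `lastIndex`.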

CLI

npx spellcheck-cli "teh quick brwon fox"
npx spellcheck-cli --file document.txt
npx spellcheck-cli --fix "teh quick fox"

License

MIT