@lilith/text-processing-utils
High-performance text processing utilities for deterministic text manipulation.
Installation
pnpm add @lilith/text-processing-utils
Modules
| Module | Classes | Purpose |
|---|---|---|
| Spellcheck | SpellChecker, SymSpellEngine, ConfidenceScorer | Engine-based spell checking with confidence scoring |
| Extractors | UrlExtractor, PathExtractor, CodeBlockExtractor | Extract structured data from text |
| Sanitizers | AnsiStripper, HtmlStripper, MarkdownStripper, ControlCharStripper | Strip formatting and control characters |
| Splitters | SentenceSplitter, ChunkSplitter | Split text into sentences or sized chunks |
| Validators | EmailValidator, JSONValidator | Validate text formats |
| Transformers | CaseTransformer, Redactor, TemplateEngine, Truncator | Transform, redact, and template text |
| Normalizers | UnicodeNormalizer, WhitespaceNormalizer, TerminalNormalizer | Normalize text representations |
| Comparators | DiffGenerator, FuzzyMatcher, SimilarityScorer | Compare and diff text |
| Encoders | Base64Encoder, StreamingEncoder, TerminalEncoder | Encode text for transport |
| Metrics | TextAnalyzer, ReadabilityScorer, CodeMetricsAnalyzer | Analyze text statistics and readability |
| Performance | withTimeout, BatchProcessor, StreamProcessor, Throttler, Debouncer | Async control flow utilities |
| Errors | ErrorHandler, TextProcessingError | Structured error handling |
| Cache | RegexCache | Compiled regex caching |
Spellcheck
Engine-first spell checking with multi-factor confidence scoring, bigram context rescoring, and pattern-based split/joined word detection.
Full API reference: docs/spellcheck.md
import { SpellChecker, SymSpellEngine } from '@lilith/text-processing-utils';
const engine = new SymSpellEngine({
wasmUrl: '/spellcheck-data/spellchecker-wasm.wasm',
dictionaryUrl: '/spellcheck-data/frequency-dictionary.txt',
bigramUrl: '/spellcheck-data/frequency-bigrams.txt',
});
await engine.init();
const checker = new SpellChecker({ engine, autoCorrect: true });
await checker.initialize();
// Single word
const result = await checker.check('recieve');
// { word: 'recieve', correct: false, suggestions: ['receive', ...], confidence: 0.87 }
// Auto-correct (only high-confidence fixes applied)
const fixed = await checker.fix('teh quikc brwon fox');
// 'the quick brown fox'
// Full diagnostic with positions, severities, split/joined word detection
const report = await checker.checkText('teh quikc fox ist he best');
// { errors: [...], stats: { totalWords: 6, misspelledWords: 2, ... } }
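SymSpellEngine is named after the Symmetric Delete algorithm: precompute delete-only variants of every dictionary word, then look up the delete variants of the input, so no insertions or substitutions have to be generated at query time. The sketch below shows that idea at maximum edit distance 1 with a hypothetical four-word frequency map. It is plain TypeScript, not the engine's WASM code, and the class and function names are illustrative only.

```typescript
// Minimal Symmetric Delete sketch (max edit distance 1).
// SymmetricDeleteIndex, deletes1, and osaDistance are hypothetical names.

/** The word itself plus every string obtained by deleting exactly one character. */
function deletes1(word: string): Set<string> {
  const out = new Set<string>([word]);
  for (let i = 0; i < word.length; i++) {
    out.add(word.slice(0, i) + word.slice(i + 1));
  }
  return out;
}

/** Optimal string alignment distance: Levenshtein plus adjacent transpositions. */
function osaDistance(a: string, b: string): number {
  const d: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // transposition
      }
    }
  }
  return d[a.length][b.length];
}

class SymmetricDeleteIndex {
  private readonly counts: Map<string, number>;
  private readonly index = new Map<string, Set<string>>(); // delete variant -> dictionary words

  constructor(counts: Map<string, number>) {
    this.counts = counts;
    for (const word of counts.keys()) {
      for (const variant of deletes1(word)) {
        let bucket = this.index.get(variant);
        if (bucket === undefined) {
          bucket = new Set<string>();
          this.index.set(variant, bucket);
        }
        bucket.add(word);
      }
    }
  }

  /** Suggestions within distance 1: closest first, then most frequent. */
  lookup(input: string): string[] {
    const candidates = new Set<string>();
    for (const variant of deletes1(input)) {
      for (const word of this.index.get(variant) ?? []) candidates.add(word);
    }
    return [...candidates]
      .map((word) => ({ word, dist: osaDistance(input, word) }))
      .filter((c) => c.dist <= 1) // shared deletes can still be 2 edits apart
      .sort((x, y) => x.dist - y.dist || this.counts.get(y.word)! - this.counts.get(x.word)!)
      .map((c) => c.word);
  }
}

const index = new SymmetricDeleteIndex(
  new Map([['the', 500], ['then', 80], ['tea', 30], ['receive', 40]]),
);
console.log(index.lookup('teh')); // ['the', 'tea']
console.log(index.lookup('recieve')); // ['receive']
```

Candidates found through a shared delete variant can still be two edits apart (`xab` and `aby` both reduce to `ab`), which is why the sketch re-checks each one with an edit-distance function before ranking by frequency.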
Feature System
Nine pluggable detectors cover grammar, capitalization, punctuation, homophones, redundancy, and more:
import { FeatureManager, GrammarPatternFeature, CapitalizationFeature } from '@lilith/text-processing-utils';
const manager = new FeatureManager();
manager.addFeature(new GrammarPatternFeature());
manager.addFeature(new CapitalizationFeature());
await manager.initializeAll();
const results = await manager.checkText('i went too the store.');
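The detector interface is not spelled out here, so the following is only a hypothetical sketch of the plug-in pattern: each detector implements one shared `check` method, and a runner merges their findings by position. `TextFeature`, `Finding`, `DoubledWordFeature`, and `runFeatures` are invented names, not exports of this package.

```typescript
// Hypothetical plug-in interface: every detector returns positioned findings.
interface Finding {
  feature: string;
  index: number; // character offset where the issue starts
  message: string;
}

interface TextFeature {
  readonly name: string;
  check(text: string): Finding[];
}

// Example detector: flags an immediately repeated word ("the the"), case-insensitively.
class DoubledWordFeature implements TextFeature {
  readonly name = 'doubled-word';

  check(text: string): Finding[] {
    const findings: Finding[] = [];
    for (const m of text.matchAll(/\b(\w+)\s+\1\b/gi)) {
      findings.push({ feature: this.name, index: m.index ?? 0, message: `Repeated word "${m[1]}"` });
    }
    return findings;
  }
}

// Runs every registered detector and merges the results in text order.
function runFeatures(features: TextFeature[], text: string): Finding[] {
  return features.flatMap((f) => f.check(text)).sort((a, b) => a.index - b.index);
}

const findings = runFeatures([new DoubledWordFeature()], 'I went to the the store.');
console.log(findings); // [{ feature: 'doubled-word', index: 10, message: 'Repeated word "the"' }]
```

Keeping detectors independent of each other means a new check only has to implement one method; ordering and merging stay in a single place.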
Extractors
UrlExtractor
import { UrlExtractor } from '@lilith/text-processing-utils';
const extractor = new UrlExtractor();
const urls = extractor.extract('Check out https://example.com and http://test.org');
// ['https://example.com', 'http://test.org']
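A naive way to extract URLs is a single regular expression plus cleanup of trailing punctuation. This sketch (hypothetical `extractUrls`, `http` and `https` only) is not UrlExtractor's actual rule set.

```typescript
// Match a scheme followed by a run of characters that are not whitespace,
// quotes, or angle brackets, then trim sentence punctuation from the end.
function extractUrls(text: string): string[] {
  const matches = text.match(/\bhttps?:\/\/[^\s<>"']+/g) ?? [];
  return matches.map((url) => url.replace(/[.,;:!?)\]]+$/, ''));
}

console.log(extractUrls('Check out https://example.com and http://test.org.'));
// ['https://example.com', 'http://test.org']
```

Trimming a trailing `)` keeps `(see https://example.com)` clean but truncates URLs that legitimately end in a parenthesis, a trade-off every regex-based extractor has to make one way or the other.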
PathExtractor
import { PathExtractor } from '@lilith/text-processing-utils';
const extractor = new PathExtractor();
const paths = extractor.extract('Open /home/user/file.txt or C:\\Users\\file.txt');
CodeBlockExtractor
import { CodeBlockExtractor } from '@lilith/text-processing-utils';
const extractor = new CodeBlockExtractor();
const blocks = extractor.extract(markdown);
// [{ language: 'typescript', code: '...' }]
Sanitizers
AnsiStripper
import { AnsiStripper } from '@lilith/text-processing-utils';
const stripper = new AnsiStripper();
const clean = stripper.strip('\x1b[31mRed text\x1b[0m');
// 'Red text'
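Terminal colors are encoded as ECMA-48 CSI sequences: `ESC [`, optional parameter bytes, optional intermediate bytes, and one final byte. A regular expression over those byte ranges removes them. This is an illustration (hypothetical `stripCsi`), not AnsiStripper's implementation, and it does not handle OSC sequences such as window titles or hyperlinks.

```typescript
// CSI = ESC '[' + parameter bytes (0x30-0x3F) + intermediate bytes (0x20-0x2F)
//       + one final byte (0x40-0x7E). SGR colors like \x1b[31m are one case.
const CSI = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

function stripCsi(text: string): string {
  return text.replace(CSI, '');
}

console.log(stripCsi('\x1b[31mRed text\x1b[0m')); // 'Red text'
console.log(stripCsi('\x1b[2K\x1b[1;32mOK\x1b[0m done')); // 'OK done'
```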
HtmlStripper
import { HtmlStripper } from '@lilith/text-processing-utils';
const stripper = new HtmlStripper();
const clean = stripper.strip('<p>Hello <b>world</b></p>');
// 'Hello world'
MarkdownStripper
import { MarkdownStripper } from '@lilith/text-processing-utils';
const stripper = new MarkdownStripper();
const clean = stripper.strip('# Hello **world**');
// 'Hello world'
ControlCharStripper
import { ControlCharStripper } from '@lilith/text-processing-utils';
const stripper = new ControlCharStripper();
const clean = stripper.strip('Hello\x00World\x01');
// 'HelloWorld'
SanitizerFactory
import { SanitizerFactory } from '@lilith/text-processing-utils';
const sanitizer = SanitizerFactory.create('html');
Splitters
SentenceSplitter
import { SentenceSplitter } from '@lilith/text-processing-utils';
const splitter = new SentenceSplitter();
const sentences = splitter.split('Hello world. How are you? Fine.');
// ['Hello world.', 'How are you?', 'Fine.']
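A minimal splitter can break after `.`, `!`, or `?` whenever whitespace follows, using a lookbehind so the punctuation stays attached to its sentence. This hypothetical `splitSentences` is only a baseline for comparison, not SentenceSplitter's algorithm.

```typescript
// Split on whitespace that follows terminal punctuation; the lookbehind keeps
// the punctuation in the preceding sentence.
function splitSentences(text: string): string[] {
  return text
    .trim()
    .split(/(?<=[.!?])\s+/)
    .filter((s) => s.length > 0);
}

console.log(splitSentences('Hello world. How are you? Fine.'));
// ['Hello world.', 'How are you?', 'Fine.']
```

The baseline breaks on abbreviations: `'Dr. Smith arrived.'` becomes `['Dr.', 'Smith arrived.']`, which is the kind of case a dedicated splitter exists to handle.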
ChunkSplitter
import { ChunkSplitter } from '@lilith/text-processing-utils';
const splitter = new ChunkSplitter({
maxChunkSize: 1000,
overlap: 100,
splitOn: 'sentence',
});
const chunks = splitter.split(longText);
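`maxChunkSize` and `overlap` describe a sliding window: consecutive chunks share `overlap` units so context is not lost at a boundary. The sketch below (hypothetical `chunkWithOverlap`) assumes sizes are measured in characters and ignores `splitOn`; ChunkSplitter's real unit and boundary handling may differ.

```typescript
// Fixed-size character windows. Each window starts (maxChunkSize - overlap)
// characters after the previous one, so neighbours share `overlap` characters.
function chunkWithOverlap(text: string, maxChunkSize: number, overlap: number): string[] {
  if (overlap >= maxChunkSize) {
    throw new RangeError('overlap must be smaller than maxChunkSize');
  }
  const step = maxChunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

console.log(chunkWithOverlap('abcdefghij', 4, 1)); // ['abcd', 'defg', 'ghij']
```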
Validators
EmailValidator
import { EmailValidator } from '@lilith/text-processing-utils';
const validator = new EmailValidator();
validator.validate('user@example.com'); // true
validator.validate('invalid-email'); // false
JSONValidator
import { JSONValidator } from '@lilith/text-processing-utils';
const validator = new JSONValidator();
validator.validate('{"key": "value"}'); // true
validator.validate('{invalid}'); // false
const json = validator.parse(text); // parsed object or null
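Validation by attempted parse is the simplest correct approach, because `JSON.parse` throws on anything outside the standard JSON grammar. The helper names below are hypothetical. Note that a parse-or-null helper cannot tell invalid input apart from the valid document `null`.

```typescript
// Valid JSON is exactly what JSON.parse accepts without throwing.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Returns the parsed value, or null on a syntax error.
// Ambiguous for the input 'null', which is valid JSON that also parses to null.
function parseJsonOrNull(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

console.log(isValidJson('{"key": "value"}')); // true
console.log(isValidJson('{invalid}')); // false
```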
Transformers
CaseTransformer
import { CaseTransformer } from '@lilith/text-processing-utils';
const transformer = new CaseTransformer();
transformer.toTitleCase('hello world'); // 'Hello World'
transformer.toCamelCase('hello world'); // 'helloWorld'
transformer.toSnakeCase('helloWorld'); // 'hello_world'
transformer.toKebabCase('helloWorld'); // 'hello-world'
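Case conversion usually reduces to two steps: split the input into lowercase words, then join them in the target convention. The helpers below are hypothetical and only illustrate that approach. They split on lowercase-to-uppercase boundaries, spaces, underscores, and hyphens, so an acronym such as `HTTPResponse` stays one word.

```typescript
// Break an identifier or phrase into lowercase words.
function words(input: string): string[] {
  return input
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2') // "helloWorld" -> "hello World"
    .split(/[\s_-]+/)
    .filter(Boolean)
    .map((w) => w.toLowerCase());
}

const toSnake = (s: string): string => words(s).join('_');
const toKebab = (s: string): string => words(s).join('-');
const toCamel = (s: string): string =>
  words(s)
    .map((w, i) => (i === 0 ? w : w[0].toUpperCase() + w.slice(1)))
    .join('');

console.log(toSnake('helloWorld')); // 'hello_world'
console.log(toKebab('helloWorld')); // 'hello-world'
console.log(toCamel('hello world')); // 'helloWorld'
```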
Redactor
import { Redactor } from '@lilith/text-processing-utils';
const redactor = new Redactor({
patterns: {
email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
},
replacement: '[REDACTED]',
});
const clean = redactor.redact('Email me at user@example.com');
// 'Email me at [REDACTED]'
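Pattern-based redaction amounts to running each global regex over the text in turn. The sketch assumes the same `patterns` and `replacement` shape as the Redactor options above; the `redact` function itself is hypothetical.

```typescript
// Replace every match of every pattern with one marker string.
// Patterns need the `g` flag so String.prototype.replace covers all matches.
function redact(text: string, patterns: Record<string, RegExp>, replacement = '[REDACTED]'): string {
  let out = text;
  for (const pattern of Object.values(patterns)) {
    out = out.replace(pattern, replacement);
  }
  return out;
}

const patterns = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
};

console.log(redact('Email me at user@example.com', patterns)); // 'Email me at [REDACTED]'
```

Patterns run in insertion order, so a later pattern sees earlier replacements; that is harmless here because `[REDACTED]` contains no digits or `@`.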
TemplateEngine
import { TemplateEngine } from '@lilith/text-processing-utils';
const engine = new TemplateEngine();
const result = engine.render('Hello {{name}}!', { name: 'World' });
// 'Hello World!'
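`{{name}}` placeholders can be filled with one global replace that looks each key up in the data object. This hypothetical `renderTemplate` leaves unknown placeholders untouched and performs no HTML escaping; TemplateEngine's own handling of missing keys is not documented here.

```typescript
// Substitute {{ key }} placeholders (optional inner spaces) from `data`.
function renderTemplate(template: string, data: Record<string, unknown>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : match,
  );
}

console.log(renderTemplate('Hello {{name}}!', { name: 'World' })); // 'Hello World!'
console.log(renderTemplate('Hi {{missing}}', {})); // 'Hi {{missing}}'
```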
Truncator
import { Truncator } from '@lilith/text-processing-utils';
const truncator = new Truncator();
truncator.truncate('Hello world', 8); // 'Hello...'
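The example above implies that the ellipsis counts toward the length limit (`'Hello...'` is exactly 8 characters). The hypothetical `truncateText` below reproduces that behavior. It measures length in UTF-16 code units, so it can split an emoji or other astral character.

```typescript
// The result never exceeds maxLength: the ellipsis is part of the budget.
function truncateText(text: string, maxLength: number, ellipsis = '...'): string {
  if (text.length <= maxLength) return text;
  if (maxLength <= ellipsis.length) return ellipsis.slice(0, maxLength);
  // trimEnd avoids results like "Hello ..." when the cut lands after a space.
  return text.slice(0, maxLength - ellipsis.length).trimEnd() + ellipsis;
}

console.log(truncateText('Hello world', 8)); // 'Hello...'
console.log(truncateText('Hello world', 20)); // 'Hello world'
```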
Normalizers
UnicodeNormalizer
import { UnicodeNormalizer } from '@lilith/text-processing-utils';
const normalizer = new UnicodeNormalizer();
const normalized = normalizer.normalize('cafe\u0301'); // NFC normalization: 'caf\u00e9' ('e' + U+0301 composed into one code point)
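Canonical equivalence is what NFC addresses: `é` can be stored as one code point (U+00E9) or as `e` followed by a combining acute accent (U+0301), and the two strings compare unequal until they are normalized. The built-in `String.prototype.normalize` shows the effect; whether UnicodeNormalizer delegates to it is not stated here.

```typescript
const composed = 'caf\u00e9'; // 'é' as a single code point, U+00E9
const decomposed = 'cafe\u0301'; // 'e' followed by COMBINING ACUTE ACCENT, U+0301

console.log(composed === decomposed); // false: different code point sequences
console.log(decomposed.normalize('NFC') === composed); // true
console.log(composed.normalize('NFD') === decomposed); // true
console.log(decomposed.length, decomposed.normalize('NFC').length); // 5 4
```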
WhitespaceNormalizer
import { WhitespaceNormalizer } from '@lilith/text-processing-utils';
const normalizer = new WhitespaceNormalizer();
const clean = normalizer.normalize('hello world\t\n');
TerminalNormalizer
import { TerminalNormalizer } from '@lilith/text-processing-utils';
const normalizer = new TerminalNormalizer();
const clean = normalizer.normalize(terminalOutput);
Comparators
FuzzyMatcher
import { FuzzyMatcher } from '@lilith/text-processing-utils';
const matcher = new FuzzyMatcher();
const matches = matcher.match('hello', ['helo', 'world', 'help']);
SimilarityScorer
import { SimilarityScorer } from '@lilith/text-processing-utils';
const scorer = new SimilarityScorer();
const score = scorer.score('hello', 'helo'); // a value between 0.0 and 1.0
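One common definition of a 0-to-1 similarity is normalized Levenshtein distance, `1 - distance / longerLength`. Whether SimilarityScorer uses this metric is an assumption; the sketch only illustrates it, with hypothetical helper names.

```typescript
// Classic dynamic-programming edit distance, keeping only the previous row.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// 1 for identical strings, 0 when every position must change.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

console.log(similarity('hello', 'helo')); // 0.8 (one deletion over five characters)
```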
DiffGenerator
import { DiffGenerator } from '@lilith/text-processing-utils';
const diff = new DiffGenerator();
const changes = diff.generate('hello world', 'hello there');
Encoders
Base64Encoder
import { Base64Encoder } from '@lilith/text-processing-utils';
const encoder = new Base64Encoder();
const encoded = encoder.encode('Hello World');
const decoded = encoder.decode(encoded);
StreamingEncoder
import { StreamingEncoder } from '@lilith/text-processing-utils';
const encoder = new StreamingEncoder();
TerminalEncoder
import { TerminalEncoder } from '@lilith/text-processing-utils';
const encoder = new TerminalEncoder();
const ansi = encoder.encode('Hello', { color: 'red', bold: true });
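Terminal styling uses SGR (Select Graphic Rendition) escape sequences from ECMA-48: `1` for bold, `30` to `37` for the basic foreground colors, and `0` to reset. This hypothetical `sgr` helper shows how options like those above could map to codes; TerminalEncoder's exact output is not documented here.

```typescript
// Basic foreground color codes (ECMA-48 SGR 30 to 37).
const FG: Record<string, number> = {
  black: 30, red: 31, green: 32, yellow: 33, blue: 34, magenta: 35, cyan: 36, white: 37,
};

function sgr(text: string, opts: { color?: string; bold?: boolean } = {}): string {
  const codes: number[] = [];
  if (opts.bold) codes.push(1);
  // hasOwnProperty guard so names like 'toString' are not treated as colors.
  if (opts.color !== undefined && Object.prototype.hasOwnProperty.call(FG, opts.color)) {
    codes.push(FG[opts.color]);
  }
  if (codes.length === 0) return text;
  return `\x1b[${codes.join(';')}m${text}\x1b[0m`; // ESC [ 0 m resets all attributes
}

console.log(JSON.stringify(sgr('Hello', { color: 'red', bold: true }))); // "\u001b[1;31mHello\u001b[0m"
```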
Metrics
TextAnalyzer
import { TextAnalyzer } from '@lilith/text-processing-utils';
const analyzer = new TextAnalyzer();
const analysis = analyzer.analyze(text);
// {
// statistics: { characters, words, sentences, paragraphs, lines, ... },
// averages: { wordLength, sentenceLength, paragraphLength, wordsPerLine },
// complexity: { uniqueWords, lexicalDiversity, vocabularyRichness, typeTokenRatio },
// frequency: { mostCommonWords, mostCommonBigrams, mostCommonTrigrams },
// patterns: { hasNumbers, hasUrls, hasEmails, hasCamelCase, ... },
// }
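`typeTokenRatio` in the result above is conventionally the number of distinct word forms divided by the total number of words. The hypothetical sketch below assumes a simple lowercase ASCII tokenizer, which TextAnalyzer may not use.

```typescript
// Distinct tokens / total tokens; 0 for text without words.
function typeTokenRatio(text: string): number {
  const tokens = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  if (tokens.length === 0) return 0;
  return new Set(tokens).size / tokens.length;
}

console.log(typeTokenRatio('the cat saw the dog')); // 0.8 (4 distinct / 5 total)
```

The ratio falls as texts get longer because common words keep repeating, so it is most meaningful when comparing samples of similar length.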
ReadabilityScorer
import { ReadabilityScorer } from '@lilith/text-processing-utils';
const scorer = new ReadabilityScorer();
const scores = scorer.score(text);
// { fleschReadingEase, fleschKincaidGrade, colemanLiauIndex, ... }
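The two Flesch scores are fixed formulas over word, sentence, and syllable counts. The sketch takes the counts as inputs because syllable counting is a heuristic on which implementations differ; the counts in the usage lines are made up for illustration.

```typescript
interface ReadabilityCounts {
  words: number;
  sentences: number;
  syllables: number;
}

// Flesch Reading Ease: higher means easier (roughly 0 to 100 for ordinary prose).
function fleschReadingEase({ words, sentences, syllables }: ReadabilityCounts): number {
  return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words);
}

// Flesch-Kincaid Grade Level: approximate US school grade.
function fleschKincaidGrade({ words, sentences, syllables }: ReadabilityCounts): number {
  return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
}

const counts = { words: 100, sentences: 5, syllables: 150 };
console.log(fleschReadingEase(counts).toFixed(1)); // '59.6'
console.log(fleschKincaidGrade(counts).toFixed(1)); // '9.9'
```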
CodeMetricsAnalyzer
import { CodeMetricsAnalyzer } from '@lilith/text-processing-utils';
const analyzer = new CodeMetricsAnalyzer();
const metrics = analyzer.analyze(sourceCode);
// { linesOfCode, cyclomaticComplexity, halstead, maintainabilityIndex }
Performance
withTimeout
import { withTimeout, TimeoutError } from '@lilith/text-processing-utils';
const result = await withTimeout(slowOperation(), 5000);
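A timeout wrapper is typically a `Promise.race` between the work and a timer that rejects. This standalone sketch uses hypothetical names (`withDeadline`, `DeadlineExceededError`) rather than the package's exports. It clears the timer once either side settles, and it does not cancel the underlying work.

```typescript
class DeadlineExceededError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = 'DeadlineExceededError';
  }
}

// Race the work against a timer; clear the timer either way so it cannot
// keep the process alive after the work settles.
function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new DeadlineExceededError(ms)), ms);
  });
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

const slow = new Promise<string>((resolve) => setTimeout(() => resolve('done'), 200));
withDeadline(slow, 50).catch((err) => console.log(err instanceof DeadlineExceededError)); // true
```

To actually stop the slow operation, it would need to accept a cancellation signal (for example an `AbortSignal`) that the timer triggers.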
BatchProcessor
import { BatchProcessor } from '@lilith/text-processing-utils';
const processor = new BatchProcessor({ batchSize: 100 });
const results = await processor.process(items, async (batch) => {
return batch.map(transform);
});
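Batching splits the input into fixed-size slices and hands each slice to an async handler. The hypothetical `processInBatches` below runs batches one after another and preserves input order; whether BatchProcessor runs batches sequentially or concurrently is not stated here.

```typescript
async function processInBatches<T, R>(
  items: readonly T[],
  batchSize: number,
  handler: (batch: T[]) => Promise<R[]>,
): Promise<R[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    // Await each batch before starting the next: at most one batch in flight.
    const batchResults = await handler(items.slice(start, start + batchSize));
    results.push(...batchResults);
  }
  return results;
}

processInBatches([1, 2, 3, 4, 5], 2, async (batch) => batch.map((n) => n * 10)).then((out) =>
  console.log(out), // [10, 20, 30, 40, 50]
);
```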
Throttler / Debouncer
import { Throttler, Debouncer } from '@lilith/text-processing-utils';
const throttled = new Throttler(fn, 1000);
const debounced = new Debouncer(fn, 300);
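A debouncer delays a call until a quiet period has passed, so a burst of calls produces a single invocation with the last arguments; a throttler instead allows at most one call per interval. The hypothetical `debounce` function below implements only the debouncing half.

```typescript
// Postpone `fn` until `waitMs` ms pass with no new calls; use the last arguments.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer); // restart the quiet period
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

const seen: string[] = [];
const onInput = debounce((query: string) => {
  seen.push(query);
}, 20);
onInput('t');
onInput('te');
onInput('teh');
setTimeout(() => console.log(seen), 50); // ['teh']
```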
Errors
ErrorHandler
import { ErrorHandler } from '@lilith/text-processing-utils';
const handler = new ErrorHandler({ onError: (err) => console.error(err) });
handler.wrap(() => riskyOperation());
Cache
RegexCache
import { RegexCache } from '@lilith/text-processing-utils';
const cache = new RegexCache();
const regex = cache.get('\\b\\w+\\b', 'gi');
// Returns cached compiled regex on subsequent calls
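A regex cache can be a map keyed by pattern and flags. One subtlety: a cached `g` or `y` regex carries `lastIndex` between callers, so the hypothetical `RegexMemo` below resets it on every lookup. RegexCache's own handling of this is not documented here.

```typescript
class RegexMemo {
  private readonly cache = new Map<string, RegExp>();

  get(pattern: string, flags = ''): RegExp {
    // Flags are letters only, so "flags/pattern" cannot collide across entries.
    const key = `${flags}/${pattern}`;
    let re = this.cache.get(key);
    if (re === undefined) {
      re = new RegExp(pattern, flags);
      this.cache.set(key, re);
    }
    re.lastIndex = 0; // shared stateful regexes must not leak a previous position
    return re;
  }
}

const memo = new RegexMemo();
const wordRe = memo.get('\\b\\w+\\b', 'gi');
console.log(wordRe === memo.get('\\b\\w+\\b', 'gi')); // true
console.log('one two three'.match(wordRe)); // ['one', 'two', 'three']
```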
CLI
npx spellcheck-cli "teh quick brwon fox"
npx spellcheck-cli --file document.txt
npx spellcheck-cli --fix "teh quick fox"
License
MIT