@lilith/text-processing-utils
High-performance text processing utilities for deterministic text manipulation.
Features
- Extractors: URL, path, code block extraction
- Sanitizers: ANSI stripping, HTML cleaning
- Splitters: Sentence and chunk splitting
- Validators: Email, JSON, URL validation
- Transformers: Case conversion, truncation, redaction, templates
- Spellcheck: Full spell checking with auto-correction
- Performance: Timeout wrappers, complexity checking
- Caching: Regex caching for repeated patterns
Installation
pnpm add @lilith/text-processing-utils
Quick Start
import {
  UrlExtractor,
  SentenceSplitter,
  EmailValidator,
  SpellChecker,
} from '@lilith/text-processing-utils';
// Extract URLs
const extractor = new UrlExtractor();
const urls = extractor.extract('Visit https://example.com for more');
// Split sentences
const splitter = new SentenceSplitter();
const sentences = splitter.split('Hello world. How are you?');
// Validate email
const validator = new EmailValidator();
const isValid = validator.validate('user@example.com');
// Spellcheck
const checker = new SpellChecker();
const result = checker.check('teh quick brwon fox');
Extractors
UrlExtractor
Extract URLs from text:
import { UrlExtractor } from '@lilith/text-processing-utils';
const extractor = new UrlExtractor();
const urls = extractor.extract('Check out https://example.com and http://test.org');
// ['https://example.com', 'http://test.org']
PathExtractor
Extract file paths:
import { PathExtractor } from '@lilith/text-processing-utils';
const extractor = new PathExtractor();
const paths = extractor.extract('Open /home/user/file.txt or C:\\Users\\file.txt');
CodeBlockExtractor
Extract code blocks from markdown:
import { CodeBlockExtractor } from '@lilith/text-processing-utils';
const extractor = new CodeBlockExtractor();
const blocks = extractor.extract(markdown);
// [{ language: 'typescript', code: '...' }]
Sanitizers
AnsiStripper
Remove ANSI escape codes:
import { AnsiStripper } from '@lilith/text-processing-utils';
const stripper = new AnsiStripper();
const clean = stripper.strip('\x1b[31mRed text\x1b[0m');
// 'Red text'
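To illustrate what stripping involves, here is a minimal sketch that removes CSI escape sequences (the family that includes color codes) with a single regular expression. `stripAnsiSketch` is a hypothetical helper written for this example; it is not the package's implementation, and `AnsiStripper` may handle more sequence types.

```typescript
// Hypothetical helper, for illustration only.
// A CSI sequence is ESC [ followed by parameter bytes (0x30-0x3F),
// intermediate bytes (0x20-0x2F), and one final byte (0x40-0x7E).
function stripAnsiSketch(input: string): string {
  return input.replace(/\x1b\[[0-?]*[ -\/]*[@-~]/g, '');
}

console.log(stripAnsiSketch('\x1b[31mRed text\x1b[0m')); // 'Red text'
```

This sketch leaves other escape families (for example OSC sequences used for terminal titles) untouched.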
Splitters
SentenceSplitter
Split text into sentences:
import { SentenceSplitter } from '@lilith/text-processing-utils';
const splitter = new SentenceSplitter();
const sentences = splitter.split('Hello world. How are you? I am fine.');
// ['Hello world.', 'How are you?', 'I am fine.']
ChunkSplitter
Split text into chunks with configurable size:
import { ChunkSplitter } from '@lilith/text-processing-utils';
const splitter = new ChunkSplitter({
  maxChunkSize: 1000,
  overlap: 100,
  splitOn: 'sentence', // 'character' | 'word' | 'sentence' | 'paragraph'
});
const chunks = splitter.split(longText);
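The interaction between `maxChunkSize` and `overlap` can be sketched with plain character-based windows. `chunkWithOverlap` is a hypothetical helper, and it assumes that each chunk repeats the last `overlap` characters of the previous one; the package's `ChunkSplitter` may define overlap differently, especially when `splitOn` is not `'character'`.

```typescript
// Hypothetical helper, for illustration only: fixed-size character windows.
function chunkWithOverlap(text: string, maxChunkSize: number, overlap: number): string[] {
  if (maxChunkSize <= 0) throw new RangeError('maxChunkSize must be positive');
  if (overlap < 0 || overlap >= maxChunkSize) {
    throw new RangeError('overlap must be in [0, maxChunkSize)');
  }
  const result: string[] = [];
  const step = maxChunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    result.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break; // this window reached the end
  }
  return result;
}

console.log(chunkWithOverlap('abcdefghij', 4, 1)); // ['abcd', 'defg', 'ghij']
```

Requiring `overlap < maxChunkSize` keeps the step positive, so the loop always makes progress.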
Validators
EmailValidator
import { EmailValidator } from '@lilith/text-processing-utils';
const validator = new EmailValidator();
validator.validate('user@example.com'); // true
validator.validate('invalid-email'); // false
JSONValidator
import { JSONValidator } from '@lilith/text-processing-utils';
const validator = new JSONValidator();
validator.validate('{"key": "value"}'); // true
validator.validate('{invalid}'); // false
// Get parsed JSON or null
const json = validator.parse(text);
Transformers
CaseTransformer
Convert text case:
import { CaseTransformer } from '@lilith/text-processing-utils';
const transformer = new CaseTransformer();
transformer.toUpperCase('hello'); // 'HELLO'
transformer.toLowerCase('HELLO'); // 'hello'
transformer.toTitleCase('hello world'); // 'Hello World'
transformer.toCamelCase('hello world'); // 'helloWorld'
transformer.toSnakeCase('helloWorld'); // 'hello_world'
transformer.toKebabCase('helloWorld'); // 'hello-world'
Truncator
Truncate text with ellipsis:
import { Truncator } from '@lilith/text-processing-utils';
const truncator = new Truncator();
truncator.truncate('Hello world', 8); // 'Hello...'
truncator.truncate('Hello world', 8, { suffix: '…' }); // 'Hello w…'
Redactor
Redact sensitive information:
import { Redactor } from '@lilith/text-processing-utils';
const redactor = new Redactor({
  patterns: {
    email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
    phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
  },
  replacement: '[REDACTED]',
});
const clean = redactor.redact('Email me at user@example.com');
// 'Email me at [REDACTED]'
TemplateEngine
Simple template interpolation:
import { TemplateEngine } from '@lilith/text-processing-utils';
const engine = new TemplateEngine();
const result = engine.render('Hello {{name}}!', { name: 'World' });
// 'Hello World!'
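For readers curious how `{{name}}` interpolation can work, the following is a minimal sketch built on `String.prototype.replace`. `renderTemplate` is a hypothetical function, and leaving unknown placeholders untouched is an assumption of this sketch; the package's `TemplateEngine` may treat missing keys differently.

```typescript
// Hypothetical stand-in, for illustration only.
function renderTemplate(template: string, values: Record<string, string | number>): string {
  // Match {{ key }} with optional whitespace around the key.
  return template.replace(/\{\{\s*([A-Za-z_$][\w$]*)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match,
  );
}

console.log(renderTemplate('Hello {{name}}!', { name: 'World' })); // 'Hello World!'
console.log(renderTemplate('Hi {{missing}}', {})); // 'Hi {{missing}}'
```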
Spellcheck
SpellChecker
Full-featured spell checker:
import { SpellChecker } from '@lilith/text-processing-utils';
const checker = new SpellChecker({
  language: 'en',
  customDictionary: ['myword', 'anotherword'],
});
// Check text
const result = checker.check('teh quick brwon fox');
// {
//   errors: [
//     { word: 'teh', suggestions: ['the'], offset: 0 },
//     { word: 'brwon', suggestions: ['brown'], offset: 10 }
//   ]
// }
// Get suggestions
const suggestions = checker.suggest('teh');
// ['the', 'tea', 'ten', ...]
AutoCorrector
Automatic correction:
import { AutoCorrector } from '@lilith/text-processing-utils';
const corrector = new AutoCorrector({
  maxDistance: 2,
  minConfidence: 0.8,
});
const corrected = corrector.correct('teh quick brwon fox');
// 'the quick brown fox'
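A `maxDistance` option suggests an edit-distance threshold. Assuming the distance in question is Levenshtein distance (the README does not say which metric the package uses), candidate filtering could be sketched as follows. `editDistance` and the tiny `dictionary` are hypothetical names for this example.

```typescript
// Hypothetical helper, for illustration only: classic Levenshtein distance
// (insertions, deletions, substitutions), computed with two rolling rows.
function editDistance(a: string, b: string): number {
  // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

const dictionary = ['the', 'tea', 'ten', 'quick'];
const candidates = dictionary.filter((word) => editDistance('teh', word) <= 2);
console.log(candidates); // ['the', 'tea', 'ten']
```

Under plain Levenshtein distance, `'teh'` is 2 edits from `'the'` but only 1 from `'tea'`. A metric that counts an adjacent transposition as a single edit (Damerau-Levenshtein) would put `'the'` at distance 1 as well.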
ContextualCorrector
Context-aware correction using surrounding words:
import { ContextualCorrector } from '@lilith/text-processing-utils';
const corrector = new ContextualCorrector();
const corrected = corrector.correct('I went to teh store');
// Uses context to improve suggestions
SplitWordDetector
Detect and fix split words:
import { SplitWordDetector } from '@lilith/text-processing-utils';
const detector = new SplitWordDetector();
const fixed = detector.fix('some thing went wr ong');
// 'something went wrong'
Performance
withTimeout
Wrap operations with a timeout:
import { withTimeout, TimeoutError } from '@lilith/text-processing-utils';
const result = await withTimeout(
  slowOperation(),
  5000, // 5 second timeout
);
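A timeout wrapper of this shape is commonly built on `Promise.race`. The sketch below shows that pattern under assumed names: `withTimeoutSketch` and `SketchTimeoutError` are hypothetical, and the package's `withTimeout` and `TimeoutError` may behave differently.

```typescript
// Hypothetical names, for illustration only.
class SketchTimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = 'SketchTimeoutError';
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, SketchTimeoutError.prototype);
  }
}

function withTimeoutSketch<T>(operation: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new SketchTimeoutError(ms)), ms);
  });
  // Whichever promise settles first wins; clear the timer in both outcomes.
  return Promise.race([operation, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (error) => {
      clearTimeout(timer);
      throw error;
    },
  );
}

// Usage: a 200 ms operation raced against a 20 ms limit.
const slow = new Promise<string>((resolve) => setTimeout(() => resolve('done'), 200));
withTimeoutSketch(slow, 20).catch((error: unknown) => {
  if (error instanceof SketchTimeoutError) console.log(error.message);
});
```

Note that racing does not cancel the underlying operation; it only stops waiting for it.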
ComplexityChecker
Check text complexity:
import { ComplexityChecker } from '@lilith/text-processing-utils';
const checker = new ComplexityChecker();
const complexity = checker.analyze(text);
// {
//   wordCount: 150,
//   sentenceCount: 10,
//   avgWordsPerSentence: 15,
//   fleschReadingEase: 65,
//   gradeLevel: 8.5,
// }
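For reference, the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas can be written as below. Whether `ComplexityChecker` uses exactly these formulas, and how it counts syllables, is an assumption; the `TextCounts` shape and the syllable count in the example are made up for illustration.

```typescript
// Hypothetical input shape, for illustration only.
interface TextCounts {
  words: number;
  sentences: number;
  syllables: number;
}

// Flesch Reading Ease: higher scores mean easier text.
function fleschReadingEase({ words, sentences, syllables }: TextCounts): number {
  return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words);
}

// Flesch-Kincaid Grade Level: approximate US school grade.
function fleschKincaidGrade({ words, sentences, syllables }: TextCounts): number {
  return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
}

const counts: TextCounts = { words: 150, sentences: 10, syllables: 225 };
console.log(fleschReadingEase(counts).toFixed(2)); // '64.71'
console.log(fleschKincaidGrade(counts).toFixed(2)); // '7.96'
```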
Caching
RegexCache
Cache compiled regex patterns:
import { RegexCache } from '@lilith/text-processing-utils';
const cache = new RegexCache();
const regex = cache.get('\\b\\w+\\b', 'gi');
// Returns cached regex on subsequent calls
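The caching idea itself is small: key a `Map` on pattern and flags, and compile only on a miss. `RegexCacheSketch` below is a hypothetical illustration, not the package's `RegexCache`, which may add eviction or size limits.

```typescript
// Hypothetical class, for illustration only.
class RegexCacheSketch {
  private readonly cache = new Map<string, RegExp>();

  get(pattern: string, flags = ''): RegExp {
    // Flags never contain '/', so this key is unambiguous.
    const key = `${flags}/${pattern}`;
    let regex = this.cache.get(key);
    if (regex === undefined) {
      regex = new RegExp(pattern, flags);
      this.cache.set(key, regex);
    }
    return regex;
  }
}

const regexCache = new RegexCacheSketch();
const first = regexCache.get('\\b\\w+\\b', 'gi');
const second = regexCache.get('\\b\\w+\\b', 'gi');
console.log(first === second); // true
console.log('Hello world'.match(first)); // ['Hello', 'world']
```

One caveat with sharing a cached instance: regexes with the `g` or `y` flag keep state in `lastIndex`, so reset it to `0` before calling `test` or `exec` on a shared instance.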
CLI
Run the spell checker from the command line:
npx spellcheck-cli "teh quick brwon fox"
# Output: Errors found: 'teh' (suggestions: the), 'brwon' (suggestions: brown)
npx spellcheck-cli --file document.txt
npx spellcheck-cli --fix "teh quick fox"
# Output: the quick fox
Metrics
Text metrics and analytics:
import { TextMetrics } from '@lilith/text-processing-utils';
const metrics = new TextMetrics();
const stats = metrics.analyze(text);
// {
//   characters: 1000,
//   words: 200,
//   sentences: 15,
//   paragraphs: 5,
//   uniqueWords: 120,
//   avgWordLength: 4.5,
// }
License
MIT