
@lilith/text-processing-utils

High-performance text processing utilities for deterministic text manipulation.

Features

Extractors: URL, path, code block extraction
Sanitizers: ANSI stripping, HTML cleaning
Splitters: Sentence and chunk splitting
Validators: Email, JSON, URL validation
Transformers: Case conversion, truncation, redaction, templates
Spellcheck: Full spell checking with auto-correction
Performance: Timeout wrappers, complexity checking
Caching: Regex caching for repeated patterns

Installation

pnpm add @lilith/text-processing-utils

Quick Start

import {
  UrlExtractor,
  SentenceSplitter,
  EmailValidator,
  SpellChecker,
} from '@lilith/text-processing-utils';

// Extract URLs
const extractor = new UrlExtractor();
const urls = extractor.extract('Visit https://example.com for more');

// Split sentences
const splitter = new SentenceSplitter();
const sentences = splitter.split('Hello world. How are you?');

// Validate email
const validator = new EmailValidator();
const isValid = validator.validate('user@example.com');

// Spellcheck
const checker = new SpellChecker();
const result = checker.check('teh quick brwon fox');

Extractors

UrlExtractor

Extract URLs from text:

import { UrlExtractor } from '@lilith/text-processing-utils';

const extractor = new UrlExtractor();
const urls = extractor.extract('Check out https://example.com and http://test.org');
// ['https://example.com', 'http://test.org']
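For readers curious how this kind of URL extraction typically works, here is a minimal regex-based sketch. It is not the package's implementation. `extractUrls` and `URL_PATTERN` are hypothetical names, and the pattern only recognizes `http://` and `https://` links. It also trims sentence punctuation that directly follows a link.

```typescript
// Hypothetical sketch, not @lilith/text-processing-utils internals.
// Matches http(s) URLs up to the next whitespace, quote, or angle bracket.
const URL_PATTERN = /\bhttps?:\/\/[^\s<>"']+/g;

function extractUrls(text: string): string[] {
  const matches: string[] = text.match(URL_PATTERN) ?? [];
  // Punctuation that ends a sentence is usually not part of the URL itself
  return matches.map((url) => url.replace(/[.,;:!?)\]]+$/, ''));
}

const urls = extractUrls('Check out https://example.com and http://test.org.');
// urls: ['https://example.com', 'http://test.org']
```

Trimming a trailing `)` is a trade-off. It handles links wrapped in parentheses, but it breaks URLs that legitimately end in one.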

PathExtractor

Extract file paths:

import { PathExtractor } from '@lilith/text-processing-utils';

const extractor = new PathExtractor();
const paths = extractor.extract('Open /home/user/file.txt or C:\\Users\\file.txt');

CodeBlockExtractor

Extract code blocks from markdown:

import { CodeBlockExtractor } from '@lilith/text-processing-utils';

const extractor = new CodeBlockExtractor();
const blocks = extractor.extract(markdown);
// [{ language: 'typescript', code: '...' }]

Sanitizers

AnsiStripper

Remove ANSI escape codes:

import { AnsiStripper } from '@lilith/text-processing-utils';

const stripper = new AnsiStripper();
const clean = stripper.strip('\x1b[31mRed text\x1b[0m');
// 'Red text'
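Most ANSI color and cursor codes are CSI sequences. Each one consists of `ESC [`, optional parameter bytes, optional intermediate bytes, and one final byte. The sketch below strips them with a single regex. `stripAnsi` is a hypothetical name. The pattern deliberately ignores other escape types such as OSC hyperlinks, which a complete stripper would also handle.

```typescript
// Hypothetical sketch: removes ANSI CSI sequences only (ESC [ params intermediates final).
// Parameter bytes are 0x30-0x3F, intermediate bytes 0x20-0x2F, final byte 0x40-0x7E.
const CSI_PATTERN = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

function stripAnsi(text: string): string {
  return text.replace(CSI_PATTERN, '');
}

const plain = stripAnsi('\x1b[31mRed text\x1b[0m'); // 'Red text'
```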

Splitters

SentenceSplitter

Split text into sentences:

import { SentenceSplitter } from '@lilith/text-processing-utils';

const splitter = new SentenceSplitter();
const sentences = splitter.split('Hello world. How are you? I am fine.');
// ['Hello world.', 'How are you?', 'I am fine.']
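A naive splitter breaks after `.`, `!`, or `?` whenever whitespace follows. The sketch below shows that baseline using a hypothetical `naiveSplitSentences`, which is not the library's algorithm. It also shows why the baseline is not enough on its own: abbreviations such as "Dr." end in a period too, and that is the kind of case a dedicated splitter exists to handle.

```typescript
// Hypothetical baseline: split wherever terminal punctuation is followed by whitespace.
function naiveSplitSentences(text: string): string[] {
  return text
    .trim()
    .split(/(?<=[.!?])\s+/) // lookbehind keeps the punctuation on the sentence
    .filter((s) => s.length > 0);
}

naiveSplitSentences('Hello world. How are you? I am fine.');
// ['Hello world.', 'How are you?', 'I am fine.']

naiveSplitSentences('Ask Dr. Smith.');
// ['Ask Dr.', 'Smith.']  (wrong: the abbreviation is treated as a sentence end)
```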

ChunkSplitter

Split text into chunks with configurable size:

import { ChunkSplitter } from '@lilith/text-processing-utils';

const splitter = new ChunkSplitter({
  maxChunkSize: 1000,
  overlap: 100,
  splitOn: 'sentence', // 'character' | 'word' | 'sentence' | 'paragraph'
});

const chunks = splitter.split(longText);
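The interaction between `maxChunkSize` and `overlap` is easiest to see with plain characters. The sketch below uses a hypothetical `chunkByCharacters` helper, which is not the package's sentence-aware splitter. Each chunk starts `maxChunkSize - overlap` characters after the previous one, so neighboring chunks share `overlap` characters of context.

```typescript
// Hypothetical sketch: fixed-size character chunks with overlap.
function chunkByCharacters(text: string, maxChunkSize: number, overlap: number): string[] {
  if (overlap < 0 || overlap >= maxChunkSize) {
    throw new RangeError('overlap must be >= 0 and smaller than maxChunkSize');
  }
  const step = maxChunkSize - overlap; // distance between chunk start positions
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break; // this chunk already reached the end
  }
  return chunks;
}

chunkByCharacters('abcdefghij', 4, 1); // ['abcd', 'defg', 'ghij']
```

The guard on `overlap` matters. If `overlap` equaled `maxChunkSize`, the step would be zero and the loop would never advance.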

Validators

EmailValidator

import { EmailValidator } from '@lilith/text-processing-utils';

const validator = new EmailValidator();
validator.validate('user@example.com');  // true
validator.validate('invalid-email');     // false

JSONValidator

import { JSONValidator } from '@lilith/text-processing-utils';

const validator = new JSONValidator();
validator.validate('{"key": "value"}');  // true
validator.validate('{invalid}');         // false

// Get parsed JSON or null
const json = validator.parse(text);
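A `parse` that returns `null` on failure is usually a `JSON.parse` call inside `try`/`catch`. The hypothetical `parseJsonOrNull` below sketches that pattern. It also exposes a subtlety that applies to any API of this shape: the valid JSON text `null` parses to `null` as well. A `null` result alone therefore cannot tell "invalid" apart from "the document was literally null". Check validity separately when that difference matters.

```typescript
// Hypothetical sketch of a "parse or null" helper built on JSON.parse.
function parseJsonOrNull(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return null; // SyntaxError: not valid JSON
  }
}

parseJsonOrNull('{"key": "value"}'); // { key: 'value' }
parseJsonOrNull('{invalid}');        // null (invalid JSON)
parseJsonOrNull('null');             // null (valid JSON!)
```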

Transformers

CaseTransformer

Convert text case:

import { CaseTransformer } from '@lilith/text-processing-utils';

const transformer = new CaseTransformer();
transformer.toUpperCase('hello');     // 'HELLO'
transformer.toLowerCase('HELLO');     // 'hello'
transformer.toTitleCase('hello world'); // 'Hello World'
transformer.toCamelCase('hello world'); // 'helloWorld'
transformer.toSnakeCase('helloWorld');  // 'hello_world'
transformer.toKebabCase('helloWorld');  // 'hello-world'
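Conversions such as `toSnakeCase('helloWorld')` have to find word boundaries in both spaced and camelCase input. One common approach is sketched below with hypothetical helper names, and there is no claim that `CaseTransformer` works this way. It inserts a break at each lowercase-to-uppercase transition, splits on spaces, underscores, and hyphens, and then rejoins the words.

```typescript
// Hypothetical sketch: normalize any input into lowercase words, then rejoin.
function splitWords(input: string): string[] {
  return input
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2') // 'helloWorld' -> 'hello World'
    .split(/[\s_-]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.toLowerCase());
}

const capitalize = (word: string): string => word.charAt(0).toUpperCase() + word.slice(1);

const toSnake = (s: string): string => splitWords(s).join('_');
const toKebab = (s: string): string => splitWords(s).join('-');
const toCamel = (s: string): string =>
  splitWords(s).map((w, i) => (i === 0 ? w : capitalize(w))).join('');

toSnake('helloWorld');  // 'hello_world'
toKebab('helloWorld');  // 'hello-world'
toCamel('hello world'); // 'helloWorld'
```

Acronyms are where simple rules like this break down. `toSnake('parseHTTPResponse')` yields `'parse_httpresponse'`, which is why libraries often add a second rule for runs of uppercase letters.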

Truncator

Truncate text with ellipsis:

import { Truncator } from '@lilith/text-processing-utils';

const truncator = new Truncator();
truncator.truncate('Hello world', 8);  // 'Hello...'
truncator.truncate('Hello world', 8, { suffix: '…' }); // 'Hello w…'
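The first example implies that the length limit includes the suffix, because `'Hello...'` is exactly 8 characters. The sketch below works under that assumption, using a hypothetical `truncateTo` rather than the package source. It keeps `maxLength - suffix.length` characters of the original text and then appends the suffix.

```typescript
// Hypothetical sketch: the result, suffix included, fits in maxLength
// (assuming maxLength >= suffix.length).
function truncateTo(text: string, maxLength: number, suffix = '...'): string {
  if (text.length <= maxLength) return text; // short enough, leave untouched
  const keep = Math.max(0, maxLength - suffix.length);
  return text.slice(0, keep) + suffix;
}

truncateTo('Hello world', 8);  // 'Hello...'
truncateTo('Hello world', 20); // 'Hello world'
```

`slice` counts UTF-16 code units, so a production version would also avoid cutting an emoji or other surrogate pair in half.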

Redactor

Redact sensitive information:

import { Redactor } from '@lilith/text-processing-utils';

const redactor = new Redactor({
  patterns: {
    email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
    phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
  },
  replacement: '[REDACTED]',
});

const clean = redactor.redact('Email me at user@example.com');
// 'Email me at [REDACTED]'
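Redaction with several named patterns amounts to applying each pattern in turn. The hypothetical `redactAll` below sketches this and assumes, as in the configuration above, that every pattern carries the `g` flag. `String.prototype.replace` replaces only the first match of a non-global regex, so a missing `g` would silently leave later occurrences unredacted.

```typescript
// Hypothetical sketch: apply every pattern, in insertion order, to the running result.
function redactAll(
  text: string,
  patterns: Record<string, RegExp>,
  replacement = '[REDACTED]',
): string {
  let result = text;
  for (const pattern of Object.values(patterns)) {
    // Non-global patterns would only replace their first match here
    result = result.replace(pattern, replacement);
  }
  return result;
}

redactAll('Mail user@example.com or call 555-123-4567', {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
});
// 'Mail [REDACTED] or call [REDACTED]'
```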

TemplateEngine

Simple template interpolation:

import { TemplateEngine } from '@lilith/text-processing-utils';

const engine = new TemplateEngine();
const result = engine.render('Hello {{name}}!', { name: 'World' });
// 'Hello World!'
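`{{name}}`-style interpolation is typically a single regex replace with a callback. The sketch below uses a hypothetical `renderTemplate`, and the package may differ, for example in how it treats missing keys or whitespace inside the braces. This version tolerates spaces around the key and leaves unknown placeholders untouched instead of printing `undefined`.

```typescript
// Hypothetical sketch: replace {{key}} (optionally {{ key }}) with data[key].
function renderTemplate(template: string, data: Record<string, unknown>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder: string, key: string) =>
    // Only own properties count, so '{{toString}}' is not filled from Object.prototype
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : placeholder,
  );
}

renderTemplate('Hello {{name}}!', { name: 'World' }); // 'Hello World!'
renderTemplate('Hi {{ user }}', {});                  // 'Hi {{ user }}' (unknown key kept)
```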

Spellcheck

SpellChecker

Full-featured spell checker:

import { SpellChecker } from '@lilith/text-processing-utils';

const checker = new SpellChecker({
  language: 'en',
  customDictionary: ['myword', 'anotherword'],
});

// Check text
const result = checker.check('teh quick brwon fox');
// {
//   errors: [
//     { word: 'teh', suggestions: ['the'], offset: 0 },
//     { word: 'brwon', suggestions: ['brown'], offset: 10 }
//   ]
// }

// Get suggestions
const suggestions = checker.suggest('teh');
// ['the', 'tea', 'ten', ...]
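Spelling suggestions are commonly ranked by edit distance. Plain Levenshtein distance scores the transposition in `teh` -> `the` as two edits. The sketch below therefore uses the optimal string alignment variant of Damerau-Levenshtein, which counts a swap of two adjacent letters as one edit. `osaDistance` and `suggestFrom` are hypothetical names, and the tiny dictionary is illustrative. Nothing here documents how `SpellChecker` actually ranks candidates.

```typescript
// Hypothetical sketch: optimal string alignment (restricted Damerau-Levenshtein) distance.
function osaDistance(a: string, b: string): number {
  // d[i][j] = edits needed to turn the first i chars of a into the first j chars of b
  const d: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + cost, // substitution (or match)
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent transposition
      }
    }
  }
  return d[a.length][b.length];
}

// Rank dictionary words within maxDistance; ties keep dictionary order (sort is stable).
function suggestFrom(word: string, dictionary: string[], maxDistance = 2): string[] {
  return dictionary
    .map((candidate) => ({ candidate, distance: osaDistance(word, candidate) }))
    .filter(({ distance }) => distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .map(({ candidate }) => candidate);
}

osaDistance('teh', 'the'); // 1 (plain Levenshtein would give 2)
suggestFrom('teh', ['the', 'tea', 'ten', 'house'], 1); // ['the', 'tea', 'ten']
```

Real checkers usually break ties with word frequency, which a bare distance cannot capture.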

AutoCorrector

Automatic correction:

import { AutoCorrector } from '@lilith/text-processing-utils';

const corrector = new AutoCorrector({
  maxDistance: 2,
  minConfidence: 0.8,
});

const corrected = corrector.correct('teh quick brwon fox');
// 'the quick brown fox'

ContextualCorrector

Context-aware correction using surrounding words:

import { ContextualCorrector } from '@lilith/text-processing-utils';

const corrector = new ContextualCorrector();
const corrected = corrector.correct('I went to teh store');
// Uses context to improve suggestions

SplitWordDetector

Detect and fix split words:

import { SplitWordDetector } from '@lilith/text-processing-utils';

const detector = new SplitWordDetector();
const fixed = detector.fix('some thing went wr ong');
// 'something went wrong'
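A simple way to repair split words is to check whether two adjacent tokens form a dictionary word when joined. The greedy sketch below uses a hypothetical `mergeSplitWords` helper and a caller-supplied word set, and it is not the detector's actual algorithm. It also shows the main hazard of the approach: a valid pair such as "a part" is merged into "apart" whenever both forms are in the dictionary. A real detector therefore needs extra evidence, such as frequency or context, before joining.

```typescript
// Hypothetical greedy sketch: join a token with its right neighbor when the
// concatenation is a known word; otherwise keep the token as-is.
function mergeSplitWords(text: string, dictionary: Set<string>): string {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  const output: string[] = [];
  for (let i = 0; i < tokens.length; i++) {
    const next = tokens[i + 1];
    if (next !== undefined && dictionary.has((tokens[i] + next).toLowerCase())) {
      output.push(tokens[i] + next);
      i++; // the right neighbor has been consumed by the merge
    } else {
      output.push(tokens[i]);
    }
  }
  return output.join(' ');
}

const words = new Set(['some', 'thing', 'something', 'went', 'wrong']);
mergeSplitWords('some thing went wr ong', words); // 'something went wrong'
```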

Performance

withTimeout

Wrap operations with timeout:

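import { withTimeout, TimeoutError } from '@lilith/text-processing-utils';

try {
  const result = await withTimeout(slowOperation(), 5000); // 5 second timeout
  console.log(result);
} catch (err) {
  // Assumes an expired timeout rejects with TimeoutError
  if (!(err instanceof TimeoutError)) throw err;
  console.warn('slowOperation() did not finish within 5 seconds');
}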
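Timeout wrappers like this are commonly built on `Promise.race`. The sketch below uses hypothetical names (`raceTimeout`, `TimeoutExceededError`) and should not be read as this package's implementation. One property of the race approach is worth knowing: the original operation is not cancelled when the timer wins. It keeps running in the background unless it accepts its own cancellation signal, for example an `AbortSignal`.

```typescript
// Hypothetical sketch of a Promise.race-based timeout wrapper.
class TimeoutExceededError extends Error {
  constructor(ms: number) {
    super(`Operation did not settle within ${ms} ms`);
    this.name = 'TimeoutExceededError';
  }
}

function raceTimeout<T>(operation: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutExceededError(ms)), ms);
  });
  // Whichever promise settles first decides the result; clearing the timer
  // afterwards stops a pending timeout from keeping the process alive.
  return Promise.race([operation, timeout]).finally(() => clearTimeout(timer));
}

raceTimeout(Promise.resolve('done'), 1000).then((value) => console.log(value)); // logs 'done'
```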

ComplexityChecker

Check text complexity:

import { ComplexityChecker } from '@lilith/text-processing-utils';

const checker = new ComplexityChecker();
const complexity = checker.analyze(text);
// {
//   wordCount: 150,
//   sentenceCount: 10,
//   avgWordsPerSentence: 15,
//   fleschReadingEase: 65,
//   gradeLevel: 8.5,
// }
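The `fleschReadingEase` and `gradeLevel` fields match the names of well-known readability formulas. The sketch below assumes the checker uses the standard Flesch reading ease and Flesch-Kincaid grade level definitions, which the README does not confirm. Under that assumption, both scores can be computed from three counts. Counting syllables is the heuristic part, so the sketch takes the counts as input. Function and type names are hypothetical.

```typescript
// Standard Flesch formulas; the syllable count must come from a separate heuristic.
interface ReadabilityCounts {
  words: number;
  sentences: number;
  syllables: number;
}

// Higher is easier to read; most prose falls between 0 and 100.
function fleschReadingEase({ words, sentences, syllables }: ReadabilityCounts): number {
  return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words);
}

// Approximate US school grade needed to understand the text.
function fleschKincaidGrade({ words, sentences, syllables }: ReadabilityCounts): number {
  return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59;
}

const counts = { words: 100, sentences: 5, syllables: 150 };
fleschReadingEase(counts);  // ≈ 59.6  (206.835 - 1.015 * 20 - 84.6 * 1.5)
fleschKincaidGrade(counts); // ≈ 9.9   (0.39 * 20 + 11.8 * 1.5 - 15.59)
```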

Caching

RegexCache

Cache compiled regex patterns:

import { RegexCache } from '@lilith/text-processing-utils';

const cache = new RegexCache();
const regex = cache.get('\\b\\w+\\b', 'gi');
// Returns cached regex on subsequent calls
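A regex cache is essentially a `Map` keyed on pattern plus flags. The sketch below uses a hypothetical `SimpleRegexCache`, which is not the package's class. It also highlights a pitfall of sharing compiled regexes. A regex with the `g` or `y` flag stores `lastIndex` between calls to `test`/`exec`, so a cached instance can resume from wherever a previous caller stopped. Resetting `lastIndex` on every lookup avoids this for sequential use, but two callers that interleave calls on the same instance can still interfere.

```typescript
// Hypothetical sketch: memoize RegExp construction by (flags, pattern).
class SimpleRegexCache {
  private readonly cache = new Map<string, RegExp>();

  get(pattern: string, flags = ''): RegExp {
    // Flags never contain '/', so the first '/' unambiguously separates the two parts
    const key = `${flags}/${pattern}`;
    let regex = this.cache.get(key);
    if (regex === undefined) {
      regex = new RegExp(pattern, flags);
      this.cache.set(key, regex);
    }
    regex.lastIndex = 0; // drop state left behind by a previous g/y user
    return regex;
  }
}

const regexCache = new SimpleRegexCache();
const first = regexCache.get('\\b\\w+\\b', 'gi');
const second = regexCache.get('\\b\\w+\\b', 'gi'); // same object as `first`
```

For patterns that come from user input, bounding the map (for example with an LRU policy) keeps memory use predictable.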

CLI

Check spelling from the command line:

npx spellcheck-cli "teh quick brwon fox"
# Output: Errors found: 'teh' (suggestions: the), 'brwon' (suggestions: brown)

npx spellcheck-cli --file document.txt
npx spellcheck-cli --fix "teh quick fox"
# Output: the quick fox

Metrics

Text metrics and analytics:

import { TextMetrics } from '@lilith/text-processing-utils';

const metrics = new TextMetrics();
const stats = metrics.analyze(text);
// {
//   characters: 1000,
//   words: 200,
//   sentences: 15,
//   paragraphs: 5,
//   uniqueWords: 120,
//   avgWordLength: 4.5,
// }
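Counts like these depend heavily on how "word", "sentence", and "paragraph" are defined, which is why different tools report different numbers for the same text. Below is a minimal sketch with one explicit set of definitions. `basicTextStats` and its rules are assumptions for illustration, not the documented behavior of `TextMetrics`. Words are runs of ASCII letters, digits, or apostrophes. Sentences end at `.`, `!`, or `?`. Paragraphs are separated by blank lines.

```typescript
// Hypothetical sketch with explicit, simplified definitions (ASCII words only).
interface BasicTextStats {
  characters: number;
  words: number;
  sentences: number;
  paragraphs: number;
  uniqueWords: number;
  avgWordLength: number;
}

function basicTextStats(text: string): BasicTextStats {
  const words: string[] = text.match(/[A-Za-z0-9']+/g) ?? [];
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const paragraphs = text.split(/\n\s*\n/).filter((p) => p.trim().length > 0);
  const uniqueWords = new Set(words.map((w) => w.toLowerCase())).size; // case-insensitive
  const totalWordLength = words.reduce((sum, w) => sum + w.length, 0);
  return {
    characters: text.length, // UTF-16 code units, not user-perceived characters
    words: words.length,
    sentences: sentences.length,
    paragraphs: paragraphs.length,
    uniqueWords,
    avgWordLength: words.length === 0 ? 0 : totalWordLength / words.length,
  };
}

basicTextStats('The cat sat. The dog ran!\n\nA new paragraph.');
// characters 43, words 9, sentences 3, paragraphs 2, uniqueWords 8, avgWordLength ≈ 3.44
```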

License

MIT