@lilith/text-processing-utils

High-performance text processing utilities for deterministic text manipulation.

Features

  • Extractors: URL, path, code block extraction
  • Sanitizers: ANSI stripping, HTML cleaning
  • Splitters: Sentence and chunk splitting
  • Validators: Email, JSON, URL validation
  • Transformers: Case conversion, truncation, redaction, templates
  • Spellcheck: Full spell checking with auto-correction
  • Performance: Timeout wrappers, complexity checking
  • Caching: Regex caching for repeated patterns

Installation

pnpm add @lilith/text-processing-utils

Quick Start

import {
  UrlExtractor,
  SentenceSplitter,
  EmailValidator,
  SpellChecker,
} from '@lilith/text-processing-utils';

// Extract URLs
const extractor = new UrlExtractor();
const urls = extractor.extract('Visit https://example.com for more');

// Split sentences
const splitter = new SentenceSplitter();
const sentences = splitter.split('Hello world. How are you?');

// Validate email
const validator = new EmailValidator();
const isValid = validator.validate('user@example.com');

// Spellcheck
const checker = new SpellChecker();
const result = checker.check('teh quick brwon fox');

Extractors

UrlExtractor

Extract URLs from text:

import { UrlExtractor } from '@lilith/text-processing-utils';

const extractor = new UrlExtractor();
const urls = extractor.extract('Check out https://example.com and http://test.org');
// ['https://example.com', 'http://test.org']
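Here is a minimal regex-based sketch of URL extraction for intuition. It is an assumption, not UrlExtractor's actual implementation. The hypothetical `extractUrls` recognizes only the `http` and `https` schemes and trims trailing sentence punctuation.

```typescript
// Hypothetical sketch, not the library's implementation.
// Match http(s) URLs up to the next whitespace or quote/angle bracket.
const URL_PATTERN = /\bhttps?:\/\/[^\s<>"']+/gi;

function extractUrls(text: string): string[] {
  const matches: string[] = text.match(URL_PATTERN) ?? [];
  // Drop trailing punctuation that usually belongs to the sentence, e.g. "org." or "com)".
  return matches.map((url) => url.replace(/[.,;:!?)\]]+$/, ''));
}

// extractUrls('Check out https://example.com and http://test.org.')
// -> ['https://example.com', 'http://test.org']
```

Trimming a trailing `)` is a trade-off. It fixes URLs wrapped in parentheses, but it breaks URLs that legitimately end in one.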

PathExtractor

Extract file paths:

import { PathExtractor } from '@lilith/text-processing-utils';

const extractor = new PathExtractor();
const paths = extractor.extract('Open /home/user/file.txt or C:\\Users\\file.txt');

CodeBlockExtractor

Extract code blocks from markdown:

import { CodeBlockExtractor } from '@lilith/text-processing-utils';

const extractor = new CodeBlockExtractor();
const blocks = extractor.extract(markdown);
// [{ language: 'typescript', code: '...' }]
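This is a rough sketch of pulling fenced blocks out of markdown, assuming the `{ language, code }` shape shown above. `extractCodeBlocks` and `FENCE` are hypothetical names. This version does not require fences to start a line and ignores tilde fences, so a CommonMark-compliant parser is the safer choice for arbitrary input.

```typescript
// Hypothetical sketch, not the library's implementation.
interface CodeBlock {
  language: string; // info string after the opening fence; '' if absent
  code: string;     // body without the trailing newline
}

const FENCE = '`'.repeat(3);
// Opening fence, optional language, rest of the info line, newline,
// lazily matched body, optional newline, closing fence.
const BLOCK_PATTERN = new RegExp(`${FENCE}([\\w+-]*)[^\\n]*\\n([\\s\\S]*?)\\n?${FENCE}`, 'g');

function extractCodeBlocks(markdown: string): CodeBlock[] {
  const blocks: CodeBlock[] = [];
  for (const match of markdown.matchAll(BLOCK_PATTERN)) {
    blocks.push({ language: match[1] ?? '', code: match[2] ?? '' });
  }
  return blocks;
}
```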

Sanitizers

AnsiStripper

Remove ANSI escape codes:

import { AnsiStripper } from '@lilith/text-processing-utils';

const stripper = new AnsiStripper();
const clean = stripper.strip('\x1b[31mRed text\x1b[0m');
// 'Red text'
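ANSI color codes are CSI sequences. Each one is `ESC [`, followed by optional parameter and intermediate bytes and a single final byte (`m` for colors). Below is a minimal sketch that strips them with one regex. It uses a hypothetical `stripAnsi` helper rather than the library's code, and it does not cover OSC sequences such as terminal hyperlinks.

```typescript
// Hypothetical sketch: removes CSI sequences only.
// ESC [ , parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, final byte 0x40-0x7E.
const CSI_PATTERN = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

function stripAnsi(text: string): string {
  return text.replace(CSI_PATTERN, '');
}
```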

Splitters

SentenceSplitter

Split text into sentences:

import { SentenceSplitter } from '@lilith/text-processing-utils';

const splitter = new SentenceSplitter();
const sentences = splitter.split('Hello world. How are you? I am fine.');
// ['Hello world.', 'How are you?', 'I am fine.']
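A naive sentence splitter breaks the text after `.`, `!` or `?` whenever whitespace follows. The sketch below uses a hypothetical `splitSentences`, not the library's algorithm. It reproduces the output above, but it would wrongly split abbreviations such as "Dr. Smith".

```typescript
// Hypothetical sketch: split on whitespace that follows terminal punctuation.
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/) // lookbehind keeps the punctuation with its sentence
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}
```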

ChunkSplitter

Split text into chunks with configurable size:

import { ChunkSplitter } from '@lilith/text-processing-utils';

const splitter = new ChunkSplitter({
  maxChunkSize: 1000,
  overlap: 100,
  splitOn: 'sentence', // 'character' | 'word' | 'sentence' | 'paragraph'
});

const chunks = splitter.split(longText);
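Overlapping chunks come from a sliding window that advances by `maxChunkSize - overlap` characters at each step. This sketch covers only plain character splitting (the `'character'` mode). The hypothetical `chunkText` ignores word and sentence boundaries, which the other `splitOn` modes presumably respect.

```typescript
// Hypothetical sketch of character-based chunking with overlap.
function chunkText(text: string, maxChunkSize: number, overlap: number): string[] {
  if (overlap >= maxChunkSize) {
    throw new RangeError('overlap must be smaller than maxChunkSize');
  }
  const chunks: string[] = [];
  const step = maxChunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChunkSize));
    if (start + maxChunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// chunkText('abcdefghij', 4, 1) -> ['abcd', 'defg', 'ghij']
```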

Validators

EmailValidator

import { EmailValidator } from '@lilith/text-processing-utils';

const validator = new EmailValidator();
validator.validate('user@example.com');  // true
validator.validate('invalid-email');     // false

JSONValidator

import { JSONValidator } from '@lilith/text-processing-utils';

const validator = new JSONValidator();
validator.validate('{"key": "value"}');  // true
validator.validate('{invalid}');         // false

// Get parsed JSON or null
const json = validator.parse(text);
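Validation of this kind usually comes down to whether `JSON.parse` succeeds. Here is a minimal sketch under that assumption, with hypothetical helper names:

```typescript
// Hypothetical sketch: "valid" means JSON.parse does not throw.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Returns the parsed value, or null when the text is not valid JSON.
function parseJson(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}
```

There is one caveat to returning `null` for invalid input. The valid document `null` also parses to `null`, so callers that need to tell the two apart should run the boolean check first.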

Transformers

CaseTransformer

Convert text case:

import { CaseTransformer } from '@lilith/text-processing-utils';

const transformer = new CaseTransformer();
transformer.toUpperCase('hello');     // 'HELLO'
transformer.toLowerCase('HELLO');     // 'hello'
transformer.toTitleCase('hello world'); // 'Hello World'
transformer.toCamelCase('hello world'); // 'helloWorld'
transformer.toSnakeCase('helloWorld');  // 'hello_world'
transformer.toKebabCase('helloWorld');  // 'hello-world'
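Most of the conversions above follow the same two steps: split the input into lowercase words, then join those words in the target style. The sketch below takes that approach. It assumes camelCase boundaries, spaces, underscores and hyphens are the word separators, and the helper names are hypothetical.

```typescript
// Hypothetical sketch: normalize to lowercase words, then re-join.
function words(input: string): string[] {
  return input
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2') // "helloWorld" -> "hello World"
    .split(/[\s_-]+/)
    .filter(Boolean)
    .map((w) => w.toLowerCase());
}

const capitalize = (w: string) => w[0].toUpperCase() + w.slice(1);

const toSnakeCase = (s: string) => words(s).join('_');
const toKebabCase = (s: string) => words(s).join('-');
const toTitleCase = (s: string) => words(s).map(capitalize).join(' ');
const toCamelCase = (s: string) =>
  words(s).map((w, i) => (i === 0 ? w : capitalize(w))).join('');
```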

Truncator

Truncate text with ellipsis:

import { Truncator } from '@lilith/text-processing-utils';

const truncator = new Truncator();
truncator.truncate('Hello world', 8);  // 'Hello...'
truncator.truncate('Hello world', 8, { suffix: '…' }); // 'Hello w…'
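The first example suggests that the suffix counts toward the length limit, because `'Hello'` plus `'...'` is exactly 8 characters. Here is a sketch under that assumption, using a hypothetical standalone `truncate` function:

```typescript
// Hypothetical sketch: the suffix counts toward maxLength, so the result
// fits in maxLength characters whenever maxLength >= suffix.length.
interface TruncateOptions {
  suffix?: string;
}

function truncate(text: string, maxLength: number, { suffix = '...' }: TruncateOptions = {}): string {
  if (text.length <= maxLength) return text; // nothing to cut
  const keep = Math.max(0, maxLength - suffix.length);
  return text.slice(0, keep) + suffix;
}
```

Lengths here are measured in UTF-16 code units, so an emoji or other astral character can be cut in half. `Array.from(text)` counts code points instead.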

Redactor

Redact sensitive information:

import { Redactor } from '@lilith/text-processing-utils';

const redactor = new Redactor({
  patterns: {
    email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
    phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
  },
  replacement: '[REDACTED]',
});

const clean = redactor.redact('Email me at user@example.com');
// 'Email me at [REDACTED]'
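Redaction can be sketched as applying each named pattern in turn. The hypothetical `redact` below is not the library's code. It illustrates why the patterns above carry the `g` flag: without it, `String.prototype.replace` replaces only the first match.

```typescript
// Hypothetical sketch: apply every pattern; each should have the "g" flag.
function redact(
  text: string,
  patterns: Record<string, RegExp>,
  replacement = '[REDACTED]',
): string {
  let result = text;
  for (const pattern of Object.values(patterns)) {
    result = result.replace(pattern, replacement);
  }
  return result;
}
```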

TemplateEngine

Simple template interpolation:

import { TemplateEngine } from '@lilith/text-processing-utils';

const engine = new TemplateEngine();
const result = engine.render('Hello {{name}}!', { name: 'World' });
// 'Hello World!'
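Below is a minimal sketch of `{{name}}` interpolation with a single regex replace. The hypothetical `render` leaves unknown placeholders untouched. The library may instead substitute an empty string or throw.

```typescript
// Hypothetical sketch: replace {{key}} (inner spaces allowed) with data[key].
function render(template: string, data: Record<string, unknown>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder: string, key: string) =>
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : placeholder,
  );
}
```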

Spellcheck

SpellChecker

Full-featured spell checker:

import { SpellChecker } from '@lilith/text-processing-utils';

const checker = new SpellChecker({
  language: 'en',
  customDictionary: ['myword', 'anotherword'],
});

// Check text
const result = checker.check('teh quick brwon fox');
// {
//   errors: [
//     { word: 'teh', suggestions: ['the'], offset: 0 },
//     { word: 'brwon', suggestions: ['brown'], offset: 10 }
//   ]
// }

// Get suggestions
const suggestions = checker.suggest('teh');
// ['the', 'tea', 'ten', ...]
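The ranking behind `suggest` is not documented here. One common approach, and only an assumption about this library, is to keep dictionary words within a small edit distance (compare AutoCorrector's `maxDistance` option below). The sketch uses optimal string alignment distance. This Levenshtein variant counts a swap of two adjacent letters as one edit, so `teh` is one edit from `the` rather than two. Ties keep dictionary order, which the sketch assumes is sorted by word frequency. `osaDistance` and `suggest` are hypothetical names.

```typescript
// Hypothetical sketch: optimal string alignment (restricted Damerau-Levenshtein).
function osaDistance(a: string, b: string): number {
  // d[i][j] = distance between the first i chars of a and the first j chars of b.
  const d: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost, // substitution
      );
      // Adjacent transposition ("eh" <-> "he") costs a single edit.
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

// Keep candidates within maxDistance, closest first; the sort is stable,
// so equal distances keep the (assumed frequency-ordered) dictionary order.
function suggest(word: string, dictionary: string[], maxDistance = 2): string[] {
  return dictionary
    .map((candidate) => ({ candidate, distance: osaDistance(word, candidate) }))
    .filter(({ distance }) => distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .map(({ candidate }) => candidate);
}
```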

AutoCorrector

Automatic correction:

import { AutoCorrector } from '@lilith/text-processing-utils';

const corrector = new AutoCorrector({
  maxDistance: 2,
  minConfidence: 0.8,
});

const corrected = corrector.correct('teh quick brwon fox');
// 'the quick brown fox'

ContextualCorrector

Context-aware correction using surrounding words:

import { ContextualCorrector } from '@lilith/text-processing-utils';

const corrector = new ContextualCorrector();
const corrected = corrector.correct('I went to teh store');
// Uses context to improve suggestions

SplitWordDetector

Detect and fix split words:

import { SplitWordDetector } from '@lilith/text-processing-utils';

const detector = new SplitWordDetector();
const fixed = detector.fix('some thing went wr ong');
// 'something went wrong'
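One simple way to detect split words is to join two adjacent tokens whenever their concatenation is a dictionary word. The sketch below is a greedy heuristic with a hypothetical `fixSplitWords` and a caller-supplied dictionary, not the detector's actual algorithm. It would also merge legitimate pairs such as "in to" whenever "into" is in the dictionary.

```typescript
// Hypothetical sketch: greedy left-to-right merging of adjacent tokens.
function fixSplitWords(text: string, dictionary: Set<string>): string {
  const tokens = text.split(/\s+/).filter(Boolean);
  const out: string[] = [];
  for (let i = 0; i < tokens.length; i++) {
    const joined = i + 1 < tokens.length ? tokens[i] + tokens[i + 1] : '';
    if (joined && dictionary.has(joined.toLowerCase())) {
      out.push(joined);
      i++; // skip the token that was just merged
    } else {
      out.push(tokens[i]);
    }
  }
  return out.join(' ');
}
```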

Performance

withTimeout

Wrap a promise with a timeout:

import { withTimeout, TimeoutError } from '@lilith/text-processing-utils';

const result = await withTimeout(
  slowOperation(),
  5000, // 5 second timeout
);
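A common way to build such a wrapper is to race the operation against a timer with `Promise.race`. The sketch below is an assumption about how `withTimeout` and `TimeoutError` might fit together, not the library source. It rejects the returned promise on timeout, but it cannot cancel the underlying operation, which keeps running.

```typescript
// Hypothetical sketch of a timeout wrapper.
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms} ms`);
    this.name = 'TimeoutError';
  }
}

function withTimeout<T>(operation: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([operation, timeout]).finally(() => clearTimeout(timer));
}
```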

ComplexityChecker

Check text complexity:

import { ComplexityChecker } from '@lilith/text-processing-utils';

const checker = new ComplexityChecker();
const complexity = checker.analyze(text);
// {
//   wordCount: 150,
//   sentenceCount: 10,
//   avgWordsPerSentence: 15,
//   fleschReadingEase: 65,
//   gradeLevel: 8.5,
// }
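The field names above match the standard Flesch readability formulas. This assumes `gradeLevel` is the Flesch-Kincaid grade. Reading Ease = 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), and Grade Level = 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. In the sketch below, syllables are estimated by counting vowel groups. That is a rough heuristic, and the library's own counter is unknown.

```typescript
// Hypothetical sketch of Flesch metrics with a crude syllable estimate.
function countSyllables(word: string): number {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return Math.max(1, groups ? groups.length : 0);
}

function readability(text: string) {
  const words: string[] = text.match(/[A-Za-z']+/g) ?? [];
  const sentenceCount = Math.max(1, (text.match(/[.!?]+/g) ?? []).length);
  const wordCount = Math.max(1, words.length); // avoid dividing by zero
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  const wordsPerSentence = wordCount / sentenceCount;
  const syllablesPerWord = syllables / wordCount;
  return {
    wordCount: words.length,
    sentenceCount,
    avgWordsPerSentence: wordsPerSentence,
    // Higher scores mean easier text.
    fleschReadingEase: 206.835 - 1.015 * wordsPerSentence - 84.6 * syllablesPerWord,
    // Approximate US school grade.
    gradeLevel: 0.39 * wordsPerSentence + 11.8 * syllablesPerWord - 15.59,
  };
}
```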

Caching

RegexCache

Cache compiled regex patterns:

import { RegexCache } from '@lilith/text-processing-utils';

const cache = new RegexCache();
const regex = cache.get('\\b\\w+\\b', 'gi');
// Returns cached regex on subsequent calls
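The caching idea can be sketched as a `Map` keyed on flags plus pattern. The class below is a hypothetical stand-in, not the library's `RegexCache`. It resets `lastIndex` on each lookup because a shared regex with the `g` or `y` flag carries match state between uses.

```typescript
// Hypothetical sketch: memoize RegExp objects by flags + pattern.
class RegexCache {
  private readonly cache = new Map<string, RegExp>();

  get(pattern: string, flags = ''): RegExp {
    const key = `${flags}/${pattern}`; // flags never contain "/", so keys are unambiguous
    let regex = this.cache.get(key);
    if (regex === undefined) {
      regex = new RegExp(pattern, flags);
      this.cache.set(key, regex);
    }
    regex.lastIndex = 0; // reset state left over from earlier global/sticky use
    return regex;
  }
}
```

This cache grows without bound. For user-supplied patterns, an LRU size limit would be prudent.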

CLI

Spellcheck from the command line:

npx spellcheck-cli "teh quick brwon fox"
# Output: Errors found: 'teh' (suggestions: the), 'brwon' (suggestions: brown)

npx spellcheck-cli --file document.txt
npx spellcheck-cli --fix "teh quick fox"
# Output: the quick fox

Metrics

Text metrics and analytics:

import { TextMetrics } from '@lilith/text-processing-utils';

const metrics = new TextMetrics();
const stats = metrics.analyze(text);
// {
//   characters: 1000,
//   words: 200,
//   sentences: 15,
//   paragraphs: 5,
//   uniqueWords: 120,
//   avgWordLength: 4.5,
// }

License

MIT