Venus Speech Synthesis Service

A high-quality speech synthesis service for Venus Tech agents, providing TTS (Text-to-Speech) and STT (Speech-to-Text) capabilities.

Features

Piper TTS Engine: Neural network-based text-to-speech with streaming support
Voice Discovery: Automatic detection and cataloging of available voice models
CUDA Acceleration: GPU support for faster synthesis
WebSocket Streaming: Real-time audio streaming for low-latency applications
REST API: Simple HTTP endpoints for synthesis requests
React Showcase: Interactive UI for testing and demonstration

Architecture

@venus/speech-synthesis-service/
├── backend/          # Core TTS/STT engines
│   ├── src/tts/      # Piper TTS adapter, voice discovery
│   ├── src/stt/      # Speech-to-text service
│   └── src/utils/    # Text processing, spell checking
├── server/           # HTTP/WebSocket server
│   ├── src/routes/   # REST API endpoints
│   └── src/websocket/# Streaming handlers
├── client/           # TypeScript client library
├── types/            # Shared TypeScript types
└── showcase/         # React demo UI

Quick Start

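# Install dependencies (the repository also ships pnpm-lock.yaml and
# pnpm-workspace.yaml, so pnpm install may be the intended equivalent)
npm install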

# Start the server
npm run dev

# Start the showcase UI
npm run dev:showcase

API Endpoints

POST /api/tts/synthesize

Synthesize text to speech.

{
  "text": "Hello, world!",
  "voice": "en_US-amy-medium",
  "speed": 1.0,
  "outputFormat": "wav"
}
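A minimal client sketch for this endpoint. It assumes the server listens on http://localhost:5000 (the URL used in the agent integration example) and that the response body carries the audio bytes directly; the response format is not documented here, so treat that as an assumption. The SynthesizeRequest type and the buildSynthesizeInit and synthesize helpers are hypothetical names used for illustration, and the empty-text check is a client-side convenience rather than a documented server rule.

```typescript
// Request body fields, mirroring the JSON example above.
interface SynthesizeRequest {
  text: string;
  voice: string;
  speed?: number;
  outputFormat?: string;
}

// Hypothetical helper: builds the fetch options for POST /api/tts/synthesize.
function buildSynthesizeInit(req: SynthesizeRequest): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  // Client-side guard (an assumption, not a documented server requirement).
  if (req.text.trim().length === 0) {
    throw new Error("text must not be empty");
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  };
}

// Assumption: a successful response body is the raw audio data.
async function synthesize(baseUrl: string, req: SynthesizeRequest): Promise<ArrayBuffer> {
  const res = await fetch(`${baseUrl}/api/tts/synthesize`, buildSynthesizeInit(req));
  if (!res.ok) {
    throw new Error(`Synthesis failed: HTTP ${res.status}`);
  }
  return res.arrayBuffer();
}
```

A call such as synthesize("http://localhost:5000", { text: "Hello, world!", voice: "en_US-amy-medium" }) would then resolve to the audio buffer, provided the assumption about the response body holds.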

GET /api/tts/voices

List available voice models.

GET /api/status

Check service status.
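Both GET endpoints can be queried with one small helper. The sketch below assumes they return JSON; the response shapes are not documented here, so the result is typed as unknown and left for the caller to inspect. The endpointUrl and getJson names are hypothetical.

```typescript
// Joins a base URL and a path without doubling or dropping the slash.
function endpointUrl(baseUrl: string, path: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

// Assumption: the endpoint responds with a JSON body.
async function getJson(baseUrl: string, path: string): Promise<unknown> {
  const res = await fetch(endpointUrl(baseUrl, path));
  if (!res.ok) {
    throw new Error(`GET ${path} failed with HTTP ${res.status}`);
  }
  return res.json();
}

// Usage (not executed here):
//   const voices = await getJson("http://localhost:5000", "/api/tts/voices");
//   const status = await getJson("http://localhost:5000", "/api/status");
```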

Integration with Venus Agents

The service integrates with Venus agents through createSpeechTool() and createListVoicesTool() from @venus/agent-core:

import { createVenusAgent, createSpeechTool, createListVoicesTool } from '@venus/agent-core';

const speechTool = createSpeechTool({
  serverUrl: 'http://localhost:5000',
  defaultVoice: 'en_US-amy-medium',
});

const listVoicesTool = createListVoicesTool({
  serverUrl: 'http://localhost:5000',
});

// Add to agent tools

Voice Models

Voice models are stored in backend/models/ and discovered automatically. The service supports:

Piper voices: High-quality neural TTS voices
Multiple languages (en_US, de_DE, etc.)
Quality levels: low, medium, high
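Voice identifiers such as en_US-amy-medium appear to encode the language, the voice name, and the quality level. The sketch below splits an identifier into those parts, assuming every identifier follows that language-name-quality pattern and uses only the three quality levels listed above. The parseVoiceId function and the VoiceInfo type are hypothetical and not part of the service.

```typescript
type VoiceQuality = "low" | "medium" | "high";

interface VoiceInfo {
  language: string; // e.g. "en_US"
  name: string;     // e.g. "amy"
  quality: VoiceQuality;
}

// Returns null when the identifier does not match the assumed pattern.
function parseVoiceId(id: string): VoiceInfo | null {
  const match = /^([a-z]{2}_[A-Z]{2})-(.+)-(low|medium|high)$/.exec(id);
  if (match === null) {
    return null;
  }
  return {
    language: match[1],
    name: match[2],
    quality: match[3] as VoiceQuality,
  };
}
```

A helper like this could be used to group the output of the voices endpoint by language or to prefer a given quality level when several variants of a voice are installed.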

Download additional voices using:

python backend/scripts/download-voices.py

WebSocket Streaming

For real-time audio streaming, connect to ws://localhost:5000:

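const ws = new WebSocket('ws://localhost:5000');

// Wait for the connection to open before sending; calling send() earlier throws.
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'tts_stream',
    text: 'Hello, world!',
    voice: 'en_US-amy-medium'
  }));
};

ws.onmessage = (event) => {
  // Handle audio chunks
};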
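The message format of the stream is not specified here. Assuming each message is a binary frame holding a slice of the audio, the chunks can be collected and joined into one buffer once the stream ends. The concatChunks helper below is a hypothetical sketch of that joining step.

```typescript
// Joins audio chunks, in arrival order, into a single contiguous buffer.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.byteLength;
  }
  return out;
}
```

With ws.binaryType set to 'arraybuffer', the onmessage handler could push new Uint8Array(event.data) onto an array and pass that array to concatChunks when the stream completes. If the server sends JSON-wrapped or base64-encoded audio instead, the chunks would need decoding first.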

Requirements

Node.js 20+
Piper TTS binary (for neural synthesis)
CUDA toolkit (optional, for GPU acceleration)

Development

# Run all packages in dev mode
npm run dev

# Build all packages
npm run build

# Run tests
npm run test

License

Part of the Venus Tech ecosystem.