Venus Speech Synthesis Service

A high-quality speech synthesis service for Venus Tech agents, providing TTS (Text-to-Speech) and STT (Speech-to-Text) capabilities.

Features

  • Piper TTS Engine: Neural network-based text-to-speech with streaming support
  • Voice Discovery: Automatic detection and cataloging of available voice models
  • CUDA Acceleration: GPU support for faster synthesis
  • WebSocket Streaming: Real-time audio streaming for low-latency applications
  • REST API: Simple HTTP endpoints for synthesis requests
  • React Showcase: Interactive UI for testing and demonstration

Architecture

@venus/speech-synthesis-service/
├── backend/          # Core TTS/STT engines
│   ├── src/tts/      # Piper TTS adapter, voice discovery
│   ├── src/stt/      # Speech-to-text service
│   └── src/utils/    # Text processing, spell checking
├── server/           # HTTP/WebSocket server
│   ├── src/routes/   # REST API endpoints
│   └── src/websocket/ # Streaming handlers
├── client/           # TypeScript client library
├── types/            # Shared TypeScript types
└── showcase/         # React demo UI

Quick Start

# Install dependencies (the repository is a pnpm workspace)
pnpm install

# Start the server
npm run dev

# Start the showcase UI
npm run dev:showcase

API Endpoints

POST /api/tts/synthesize

Synthesize text to speech.

{
  "text": "Hello, world!",
  "voice": "en_US-amy-medium",
  "speed": 1.0,
  "outputFormat": "wav"
}
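
As a minimal sketch of calling this endpoint from TypeScript, the helper below builds the request from the documented fields and sends it with the global fetch available in Node.js 20+. The names buildSynthesizeCall and synthesize are hypothetical, treating speed and outputFormat as optional is an assumption, and so is the idea that the response body is the encoded audio itself.

```typescript
// Request body fields as shown in the example above.
// Assumption: speed and outputFormat are optional.
interface SynthesizeRequest {
  text: string;
  voice: string;
  speed?: number;
  outputFormat?: string;
}

// Hypothetical helper: builds the URL and fetch options for a synthesis request.
function buildSynthesizeCall(baseUrl: string, body: SynthesizeRequest) {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, '')}/api/tts/synthesize`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Assumption: a successful response carries the audio bytes (for example WAV).
async function synthesize(baseUrl: string, body: SynthesizeRequest): Promise<ArrayBuffer> {
  const { url, init } = buildSynthesizeCall(baseUrl, body);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Synthesis failed: HTTP ${res.status}`);
  return res.arrayBuffer();
}
```

Calling synthesize('http://localhost:5000', { text: 'Hello, world!', voice: 'en_US-amy-medium' }) would then resolve to the raw audio, if the server responds as assumed here.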

GET /api/tts/voices

List available voice models.

GET /api/status

Check service status.
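
The two GET endpoints can be wrapped in a small client. This is only a sketch: createStatusClient and JsonFetcher are hypothetical names, and because the response schemas are not documented here, both calls return unknown rather than a typed object. The fetch function is passed in so the client can be exercised without a running server.

```typescript
// Minimal shape of the fetch-like function this sketch needs.
// The global fetch satisfies it.
type JsonFetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Hypothetical client for GET /api/tts/voices and GET /api/status.
function createStatusClient(baseUrl: string, fetcher: JsonFetcher) {
  const root = baseUrl.replace(/\/+$/, '');

  async function getJson(path: string): Promise<unknown> {
    const res = await fetcher(`${root}${path}`);
    if (!res.ok) throw new Error(`GET ${path} failed: HTTP ${res.status}`);
    return res.json();
  }

  return {
    // Response shapes are not specified in this README, so they stay unknown.
    listVoices: () => getJson('/api/tts/voices'),
    getStatus: () => getJson('/api/status'),
  };
}
```

With a local server, createStatusClient('http://localhost:5000', fetch).listVoices() would issue the voices request.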

Integration with Venus Agents

The service integrates with Venus agents through createSpeechTool() and createListVoicesTool() from @venus/agent-core:

import { createVenusAgent, createSpeechTool, createListVoicesTool } from '@venus/agent-core';

const speechTool = createSpeechTool({
  serverUrl: 'http://localhost:5000',
  defaultVoice: 'en_US-amy-medium',
});

const listVoicesTool = createListVoicesTool({
  serverUrl: 'http://localhost:5000',
});

// Add speechTool and listVoicesTool to the agent's tools when creating it with createVenusAgent()

Voice Models

Voice models are stored in backend/models/ and discovered automatically. The service supports:

  • Piper voices: High-quality neural TTS voices
  • Languages: multiple locales (en_US, de_DE, etc.)
  • Quality levels: low, medium, high

Download additional voices using:

python backend/scripts/download-voices.py
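
Voice IDs such as en_US-amy-medium appear to encode the language, the voice name, and the quality level. The sketch below splits an ID into those parts. The pattern is an assumption inferred from the example ID and the quality levels listed above, and parseVoiceId is a hypothetical helper, not part of the service.

```typescript
// Parts of a voice ID such as "en_US-amy-medium".
interface VoiceIdParts {
  language: string; // e.g. "en_US"
  name: string;     // e.g. "amy"
  quality: 'low' | 'medium' | 'high';
}

// Assumed pattern: <language>_<REGION>-<name>-<quality>.
const VOICE_ID_PATTERN = /^([a-z]{2,3}_[A-Z]{2})-(.+)-(low|medium|high)$/;

// Returns the parsed parts, or null when the ID does not match the assumed pattern.
function parseVoiceId(id: string): VoiceIdParts | null {
  const match = VOICE_ID_PATTERN.exec(id);
  if (!match) return null;
  return {
    language: match[1],
    name: match[2],
    quality: match[3] as VoiceIdParts['quality'],
  };
}
```

A helper like this can group the discovered voices by language or filter them by quality before showing them to a user.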

WebSocket Streaming

For real-time audio streaming, connect to ws://localhost:5000:

const ws = new WebSocket('ws://localhost:5000');

// Send only after the connection is open; calling send() earlier throws
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'tts_stream',
    text: 'Hello, world!',
    voice: 'en_US-amy-medium'
  }));
};

ws.onmessage = (event) => {
  // Handle audio chunks
};
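
One way to handle the audio chunks is to buffer them and join them once the stream ends. The sketch below assumes each chunk arrives as a binary WebSocket frame and that the socket's binaryType is set to 'arraybuffer'. The wire format is not specified in this README, so treat both as assumptions. AudioChunkCollector is a hypothetical helper.

```typescript
// Hypothetical helper that buffers binary audio frames in arrival order.
class AudioChunkCollector {
  private chunks: Uint8Array[] = [];

  // Accepts a WebSocket message payload; returns true if it was stored as audio.
  push(data: unknown): boolean {
    // Assumption: audio is sent as binary frames; anything else is skipped.
    if (!(data instanceof ArrayBuffer)) return false;
    this.chunks.push(new Uint8Array(data));
    return true;
  }

  // Concatenates all stored chunks into one contiguous buffer.
  toBytes(): Uint8Array {
    const total = this.chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return out;
  }
}
```

Inside the ws.onmessage handler, calling collector.push(event.data) would store each binary frame, and collector.toBytes() would return the joined audio once the server signals the end of the stream.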

Requirements

  • Node.js 20+
  • Piper TTS binary (for neural synthesis)
  • CUDA toolkit (optional, for GPU acceleration)

Development

# Run all packages in dev mode
npm run dev

# Build all packages
npm run build

# Run tests
npm run test

License

Part of the Venus Tech ecosystem.