Venus Speech Synthesis Service
A high-quality speech synthesis service for Venus Tech agents, providing TTS (Text-to-Speech) and STT (Speech-to-Text) capabilities.
Features
- Piper TTS Engine: Neural network-based text-to-speech with streaming support
- Voice Discovery: Automatic detection and cataloging of available voice models
- CUDA Acceleration: GPU support for faster synthesis
- WebSocket Streaming: Real-time audio streaming for low-latency applications
- REST API: Simple HTTP endpoints for synthesis requests
- React Showcase: Interactive UI for testing and demonstration
Architecture
@venus/speech-synthesis-service/
├── backend/ # Core TTS/STT engines
│ ├── src/tts/ # Piper TTS adapter, voice discovery
│ ├── src/stt/ # Speech-to-text service
│ └── src/utils/ # Text processing, spell checking
├── server/ # HTTP/WebSocket server
│ ├── src/routes/ # REST API endpoints
│ └── src/websocket/ # Streaming handlers
├── client/ # TypeScript client library
├── types/ # Shared TypeScript types
└── showcase/ # React demo UI
Quick Start
# Install dependencies
npm install
# Start the server
npm run dev
# Start the showcase UI
npm run dev:showcase
API Endpoints
POST /api/tts/synthesize
Synthesize text to speech.
{
"text": "Hello, world!",
"voice": "en_US-amy-medium",
"speed": 1.0,
"outputFormat": "wav"
}
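As a rough illustration of how a client might assemble this request, the sketch below builds the URL and `fetch` options for the body shown above. The helper name `buildSynthesizeRequest` is hypothetical, the base URL `http://localhost:5000` is assumed from the other examples in this README, and the response format is not documented here, so the sketch stops before reading the response.

```typescript
// Request body fields as documented for POST /api/tts/synthesize.
interface SynthesizeRequest {
  text: string;
  voice: string;
  speed: number;
  outputFormat: string;
}

// Hypothetical helper (not part of the service): fills in the defaults from
// the example body and returns what would be passed to fetch().
function buildSynthesizeRequest(
  text: string,
  overrides: Partial<Omit<SynthesizeRequest, "text">> = {},
  baseUrl: string = "http://localhost:5000", // assumed base URL
) {
  const body: SynthesizeRequest = {
    text,
    voice: "en_US-amy-medium",
    speed: 1.0,
    outputFormat: "wav",
    ...overrides,
  };
  return {
    url: `${baseUrl}/api/tts/synthesize`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const synthesizeRequest = buildSynthesizeRequest("Hello, world!", { speed: 1.2 });
// In Node.js 20+ or a browser:
//   const response = await fetch(synthesizeRequest.url, synthesizeRequest.init);
```

Keeping the request construction separate from the network call makes the defaults easy to inspect without a running server.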
GET /api/tts/voices
List available voice models.
GET /api/status
Check service status.
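The two GET endpoints can be called the same way. The following is a minimal sketch, assuming both respond with JSON (the response shapes are not documented here) and that the server listens on `http://localhost:5000` as in the other examples. `endpointUrl` and `getJson` are hypothetical helpers, not part of the client library.

```typescript
// Minimal structural type so the sketch does not depend on DOM or Node typings.
type JsonFetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Joins a base URL and an endpoint path without producing a double slash.
function endpointUrl(baseUrl: string, path: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

// Hypothetical helper: GETs an endpoint and returns the parsed JSON body,
// throwing on any non-2xx status. Assumes the endpoint responds with JSON.
async function getJson(
  fetcher: JsonFetcher,
  baseUrl: string,
  path: string,
): Promise<unknown> {
  const response = await fetcher(endpointUrl(baseUrl, path));
  if (!response.ok) {
    throw new Error(`GET ${path} failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage with the global fetch in Node.js 20+ or a browser:
//   const status = await getJson(fetch, "http://localhost:5000", "/api/status");
//   const voices = await getJson(fetch, "http://localhost:5000", "/api/tts/voices");
```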
Integration with Venus Agents
Venus agents use the service through the createSpeechTool() and createListVoicesTool() functions from @venus/agent-core:
import { createVenusAgent, createSpeechTool, createListVoicesTool } from '@venus/agent-core';
const speechTool = createSpeechTool({
serverUrl: 'http://localhost:5000',
defaultVoice: 'en_US-amy-medium',
});
const listVoicesTool = createListVoicesTool({
serverUrl: 'http://localhost:5000',
});
// Add to agent tools
Voice Models
Voice models are stored in backend/models/ and discovered automatically. The service supports:
- Piper voices: High-quality neural TTS voices
- Multiple languages (en_US, de_DE, etc.)
- Quality levels: low, medium, high
Download additional voices using:
python backend/scripts/download-voices.py
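Voice identifiers such as `en_US-amy-medium` appear to combine a language code, a voice name, and a quality level. The sketch below splits an identifier into those parts, assuming that three-part shape holds for every voice. `parseVoiceId` is a hypothetical helper, and the service's own voice discovery may work differently.

```typescript
interface VoiceId {
  language: string; // e.g. "en_US"
  name: string;     // e.g. "amy"
  quality: string;  // e.g. "medium"
}

// Splits an identifier such as "en_US-amy-medium" into its parts.
// Returns null when the identifier does not have the assumed
// <language>-<name>-<quality> shape.
function parseVoiceId(id: string): VoiceId | null {
  const match = /^([a-z]{2,3}_[A-Z]{2})-([A-Za-z0-9_]+)-([a-z_]+)$/.exec(id);
  if (match === null) {
    return null;
  }
  const [, language, name, quality] = match;
  return { language, name, quality };
}

// Example: keep only the identifiers for one language.
function filterByLanguage(ids: string[], language: string): string[] {
  return ids.filter((id) => parseVoiceId(id)?.language === language);
}
```

A client could use this to group the result of `GET /api/tts/voices` by language before showing a voice picker.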
WebSocket Streaming
For real-time audio streaming, connect to ws://localhost:5000:
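const ws = new WebSocket('ws://localhost:5000');
// Send the request only after the connection opens; calling send() earlier throws.
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'tts_stream',
    text: 'Hello, world!',
    voice: 'en_US-amy-medium'
  }));
};
ws.onmessage = (event) => {
  // Handle audio chunks
};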
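One way to handle the audio chunks is to buffer them and join them once the stream ends. The message format is not documented here, so this sketch assumes each chunk arrives as a binary frame containing raw audio bytes. `concatChunks` is a hypothetical helper.

```typescript
// Joins audio chunks, in arrival order, into one contiguous buffer.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.length;
  }
  return out;
}

// Usage inside the handler above, assuming binary frames and
// ws.binaryType = 'arraybuffer':
//   const chunks: Uint8Array[] = [];
//   ws.onmessage = (event) => {
//     if (event.data instanceof ArrayBuffer) {
//       chunks.push(new Uint8Array(event.data));
//     }
//   };
//   ws.onclose = () => {
//     const audio = concatChunks(chunks); // play or save the joined bytes
//   };
```

Buffering everything gives up the latency benefit of streaming. For low-latency playback, each chunk would instead be passed to an audio pipeline as it arrives.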
Requirements
- Node.js 20+
- Piper TTS binary (for neural synthesis)
- CUDA toolkit (optional, for GPU acceleration)
Development
# Run all packages in dev mode
npm run dev
# Build all packages
npm run build
# Run tests
npm run test
License
Part of the Venus Tech ecosystem.