A high-performance Speech-to-Text (STT) and Text-to-Speech (TTS) API designed for Mini PCs, utilizing faster-whisper and neutts-air.
- STT: Powered by faster-whisper.
- TTS: Powered by neutts-air.
- Optimized: Uses PyTorch CPU builds (torchao, intmm) for efficient inference on CPU-only devices.
- Framework: Built with FastAPI.
- Python >= 3.12
- uv (for dependency management)
- Clone the repository.
- Install dependencies using uv:
uv sync
This project targets CPU-only inference, so PyTorch is installed as a CPU build.
Start the server using uv run:
uv run uvicorn src.main:app --host 0.0.0.0 --port 8000
The API will be available at http://localhost:8000.
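To confirm from a script that the server is reachable, one option is to poll the GET /health endpoint listed below. The following is a minimal sketch using only the Python standard library. The function name `wait_for_health` and its timeout values are hypothetical. Because the body of the /health response is not documented here, the sketch checks only for an HTTP 200 status.

```python
import time
import urllib.request


def wait_for_health(base_url="http://localhost:8000", timeout=30.0, interval=1.0):
    """Poll GET /health until it answers with HTTP 200 or the timeout expires.

    Returns True if the server responded with 200, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or a non-2xx status: retry until the deadline.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("server ready" if wait_for_health() else "server not reachable")
```

Polling rather than issuing a single request allows for a server that is still starting up when the script runs.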
GET /health
Transcribe an audio file.
Endpoint: POST /v1/speech/stt
Curl Example:
curl -X POST "http://localhost:8000/v1/speech/stt" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@/path/to/your/audio.wav"
Response:
{
"text": "Transcribed text...",
"language": "en",
"probability": 0.99
}
Convert text to audio.
Endpoint: POST /v1/speech/tts
Curl Example:
curl -X POST "http://localhost:8000/v1/speech/tts" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{ "text": "Hello world" }' \
--output output.wav
Response:
- Returns an audio/wav file.
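The two curl calls above can also be made from Python. The sketch below uses only the standard library and mirrors the documented requests: a multipart upload with a form field named `file` for STT, and a JSON body with a `text` key for TTS. The helper names (`build_stt_request`, `build_tts_request`, `transcribe`, `synthesize`) are hypothetical, and the `application/octet-stream` content type on the uploaded part is an assumption. The server may accept or expect a different one.

```python
import json
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8000"  # assumption: server started as described above


def build_stt_request(audio_path, base_url=BASE_URL):
    """Build a multipart/form-data POST for /v1/speech/stt with a 'file' field."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part type
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/speech/stt",
        data=head + path.read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_tts_request(text, base_url=BASE_URL):
    """Build a JSON POST for /v1/speech/tts with the body {"text": ...}."""
    return urllib.request.Request(
        f"{base_url}/v1/speech/tts",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(audio_path):
    """Send an audio file and return the parsed JSON response as a dict."""
    with urllib.request.urlopen(build_stt_request(audio_path)) as resp:
        return json.load(resp)


def synthesize(text, out_path="output.wav"):
    """Send text and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_tts_request(text)) as resp:
        Path(out_path).write_bytes(resp.read())
    return Path(out_path)


if __name__ == "__main__":
    result = transcribe("/path/to/your/audio.wav")
    print(result["text"], result["language"], result["probability"])
    print("saved", synthesize("Hello world"))
```

Separating request construction from sending keeps the request shape easy to inspect without a running server. A third-party HTTP client would shorten the multipart handling if adding a dependency is acceptable.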