Just curious.
Soprano is an extremely lightweight text to speech model designed to produce highly realistic speech at unprecedented speed.
[Device: CUDA | Backend: transformers] 5060 Ti and 3060 - 3.258s generation time for 6.46s of audio on both, so roughly 2x real time.
[wasm ONNX model] i7-11700K with 3600MT/s DDR4 - TTFB (time to first byte) 1.552s, then it buffers at 0.9-1.0x. It's really difficult to tell when it's done generating, but it's arguably extremely close to the NVIDIA cards.
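To put the two measurements on the same scale, here is a minimal sketch that turns the generation time and audio length above into a speed multiple (seconds of audio per second of wall-clock time). The function name `speed_multiple` is made up for illustration, and it assumes the 0.9-1.0x buffering figure from the wasm run is measured the same way.

```python
def speed_multiple(generation_s: float, audio_s: float) -> float:
    """Seconds of audio produced per second of wall-clock time (higher is faster)."""
    if generation_s <= 0:
        raise ValueError("generation_s must be positive")
    return audio_s / generation_s


# Numbers from the CUDA / transformers run above.
gpu_speed = speed_multiple(generation_s=3.258, audio_s=6.46)
print(f"CUDA: {gpu_speed:.2f}x real time")  # prints "CUDA: 1.98x real time"

# The wasm run is only known as a range, so it is kept as a range here.
wasm_speed_range = (0.9, 1.0)
print(f"wasm: {wasm_speed_range[0]:.1f}-{wasm_speed_range[1]:.1f}x real time")
```

Note that TTFB is not included in either figure, so this only compares steady-state throughput.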
So there's definitely something wrong here: I should see a performance difference between CPU and GPU, and between the two GPUs.
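One way to rule out measurement noise before comparing devices is to time several runs after a warm-up and take the median. The sketch below is only an assumption about how that could look: `time_runs` and `fake_synthesize` are hypothetical names, and `fake_synthesize` is a stand-in for the real Soprano call, which is not shown here.

```python
import statistics
import time
from typing import Callable


def time_runs(synthesize: Callable[[], object], warmup: int = 1, runs: int = 5) -> float:
    """Return the median wall-clock time of `runs` calls, after `warmup` untimed calls."""
    # Untimed warm-up calls, so one-off costs (model load, kernel compilation,
    # cache fill) do not end up in the measurement.
    for _ in range(warmup):
        synthesize()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize()
        # On CUDA, the real call would need to finish all GPU work before this
        # point (for example via torch.cuda.synchronize()), otherwise the timer
        # may stop before the GPU is done.
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fake_synthesize() -> None:
    """Hypothetical placeholder for the actual text-to-speech call."""
    time.sleep(0.01)


median_s = time_runs(fake_synthesize, warmup=1, runs=3)
print(f"median generation time: {median_s:.3f}s")
```

If the median still comes out the same on both GPUs, the bottleneck is probably not the GPU itself.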
Thoughts?