Stars
Unofficial implementation of "Simplifying, Stabilizing & Scaling Continuous-Time Consistency Models" for MNIST
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, H…
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
Repository for the "Gotta Go Fast When Generating Data with Score-Based Models" paper
The PyTorch-based audio source separation toolkit for researchers
Code for the paper "Hybrid Spectrogram and Waveform Source Separation"
Open-Unmix - Music Source Separation for PyTorch
Official code for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)