TensorFlow implementation of Unsupervised Audio Spectrogram Compression using Vector Quantized Autoencoders, which compresses (encodes) a short sound file into a compact, discrete representation and decompresses (decodes) it back into a waveform. The method relies on an intermediate spectrogram representation; a gradient-based approximate inverse STFT is included for generating a sound waveform from the reconstructed spectrogram.
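To illustrate the idea behind gradient-based spectrogram inversion, here is a minimal numpy sketch that recovers a waveform from a target DFT magnitude by gradient descent. This is a single-frame simplification (no windowing or overlap-add, unlike a real inverse STFT), and the function name and parameters are illustrative, not taken from this repository:

```python
import numpy as np

def invert_magnitude_spectrum(target_mag, n_iter=200, lr=1e-3, eps=1e-8):
    """Recover a waveform whose DFT magnitude matches `target_mag`
    by gradient descent on the squared magnitude error.
    Single-frame toy version of gradient-based spectrogram inversion."""
    n = len(target_mag)
    rng = np.random.default_rng(0)
    x = rng.standard_normal(n) * 0.01  # random initial waveform
    losses = []
    for _ in range(n_iter):
        X = np.fft.fft(x)
        mag = np.abs(X)
        losses.append(float(np.sum((mag - target_mag) ** 2)))
        # Wirtinger-style gradient of the magnitude loss w.r.t. x:
        # dL/dx = Re(FFT(2 * (|X| - T) * conj(X) / |X|))
        u = 2.0 * (mag - target_mag) * np.conj(X) / (mag + eps)
        x -= lr * np.real(np.fft.fft(u))
    return x, losses
```

Since only magnitudes are matched, the recovered phase (and hence the exact waveform) is not unique; the loss nonetheless decreases steadily for small step sizes.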
For additional details, please see the report and sound samples.
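The vector-quantization step at the heart of a VQ autoencoder maps each continuous latent vector to the index of its nearest codebook entry; decoding looks the index back up. A minimal numpy sketch (function names are illustrative, not the repository's API):

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector (rows of `latents`) to the index of its
    nearest codebook entry under squared Euclidean distance."""
    # pairwise distances: (num_latents, codebook_size)
    d = np.sum((latents[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    return np.argmin(d, axis=1)

def dequantize(indices, codebook):
    """Recover the (quantized) latent vectors from their indices."""
    return codebook[indices]
```

The indices are the compact, discrete representation: storing one small integer per latent vector instead of a float vector is what makes the compression possible.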
- Python 3.7
- tensorflow==1.15.2
- dm-sonnet==1.36
- tensorflow-probability==0.8.0
pip install -r requirements.txt
The model training script `train.py` requires a configuration YAML file. The configs used in the report experiments can be found in `experiments/`; the minimal configs are intended for debugging.
python train.py -f experiments/nsynth-full.yaml
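For orientation, a config of this kind typically looks something like the following. The key names below are hypothetical; the actual schema is defined by the YAML files in `experiments/`:

```yaml
# Hypothetical minimal config -- consult experiments/ for the real key names.
dataset: nsynth
batch_size: 32
num_steps: 100000
model:
  latent_dim: 64
  codebook_size: 512
```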
`evaluate.py` evaluates the predictive performance of a trained model on the test set.
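A typical reconstruction metric for this kind of evaluation is the mean squared error between the original and reconstructed magnitude spectrograms; the function below is a generic sketch of such a metric, not necessarily the one `evaluate.py` reports:

```python
import numpy as np

def spectrogram_mse(original, reconstruction):
    """Mean squared error between two magnitude spectrograms
    of equal shape (frames, bins)."""
    return float(np.mean((original - reconstruction) ** 2))
```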
`predict.py` compresses and reconstructs a new sound file.
`tf.data` input pipelines are included for the datasets used in the experiments (e.g. NSynth).
Figure: NSynth validation-error plots for autoencoders of increasing latent representation size.
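A common first step in such audio input pipelines is slicing each waveform into fixed-length training examples. A numpy sketch of that framing step (frame length and hop size are illustrative; the actual pipelines use `tf.data` ops):

```python
import numpy as np

def frame_waveform(x, frame_length, hop):
    """Slice a 1-D waveform into overlapping fixed-length frames,
    dropping any trailing partial frame."""
    n_frames = 1 + (len(x) - frame_length) // hop
    return np.stack([x[i * hop : i * hop + frame_length]
                     for i in range(n_frames)])
```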
@mastersthesis{HansenVedal1376201,
  title       = {Unsupervised Audio Spectrogram Compression using Vector Quantized Autoencoders},
  author      = {Hansen Vedal, Amund},
  institution = {KTH, School of Electrical Engineering and Computer Science (EECS)},
  pages       = {75},
  year        = {2019}
}