TensorFlow implementation of Unsupervised Audio Spectrogram Compression using Vector Quantized Autoencoders, which compresses (encodes) a short sound file into a compact, discrete representation and decompresses (decodes) it back into a waveform. The method relies on an intermediate spectrogram representation; a gradient-based approximate inverse STFT is included for generating a sound waveform from the reconstructed spectrogram.
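To illustrate the idea behind gradient-based spectrogram inversion, here is a minimal numpy sketch that recovers a waveform from a target DFT magnitude by gradient descent. This is a single-frame simplification (no windowing or overlap-add, unlike a real inverse STFT), and the function name and parameters are illustrative, not taken from this repository:

```python
import numpy as np

def invert_magnitude_spectrum(target_mag, n_iter=200, lr=1e-3, eps=1e-8):
    """Recover a waveform whose DFT magnitude matches `target_mag`
    by gradient descent on the squared magnitude error.
    Single-frame toy version of gradient-based spectrogram inversion."""
    n = len(target_mag)
    rng = np.random.default_rng(0)
    x = rng.standard_normal(n) * 0.01  # random initial waveform
    losses = []
    for _ in range(n_iter):
        X = np.fft.fft(x)
        mag = np.abs(X)
        losses.append(float(np.sum((mag - target_mag) ** 2)))
        # Wirtinger-style gradient of the magnitude loss w.r.t. x:
        # dL/dx = Re(FFT(2 * (|X| - T) * conj(X) / |X|))
        u = 2.0 * (mag - target_mag) * np.conj(X) / (mag + eps)
        x -= lr * np.real(np.fft.fft(u))
    return x, losses
```

Since only magnitudes are matched, the recovered phase (and hence the exact waveform) is not unique; the loss nonetheless decreases steadily for small step sizes.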
For additional details, please see the report and sound samples.
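The vector-quantization step at the heart of a VQ autoencoder maps each continuous latent vector to the index of its nearest codebook entry; decoding looks the index back up. A minimal numpy sketch (function names are illustrative, not the repository's API):

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector (rows of `latents`) to the index of its
    nearest codebook entry under squared Euclidean distance."""
    # pairwise distances: (num_latents, codebook_size)
    d = np.sum((latents[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    return np.argmin(d, axis=1)

def dequantize(indices, codebook):
    """Recover the (quantized) latent vectors from their indices."""
    return codebook[indices]
```

The indices are the compact, discrete representation: storing one small integer per latent vector instead of a float vector is what makes the compression possible.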
- Python 3.7
- tensorflow==1.15.2
- dm-sonnet==1.36
- tensorflow-probability==0.8.0
pip install -r requirements.txt
The model training script `train.py` requires a configuration YAML file. The configs used in the report experiments can be found in `experiments/`; the minimal configs are intended for debugging.
python train.py -f experiments/nsynth-full.yaml
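For orientation, a config of this kind typically looks something like the following. The key names below are hypothetical; the actual schema is defined by the YAML files in `experiments/`:

```yaml
# Hypothetical minimal config -- consult experiments/ for the real key names.
dataset: nsynth
batch_size: 32
num_steps: 100000
model:
  latent_dim: 64
  codebook_size: 512
```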
`evaluate.py` evaluates the predictive performance of a trained model on the test set.
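A typical reconstruction metric for this kind of evaluation is the mean squared error between the original and reconstructed magnitude spectrograms; the function below is a generic sketch of such a metric, not necessarily the one `evaluate.py` reports:

```python
import numpy as np

def spectrogram_mse(original, reconstruction):
    """Mean squared error between two magnitude spectrograms
    of equal shape (frames, bins)."""
    return float(np.mean((original - reconstruction) ** 2))
```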
`predict.py` compresses and reconstructs a new sound file.
`tf.data` input pipelines are included for the datasets used in the experiments (e.g. NSynth).
Figure: NSynth validation-error plots for autoencoders of increasing latent representation size.
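A common first step in such audio input pipelines is slicing each waveform into fixed-length training examples. A numpy sketch of that framing step (frame length and hop size are illustrative; the actual pipelines use `tf.data` ops):

```python
import numpy as np

def frame_waveform(x, frame_length, hop):
    """Slice a 1-D waveform into overlapping fixed-length frames,
    dropping any trailing partial frame."""
    n_frames = 1 + (len(x) - frame_length) // hop
    return np.stack([x[i * hop : i * hop + frame_length]
                     for i in range(n_frames)])
```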
@mastersthesis{HansenVedal1376201,
  title       = {Unsupervised Audio Spectrogram Compression using Vector Quantized Autoencoders},
  author      = {Hansen Vedal, Amund},
  institution = {KTH, School of Electrical Engineering and Computer Science (EECS)},
  pages       = {75},
  year        = {2019}
}