# Multiple architectures for generating bird sounds using the BirdCLEF 2023 dataset
- Clone the repository.
- Create and activate the environment (conda must be installed), then make the kernel visible in Jupyter (not needed if you always launch Jupyter from the `birdgen` env):

  ```shell
  conda env create -f env.yml
  conda activate birdgen
  python -m ipykernel install --user --name=birdgen
  ```
- Download the dataset from https://www.kaggle.com/competitions/birdclef-2023/data and put the `train_audio` folder in the working directory.
- Run `Selection_cris_via_energie.ipynb` if you wish to train models on a dataset more likely to contain bird sounds instead of simply the first 2 seconds of each file.
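The idea behind that selection step can be sketched as follows: slide a fixed-length window over the waveform and keep the window with the highest energy, on the assumption that loud segments are more likely to contain bird calls. This is a minimal, hypothetical illustration; the notebook's actual method may differ.

```python
# Hypothetical sketch of energy-based clip selection: find the
# 2-second window of a waveform with the highest total energy.
def best_window(samples, sample_rate, window_seconds=2.0, hop_seconds=0.5):
    """Return (start_index, end_index) of the highest-energy window."""
    win = int(window_seconds * sample_rate)
    hop = int(hop_seconds * sample_rate)
    if len(samples) <= win:
        return 0, len(samples)
    best_start, best_energy = 0, float("-inf")
    for start in range(0, len(samples) - win + 1, hop):
        energy = sum(x * x for x in samples[start:start + win])
        if energy > best_energy:
            best_start, best_energy = start, energy
    return best_start, best_start + win
```

For example, on a recording that is silent except for a burst in the middle, the returned window covers the burst rather than the quiet opening seconds.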
- Train the models by running the VAE, GAN and VAE-GAN notebooks. You can use the original dataset or the version where bird sounds are selected. To watch progress with TensorBoard, replace `[dir]` with `./vae`, `./gan` or `./vaegan` to get logs for a specific category of models, or simply `./` for everything (assuming you are in the `bird_soud_generation` folder):

  ```shell
  tensorboard --logdir=[dir]
  ```
- Run inference for each model with the corresponding notebooks to generate sound files; adapt the checkpoint paths to the models you trained.
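One way to adapt the checkpoint paths is to pick the most recently saved checkpoint programmatically instead of hard-coding a filename. The directory layout and `*.ckpt` extension below are assumptions; adjust them to however your training notebooks save weights.

```python
# Hypothetical helper: return the most recently modified checkpoint
# file in a directory, so inference notebooks pick up the latest run.
import glob
import os

def latest_checkpoint(checkpoint_dir, pattern="*.ckpt"):
    paths = glob.glob(os.path.join(checkpoint_dir, pattern))
    if not paths:
        raise FileNotFoundError(
            f"no checkpoints matching {pattern!r} in {checkpoint_dir}"
        )
    return max(paths, key=os.path.getmtime)
```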
- Compute the Fréchet Audio Distance (FAD) between the sounds generated by each model and the sounds in the validation set using the `FAD.ipynb` notebook.
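For intuition, the FAD is the Fréchet distance between two Gaussians fitted to audio embeddings (typically from a pretrained model such as VGGish). The sketch below reduces it to one dimension to show the formula; the real metric uses the full multivariate form with covariance matrices, which `FAD.ipynb` presumably relies on a library for.

```python
# 1-D illustration of the Frechet distance underlying FAD:
# (mu1 - mu2)^2 + sigma1 + sigma2 - 2 * sqrt(sigma1 * sigma2)
import math

def frechet_distance_1d(xs, ys):
    def mean_var(values):
        m = sum(values) / len(values)
        return m, sum((x - m) ** 2 for x in values) / len(values)

    m1, s1 = mean_var(xs)
    m2, s2 = mean_var(ys)
    return (m1 - m2) ** 2 + s1 + s2 - 2 * math.sqrt(s1 * s2)
```

Two identical sample sets give a distance of 0, and shifting one set by a constant `c` (same variance) gives exactly `c**2`.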
- `NDB.ipynb` is used to compute the Jensen-Shannon divergence and the Number of statistically Different Bins (NDB).
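The Jensen-Shannon divergence part can be sketched in a few lines: it is the average of two KL divergences against the mixture distribution. This is a generic illustration over discrete histograms (such as the bin-occupancy counts an NDB-style evaluation compares), not the notebook's exact code.

```python
# Jensen-Shannon divergence between two discrete distributions.
# With log base 2 the result lies in [0, 1].
import math

def js_divergence(p, q):
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions give 0, and fully disjoint ones give the maximum value of 1.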
- `human_eval.ipynb` launches a website where users vote for the best audio from a random pair, to evaluate which model is better; `plot_human_eval.ipynb` will compute the win rate of each model against the others.
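The win-rate computation above can be sketched as follows: given pairwise votes, each model's win rate is its share of wins over the comparisons it appeared in. The `(winner, loser)` vote format is an assumption, not the notebook's actual data layout.

```python
# Compute each model's win rate from a list of (winner, loser) votes
# collected by the pairwise human-evaluation site.
from collections import Counter

def win_rates(votes):
    wins, appearances = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {model: wins[model] / appearances[model] for model in appearances}
```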