Music genre classification under noisy mixing conditions using the Messy Mashup competition dataset.
Jan 2026 DLGenAI Project - Messy Mashup
https://www.kaggle.com/competitions/jan-2026-dl-gen-ai-project/
The challenge is to classify music genres from mashups created by mixing instrument stems from different songs of the same genre, with added environmental noise.
Genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock
├── notebooks/
│   ├── milestone-1.ipynb      # Milestone 1: EDA and baseline
│   └── kaggle_notebook.ipynb  # Kaggle submission notebook
├── src/
│   ├── train.py
│   ├── inference.py
│   └── utils.py
├── reports/
├── models/
├── requirements.txt
├── .gitignore
└── README.md
- Dataset exploration and validation (completeness, corruption, size analysis)
- Silence detection across training stems
- Audio stem mixing and peak normalization
- Baseline heuristic methods using librosa
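The stem mixing, peak normalization, and silence detection steps listed above could look roughly like the following. This is a minimal NumPy sketch, not the project's actual code: the function names `mix_stems` and `is_silent`, the 0.99 peak target, and the -60 dBFS threshold are all assumptions chosen for illustration.

```python
import numpy as np


def mix_stems(stems, peak=0.99):
    """Sum mono stems (same sample rate) and peak-normalize the result.

    `stems` is a list of 1-D float arrays; shorter stems are zero-padded.
    The `peak` target of 0.99 is an assumed value.
    """
    length = max(len(s) for s in stems)
    mix = np.zeros(length, dtype=np.float64)
    for s in stems:
        mix[: len(s)] += s
    max_abs = np.max(np.abs(mix))
    if max_abs > 0:  # leave an all-zero mix untouched
        mix = mix * (peak / max_abs)
    return mix


def is_silent(signal, threshold_db=-60.0):
    """Flag a stem whose RMS level is below `threshold_db` dBFS (assumed threshold)."""
    if len(signal) == 0:
        return True
    rms = np.sqrt(np.mean(np.square(signal)))
    if rms == 0:
        return True
    return bool(20.0 * np.log10(rms) < threshold_db)
```

Checking `is_silent` on each stem before mixing makes it possible to skip or log empty stems instead of silently producing a mix with a missing instrument.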
The dataset is available on Kaggle and contains:
- `genres_stems/` - 10 genres x 100 songs x 4 stems (drums, vocals, bass, others)
- `ESC-50-master/` - Environmental noise dataset
- `mashups/` - 3,020 test audio mashups
- `test.csv` - Test index file
- `sample_submission.csv` - Submission template
- Synthetic mashup generation from genre stems
- ESC-50 noise injection pipeline
- Mel-spectrogram feature extraction
- Baseline model evaluation
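One plausible way to implement the noise injection step is to scale an ESC-50 clip so that the mix reaches a chosen signal-to-noise ratio. The sketch below assumes mono float arrays at a shared sample rate; the function name `add_noise` and the `snr_db` parameter are hypothetical, and the project's pipeline may select or scale noise differently.

```python
import numpy as np


def add_noise(signal, noise, snr_db, rng=None):
    """Mix `noise` into `signal` at the requested signal-to-noise ratio in dB."""
    if rng is None:
        rng = np.random.default_rng()
    # Tile the noise clip if it is shorter than the signal, then take a random crop.
    if len(noise) < len(signal):
        reps = int(np.ceil(len(signal) / len(noise)))
        noise = np.tile(noise, reps)
    start = rng.integers(0, len(noise) - len(signal) + 1)
    noise = noise[start : start + len(signal)]

    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    if signal_power == 0 or noise_power == 0:
        return signal.copy()  # nothing meaningful to scale
    # Scale noise so that signal_power / scaled_noise_power == 10 ** (snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

Sampling `snr_db` from a range during training, rather than fixing it, is a common way to expose the model to both lightly and heavily corrupted mashups.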
- Custom CNN architecture (4 conv blocks)
- Training with Adam optimizer and StepLR scheduler
- WandB experiment tracking
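A StepLR schedule multiplies the learning rate by a factor `gamma` every `step_size` epochs. The plain-Python sketch below reproduces that closed form so the decay can be inspected without running training; the initial rate, `step_size`, and `gamma` values are assumptions, not the settings used in this project.

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate at `epoch` under a step-decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)


# Assumed example: start at 1e-3 and halve the rate every 5 epochs.
schedule = [step_lr(1e-3, epoch, step_size=5, gamma=0.5) for epoch in range(15)]
```

With these assumed values the rate stays at 1e-3 for epochs 0 to 4, drops to 5e-4 for epochs 5 to 9, and to 2.5e-4 for epochs 10 to 14.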
- ResNet18 fine-tuning on mel-spectrograms
- CRNN with bidirectional GRU for temporal features
- Model ensemble with weighted averaging
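Weighted averaging can be sketched as a convex combination of each model's class probabilities. The example assumes every model outputs an array of shape `(n_samples, n_classes)` that has already been passed through softmax; the function name `ensemble_predict` and the weights are hypothetical.

```python
import numpy as np


def ensemble_predict(prob_list, weights):
    """Weighted average of per-model class probabilities.

    `prob_list` holds one (n_samples, n_classes) array per model.
    Returns the averaged probabilities and the predicted class indices.
    """
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize so rows still sum to 1
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)
    averaged = np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)
    return averaged, averaged.argmax(axis=1)
```

One reasonable choice is to set the weights in proportion to each model's validation macro F1, so that stronger models contribute more to the final prediction.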
| Notebook | Description |
|---|---|
| `01_eda.ipynb` | Dataset exploration and visualization |
| `02_data_generation.ipynb` | Synthetic mashup generation |
| `03_model_cnn.ipynb` | CNN from scratch |
| `04_model_resnet.ipynb` | ResNet18 fine-tuning |
| `05_model_crnn.ipynb` | CRNN with bidirectional GRU |
| `06_ensemble.ipynb` | Model ensemble and comparison |
| `kaggle_notebook.ipynb` | Final Kaggle submission |
(Run notebooks/06_ensemble.ipynb to generate final metrics and replace this section)
| Model | Macro F1 | Accuracy |
|---|---|---|
| GenreCNN | TBD | TBD |
| ResNet18 | TBD | TBD |
| GenreCRNN | TBD | TBD |
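The CRNN is expected to perform best because its recurrent layers model how the spectrogram evolves over time, but this remains unconfirmed until the metrics above are filled in.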
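For reference, the two metrics in the results table can be computed from integer class labels as shown below. This is a NumPy sketch with a hypothetical function name; it assumes that a class with no true and no predicted samples contributes an F1 of 0, which may differ from the convention used in the ensemble notebook.

```python
import numpy as np


def macro_f1_and_accuracy(y_true, y_pred, n_classes):
    """Return (macro F1, accuracy) for integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1_scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # Assumption: a class that never appears scores 0 rather than being skipped.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1_scores)), float(np.mean(y_true == y_pred))
```

Macro F1 averages the per-class scores with equal weight, so a genre the model consistently misses lowers the score as much as any other genre would.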