This repo serves as a template for the exam assignment of the AutoML SS24 course at the University of Freiburg.
The aim of this repo is to provide a minimal installable template to help you get up and running.
To install the repository, first create an environment of your choice and activate it.
You can change the Python version in the commands below to the version you prefer.

**Virtual Environment (venv)**

```bash
python3 -m venv automl-vision-env
source automl-vision-env/bin/activate
```

**Conda Environment**

You can also use conda; this is left to individual preference.

```bash
conda create -n automl-vision-env python=3.11
conda activate automl-vision-env
```
Then install the repository by running the following command:

```bash
pip install -e .
```

You can test that the installation was successful by running:

```bash
python -c "import automl"
```
We place no restrictions on the Python version or libraries you use, but we recommend using Python 3.8 or higher.
We provide the following:
- `run.py`: A script that trains an AutoML system on the training split (`dataset_train`) of a given dataset and then generates predictions for the test split (`dataset_test`), saving those predictions to a file. For the training datasets, the test splits will contain the ground-truth labels, but for the test dataset, which we provide later, the labels of the test split will not be available. You will be expected to generate these labels yourself and submit them to us through GitHub Classrooms.
- `src/automl`: A Python package that is installed by the command above and contains your source code for whatever system you would like to build. We have provided a dummy `AutoML` class to serve as an example.
You are completely free to modify the code, install new libraries, make changes, and in general do whatever you want with it. The only requirement for the exam is that you can generate predictions for the test splits of our datasets in a `.npy` file that we can then use to give you a test score through GitHub Classrooms.
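As a minimal sketch of the expected output format (the integer class-index encoding here is an assumption; match whatever your `run.py` actually produces), predictions can be written to a `.npy` file like this:

```python
import numpy as np

# Hypothetical predictions: one integer class index per test sample.
# In practice this array comes from your AutoML system's predict step.
predictions = np.array([3, 1, 4, 1, 5], dtype=np.int64)

# Save in .npy format; np.load can read it back for verification.
np.save("preds-42-fashion.npy", predictions)

# Sanity check: the round-tripped array matches what was saved.
loaded = np.load("preds-42-fashion.npy")
assert np.array_equal(loaded, predictions)
```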
We selected three different vision datasets which you can use to develop your AutoML system, and we will provide you with a test dataset to evaluate your system at a later point in time. The datasets can be downloaded automatically by the respective dataset classes in `./src/automl/datasets.py`. The datasets are: fashion, flowers, and emotions.

If there are any problems downloading the datasets, you can download them manually from https://ml.informatik.uni-freiburg.de/research-artifacts/automl-exam-24-vision/ and place them in the `./data` folder after unzipping them.
The downloaded datasets will have the following structure:
```
./data
├── fashion
│   ├── images_test
│   │   ├── 000001.jpg
│   │   ├── 000002.jpg
│   │   ├── 000003.jpg
│   │   ...
│   ├── images_train
│   │   ├── 000001.jpg
│   │   ├── 000002.jpg
│   │   ├── 000003.jpg
│   │   ...
│   ├── description.md
│   ├── fashion.tgz
│   ├── test.csv
│   └── train.csv
├── emotions
...
```
Feel free to explore the images and the `description.md` files to get a better understanding of the datasets.
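For quick exploration, here is a hedged sketch of loading one split with pandas and Pillow. The CSV column names `image_file_name` and `label` are hypothetical assumptions about the structure; check the actual CSV headers and the dataset classes in `./src/automl/datasets.py` for the real loading logic.

```python
from pathlib import Path

import pandas as pd
from PIL import Image


def load_split(dataset_dir: str, split: str = "train"):
    """Load one split of a downloaded dataset.

    Assumes <split>.csv pairs an image-filename column with a label
    column (hypothetical names 'image_file_name' and 'label') and
    that the images live in images_<split>/.
    """
    root = Path(dataset_dir)
    df = pd.read_csv(root / f"{split}.csv")
    images = [
        Image.open(root / f"images_{split}" / name)
        for name in df["image_file_name"]
    ]
    return images, df["label"].tolist()
```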
The following table provides an overview of their characteristics, along with a reference value for the accuracy that a naive AutoML system could achieve on these datasets:

| Dataset name | # Classes | # Train samples | # Test samples | # Channels | Resolution | Reference accuracy |
|---|---|---|---|---|---|---|
| fashion | 10 | 60,000 | 10,000 | 1 | 28x28 | 0.88 |
| flowers | 102* | 5,732 | 2,457 | 3 | 512x512 | 0.55 |
| emotions | 7 | 28,709 | 7,178 | 1 | 48x48 | 0.40 |
| final test dataset | TBA | TBA | TBA | TBA | TBA | TBA |

*classes are imbalanced
We will add the test dataset later by pushing its class definition to the `datasets.py` file. The test dataset will be in the same format as the training datasets, but its `test.csv` will only contain NaNs for the labels.
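As a small illustration of what that looks like (the column names here are hypothetical; check the actual CSV headers), an unlabeled test split parses to an all-NaN label column:

```python
import io

import pandas as pd

# Hypothetical stand-in for the final test split's CSV: the label
# column is empty, so pandas parses every entry as NaN.
csv_text = "image_file_name,label\n000001.jpg,\n000002.jpg,\n"
df = pd.read_csv(io.StringIO(csv_text))

# All labels are missing, as expected for the final test dataset.
assert df["label"].isna().all()
```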
The following command will download the fashion dataset into `./data`, train a dummy AutoML system, and generate predictions for the test split:

```bash
python run.py --dataset fashion --seed 42 --output-path preds-42-fashion.npy
```
You are free to modify these files and command line arguments as you see fit.
The final test predictions should be uploaded in a file named `final_test_preds.npy`, with each entry containing the prediction for the corresponding input, in the exact order of the given `X_test`.
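Before uploading, it may be worth sanity-checking the file. This is only a sketch: the sample count used below is a placeholder, and the integer-dtype check assumes your predictions are class indices.

```python
import numpy as np

# Hypothetical example file; in practice, load your real
# final_test_preds.npy produced by your AutoML system.
preds = np.array([0, 2, 1], dtype=np.int64)
np.save("final_test_preds.npy", preds)

loaded = np.load("final_test_preds.npy")

# One prediction per test input, in the same order as X_test.
n_test_samples = 3  # placeholder: use the real test-split size
assert loaded.ndim == 1
assert loaded.shape[0] == n_test_samples
assert np.issubdtype(loaded.dtype, np.integer)
```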
Upload your poster as a PDF file named `final_poster_vision_<team-name>.pdf`, following the template given here.
- If you need to add dependencies so that you and your teammates are all on the same page, you can add them to the `pyproject.toml` file. This will ensure that everyone has the same dependencies.
- Please feel free to modify the `.gitignore` file to exclude files generated by your experiments, such as models, predictions, etc. Also, be a friendly teammate and ignore your virtual environment and any additional folders/files created by your IDE.