
automl-classroom/AutoML

 
 


AutoML Exam - SS25 (Tabular Data)

This repo serves as a template for the exam assignment of the AutoML SS25 course at the University of Freiburg.

The aim of this repo is to provide a minimal installable template to help you get up and running.

Installation

To install the repository, first create an environment of your choice and activate it.

If you use conda, you can change the Python version (3.11 in the command below) to the version you prefer.

Virtual Environment

python3 -m venv automl-tabular-env
source automl-tabular-env/bin/activate

Conda Environment

You can also use conda; which one you use is left to individual preference.

conda create -n automl-tabular-env python=3.11
conda activate automl-tabular-env

Then install the repository by running the following command:

pip install -e .

You can test that the installation was successful by running the following command:

python -c "import automl"

We place no restrictions on the Python version or libraries you use, but we recommend using Python 3.10 or higher.
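
A minimal sketch for checking both points above (the interpreter version and whether the automl package is importable) without executing the package's code. The helper name check_environment is hypothetical, and its 3.10 default simply mirrors the recommendation above.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 10), package="automl"):
    """Hypothetical helper: report interpreter version and package availability.

    `min_version` mirrors the README's recommendation; `package` is the name
    that `pip install -e .` makes importable.
    """
    version_ok = sys.version_info[:2] >= tuple(min_version)
    # find_spec locates a top-level package without importing (running) it.
    package_found = importlib.util.find_spec(package) is not None
    return {"python_version_ok": version_ok, "package_found": package_found}


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```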

Code

We provide the following:

  • download-datasets.py: This script downloads the suggested training datasets, which we provide ahead of the release of the official exam dataset.

  • run.py: A script that loads a downloaded dataset, trains an AutoML system, and then generates predictions for X_test, saving those predictions to a file. For the training datasets, you also have access to y_test, which is in the ./data folder. You will not have access to y_test for the test dataset we provide later; instead, you will generate predictions for X_test and submit them to us through GitHub Classroom.

  • ./src/automl: A Python package, installed by the pip install -e . command above, that will contain the source code for whatever system you would like to build. We have provided a dummy AutoML class to serve as an example.
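
The dummy class in ./src/automl is not reproduced here. As a rough illustration of the kind of system you might build, the sketch below uses a scikit-learn-style fit/predict interface (an assumption; check the dummy class for the interface run.py actually calls) and performs a tiny hyperparameter search: it fits closed-form ridge regression for a few hypothetical alpha values, scores each on a random holdout split by R², and refits the best one on all training rows. The class name TinyRidgeAutoML is made up, and the code assumes purely numeric features with no missing values.

```python
import numpy as np


class TinyRidgeAutoML:
    """Hypothetical example, not the template's dummy AutoML class."""

    def __init__(self, alphas=(0.01, 0.1, 1.0, 10.0, 100.0), val_fraction=0.2, seed=0):
        self.alphas = alphas
        self.val_fraction = val_fraction
        self.seed = seed

    @staticmethod
    def _add_bias(X):
        return np.hstack([X, np.ones((X.shape[0], 1))])

    @classmethod
    def _fit_ridge(cls, X, y, alpha):
        # Closed-form ridge solution; the bias column is not penalised.
        Xb = cls._add_bias(X)
        penalty = alpha * np.eye(Xb.shape[1])
        penalty[-1, -1] = 0.0
        return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

    @staticmethod
    def _r2(y_true, y_pred):
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_true.mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).ravel()
        # Standardise features with training statistics; guard constant columns.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        self.std_[self.std_ == 0] = 1.0
        Xs = (X - self.mean_) / self.std_

        # Random holdout split used only for choosing alpha.
        idx = np.random.default_rng(self.seed).permutation(len(y))
        n_val = max(1, int(len(y) * self.val_fraction))
        val, train = idx[:n_val], idx[n_val:]

        self.val_scores_ = {}
        for alpha in self.alphas:
            w = self._fit_ridge(Xs[train], y[train], alpha)
            pred = self._add_bias(Xs[val]) @ w
            self.val_scores_[alpha] = self._r2(y[val], pred)
        self.best_alpha_ = max(self.val_scores_, key=self.val_scores_.get)
        # Refit the selected configuration on all training rows.
        self.coef_ = self._fit_ridge(Xs, y, self.best_alpha_)
        return self

    def predict(self, X):
        Xs = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        return self._add_bias(Xs) @ self.coef_
```

Usage would look like model = TinyRidgeAutoML().fit(X_train, y_train) followed by model.predict(X_test). Categorical columns would need encoding first, and a real system would search over far more than one model family.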

You are completely free to modify the code, install new libraries, and in general do whatever you want. The only requirement for the exam is that you can generate predictions for X_test in a .npy file, which we then use to give you a test score through GitHub Classroom.
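
Because the only hard requirement is a .npy file of predictions for X_test, it can help to sanity-check the array before submitting. The sketch below uses np.save and np.load; the helper name save_predictions and its checks (one finite value per test row, flattened to 1-D) are assumptions about what a sensible submission looks like, not rules stated by the course.

```python
from pathlib import Path

import numpy as np


def save_predictions(preds, n_test_rows, output_path):
    """Hypothetical helper: check predictions, write them as .npy, read them back."""
    # Regression targets, so store predictions as a flat float array.
    preds = np.asarray(preds, dtype=float).ravel()
    if preds.shape[0] != n_test_rows:
        raise ValueError(f"expected {n_test_rows} predictions, got {preds.shape[0]}")
    if not np.all(np.isfinite(preds)):
        raise ValueError("predictions contain NaN or infinite values")
    output_path = Path(output_path)
    # np.save appends ".npy" when it is missing; mirror that so we reload the right file.
    if output_path.suffix != ".npy":
        output_path = output_path.with_name(output_path.name + ".npy")
    np.save(output_path, preds)
    if not np.array_equal(np.load(output_path), preds):
        raise IOError(f"round-trip check failed for {output_path}")
    return output_path
```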

Data

Practice datasets:

The following datasets are provided for practice purposes:

  • bike_sharing_demand
  • brazilian_houses
  • wine_quality
  • superconductivity
  • yprop_4_1

You can download the practice data using:

python download-datasets.py

By default, this downloads the data to the ./data folder with the following structure. The fold numbers 1, ..., n refer to outer folds, meaning each can be treated as a separate dataset for training and validation. You can use the --fold argument to specify which fold you would like.

./data
├── bike_sharing_demand
│   ├── 1
│   │   ├── X_test.parquet
│   │   ├── X_train.parquet
│   │   ├── y_test.parquet
│   │   └── y_train.parquet
│   ├── 2
│   │   ├── X_test.parquet
│   │   ├── X_train.parquet
│   │   ├── y_test.parquet
│   │   └── y_train.parquet
│   ├── 3
    ...
├── wine_quality 
│   ├── 1
│   │   ├── X_test.parquet
│   │   ├── X_train.parquet
│   │   ├── y_test.parquet
│   │   └── y_train.parquet
    ...
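
Following the layout above, a small helper can locate the four files for a given task and fold and list which folds are on disk. The function names are hypothetical. Reading the files assumes pandas with a Parquet engine (such as pyarrow) is installed and that each y_*.parquet file holds a single target column; both are assumptions, not guarantees from the template.

```python
from pathlib import Path


def fold_paths(task, fold, data_dir="./data"):
    """Hypothetical helper: expected file paths for one outer fold of a task."""
    fold_dir = Path(data_dir) / task / str(fold)
    names = ("X_train", "y_train", "X_test", "y_test")
    return {name: fold_dir / f"{name}.parquet" for name in names}


def available_folds(task, data_dir="./data"):
    """Hypothetical helper: fold numbers present on disk, sorted numerically."""
    task_dir = Path(data_dir) / task
    if not task_dir.is_dir():
        return []
    return sorted(int(p.name) for p in task_dir.iterdir() if p.is_dir() and p.name.isdigit())


def load_fold(task, fold, data_dir="./data"):
    """Load one fold; missing files (e.g. y_test for the exam data) become None."""
    import pandas as pd  # imported here so the path helpers work without pandas

    data = {}
    for name, path in fold_paths(task, fold, data_dir).items():
        if not path.exists():
            data[name] = None
            continue
        frame = pd.read_parquet(path)
        # Assumption: target files contain exactly one column.
        data[name] = frame.iloc[:, 0] if name.startswith("y_") else frame
    return data
```

Since each outer fold can be treated as its own dataset, averaging a validation score over every fold returned by available_folds(task) gives a less noisy estimate than relying on a single fold.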

Running an initial test

This will train a dummy AutoML system and generate predictions for X_test:

python run.py --task bike_sharing_demand --seed 42 --output-path preds-42-bsd.npy
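
The command above passes three options. If you restructure run.py, a parser along these lines accepts the same flags. It is a sketch, not the template's actual argument handling; the build_parser name and the default values are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the example command."""
    parser = argparse.ArgumentParser(description="Train an AutoML system and predict X_test.")
    parser.add_argument("--task", required=True,
                        help="dataset name, e.g. bike_sharing_demand")
    parser.add_argument("--seed", type=int, default=42,
                        help="random seed for reproducibility")
    parser.add_argument("--output-path", default="predictions.npy",
                        help="where to write the .npy predictions")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --output-path as the attribute output_path.
    print(args.task, args.seed, args.output_path)
```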

You are free to modify these files and command line arguments as you see fit.

Reference performance

Dataset               Test performance (R²)
bike_sharing_demand   0.9457
brazilian_houses      0.9896
superconductivity     0.9311
wine_quality          0.4410
yprop_4_1             0.0778

The scores listed are the R² values calculated using scikit-learn's metrics.r2_score.
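
For reference, R² compares the model's squared error with that of always predicting the mean of y_test: R² = 1 - Σ(y_i - ŷ_i)² / Σ(y_i - ȳ)². A score of 1 is perfect, 0 matches the constant-mean baseline, and negative values are worse than that baseline. The NumPy sketch below reproduces r2_score for a single target with non-constant y_true (scikit-learn handles the constant case differently, so this version simply raises); in practice, calling scikit-learn directly is simpler.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination for a single target."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined here when y_true is constant")
    return 1.0 - ss_res / ss_tot


print(round(r2([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]), 4))  # 0.9486
```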

Tips

  • If you need to add dependencies and want you and your teammates to stay on the same page, add them to the pyproject.toml file. This ensures that everyone has the same dependencies.

  • Please feel free to modify the .gitignore file to exclude files generated by your experiments, such as models, predictions, etc. Also, be a friendly teammate and ignore your virtual environment and any additional folders/files created by your IDE.

About

This repository provides a code template for the AutoML SS25 exam (tabular modality).
