[Question] How to best use AutoSklearn in a nested cross validation setting? #1570

# Short Question Description
What is the suggested way of using AutoSklearn in a nested CV setting, for example in combination with scikit-learn's `cross_validate`?

# Further details
## The general setup/idea/goal:
AutoSklearn in a nested CV setup (AutoSklearn handles the inner CV), using Dask on a Slurm cluster for parallelisation and scikit-learn's `cross_validate` for the outer CV.

## The problem:
Combining these three aspects leads to a `TypeError: cannot pickle '_asyncio.Task' object` error. The error does not appear when AutoSklearn is used with a plain `fit()` on a test data set instead of being embedded in the nested CV scenario.

## A code snippet that should lead to the problem:
```python
from dask_jobqueue import SLURMCluster
from dask.distributed import Client
import time
import sys
import logging
from sklearn import model_selection
from sklearn.datasets import make_regression
from sklearn.model_selection import ShuffleSplit, cross_validate
from autosklearn.regression import AutoSklearnRegressor
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)

# input arguments
cluster_jobs = 1
n_samples = 10000
t_ASL = 5  # in minutes
n_repeats_outer = 2
n_jobs_outer = 1

# Dask cluster and client
cluster = SLURMCluster(nanny=True)  # other specifications in config file
cluster.scale(jobs=cluster_jobs)
client = Client(cluster)

# make fake regression data
X, y, true_weights = make_regression(
    n_samples=n_samples,
    n_features=1300,
    n_informative=400,
    noise=8,
    coef=True,
    random_state=0,
)

# further definitions
random_state = 43

# inner CV
cv_inner_folds = 2

# initialise ASL regressor
auto_model = AutoSklearnRegressor(
        time_left_for_this_task=t_ASL*60,  # per fit; total time multiplies by no. of outer folds
        memory_limit=400000,
        resampling_strategy="cv",
        resampling_strategy_arguments={"folds": cv_inner_folds},
        metric=None,  # None: Use default of each algorithm (not mean_squared_error)
        n_jobs=1,  # n_jobs is ignored when passing a dask client
        initial_configurations_via_metalearning=0,  # avoid config. warnings
        dask_client=client,
        tmp_folder=(
            '/p/project/comanukb/vkomeyer/motorpred/ukb/code/Dask_ASL_minexpl/'
            'logs_SlurmCluster/auto_sklearn_log1')
    )

# outer CV
outer_cv = ShuffleSplit(
    n_splits=n_repeats_outer, random_state=random_state,
    train_size=0.7)
scoring_outer = ['r2']

# Model fitting
if __name__ == "__main__":
    score = cross_validate(
        estimator=auto_model,
        X=X, y=y,
        cv=outer_cv,
        scoring=scoring_outer,
        return_estimator=True,
        n_jobs=n_jobs_outer)
```

## The resulting error:
`TypeError: cannot pickle '_asyncio.Task' object`, originating from `cross_validate()` (I can provide the entire traceback if that would be helpful).
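
A possibly related observation, as a minimal sketch rather than a confirmed diagnosis: as far as I understand, `cross_validate` clones the estimator for every outer fold, and cloning deep-copies constructor parameters that are not estimators themselves, which here would include the `dask_client`. The standard-library snippet below shows that deep-copying any object holding a live asyncio task fails with a similar message. `FakeClient` is a hypothetical stand-in and not part of Dask or AutoSklearn; whether the Dask client fails for exactly this reason is an assumption.

```python
import asyncio
import copy


class FakeClient:
    """Hypothetical stand-in for an object that keeps a live asyncio task."""

    def __init__(self, task):
        self.task = task


async def try_deepcopy():
    # Create a live task on the running event loop.
    task = asyncio.ensure_future(asyncio.sleep(0))
    holder = FakeClient(task)
    try:
        # Deep-copying the holder also tries to copy the task it references.
        copy.deepcopy(holder)
        result = "deepcopy succeeded"
    except TypeError as exc:
        result = f"TypeError: {exc}"
    await task  # let the task finish cleanly before the loop closes
    return result


message = asyncio.run(try_deepcopy())
print(message)
```

If this is indeed the mechanism, it would explain why a direct `auto_model.fit()` works: no copy of the estimator (and hence of the client) is made in that case.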

## Further info
The error occurs neither when the nested cross-validation setting is replaced by a simple train/test split and `auto_model.fit(X_train, y_train)`, nor when using the cross-validation setting without AutoSklearn but with, for example, scikit-learn's `GridSearchCV` (comparison code can be provided in case that would be helpful).
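
For reference, the `GridSearchCV` comparison has roughly the following shape. This is a reduced sketch, not my exact comparison code: the `Ridge` estimator, the `alpha` grid, and the small data set are placeholders chosen only to keep the example short. It runs without Dask and without the pickling error.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_validate

# Small placeholder data set (much smaller than in the failing example).
X_small, y_small = make_regression(
    n_samples=200, n_features=20, noise=8, random_state=0
)

# Inner CV: hyperparameter search with 2 folds (Ridge and the grid are placeholders).
inner_search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    cv=2,
)

# Outer CV: same splitter settings as in the AutoSklearn example.
outer_cv = ShuffleSplit(n_splits=2, train_size=0.7, random_state=43)

scores = cross_validate(
    estimator=inner_search,
    X=X_small, y=y_small,
    cv=outer_cv,
    scoring=["r2"],
    return_estimator=True,
    n_jobs=1,
)
print(scores["test_r2"])
```

The only structural difference from the failing snippet is the estimator passed to `cross_validate`: here it holds no Dask client.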

Thank you very much for any hints to solve this!

