Add quickstart and detailed walkthrough of AutoSklearn for docs #1228

@eddiebergman

Note to anyone who reads this issue, please feel free to leave comments on what you would like to see included.

Adding detailed documentation

Currently the documentation lacks an introduction to what's possible beyond basically calling fit. We do detail some other functionality in the manual, but otherwise most functionality is hidden in the examples, the API documentation or the issues section.

Feedback from users seems to indicate that there is a lot of back and forth between using autosklearn, needing to do something, going to the docs to try to find the relevant API, and repeating this process until they have gotten what they need done. While most of these functionalities are documented in examples, there seems to be a desire for a more guided, documented approach. Some issues have also been raised which seem to indicate that users are not always aware of what autosklearn even does other than produce predictions. This doesn't seem to be documented anywhere other than the papers, and we should not expect users to read and understand those.

To change this we likely need three things:

  • More detail on the high level procedure of autosklearn and the end product
  • A quickstart document that goes through using autosklearn with accompanying commentary.
  • A more detailed walkthrough of different functionalities of autosklearn and the information that can be gotten out of it.

As API changes are soon to be proposed, perhaps some of this is best left until after those have been implemented. However, it can be started in the meantime.

High Level Overview

This doesn't have to be long but it should advertise:

  • Autosklearn aims to be sklearn compliant
  • Each model autosklearn produces consists of data preprocessing followed by a pipeline [feature preprocessing, sklearn model].
  • Data preprocessing is used to ensure that feature preprocessors and the sklearn models can use the data
  • The pipelines [feature preprocessing, sklearn model] are searched for efficiently, including their hyperparameters.
  • Autosklearn focuses on evaluating pipelines that seem promising based on past evaluations while also efficiently trying out kinds of pipelines about which we know little.
  • Once the time limit has elapsed, autosklearn is left with a large collection of pipelines.
  • Autosklearn then builds an ensemble out of some of these pipelines which is used to improve overall performance
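The final ensembling step could be illustrated in the overview with a toy example. As far as I know, the auto-sklearn papers describe a greedy ensemble selection procedure; the sketch below is a simplified, standard-library-only version of that idea and not the library's own implementation. The function names, the mean-squared-error loss and the toy validation predictions are all assumptions made for illustration.

```python
from collections import Counter


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_ensemble_selection(predictions, y_true, ensemble_size=4):
    """Pick pipelines one at a time (with replacement) so that the averaged
    validation prediction has the lowest loss after each addition.

    predictions: dict mapping a pipeline name to its validation predictions.
    Returns a dict mapping each selected pipeline name to its ensemble weight.
    """
    chosen = []                          # selected names, repeats allowed
    running_sum = [0.0] * len(y_true)    # sum of predictions of chosen members
    for _ in range(ensemble_size):
        best_name, best_loss = None, float("inf")
        for name, preds in predictions.items():
            size = len(chosen) + 1
            candidate = [(s + p) / size for s, p in zip(running_sum, preds)]
            loss = mean_squared_error(y_true, candidate)
            if loss < best_loss:
                best_name, best_loss = name, loss
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, predictions[best_name])]
    counts = Counter(chosen)
    return {name: count / ensemble_size for name, count in counts.items()}


# Toy validation data (hypothetical): two imperfect pipelines whose errors
# cancel out, and one pipeline that is simply wrong.
val_targets = [0, 1, 0, 1]
val_predictions = {
    "pipeline_a": [0.2, 0.8, 0.2, 0.8],
    "pipeline_b": [-0.2, 1.2, -0.2, 1.2],
    "pipeline_c": [1.0, 0.0, 1.0, 0.0],
}
weights = greedy_ensemble_selection(val_predictions, val_targets, ensemble_size=4)
print(weights)
```

The point worth making in the docs is that the ensemble is built from pipelines already evaluated during the search, so it can improve performance without training anything new.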

Quickstart

The idea for a quickstart is essentially to use one of the basic examples and walk through:

  • Loading in raw data
  • Configuring it into a pandas dataframe with the feature types
  • Fitting the model
  • Getting the predictions
  • Calculating a score
  • Viewing some information about what autosklearn has done.

During this walkthrough it would be good to leave many links to the API reference and to mention some other possibilities, while sticking to the core workflow.
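As a starting point, the quickstart steps listed above might look roughly like the sketch below. It is only a draft under assumptions: the dataset (scikit-learn's breast cancer data), the time budgets and the names `QUICKSTART_SETTINGS` and `run_quickstart` are placeholders, and the auto-sklearn import is kept inside the function so the settings can be read without the library installed.

```python
# Hypothetical settings for a short demo run; real data needs larger budgets.
QUICKSTART_SETTINGS = {
    "time_left_for_this_task": 120,  # total search budget in seconds
    "per_run_time_limit": 30,        # budget for one pipeline evaluation
    "seed": 1,
}


def run_quickstart(settings=QUICKSTART_SETTINGS):
    # Imported lazily so that the settings above can be inspected on a
    # machine without auto-sklearn installed.
    import sklearn.datasets
    import sklearn.metrics
    import sklearn.model_selection
    from autosklearn.classification import AutoSklearnClassifier

    # 1. Load raw data as a pandas DataFrame.
    X, y = sklearn.datasets.load_breast_cancer(return_X_y=True, as_frame=True)

    # 2. Feature types are taken from the column dtypes. This dataset is
    #    fully numerical; a categorical column would be marked with
    #    X[column] = X[column].astype("category").
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, random_state=1
    )

    # 3. Fit: search over pipelines, then build the ensemble.
    automl = AutoSklearnClassifier(**settings)
    automl.fit(X_train, y_train, dataset_name="breast_cancer")

    # 4. Get predictions and 5. calculate a score.
    predictions = automl.predict(X_test)
    accuracy = sklearn.metrics.accuracy_score(y_test, predictions)

    # 6. View some information about what auto-sklearn has done.
    print(automl.sprint_statistics())
    print(automl.leaderboard())
    return automl, accuracy
```

Calling `run_quickstart()` runs the whole workflow. In the actual document each numbered comment would become its own short section with commentary and links to the API reference.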

Functionality Walkthroughs

Autosklearn has different functionalities that are mostly discoverable only through the API documentation or the examples. It is hard to know this functionality even exists unless you look through the examples, which could do with some added commentary.

To address this we likely need a small walkthrough of using these different functionalities with added code snippets and commentary, linking to the relevant examples for a more full code treatment of the matter.

Some functionalities (in no particular order) which would be good to document in this manner:

  • What kind of data autosklearn can handle
  • Autosklearn is non-deterministic
  • Managing memory
    • Setting memory limits per core
    • Sources of memory consumption and how to address them
  • Configuring the pipelines searched
  • Sampling strategies
  • Parallelism and Dask Clients
  • SMAC
    • Custom SMAC objects
    • Using callbacks
  • Logging
  • Meta-data
  • Metrics
  • Scorings
  • Outputs that autosklearn generates in tmp
  • Fitting an individual pipeline
  • Configuration spaces
  • Accessing autosklearn's results
  • Refitting
  • Pickling models
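To give a feel for what one of these walkthroughs could contain, here is a rough sketch touching a few of the items above: memory limits, parallelism, restricting the pipelines searched, the resampling strategy, metrics, refitting and pickling. The parameter names follow recent releases as far as I know (older releases spelled some of them differently), and the values, `SEARCH_SETTINGS` and `fit_refit_and_pickle` are hypothetical choices for illustration only.

```python
import pickle

# Hypothetical values chosen for illustration only.
SEARCH_SETTINGS = {
    "time_left_for_this_task": 300,  # total search budget in seconds
    "per_run_time_limit": 60,        # budget for one pipeline evaluation
    "memory_limit": 3072,            # MB available to each job
    "n_jobs": 2,                     # number of parallel workers
    # Restrict the search to a subset of the available classifiers.
    "include": {"classifier": ["random_forest", "gradient_boosting"]},
    # Evaluate each pipeline with 5-fold cross-validation.
    "resampling_strategy": "cv",
    "resampling_strategy_arguments": {"folds": 5},
    "seed": 1,
}


def fit_refit_and_pickle(X_train, y_train, path="automl.pkl"):
    # Imported lazily so the settings above can be read without
    # auto-sklearn installed.
    import autosklearn.metrics
    from autosklearn.classification import AutoSklearnClassifier

    automl = AutoSklearnClassifier(
        metric=autosklearn.metrics.balanced_accuracy,
        **SEARCH_SETTINGS,
    )
    automl.fit(X_train, y_train)

    # With cross-validation, the selected pipelines are refit on the
    # full training data before the model is used for prediction.
    automl.refit(X_train, y_train)

    # Accessing results: a summary table of the evaluated pipelines.
    print(automl.leaderboard())

    # Pickling: the fitted estimator can be stored like other sklearn models.
    with open(path, "wb") as fh:
        pickle.dump(automl, fh)
    return automl
```

A stored model would then be restored with `pickle.load` and used through the usual `predict` call. Each topic in the list deserves its own snippet of about this size, with a link to the full example.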


Labels: documentation (Something to be documented)
