Experiments and analytics on multi-task learning with Bayesian optimization of structure search

nakytoe/BOSS-MT


Accelerating Bayesian Optimization Structure Search with Transfer Learning

Welcome to BOSS-MT. This repo contains the data analysis scripts used in my 2020 M.Sc. thesis 'Accelerating Bayesian Optimization Structure Search with Transfer Learning'. The purpose of this repository is to allow the results of my thesis work to be verified and reproduced. It is neither a user manual for transfer learning with BOSS nor a standalone document. See the BOSS project (referenced in the thesis) for further information on using BOSS. It is assumed that the user is familiar with the thesis work.

The folder structure is the following:

  • data: This folder is reserved for the raw, unprocessed experiment data. The raw data is not published here because of its large size and unconventional metadata; processed_data contains the same data in a cleaned, easy-to-use format. The raw data is available on request from the BOSS project.

  • processed_data: Raw data (boss.out files) parsed to JSON format for analysis.

  • results: Figures and tables produced by the analysis scripts are written here.

  • src: Analysis scripts and config files.

  • visuals: Repository visualization.

Understanding the experiments

Each folder in processed_data contains the parsed outputs of one experiment. Each experiment is named with a 4-character code as follows:

[Figure: Naming of the experiments]

Most experiments contain multiple BOSS runs. Each BOSS run is named exp_N, where N is a running number. All runs under the same experiment use the same settings, but the amount of secondary data, and the initialization data itself, may vary between runs for statistics, depending on the experiment.

Processed data is in JSON format. You can load the data of each run into a Python dictionary with the Python json module: open the file and pass the open file object to json.load. The setup for each experiment run is stored under the "boss.in" keyword. Relevant settings are also listed under their own keywords. Call .keys() on the loaded dictionary to list all the keywords for a run.
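The loading step described above can be sketched roughly as follows. This is a minimal sketch, not part of the analysis scripts: the helper names (load_run, run_number, load_experiment) are hypothetical, and it assumes that each run is stored as a file named exp_N.json inside its experiment folder, which may not match the actual layout.

```python
import json
import re
from pathlib import Path


def load_run(path):
    """Load one parsed BOSS run from a JSON file into a dict."""
    # json.load expects an open file object, not a path string.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def run_number(path):
    """Extract N from a file name containing 'exp_N' (-1 if absent)."""
    match = re.search(r"exp_(\d+)", Path(path).name)
    return int(match.group(1)) if match else -1


def load_experiment(folder):
    """Load every *.json run in one experiment folder, ordered by N.

    Assumption: runs are stored as exp_N.json files in the folder.
    """
    files = sorted(Path(folder).glob("*.json"), key=run_number)
    return {p.stem: load_run(p) for p in files}


# Example use (the folder name is a placeholder, not a real experiment code):
# runs = load_experiment("processed_data/<experiment code>")
# for name, run in runs.items():
#     print(name, sorted(run.keys()))  # all keywords of this run
#     print(run["boss.in"])            # setup of this run
```

Sorting by the extracted number rather than by file name keeps exp_10 after exp_2, which plain alphabetical ordering would not.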

Reproducing the analysis

The analysis pipelines in this project are managed by Snakemake, and all of the analysis is reproducible. The main input file for running the analysis is the Snakefile, which can be found in the root folder. To run the analysis pipeline, clone this repository
git clone git@github.com:NuuttiSten/BOSS-MT.git
create a conda virtual environment with the requirements
conda create --name stenthesis --file requirements.txt
activate the environment
conda activate stenthesis
and run Snakemake with
snakemake
Parsed data is saved under processed_data/.
Final analysis outputs are stored in results/.
Running the whole analysis pipeline takes about 15 minutes.

Citing the work

To cite, use:

Sten, N. A. 2020. 'Accelerating Bayesian Optimization Structure Search with Transfer Learning'. M.Sc. Thesis. Aalto University, Espoo, Finland.

or load the BibTeX citation.
