Skip to content

ianycxu/kaggle-pipeline

This branch is up to date with randxie/kaggle-pipeline:master.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Jul 22, 2017
524e9cf · Jul 22, 2017

History

7 Commits
Jul 22, 2017
Jul 22, 2017
Jul 22, 2017
Jul 22, 2017

Repository files navigation

Kaggle Pipeline

The design of pipeline is based on my experience in Kaggle competitions and other data challenges. In this pipeline, I try to decouple different steps from data IO, feature extraction, building model, validation to stacking. The main purpose is for fast protytyping (not scalable). In terms of stacking, I am not a fan of multiple level stacking so the Stacker only provides basic stacking.

File Structure

├── input	<-- train and test data
├── README.md
├── submission	<--- final submission folder
├── src	<-- where code is stored
│   ├── main.py	<--- main process
│   ├── DataWarehouse.py	<--- handle data IO
│   ├── FeatureGenerator.py	<--- convert raw data to feature matrix
│   ├── Model.py	<--- used to initialize different machine learning models
│   ├── Stacker.py	<--- stacking different models together with feature matrix
│   ├── Validator.py	<--- k-fold validation of Model/Stacker
│   ├── common.py	<--- define where files are stored and setup logging
│   ├── config	<--- storing config files for stacking, models, features to extract

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%