MR. NODE (Multiple predictoR Neural ODE) is a deep learning method that models the infection rate of black Sigatoka with ordinary differential equations and can infer the infection risk variable at an arbitrary point on the timeline.
This is the submission of the University of Toronto team to ProjectX 2020, an international machine learning research competition.
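To illustrate why an ODE-based model can be queried at any point on the timeline, here is a minimal sketch of fixed-step Euler integration in plain Python. The names `toy_dynamics` and `euler_solve` are hypothetical and are not part of this repository; in MR. NODE the dynamics are a learned network and the solver is provided by torchdiffeq, so this only shows the general idea.

```python
import math


def toy_dynamics(t, h):
    """Hypothetical stand-in for a learned dynamics function dh/dt = f(t, h)."""
    return -0.5 * h  # simple exponential decay, chosen only for illustration


def euler_solve(f, h0, t0, t1, step=1e-3):
    """Integrate dh/dt = f(t, h) from t0 to t1 with fixed-step forward Euler."""
    n_steps = max(1, math.ceil((t1 - t0) / step))
    dt = (t1 - t0) / n_steps
    t, h = t0, h0
    for _ in range(n_steps):
        h = h + dt * f(t, h)
        t = t + dt
    return h


# The state can be evaluated at arbitrary, irregularly spaced times.
query_times = [0.3, 1.0, 2.75]
states = [euler_solve(toy_dynamics, 1.0, 0.0, t) for t in query_times]
```

Because the state is defined by integrating the dynamics, no fixed sampling grid is needed: any time after the initial condition is a valid query.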
.
├── baseline # All files related to baseline models
├── mr_node # Data structures for MR. NODE
├── train.py # Training script for MR. NODE
├── test.py # Testing script for MR. NODE
└── data # Time series data for Costa Rica and India.
We collected microclimatic data from India and Costa Rica, two regions known for their vast banana plantations, and synthesized the corresponding infection risk variable via a probabilistic survival process inspired by [2].
| Region | Latitude | Longitude |
|---|---|---|
| Costa Rica | 10.39 | -83.812 |
| Maharashtra, India | 18.8143 | 73.125 |
- Install Poetry.
- Clone this repository and `cd` into its directory.
- Install the project and run the training script in the right environment.
$ poetry install
$ poetry shell
$ python train.py

You may use the following command to train the model. The results can be found in /results.

$ python train.py --region=cr --solver=euler --lr=3e-4 --encoder_fc_dims 8 16 8 --hidden_dims=4 --odefunc_fc_dims 64 64 --decoder_fc_dims 64 64 --window_length=128 --num_epochs=1 --rtol=1e-4 --atol=1e-6

Keyword arguments:
- `region`: Whether to train using data from Costa Rica (`cr`), India (`in`), or both (`crin`). Default: `cr`
- `solver`: ODE solver (see torchdiffeq for the complete list). Default: `euler`
- `lr`: Learning rate. Default: `3e-4`
- `encoder_fc_dims`: Fully-connected layers in the encoder. Default: `8 16 8`
- `hidden_dims`: Dimensions of latent space. Default: `4`
- `odefunc_fc_dims`: Fully-connected layers in the dynamics function. Default: `64 64`
- `decoder_fc_dims`: Fully-connected layers in the decoder. Default: `8 16 8`
- `window_length`: Window length for time steps. Default: `128`
- `num_epochs`: Number of training epochs. Default: `1`
- `rtol`: Relative tolerance for Neural ODE. Default: `1e-4`
- `atol`: Absolute tolerance for Neural ODE. Default: `1e-4`
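The flags above mix `--name=value` options with space-separated lists such as `--encoder_fc_dims 8 16 8`. As a rough sketch of how such a command line can be parsed with the standard `argparse` module, the snippet below defines a hypothetical `build_parser` helper; it is an assumption for illustration, not the actual contents of `train.py`, and its defaults are simply copied from the list above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented train.py arguments."""
    parser = argparse.ArgumentParser(description="Sketch of a train.py-style CLI")
    parser.add_argument("--region", choices=["cr", "in", "crin"], default="cr")
    parser.add_argument("--solver", default="euler")
    parser.add_argument("--lr", type=float, default=3e-4)
    # nargs="+" collects one or more space-separated integers into a list.
    parser.add_argument("--encoder_fc_dims", type=int, nargs="+", default=[8, 16, 8])
    parser.add_argument("--hidden_dims", type=int, default=4)
    parser.add_argument("--odefunc_fc_dims", type=int, nargs="+", default=[64, 64])
    parser.add_argument("--decoder_fc_dims", type=int, nargs="+", default=[8, 16, 8])
    parser.add_argument("--window_length", type=int, default=128)
    parser.add_argument("--num_epochs", type=int, default=1)
    parser.add_argument("--rtol", type=float, default=1e-4)
    parser.add_argument("--atol", type=float, default=1e-4)
    return parser


# Example: override a few options and keep the remaining defaults.
args = build_parser().parse_args(
    ["--region=in", "--encoder_fc_dims", "8", "16", "8", "--hidden_dims=4", "--atol=1e-6"]
)
```

Any option left off the command line falls back to its default, so `python train.py` with no arguments corresponds to the defaults listed above.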
Training a model with a set of arguments will generate a .pt file in /results/models uniquely identified by a job_id created based on the training arguments. You may use this job_id to specify which model to test.
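For orientation, the following sketch shows one way a `job_id` of this shape could be assembled from the training arguments. The function `make_job_id` is hypothetical and inferred only from the example identifier used in the test command; the repository's own code may build the string differently.

```python
def make_job_id(region, solver, lr, encoder_fc_dims, hidden_dims,
                odefunc_fc_dims, decoder_fc_dims, window_length,
                num_epochs, rtol, atol):
    """Hypothetical helper: join the training arguments into one identifier."""
    return (
        f"{region}_{solver}_lr{lr:.1e}"                 # e.g. lr3.0e-04
        f"_enc{encoder_fc_dims}_hidden{hidden_dims}"    # lists print as [8, 16, 8]
        f"_ode{odefunc_fc_dims}_dec{decoder_fc_dims}"
        f"_window{window_length}_epochs{num_epochs}"
        f"_rtol{rtol}_atol{atol}"                       # floats use Python's default repr
    )


job_id = make_job_id(
    region="cr", solver="euler", lr=3e-4,
    encoder_fc_dims=[8, 16, 8], hidden_dims=4,
    odefunc_fc_dims=[64, 64], decoder_fc_dims=[8, 16, 8],
    window_length=128, num_epochs=1, rtol=1e-4, atol=1e-6,
)
```

Since the identifier contains spaces and square brackets, it should be quoted when passed on the command line, as in the test command.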
$ python test.py --region=cr --job_id='cr_euler_lr3.0e-04_enc[8, 16, 8]_hidden4_ode[64, 64]_dec[8, 16, 8]_window128_epochs1_rtol0.0001_atol1e-06' --plot_indiv=False --num_to_keep=100

Keyword arguments:
- `region`: Whether to test using data from Costa Rica (`cr`), India (`in`), or both (`crin`). Default: `cr`
- `job_id`: Job id of the model to test. Default: `cr_euler_lr3.0e-04_enc[8, 16, 8]_hidden4_ode[64, 64]_dec[8, 16, 8]_window128_epochs1_rtol0.0001_atol1e-06`
- `plot_indiv`: Whether or not to generate individual plots in `results/plots`. If not, all the plots will be created on a single image file. Default: `False`
- `num_to_keep`: Number of time steps to use to create the initial latent state. This must be a positive integer no greater than 100. Default: `100`
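Two of these arguments carry constraints: `plot_indiv` is passed as the text `True` or `False`, and `num_to_keep` must lie between 1 and 100. The sketch below shows one way to enforce both with `argparse`; the helpers `str_to_bool` and `bounded_steps` are hypothetical and are not taken from `test.py`, whose actual parsing is not shown here.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False'-style text; type=bool would treat 'False' as True."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def bounded_steps(value):
    """Accept only integers in the documented range 1..100."""
    steps = int(value)
    if not 1 <= steps <= 100:
        raise argparse.ArgumentTypeError("num_to_keep must be between 1 and 100")
    return steps


parser = argparse.ArgumentParser()
parser.add_argument("--plot_indiv", type=str_to_bool, default=False)
parser.add_argument("--num_to_keep", type=bounded_steps, default=100)
args = parser.parse_args(["--plot_indiv=False", "--num_to_keep=50"])
```

With these converters, an out-of-range value such as `--num_to_keep=101` is rejected at parse time with a usage error instead of failing later during testing.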
You may use the following command to train the baseline RNN or LSTM model. The results can be found in /baseline/baseline_results.
$ cd baseline
$ python train_baseline.py --region=cr --lr=0.001 --batch_size=256 --seq_len=100 --num_epochs=1 --n_hidden=20 --model_name=lstm

Keyword arguments:
- `region`: Whether to train using data from Costa Rica (`cr`), India (`in`), or both (`crin`). Default: `cr`
- `lr`: Learning rate. Default: `0.001`
- `batch_size`: Batch size. Default: `256`
- `seq_len`: Number of ground-truth points to use when extrapolating. This must be a positive integer no greater than 100. Default: `100`
- `num_epochs`: Number of training epochs. Default: `1`
- `n_hidden`: Number of hidden units in the RNN/LSTM. Default: `20`
- `model_name`: Can be `lstm` or `rnn`. Default: `lstm`
Training a model with a set of arguments will generate a .pt file in /baseline/baseline_results/models uniquely identified by a job_id created based on the training arguments. You may use this job_id to specify which model to test.
$ cd baseline
$ python test_baseline.py --region=cr --job_id='cr_lstm_lr1.0e-03_batch256_seq100_epochs1_hidden20'

Keyword arguments:
- `region`: Whether to test using data from Costa Rica (`cr`), India (`in`), or both (`crin`). Default: `cr`
- `job_id`: Job id of the model to test. Default: `cr_lstm_lr1.0e-03_batch256_seq100_epochs1_hidden20`
[1] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural Ordinary Differential Equations. 2018. https://arxiv.org/abs/1806.07366.
[2] Daniel P. Bebber. Climate change effects on Black Sigatoka disease of banana. May 2019. https://doi.org/10.1098/rstb.2018.0269.
