Merge pull request #1 from ML4GLand/revision
Revision
adamklie authored Sep 18, 2023
2 parents 839bee0 + 3478545 commit dbd6754
Showing 497 changed files with 6,889 additions and 245,960 deletions.
13 changes: 12 additions & 1 deletion .gitignore
@@ -1,9 +1,20 @@
-data
+data/
+memray/
 figures/
+logs/
+output/
+archive/
+zenodo/
+figures/
 *.DS_Store
 *.out
 *.err
 *.html
+*.bin
+*.tmp
+temp/
+wandb/
+checkpoints/

 # Byte-compiled / optimized / DLL files
 __pycache__/
3 changes: 3 additions & 0 deletions LICENSE
@@ -114,3 +114,6 @@ Affirmer's express Statement of Purpose.

 For more information, please see
 <http://creativecommons.org/publicdomain/zero/1.0/>
+
+For more information, please see
+<http://creativecommons.org/publicdomain/zero/1.0/>
23 changes: 12 additions & 11 deletions README.md
@@ -10,30 +10,31 @@ Each directory within this repository is broken up into the three use cases (sec
You can install the version of EUGENe used for the preprint with `pip`

```bash
-pip install eugene-tools==0.0.6[janggu,kipoi,memesuite]
+pip install eugene-tools==0.1.2
```
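To verify that the pinned release resolved, a quick standard-library check; the distribution name is taken from the install line above:

```python
# Print the installed version of the eugene-tools distribution.
from importlib.metadata import version

print(version("eugene-tools"))  # expect 0.1.2
```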

# Datasets
-You can find the raw and processed data for running the code and generating results at the following Zenodo link: https://doi.org/10.5281/zenodo.7140082.
+You can find the raw and processed data for running the code and generating results at the following Zenodo link: [https://zenodo.org/deposit/7140083#.](https://doi.org/10.5281/zenodo.7140082)

## Subdirectories

## `configs/`
These contain `.yaml` files used for training models in each use case (when applicable).

-## `docs/`
-Contains manuscript figure files!
+## `notebooks`
+Notebooks for each use case are organized as follows:

-## `figures/`
-These contain the `.pdf` files generated by EUGENe code, prior to being massaged into the figures in the `docs` folder.
-
-## `notebooks/`
-What you really came for. These are the pillars of the workflow run for each use case. Notebooks for each use case are organized as follows:
- `dataset_ETL.ipynb` — extract the data from it’s downloaded or raw format, transform and preprocess it through a series of steps and get it ready for loading into a PyTorch model
- `dataset_EDA.ipynb` — perform visualizations of your data to better understand what is going on with it. You can often iterate between this and ETL to get a final version of your data for loading into a model
- `dataset_training.ipynb` — train a single or multiple models on one or multiple iterations of the dataset. This notebook or section is reserved for calls to fit and visualizations of training summaries
- `dataset_evaluate.ipynb` — evaluate trained models on test data and visualize and summarize the performance. This often starts with loading in the best iteration of the model from the training notebook and getting predictions on some test data of interest. Once predictions are generated, they can be added to SeqData or loaded in to generate useful summaries and visualizations
-- `dataset_interpret.ipynb` — interpret trained models with either test data or random data that is manipulated by model outputs or prior knowledge. This can often be combined with the previous notebook, but can sometimes be standalone
+- `dataset_intepret.ipynb` — interpret trained models with either test data or random data that is manipulated by model outputs or prior knowledge. This can often be combined with the previous notebook, but can sometimes be standalone
+- `dataset_plotting.ipynb` - generate plots not already generated in the previous notebooks.
+- `dataset_gpu_util.ipynb` — tests to make sure EUGENe is using the GPU and that the GPU is working properly

+There is also the `plotting.ipynb` notebook in the `training_mem` folder where we show the results of using SeqData to load large datasets out-of-core!
+
+**Note**: If you want to compare the DeepBind models to Kipoi's submitted DeepBind models, you will need to install Kipoi: https://github.com/kipoi/kipoi
+
-## `scripts/`
+## `scripts`
These contain Python scripts for when you have to submit a job to a cluster or run it on your local machine behind a screen because it will take too long otherwise. These are organized in a similar manner to the `notebooks` for each use case.
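Taken together, the notebook workflow described above (ETL, EDA, training, evaluation, interpretation) is a standard supervised loop over one-hot encoded sequences. Here is a minimal sketch of that path in plain PyTorch; it is not EUGENe's API, and the sequences, targets, and model below are placeholders:

```python
# Sketch of the per-dataset notebook flow (ETL -> training -> evaluation)
# in plain PyTorch. Sequences, targets, and the model are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

ALPHABET = "ACGT"

def one_hot(seq: str) -> torch.Tensor:
    """ETL step: encode a DNA string as a (4, L) one-hot tensor."""
    x = torch.zeros(4, len(seq))
    for i, base in enumerate(seq):
        if base in ALPHABET:
            x[ALPHABET.index(base), i] = 1.0
    return x

# Toy data: two 170 bp sequences (the input_len used in the jores21 configs)
seqs = ["ACGTACGTAC" * 17, "TTGACCGGTA" * 17]
x = torch.stack([one_hot(s) for s in seqs])
y = torch.tensor([[0.3], [1.2]])  # regression targets
loader = DataLoader(TensorDataset(x, y), batch_size=2)

# Training step: a minimal CNN regressor standing in for the configured models
model = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=13, padding=6), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for xb, yb in loader:
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()
    opt.step()

# Evaluation step: predictions on held-out data would follow the same path
print(model(x).detach())
```

In the actual notebooks, SeqData and EUGENe handle the encoding, batching, and fitting; the sketch only shows the shape of the loop.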
17 changes: 0 additions & 17 deletions configs/jores21/Jores21CNN.yaml

This file was deleted.

29 changes: 29 additions & 0 deletions configs/jores21/cnn.yaml
@@ -0,0 +1,29 @@
+module: SequenceModule
+model:
+  model_name: CNNalaJores21
+  arch_name: CNN
+  arch:
+    input_len: 170
+    output_dim: 1
+    conv_kwargs:
+      input_channels: 4
+      conv_channels: [256, 256, 256]
+      conv_kernels: [13, 13, 13]
+      conv_strides: [1, 1, 1]
+      pool_kernels: [2, 2, 2]
+      pool_strides: [2, 2, 2]
+      dropout_rates: 0.3
+      batchnorm: True
+      activations: relu
+    dense_kwargs:
+      hidden_dims: [64]
+      dropout_rates: 0.2
+      batchnorm: True
+  task: regression
+  loss_fxn: mse
+  optimizer: adam
+  optimizer_lr: 0.001
+  scheduler: reduce_lr_on_plateau
+  scheduler_monitor: val_loss_epoch
+  scheduler_kwargs:
+    patience: 2
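For orientation, a config of this shape is plain YAML and can be inspected with PyYAML. The field nesting below follows the indentation reconstructed above (an assumption, since the page scrape dropped it); EUGENe's own config loader is not shown here:

```python
# Read the training config and pull out a few fields.
import yaml

with open("configs/jores21/cnn.yaml") as f:
    cfg = yaml.safe_load(f)

model = cfg["model"]
print(model["arch_name"])                             # CNN
print(model["arch"]["conv_kwargs"]["conv_channels"])  # [256, 256, 256]
print(model["optimizer"], model["optimizer_lr"])      # adam 0.001
```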
15 changes: 15 additions & 0 deletions configs/jores21/deepstarr.yaml
@@ -0,0 +1,15 @@
+module: SequenceModule
+model:
+  model_name: DeepSTARR_Jores21
+  arch_name: DeepSTARR
+  arch:
+    input_len: 170
+    output_dim: 1
+  task: regression
+  loss_fxn: mse
+  optimizer: adam
+  optimizer_lr: 0.001
+  scheduler: reduce_lr_on_plateau
+  scheduler_monitor: val_loss_epoch
+  scheduler_kwargs:
+    patience: 2
32 changes: 32 additions & 0 deletions configs/jores21/hybrid.yaml
@@ -0,0 +1,32 @@
+module: SequenceModule
+model:
+  model_name: HybridalaJores21
+  arch_name: Hybrid
+  arch:
+    input_len: 170
+    output_dim: 1
+    conv_kwargs:
+      input_channels: 4
+      conv_channels: [256, 256, 256]
+      conv_kernels: [13, 13, 13]
+      conv_strides: [1, 1, 1]
+      pool_kernels: [2, 2, 2]
+      pool_strides: [2, 2, 2]
+      dropout_rates: 0.3
+      batchnorm: True
+      activations: relu
+    recurrent_kwargs:
+      hidden_dim: 128
+      batch_first: True
+    dense_kwargs:
+      hidden_dims: [64]
+      dropout_rates: 0.2
+      batchnorm: True
+  task: regression
+  loss_fxn: mse
+  optimizer: adam
+  optimizer_lr: 0.001
+  scheduler: reduce_lr_on_plateau
+  scheduler_monitor: val_loss_epoch
+  scheduler_kwargs:
+    patience: 2
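This hybrid config describes a convolutional tower feeding a recurrent layer and a small dense head. A rough PyTorch sketch of that shape, assuming an LSTM for the recurrent block and a standard ordering of norm, activation, pooling, and dropout (the config does not pin these details); it is an illustration of the fields, not EUGENe's Hybrid implementation:

```python
# Conv -> recurrent -> dense sketch matching the hybrid.yaml fields.
import torch
import torch.nn as nn

class HybridSketch(nn.Module):
    def __init__(self, hidden_dim=128):
        super().__init__()
        # conv_kwargs: three conv blocks, kernel 13, pool 2, dropout 0.3
        convs, in_ch = [], 4
        for out_ch in (256, 256, 256):
            convs += [nn.Conv1d(in_ch, out_ch, 13, padding=6),
                      nn.BatchNorm1d(out_ch), nn.ReLU(),
                      nn.MaxPool1d(2, 2), nn.Dropout(0.3)]
            in_ch = out_ch
        self.conv = nn.Sequential(*convs)
        # recurrent_kwargs: hidden_dim 128, batch_first (LSTM assumed)
        self.rnn = nn.LSTM(256, hidden_dim, batch_first=True)
        # dense_kwargs: one hidden layer of 64 units -> scalar output
        self.dense = nn.Sequential(nn.Linear(hidden_dim, 64),
                                   nn.BatchNorm1d(64), nn.ReLU(),
                                   nn.Dropout(0.2), nn.Linear(64, 1))

    def forward(self, x):              # x: (N, 4, 170)
        h = self.conv(x)               # (N, 256, 21) after three pools
        h = h.transpose(1, 2)          # (N, 21, 256) for batch_first LSTM
        _, (h_n, _) = self.rnn(h)      # h_n: (1, N, 128)
        return self.dense(h_n[-1])     # (N, 1)

print(HybridSketch()(torch.randn(2, 4, 170)).shape)  # torch.Size([2, 1])
```

With pooling by 2 three times, the 170 bp input reaches the recurrent layer as 21 positions of 256 channels; the final hidden state feeds the dense head.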
24 changes: 24 additions & 0 deletions configs/jores21/jores21_cnn.yaml
@@ -0,0 +1,24 @@
+module: SequenceModule
+model:
+  model_name: Jores21CNN
+  arch_name: Jores21CNN
+  arch:
+    input_len: 170
+    output_dim: 1
+    layers: 3
+    filters: 256
+    kernel_size: 13
+    stride: 1
+    hidden_dim: 64
+    dropout: 0.3
+  task: "regression"
+  loss_fxn: "mse"
+  task: regression
+  loss_fxn: mse
+  optimizer: adam
+  optimizer_lr: 0.001
+  scheduler: reduce_lr_on_plateau
+  scheduler_monitor: val_loss_epoch
+  scheduler_kwargs:
+    patience: 2

21 changes: 0 additions & 21 deletions configs/jores21/ssCNN.yaml

This file was deleted.

25 changes: 0 additions & 25 deletions configs/jores21/ssHybrid.yaml

This file was deleted.

17 changes: 0 additions & 17 deletions configs/kopp21/Kopp21CNN.yaml

This file was deleted.

21 changes: 0 additions & 21 deletions configs/kopp21/dsCNN.yaml

This file was deleted.

15 changes: 0 additions & 15 deletions configs/kopp21/dsFCN.yaml

This file was deleted.

24 changes: 0 additions & 24 deletions configs/kopp21/dsHybrid.yaml

This file was deleted.

31 changes: 31 additions & 0 deletions configs/kopp21/dscnn.yaml
@@ -0,0 +1,31 @@
+module: SequenceModule
+model:
+  model_name: dsCNNalaKopp21
+  arch_name: dsCNN
+  arch:
+    input_len: 500
+    output_dim: 1
+    aggr: concat
+    conv_kwargs:
+      input_channels: 4
+      conv_channels: [10, 8]
+      conv_kernels: [11, 3]
+      conv_strides: [1, 1]
+      pool_types: [max, null]
+      pool_kernels: [30, null]
+      pool_strides: [1, null]
+      dropout_rates: 0.2
+      batchnorm: True
+      activations: relu
+    dense_kwargs:
+      hidden_dims: [64]
+      dropout_rates: 0.2
+      batchnorm: True
+  task: binary_classification
+  loss_fxn: bce
+  optimizer: adam
+  optimizer_lr: 0.001
+  scheduler: reduce_lr_on_plateau
+  scheduler_monitor: val_loss_epoch
+  scheduler_kwargs:
+    patience: 2
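The `aggr: concat` field is the double-stranded part: the same network reads both the sequence and its reverse complement, and the two representations are aggregated before the output head. A small sketch of that idea with illustrative layer sizes (this is not EUGENe's dsCNN implementation):

```python
# Double-stranded trick: shared conv tower over both strands, then concat.
import torch
import torch.nn as nn

def reverse_complement(x: torch.Tensor) -> torch.Tensor:
    # For one-hot (N, 4, L) in A,C,G,T channel order: flipping the channel
    # axis complements bases, flipping the length axis reverses the sequence.
    return x.flip(dims=(1, 2))

conv = nn.Sequential(nn.Conv1d(4, 10, 11, padding=5), nn.ReLU(),
                     nn.AdaptiveMaxPool1d(1), nn.Flatten())
head = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

x = torch.randn(2, 4, 500)                        # input_len: 500
fwd, rev = conv(x), conv(reverse_complement(x))   # shared weights
logit = head(torch.cat([fwd, rev], dim=1))        # aggr: concat
prob = torch.sigmoid(logit)                       # binary_classification
print(prob.shape)                                 # torch.Size([2, 1])
```

Sharing weights across strands makes the prediction invariant to which strand the motif happens to sit on.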
20 changes: 20 additions & 0 deletions configs/kopp21/dsfcn.yaml
@@ -0,0 +1,20 @@
+module: SequenceModule
+model:
+  model_name: dsFCNalaKopp21
+  arch_name: dsFCN
+  arch:
+    input_len: 500
+    output_dim: 1
+    aggr: concat
+    dense_kwargs:
+      hidden_dims: [256, 128]
+      dropout_rates: 0.2
+      batchnorm: True
+  task: binary_classification
+  loss_fxn: bce
+  optimizer: adam
+  optimizer_lr: 0.001
+  scheduler: reduce_lr_on_plateau
+  scheduler_monitor: val_loss_epoch
+  scheduler_kwargs:
+    patience: 2