Config
Miroslav Pakanec edited this page Mar 29, 2021 · 2 revisions
- Fishnchips config files can be found in /configs, e.g. seetest_config.json
- Most scripts (such as run_fishnchips.py, train_fishnchips, etc.) expect the config file to be passed with the --config/-c flag
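Because most scripts receive the config path through --config/-c, reading a config amounts to parsing that flag and loading the JSON file. The following is a minimal sketch, not the Fishnchips code: the helper name load_config is hypothetical, and it assumes only that the config is a plain JSON file like seetest_config.json.

```python
import argparse
import json
from typing import Any, Dict, List, Optional


def load_config(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse --config/-c and return the JSON config as a dict (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Load a Fishnchips-style JSON config")
    parser.add_argument("-c", "--config", required=True,
                        help="path to a JSON config file, e.g. configs/seetest_config.json")
    # argv=None makes argparse fall back to sys.argv[1:].
    args = parser.parse_args(argv)
    with open(args.config, "r", encoding="utf-8") as f:
        return json.load(f)
```

For example, load_config(["-c", "configs/seetest_config.json"]) would return the parsed config as a dict; called with no arguments, it reads the flag from the command line instead.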
Model
- signal_window_size - length of the windows into which each read signal is segmented (e.g. 300 signal points)
- label_window_size - maximum label length allowed for a single signal window (e.g. 100)
! make sure that no signal window has a corresponding label longer than this cap
- attention_blocks - number of transformer encoder/decoder blocks (e.g. if set to 4, there will be 4 encoder and 4 decoder blocks)
- cnn_blocks - number of CNN blocks (each block has 3 convolution layers, ReLU activations, a residual connection and dropout layers)
- maxpool_kernel - maxpool kernel size
! signal_window_size % maxpool_kernel == 0
- maxpool_idx - position of the maxpool layer among the CNN blocks (e.g. if set to 3, the first 3 CNN blocks are followed by the maxpool layer, which is followed by the remaining CNN blocks)
! maxpool_idx < cnn_blocks
- d_model - depth of the model (if set to 250, each signal point becomes a 250-dimensional vector)
- dff - size of the point-wise feed-forward network that follows self-attention
- num_heads - number of attention heads
! d_model % num_heads == 0
- dropout_rate
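The constraints marked with ! can be checked before the model is built. This sketch assumes the parameters are available as a flat dict using the key names above (the real config may nest them); the function names and the example values for maxpool_kernel, cnn_blocks and num_heads are hypothetical. pooled_length illustrates the effect of maxpool_kernel under the assumption that the max pool stride equals its kernel size, so each window shrinks to signal_window_size // maxpool_kernel positions. The d_model % num_heads rule is the usual multi-head attention requirement that every head gets an equal slice of d_model // num_heads dimensions.

```python
from typing import Any, Dict, List, Sequence


def check_model_config(cfg: Dict[str, Any]) -> List[str]:
    """Return the violated constraints (an empty list means the config is consistent)."""
    errors = []
    if cfg["signal_window_size"] % cfg["maxpool_kernel"] != 0:
        errors.append("signal_window_size % maxpool_kernel must be 0")
    if cfg["maxpool_idx"] >= cfg["cnn_blocks"]:
        errors.append("maxpool_idx must be smaller than cnn_blocks")
    if cfg["d_model"] % cfg["num_heads"] != 0:
        errors.append("d_model % num_heads must be 0")
    return errors


def pooled_length(cfg: Dict[str, Any]) -> int:
    """Window length after one max pool, assuming stride == kernel size."""
    return cfg["signal_window_size"] // cfg["maxpool_kernel"]


def labels_within_cap(labels: Sequence[str], label_window_size: int) -> bool:
    """True if no signal window's label is longer than label_window_size."""
    return all(len(label) <= label_window_size for label in labels)


example = {
    "signal_window_size": 300,
    "maxpool_kernel": 3,  # hypothetical value
    "cnn_blocks": 5,      # hypothetical value
    "maxpool_idx": 3,
    "d_model": 250,
    "num_heads": 5,       # hypothetical value
}
```

With these values check_model_config(example) returns an empty list and pooled_length(example) is 100.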
Training
- data - path to the training data HDF5 file
- epochs
- patience - number of epochs to keep training without improvement; the counter is reset whenever validation accuracy improves (e.g. 300)
- warmup - number of warmup epochs (patience is reset after each warmup epoch)
- batches - number of batches that make up one epoch (e.g. 1000); adjust it together with signal_window_size for performance reasons
- batch_size
- buffer_size - how many reads should be loaded, segmented into windows, and shuffled to create training batches
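- lr_mult - learning rate multiplier; see the appendix of Improving base calling accuracy with Transformers
- signal_window_stride - step between the starts of consecutive signal windows during segmentation; windows overlap when it is smaller than signal_window_size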
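One way to read the patience and warmup rules above is the early-stopping loop sketched below. It is an interpretation, not the Fishnchips training code: the function name is hypothetical, validation accuracies are passed in as a precomputed list instead of being measured after each epoch, and training stops once patience consecutive non-warmup epochs pass without improvement.

```python
from typing import Sequence


def epochs_until_stop(val_accuracies: Sequence[float], epochs: int,
                      patience: int, warmup: int) -> int:
    """Return how many epochs run before early stopping (hypothetical sketch)."""
    best = float("-inf")
    without_improvement = 0
    epochs_run = 0
    for epoch, accuracy in enumerate(val_accuracies[:epochs]):
        epochs_run += 1
        if accuracy > best:
            # Validation accuracy improved: reset the patience counter.
            best = accuracy
            without_improvement = 0
        elif epoch < warmup:
            # Warmup epoch: patience is reset instead of being consumed.
            without_improvement = 0
        else:
            without_improvement += 1
        if without_improvement >= patience:
            break
    return epochs_run
```

With patience=2 and no warmup, the accuracies [0.5, 0.6, 0.6, 0.6, 0.7] stop training after 4 epochs; the final improvement is never reached.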
Validation
- data - path to the validation data HDF5 file
- batch_size
- buffer_size - legacy parameter that can be omitted; it does not influence any workflow
- signal_window_stride - for validation, set this to signal_window_size (non-overlapping windows)
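Reading signal_window_stride as the step between window starts (so consecutive windows overlap by signal_window_size - signal_window_stride points), segmentation can be sketched as follows. The function name is hypothetical, and dropping trailing points that do not fill a whole window is an assumption of this sketch; Fishnchips may handle the tail differently. For a signal of n >= window_size points this yields (n - window_size) // stride + 1 windows, and with the validation setting stride == window_size the windows tile the signal without overlap.

```python
from typing import List, Sequence


def segment_signal(signal: Sequence[float], window_size: int,
                   stride: int) -> List[List[float]]:
    """Cut a read signal into windows of window_size points, one starting every `stride` points.

    Trailing points that do not fill a whole window are dropped (an assumption of this sketch).
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    last_start = len(signal) - window_size
    return [list(signal[start:start + window_size])
            for start in range(0, last_start + 1, stride)]
```

For example, a 10-point signal with window_size=4 gives 4 overlapping windows at stride 2, but only 2 non-overlapping windows at stride 4, where the last 2 points are dropped.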
Testing
- batch_size
- buffer_size - legacy parameter that can be omitted; it does not influence any workflow
- signal_window_stride - step between consecutive signal windows, which determines how the predictions are assembled:
  - signal_window_stride < signal_window_size => the assembler is used to align the overlapping predicted sequences and compute their consensus
  - signal_window_stride == signal_window_size => predictions are concatenated
  - signal_window_stride > signal_window_size => error
- save_predictions - save predictions as a FASTA file
- reads - number of reads to run inference on from each data directory
- bacteria - list describing the test data; each element is processed in turn (e.g. if reads is 20, inference is run on 20 reads for each element). Each element has:
  - name - used in reports and evaluation file names
  - data - folder containing FAST5 files
  - reference - path to the reference genome of this bacterium
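The stride rules for testing, together with save_predictions, can be sketched as below. This is not the Fishnchips assembler: the function names are hypothetical, the overlapping case (alignment and consensus) is deliberately left unimplemented, and to_fasta only shows what a FASTA file of per-read predictions could look like, with an assumed default line width of 60.

```python
from typing import Dict, List


def assemble_predictions(window_predictions: List[str], window_size: int,
                         stride: int) -> str:
    """Combine per-window predicted sequences following the stride rules above."""
    if stride > window_size:
        raise ValueError("signal_window_stride must not exceed signal_window_size")
    if stride == window_size:
        # Non-overlapping windows: predictions are simply concatenated.
        return "".join(window_predictions)
    # Overlapping windows need alignment and consensus, which this sketch does not reproduce.
    raise NotImplementedError("alignment/consensus assembly is not sketched here")


def to_fasta(records: Dict[str, str], line_width: int = 60) -> str:
    """Format {read_id: sequence} as FASTA text, wrapping sequences at line_width."""
    lines: List[str] = []
    for read_id, sequence in records.items():
        lines.append(f">{read_id}")
        lines.extend(sequence[i:i + line_width]
                     for i in range(0, len(sequence), line_width))
    return "".join(line + "\n" for line in lines)
```

For instance, assemble_predictions(["ACG", "TTA"], 300, 300) concatenates the two windows into "ACGTTA", which to_fasta could then write under a header line such as ">read_1".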