Contrib
from dgordon
Each job_ directory is associated with a block number N. That job_ directory has all of the las files that pair block N with each block x < N. For example, if N is 62 for this job_ directory, then it will have:
raw_reads.5.raw_reads.62.N1.las
raw_reads.62.raw_reads.5.N1.las
However, it will not have
raw_reads.67.raw_reads.62.N1.las
That las file will be in the job_ directory associated with N=67.
You can tell what N is for a particular job directory by looking at the daligner command in the rj_*.sh script in that job directory. The first raw_reads block on that line tells you the value of N. For example,
daligner -v -t16 -H6000 -e0.7 -s1000 raw_reads.62 raw_reads.1 raw_reads.2 ...
shows you that N = 62 for this job_ directory.
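Finding N can be scripted. This is a minimal sketch, assuming the daligner call sits on a single line of the rj_*.sh script and that its block arguments look like raw_reads.<n>, as in the example above. The function names are hypothetical, not part of FALCON.

```python
import re
from pathlib import Path
from typing import Optional

# Block arguments on a daligner line look like "raw_reads.62".
BLOCK_RE = re.compile(r"\braw_reads\.(\d+)\b")


def block_number_from_command(line: str) -> Optional[int]:
    """Return N, the first raw_reads.<n> block on a daligner command line."""
    if "daligner" not in line:
        return None
    match = BLOCK_RE.search(line)
    return int(match.group(1)) if match else None


def job_block_number(job_dir: str) -> Optional[int]:
    """Scan the rj_*.sh scripts in a job_ directory for the daligner call."""
    for script in sorted(Path(job_dir).glob("rj_*.sh")):
        for line in script.read_text().splitlines():
            n = block_number_from_command(line)
            if n is not None:
                return n
    return None
```

Matching on the first raw_reads.<n> argument mirrors the rule in the text; option flags such as -s1000 never match because they lack the raw_reads. prefix.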
The las files in a job_ directory are symbolically linked from the m_ directories. Each las file name contains two block numbers, such as:
raw_reads.7.raw_reads.62.N3.las
The first number in the las file name (in this case 7) tells you which m_ directory links to this las file; in this case it is m_00007.
The larger number (which could be the first or the second number) tells you which job_ directory this las file will be in. In this example, raw_reads.7.raw_reads.62.N3.las will be in the job_ directory with N = 62.
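The two naming rules can be expressed as a small lookup. This is a minimal sketch, assuming file names follow the raw_reads.<a>.raw_reads.<b>.N<k>.las pattern shown above and that m_ directory numbers are zero-padded to five digits as in m_00007. locate_las and LasLocation are hypothetical names.

```python
import re
from typing import NamedTuple

# Names such as raw_reads.7.raw_reads.62.N3.las
LAS_RE = re.compile(r"^raw_reads\.(\d+)\.raw_reads\.(\d+)\.N\d+\.las$")


class LasLocation(NamedTuple):
    m_dir: str  # m_ directory holding the symlink (from the first number)
    job_n: int  # N of the job_ directory holding the file (the larger number)


def locate_las(name: str) -> LasLocation:
    """Apply the two naming rules to a raw_reads las file name."""
    match = LAS_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected las file name: {name!r}")
    first, second = int(match.group(1)), int(match.group(2))
    # Assumption: m_ directory numbers are zero-padded to five digits.
    return LasLocation(m_dir=f"m_{first:05d}", job_n=max(first, second))
```

For example, locate_las("raw_reads.7.raw_reads.62.N3.las") returns LasLocation(m_dir='m_00007', job_n=62).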
There can be multiple job_ directories for the same N. In my experience, a single job_ directory won't have files for more than about 104 values of x (where x < N). So if N = 300, it will put x = 1 to 104 in one job directory, x = 105 to 209 in the next, and the rest, up to N, in a third.
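To picture how one N spreads across several job_ directories, here is a rough sketch of a fixed-size split. It assumes a limit of 104 blocks per directory and that block N itself is included; the actual grouping is done by the tools, not by this code, and its boundaries may differ by one or so. split_blocks is a hypothetical helper.

```python
from typing import List


def split_blocks(n: int, max_per_job: int = 104) -> List[range]:
    """Split blocks 1..n into consecutive ranges of at most max_per_job."""
    # Assumption: a simple fixed-size split; real boundaries may be off by one.
    return [
        range(start, min(start + max_per_job, n + 1))
        for start in range(1, n + 1, max_per_job)
    ]
```

With these assumptions, split_blocks(300) yields the ranges 1 to 104, 105 to 208 and 209 to 300: three job_ directories for the same N.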
from dgordon
If the disk fills up to the level of the quota, the entire Falcon assembly may become corrupted and you will need to delete everything and start the assembly over from the beginning.
The reason is that in many cases daligner will not crash when no more files can be written: it simply writes 0-length or truncated files but blithely continues on, and the done flag will be set, so fc_run.py will not know anything is wrong. The Falcon assembly will then start crashing at the LAsort stage. Restarting Falcon will not work, and it becomes difficult to determine which job_ directories are corrupted and which are not.
(Editor's note: We'll have to address this issue some day.)
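A zero-length file is the easiest symptom to check for. The sketch below lists empty las files per job_ directory, assuming the job_ directories sit directly under the directory you pass in (for example 0-rawreads). It will not catch truncated files that still contain some bytes, so an empty result does not prove the run is healthy. empty_las_by_job is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def empty_las_by_job(rawreads_dir: str) -> Dict[str, List[str]]:
    """Map each job_ directory name to the zero-length las files it holds."""
    found: Dict[str, List[str]] = defaultdict(list)
    # Assumption: job_ directories are direct children of rawreads_dir.
    for las in sorted(Path(rawreads_dir).glob("job_*/*.las")):
        if las.is_file() and las.stat().st_size == 0:
            found[las.parent.name].append(las.name)
    return dict(found)
```

Calling empty_las_by_job("0-rawreads") after a run that hit the quota gives a first list of job_ directories to inspect.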
from David Gordon: I didn't find this in the regular documentation and had to learn it the hard way, so I'm trying to save you all some struggle.
- sge_option_da controls the options for the daligner jobs that run in the 0-rawreads subdirectory. Note that daligner will make 4 threads. For human, daligner uses about 30GB for this process and it uses 4 slots, so I request 30GB / 4 = 7.5GB per slot:
  sge_option_da = -pe serial 4 -l mfree=7.5G
- sge_option_la controls the SGE options for, I believe, the LAsort/merge and LA4Falcon jobs that run in the 0-rawreads subdirectory. This step will require about 6GB for human. I use:
  sge_option_la = -pe orte 6 -l mfree=6G
- sge_option_pda is used for the daligner jobs that run in the 1-preads_ovl subdirectory. Again, daligner will make 4 threads. For human, this stage of daligner uses more than 30GB, so I use the following:
  sge_option_pda = -pe serial 4 -l mfree=12G
- sge_option_pla must be for the LAsort/merge jobs that run in the 1-preads_ovl directory:
  sge_option_pla = -pe orte 2 -l mfree=6G
- sge_option_fc is used for the final 2-asm-falcon stage, including running fc_graph_to_contig.py. My experience is that 6GB is not sufficient, so I use:
  sge_option_fc = -pe orte 6 -l mfree=6G
- sge_option_cns is used for the ct_* tasks. I use:
  sge_option_cns = -pe orte 6 -l mfree=6G -l ssd=FALSE
- pa_concurrent_jobs specifies the maximum number of daligner jobs in 0-rawreads that run at the same time.
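The memory requests above follow one piece of arithmetic: divide a job's total memory estimate by the number of slots it requests (30GB over 4 slots gives mfree=7.5G). The sketch below builds these option strings and writes them as an INI-style section with Python's configparser. It assumes mfree is counted per slot, as the author's numbers imply; mfree is a site-defined SGE resource, so check your own queue configuration. The helper names and the [General] section name are assumptions, not FALCON APIs.

```python
import configparser
import io


def mfree_per_slot(total_gb: float, slots: int) -> str:
    """Spread a job's total memory estimate evenly over its SGE slots."""
    # Assumption: the cluster's mfree resource is requested per slot.
    return f"{total_gb / slots:g}G"


def sge_option(pe: str, slots: int, total_gb: float, extra: str = "") -> str:
    """Build a '-pe <pe> <slots> -l mfree=<per-slot>G' option string."""
    option = f"-pe {pe} {slots} -l mfree={mfree_per_slot(total_gb, slots)}"
    return f"{option} {extra}".strip()


# Totals (GB) chosen so the per-slot values match the settings listed above.
options = {
    "sge_option_da": sge_option("serial", 4, 30),
    "sge_option_la": sge_option("orte", 6, 36),
    "sge_option_pda": sge_option("serial", 4, 48),
    "sge_option_pla": sge_option("orte", 2, 12),
    "sge_option_fc": sge_option("orte", 6, 36),
    "sge_option_cns": sge_option("orte", 6, 36, "-l ssd=FALSE"),
}

cfg = configparser.ConfigParser()
cfg["General"] = options  # hypothetical section name; match your own config
buffer = io.StringIO()
cfg.write(buffer)
print(buffer.getvalue())
```

Deriving mfree from a total keeps the slot count and the per-slot request in step: if you change -pe serial 4 to 8 slots, the per-slot value halves automatically.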