Errors while genotype generation. #33

Open
sawantaniket461 opened this issue Oct 8, 2024 · 1 comment

Comments

@sawantaniket461

I am trying to generate genotype data. My genotype data is based on the GRCh37 reference genome.

While generating the rsID and variant list files and downloading the mutation age and genetic map files, I noticed that the mutation age and genetic map files the authors download appear to be for GRCh37. I hope the authors have taken suitable measures to handle this.

The preprocessing step was successful.
While generating the genotype data, however, I get the following error:
[ Info: Creating output directories
Running pipelines:
optimisation => false
phenotype => false
evaluation => false
preprocessing => false
genotype => true
[ Info: Using 1 thread/s for computations
[ Info: Generating synthetic genotype data
[ Info: Processing chromosome 1
[ Info: Using the superpopulation EUR
[ Info: Using default population structure
[ Info: Writing the population file
[ Info: Creating the reference table for batch 1
Progress: 100%|█████████████████████████████████████████████████████████████████████████████████████████| Time: 0:01:06
[ Info: Writing the plink file for batch 1
ERROR: LoadError: TaskFailedException
Stacktrace:
[1] wait
@ ./task.jl:322 [inlined]
[2] threading_run(func::Function)
@ Base.Threads ./threadingconstructs.jl:34
[3] macro expansion
@ ./threadingconstructs.jl:93 [inlined]
[4] get_genostr(batch_ref_df::DataFrame, batchsize::Int64, start_haplotype::Int64, metadata::GenomicMetadata)
@ Main /opt/intervene/scripts/algorithms/genotype/write_output.jl:60
[5] write_to_plink_batch(batch_ref_df::DataFrame, prev_batchsize::Int64, cur_batchsize::Int64, batch_number::Int64, metadata::GenomicMetadata)
@ Main /opt/intervene/scripts/algorithms/genotype/write_output.jl:115
[6] create_synthetic_genotype_for_chromosome(metadata::GenomicMetadata)
@ Main /opt/intervene/scripts/algorithms/genotype/genotype_algorithm.jl:124
[7] create_synthetic_genotype(options::Dict{Any, Any})
@ Main /opt/intervene/scripts/algorithms/genotype/genotype_algorithm.jl:162
[8] macro expansion
@ /opt/intervene/scripts/run_program.jl:31 [inlined]
[9] macro expansion
@ ./timing.jl:287 [inlined]
[10] run_program(pipelines::Dict{String, Bool}, options::Dict{Any, Any})
@ Main /opt/intervene/scripts/run_program.jl:30
[11] main()
@ Main /opt/intervene/scripts/run_program.jl:113
[12] top-level scope
@ /opt/intervene/scripts/run_program.jl:117

nested task error: BoundsError: attempt to access 2204×1274030 Matrix{Int8} at index [3575, 1:5]
Stacktrace:
 [1] throw_boundserror(A::Matrix{Int8}, I::Tuple{Int64, UnitRange{Int64}})
   @ Base ./abstractarray.jl:651
 [2] checkbounds
   @ ./abstractarray.jl:616 [inlined]
 [3] _getindex
   @ ./multidimensional.jl:837 [inlined]
 [4] getindex(::Matrix{Int8}, ::Int64, ::UnitRange{Int64})
   @ Base ./abstractarray.jl:1170
 [5] macro expansion
   @ /opt/intervene/scripts/algorithms/genotype/write_output.jl:77 [inlined]
 [6] (::var"#857#threadsfor_fun#78"{DataFrame, Int64, GenomicMetadata, Progress, Dict{Int64, Matrix{Int8}}, UnitRange{Int64}})(onethread::Bool)
   @ Main ./threadingconstructs.jl:81
 [7] (::var"#857#threadsfor_fun#78"{DataFrame, Int64, GenomicMetadata, Progress, Dict{Int64, Matrix{Int8}}, UnitRange{Int64}})()
   @ Main ./threadingconstructs.jl:48

in expression starting at /opt/intervene/scripts/run_program.jl:117

Can anyone help me understand what is causing this? Thanks.
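
A note on reading the nested error above: "BoundsError: attempt to access 2204×1274030 Matrix{Int8} at index [3575, 1:5]" means the code asked for row 3575 of a matrix that has only 2204 rows. The sketch below reproduces that kind of check in isolation. It is a minimal illustration, not HAPNEST's code: ref and requested_rows are hypothetical stand-ins, and how the pipeline actually derives its row indices is not shown in this thread.

```julia
# Hypothetical stand-in for the reference matrix from the error message.
# The real one is 2204×1274030; a few columns are enough to show the check.
ref = zeros(Int8, 2204, 10)

# Hypothetical row indices that some input file might ask for.
requested_rows = [1, 2204, 3575]

# checkbounds(Bool, A, I...) returns false instead of throwing a BoundsError,
# so it can be used to list offending rows before indexing.
bad_rows = [r for r in requested_rows if !checkbounds(Bool, ref, r, 1:5)]

if isempty(bad_rows)
    println("all requested rows are within bounds")
else
    println("out-of-range rows: ", bad_rows, " (matrix has ", size(ref, 1), " rows)")
end
```

A check like this only shows which indices are out of range. Finding out why an input refers to more rows than the reference data contains still means inspecting the input files.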

@sawantaniket461
Author

Update: I can now generate the genotype data. The error was in the population file. However, I am now getting errors from the evaluation scripts. The error is thrown when evaluating both the example dataset and my generated genotype dataset. The error is as follows:

[ Info: AA_T = 0.9927
[ Info: AA_S = 0.0880
[ Info: AA_TS = 0.5403
[ Info: Running Kinship evaluations
[ Info: Plotting Kinship vs IBS
ERROR: LoadError: ArgumentError: "/scratch/aniket01" is not a directory
Stacktrace:
[1] tempname(parent::String; cleanup::Bool)
@ Base.Filesystem ./file.jl:580
[2] tempname (repeats 2 times)
@ ./file.jl:580 [inlined]
[3] _show(io::IOStream, #unused#::MIME{Symbol("image/png")}, plt::Plots.Plot{Plots.GRBackend})
@ Plots ~/aniket/tools/HAPNEST/trial2/packages/Plots/fw4rv/src/backends/gr.jl:2186
[4] #invokelatest#2
@ ./essentials.jl:708 [inlined]
[5] invokelatest
@ ./essentials.jl:706 [inlined]
[6] show
@ ~/aniket/tools/HAPNEST/trial2/packages/Plots/fw4rv/src/output.jl:227 [inlined]
[7] #303
@ ~/aniket/tools/HAPNEST/trial2/packages/Plots/fw4rv/src/output.jl:6 [inlined]
[8] open(::Plots.var"#303#304"{Plots.Plot{Plots.GRBackend}}, ::String, ::Vararg{String, N} where N; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Base ./io.jl:330
[9] open
@ ./io.jl:328 [inlined]
[10] png(plt::Plots.Plot{Plots.GRBackend}, fn::String)
@ Plots ~/aniket/tools/HAPNEST/trial2/packages/Plots/fw4rv/src/output.jl:4
[11] savefig(plt::Plots.Plot{Plots.GRBackend}, fn::String)
@ Plots ~/aniket/tools/HAPNEST/trial2/packages/Plots/fw4rv/src/output.jl:139
[12] plot_kinship_ibs(ibs_real::DataFrame, ibs_synt::DataFrame, results_dir::String)
@ Main /opt/intervene/scripts/evaluation/metrics/eval_kinship_detail.jl:61
[13] run_kinship(ibsfile_real::String, ibsfile_synt::String, ibsfile_cross::String)
@ Main /opt/intervene/scripts/evaluation/metrics/eval_kinship_detail.jl:144
[14] run_kinship_evaluation(ibsfile_real::String, ibsfile_synt::String, ibsfile_cross::String)
@ Main /opt/intervene/scripts/evaluation/evaluation.jl:14
[15] run_pipeline(options::Dict{Any, Any}, chromosome::Int64, superpopulation::String, metrics::Dict{Any, Any})
@ Main /opt/intervene/scripts/evaluation/evaluation.jl:137
[16] run_evaluation(options::Dict{Any, Any})
@ Main /opt/intervene/scripts/evaluation/evaluation.jl:189
[17] run_program(pipelines::Dict{String, Bool}, options::Dict{Any, Any})
@ Main /opt/intervene/scripts/run_program.jl:46
[18] main()
@ Main /opt/intervene/scripts/run_program.jl:113
[19] top-level scope
@ /opt/intervene/scripts/run_program.jl:117
in expression starting at /opt/intervene/scripts/run_program.jl:117

I do not understand why the script needs the /scratch/aniket01 directory, or why it reports that it is not a directory: I checked, and it exists. It may also be relevant that I am running this on the university HPC. Could you suggest a workaround? I have run the evaluation script before, and it did not throw any error then. I hope this is not too much trouble, and I look forward to hearing from you.

Regards,
Aniket
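
A note on the trace above: the failure is raised in Base's tempname, which Plots calls when saving the PNG. tempname refuses to continue when the directory returned by tempdir() is not a directory as seen by the running process, and on Unix tempdir() usually follows the TMPDIR environment variable. The sketch below is a small diagnostic to run in the same environment as the pipeline. The helper name tempdir_report is hypothetical, and the idea that TMPDIR is set to /scratch/aniket01 is an assumption, not something the log confirms.

```julia
# Report where Julia would create temporary files and whether that
# location is usable from the current process.
function tempdir_report()
    tmpdir_env = get(ENV, "TMPDIR", nothing)  # nothing if the variable is unset
    resolved = tempdir()                       # directory Julia will actually use
    return (tmpdir_env = tmpdir_env, resolved = resolved, usable = isdir(resolved))
end

report = tempdir_report()
println("TMPDIR env : ", report.tmpdir_env)
println("tempdir()  : ", report.resolved)
println("is a dir?  : ", report.usable)
```

If the last line prints false, one possible explanation is that the directory exists on the login node but is not visible where the job runs. The /opt/intervene paths suggest the pipeline may be running inside a container, in which case the host's /scratch might not be mounted there. Pointing TMPDIR at a directory that is visible to the process, or making /scratch visible to it, would then be worth trying.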
