Selecting only a subset of individuals for writing the vcf but getting n-1 individuals #3132
-
I have simulated my samples with SLiM and am trying to write a VCF file containing only a subset of 100 samples, along with their sample IDs. The issue is that every time, only 99 (n-1) samples are written to the VCF. Here is how I am doing it: I subset the tree sequence and then write out the samples from it. Is there a more efficient way of doing this? Is there any particular reason I am getting n-1 samples?

```python
# Writing a new VCF file with just 100 highly admixed samples
with open("path/to/sample_list", "r") as f:
    # Reading step reconstructed (not shown in the post): one integer ID per line
    subset_samples = [int(line.strip()) for line in f if line.strip()]

# Assigning names to individuals
# individual_names = [f"Ind_{i}" for i in subset_samples]

# Filtering the tree sequence to keep only the selected samples
subset_ts = ts.simplify(samples=subset_samples, keep_unary=True)

with open("path/to/vcf", "w") as vcf_file:
    # Write step reconstructed (not shown in the post)
    subset_ts.write_vcf(vcf_file)
```

I would highly appreciate some help and suggestions.
Replies: 4 comments 13 replies
-
One possible problem may be that your "subset samples" are individual IDs rather than node IDs? `simplify` requires node IDs. But it is hard to tell without knowing the contents of the file you are opening.
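To make the individual/node distinction concrete: each individual owns one node per chromosome copy (two for a diploid), and `simplify` works on nodes. The sketch below is plain Python, with a hypothetical `individual_nodes` dict standing in for what tskit exposes as `ts.individual(i).nodes`; it expands a list of individual IDs into the node IDs that `simplify(samples=...)` expects.

```python
def nodes_for_individuals(individual_ids, individual_nodes):
    """Expand individual IDs into the node IDs that simplify() expects.

    individual_nodes: hypothetical mapping from individual ID to that
    individual's node IDs (in tskit this is ts.individual(i).nodes).
    """
    node_ids = []
    for ind in individual_ids:
        node_ids.extend(individual_nodes[ind])
    return node_ids


# Toy diploid layout (an assumption): individual k owns nodes 2k and 2k + 1.
individual_nodes = {k: [2 * k, 2 * k + 1] for k in range(5)}
print(nodes_for_individuals([1, 3], individual_nodes))  # [2, 3, 6, 7]
```

Passing 100 individual IDs through such a mapping yields 200 node IDs for a diploid simulation, so the simplified tree sequence keeps both chromosomes of every selected individual.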
-
I tried verifying that, and the entries in sample_list are indeed sample IDs, not node IDs. It is a single-column file containing a list of 100 samples, which I selected from the tspop output. If I understand correctly, I should be able to obtain the node IDs with: `tspop.get_pop_ancestry(ts, census_time=n).ancestry_table`
-
Could you please create a self-contained minimal example? It is hard to understand what you are doing from the snippets.
-
What does
I think this last point is the problem; see my answer above. Said another way, you're passing a list of 100 nodes (i.e., chromosomes) to `simplify`, and by chance two of those happen to be in the same individual. As a result, you get 98 individuals with only one chromosome left and one individual with both chromosomes, which is 99 individuals in the VCF.
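The counting argument can be checked with a few lines of plain Python. This sketch assumes a toy diploid layout in which node `n` belongs to individual `n // 2`; in tskit the real node-to-individual mapping is the `individual` column of the node table (`ts.tables.nodes.individual`). It picks 100 nodes, one of which is the second chromosome of an already-chosen individual, and counts how many distinct individuals remain.

```python
def count_individuals(sample_nodes, node_individual):
    """Count the distinct individuals represented by a list of sample nodes."""
    return len({node_individual[n] for n in sample_nodes})


# Toy diploid mapping (an assumption): node n belongs to individual n // 2.
node_individual = {n: n // 2 for n in range(400)}

# One node from each of 99 individuals, plus the second node of individual 0.
sample_nodes = [2 * k for k in range(99)] + [1]

print(len(sample_nodes))                                 # 100
print(count_individuals(sample_nodes, node_individual))  # 99
```

A quick check like this on the real ID list (comparing the number of nodes with the number of distinct individuals they map to) would flag the collision before simplifying.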