output of -selection #70

shirasegoby · 2023-01-29T02:51:59Z

Hello,

Thank you very much for this great tool. I used PCAngsd to find outlier SNP loci.

I used the "-selection" flag and obtained an output containing a single column. Does this mean that only PC1 is significant, and that the output holds the selection statistics along PC1 only?

DanielOsmond · 2023-02-09T13:33:17Z

To avoid opening another thread, I have a similar question here. I'm a little confused about what the selection.npy output file is actually reporting.

For context, I'm running PCAngsd with this command: pcangsd --beagle {input_beagle} --selection --minMaf 0.05 --threads 16 -o $BASEDIR'angsd/pcangsd_'$PREFIX --sites_save

And get an output with this head:

D <- npyLoad("pcangsd_full_snp_selection.selection.npy") # Reads PC based selection statistics
View(D)
head(D)
            [,1]       [,2]       [,3]       [,4]         [,5]       [,6]       [,7]        [,8]
[1,] 0.550135195 0.41988349 2.86682153 0.01884845 0.1577135772 0.24476774 0.27481931 0.282718897
[2,] 0.004351228 1.18976772 0.01015188 0.01219513 0.2708614469 0.27879831 0.68683660 1.323182344
[3,] 1.946982026 0.75790089 0.32707891 0.01051806 0.0303267129 0.01896533 0.07251304 0.008901663
[4,] 0.112891927 3.26745486 1.69801426 0.40741089 0.0003703941 3.15582776 1.04348207 2.947828531
[5,] 0.001022269 0.05364039 0.03050057 0.28526935 0.6652516723 0.65088129 1.93980658 2.141073942
[6,] 0.449037045 0.23769799 0.14198528 1.01102042 0.0241993852 1.01173162 2.27705216 1.996245265

I presume each column is a statistic for a different PC axis, but what is this statistic? Does it represent PCAngsd-s1/s2? Sorry if this is a silly question, but I've scoured the paper, the supplementary materials, and the postings here and am struggling to find an answer.

Thank you!

Dan

Rosemeis · 2023-02-16T11:35:26Z

Hi both of you!
Sorry for the late response but I just came back from vacation. :-)

@shirasegoby
Only one column is output because only one PC was detected as capturing population structure (or you may have manually set the number to 1), so only that PC is used in the selection scan. In newer versions of PCAngsd, you can still run the selection scan on more PCs by using "--selection_e INT".

@DanielOsmond
Yes, exactly: each column refers to the selection statistics of one PC that was detected as capturing population structure. The details of the selection scan are unfortunately not part of the original paper, but they are described in this one:
https://doi.org/10.1186/s12859-021-04375-2
So the selection statistics are chi-square distributed with 1 degree of freedom.
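
Since the statistics are chi-square distributed with 1 degree of freedom, each value can be turned into a p-value with the upper tail of that distribution. The following is only a minimal sketch, not part of PCAngsd itself: the function name `chi2_1df_pvalue` is made up for illustration, and the input values are the first three sites and first three PC columns copied from the head(D) output earlier in this thread. It uses only the standard library, relying on the identity P(X > x) = erfc(sqrt(x / 2)) for a chi-square variable with 1 degree of freedom.

```python
import math


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    Hypothetical helper for illustration; uses P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


# Rows are sites, columns are PCs (values copied from the head(D) output above).
stats = [
    [0.550135195, 0.41988349, 2.86682153],
    [0.004351228, 1.18976772, 0.01015188],
    [1.946982026, 0.75790089, 0.32707891],
]

# Convert every statistic to a p-value, keeping the site-by-PC layout.
pvalues = [[chi2_1df_pvalue(s) for s in row] for row in stats]

for site, row in enumerate(pvalues, start=1):
    print(site, " ".join(f"{p:.4f}" for p in row))
```

For a full scan, the same conversion should be obtainable in one call with NumPy and SciPy, for example `scipy.stats.chi2.sf(numpy.load("pcangsd_full_snp_selection.selection.npy"), 1)`, after which the p-values of each column can be corrected for multiple testing.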

Please feel free to ask more questions! :-)

Best,
Jonas

shirasegoby · 2023-02-17T16:07:16Z

Hi Jonas,

I hope you had a good holiday. I understand now. Thank you very much!!

Best,
Shotaro

DanielOsmond · 2023-02-17T16:17:31Z

Thanks for the reply Jonas, that's exactly what I was after. Thank you for helping!

akimmitt · 2023-09-19T18:46:04Z

Hello! I was similarly getting one column of output from the -pcadapt option when I did not specify "--selection_e".
I added "--selection_e 2" to my script, and while I now get two columns of data, the z-scores in the PC1 column differ from those of the original PC1 (when --selection_e was not specified). Why would these values be different? Are z-scores not calculated for the PCs independently?
