output of -selection #70

Open
shirasegoby opened this issue Jan 29, 2023 · 5 comments
Comments

@shirasegoby

Hello,

Thank you very much for this great tool. I used PCAngsd to find outlier SNP loci.

I used the "-selection" flag and obtained an output file containing a single column. Does this mean that only PC1 is significant, and that only the selection statistics along PC1 were written?

@DanielOsmond

To avoid opening another thread, I have a similar question here: I'm a little confused about what the selection.npy output file is actually reporting.

For context, I'm running PCAngsd with this command: pcangsd --beagle {input_beagle} --selection --minMaf 0.05 --threads 16 -o $BASEDIR'angsd/pcangsd_'$PREFIX --sites_save

And get an output with this head:

library(RcppCNPy) # Provides npyLoad()
D <- npyLoad("pcangsd_full_snp_selection.selection.npy") # Reads PC-based selection statistics
View(D)
head(D)
            [,1]       [,2]       [,3]       [,4]         [,5]       [,6]       [,7]        [,8]
[1,] 0.550135195 0.41988349 2.86682153 0.01884845 0.1577135772 0.24476774 0.27481931 0.282718897
[2,] 0.004351228 1.18976772 0.01015188 0.01219513 0.2708614469 0.27879831 0.68683660 1.323182344
[3,] 1.946982026 0.75790089 0.32707891 0.01051806 0.0303267129 0.01896533 0.07251304 0.008901663
[4,] 0.112891927 3.26745486 1.69801426 0.40741089 0.0003703941 3.15582776 1.04348207 2.947828531
[5,] 0.001022269 0.05364039 0.03050057 0.28526935 0.6652516723 0.65088129 1.93980658 2.141073942
[6,] 0.449037045 0.23769799 0.14198528 1.01102042 0.0241993852 1.01173162 2.27705216 1.996245265

I presume each column is a statistic relating to a different PC axis, but what is this statistic? Does it represent PCAngsd-s1/s2? Sorry if this is a silly question, but I've scoured the paper, the supplementary materials and the postings here and am struggling to find an answer.

Thank you!

Dan

@Rosemeis
Owner

Hi both of you!
Sorry for the late response but I just came back from vacation. :-)

@shirasegoby
Only one column is output because only one PC was detected as capturing population structure (or you may have manually set the number to 1), so only that PC is used to detect selection. In the newer version of PCAngsd, you can still perform selection scans for more PCs by using "--selection_e INT".
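
As a rough illustration of that option, the sketch below assembles a command line that requests a selection scan over a chosen number of PCs. It uses only flags that appear in this thread; the file names are hypothetical placeholders, and passing "--selection_e" together with "--selection" is an assumption rather than something confirmed here.

```python
import shlex


def build_selection_command(beagle_path, out_prefix, n_pcs, threads=16):
    """Assemble a PCAngsd selection-scan command for a chosen number of PCs.

    beagle_path and out_prefix are hypothetical placeholders; n_pcs is the
    value passed to --selection_e.
    """
    if n_pcs < 1:
        raise ValueError("n_pcs must be at least 1")
    args = [
        "pcangsd",
        "--beagle", beagle_path,
        "--selection",                # assumption: used together with --selection_e
        "--selection_e", str(n_pcs),
        "--threads", str(threads),
        "-o", out_prefix,
    ]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(args)


command = build_selection_command("input.beagle.gz", "pcangsd_out", n_pcs=2)
print(command)
```

The resulting selection output would then be expected to have one column per requested PC.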

@DanielOsmond
Yes, exactly: each column contains the selection statistics for one of the PCs that were detected as capturing population structure. Unfortunately, the details of the selection scan are not part of the original paper, but they are described in this one:
https://doi.org/10.1186/s12859-021-04375-2
So the selection statistics are chi-square distributed with 1 degree of freedom.
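
Since the statistics follow a chi-square distribution with 1 degree of freedom, they can be turned into p-values with the upper-tail probability. The sketch below is one way to do that using only the Python standard library; the function names are hypothetical, and the sample values are copied from the table posted earlier in this thread. It relies on the identity P(X > x) = erfc(sqrt(x / 2)) for a chi-square variable with 1 degree of freedom.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    if stat < 0:
        raise ValueError("chi-square statistics must be non-negative")
    # If X ~ chi2(1) then X = Z**2 with Z standard normal, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def selection_pvalues(stats):
    """Convert a sites-by-PCs table of statistics into p-values, entry by entry."""
    return [[chi2_1df_pvalue(value) for value in row] for row in stats]


# First two rows and first three columns of the table posted above.
# In practice the full array could be read with numpy.load(...) and passed in,
# since iterating over a 2-D NumPy array yields its rows.
example = [
    [0.550135195, 0.41988349, 2.86682153],
    [0.004351228, 1.18976772, 0.01015188],
]
pvalues = selection_pvalues(example)
for row in pvalues:
    print(" ".join(f"{p:.4f}" for p in row))
```

Each column of p-values corresponds to one PC, so a genome-wide scan would typically be followed by a multiple-testing correction applied per column.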

Please feel free to ask more questions! :-)

Best,
Jonas

@shirasegoby
Author

Hi Jonas,

I hope you had a good holiday. I understand now. Thank you very much!!

Best,
Shotaro

@DanielOsmond

Thanks for the reply Jonas, that's exactly what I was after. Thank you for helping!

@akimmitt

Hello! I was similarly getting one column of output from the -pcadapt option when I did not specify "--selection_e".
I added "--selection_e 2" to my script, and while I now get two columns of data, the z-scores for PC1 differ from the original PC1 values (when --selection_e was not specified). Why would these values be different? Are the z-scores not calculated independently for each PC?
