2 questions about the synthetic dataset available on BioStudies #32

Open
alicegranb opened this issue Jul 17, 2024 · 0 comments

alicegranb commented Jul 17, 2024

Hello,
Thanks for this great resource!

(1) When we try to convert the chromosome 1 PLINK files to VCF format, or to apply filtering, the run fails on the size of the .bed file with the following error message:
"Error: Invalid .bed file size (expected 134448552003 bytes)."

The size reported at the beginning of the download was 134450064003 bytes, which matches the actual size of the downloaded file, so the download itself appears to be complete. The file is therefore 1,512,000 bytes larger than the expected size in the error message.
"Length: 134450064003 (125G) [application/vnd.realvnc.bed]"

Our hypothesis is that the BED file is corrupt and that the information it contains does not correspond to that in the FAM and BIM files.
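
For reference, here is a minimal sketch of the consistency check as we understand it, assuming a standard variant-major PLINK 1 .bed file: 3 header bytes, followed by one block of ceil(N/4) bytes per variant, where N is the number of samples (lines in the FAM file) and the number of variants is the number of lines in the BIM file. The function names and the file paths are ours and purely illustrative; they are not part of PLINK.

```python
import os

BED_HEADER_BYTES = 3  # two magic-number bytes plus one mode byte


def expected_bed_size(n_samples: int, n_variants: int) -> int:
    """Expected size in bytes of a variant-major PLINK 1 .bed file."""
    # Each genotype takes 2 bits, so 4 samples fit in one byte;
    # every variant block is padded up to a whole number of bytes.
    bytes_per_variant = (n_samples + 3) // 4
    return BED_HEADER_BYTES + bytes_per_variant * n_variants


def count_lines(path: str) -> int:
    """Count lines in a text file (one sample per FAM line, one variant per BIM line)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def bed_size_excess(bed_path: str, fam_path: str, bim_path: str) -> int:
    """Bytes by which the .bed file on disk exceeds the size implied by FAM and BIM.

    Zero means the three files are consistent; a non-zero value means the
    .bed file was not written for this FAM/BIM pair (or is damaged).
    """
    expected = expected_bed_size(count_lines(fam_path), count_lines(bim_path))
    return os.path.getsize(bed_path) - expected
```

Calling `bed_size_excess` with the chromosome 1 .bed, .fam, and .bim paths should reproduce the discrepancy between the size on disk and the expected size quoted in the error above, if our reading of the format is right.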

The conversion/filtering worked for the PLINK files of all other chromosomes.

Could you kindly assist us in resolving this matter? Could you check on your side what the true BED file size is? Any guidance or correction you could provide would be greatly appreciated.

(2) Some variants have inverted REF and ALT columns, while the ID column is accurate. What could be the cause of this?

Thank you in advance for your support.
Alice
