Query Regarding Platform-Specific Bias in PCA Analysis for WGS Data #78

@xuanxuClem

Dear Xihaoli,

I am currently working with the STAARpipeline for WGS data analysis, as outlined in the STAARpipeline-Tutorial. My dataset comes from a Chinese population, including 80 cases and 800 healthy controls. The WGS data for the 80 cases were generated on the Illumina sequencing platform, while the 800 healthy control samples were sequenced using the BGI-T7 platform.

I have completed variant calling and VQSR quality control for all 880 samples and am now working through the steps of the STAARpipeline. At the step that generates the sparse Genetic Relatedness Matrix (GRM), I ran a PCA on the samples and visualized the results. The first two principal components (PC1 and PC2) clearly separate the samples by sequencing platform, which points to a strong platform-associated technical bias. The PCA plot below shows this separation, with the two platforms drawn in different colors. Because all 880 samples have a similar genetic background (all are from a Chinese population), I did not expect population structure of this kind, so I interpret the clear separation by sequencing platform as a technical artifact.

Image
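
One way the separation in the plot could be quantified is the fraction of variance in a given PC that is explained by the platform label (eta squared from a one-way ANOVA decomposition). The following is a minimal Python sketch under stated assumptions, not part of STAARpipeline: the function name `variance_explained_by_platform`, the toy PC1 scores, and the label strings are all hypothetical, and in practice the scores would come from whatever PCA output was used to draw the plot.

```python
from collections import defaultdict


def variance_explained_by_platform(pc_scores, platforms):
    """Fraction of variance in one PC explained by platform (eta squared).

    pc_scores: per-sample scores on a single principal component.
    platforms: per-sample platform labels, in the same order.
    """
    if len(pc_scores) != len(platforms):
        raise ValueError("pc_scores and platforms must have the same length")
    if not pc_scores:
        raise ValueError("at least one sample is required")

    n = len(pc_scores)
    grand_mean = sum(pc_scores) / n
    total_ss = sum((x - grand_mean) ** 2 for x in pc_scores)
    if total_ss == 0:
        return 0.0  # the PC has no variance, so nothing to explain

    # Group the scores by platform label.
    groups = defaultdict(list)
    for score, label in zip(pc_scores, platforms):
        groups[label].append(score)

    # Between-group sum of squares: group size times squared offset
    # of the group mean from the grand mean.
    between_ss = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return between_ss / total_ss


# Hypothetical toy data: six samples, three per platform (not real results).
toy_pc1 = [-0.9, -1.1, -1.0, 1.0, 0.9, 1.1]
toy_platform = ["Illumina"] * 3 + ["BGI-T7"] * 3
toy_eta_squared = variance_explained_by_platform(toy_pc1, toy_platform)
print(f"PC1 variance explained by platform: {toy_eta_squared:.3f}")
```

A value close to 1 for PC1 or PC2 would be consistent with the platform-driven separation described above, while a value near 0 would mean that PC is unrelated to platform.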

The two platforms (Illumina and BGI-T7) differ slightly in their GC-content preferences, with the BGI-T7 platform having a 1% stronger GC bias, so I am concerned that this could introduce platform-specific biases into the variant calling results.

Could you please advise on how to address this platform-specific bias? Are there any recommended methods within STAARpipeline or elsewhere to correct for this technical bias?

Thank you very much for your time and assistance. I look forward to your suggestions.
