Comparison of scRNA datasets via integration analysis #9954
sketch123456 asked this question in Q&A
In addition, some of the other papers I plan to compare with provide separate expression matrices for each condition (one for SD and one for NS). I know I can't use the integrated expression values for DE analysis, so would I just merge the matrices, add a metadata column recording each cell's condition, and then use FindMarkers()? Suppose I want to integrate my dataset with theirs. Would I then integrate three expression matrices (mine plus their SD and NS matrices) for the combined cluster analysis, and then go back to the normalized counts for DE analysis? I'm guessing this is a relatively simple task, but I'm still new to bioinformatics, so I would greatly appreciate any information.
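As a language-neutral illustration of the merge-and-label idea (this is not Seurat code), the sketch below assumes each per-condition matrix is held as a dict mapping cell barcode to per-gene counts. The function name `merge_with_condition`, the barcodes, and the counts are all made up. Prefixing each barcode with its condition keeps cells from the two matrices distinct, and the returned `condition` mapping plays the role of the extra metadata column that the DE comparison would group on.

```python
from typing import Dict, Tuple

# Hypothetical layout: cell barcode -> {gene: count}
Matrix = Dict[str, Dict[str, float]]


def merge_with_condition(matrices: Dict[str, Matrix]) -> Tuple[Matrix, Dict[str, str]]:
    """Merge per-condition matrices into one, recording each cell's condition.

    Barcodes are prefixed with the condition label so that identical
    barcodes coming from different matrices cannot collide.
    """
    merged: Matrix = {}
    condition: Dict[str, str] = {}
    for label, matrix in matrices.items():
        for barcode, counts in matrix.items():
            cell_id = f"{label}_{barcode}"
            merged[cell_id] = dict(counts)  # copy so inputs stay untouched
            condition[cell_id] = label      # the "condition" metadata column
    return merged, condition


# Made-up toy matrices: one per condition, with an overlapping barcode.
sd = {"AAAC": {"Fos": 5, "Arc": 2}, "TTTG": {"Fos": 3}}
ns = {"AAAC": {"Fos": 1}, "CCGA": {"Arc": 1}}
merged, condition = merge_with_condition({"SD": sd, "NS": ns})
print(sorted(merged))        # ['NS_AAAC', 'NS_CCGA', 'SD_AAAC', 'SD_TTTG']
print(condition["NS_AAAC"])  # NS
```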
I plan on comparing some other scRNA-seq datasets to my own to see whether the DEGs match between papers, as a way of validating my own results. For context, I am working on cortex data under sleep-deprived (SD) vs. normal conditions, and the other paper has the same design. I have a few clusters: astrocytes, microglia, endothelial cells, oligodendrocytes, and GABAergic/glutamatergic neurons. In my reading so far, most existing papers integrate their data with Seurat before performing downstream analysis to see which DEGs are enriched. I do not want to do this, as my interest is a direct comparison between my results and those of the other paper. So far, the general recommendation has been to process their FASTQ files through my pipeline to minimize batch effects arising from different processing. I have questions about the optimal workflow for this comparison, especially with regard to integration analysis.
I plan on running their FASTQ files through my pipeline to make the datasets comparable. Would integration analysis still be necessary after this to perform batch correction? I know I am not supposed to use batch-corrected values for DE analysis, so would I just compare SD to normal within my data, and then see which DEGs match those from the SD vs. normal comparison within their data?
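One way to make "see which DEGs match" quantitative is to report how much the two DEG lists overlap and, among the shared genes, how often the direction of change agrees. The sketch below is a minimal Python illustration, assuming each dataset's DEGs (already filtered by that dataset's own SD vs. normal test) are available as a gene-to-log2FC mapping; `compare_deg_lists` and the example values are hypothetical.

```python
from typing import Dict


def compare_deg_lists(mine: Dict[str, float], theirs: Dict[str, float]) -> Dict[str, float]:
    """Summarize agreement between two DEG tables (gene -> log2 fold change).

    Each input is assumed to hold only genes that passed that dataset's own
    significance threshold for SD vs. normal, computed within the dataset.
    """
    shared = set(mine) & set(theirs)
    union = set(mine) | set(theirs)
    # Shared DEGs whose fold changes point the same way in both datasets.
    concordant = [g for g in shared if (mine[g] > 0) == (theirs[g] > 0)]
    return {
        "n_shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "frac_concordant": len(concordant) / len(shared) if shared else 0.0,
    }


# Made-up DEG tables for one cell type.
mine = {"Fos": 2.1, "Arc": 1.4, "Hspa5": -0.8}
theirs = {"Fos": 1.7, "Arc": -0.5, "Homer1": 0.9}
print(compare_deg_lists(mine, theirs))
# {'n_shared': 2, 'jaccard': 0.5, 'frac_concordant': 0.5}
```

Because the overlap depends on each dataset's significance cutoff, it is probably worth reporting alongside the direction agreement rather than on its own.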
Additionally, would Pearson/Spearman correlation of either average gene expression or log2 fold change between matched cell types be a useful metric to consider? For example, I could correlate the expression or log2FC vector of my astrocytes with that of their astrocytes.
So far, the workflow I have planned is:
1. Run the other paper's FASTQ files through my pipeline and label the cells by source (dataset1 vs. dataset2).
2. Perform integration analysis to see how the two datasets cluster on the same UMAP plot.
3. Using the original normalized counts (not batch-corrected values), run DE analysis comparing SD vs. normal for my data, then repeat it separately for their data.
4. Run GSEA/KEGG enrichment on the resulting DEGs for each dataset and compare the results.
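The plan to run the SD vs. normal comparison separately for each dataset can be illustrated with a simplified fold-change calculation that only ever compares cells within the same dataset. This is a hypothetical sketch, assuming cells are records with `dataset`, `condition` ("SD"/"NS"), `cell_type`, and a sparse `expr` dict of normalized (not batch-corrected) values. It computes only log2((mean SD + 1) / (mean NS + 1)) per gene and performs no statistical test, unlike a real DE test such as the Wilcoxon test that FindMarkers() uses by default.

```python
import math
from collections import defaultdict
from typing import Dict, List


def log2fc_within_dataset(
    cells: List[dict], dataset: str, cell_type: str, pseudocount: float = 1.0
) -> Dict[str, float]:
    """Per-gene log2((mean SD + pc) / (mean NS + pc)) for one dataset and cell type.

    Only cells from `dataset` are used, so SD and NS cells are never
    compared across datasets. Genes missing from a cell's sparse `expr`
    dict count as 0. This is a fold-change sketch, not a DE test.
    """
    sums = {"SD": defaultdict(float), "NS": defaultdict(float)}
    n_cells = {"SD": 0, "NS": 0}
    for cell in cells:
        if cell["dataset"] != dataset or cell["cell_type"] != cell_type:
            continue
        cond = cell["condition"]
        n_cells[cond] += 1
        for gene, value in cell["expr"].items():
            sums[cond][gene] += value
    if n_cells["SD"] == 0 or n_cells["NS"] == 0:
        raise ValueError("need cells from both conditions in this dataset")
    genes = set(sums["SD"]) | set(sums["NS"])
    return {
        g: math.log2(
            (sums["SD"][g] / n_cells["SD"] + pseudocount)
            / (sums["NS"][g] / n_cells["NS"] + pseudocount)
        )
        for g in genes
    }


# Made-up cells; the dataset2 cell is excluded from the dataset1 comparison.
cells = [
    {"dataset": "dataset1", "condition": "SD", "cell_type": "astrocyte", "expr": {"Fos": 3.0}},
    {"dataset": "dataset1", "condition": "SD", "cell_type": "astrocyte", "expr": {"Fos": 1.0}},
    {"dataset": "dataset1", "condition": "NS", "cell_type": "astrocyte", "expr": {}},
    {"dataset": "dataset2", "condition": "NS", "cell_type": "astrocyte", "expr": {"Fos": 7.0}},
]
fc = log2fc_within_dataset(cells, "dataset1", "astrocyte")
print(round(fc["Fos"], 3))  # 1.585, i.e. log2(3.0 / 1.0)
```

Running the same function once per dataset yields two independent gene-level tables that can then be compared gene by gene, without any batch-corrected values entering the fold changes.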
Any advice or comments about this proposed workflow would be much appreciated. Finally, a more general question: what does batch correction contribute in this scenario, if the DE analysis must be performed on the actual (uncorrected) expression matrix anyway?