Comparison of scRNA datasets via integration analysis #9954

sketch123456 · 2025-06-19T07:03:23Z

I plan to compare other scRNA-seq datasets with my own to see whether the DEGs match between papers, as a way of validating my results. For context, I am working on cortex data in sleep-deprived vs. normal conditions, and the other paper has the same design. I have a few clusters: astrocytes, microglia, endothelial cells, oligodendrocytes, and GABAergic/glutamatergic neurons. In my reading so far, most existing papers integrate the datasets with Seurat before running downstream analysis to find DEGs. I do not want to do this, because I am interested in a direct comparison between my results and those of the other paper. So far, the general recommendation I have found is to process their FASTQ files through my pipeline to control for batch effects. I have questions about the optimal workflow for this comparison, especially with regard to integration analysis.

Is integration analysis necessary in this context?
I plan on running their FASTQ files through my pipeline to help compare. Would integration analysis still be necessary after this to perform batch correction? I know I am not supposed to use batch corrected values to perform DE analysis, so would I just compare sleep to normal for my data and then see what DEGs match with sleep to normal for their data?

Additionally, I wanted to ask if using Pearson/Spearman correlation of either average gene expression or fold change between cell types would be a useful metric to consider, i.e. comparing the expression/log2fc vector between my astrocytes and their astrocytes.
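The correlation idea can be sketched in a few lines. This is a minimal, language-agnostic illustration rather than Seurat code: it assumes the per-gene log2 fold changes for one cell type have already been exported from each dataset as gene-to-value mappings, and it restricts the comparison to genes present in both. The gene names, the values, and the helper names (`pearson`, `ranks`, `compare_log2fc`) are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def compare_log2fc(mine, theirs):
    """Correlate log2FC vectors over the genes measured in both datasets."""
    shared = sorted(set(mine) & set(theirs))
    x = [mine[g] for g in shared]
    y = [theirs[g] for g in shared]
    return {
        "n_genes": len(shared),
        "pearson": pearson(x, y),
        # Spearman is Pearson computed on the ranks.
        "spearman": pearson(ranks(x), ranks(y)),
    }

# Hypothetical astrocyte log2FC values (SD vs. normal); not real data.
my_astro = {"GeneA": 1.2, "GeneB": -1.1, "GeneC": 1.8, "GeneD": 0.9, "GeneE": -0.7}
their_astro = {"GeneA": 1.4, "GeneB": -0.9, "GeneC": 1.5, "GeneD": 0.6, "GeneF": 0.1}

result = compare_log2fc(my_astro, their_astro)
print(result)
```

Spearman depends only on the ordering of the fold changes, so it is less sensitive than Pearson to a few genes with very large effects. Which of the two is more appropriate here is part of the question.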

Can anyone point me towards some relevant literature/documentation that may help with my questions? I am pretty new to the field, so I need as much help as I can get.

So far, the workflow I have planned is:
Run the other paper's data through my pipeline and add labels (dataset1 vs. dataset2). I plan to perform integration analysis to see how the datasets cluster on the same UMAP plot, but I will use the original normalized counts (not batch-corrected values) for DE analysis, comparing SD vs. normal for my data and then repeating this for their data. Then I plan to run GSEA/KEGG on the resulting DEGs for my data and for their data, and compare the results.
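The final "compare the DEGs" step of this workflow could look roughly like the sketch below. It is a minimal illustration, not Seurat code: it assumes each dataset's DE results for one cell type have been exported as a mapping from gene to (log2FC, adjusted p-value). The cutoffs, gene names, values, and function names are hypothetical and chosen only for the example.

```python
def significant(deg_table, padj_cutoff=0.05, min_abs_lfc=0.25):
    """Keep genes passing both cutoffs (cutoffs are illustrative assumptions)."""
    return {
        gene: lfc
        for gene, (lfc, padj) in deg_table.items()
        if padj < padj_cutoff and abs(lfc) >= min_abs_lfc
    }

def compare_degs(mine, theirs):
    """Overlap of significant DEGs and agreement in direction of change."""
    a, b = significant(mine), significant(theirs)
    shared = set(a) & set(b)
    union = set(a) | set(b)
    same_direction = sorted(g for g in shared if (a[g] > 0) == (b[g] > 0))
    return {
        "shared": sorted(shared),
        "same_direction": same_direction,
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical DE tables: gene -> (log2FC, adjusted p-value); not real data.
my_degs = {
    "GeneA": (1.2, 0.001),
    "GeneB": (-0.8, 0.01),
    "GeneC": (0.5, 0.20),   # fails the adjusted p-value cutoff
    "GeneD": (0.6, 0.03),
}
their_degs = {
    "GeneA": (0.9, 0.004),
    "GeneB": (0.7, 0.02),   # significant, but in the opposite direction
    "GeneD": (0.1, 0.001),  # fails the fold-change cutoff
    "GeneE": (-1.0, 0.0005),
}

overlap = compare_degs(my_degs, their_degs)
print(overlap)
```

Checking direction as well as membership matters: a gene that is significant in both datasets but up in one and down in the other is not a replicated finding.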

Any advice or comments on this proposed workflow would be much appreciated. Finally, as a more general question: what does batch correction contribute in this scenario if the DE analysis must be run on the uncorrected expression matrix anyway?

sketch123456 · 2025-06-19T21:12:53Z

In addition, some of the other papers I plan to compare with provide separate expression matrices for each condition (one for SD and one for NS). I know I can't use the integrated expression values for DE analysis, so would I just merge the matrices, add a metadata column recording which condition each cell belongs to, and then use FindMarkers()?

Suppose I want to integrate my dataset with theirs. Would I then integrate all three expression matrices for the combined cluster analysis, and go back to the normalized counts for DE analysis? I'm guessing this is a relatively simple task, but I'm still new to bioinformatics, so I would greatly appreciate any information.
