I’m working with single-cell RNA-seq data processed using kallisto, and I have estimated counts (sometimes referred to as pseudocounts). These are not raw UMI counts but the expected transcript/gene counts generated by kallisto's EM algorithm. These are the files I have:
!Sample_data_processing = singlecell_rnaseq_gene_counts.tsv.gz (pseudocounts of each gene)
!Sample_data_processing = singlecell_rnaseq_transcript_counts.tsv.gz (pseudocounts of each transcript)
!Sample_data_processing = singlecell_rnaseq_gene_tpm.tsv.gz (transcripts per million (TPM) of each gene)
!Sample_data_processing = singlecell_rnaseq_transcript_tpm.tsv.gz (transcripts per million (TPM) of each transcript)
My goal is to use this data in Seurat for clustering and differential expression analysis.
Questions
1. Is it acceptable to use these pseudocounts as input to Seurat (e.g., for Smart-seq2-style data)?
2. Should I round these estimated counts before running SCTransform or log-normalization?
3. Are TPMs from kallisto also supported, or is that discouraged?
I'm aware that tools like tximport convert these to DE-friendly formats for edgeR and DESeq2, but it's unclear what Seurat prefers for full-length protocols.
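To make the rounding question concrete, here is a minimal Python sketch of the step I have in mind. It is not Seurat's API and not a recommendation; it only illustrates what "rounding the estimated counts" would do to the matrix. The file layout (tab-separated, one header row of cell IDs, then one gene per row) is an assumption about my files, and `read_matrix`, `round_counts`, and `load_gz` are hypothetical helper names.

```python
import csv
import gzip
import io


def read_matrix(handle):
    """Parse a tab-separated matrix: a header row of cell IDs, then one gene per row.

    ASSUMPTION: this layout is a guess at how the kallisto-derived
    *_gene_counts.tsv.gz file is organised; adjust if yours differs.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    cells = header[1:]  # first column is assumed to hold the gene ID
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = [float(x) for x in row[1:]]
    return cells, counts


def round_counts(counts):
    """Round estimated counts to the nearest integer.

    Python's round() resolves exact .5 ties to the nearest even integer.
    """
    return {gene: [int(round(v)) for v in values] for gene, values in counts.items()}


def load_gz(path):
    """Read a gzip-compressed matrix from disk (hypothetical convenience wrapper)."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_matrix(handle)


# Toy data standing in for the real file; the values are made up for illustration.
toy = "gene\tcell_1\tcell_2\nGeneA\t3.7\t0.0\nGeneB\t12.2\t0.4\n"
cells, estimated = read_matrix(io.StringIO(toy))
rounded = round_counts(estimated)
print(cells)    # ['cell_1', 'cell_2']
print(rounded)  # {'GeneA': [4, 0], 'GeneB': [12, 0]}
```

Note how a small estimated count such as 0.4 becomes 0 after rounding, which is part of why I am unsure whether rounding is appropriate here.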
Thank you in advance for the clarification!