Why are those cells unassigned? #106

@tewengtong

Dear team,

Thank you for providing this demultiplexing tool. Our data has 16 donors in one pool, and we have matched genotypes to demultiplex with. My input VCF contains about 100,000 exonic SNPs. I ran Vireo in Mode 1a, and some cells are still unassigned. Some of them are low-coverage cells, but others look fine to me according to their metrics:

[vireo] Loading cell folder ...
[vireo] Loading donor VCF file ...
[vireo] 106759 out 106759 variants matched to donor VCF
[vireo] Demultiplex 27546 cells to 16 donors with 106759 variants.
[vireo] lower bound ranges [-11356250.4, -11356250.4, -11356250.4]
[vireo] allelic rate mean and concentrations:
[[0.086 0.456 0.857]]
[[13504538.2 8695336.1 4566475.7]]
[vireo] donor size before removing doublets:
donor0 donor1 donor2 donor3 donor4 donor5 donor6 donor7 donor8 donor9 donor10 donor11 donor12 donor13 donor14 donor15
2362 2047 3153 805 694 2378 672 2050 1355 2735 1024 2085 1059 1910 2332 885
[vireo] 49612 out 106759 SNPs selected for ambient RNA detection: ELBO_gain > 55.3
[vireo] Ambient RNA time: 29202.8 sec
[vireo] final donor size:
H073 H078 H111 H134 H145 H152 H204 H288 H345 H346 H351 H356 H358 H360 H388 H389 doublet unassigned
1165 1636 1382 1728 1223 363 1156 1543 1135 364 1491 546 752 679 496 316 7276 4295
[vireo] All done: 488 min 57.0 sec

unassigned.filter[1:20,c("donor_id","prob_max","prob_doublet","n_vars")]
donor_id prob_max prob_doublet n_vars
AAACCGACAAATTGCC-1 unassigned 3.20e-01 0.680 1438
AAACGTAAGCTCGTGG-1 unassigned 6.33e-01 0.253 349
AAACTAGAGGCCCTGA-1 unassigned 7.44e-04 0.499 209
AAAGCCGAGCGAATTT-1 unassigned 3.90e-17 0.544 928
AAAGCCTGTATTGAGC-1 unassigned 1.70e-01 0.264 126
AAAGCGAGTACCTTGC-1 unassigned 1.28e-01 0.640 205
AAAGCGTAGCGCACAT-1 unassigned 3.99e-03 0.519 282
AAAGCTAAGCCCTAGG-1 unassigned 6.60e-01 0.340 1077
AAAGGAGAGAAACGCG-1 unassigned 2.30e-01 0.250 105
AAAGGCCCACCGCTAA-1 unassigned 3.27e-01 0.245 80
AAAGGGGCAGCGGTAG-1 unassigned 6.71e-01 0.315 217
AAAGGTATCCAGGATC-1 unassigned 1.61e-125 0.662 4138
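
One rough way to look at the rows above is to group them by coverage and by which probability is larger. The snippet below is only a minimal sketch: the 0.9 assignment cutoff and the 200-variant coverage floor are assumptions chosen for illustration (not confirmed Vireo settings), and `triage` is a hypothetical helper, not part of Vireo.

```python
from collections import Counter

# Rows copied from the table above: (barcode, prob_max, prob_doublet, n_vars).
ROWS = [
    ("AAACCGACAAATTGCC-1", 3.20e-01, 0.680, 1438),
    ("AAACGTAAGCTCGTGG-1", 6.33e-01, 0.253, 349),
    ("AAACTAGAGGCCCTGA-1", 7.44e-04, 0.499, 209),
    ("AAAGCCGAGCGAATTT-1", 3.90e-17, 0.544, 928),
    ("AAAGCCTGTATTGAGC-1", 1.70e-01, 0.264, 126),
    ("AAAGCGAGTACCTTGC-1", 1.28e-01, 0.640, 205),
    ("AAAGCGTAGCGCACAT-1", 3.99e-03, 0.519, 282),
    ("AAAGCTAAGCCCTAGG-1", 6.60e-01, 0.340, 1077),
    ("AAAGGAGAGAAACGCG-1", 2.30e-01, 0.250, 105),
    ("AAAGGCCCACCGCTAA-1", 3.27e-01, 0.245, 80),
    ("AAAGGGGCAGCGGTAG-1", 6.71e-01, 0.315, 217),
    ("AAAGGTATCCAGGATC-1", 1.61e-125, 0.662, 4138),
]

# Both values are assumptions for illustration, not confirmed Vireo defaults.
ASSIGN_CUTOFF = 0.9  # confidence needed to call a singlet or a doublet
MIN_VARS = 200       # below this, treat the cell as low coverage


def triage(prob_max, prob_doublet, n_vars):
    """Hypothetical helper: label one cell from its three metrics."""
    if prob_max >= ASSIGN_CUTOFF:
        return "singlet"
    if prob_doublet >= ASSIGN_CUTOFF:
        return "doublet"
    if n_vars < MIN_VARS:
        return "unassigned, low coverage"
    if prob_doublet > prob_max:
        return "unassigned, leans doublet"
    return "unassigned, leans singlet"


labels = {bc: triage(pm, pd, nv) for bc, pm, pd, nv in ROWS}
counts = Counter(labels.values())

for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

Under these assumed thresholds, none of the listed cells reaches the cutoff for either a singlet or a doublet call, and the high-n_vars cells mostly fall into the group where prob_doublet exceeds prob_max.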

My question is: why are those cells still classified as unassigned even though their n_vars values are high? Are they doublets? Should I include them in downstream analysis, or remove them (this should be around 2,000 cells, about 10% of the whole library)?

Thank you so much!
