SketchData ignores ncells arguments and produces empty clusters #9551

Open
sbamopoulos opened this issue Dec 13, 2024 · 0 comments
Labels
bug Something isn't working

@sbamopoulos

Hello,

Thank you for this great tool.

I have been using SketchData to process a Xenium spatial dataset of ~2.4 million cells with 431 features. Here is my code:

xenium <- readRDS("path_to_file")
xenium <- NormalizeData(xenium)
xenium <- FindVariableFeatures(xenium)
xenium <- SketchData(object = xenium, ncells = 50000, method = "LeverageScore", sketched.assay = "sketch") 

This produces an object with more than 50,000 cells:

> length(Cells(xenium))
[1] 684802

In addition, after clustering, some clusters contain very few cells or none at all:

xenium <- RunPCA(xenium, npcs = 30, reduction.name = "pca_unintegrated")
xenium <- FindNeighbors(xenium, reduction = "pca_unintegrated", dims = 1:30)
xenium <- FindClusters(xenium, cluster.name = "clusters_unintegrated")
xenium <- RunUMAP(xenium, dims = 1:30, reduction = "pca_unintegrated", reduction.name = "umap_unintegrated")

table(xenium$clusters_unintegrated)

    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24 
49008 39512 38577 38276 37232 36553 36055 36043 32888 31845 31076 26943 22405 21973 21911 20844 20803 19206 12336  8339  7972  7678  7608  6684 
   25    26    27    28    29    30    31    32 
 4155  2190  2072   214     2     2     2     0 
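A general note on reading the cluster counts above. Base R's `table()` lists every level of a factor, including levels that no cell carries, and by default it leaves out `NA` values. The counts in the table sum to 620,404, which is fewer than the 684,802 cells returned by `Cells(xenium)`, so calling `table(xenium$clusters_unintegrated, useNA = "ifany")` would show whether the remaining cells simply have no cluster label. The minimal base-R sketch below shows this counting behaviour on hypothetical labels (not the Xenium data). It illustrates how `table()` works, not the cause of the SketchData behaviour.

```r
# Hypothetical cluster labels for six cells (illustrative only, not the Xenium data).
# Level "3" is declared but assigned to no cell, and one cell has no label (NA).
labels <- factor(c("1", "1", "2", NA, "2", "1"), levels = c("1", "2", "3"))

# Default table(): every declared level is listed, so "3" shows a count of 0,
# and the NA cell is left out of the counts.
counts_default <- table(labels)
print(counts_default)

# useNA = "ifany" adds an <NA> entry, so the counts sum to the number of cells.
counts_with_na <- table(labels, useNA = "ifany")
print(counts_with_na)

# droplevels() removes levels that no cell carries before counting.
counts_dropped <- table(droplevels(labels))
print(counts_dropped)
```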

The code follows the SketchData workflow described in your documentation.
I would appreciate any input or explanation regarding this unexpected behavior.

Best
Stefan

> sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_Germany.utf8  LC_CTYPE=English_Germany.utf8    LC_MONETARY=English_Germany.utf8 LC_NUMERIC=C                    
[5] LC_TIME=English_Germany.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dittoSeq_1.14.3    ggplot2_3.5.1      scCustomize_2.1.2  openxlsx_4.2.7.1   export_0.3.0       BPCells_0.1.0      Seurat_5.0.3      
[8] SeuratObject_5.0.1 sp_2.1-4          

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.22            splines_4.3.3               later_1.3.2                 bitops_1.0-8                tibble_3.2.1               
  [6] polyclip_1.10-7             janitor_2.2.0               fastDummies_1.7.4           lifecycle_1.0.4             globals_0.16.3             
 [11] lattice_0.22-6              MASS_7.3-60.0.1             backports_1.5.0             magrittr_2.0.3              plotly_4.10.4              
 [16] rmarkdown_2.28              httpuv_1.6.15               glmGamPoi_1.14.3            sctransform_0.4.1           spam_2.10-0                
 [21] zip_2.3.1                   askpass_1.2.0               spatstat.sparse_3.1-0       reticulate_1.39.0           cowplot_1.1.3              
 [26] pbapply_1.7-2               RColorBrewer_1.1-3          lubridate_1.9.3             pkgload_1.4.0               abind_1.4-8                
 [31] zlibbioc_1.48.2             Rtsne_0.17                  GenomicRanges_1.54.1        purrr_1.0.2                 BiocGenerics_0.48.1        
 [36] RCurl_1.98-1.14             rgl_1.3.1                   gdtools_0.4.0               circlize_0.4.16             GenomeInfoDbData_1.2.11    
 [41] IRanges_2.36.0              S4Vectors_0.40.2            ggrepel_0.9.6               irlba_2.3.5.1               listenv_0.9.1              
 [46] spatstat.utils_3.1-0        pheatmap_1.0.12             goftest_1.2-3               RSpectra_0.16-2             spatstat.random_3.3-2      
 [51] fitdistrplus_1.2-1          parallelly_1.38.0           DelayedMatrixStats_1.24.0   DelayedArray_0.28.0         leiden_0.4.3.1             
 [56] codetools_0.2-20            xml2_1.3.6                  shape_1.4.6.1               tidyselect_1.2.1            farver_2.1.2               
 [61] matrixStats_1.4.1           stats4_4.3.3                base64enc_0.1-3             spatstat.explore_3.3-2      jsonlite_1.8.9             
 [66] progressr_0.14.0            ggridges_0.5.6              survival_3.5-8              systemfonts_1.1.0           tools_4.3.3                
 [71] ragg_1.3.3                  ica_1.0-3                   Rcpp_1.0.13                 glue_1.7.0                  SparseArray_1.2.4          
 [76] gridExtra_2.3               xfun_0.47                   MatrixGenerics_1.14.0       GenomeInfoDb_1.38.8         dplyr_1.1.4                
 [81] withr_3.0.1                 fastmap_1.2.0               fansi_1.0.6                 openssl_2.2.2               digest_0.6.37              
 [86] timechange_0.3.0            R6_2.5.1                    mime_0.12                   ggprism_1.0.5               textshaping_0.4.0          
 [91] colorspace_2.1-1            scattermore_1.2             tensor_1.5                  spatstat.data_3.1-2         utf8_1.2.4                 
 [96] tidyr_1.3.1                 generics_0.1.3              fontLiberation_0.1.0        data.table_1.16.0           S4Arrays_1.2.1             
[101] httr_1.4.7                  htmlwidgets_1.6.4           uwot_0.2.2                  pkgconfig_2.0.3             gtable_0.3.5               
[106] lmtest_0.9-40               SingleCellExperiment_1.24.0 XVector_0.42.0              htmltools_0.5.8.1           fontBitstreamVera_0.1.1    
[111] dotCall64_1.1-1             rvg_0.3.4                   Biobase_2.62.0              scales_1.3.0                png_0.1-8                  
[116] snakecase_0.11.1            spatstat.univar_3.0-1       knitr_1.48                  rstudioapi_0.16.0           reshape2_1.4.4             
[121] uuid_1.2-1                  nlme_3.1-164                zoo_1.8-12                  GlobalOptions_0.1.2         flextable_0.9.6            
[126] stringr_1.5.1               KernSmooth_2.23-22          parallel_4.3.3              miniUI_0.1.1.1              vipor_0.4.7                
[131] ggrastr_1.0.2               pillar_1.9.0                grid_4.3.3                  vctrs_0.6.5                 RANN_2.6.2                 
[136] promises_1.3.0              xtable_1.8-4                cluster_2.1.6               paletteer_1.6.0             beeswarm_0.4.0             
[141] evaluate_1.0.0              cli_3.6.3                   compiler_4.3.3              crayon_1.5.3                rlang_1.1.4                
[146] future.apply_1.11.2         labeling_0.4.3              rematch2_2.1.2              forcats_1.0.0               plyr_1.8.9                 
[151] ggbeeswarm_0.7.2            stringi_1.8.4               viridisLite_0.4.2           deldir_2.0-4                munsell_0.5.1              
[156] lazyeval_0.2.2              spatstat.geom_3.3-3         fontquiver_0.2.1            Matrix_1.6-5                RcppHNSW_0.6.0             
[161] patchwork_1.3.0             sparseMatrixStats_1.14.0    future_1.34.0               devEMF_4.5                  stargazer_5.2.3            
[166] shiny_1.9.1                 SummarizedExperiment_1.32.0 ROCR_1.0-11                 igraph_2.0.3                broom_1.0.6                
[171] officer_0.6.6    
@sbamopoulos added the bug (Something isn't working) label on Dec 13, 2024