Functionality 1: detect dubious metacells for a given metacell partition
Pan Liu
26 November 2024
mcRigor-1-detect-dubmc.Rmd
Introduction
In this tutorial, we show how to use mcRigor to detect dubious metacells in a given metacell partition. We demonstrate this functionality on a semi-synthetic single-cell RNA sequencing (scRNA-seq) dataset for which the ground-truth trustworthiness of the metacells is known.
Input preparation
Two main inputs are required for this functionality: (1) the raw scRNA-seq data and (2) a metacell partition generated by either an existing metacell partitioning method or an ad hoc approach. The raw scRNA-seq data must be provided as a Seurat object, obj_singlecell. The semi-synthetic scRNA-seq data, whose generation process is described in Liu and Li, 2024, is stored as an rds file, syn.rds, and is available with the mcRigor package as an example. We first load the data.
library(mcRigor)
library(Seurat)
# Locate and load the example Seurat object shipped with mcRigor
sc_dir = system.file('extdata', 'syn.rds', package = 'mcRigor')
obj_singlecell = readRDS(file = sc_dir)
obj_singlecell
#> An object of class Seurat
#> 2000 features across 13400 samples within 1 assay
#> Active assay: RNA (2000 features, 2000 variable features)
#> 3 layers present: counts, data, scale.data
#> 2 dimensional reductions calculated: pca, umap
The metacell partition should be provided as a data frame, cell_membership, that assigns single cells to metacells. Specifically, this data frame has a single column, and each row represents one single cell. The metacell partitions of the semi-synthetic scRNA-seq data generated by the SEACells method (Persad et al., 2023) are stored in a csv file, seacells_cell_membership_rna_syn.csv, which is available with the mcRigor package as an example. This csv file contains a series of metacell partitions generated at different granularity levels. Note that the granularity level is a key parameter for metacell partitioning and is defined as the ratio of the number of single cells to the number of metacells. In this tutorial, we focus on the metacell partition at granularity level 50.
membership_dir = system.file('extdata', 'seacells_cell_membership_rna_syn.csv', package = 'mcRigor')
cell_membership_all <- read.csv(file = membership_dir, check.names = FALSE, row.names = 1)
cell_membership <- cell_membership_all['50']
head(cell_membership)
#> 50
#> 1_Cell1 mc50-allcells-SEACell-98
#> 2_Cell1 mc50-allcells-SEACell-98
#> 3_Cell1 mc50-allcells-SEACell-98
#> 4_Cell1 mc50-allcells-SEACell-98
#> 5_Cell1 mc50-allcells-SEACell-98
#> 6_Cell1 mc50-allcells-SEACell-98
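The expected input format and the granularity definition above can be illustrated with a small base R sketch. It builds a toy one-column membership data frame from a named vector, as an ad hoc partitioning approach might produce, and recomputes the granularity level as the number of cells divided by the number of metacells. The cell names, metacell names, the column name Metacell, and the helper granularity_level are all hypothetical and are not part of mcRigor.

```r
# Toy partition (hypothetical names): 12 cells assigned to 3 metacells.
assignment <- c(rep("mc-0", 5), rep("mc-1", 4), rep("mc-2", 3))
names(assignment) <- paste0("Cell", seq_along(assignment))

# One-column data frame: one row per single cell, row names are the cell names.
toy_membership <- data.frame(
  Metacell = assignment,
  row.names = names(assignment),
  stringsAsFactors = FALSE
)

# Hypothetical helper: granularity level = number of cells / number of metacells.
granularity_level <- function(membership) {
  stopifnot(is.data.frame(membership), ncol(membership) == 1)
  nrow(membership) / length(unique(membership[[1]]))
}

toy_granularity <- granularity_level(toy_membership)  # 12 cells / 3 metacells
toy_sizes <- table(toy_membership[[1]])               # number of cells per metacell
```

On the tutorial data the same arithmetic gives 13400 / 268 = 50, using the 13400 cells in obj_singlecell and the 268 metacells (57 + 211) tabulated in the detection results later in this tutorial.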
Detection of dubious metacells
We call the function mcRigor_DETECT to detect dubious metacells in the metacell partition represented by cell_membership.
detect_res = mcRigor_DETECT(obj_singlecell = obj_singlecell, cell_membership = cell_membership)
The Seurat object of metacells is stored in the obj_metacell field of the output detect_res. The mcRigor detection results are recorded in the mc_res field of detect_res, as well as in the metadata of the metacell Seurat object under the name mcRigor.
table(detect_res$mc_res)
#>
#> dubious trustworthy
#> 57 211
obj_metacell = detect_res$obj_metacell
head(obj_metacell$mcRigor)
#> mc50-allcells-SEACell-0 mc50-allcells-SEACell-1 mc50-allcells-SEACell-10
#> "trustworthy" "dubious" "dubious"
#> mc50-allcells-SEACell-100 mc50-allcells-SEACell-101 mc50-allcells-SEACell-102
#> "trustworthy" "trustworthy" "trustworthy"
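Since the detection result is a label per metacell, a common follow-up is to summarize the labels and carry them back to the single cells. The base R sketch below does this on toy stand-ins. The names mc_status and toy_membership and all cell and metacell names are hypothetical, and the sketch assumes the per-metacell labels are a character vector named by metacell, as the head() output above suggests.

```r
# Toy stand-ins (hypothetical names): per-metacell labels and a one-column membership.
mc_status <- c("mc-0" = "trustworthy", "mc-1" = "dubious", "mc-2" = "trustworthy")
toy_membership <- data.frame(
  Metacell = c(rep("mc-0", 5), rep("mc-1", 4), rep("mc-2", 3)),
  row.names = paste0("Cell", 1:12),
  stringsAsFactors = FALSE
)

# Share of metacells falling in each category.
status_share <- prop.table(table(mc_status))

# Look up each cell's metacell label by indexing the named vector with metacell names.
cell_status <- unname(mc_status[toy_membership[[1]]])
names(cell_status) <- rownames(toy_membership)

# Cells that belong to a dubious metacell.
cells_in_dubious <- names(cell_status)[cell_status == "dubious"]
```

With the tutorial objects, and under the same assumption, the analogous lookup would index obj_metacell$mcRigor by cell_membership[[1]]. The table above corresponds to 57 / 268, or about 21%, of metacells being flagged as dubious.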
Visualization
The function mcRigor_projection draws the metacells projected onto the two-dimensional embedding space of the single cells and marks the detected dubious metacells.
sc_membership = obj_metacell@misc$cell_membership$Metacell
names(sc_membership) = rownames(obj_metacell@misc$cell_membership)
plot = mcRigor_projection(obj_singlecell = obj_singlecell, sc_membership = sc_membership,
                          color_field = 'celltype.l1',
                          dub_mc_test.label = TRUE, test_stats = detect_res$TabMC, Thre = detect_res$thre)
plot
The dubious metacells are marked by red circles, while the trustworthy metacells are marked by black circles.
Session information
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_3.5.1 Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4
#> [5] mcRigor_1.0 BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 jsonlite_1.8.9 magrittr_2.0.3
#> [4] spatstat.utils_3.1-1 farver_2.1.2 rmarkdown_2.29
#> [7] fs_1.6.5 ragg_1.3.3 vctrs_0.6.5
#> [10] ROCR_1.0-11 spatstat.explore_3.3-3 htmltools_0.5.8.1
#> [13] sass_0.4.9 sctransform_0.4.1 parallelly_1.39.0
#> [16] KernSmooth_2.23-24 bslib_0.8.0 htmlwidgets_1.6.4
#> [19] desc_1.4.3 ica_1.0-3 plyr_1.8.9
#> [22] plotly_4.10.4 zoo_1.8-12 cachem_1.1.0
#> [25] igraph_2.1.1 mime_0.12 lifecycle_1.0.4
#> [28] pkgconfig_2.0.3 Matrix_1.7-1 R6_2.5.1
#> [31] fastmap_1.2.0 fitdistrplus_1.2-1 future_1.34.0
#> [34] shiny_1.9.1 digest_0.6.37 colorspace_2.1-1
#> [37] patchwork_1.3.0 tensor_1.5 RSpectra_0.16-2
#> [40] irlba_2.3.5.1 textshaping_0.4.0 labeling_0.4.3
#> [43] progressr_0.15.1 fansi_1.0.6 spatstat.sparse_3.1-0
#> [46] httr_1.4.7 polyclip_1.10-7 abind_1.4-8
#> [49] compiler_4.4.2 withr_3.0.2 fastDummies_1.7.4
#> [52] MASS_7.3-61 tools_4.4.2 lmtest_0.9-40
#> [55] httpuv_1.6.15 future.apply_1.11.3 goftest_1.2-3
#> [58] glue_1.8.0 nlme_3.1-166 promises_1.3.0
#> [61] grid_4.4.2 Rtsne_0.17 cluster_2.1.6
#> [64] reshape2_1.4.4 generics_0.1.3 gtable_0.3.6
#> [67] spatstat.data_3.1-4 tidyr_1.3.1 data.table_1.16.2
#> [70] utf8_1.2.4 spatstat.geom_3.3-4 RcppAnnoy_0.0.22
#> [73] ggrepel_0.9.6 RANN_2.6.2 pillar_1.9.0
#> [76] stringr_1.5.1 spam_2.11-0 RcppHNSW_0.6.0
#> [79] later_1.3.2 splines_4.4.2 dplyr_1.1.4
#> [82] lattice_0.22-6 survival_3.7-0 deldir_2.0-4
#> [85] tidyselect_1.2.1 miniUI_0.1.1.1 pbapply_1.7-2
#> [88] knitr_1.49 gridExtra_2.3 bookdown_0.41
#> [91] scattermore_1.2 xfun_0.49 matrixStats_1.4.1
#> [94] stringi_1.8.4 lazyeval_0.2.2 yaml_2.3.10
#> [97] evaluate_1.0.1 codetools_0.2-20 tibble_3.2.1
#> [100] BiocManager_1.30.25 cli_3.6.3 uwot_0.2.2
#> [103] xtable_1.8-4 reticulate_1.40.0 systemfonts_1.1.0
#> [106] munsell_0.5.1 jquerylib_0.1.4 Rcpp_1.0.13-1
#> [109] globals_0.16.3 spatstat.random_3.3-2 png_0.1-8
#> [112] spatstat.univar_3.1-1 parallel_4.4.2 pkgdown_2.1.1
#> [115] dotCall64_1.2 listenv_0.9.1 viridisLite_0.4.2
#> [118] scales_1.3.0 ggridges_0.5.6 leiden_0.4.3.1
#> [121] purrr_1.0.2 rlang_1.1.4 cowplot_1.1.3