Quick Start

Introduction

The R package scDesign3 is a unified probabilistic framework that learns from real datasets to generate realistic in silico high-dimensional single-cell omics data of various cell states, including discrete cell types, continuous trajectories, and spatial locations. scDesign3Py is the Python interface to scDesign3.

As a quick start, we demonstrate how to use scDesign3Py to simulate an scRNA-seq dataset with one continuous developmental trajectory.

Step 1: Import packages and read in data

Import packages

import anndata as ad
import numpy as np
import scDesign3Py

Read in data

The raw data come from the scVelo package and describe pancreatic endocrinogenesis. We pre-selected the top 1000 highly variable genes and filtered out some cell types so that the data contain a single trajectory.

To save time, we use only the first 30 genes.

data = ad.read_h5ad("data/PANCREAS.h5ad")
data = data[:, 0:30]
data
View of AnnData object with n_obs × n_vars = 2087 × 30
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'cell_type', 'sizeFactor', 'pseudotime'
    var: 'highly_variable_genes'
    obsm: 'X_pca', 'X_umap', 'X_x_pca', 'X_x_umap'
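The text does not say exactly how the top 1000 highly variable genes were chosen. As a rough illustration only, the sketch below ranks genes by the variance of their log1p-transformed counts and keeps the top n. This plain variance criterion is an assumption, not necessarily the procedure used to prepare PANCREAS.h5ad, and the toy matrix is a hypothetical stand-in for `data.X`.

```python
import numpy as np


def top_variable_genes(counts: np.ndarray, n_top: int) -> np.ndarray:
    """Return column indices of the n_top genes with the largest log1p variance.

    Assumption: a simple variance ranking stands in for a dedicated HVG method.
    `counts` is a cells x genes matrix of raw counts.
    """
    log_counts = np.log1p(counts)
    gene_var = log_counts.var(axis=0)
    # argsort is ascending, so reverse it and keep the first n_top indices
    order = np.argsort(gene_var)[::-1]
    # Sort the kept indices so the original gene order is preserved
    return np.sort(order[:n_top])


# Hypothetical cells x genes count matrix (not the pancreas dataset)
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=[0.5, 5.0, 1.0, 20.0], size=(100, 4))
keep = top_variable_genes(toy_counts, n_top=2)
toy_subset = toy_counts[:, keep]
```

Sorting the selected indices keeps genes in their original column order, just as positional slicing such as `data[:, 0:30]` does.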

Step 2: scdesign3() performs all-in-one simulation

First, create an instance of the scDesign3 class, which provides the scdesign3() method. Basic settings, such as the number of cores and the parallelization backend, can be set when the instance is initialized.

Note

If you are a Windows user, please refer to the Get BPPARAM section to use more than one core for parallel computing.

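# Use 3 cores with the "pbmcmapply" parallelization backend
test = scDesign3Py.scDesign3(n_cores=3, parallelization="pbmcmapply")
# Fix R's random seed so the simulation is reproducible
test.set_r_random_seed(123)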

The scdesign3() function takes an anndata.AnnData object whose cell covariates (such as cell types, pseudotime, or spatial coordinates) are stored in anndata.AnnData.obs, and performs the all-in-one simulation.

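simu_res = test.scdesign3(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",  # obs column holding the cell types
    pseudotime="pseudotime",  # obs column holding the pseudotime
    mu_formula="s(pseudotime, k = 10, bs = 'cr')",  # mean: cubic regression spline of pseudotime
    sigma_formula="s(pseudotime, k = 5, bs = 'cr')",  # dispersion: spline with fewer basis functions
    family_use="nb",  # negative binomial marginal for each gene
    usebam=False,  # do not use mgcv's bam() for faster fitting
    corr_formula="1",  # one gene-gene correlation structure shared by all cells
    copula="gaussian",  # Gaussian copula for gene-gene dependence
)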
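Conceptually, with `family_use="nb"` and `copula="gaussian"`, each gene gets a negative binomial (NB) marginal whose mean depends on pseudotime, and a Gaussian copula ties the genes together. The sketch below illustrates only the sampling side of that idea in plain NumPy: it draws correlated standard normals, maps them to uniforms with the normal CDF, and maps those to counts with each gene's NB quantile function. The log-linear mean, the size parameters `theta`, and the correlation matrix are hypothetical stand-ins for what scDesign3 would estimate from the data (for example, via the `mu_formula` spline); this is not scDesign3's implementation.

```python
import math

import numpy as np


def nb_pmf(k: int, mu: float, theta: float) -> float:
    """NB pmf with mean mu and size theta (Var = mu + mu**2 / theta)."""
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def nb_quantile(u: float, mu: float, theta: float, max_k: int = 100_000) -> int:
    """Smallest count k whose NB CDF reaches u (inverse-CDF transform)."""
    k, cdf = 0, nb_pmf(0, mu, theta)
    while cdf < u and k < max_k:  # max_k guards against rounding when u is near 1
        k += 1
        cdf += nb_pmf(k, mu, theta)
    return k


def simulate_copula_nb(pseudotime, intercept, slope, theta, corr, seed=123):
    """Sample a cells x genes count matrix from a Gaussian copula with NB marginals.

    Hypothetical mean model: log(mu[i, j]) = intercept[j] + slope[j] * pseudotime[i].
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = len(pseudotime), len(intercept)
    # 1. Correlated standard normals across genes, one row per cell
    z = rng.multivariate_normal(np.zeros(n_genes), corr, size=n_cells)
    counts = np.zeros((n_cells, n_genes), dtype=int)
    for i in range(n_cells):
        for j in range(n_genes):
            # 2. The normal CDF turns z into a uniform that keeps the dependence
            u = 0.5 * (1.0 + math.erf(z[i, j] / math.sqrt(2.0)))
            # 3. The NB quantile turns the uniform into a count with the NB marginal
            mu = math.exp(intercept[j] + slope[j] * pseudotime[i])
            counts[i, j] = nb_quantile(u, mu, theta[j])
    return counts


pseudotime = np.linspace(0.0, 1.0, 200)
corr = np.array([[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]])
sim_counts = simulate_copula_nb(
    pseudotime,
    intercept=[0.5, 2.0, 1.0],
    slope=[2.0, -1.0, 0.0],
    theta=[2.0, 5.0, 10.0],
    corr=corr,
)
```

Separating the marginals (steps 2 and 3) from the dependence (step 1) is what lets a copula model reuse any per-gene distribution while still producing correlated genes.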

Note

Details of creating an instance of the scDesign3 class and using the scdesign3() function are shown in the tutorial section.

Step 3: Construct a new anndata.AnnData object from the simulated results

Besides constructing the simulated anndata.AnnData object, we also compute log-transformed (log1p) counts for both the simulated and the reference data, which are used for visualization.

simu_data = ad.AnnData(X=simu_res["new_count"], obs=simu_res["new_covariate"])
simu_data.layers["log_transformed"] = np.log1p(simu_data.X)
data.layers["log_transformed"] = np.log1p(data.X)
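Alongside the plots in Step 4, a quick numeric check can flag obvious mismatches between the reference and simulated data. The sketch below compares per-gene means and variances of the log1p-transformed counts. The toy matrices are hypothetical stand-ins for `data.X` and `simu_data.X`, assumed to be dense cells x genes arrays with the same genes in the same order; this check is an illustrative addition, not part of scDesign3Py.

```python
import numpy as np


def compare_log_moments(ref_counts: np.ndarray, sim_counts: np.ndarray) -> dict:
    """Per-gene absolute differences of log1p means and variances (reference vs. simulated)."""
    if ref_counts.shape[1] != sim_counts.shape[1]:
        raise ValueError("reference and simulated data must have the same genes")
    ref_log, sim_log = np.log1p(ref_counts), np.log1p(sim_counts)
    return {
        "mean_diff": np.abs(ref_log.mean(axis=0) - sim_log.mean(axis=0)),
        "var_diff": np.abs(ref_log.var(axis=0) - sim_log.var(axis=0)),
    }


# Hypothetical reference and simulated counts drawn from the same distribution
rng = np.random.default_rng(1)
ref = rng.poisson(lam=[1.0, 4.0, 10.0], size=(500, 3))
sim = rng.poisson(lam=[1.0, 4.0, 10.0], size=(500, 3))
diffs = compare_log_moments(ref, sim)
print(diffs["mean_diff"].round(3))
```

Large per-gene differences point to genes worth inspecting more closely in the reduced-dimension plots.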

Step 4: Visualization

Note

Details of the plotting process are shown in the tutorial section.

plot = scDesign3Py.plot_reduceddim(
    ref_anndata=data,
    anndata_list=simu_data,
    name_list=["Reference", "scDesign3"],
    assay_use="log_transformed",
    if_plot=True,
    color_by="pseudotime",
    n_pc=20,
    point_size=5,
)
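Judging from its arguments, plot_reduceddim() embeds the reference and simulated data in a shared low-dimensional space using `n_pc` principal components of the `log_transformed` layer. One common way to do this, assumed here without confirming that scDesign3Py does exactly the same, is to fit PCA on the reference only and project the simulated cells onto the reference loadings so that both datasets share the same axes. A minimal NumPy sketch of that projection (the UMAP step is omitted; the toy data are hypothetical):

```python
import numpy as np


def fit_reference_pca(ref_log: np.ndarray, n_pc: int):
    """Fit PCA on the reference (cells x genes) via SVD; return center and loadings."""
    center = ref_log.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(ref_log - center, full_matrices=False)
    return center, vt[:n_pc].T  # loadings: genes x n_pc


def project(log_counts: np.ndarray, center: np.ndarray, loadings: np.ndarray) -> np.ndarray:
    """Project cells onto the reference principal axes (cells x n_pc scores)."""
    return (log_counts - center) @ loadings


# Hypothetical log1p data with 30 genes, matching the subset used above
rng = np.random.default_rng(2)
ref_log = np.log1p(rng.poisson(lam=3.0, size=(300, 30)))
sim_log = np.log1p(rng.poisson(lam=3.0, size=(250, 30)))
center, loadings = fit_reference_pca(ref_log, n_pc=20)
ref_pcs = project(ref_log, center, loadings)
sim_pcs = project(sim_log, center, loadings)
```

Projecting rather than refitting keeps the simulated cells on the reference axes, so a shift between the two point clouds reflects a difference in the data rather than a change of basis.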

UMAP plot

plot["p_umap"]
[Figure: UMAP embeddings of the reference and scDesign3-simulated data, colored by pseudotime]

PCA plot

plot["p_pca"]
[Figure: PCA embeddings of the reference and scDesign3-simulated data, colored by pseudotime]