Plot for visualization

Introduction

In this section, we will show how to use function plot_reduceddim() to realize dimension reduction and automatically plot the result.

To get detailed information of the input and output of the function, please check API.

Step 1: Import packages and Read in data

import packages

import anndata as ad
import numpy as np
import pandas as pd
import scDesign3Py

Read in data

The raw data is from the scvelo and we only choose top 30 genes to save time.

data = ad.read_h5ad("data/PANCREAS.h5ad")
data = data[:, 0:30]
data
View of AnnData object with n_obs × n_vars = 2087 × 30
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'cell_type', 'sizeFactor', 'pseudotime'
    var: 'highly_variable_genes'
    obsm: 'X_pca', 'X_umap', 'X_x_pca', 'X_x_umap'

Step 2: Use scDesign3 class to simulate reads for comparison

# create the instance and set the parallel method
test = scDesign3Py.scDesign3(n_cores=3, parallelization="pbmcmapply", return_py=True)
test.set_r_random_seed(123)

# all-in-one simulation
simu_res = test.scdesign3(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",
    pseudotime="pseudotime",
    mu_formula="s(pseudotime, k = 10, bs = 'cr')",
    sigma_formula="s(pseudotime, k = 5, bs = 'cr')",
    family_use="nb",
    usebam=False,
    corr_formula="1",
    copula="gaussian",
)

Step 3: Use the simulated counts to construct the new anndata.AnnData object

Besides constructing the simulated anndata.AnnData object, we can also calculate the log transformed data for visualization.

simu_data = ad.AnnData(X=simu_res["new_count"], obs=simu_res["new_covariate"])
simu_data.layers["log_transformed"] = np.log1p(simu_data.X)
data.layers["log_transformed"] = np.log1p(data.X)

Step 4: Visualization

Please make sure all anndata.AnnData objects provided in ref_anndata and anndata_list has similar assay name structure.

For example, all involved count matrix should all be stored either in anndata.AnnData.X or anndata.AnnData.layers[assay_use] and all share the same name.

plot = scDesign3Py.plot_reduceddim(
    ref_anndata=data,
    anndata_list=simu_data,
    name_list=["Reference", "scDesign3"],
    assay_use="log_transformed",
    if_plot=True,
    color_by="pseudotime",
    n_pc=20,
    point_size=5,
)

If you set if_plot = True, then the return result is a dict with two keys: p_umap and p_pca. Each key hosts a plot plotted by matplotlib.

UMAP plot

plot["p_umap"]
../_images/fa812b02861e835f82b34fb36698c38a9fe25cc4fa36ae0840e1ebfdcbe1c0f0.png

PCA plot

plot["p_pca"]
../_images/383afb649dcb62052ec6073a05d7d56fa1b1d210cfe791efc31db794195284c2.png

Alternative to plot

If you are not satisfied with the automatically generated plot result, you can set if_plot = False. Then you will get the dimension reduction result and customize your manipulation.

res = scDesign3Py.plot_reduceddim(
    ref_anndata=data,
    anndata_list=simu_data,
    name_list=["Reference", "scDesign3"],
    assay_use="log_transformed",
    if_plot=False,
    color_by="pseudotime",
    n_pc=20,
    point_size=5,
)

Each row of the output represents a cell (observation). The return PCs number is equal to the set in n_pc.

res[["Method","pseudotime","PC1","PC2","UMAP1","UMAP2"]].head()
Method pseudotime PC1 PC2 UMAP1 UMAP2
1 Reference 0.677236 2.303983 2.020007 0.683344 -3.041574
2 Reference 0.431569 -2.346135 1.611974 3.921517 1.959380
3 Reference 0.749373 3.688469 0.252404 -0.697866 -4.915706
4 Reference 0.925441 6.045197 -2.888025 -3.680566 -5.195340
5 Reference 0.380345 -3.650479 1.347464 2.639666 2.250414