Perform likelihood ratio test

Introduction

In this section, we will show how to use perform_lrt function to perform likelihood ratio test to compare two dicts of marginal models.

To get detailed information of the input and output of the function, please check API.

Step 1: Import packages and Read in data

import pacakges

import anndata as ad
import scDesign3Py

Read in data

The raw data is from the scvelo and we only choose top 30 genes to save time.

data = ad.read_h5ad("data/PANCREAS.h5ad")
data = data[:, 0:30]
data
View of AnnData object with n_obs × n_vars = 2087 × 30
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'cell_type', 'sizeFactor', 'pseudotime'
    var: 'highly_variable_genes'
    obsm: 'X_pca', 'X_umap', 'X_x_pca', 'X_x_umap'

Step 2: Run the fit_marginal() method to create the marginal models

# create the instance
test1 = scDesign3Py.scDesign3(n_cores=3, parallelization="pbmcmapply", return_py=False)
test1.set_r_random_seed(123)

# construct data
test1.construct_data(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",
    pseudotime="pseudotime",
    corr_formula="cell_type",
)

# fit marginal
marginal1 = test1.fit_marginal(
    mu_formula="1",
    sigma_formula="1",
    family_use = "nb",
    usebam = False,
)
# create the instance
test2 = scDesign3Py.scDesign3(n_cores=3, parallelization="pbmcmapply", return_py=False)
test2.set_r_random_seed(123)

# construct data
test2.construct_data(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",
    pseudotime="pseudotime",
    corr_formula="pseudotime",
    ncell=10000,
)

# fit marginal
marginal2 = test1.fit_marginal(
    mu_formula="s(pseudotime, bs = 'cr', k = 10)",
    sigma_formula="1",
    family_use = "nb",
    usebam = False,
)

Step 3: Get the marginal models from the fit key

marg_test_1 = {key:value.rx2("fit") for key,value in marginal1.items()}
marg_test_2 = {key:value.rx2("fit") for key,value in marginal2.items()}

Step 4: Run the perform_lrt function to perform the likelihood ratio test

The first argument is the alternative hypothesis and the second is the null hypothesis.

The return value is a pandas.DataFrame object with each row corresponding to a gene (marginal) model LRT result.

scDesign3Py.perform_lrt(marg_test_2,marg_test_1).head()
same_model LogLik_alter LogLik_null df_alter df_null p_value
Pyy 1.0 -6335.504699 -7242.205596 9.341965 1.0 0.0
Iapp 1.0 -4260.230410 -5730.149416 9.802905 1.0 0.0
Chgb 1.0 -5764.120772 -6942.103030 9.883961 1.0 0.0
Rbp4 1.0 -5268.219696 -6522.623963 10.216497 1.0 0.0
Spp1 1.0 -3397.364727 -4865.083733 10.013325 1.0 0.0