Perform likelihood ratio test

Introduction

In this section, we will show how to use perform_lrt function to perform likelihood ratio test to compare two dicts of marginal models.

To get detailed information of the input and output of the function, please check API.

Step 1: Import packages and Read in data

import pacakges

import anndata as ad
import scDesign3Py

Read in data

The raw data is from the scvelo and we only choose top 30 genes to save time.

data = ad.read_h5ad("data/PANCREAS.h5ad")
data = data[:, 0:30]
data

View of AnnData object with n_obs × n_vars = 2087 × 30
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'cell_type', 'sizeFactor', 'pseudotime'
    var: 'highly_variable_genes'
    obsm: 'X_pca', 'X_umap', 'X_x_pca', 'X_x_umap'

Step 2: Run the `fit_marginal()` method to create the marginal models

# create the instance
test1 = scDesign3Py.scDesign3(n_cores=3, parallelization="pbmcmapply", return_py=False)
test1.set_r_random_seed(123)

# construct data
test1.construct_data(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",
    pseudotime="pseudotime",
    corr_formula="cell_type",
)

# fit marginal
marginal1 = test1.fit_marginal(
    mu_formula="1",
    sigma_formula="1",
    family_use = "nb",
    usebam = False,
)

# create the instance
test2 = scDesign3Py.scDesign3(n_cores=3, parallelization="pbmcmapply", return_py=False)
test2.set_r_random_seed(123)

# construct data
test2.construct_data(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",
    pseudotime="pseudotime",
    corr_formula="pseudotime",
    ncell=10000,
)

# fit marginal
marginal2 = test1.fit_marginal(
    mu_formula="s(pseudotime, bs = 'cr', k = 10)",
    sigma_formula="1",
    family_use = "nb",
    usebam = False,
)

Step 3: Get the marginal models from the `fit` key

marg_test_1 = {key:value.rx2("fit") for key,value in marginal1.items()}
marg_test_2 = {key:value.rx2("fit") for key,value in marginal2.items()}

Step 4: Run the `perform_lrt` function to perform the likelihood ratio test

The first argument is the alternative hypothesis and the second is the null hypothesis.

The return value is a pandas.DataFrame object with each row corresponding to a gene (marginal) model LRT result.

scDesign3Py.perform_lrt(marg_test_2,marg_test_1).head()

	same_model	LogLik_alter	LogLik_null	df_alter	df_null	p_value
Pyy	1.0	-6335.504699	-7242.205596	9.341965	1.0	0.0
Iapp	1.0	-4260.230410	-5730.149416	9.802905	1.0	0.0
Chgb	1.0	-5764.120772	-6942.103030	9.883961	1.0	0.0
Rbp4	1.0	-5268.219696	-6522.623963	10.216497	1.0	0.0
Spp1	1.0	-3397.364727	-4865.083733	10.013325	1.0	0.0