Get the bpparam for bpmapply

Introduction

In this section, we show how to use the get_bpparam() function to obtain an R BiocParallel::MulticoreParam or BiocParallel::SnowParam object, which is required when parallelization="bpmapply".

For details on the function's inputs and outputs, please check the API.

For more information on the BiocParallel R package and how to set its parameters, please check its documentation.

Step 1: Import packages

import time
import anndata as ad
import scDesign3Py

Step 2: Call get_bpparam function

  • For Linux/Mac users:

The available parallelization methods are mcmapply, pbmcmapply and bpmapply. If you use the bpmapply method, run this function and choose either the MulticoreParam or the SnowParam mode.

  • For Windows users:

The only option is the bpmapply method, with this function run in SnowParam mode. The other methods do not allow more than 1 core.

bpparam = scDesign3Py.get_bpparam(mode="MulticoreParam", show=True, stop_on_error=False)
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: FALSE
  bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
  bpexportglobals: TRUE; bpexportvariables: FALSE; bpforceGC: FALSE
  bpfallback: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK
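
The platform rule described in this step can be captured in a small helper. This is only a minimal sketch: pick_bpparam_mode is a hypothetical name, not part of scDesign3Py, and it relies solely on the standard library's platform.system().

```python
import platform


def pick_bpparam_mode(system=None):
    """Return a BiocParallel backend name suited to the operating system.

    Hypothetical helper for illustration; not part of scDesign3Py.
    """
    system = system or platform.system()
    # Windows users can only use SnowParam; Linux/Mac users may use either
    # mode, and this sketch picks MulticoreParam for them.
    return "SnowParam" if system == "Windows" else "MulticoreParam"


mode = pick_bpparam_mode()
print(mode)
```

The returned string could then be passed as the mode argument of scDesign3Py.get_bpparam.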

Step 3: Read in data and run the scDesign3 methods

The raw data is from scvelo; to save time, we keep only the first 30 genes.

data = ad.read_h5ad("data/PANCREAS.h5ad")
data = data[:, 0:30]
data
View of AnnData object with n_obs × n_vars = 2087 × 30
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'cell_type', 'sizeFactor', 'pseudotime'
    var: 'highly_variable_genes'
    obsm: 'X_pca', 'X_umap', 'X_x_pca', 'X_x_umap'

Here we simply show the difference in run time when fitting the marginal models with 1 core versus 3 cores, using the MulticoreParam object created in Step 2.

# create the instance
test1 = scDesign3Py.scDesign3(n_cores=1, parallelization="bpmapply", bpparam=bpparam, return_py=False)
test2 = scDesign3Py.scDesign3(n_cores=3, parallelization="bpmapply", bpparam=bpparam, return_py=False)

# construct data
test1.construct_data(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",
    pseudotime="pseudotime",
    corr_formula="1",
)
test2.construct_data(
    anndata=data,
    default_assay_name="counts",
    celltype="cell_type",
    pseudotime="pseudotime",
    corr_formula="1",
)

Fit marginal using 1 core

start = time.time()
test1.fit_marginal(
    mu_formula="s(pseudotime, k = 10, bs = 'cr')",
    sigma_formula="s(pseudotime, k = 5, bs = 'cr')",
    family_use="nb",
    usebam=False,
)
end = time.time()
print("Total time cost when using 1 core is {:.2f} sec".format(end-start))
Total time cost when using 1 core is 195.70 sec
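
The start/end timing pattern used here can be wrapped in a context manager so it does not have to be repeated for each run. The sketch below is an illustration only: timed is a hypothetical helper, and a trivial summation stands in for the fit_marginal call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print("Total time cost when using {} is {:.2f} sec".format(label, elapsed))


# Stand-in workload; in the tutorial this would be the fit_marginal call.
with timed("1 core"):
    total = sum(range(100000))
```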

Fit marginal using 3 cores

start = time.time()
test2.fit_marginal(
    mu_formula="s(pseudotime, k = 10, bs = 'cr')",
    sigma_formula="s(pseudotime, k = 5, bs = 'cr')",
    family_use="nb",
    usebam=False,
)
end = time.time()
print("Total time cost when using 3 cores is {:.2f} sec".format(end-start))
Total time cost when using 3 cores is 113.44 sec
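
To put the two timings in perspective, the speedup and parallel efficiency can be computed directly from the reported numbers. This is a small arithmetic sketch; the variable names are illustrative, and the timings are simply those printed above and will differ from machine to machine.

```python
# Wall-clock times reported above, in seconds.
t_one_core = 195.70
t_three_cores = 113.44
n_cores = 3

speedup = t_one_core / t_three_cores  # how many times faster the 3-core run is
efficiency = speedup / n_cores        # fraction of ideal linear scaling

print("Speedup: {:.2f}x".format(speedup))
print("Parallel efficiency: {:.0%}".format(efficiency))
```

With these numbers the speedup is roughly 1.7x, short of the ideal 3x.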