Get the bpparam for bpmapply
Introduction
In this section, we show how to use the get_bpparam() function to get an R BiocParallel::MulticoreParam or BiocParallel::SnowParam object to use together with parallelization="bpmapply".
For detailed information on the function's input and output, please check the API reference.
For more information on the BiocParallel R package and how to set its parameters, please check its documentation.
Step 1: Import packages
import time
import anndata as ad
import scDesign3Py
Step 2: Call the get_bpparam function
For Linux/Mac users:
The available parallelization methods are mcmapply, pbmcmapply and bpmapply. If you use the bpmapply method, you should run this function, and you can choose either MulticoreParam or SnowParam mode.
For Windows users:
The only option is the bpmapply method, with this function run in SnowParam mode. The other methods do not allow setting more than 1 core.
bpparam = scDesign3Py.get_bpparam(mode="MulticoreParam", show=True, stop_on_error=False)
class: MulticoreParam
bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
bplog: FALSE; bpthreshold: INFO; bpstopOnError: FALSE
bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
bpexportglobals: TRUE; bpexportvariables: FALSE; bpforceGC: FALSE
bpfallback: TRUE
bplogdir: NA
bpresultdir: NA
cluster type: FORK
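The platform rules listed in Step 2 can be written down as a small pre-flight check before creating a scDesign3 instance. The sketch below is a hypothetical helper (check_parallel_setup is not part of scDesign3Py); it only encodes the rules stated above and uses Python's platform.system(), which returns "Windows", "Linux" or "Darwin" (macOS).

```python
import platform

# Parallelization methods and get_bpparam() modes named in this tutorial.
METHODS = ("mcmapply", "pbmcmapply", "bpmapply")
BPPARAM_MODES = ("MulticoreParam", "SnowParam")


def check_parallel_setup(method, n_cores, mode=None, system=None):
    """Hypothetical helper (not part of scDesign3Py).

    Validates a (method, n_cores, mode) choice against the platform rules
    described in Step 2. Returns True if allowed, otherwise raises ValueError.
    """
    system = system or platform.system()
    if method not in METHODS:
        raise ValueError(f"unknown parallelization method: {method!r}")
    if method == "bpmapply" and mode not in BPPARAM_MODES:
        # bpmapply needs a bpparam object created by get_bpparam(mode=...)
        raise ValueError("bpmapply needs mode='MulticoreParam' or 'SnowParam'")
    if system == "Windows":
        if n_cores > 1 and method != "bpmapply":
            raise ValueError("on Windows, only bpmapply can use more than 1 core")
        if method == "bpmapply" and mode != "SnowParam":
            raise ValueError("on Windows, bpmapply must use SnowParam mode")
    return True


# The configuration used in this tutorial on Linux/Mac:
print(check_parallel_setup("bpmapply", 3, mode="MulticoreParam", system="Linux"))
```

Passing system explicitly makes the check easy to try for another platform; leaving it out checks the machine the code runs on.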
Step 3: Read in data and run the scDesign3 methods
The raw data are from scVelo, and we only keep the first 30 genes to save time.
data = ad.read_h5ad("data/PANCREAS.h5ad")
data = data[:, 0:30]
data
View of AnnData object with n_obs × n_vars = 2087 × 30
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'cell_type', 'sizeFactor', 'pseudotime'
var: 'highly_variable_genes'
obsm: 'X_pca', 'X_umap', 'X_x_pca', 'X_x_umap'
Here we simply show the difference in run time when fitting the marginal models in MulticoreParam mode with 1 core versus 3 cores.
# create the instance
test1 = scDesign3Py.scDesign3(n_cores=1, parallelization="bpmapply", bpparam=bpparam, return_py=False)
test2 = scDesign3Py.scDesign3(n_cores=3, parallelization="bpmapply", bpparam=bpparam, return_py=False)
# construct data
test1.construct_data(
anndata=data,
default_assay_name="counts",
celltype="cell_type",
pseudotime="pseudotime",
corr_formula="1",
)
test2.construct_data(
anndata=data,
default_assay_name="counts",
celltype="cell_type",
pseudotime="pseudotime",
corr_formula="1",
)
Fit the marginal models using 1 core
start = time.time()
test1.fit_marginal(
mu_formula="s(pseudotime, k = 10, bs = 'cr')",
sigma_formula="s(pseudotime, k = 5, bs = 'cr')",
family_use="nb",
usebam=False,
)
end = time.time()
print("Total time cost when using 1 core is {:.2f} sec".format(end-start))
Total time cost when using 1 core is 195.70 sec
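The start/stop timing pattern above can be wrapped in a small context manager so that both runs are measured the same way. This is a sketch, not part of scDesign3Py: timed is a hypothetical name, and it uses time.perf_counter(), a monotonic clock intended for measuring elapsed intervals, instead of time.time().

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Hypothetical helper: time the with-block and print the result.

    Yields a dict whose "seconds" entry is filled in when the block exits.
    """
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print("Total time cost when {} is {:.2f} sec".format(label, result["seconds"]))


# Stand-in workload; in this tutorial the body would be test1.fit_marginal(...).
with timed("sleeping") as t:
    time.sleep(0.05)
```

With it, the 1-core run would read `with timed("using 1 core"): test1.fit_marginal(...)`, and the elapsed time stays available afterwards for comparing runs.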
Fit the marginal models using 3 cores
start = time.time()
test2.fit_marginal(
mu_formula="s(pseudotime, k = 10, bs = 'cr')",
sigma_formula="s(pseudotime, k = 5, bs = 'cr')",
family_use="nb",
usebam=False,
)
end = time.time()
print("Total time cost when using 3 cores is {:.2f} sec".format(end-start))
Total time cost when using 3 cores is 113.44 sec
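To compare the two timings, the speedup is the 1-core time divided by the 3-core time, and the parallel efficiency is the speedup divided by the number of cores. A minimal sketch using the numbers reported above (speedup_and_efficiency is a hypothetical helper):

```python
def speedup_and_efficiency(serial_sec, parallel_sec, n_cores):
    """Return (speedup, parallel efficiency) from two wall-clock timings."""
    speedup = serial_sec / parallel_sec
    return speedup, speedup / n_cores


s, e = speedup_and_efficiency(195.70, 113.44, 3)
print("speedup {:.2f}x, efficiency {:.0%}".format(s, e))
# prints "speedup 1.73x, efficiency 58%"
```

A speedup of about 1.73x with 3 cores is below the ideal 3x. That is likely because starting workers and moving data between them adds overhead, and per-gene fits do not all take the same time. Each timing also comes from a single run, so it carries some noise.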