scDesign3Py.scDesign3.construct_data

scDesign3.construct_data(anndata: anndata.AnnData, corr_formula: str | list[str], assay_use: str | None = None, default_assay_name: str | None = None, celltype: str | None = None, pseudotime: str | list[str] | None = None, spatial: list[str] | None = None, other_covariates: str | list[str] | None = None, ncell: int = 'default', parallelization: Literal['mcmapply', 'bpmapply', 'pbmcmapply'] = 'default', bpparam: rpy2.robjects.methods.RS4 | None = 'default', return_py: bool = 'default') → rpy2.robjects.vectors.ListVector

Construct the input data

This function constructs the input data (covariate matrix and expression matrix) for @fit_marginal.

Details:

This function takes an anndata.AnnData object as input. Based on the user's choices, it constructs the covariate matrix (explanatory variables) and the expression matrix (e.g., the count matrix for scRNA-seq).
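The construction step can be pictured with a minimal, dependency-free sketch. It does not call scDesign3Py: `construct_inputs` and `as_list` are hypothetical names, and plain lists and dicts stand in for anndata.AnnData.X, anndata.AnnData.layers and anndata.AnnData.obs. Two behaviours are assumptions made for illustration: that @assay_use names an entry of anndata.AnnData.layers, and that the corr_group column holds a constant label for ‘1’ or ‘ind’ and the named obs variable otherwise. corr_formula is treated as a single string here.

```python
def as_list(value):
    """Normalize None, a single name, or a list of names to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def construct_inputs(X, layers, obs, assay_use=None, default_assay_name=None,
                     celltype=None, pseudotime=None, spatial=None,
                     other_covariates=None, corr_formula="1"):
    """Hypothetical sketch: pick the expression matrix and build covariate rows."""
    # Expression matrix (cells x genes). Assumption: assay_use names a layer.
    if assay_use is not None:
        assay_name, count_mat = assay_use, layers[assay_use]
    elif default_assay_name is not None:
        assay_name, count_mat = default_assay_name, X
    else:
        raise ValueError("give assay_use, or default_assay_name to name the data in X")

    # Covariate matrix: only the obs columns the user selected.
    columns = (as_list(celltype) + as_list(pseudotime)
               + as_list(spatial) + as_list(other_covariates))
    dat = [{col: obs[col][i] for col in columns} for i in range(len(count_mat))]

    # Assumed corr_group rule: constant label for "1"/"ind", else the named obs column.
    for i, row in enumerate(dat):
        if corr_formula in ("1", "ind"):
            row["corr_group"] = corr_formula
        else:
            row["corr_group"] = obs[corr_formula][i]
    return assay_name, count_mat, dat
```

With celltype="cell_type" and corr_formula="cell_type", each covariate row carries the cell type twice in this sketch: once as a covariate and once as its correlation group.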

Arguments:

anndata: anndata.AnnData: The anndata.AnnData object that stores the single-cell experiment information.
corr_formula: str or list[str]: Defines the groups used for the correlation structure. If ‘1’, a single correlation structure is estimated for all cells. If ‘ind’, no correlation is modeled (features are treated as independent). Otherwise, the named variable defines the groups, and each group gets its own correlation structure.
assay_use: str (default: None): The name of the assay to use. If None, specify a name for the data stored in anndata.AnnData.X through @default_assay_name.
default_assay_name: str (default: None): Used only when @assay_use is None. Assigns a name to the default assay, i.e. the data stored in anndata.AnnData.X.
celltype: str (default: None): The name of the cell type variable in anndata.AnnData.obs.
pseudotime: str or list[str] (default: None): The name(s) of the pseudotime variable(s) in anndata.AnnData.obs; pass a list when there are multiple lineages.
spatial: list[str] (default: None): The names of the spatial coordinate variables in anndata.AnnData.obs.
other_covariates: str or list[str] (default: None): Any other covariates to include in the data.
ncell: int (default: ‘default’): The number of cells to simulate. If ‘default’, only the cells provided in the anndata.AnnData object are used. If an integer is provided, the function uses a vine copula to simulate a new covariate matrix with that many cells.
parallelization: str (default: ‘default’): The parallelization function to use. If ‘bpmapply’, first call the method @get_bpparam. If ‘default’, the setting chosen at initialization is used.
bpparam: rpy2.robjects.methods.RS4 (default: ‘default’): If @parallelization is ‘bpmapply’, first call the function @get_bpparam to obtain this R object. If @parallelization is ‘mcmapply’ or ‘pbmcmapply’, it should be None. If ‘default’, the setting chosen at initialization is used.
return_py: bool (default: ‘default’): If True, the function returns a result that is easy to manipulate in Python. If ‘default’, the setting chosen at initialization is used.

Output:

A dict-like object with the following elements.

count_mat: pandas.DataFrame

The expression matrix.

Rows correspond to observations (cells) and columns correspond to genes.

dat: pandas.DataFrame

The original covariate matrix.

newCovariate: pandas.DataFrame

The simulated new covariate matrix. If @ncell is ‘default’, @newCovariate is essentially the same as @dat, except that it lacks the corr_group column.

filtered_gene: None or list[str]

The genes excluded from the marginal and copula fitting steps because they are expressed in fewer than two cells.
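The filtering rule behind @filtered_gene can be made concrete with a short sketch. `genes_to_filter` is a hypothetical helper, not part of scDesign3Py; it assumes a gene counts as expressed in a cell when its count is greater than zero, and that None is returned when no gene is filtered.

```python
def genes_to_filter(count_mat, gene_names, min_cells=2):
    """Hypothetical helper: genes detected (count > 0) in fewer than min_cells cells."""
    detected = [0] * len(gene_names)
    for row in count_mat:  # one row per cell, one column per gene
        for j, value in enumerate(row):
            if value > 0:
                detected[j] += 1
    flagged = [gene for gene, d in zip(gene_names, detected) if d < min_cells]
    # Assumption: None (rather than an empty list) means no gene was filtered.
    return flagged or None
```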