scDesign3Py.scDesign3.simu_new

scDesign3.simu_new(mean_mat: numpy.ndarray | pandas.DataFrame = 'mean_mat', sigma_mat: numpy.ndarray | pandas.DataFrame = 'sigma_mat', zero_mat: numpy.ndarray | pandas.DataFrame = 'zero_mat', quantile_mat: numpy.ndarray | pandas.DataFrame | None = None, copula_dict: rpy2.robjects.vectors.ListVector | rpy2.rlike.container.OrdDict | dict | None = 'default', input_data: pandas.DataFrame = 'dat', new_covariate: pandas.DataFrame = 'newCovariate', family_use: Literal['binomial', 'poisson', 'nb', 'zip', 'zinb', 'gaussian'] | list[str] = 'default', important_feature: Literal['all', 'auto'] | list[bool] | rpy2.robjects.BoolVector = 'all', fastmvn: bool = False, nonnegative: bool = True, nonzerovar: bool = False, filtered_gene: list[str] | None = 'default', n_cores: int = 'default', parallelization: Literal['mcmapply', 'bpmapply', 'pbmcmapply'] = 'default', bpparam: rpy2.robjects.methods.RS4 | None = 'default', return_py: bool = 'default') → rpy2.robjects.vectors.FloatMatrix

Simulate new data

Generate new simulated data based on fitted marginal and copula models.

Details:

The function takes the new covariate (if used) from @construct_data, the parameter matrices from @extract_para, and the multivariate uniform quantiles from @fit_copula.
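The idea of combining per-gene marginal parameters with correlated uniform quantiles can be sketched in a few lines of NumPy. This is only an illustration of the general Gaussian-copula approach, not the library's implementation: it does not call scDesign3Py, it assumes a Poisson marginal for every gene for simplicity, and the helper names `normal_cdf` and `poisson_ppf`, the toy correlation matrix, and the cells-by-genes layout are all assumptions made for the example.

```python
import math
import numpy as np


def normal_cdf(z):
    """Standard normal CDF applied elementwise (hypothetical helper)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def poisson_ppf(u, mu, max_k=10_000):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(mu) (hypothetical helper)."""
    u = min(u, 1.0 - 1e-12)  # guard against u == 1, which would never be reached
    k, pmf = 0, math.exp(-mu)
    cdf = pmf
    while cdf < u and k < max_k:
        k += 1
        pmf *= mu / k  # Poisson recurrence: P(k) = P(k-1) * mu / k
        cdf += pmf
    return k


rng = np.random.default_rng(0)
n_cells, n_genes = 6, 3

# Toy stand-in for a mean parameter matrix (cells x genes).
mean_mat = np.array([[2.0, 5.0, 0.5]] * n_cells)

# Toy gene-gene correlation: genes 0 and 1 are correlated, gene 2 is independent.
corr = np.array([
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Step 1: sample correlated Gaussians, then map them to uniform quantiles.
z = rng.multivariate_normal(np.zeros(n_genes), corr, size=n_cells)
quantile_mat = normal_cdf(z)

# Step 2: push each quantile through the inverse CDF of its marginal.
sim_counts = np.array([
    [poisson_ppf(quantile_mat[i, j], mean_mat[i, j]) for j in range(n_genes)]
    for i in range(n_cells)
])
print(sim_counts)
```

In this sketch, `quantile_mat` plays the role of the multivariate quantile matrix and `mean_mat` the role of the mean parameter matrix; other families would also need the sigma and zero-inflation parameters.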

Arguments:

mean_mat: numpy.ndarray or pandas.DataFrame (default: ‘mean_mat’): A matrix of the mean parameter. Default is ‘mean_mat’, which uses the mean_mat output of @model_paras.
sigma_mat: numpy.ndarray or pandas.DataFrame (default: ‘sigma_mat’): A matrix of the sigma parameter. Default is ‘sigma_mat’, which uses the sigma_mat output of @model_paras.
zero_mat: numpy.ndarray or pandas.DataFrame (default: ‘zero_mat’): A matrix of the zero-inflation parameter. Default is ‘zero_mat’, which uses the zero_mat output of @model_paras.
quantile_mat: numpy.ndarray or pandas.DataFrame (default: None): A matrix of multivariate quantiles. May be left as None (the default) when @copula_dict is provided.
copula_dict: rpy2.robjects.vectors.ListVector or rpy2.rlike.container.OrdDict or dict (default: ‘default’): Copulas for generating the multivariate quantile matrix. Default is ‘default’, which uses the copula_list output of @fit_copula_res.
input_data: str or pandas.DataFrame (default: ‘dat’): An input count matrix. Default is ‘dat’, which uses the ‘dat’ output of @construct_data_res.
new_covariate: str or pandas.DataFrame (default: ‘newCovariate’): A dataframe containing the covariates of the targeted simulated data from @construct_data. Default is ‘newCovariate’, which uses the ‘newCovariate’ output of @construct_data_res.
family_use: str or list[str] (default: ‘default’): A string or a list of strings specifying the marginal distribution. Each must be one of ‘binomial’, ‘poisson’, ‘nb’, ‘zip’, ‘zinb’ or ‘gaussian’. Default is ‘default’, which uses the class property @family_use.
important_feature: str or list[bool] or rpy2.robjects.vectors.BoolVector (default: ‘all’): A string or list indicating whether each gene is used in correlation estimation. If a string, it must be either “all” (use all genes) or “auto”, in which case genes are selected automatically based on the proportion of zero expression across cells for each gene: genes with a zero proportion greater than 0.8 are excluded from gene-gene correlation estimation. If a list, it should be a logical vector with length equal to the number of genes in @sce. True means the corresponding gene is included in gene-gene correlation estimation and False means it is excluded.
fastmvn: bool (default: False): If True, the sampling of the multivariate Gaussian is done by the R package mvnfast, otherwise by the R package mvtnorm.
nonnegative: bool (default: True): If True, values < 0 in the synthetic data are converted to 0. Default is True, since an expression matrix is nonnegative.
nonzerovar: bool (default: False): If True, for any gene with zero variance, the value in one cell is replaced with 1. This is designed to avoid potential errors in downstream steps, for example, PCA.
filtered_gene: None or list[str] (default: ‘default’): None or a list of genes that were excluded from the marginal and copula fitting steps because they are expressed in fewer than two cells. Default is ‘default’, which uses the ‘filtered_gene’ output of @construct_data_res.
n_cores: int (default: ‘default’): The number of cores to use. Default is ‘default’, which uses the setting given at initialization.
parallelization: str (default: ‘default’): The specific parallelization function to use. If ‘bpmapply’, first call method @get_bpparam. Default is ‘default’, which uses the setting given at initialization.
bpparam: rpy2.robjects.methods.RS4 (default: ‘default’): If @parallelization is ‘bpmapply’, first call function @get_bpparam to get the robject. If @parallelization is ‘mcmapply’ or ‘pbmcmapply’, it should be None. Default is ‘default’, which uses the setting given at initialization.
return_py: bool (default: ‘default’): If True, the function returns a result that is easy to manipulate in Python. Default is ‘default’, which uses the setting given at initialization.
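The rules described for important_feature=“auto”, nonnegative and nonzerovar can be illustrated with a small NumPy sketch. It is not the library's code: the function names are hypothetical, the matrices are toy data laid out as cells by genes, and picking the replaced cell at random for nonzerovar is an assumption (the description above only says that a cell is replaced).

```python
import numpy as np


def auto_important_feature(counts, zero_prop_cutoff=0.8):
    """Mask over genes (columns): False when the zero proportion exceeds the cutoff."""
    zero_prop = (counts == 0).mean(axis=0)
    return zero_prop <= zero_prop_cutoff


def clip_nonnegative(sim):
    """Mimic nonnegative=True: convert values below 0 to 0."""
    return np.where(sim < 0, 0, sim)


def fix_zero_variance(sim, rng):
    """Mimic nonzerovar=True: set one cell to 1 for each zero-variance gene.

    Assumption: the cell is chosen at random.
    """
    sim = sim.copy()
    for j in np.flatnonzero(sim.var(axis=0) == 0):
        sim[rng.integers(sim.shape[0]), j] = 1
    return sim


# Toy count matrix: 5 cells (rows) x 3 genes (columns).
counts = np.array([
    [3, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [2, 0, 7],
    [4, 0, 2],
])

# Zero proportions are 0.2, 1.0 and 0.6, so only the second gene is excluded.
mask = auto_important_feature(counts)

# Negative values (possible with a Gaussian marginal) are set to 0.
clipped = clip_nonnegative(np.array([[-0.5, 1.2], [2.0, -3.0]]))

# The all-zero second gene gets a single 1 so its variance is no longer zero.
fixed = fix_zero_variance(counts, np.random.default_rng(0))
```

Here `mask` has the same form as a user-supplied list[bool] for important_feature: one entry per gene, with True meaning the gene is included in correlation estimation.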

Output:

pandas.DataFrame

The new simulated count (expression) matrix.

Rows correspond to observations and columns correspond to genes.