scDesign3Py.scDesign3.fit_copula

scDesign3.fit_copula(input_data: Literal['count_mat', 'dat', 'newCovariate'] | pandas.DataFrame = 'dat', copula: Literal['gaussian', 'vine'] = 'gaussian', empirical_quantile: bool = False, marginal_dict: rpy2.robjects.vectors.ListVector | rpy2.rlike.container.OrdDict | dict = 'default', family_use: Literal['binomial', 'poisson', 'nb', 'zip', 'zinb', 'gaussian'] | list[str] = 'default', dt: bool = True, pseudo_obs: bool = False, epsilon: float = 1e-06, family_set: str | list[str] = ['gaussian', 'indep'], important_feature: Literal['all', 'auto'] | list[bool] = 'all', n_cores: int = 'default', parallelization: Literal['mcmapply', 'bpmapply', 'pbmcmapply'] = 'default', bpparam: rpy2.robjects.methods.RS4 | None = 'default', return_py: bool = 'default') → rpy2.robjects.vectors.ListVector

Fit the copula model

@fit_copula fits the copula model.

Details:

This function takes the result of @fit_marginal as input and fits the copula model on the residuals.

Arguments:

input_data: str or pandas.DataFrame (default: ‘dat’): One of the outputs of @construct_data. If the class property @construct_data_res is used directly, only the name needs to be specified. Alternatively, provide the corresponding DataFrame directly.
copula: str (default: ‘gaussian’): A string giving the copula choice. Must be one of ‘gaussian’ or ‘vine’. Note that the vine copula may model high-dimensional data better, but can be very slow when there are more than 1000 features.
empirical_quantile: bool (default: False): Use this only if you clearly understand the consequences. If True, the copula is NOT fitted and the EMPIRICAL CDF values of the original data are used instead, which makes the simulated data fixed (no randomness). This only works if ncell is the same as in your original data.
marginal_dict: rpy2.robjects.vectors.ListVector or rpy2.rlike.container.OrdDict or dict (default: ‘default’): The result of @fit_marginal. Default is ‘default’, which uses the class property @fit_marginal_res.
family_use: str or list[str] (default: ‘default’): A string or a list of strings giving the marginal distribution. Each must be one of ‘binomial’, ‘poisson’, ‘nb’, ‘zip’, ‘zinb’ or ‘gaussian’. Default is ‘default’, which uses the class property @family_use.
dt: bool (default: True): If True, perform the distributional transformation (DT) to make the discrete data continuous. This is useful for discrete distributions (e.g., Poisson, NB). Note that for continuous data (e.g., Gaussian), DT does not make sense and should be set to False.
pseudo_obs: bool (default: False): If True, use the empirical quantiles instead of the theoretical quantiles for fitting the copula.
epsilon: float (default: 1e-6): A numeric value that prevents the transformed quantiles from collapsing to 0 or 1.
family_set: str or list[str] (default: [‘gaussian’, ‘indep’]): A string or a list of strings giving the bivariate copula families.
important_feature: str or list[bool] (default: ‘all’): A string or list indicating whether each gene will be used in correlation estimation. If this is a string, it must be either “all” (use all genes) or “auto”, in which case genes are selected automatically based on the proportion of zero expression across cells for each gene. Genes with a zero proportion greater than 0.8 will be excluded from gene-gene correlation estimation. If this is a list, it should be a logical vector with length equal to the number of genes in @sce. True means the corresponding gene will be included in gene-gene correlation estimation, and False means it will be excluded.
n_cores: int (default: ‘default’): The number of cores to use. Default is ‘default’, which uses the setting given at initialization.
parallelization: str (default: ‘default’): The specific parallelization function to use. Must be one of ‘mcmapply’, ‘bpmapply’ or ‘pbmcmapply’. If ‘bpmapply’, first call the method @get_bpparam. Default is ‘default’, which uses the setting given at initialization.
bpparam: rpy2.robjects.methods.RS4 (default: ‘default’): If @parallelization is ‘bpmapply’, first call the function @get_bpparam to get the robject. If @parallelization is ‘mcmapply’ or ‘pbmcmapply’, it should be None. Default is ‘default’, which uses the setting given at initialization.
return_py: bool (default: ‘default’): If True, functions will return a result that is easy to manipulate in Python. Default is ‘default’, which uses the setting given at initialization.
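
To make the roles of dt and epsilon more concrete, the following is a minimal, generic sketch of a distributional transformation for Poisson counts, followed by clipping to [epsilon, 1 - epsilon]. It is an illustration of the idea only and is not taken from the library; the helper names poisson_cdf and distributional_transform, and the use of a single shared mean mu, are assumptions made for this example.

```python
import math
import random


def poisson_cdf(k, mu):
    """Return P(X <= k) for X ~ Poisson(mu); 0 for k < 0."""
    if k < 0:
        return 0.0
    pmf = math.exp(-mu)  # P(X = 0)
    cdf = pmf
    for i in range(1, k + 1):
        pmf *= mu / i  # recurrence: P(X = i) = P(X = i - 1) * mu / i
        cdf += pmf
    return min(cdf, 1.0)


def distributional_transform(counts, mu, epsilon=1e-6, rng=None):
    """Map discrete counts to continuous quantiles in [epsilon, 1 - epsilon].

    Hypothetical helper for illustration: each count x is mapped to a value
    drawn uniformly between F(x - 1) and F(x), then clipped away from 0 and 1.
    """
    rng = rng or random.Random()
    quantiles = []
    for x in counts:
        lower = poisson_cdf(x - 1, mu)
        upper = poisson_cdf(x, mu)
        u = lower + rng.random() * (upper - lower)
        # Clipping plays the role described for epsilon above.
        quantiles.append(min(max(u, epsilon), 1.0 - epsilon))
    return quantiles
```

Without the clipping step, a very large count would map to a quantile of essentially 1, which becomes infinite once it is pushed through a Gaussian quantile function; this is the collapse that epsilon guards against.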

Output:

A dict-like object.

model_aic: pandas.Series

An array with three values. In order, they are the marginal AIC, the copula AIC, the total AIC.

model_bic: pandas.Series

An array with three values. In order, they are the marginal BIC, the copula BIC, the total BIC.

important_feature: list[bool]

A vector showing which genes are regarded as important genes.
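
As a rough illustration of what such a mask looks like under the ‘auto’ rule described for the important_feature argument (genes with a zero proportion greater than 0.8 are excluded), here is a small pure-Python sketch. The function name auto_important_feature and the cells-by-genes nested-list layout are assumptions made for this example; the library's own implementation may differ.

```python
def auto_important_feature(counts, zero_prop_cutoff=0.8):
    """Return a list[bool] with one entry per gene.

    counts is assumed to be a list of rows (cells), each a list of
    per-gene counts. A gene is kept (True) unless its proportion of
    zero counts across cells is greater than zero_prop_cutoff.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    mask = []
    for g in range(n_genes):
        n_zero = sum(1 for row in counts if row[g] == 0)
        mask.append(n_zero / n_cells <= zero_prop_cutoff)
    return mask
```

A mask built this way could also be passed explicitly as the important_feature argument, since that argument accepts a list of booleans with one entry per gene.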

copula_list: rpy2.rlike.container.OrdDict

A dict of the fitted copula model. If using Gaussian copula, a dict of correlation matrices; if vine, a dict of vine objects.

Note that although its name contains “list”, it is actually a dict-like object.
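
The sketch below shows one way the call and its output might be used together. It assumes that model is a scDesign3 instance on which @construct_data and @fit_marginal have already been run, and that with return_py=True the result supports key access as a dict-like object. The wrapper name summarize_copula_fit and the keys of the returned summary are hypothetical and chosen only for this example.

```python
def summarize_copula_fit(model, copula="gaussian"):
    """Fit the copula and collect a few values from the result.

    model is assumed to be a scDesign3 instance with construct_data
    and fit_marginal already called; this wrapper is illustrative only.
    """
    res = model.fit_copula(
        copula=copula,             # 'gaussian' or 'vine'
        important_feature="auto",  # drop genes that are mostly zero
        return_py=True,            # ask for a Python-friendly result
    )
    # model_aic holds, in order, the marginal, copula and total AIC.
    marginal_aic, copula_aic, total_aic = (float(v) for v in res["model_aic"])
    return {
        "marginal_aic": marginal_aic,
        "copula_aic": copula_aic,
        "total_aic": total_aic,
        # copula_list is dict-like despite its name, so it has keys.
        "copula_keys": list(res["copula_list"].keys()),
    }
```

Arguments left out of the call, such as marginal_dict and n_cores, keep their ‘default’ values and therefore fall back to the class properties or the settings given at initialization.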