qolmat.imputations.em_sampler.MultiNormalEM

class qolmat.imputations.em_sampler.MultiNormalEM(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, n_samples: int = 10, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False)

Multinormal EM imputer.

Imputes missing values with a multivariate Gaussian model fitted by EM optimization, using a projected Ornstein-Uhlenbeck process for sampling.
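As a rough illustration of the sampling idea, the sketch below runs a discretized Ornstein-Uhlenbeck (Langevin) update on the missing entries of a matrix under a fixed Gaussian model. It is not the library's implementation: the function name ou_sample, the absence of any projection, and the exact update rule are assumptions made for illustration. Only dt, n_iter_ou and ampli mirror the documented parameters.

```python
import numpy as np


def ou_sample(X, mask_na, means, cov_inv, dt=0.02, n_iter_ou=50, ampli=1.0, seed=0):
    """Hypothetical sketch: Langevin updates restricted to missing entries."""
    rng = np.random.default_rng(seed)
    X_init = X.copy()
    X_cur = X.copy()
    for _ in range(n_iter_ou):
        # Gradient of the Gaussian log-likelihood with respect to X.
        grad = -(X_cur - means) @ cov_inv
        noise = ampli * rng.normal(size=X_cur.shape)
        # Drift towards high likelihood plus scaled Gaussian noise.
        X_cur = X_cur + dt * grad + np.sqrt(2 * dt) * noise
        # Observed entries are reset so that only missing values move.
        X_cur[~mask_na] = X_init[~mask_na]
    return X_cur


means = np.array([0.0, 1.0])
cov_inv = np.linalg.inv(np.array([[1.0, 0.5], [0.5, 1.0]]))
# Missing entries (True in mask_na) hold an arbitrary starting value.
X = np.array([[0.2, 0.0], [1.0, 1.5], [0.0, 0.7]])
mask_na = np.array([[False, True], [False, False], [True, False]])
X_imputed = ou_sample(X, mask_na, means, cov_inv)
```

With ampli set to 0 the noise vanishes and the missing entries drift towards their conditional mean, which matches the documented reading of ampli as a switch between sampling and likelihood maximisation.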

Parameters
method : Literal["mle", "sample"]

Imputation method, either "sample" or "mle". By default, "sample".

max_iter_em : int, optional

Maximum number of steps in the EM algorithm. By default, 200.

n_iter_ou : int, optional

Number of iterations for the Gibbs sampling method (+ noise addition), necessary for convergence, by default 50.

n_samples : int, optional

Number of data samples used to estimate the parameters of the distribution. By default, 10.

ampli : float, optional

Whether to sample the posterior (1) or to maximise likelihood (0), by default 1.

random_state : int or RandomState, optional

Seed of the pseudo-random number generator, for reproducibility. By default, None.

dt : float, optional

Integration time step of the process. A large value increases the sample bias and can make the algorithm unstable, but compensates for a smaller n_iter_ou. By default, 0.02.

tolerance : float, optional

Threshold below which an L-infinity norm difference indicates the convergence of the parameters. By default, 1e-4.

stagnation_threshold : float, optional

Threshold below which an L-infinity norm difference indicates the convergence of the parameters. By default, 0.005.

stagnation_loglik : float, optional

Threshold below which an absolute difference of the log-likelihood indicates the convergence of the parameters. By default, 2.

period : int, optional

Integer used to fold the temporal data periodically. By default, 1.

verbose : bool, optional

Verbosity flag. If False, warnings are silenced. By default, False.
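The period parameter is described only as folding the temporal data periodically. One plausible reading, sketched below under that assumption, is that consecutive groups of period rows are laid side by side so that each folded row holds one full period. The helper fold_periodic is hypothetical and is not part of the documented API.

```python
import numpy as np


def fold_periodic(X, period):
    """Hypothetical helper: stack each block of `period` rows into one row."""
    n_rows, n_cols = X.shape
    n_pad = (-n_rows) % period
    # Pad with NaN so that the number of rows is a multiple of the period.
    X_padded = np.vstack([X, np.full((n_pad, n_cols), np.nan)])
    return X_padded.reshape(-1, period * n_cols)


X = np.arange(10, dtype=float).reshape(5, 2)
X_folded = fold_periodic(X, period=2)
print(X_folded.shape)  # (3, 4)
```

With period=1 the function returns the matrix unchanged, which is consistent with 1 being the documented default.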

__init__(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, n_samples: int = 10, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False) -> None

combine_parameters()

Combine all statistics computed for each sample in the update step.

It uses the MANOVA formula.
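A minimal sketch of what such a combination could look like, assuming the MANOVA formula here means the law of total covariance: the pooled covariance is the average within-sample covariance plus the covariance of the sample means. The function combine and its arguments are illustrative names, not the library's internals.

```python
import numpy as np


def combine(list_means, list_cov):
    """Hypothetical sketch: pool per-sample means and covariances."""
    means_stack = np.stack(list_means)
    means = means_stack.mean(axis=0)
    cov_within = np.mean(list_cov, axis=0)
    # Population covariance of the per-sample means (between-sample term).
    centered = means_stack - means
    cov_between = centered.T @ centered / len(list_means)
    return means, cov_within + cov_between


rng = np.random.default_rng(0)
samples = [rng.normal(size=(100, 3)) for _ in range(4)]
list_means = [S.mean(axis=0) for S in samples]
list_cov = [np.cov(S, rowvar=False, bias=True) for S in samples]
means, cov = combine(list_means, list_cov)
```

When all samples have the same size, the result equals the mean and the biased covariance of the concatenated samples.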

fit_parameters_with_missingness(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]])

Compute a first estimate of the model parameters.

The estimate is based on the data with missing values.

Parameters
X : NDArray

Data matrix with missing values
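How this first estimate is computed is not specified on this page. As one possible approach, the sketch below estimates the means from the observed values of each column and the covariances from pairwise-complete observations. The function pairwise_mean_cov is a hypothetical name used for illustration only.

```python
import numpy as np


def pairwise_mean_cov(X):
    """Hypothetical sketch: mean and covariance from available entries only."""
    observed = ~np.isnan(X)
    means = np.nanmean(X, axis=0)
    # Centre the data and zero out missing entries so they add nothing.
    Xc = np.where(observed, X - means, 0.0)
    # counts[i, j] = number of rows where columns i and j are both observed.
    counts = observed.astype(float).T @ observed.astype(float)
    cov = (Xc.T @ Xc) / counts
    return means, cov


X = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan], [5.0, 8.0]])
means, cov = pairwise_mean_cov(X)
```

A covariance built this way is symmetric but not guaranteed to be positive semi-definite, and it is undefined for pairs of columns that are never observed together.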

get_gamma(n_cols: int) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]

Get gamma.

If the covariance matrix is not full-rank, this defines the projection matrix that keeps the sampling process in the relevant subspace.

Parameters
n_cols : int

Number of variables in the data matrix

Returns
NDArray

Gamma matrix
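To illustrate the idea of a projection onto the relevant subspace, the sketch below builds the orthogonal projector onto the range of a rank-deficient covariance matrix from its eigendecomposition. This is an assumption about what such a matrix could look like. The page does not give the formula used by get_gamma, and range_projector is a hypothetical helper.

```python
import numpy as np


def range_projector(cov, tol=1e-10):
    """Hypothetical helper: orthogonal projector onto the range of `cov`."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep eigenvectors whose eigenvalue is numerically non-zero.
    V = eigvecs[:, eigvals > tol * eigvals.max()]
    return V @ V.T


# Rank-1 covariance: both variables are perfectly correlated.
cov = np.array([[1.0, 1.0], [1.0, 1.0]])
P = range_projector(cov)
```

Multiplying a noise vector by P removes the component along directions of zero variance, so a process driven by the projected noise stays in the subspace spanned by the data.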

get_loglikelihood(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> float

Get the log-likelihood.

Value of the log-likelihood up to a constant for the provided X, using the attributes means and cov_inv for the multivariate normal distribution.

Parameters
X : NDArray

Input matrix with variables in columns

Returns
float

Computed value
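For a multivariate normal distribution, the log-likelihood up to a constant reduces to a quadratic form in the centred data. The sketch below shows that computation with explicit means and cov_inv arguments. Summing over rows rather than averaging, and dropping the log-determinant term, are assumptions, and loglik_up_to_constant is an illustrative name.

```python
import numpy as np


def loglik_up_to_constant(X, means, cov_inv):
    """Hypothetical sketch: Gaussian log-likelihood without constant terms."""
    Xc = X - means
    # Sum over rows of the quadratic form (x - mu)^T cov_inv (x - mu).
    return -0.5 * np.sum((Xc @ cov_inv) * Xc)


means = np.array([0.0, 1.0])
cov_inv = np.eye(2)
X = np.array([[1.0, 1.0], [0.0, 3.0]])
value = loglik_up_to_constant(X, means, cov_inv)
print(value)  # -2.5
```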

gradient_X_loglik(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]

Compute the gradient of the log-likelihood for the provided X.

It uses the attributes means and cov_inv for the multivariate normal distribution.

Parameters
X : NDArray

Input matrix with variables in columns

Returns
NDArray

The gradient of the log-likelihood with respect to the input variable X.
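For a Gaussian log-likelihood with a symmetric inverse covariance, the gradient with respect to each row x is -(x - means) cov_inv. The sketch below computes it for a whole matrix and checks one entry against a finite difference. The function names are illustrative, and the log-likelihood convention (sum over rows, constants dropped) is an assumption.

```python
import numpy as np


def gradient_loglik(X, means, cov_inv):
    """Gradient of -0.5 * sum_i (x_i - mu)^T cov_inv (x_i - mu) w.r.t. X."""
    return -(X - means) @ cov_inv


def loglik(X, means, cov_inv):
    """Gaussian log-likelihood up to a constant, summed over rows."""
    Xc = X - means
    return -0.5 * np.sum((Xc @ cov_inv) * Xc)


means = np.array([0.0, 1.0])
cov_inv = np.array([[2.0, -0.5], [-0.5, 1.0]])
X = np.array([[1.0, 1.0], [0.0, 3.0]])
grad = gradient_loglik(X, means, cov_inv)

# Finite-difference check on a single entry.
eps = 1e-6
X_shift = X.copy()
X_shift[0, 0] += eps
numeric = (loglik(X_shift, means, cov_inv) - loglik(X, means, cov_inv)) / eps
```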

init_imputation(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]

First simple imputation before iterating.

Parameters
X : NDArray

Data matrix, with missing values

Returns
NDArray

Imputed matrix
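The page does not say which simple imputation is used as a starting point. As an example of the kind of cheap first pass that an iterative method can start from, the sketch below fills each missing value with the mean of its column. Both the choice of column means and the name simple_init are assumptions.

```python
import numpy as np


def simple_init(X):
    """Hypothetical sketch: replace each NaN by the mean of its column."""
    X_filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X_filled[rows, cols] = col_means[cols]
    return X_filled


X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])
X_init = simple_init(X)
```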

reset_learned_parameters()

Reset lists of parameters before starting a new estimation.

set_parameters(means: ndarray[tuple[int, ...], dtype[_ScalarType_co]], cov: ndarray[tuple[int, ...], dtype[_ScalarType_co]])

Set the model parameters from user-provided values.

Parameters
means : NDArray

Specified value for the mean vector

cov : NDArray

Specified value for the covariance matrix

update_criteria_stop(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]])

Update the variables used to compute the stopping criteria.

Parameters
X : NDArray

Input matrix with variables in columns
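The constructor documents two kinds of thresholds: an L-infinity norm difference on the parameters (tolerance) and an absolute difference of the log-likelihood (stagnation_loglik). The sketch below shows how tests of that kind could be evaluated on a history of parameter values. The function has_converged, the use of only the last two iterations, and the way the two tests are combined are assumptions, not the library's logic.

```python
import numpy as np


def has_converged(list_means, list_cov, list_loglik,
                  tolerance=1e-4, stagnation_loglik=2.0):
    """Hypothetical sketch of two stopping tests on the parameter history."""
    if len(list_means) < 2:
        return False
    # L-infinity norm of the change in parameters between two iterations.
    d_means = np.max(np.abs(list_means[-1] - list_means[-2]))
    d_cov = np.max(np.abs(list_cov[-1] - list_cov[-2]))
    params_converged = max(d_means, d_cov) < tolerance
    # Absolute change in log-likelihood between two iterations.
    d_loglik = abs(list_loglik[-1] - list_loglik[-2])
    loglik_stagnates = d_loglik < stagnation_loglik
    return bool(params_converged or loglik_stagnates)


list_means = [np.array([0.0, 1.0]), np.array([0.00001, 1.0])]
list_cov = [np.eye(2), np.eye(2)]
list_loglik = [-120.0, -100.0]
converged = has_converged(list_means, list_cov, list_loglik)
```

In this example the parameters move by less than the tolerance, so the test passes even though the log-likelihood still changes by more than stagnation_loglik.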

update_parameters(X)

Retain statistics relative to the current sample.

Parameters
X : NDArray

Input matrix with variables in columns