`qolmat.imputations.em_sampler`.MultiNormalEM

class qolmat.imputations.em_sampler.MultiNormalEM(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, n_samples: int = 10, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False)

Multinormal EM imputer.

Imputes missing values with a multivariate Gaussian model whose parameters are estimated by EM optimization, with sampling performed by a projected Ornstein-Uhlenbeck process.
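To make the sampling idea concrete, the following is a minimal NumPy sketch of a Langevin-type discretization of an Ornstein-Uhlenbeck process that resamples only the missing entries under a Gaussian model. The helper name `ou_sample_missing` and its arguments are hypothetical, the projection step is omitted, and the update rule illustrates the general technique under these assumptions rather than reproducing the qolmat implementation.

```python
import numpy as np


def ou_sample_missing(X_init, mask_na, means, cov, n_iter_ou=50, dt=0.02, seed=0):
    """Langevin/OU sketch: resample missing entries under N(means, cov).

    Hypothetical helper for illustration, not the qolmat code.
    X_init must already be filled (no NaN); mask_na marks the entries to resample.
    """
    rng = np.random.default_rng(seed)
    cov_inv = np.linalg.inv(cov)
    X = X_init.copy()
    for _ in range(n_iter_ou):
        # Gradient of the Gaussian log-density with respect to each row.
        grad = -(X - means) @ cov_inv
        noise = rng.normal(size=X.shape)
        # Euler-Maruyama step: drift along the gradient plus diffusion noise.
        X_new = X + dt * grad + np.sqrt(2 * dt) * noise
        # Observed entries are held fixed; only the missing ones move.
        X = np.where(mask_na, X_new, X_init)
    return X
```

In this sketch a larger `dt` moves the chain further per step, so fewer iterations are needed, at the cost of a larger discretization bias.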

Parameters

method (Literal["mle", "sample"]): Method for imputation, either "sample" or "mle".
max_iter_em (int, optional): Maximum number of steps in the EM algorithm.
n_iter_ou (int, optional): Number of iterations for the Gibbs sampling method (+ noise addition), necessary for convergence, by default 50.
n_samples (int, optional): Number of data samples used to estimate the parameters of the distribution, by default 10.
ampli (float, optional): Whether to sample the posterior (1) or to maximise the likelihood (0), by default 1.
random_state (int, optional): The seed of the pseudo-random number generator to use, for reproducibility.
dt (float): Process integration time step. A large value increases the sample bias and can make the algorithm unstable, but compensates for a smaller n_iter_ou. By default 2e-2.
tolerance (float, optional): Threshold below which an L-infinity norm difference indicates the convergence of the parameters.
stagnation_threshold (float, optional): Threshold below which an L-infinity norm difference indicates the convergence of the parameters.
stagnation_loglik (float, optional): Threshold below which an absolute difference of the log-likelihood indicates the convergence of the parameters.
period (int, optional): Integer used to fold the temporal data periodically.
verbose (bool, optional): Verbosity level. If False, the warnings are silenced.
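As an illustration of what folding temporal data periodically can mean for `period`, the sketch below stacks each block of `period` consecutive rows side by side, so that one row of the folded matrix holds one full period. The function `fold_periodic` is hypothetical, and the NaN padding of an incomplete final block is an assumption. The library may fold its data differently.

```python
import numpy as np


def fold_periodic(X, period=1):
    """Fold (n_rows, n_cols) into (ceil(n_rows / period), period * n_cols).

    Hypothetical helper; an incomplete trailing block is padded with NaN.
    """
    n_rows, n_cols = X.shape
    n_pad = (-n_rows) % period
    padding = np.full((n_pad, n_cols), np.nan)
    X_padded = np.concatenate([X, padding], axis=0)
    # Each output row concatenates `period` consecutive input rows.
    return X_padded.reshape(-1, period * n_cols)
```

With `period=1` the matrix is returned unchanged, which matches the default value in the signature.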

__init__(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, n_samples: int = 10, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False) → None

combine_parameters()

Combine all statistics computed for each sample in the update step.

It uses the MANOVA formula.
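A MANOVA-style combination splits the total covariance into a within-sample part and a between-sample part. The sketch below shows that decomposition for equally sized samples. The function name `combine_gaussian_stats` and its list arguments are hypothetical, and the equal-size assumption is made here for simplicity.

```python
import numpy as np


def combine_gaussian_stats(list_means, list_cov):
    """Pool per-sample means and covariances (equal-sized samples assumed).

    Hypothetical helper illustrating the within + between decomposition.
    """
    means_stack = np.stack(list_means)
    cov_stack = np.stack(list_cov)
    means = means_stack.mean(axis=0)
    # Within-group part: average of the per-sample covariances.
    cov_within = cov_stack.mean(axis=0)
    # Between-group part: covariance of the per-sample means.
    if len(list_means) == 1:
        cov_between = np.zeros_like(cov_within)
    else:
        cov_between = np.cov(means_stack, rowvar=False, bias=True)
    return means, cov_within + cov_between
```

When the per-sample covariances are computed with the biased (population) normalization, the pooled result equals the covariance of the concatenated samples.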

fit_parameters_with_missingness(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]])

Fit the first estimation of the model parameters.

It is based on data with missing values.

Parameters

X (NDArray): Data matrix with missing values

get_gamma(n_cols: int) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]

Get gamma.

If the covariance matrix is not full-rank, defines the projection matrix keeping the sampling process in the relevant subspace.

Parameters

n_cols (int): Number of variables in the data matrix

Returns

NDArray: Gamma matrix
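One standard way to keep a process inside the subspace spanned by a rank-deficient covariance matrix is an orthogonal projector built from its eigenvectors. The sketch below shows that construction. The function `projector_on_range` and the tolerance `rtol` are hypothetical, and this page does not state whether get_gamma uses this exact construction.

```python
import numpy as np


def projector_on_range(cov, rtol=1e-10):
    """Orthogonal projector onto the column space of a symmetric PSD matrix.

    Hypothetical helper; eigenvalues below rtol * largest are treated as zero.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvals > rtol * eigvals.max()
    basis = eigvecs[:, keep]
    return basis @ basis.T
```

For a full-rank covariance the projector reduces to the identity, so the projection only has an effect in the degenerate case.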

get_loglikelihood(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → float

Get the log-likelihood.

Value of the log-likelihood up to a constant for the provided X, using the attributes means and cov_inv for the multivariate normal distribution.

Parameters

X (NDArray): Input matrix with variables in columns

Returns

float: Computed value
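As a sketch of the kind of quantity described above, the function below sums the Gaussian quadratic form over the rows of X and drops every term that does not depend on X. The function name and its explicit `means` and `cov_inv` arguments are hypothetical stand-ins for the attributes of the same name. Which terms the method treats as "constant" is an assumption here.

```python
import numpy as np


def gaussian_loglik_up_to_constant(X, means, cov_inv):
    """Sum over rows of -0.5 * (x - means)^T cov_inv (x - means).

    Hypothetical helper; normalization terms independent of X are omitted.
    """
    Xc = X - means
    # einsum contracts Xc[i, j] * cov_inv[j, k] * Xc[i, k] over all indices.
    return float(-0.5 * np.einsum("ij,jk,ik->", Xc, cov_inv, Xc))
```

The value is zero when every row equals the mean vector and decreases as rows move away from it.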

gradient_X_loglik(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]

Compute the gradient of the log-likelihood for the provided X.

It uses the attributes means and cov_inv for the multivariate normal distribution.

Parameters

X (NDArray): Input matrix with variables in columns

Returns

NDArray: The gradient of the log-likelihood with respect to the input variable X.
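For a multivariate normal distribution, the gradient of the log-density with respect to a row x is -(x - means) @ cov_inv. The sketch below applies this row by row. The function name and explicit arguments are hypothetical stand-ins for the attributes of the same name, and the sign convention used by the method itself is not confirmed here.

```python
import numpy as np


def gaussian_grad_loglik(X, means, cov_inv):
    """Row-wise gradient of the Gaussian log-density with respect to X.

    Hypothetical helper; cov_inv is assumed symmetric.
    """
    return -(X - means) @ cov_inv
```

The gradient vanishes at the mean and always points back towards it, which is what pulls a gradient-driven sampler towards high-density regions.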

init_imputation(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]

First simple imputation before iterating.

Parameters

X (NDArray): Data matrix, with missing values

Returns

NDArray: Imputed matrix

reset_learned_parameters()

Reset lists of parameters before starting a new estimation.

set_parameters(means: ndarray[tuple[int, ...], dtype[_ScalarType_co]], cov: ndarray[tuple[int, ...], dtype[_ScalarType_co]])

Set the model parameters from a user value.

Parameters

means (NDArray): Specified value for the mean vector
cov (NDArray): Specified value for the covariance matrix

update_criteria_stop(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]])

Update the variables to compute the stopping criteria.

Parameters

X (NDArray): Input matrix with variables in columns
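The `tolerance` parameter of the class is described in terms of an L-infinity norm difference between parameters. A minimal sketch of such a test on two successive estimates is given below. The function `has_converged` and its arguments are hypothetical, and the way the class actually stores and compares its estimates is not documented on this page.

```python
import numpy as np


def has_converged(means_prev, cov_prev, means_new, cov_new, tolerance=1e-4):
    """True when both parameter updates are below `tolerance` in L-infinity norm.

    Hypothetical helper for illustration only.
    """
    # The L-infinity norm is the largest absolute entry of the difference.
    delta_means = np.max(np.abs(means_new - means_prev))
    delta_cov = np.max(np.abs(cov_new - cov_prev))
    return bool(max(delta_means, delta_cov) < tolerance)
```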

update_parameters(X)

Retain statistics relative to the current sample.

Parameters

X (NDArray): Input matrix with variables in columns
