qolmat.imputations.em_sampler.MultiNormalEM
- class qolmat.imputations.em_sampler.MultiNormalEM(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, n_samples: int = 10, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False)
Multinormal EM imputer.
Imputes missing values with a multivariate Gaussian model fitted by EM optimization, updating the missing entries with a projected Ornstein-Uhlenbeck process.
- Parameters
- method : Literal[“mle”, “sample”]
Imputation method, either “sample” or “mle”, by default “sample”.
- max_iter_em : int, optional
Maximum number of iterations of the EM algorithm, by default 200.
- n_iter_ou : int, optional
Number of iterations of the Gibbs sampling method (with noise addition) needed for convergence, by default 50.
- n_samples : int, optional
Number of data samples used to estimate the parameters of the distribution, by default 10.
- ampli : float, optional
Amplitude factor: 1 samples the posterior, 0 maximises the likelihood, by default 1.
- random_state : int or RandomState, optional
Seed of the pseudo-random number generator, or a RandomState instance, for reproducibility, by default None.
- dt : float, optional
Integration time step of the process. A large value increases the sampling bias and can make the algorithm unstable, but compensates for a smaller n_iter_ou. By default 2e-2.
- tolerance : float, optional
Threshold below which the L-infinity norm of the difference between successive parameter estimates indicates convergence, by default 1e-4.
- stagnation_threshold : float, optional
Threshold below which a stagnation of the L-infinity norm difference indicates convergence of the parameters, by default 5e-3.
- stagnation_loglik : float, optional
Threshold below which the absolute difference of the log-likelihood between iterations indicates convergence of the parameters, by default 2.
- period : int, optional
Integer used to fold the temporal data periodically, by default 1.
- verbose : bool, optional
Verbosity level; if False, warnings are silenced. By default False.
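The parameters `dt`, `n_iter_ou` and `ampli` jointly control the sampling step. The NumPy sketch below illustrates the idea and is not qolmat's code: the function `ou_impute` is hypothetical, applies an Euler-Maruyama step of the Langevin (Ornstein-Uhlenbeck) dynamics whose stationary law is N(means, cov), uses the identity in place of the gamma matrix described under `get_gamma`, and assumes the "projection" means resetting observed entries after every step.

```python
import numpy as np


def ou_impute(X, means, cov, dt=0.02, n_iter_ou=50, ampli=1.0, seed=None):
    """Hypothetical sketch of imputation with a projected OU/Langevin process.

    Missing entries (NaN) follow an Euler-Maruyama discretisation of
    dX = -(X - means) cov^{-1} dt + ampli * sqrt(2) dW, while observed
    entries are reset to their values after every step.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    means = np.asarray(means, dtype=float)
    mask_na = np.isnan(X)
    cov_inv = np.linalg.inv(cov)
    # Simple starting point: missing entries take the model means
    X_imp = np.where(mask_na, means, X)
    for _ in range(n_iter_ou):
        # Gradient of the Gaussian log-likelihood with respect to X
        grad = -(X_imp - means) @ cov_inv
        noise = ampli * rng.standard_normal(X_imp.shape)
        X_new = X_imp + dt * grad + np.sqrt(2 * dt) * noise
        # "Projection" (assumed): only the missing entries are allowed to move
        X_imp = np.where(mask_na, X_new, X)
    return X_imp


# ampli=0: deterministic gradient ascent towards the conditional mean
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X = np.array([[1.0, np.nan]])
X_mle = ou_impute(X, means=[0.0, 0.0], cov=cov, n_iter_ou=500, ampli=0.0)
print(X_mle)  # second entry close to 0.8 = E[x2 | x1 = 1]
```

With `ampli=0` the noise vanishes and the iterations converge to the conditional mean of the missing entries, which for a Gaussian is also the maximum-likelihood completion. With `ampli=1` they yield approximate draws from the conditional distribution, with a discretisation bias that grows with `dt`, which is why a small `dt` needs a larger `n_iter_ou`.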
- __init__(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, n_samples: int = 10, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False) -> None
- combine_parameters()
Combine the statistics computed for each sample in the update step.
It uses the MANOVA formula.
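In MANOVA terms, the total covariance splits into a within-group part and a between-group part. A minimal NumPy sketch of such a pooling, assuming each of the `n_samples` imputed datasets yields its own mean vector and covariance matrix; the helper name and its inputs are hypothetical, not qolmat's internals:

```python
import numpy as np


def combine_parameters(list_means, list_covs):
    """Hypothetical sketch: pool per-sample Gaussian estimates.

    Total covariance = mean within-sample covariance
                     + covariance of the sample means (between-sample part).
    """
    means = np.stack(list_means)  # shape (n_samples, n_cols)
    covs = np.stack(list_covs)    # shape (n_samples, n_cols, n_cols)
    pooled_mean = means.mean(axis=0)
    cov_within = covs.mean(axis=0)
    if len(list_means) > 1:
        # np.cov uses ddof=1 by default
        cov_between = np.cov(means, rowvar=False)
    else:
        cov_between = np.zeros_like(cov_within)
    return pooled_mean, cov_within + cov_between


mean, cov = combine_parameters(
    [np.array([0.0, 0.0]), np.array([2.0, 0.0])], [np.eye(2), np.eye(2)]
)
# mean -> [1., 0.]; cov -> [[3., 0.], [0., 1.]]
```

The between-sample term inflates the covariance to reflect the variability introduced by the imputation itself; using `ddof=1` for it is a choice of this sketch.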
- fit_parameters_with_missingness(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]])
Compute a first estimate of the model parameters from data containing missing values.
- Parameters
- XNDArray
Data matrix with missing values
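A simple way to obtain such a first estimate is to fill the gaps and compute the empirical moments. The sketch below assumes a column-mean fill as that starting point; the helper is hypothetical and need not match qolmat's choice. Filling with means shrinks the variances, which is tolerable only because the EM iterations refine the estimate afterwards.

```python
import numpy as np


def fit_parameters_with_missingness(X):
    """Hypothetical sketch: first estimate of (means, cov) from data with NaNs.

    Missing entries are filled with their column mean, then the empirical
    mean vector and covariance matrix are computed on the filled matrix.
    """
    col_means = np.nanmean(X, axis=0)
    X_filled = np.where(np.isnan(X), col_means, X)
    means = X_filled.mean(axis=0)
    cov = np.cov(X_filled, rowvar=False)
    return means, cov


X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
means, cov = fit_parameters_with_missingness(X)
# means -> [3., 4.]; cov -> [[4., 4.], [4., 4.]]
```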
- get_gamma(n_cols: int) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Return the gamma matrix used in the sampling process.
If the covariance matrix is not full rank, gamma is the projection matrix that keeps the sampling process in the relevant subspace.
- Parameters
- n_colsint
Number of variables in the data matrix
- Returns
- NDArray
Gamma matrix
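One way to build such a matrix is the orthogonal projector onto the range of the covariance, obtained from its eigendecomposition. This is a hypothetical sketch that takes the covariance matrix directly rather than `n_cols`, and qolmat's actual gamma may be scaled differently.

```python
import numpy as np


def get_gamma(cov, eps=1e-10):
    """Hypothetical sketch: orthogonal projector onto the range of `cov`.

    For a full-rank covariance this is the identity; otherwise directions
    with (numerically) zero variance are removed, so a sampler preconditioned
    by gamma never leaves the support of the Gaussian.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    keep = eigvals > eps * eigvals.max()
    V = eigvecs[:, keep]  # orthonormal basis of the range of cov
    return V @ V.T


# Rank-1 covariance: both variables always move together
print(get_gamma(np.array([[1.0, 1.0], [1.0, 1.0]])))  # approximately [[0.5, 0.5], [0.5, 0.5]]
```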
- get_loglikelihood(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> float
Get the log-likelihood.
Value of the log-likelihood, up to a constant, of the provided X under the multivariate normal distribution defined by the attributes means and cov_inv.
- Parameters
- XNDArray
Input matrix, with variables in columns
- Returns
- float
Log-likelihood value, up to a constant
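Up to the additive constant, the log-likelihood of n independent rows is -1/2 * sum_i (x_i - mu)^T cov_inv (x_i - mu). A NumPy sketch, with `means` and `cov_inv` passed explicitly instead of read from attributes; the standalone function is an illustration, not qolmat's code:

```python
import numpy as np


def get_loglikelihood(X, means, cov_inv):
    """Sketch: Gaussian log-likelihood of the rows of X, up to a constant.

    The dropped constant is -n_rows/2 * (n_cols * log(2*pi) + log det(cov)),
    which does not depend on X.
    """
    Xc = X - means  # centred data, one observation per row
    # Sum over rows of the quadratic forms (x_i - mu)^T cov_inv (x_i - mu)
    return -0.5 * np.sum((Xc @ cov_inv) * Xc)


X = np.array([[1.0, 0.0], [0.0, 2.0]])
print(get_loglikelihood(X, means=np.zeros(2), cov_inv=np.eye(2)))  # -2.5
```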
- gradient_X_loglik(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Compute the gradient of the log-likelihood with respect to the provided X.
It uses the attributes means and cov_inv of the multivariate normal distribution.
- Parameters
- XNDArray
Input matrix, with variables in columns
- Returns
- NDArray
The gradient of the log-likelihood with respect to the input variable X.
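For each row, the gradient of -1/2 (x - mu)^T cov_inv (x - mu) is -cov_inv (x - mu); stacking rows gives -(X - mu) cov_inv since cov_inv is symmetric. The sketch below (a standalone function, not qolmat's code) checks this against a central finite difference, which is exact up to rounding for a quadratic function:

```python
import numpy as np


def gradient_X_loglik(X, means, cov_inv):
    """Sketch: gradient of the Gaussian log-likelihood with respect to X.

    Row i of the result is -cov_inv @ (x_i - means); in matrix form this is
    -(X - means) @ cov_inv because cov_inv is symmetric.
    """
    return -(X - means) @ cov_inv


def loglik(X, means, cov_inv):
    # Log-likelihood up to a constant, used only for the check below
    Xc = X - means
    return -0.5 * np.sum((Xc @ cov_inv) * Xc)


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))
means = np.array([0.5, -1.0])
cov_inv = np.array([[2.0, 0.3], [0.3, 1.0]])
eps = 1e-6
E = np.zeros_like(X)
E[1, 0] = eps  # perturb a single entry
fd = (loglik(X + E, means, cov_inv) - loglik(X - E, means, cov_inv)) / (2 * eps)
print(np.isclose(fd, gradient_X_loglik(X, means, cov_inv)[1, 0]))  # True
```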
- init_imputation(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Perform a first, simple imputation before iterating.
- Parameters
- XNDArray
Data matrix, with missing values
- Returns
- NDArray
Imputed matrix
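A common choice for this first pass is per-column median imputation. The sketch below assumes that choice (one option, not necessarily what qolmat uses) and assumes no column is entirely missing, since `np.nanmedian` returns NaN, with a warning, for such a column.

```python
import numpy as np


def init_imputation(X):
    """Hypothetical sketch: fill each NaN with the median of its column."""
    col_medians = np.nanmedian(X, axis=0)  # medians ignoring NaNs
    # Broadcast the per-column medians over the rows, only where X is NaN
    return np.where(np.isnan(X), col_medians, X)


X = np.array([[1.0, np.nan], [3.0, 4.0], [10.0, 8.0]])
X_init = init_imputation(X)
# X_init -> [[1., 6.], [3., 4.], [10., 8.]]
```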
- set_parameters(means: ndarray[tuple[int, ...], dtype[_ScalarType_co]], cov: ndarray[tuple[int, ...], dtype[_ScalarType_co]])
Set the model parameters to user-specified values.
- Parameters
- meansNDArray
Specified value for the mean vector
- covNDArray
Specified value for the covariance matrix
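Setting the parameters amounts to storing the mean vector and covariance matrix, and presumably also deriving the `cov_inv` attribute that `get_loglikelihood` and `gradient_X_loglik` rely on. The class below is a hypothetical stand-in, not qolmat's implementation; it checks shapes and symmetry and uses a pseudo-inverse so that a rank-deficient covariance does not raise.

```python
import numpy as np


class GaussianModel:
    """Hypothetical stand-in illustrating what setting parameters involves."""

    def set_parameters(self, means, cov):
        means = np.asarray(means, dtype=float)
        cov = np.asarray(cov, dtype=float)
        n_cols = means.shape[0]
        if cov.shape != (n_cols, n_cols):
            raise ValueError("cov must be a square matrix matching means")
        if not np.allclose(cov, cov.T):
            raise ValueError("cov must be symmetric")
        self.means = means
        self.cov = cov
        # Pseudo-inverse, so a rank-deficient covariance is still usable
        self.cov_inv = np.linalg.pinv(cov, hermitian=True)
        return self


model = GaussianModel().set_parameters([0.0, 1.0], [[2.0, 0.0], [0.0, 4.0]])
# model.cov_inv -> [[0.5, 0.], [0., 0.25]]
```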