`qolmat.imputations.em_sampler`.VARpEM

class qolmat.imputations.em_sampler.VARpEM(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False, p: Union[None, int] = None, max_lagp: int = 2)

VAR(p) EM imputer.

Imputation of missing values using a vector autoregressive model through EM optimization and using a projected Ornstein-Uhlenbeck process. Equations and notation are taken from the following reference, with matrices transposed for consistency: Lütkepohl (2005), New Introduction to Multiple Time Series Analysis.

X^n+1 = nu + sum_k A_k^T @ X_k^n + G_n @ S
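
As an illustration of the recursion above, the following sketch simulates a VAR(1) process with NumPy in the row-vector (transposed) convention. The function `simulate_var1` and the parameter values are hypothetical and not part of qolmat. Drawing the noise as Gaussian with covariance S is an assumption about how the last term of the equation is meant.

```python
import numpy as np


def simulate_var1(nu, B, S, n_steps, seed=0):
    """Simulate x_{t+1} = nu + x_t @ B + eps_t, with eps_t ~ N(0, S)."""
    rng = np.random.default_rng(seed)
    n_cols = len(nu)
    chol = np.linalg.cholesky(S)  # S = chol @ chol.T
    X = np.zeros((n_steps, n_cols))
    X[0] = nu  # arbitrary starting point (assumption)
    for t in range(n_steps - 1):
        eps = chol @ rng.standard_normal(n_cols)
        X[t + 1] = nu + X[t] @ B + eps
    return X


nu = np.array([0.5, -0.2])
B = np.array([[0.6, 0.1], [0.0, 0.4]])  # eigenvalues 0.6 and 0.4: stable process
S = np.array([[1.0, 0.3], [0.3, 0.5]])  # positive definite noise covariance
X_sim = simulate_var1(nu, B, S, n_steps=500)
```

Each row of `X_sim` is one time step and each column one variable, matching the "variables in column" layout used by the methods below.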

Parameters

method (Literal[“mle”, “sample”]): Imputation method, either “sample” or “mle”, by default “sample”.
max_iter_em (int, optional): Maximum number of steps in the EM algorithm, by default 200.
n_iter_ou (int, optional): Number of iterations for the Gibbs sampling method (+ noise addition), necessary for convergence, by default 50.
ampli (float, optional): Whether to sample the posterior (1) or to maximise the likelihood (0), by default 1.
random_state (int or RandomState, optional): Seed of the pseudo-random number generator, for reproducibility, by default None.
dt (float): Process integration time step. A large value increases the sample bias and can make the algorithm unstable, but compensates for a smaller n_iter_ou. By default 2e-2.
tolerance (float, optional): Threshold below which an L-infinity norm difference indicates the convergence of the parameters, by default 1e-4.
stagnation_threshold (float, optional): Threshold below which an L-infinity norm difference indicates the convergence of the parameters, by default 5e-3.
stagnation_loglik (float, optional): Threshold below which an absolute difference of the log-likelihood indicates the convergence of the parameters, by default 2.
period (int, optional): Integer used to fold the temporal data periodically, by default 1.
verbose (bool): By default False.
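
The roles of `dt`, `n_iter_ou` and `ampli` can be pictured with a generic Langevin (Ornstein-Uhlenbeck type) update applied to the missing entries only. This is a minimal sketch for a standard Gaussian target, not qolmat's implementation: the function `langevin_impute`, the target distribution and the absence of the projection step are all assumptions made for illustration.

```python
import numpy as np


def langevin_impute(X, mask_na, grad_loglik, n_iter_ou=50, dt=0.02, ampli=1.0, seed=0):
    """Run n_iter_ou Langevin steps, updating only the entries flagged in mask_na."""
    rng = np.random.default_rng(seed)
    X_obs = X.copy()
    X = X.copy()
    for _ in range(n_iter_ou):
        noise = ampli * rng.standard_normal(X.shape)  # ampli=0 removes the noise
        X = X + dt * grad_loglik(X) + np.sqrt(2 * dt) * noise
        X[~mask_na] = X_obs[~mask_na]  # observed entries are never modified
    return X


# Hypothetical target: independent standard normals, so grad log p(x) = -x.
# Missing entries start from an arbitrary initial value of 5.0.
X0 = np.array([[1.0, 5.0], [5.0, 2.0], [3.0, 5.0]])
mask = np.array([[False, True], [True, False], [False, True]])
X_sampled = langevin_impute(X0, mask, lambda X: -X)
X_mle = langevin_impute(X0, mask, lambda X: -X, ampli=0.0)
```

With `ampli=0.0` the update reduces to gradient ascent, so each missing entry shrinks by a factor `1 - dt` per iteration towards the mode. With `ampli=1.0` the noise term keeps the entries fluctuating around it.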

Examples

>>> import numpy as np
>>> from qolmat.imputations.em_sampler import VARpEM
>>> imputer = VARpEM(method="sample", random_state=11)
>>> X = np.array([[1, 1, 1, 1], [np.nan, np.nan, 3, 2], [1, 2, 2, 1], [2, 2, 2, 2]])
>>> imputer.fit_transform(X)  

Attributes

X_intermediate (list): List of pd.DataFrame giving the results of the EM process as a function of the iteration number.

__init__(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False, p: Union[None, int] = None, max_lagp: int = 2) → None

combine_parameters() → None

Combine statistics computed for each sample in the update step.

The estimation of nu and B corresponds to the MLE, whereas S is approximated.
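
For intuition, the maximum likelihood estimate of `nu` and `B` in a fully observed Gaussian VAR(1) coincides with an ordinary least squares regression of each row on the previous one. The sketch below shows this with NumPy. The helper `fit_var1_ols` is hypothetical and does not reproduce how qolmat aggregates statistics across samples or approximates S.

```python
import numpy as np


def fit_var1_ols(X):
    """Least-squares estimate of (nu, B, S) for x_{t+1} = nu + x_t @ B + eps_t."""
    n_rows = X.shape[0]
    Z = np.hstack([np.ones((n_rows - 1, 1)), X[:-1]])  # regressors [1, x_t]
    Y = X[1:]  # targets x_{t+1}
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    nu, B = coef[0], coef[1:]
    U = Y - Z @ coef  # residuals
    S = U.T @ U / (n_rows - 1)  # empirical residual covariance
    return nu, B, S


# Noise-free data generated from known (made-up) parameters
nu_true = np.array([1.0, -1.0])
B_true = np.array([[0.5, 0.2], [0.0, 0.3]])
X = np.zeros((20, 2))
X[0] = [3.0, -2.0]
for t in range(19):
    X[t + 1] = nu_true + X[t] @ B_true
nu_hat, B_hat, S_hat = fit_var1_ols(X)
```

Because the data contain no noise, `nu_hat` and `B_hat` match the generating parameters up to rounding and `S_hat` is numerically zero.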

get_gamma(n_cols: int) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]

Compute gamma.

If the noise matrix is not full-rank, defines the projection matrix keeping the sampling process in the relevant subspace. Rescales the process to avoid instabilities.

Parameters

n_cols (int): Number of variables in the data matrix

Returns

NDArray: Gamma matrix

get_loglikelihood(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → float

Get the log-likelihood.

Value of the log-likelihood up to a constant for the provided X, using the attributes nu, B and S for the VAR(p) distribution.

Parameters

X (NDArray): Input matrix with variables in columns

Returns

float: Computed value
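
A hedged sketch of what a log-likelihood "up to a constant" can look like for a Gaussian VAR(1) with parameters `nu`, `B` and `S`: the residuals of the one-step prediction are scored under a centered Gaussian with covariance `S`, and the term involving 2π is dropped. The function `var1_loglik` is hypothetical. qolmat's exact normalization and its handling of the first rows may differ.

```python
import numpy as np


def var1_loglik(X, nu, B, S):
    """Gaussian VAR(1) log-likelihood of X, up to an additive constant."""
    U = X[1:] - nu - X[:-1] @ B  # one-step prediction residuals
    S_inv = np.linalg.inv(S)
    _, logdet = np.linalg.slogdet(S)
    quad = np.einsum("ti,ij,tj->", U, S_inv, U)  # sum_t u_t S^{-1} u_t^T
    return -0.5 * quad - 0.5 * len(U) * logdet


nu = np.zeros(2)
B = np.array([[0.5, 0.0], [0.0, 0.5]])
S = np.eye(2)
# Each row is exactly half the previous one, so all residuals vanish
X_exact = np.array([[4.0, 2.0], [2.0, 1.0], [1.0, 0.5]])
# Perturbing one entry creates non-zero residuals and lowers the likelihood
X_noisy = X_exact + np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
ll_exact = var1_loglik(X_exact, nu, B, S)
ll_noisy = var1_loglik(X_noisy, nu, B, S)
```

Here `ll_exact` is 0 and `ll_noisy` is -0.625, since the perturbation produces residuals (1, 0) and (-0.5, 0).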

gradient_X_loglik(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]

Compute the gradient of the log-likelihood for the provided X.

It uses the attributes means and cov_inv for the VAR(p) distribution.

Parameters

X (NDArray): Input matrix with variables in columns

Returns

NDArray: The gradient of the log-likelihood with respect to the input variable X.
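
A gradient with respect to X can be validated against finite differences. The sketch below does this for a Gaussian VAR(1) log-likelihood written in terms of `nu`, `B` and the inverse noise covariance. All function names are hypothetical, and the closed-form gradient is derived for this simplified model, not taken from qolmat.

```python
import numpy as np


def loglik(X, nu, B, S_inv):
    """Quadratic part of the VAR(1) log-likelihood."""
    U = X[1:] - nu - X[:-1] @ B
    return -0.5 * np.sum((U @ S_inv) * U)


def grad_loglik(X, nu, B, S_inv):
    """Analytic gradient of loglik with respect to every entry of X."""
    U = X[1:] - nu - X[:-1] @ B
    G = np.zeros_like(X)
    G[1:] -= U @ S_inv  # x_{t+1} enters residual u_t with a plus sign
    G[:-1] += U @ S_inv @ B.T  # x_t enters residual u_t through -x_t @ B
    return G


def numerical_grad(f, X, eps=1e-6):
    """Central finite differences, one entry at a time."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        G[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return G


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))
nu = np.array([0.1, -0.3])
B = np.array([[0.5, 0.2], [-0.1, 0.4]])
S_inv = np.linalg.inv(np.array([[1.0, 0.3], [0.3, 0.5]]))
G_analytic = grad_loglik(X, nu, B, S_inv)
G_numeric = numerical_grad(lambda Z: loglik(Z, nu, B, S_inv), X)
```

The two gradients agree up to finite-difference rounding error, which is a quick sanity check before using such a gradient inside a sampler.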

init_imputation(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → ndarray[tuple[int, ...], dtype[_ScalarType_co]]

First simple imputation before iterating.

Parameters

X (NDArray): Data matrix, with missing values

Returns

NDArray: Imputed matrix
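
One plausible "simple imputation" for time series is column-wise linear interpolation. The sketch below is an assumption about what such a first pass could look like, with a hypothetical helper `interpolate_columns`; the method qolmat actually applies here is not stated in this page.

```python
import numpy as np


def interpolate_columns(X):
    """Fill NaNs in each column by linear interpolation along the rows."""
    X = np.asarray(X, dtype=float).copy()
    rows = np.arange(X.shape[0])
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        if not missing.any() or missing.all():
            continue  # nothing to fill, or nothing to interpolate from
        X[missing, j] = np.interp(rows[missing], rows[~missing], X[~missing, j])
    return X


X = np.array([[1, 1, 1, 1], [np.nan, np.nan, 3, 2], [1, 2, 2, 1], [2, 2, 2, 2]])
X_init = interpolate_columns(X)
```

On the example matrix, the two missing entries in the second row become 1.0 and 1.5, the midpoints of their neighbours in the same column.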

pretreatment(X, mask_na) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Pretreat the data before imputation by EM, making it more robust.

In the case of the VAR(p) model we freeze the naive imputation on the first observations if all variables are missing to avoid explosive imputations.

Parameters

X (NDArray): Data matrix without nans
mask_na (NDArray): Boolean matrix indicating which entries are to be imputed

Returns

Tuple[NDArray, NDArray]: A tuple containing X, the pretreated data matrix, and mask_na, the updated mask.
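
The freezing described above can be sketched as follows: leading rows in which every variable is missing are unflagged in the mask, so their naive imputation is kept as if it were observed. The helper `freeze_leading_missing_rows` is hypothetical and assumes time runs along the rows; qolmat's own implementation may differ in layout and detail.

```python
import numpy as np


def freeze_leading_missing_rows(X, mask_na):
    """Unflag the initial block of rows where all variables are missing."""
    mask_na = mask_na.copy()
    all_missing = mask_na.all(axis=1)
    if all_missing.all():
        n_leading = len(all_missing)
    else:
        # argmin on booleans returns the index of the first False
        n_leading = int(np.argmin(all_missing))
    mask_na[:n_leading] = False  # these rows are no longer re-imputed
    return X, mask_na


X = np.arange(8.0).reshape(4, 2)  # already naively imputed, no NaN
mask = np.array([[True, True], [True, True], [False, True], [True, True]])
X_out, mask_out = freeze_leading_missing_rows(X, mask)
```

Only the first two rows are frozen. The fully missing last row stays flagged because observations precede it.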

reset_learned_parameters()

Reset lists of parameters before starting a new estimation.

set_parameters(B: ndarray[tuple[int, ...], dtype[_ScalarType_co]], S: ndarray[tuple[int, ...], dtype[_ScalarType_co]])

Set the model parameters from a user value.

Parameters

B (NDArray): Specified value for the autoregression matrix
S (NDArray): Specified value for the noise covariance matrix

update_criteria_stop(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]])

Update the variable to compute the stopping criteria.

Parameters

X (NDArray): Input matrix with variables in columns
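
A stopping rule based on an L-infinity norm difference, as described for the `tolerance` parameter, can be sketched as below. The helpers `max_abs_change` and `has_converged` are hypothetical. Which quantities qolmat tracks, and how it combines them with the stagnation thresholds, is not shown here.

```python
import numpy as np


def max_abs_change(params_old, params_new):
    """L-infinity norm of the change across all parameter arrays."""
    return max(
        float(np.max(np.abs(new - old)))
        for old, new in zip(params_old, params_new)
    )


def has_converged(params_old, params_new, tolerance=1e-4):
    """True when no parameter entry moved by more than the tolerance."""
    return max_abs_change(params_old, params_new) < tolerance


B_old = np.array([[0.50, 0.10], [0.00, 0.40]])
S_old = np.eye(2)
small_step = [B_old + 5e-5, S_old + np.array([[2e-5, 0.0], [0.0, 0.0]])]
large_step = [B_old + 1e-2, S_old]
converged_small = has_converged([B_old, S_old], small_step)
converged_large = has_converged([B_old, S_old], large_step)
```

The first update moves every entry by at most 5e-5 and counts as converged at the default tolerance. The second moves entries of B by 1e-2 and does not.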

update_parameters(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → None

Retain statistics relative to the current sample.

Parameters

X (NDArray): Input matrix with variables in columns
