qolmat.imputations.em_sampler.VARpEM
- class qolmat.imputations.em_sampler.VARpEM(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False, p: Union[None, int] = None, max_lagp: int = 2)
VAR(p) EM imputer.
Imputation of missing values using a vector autoregressive model, fitted through EM optimization and sampled with a projected Ornstein-Uhlenbeck process. Equations and notations are from the following reference, with matrices transposed for consistency: Lütkepohl (2005), New Introduction to Multiple Time Series Analysis.
X^n+1 = nu + sum_k A_k^T @ X_k^n + G_n @ S
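To make the model family concrete, the sketch below simulates a generic VAR(p) process written in row-vector form, X_t = nu + sum_k X_{t-k} @ A_k + eps_t, with Gaussian noise eps_t of covariance S. It illustrates the kind of process being fitted and is not Qolmat's internal code: the function name `simulate_varp`, the row-vector convention, and all numerical values are assumptions made for this example.

```python
import numpy as np


def simulate_varp(nu, list_A, S, n_steps, rng):
    """Simulate X_t = nu + sum_k X_{t-k} @ A_k + eps_t, with eps_t ~ N(0, S)."""
    p = len(list_A)
    n_cols = len(nu)
    # The first p rows are zero-initialised and only serve as starting lags.
    X = np.zeros((n_steps + p, n_cols))
    noise = rng.multivariate_normal(np.zeros(n_cols), S, size=n_steps + p)
    for t in range(p, n_steps + p):
        X[t] = nu + noise[t]
        for k, A_k in enumerate(list_A, start=1):
            X[t] += X[t - k] @ A_k
    return X[p:]  # drop the starting rows


rng = np.random.default_rng(0)
nu = np.array([0.5, -0.2])
A_1 = np.array([[0.5, 0.1], [0.0, 0.3]])  # hypothetical, stable coefficients
S = np.array([[1.0, 0.2], [0.2, 0.5]])    # hypothetical noise covariance
X_sim = simulate_varp(nu, [A_1], S, n_steps=200, rng=rng)
```

Masking some entries of `X_sim` with `np.nan` gives a simple synthetic test case for an imputer of this type.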
- Parameters
- method: Literal["mle", "sample"]
Method for imputation, either "sample" or "mle". By default "sample".
- max_iter_em: int, optional
Maximum number of steps in the EM algorithm. By default 200.
- n_iter_ou: int, optional
Number of iterations for the Gibbs sampling method (+ noise addition), necessary for convergence. By default 50.
- ampli: float, optional
Whether to sample the posterior (1) or to maximise the likelihood (0). By default 1.
- random_state: int or RandomState, optional
Seed of the pseudo-random number generator, for reproducibility. By default None.
- dt: float
Process integration time step. A large value increases the sample bias and can make the algorithm unstable, but compensates for a smaller n_iter_ou. By default 2e-2.
- tolerance: float, optional
Threshold below which an L-infinity norm difference indicates the convergence of the parameters. By default 1e-4.
- stagnation_threshold: float, optional
Threshold below which an L-infinity norm difference indicates the convergence of the parameters. By default 5e-3.
- stagnation_loglik: float, optional
Threshold below which an absolute difference of the log-likelihood indicates the convergence of the parameters. By default 2.
- period: int, optional
Integer used to fold the temporal data periodically. By default 1.
- verbose: bool
Verbosity flag. By default False.
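The convergence thresholds above are stated in terms of an L-infinity norm difference between successive parameter estimates. A minimal sketch of such a test follows. The helper `has_converged` is hypothetical, and how Qolmat combines tolerance, stagnation_threshold and stagnation_loglik internally is not specified here.

```python
import numpy as np


def has_converged(params_old, params_new, tolerance=1e-4):
    """Return True when no parameter entry moved by more than `tolerance`."""
    diff = np.abs(np.asarray(params_new) - np.asarray(params_old))
    return bool(np.max(diff) < tolerance)  # L-infinity norm of the change


# Hypothetical successive estimates of an autoregression matrix.
B_old = np.array([[0.50, 0.10], [0.00, 0.30]])
B_small_step = B_old + 5e-5  # change below the default tolerance
B_large_step = B_old + 1e-2  # change above the default tolerance

converged_small = has_converged(B_old, B_small_step)
converged_large = has_converged(B_old, B_large_step)
```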
Examples
>>> import numpy as np
>>> from qolmat.imputations.em_sampler import VARpEM
>>> imputer = VARpEM(method="sample", random_state=11)
>>> X = np.array([[1, 1, 1, 1], [np.nan, np.nan, 3, 2], [1, 2, 2, 1], [2, 2, 2, 2]])
>>> imputer.fit_transform(X)
- Attributes
- X_intermediate: list
List of pd.DataFrame giving the results of the EM process as a function of the iteration number.
- __init__(method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, ampli: float = 1, random_state: Optional[Union[int, RandomState]] = None, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False, p: Union[None, int] = None, max_lagp: int = 2) -> None
- combine_parameters() -> None
Combine statistics computed for each sample in the update step.
The estimation of nu and B corresponds to the MLE, whereas S is approximated.
- get_gamma(n_cols: int) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Compute gamma.
If the noise matrix is not full-rank, this defines the projection matrix that keeps the sampling process in the relevant subspace. The process is also rescaled to avoid instabilities.
- Parameters
- n_cols: int
Number of variables in the data matrix
- Returns
- NDArray
Gamma matrix
- get_loglikelihood(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> float
Get the log-likelihood.
Value of the log-likelihood up to a constant for the provided X, using the attributes nu, B and S for the VAR(p) distribution.
- Parameters
- X: NDArray
Input matrix with variables in columns
- Returns
- float
Computed value
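As an illustration of a log-likelihood "up to a constant" for this kind of model, the sketch below evaluates the conditional Gaussian log-likelihood of a VAR(1) process from its one-step-ahead residuals, dropping the 2π term. The function `var1_loglik`, the restriction to p = 1, the row-vector convention and the exact normalisation are assumptions made for this example. Qolmat's own scaling may differ.

```python
import numpy as np


def var1_loglik(X, nu, A, S):
    """Conditional Gaussian log-likelihood of a VAR(1), up to an additive constant."""
    residuals = X[1:] - (nu + X[:-1] @ A)  # one-step-ahead prediction errors
    S_inv = np.linalg.inv(S)
    # Sum over time of r_t^T S^{-1} r_t.
    quad = np.einsum("ti,ij,tj->", residuals, S_inv, residuals)
    _, logdet = np.linalg.slogdet(S)
    return float(-0.5 * quad - 0.5 * len(residuals) * logdet)


# Hypothetical data and parameters, variables in columns.
X = np.array([[1.0, 1.0], [1.5, 0.5], [1.2, 0.8], [1.4, 0.6]])
nu = np.zeros(2)
A = 0.5 * np.eye(2)
S = np.eye(2)

loglik_good = var1_loglik(X, nu, A, S)
loglik_bad = var1_loglik(X, nu + 10.0, A, S)  # badly shifted intercept
```

A parameter set that predicts the next observation poorly, such as the shifted intercept above, yields a lower value.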
- gradient_X_loglik(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
Compute the gradient of the log-likelihood for the provided X.
It uses the attributes means and cov_inv for the VAR(p) distribution.
- Parameters
- X: NDArray
Input matrix with variables in columns
- Returns
- NDArray
The gradient of the log-likelihood with respect to the input variable X.
- init_imputation(X: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> ndarray[tuple[int, ...], dtype[_ScalarType_co]]
First simple imputation before iterating.
- Parameters
- X: NDArray
Data matrix, with missing values
- Returns
- NDArray
Imputed matrix
- pretreatment(X, mask_na) -> Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
Pretreat the data before imputation by EM, making it more robust.
In the case of the VAR(p) model, the naive imputation is frozen on the first observations when all variables are missing there, to avoid explosive imputations.
- Parameters
- X: NDArray
Data matrix without NaNs
- mask_na: NDArray
Boolean matrix indicating which entries are to be imputed
- Returns
- Tuple[NDArray, NDArray]
A tuple containing X, the pretreated data matrix, and mask_na, the updated mask.
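The freezing step can be pictured as a mask update: rows among the first observations where every variable is missing are unmasked, so the EM iterations leave their naive imputation untouched. The sketch below is one possible reading of that description. The helper `freeze_leading_missing_rows` and the choice of restricting the rule to the first p rows are assumptions, not Qolmat's actual implementation.

```python
import numpy as np


def freeze_leading_missing_rows(mask_na, p):
    """Unmask fully missing rows among the first p rows so they are not re-imputed."""
    mask_out = mask_na.copy()
    # Indices (within the first p rows) where every variable is missing.
    idx_frozen = np.flatnonzero(mask_out[:p].all(axis=1))
    mask_out[idx_frozen] = False
    return mask_out


# Hypothetical mask: True marks an entry to impute.
mask_na = np.array(
    [
        [True, True],    # fully missing, within the first p rows: frozen
        [True, False],   # partially missing: unchanged
        [False, False],  # fully observed: unchanged
        [True, True],    # fully missing but after the first p rows: unchanged
    ]
)
mask_frozen = freeze_leading_missing_rows(mask_na, p=2)
```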
- set_parameters(B: ndarray[tuple[int, ...], dtype[_ScalarType_co]], S: ndarray[tuple[int, ...], dtype[_ScalarType_co]])
Set the model parameters from user-provided values.
- Parameters
- B: NDArray
Specified value for the autoregression matrix
- S: NDArray
Specified value for the noise covariance matrix