qolmat.imputations.imputers.ImputerEM
- class qolmat.imputations.imputers.ImputerEM(groups: Tuple[str, ...] = (), model: Optional[str] = 'multinormal', columnwise: bool = False, random_state: Optional[Union[int, RandomState]] = None, method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, ampli: float = 1, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False, p: Union[None, int] = None)
EM imputer.
This class implements an imputation method based on joint modelling of the variables, with inference performed by an Expectation-Maximization (EM) algorithm.
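To make the idea of EM-based imputation concrete, here is a minimal toy sketch for a bivariate normal model in which the first variable is fully observed and the second has gaps. It is not the qolmat implementation: the function name `em_impute_bivariate` and its arguments are hypothetical, and the expectation step is computed in closed form here, with no sampling involved.

```python
def em_impute_bivariate(x1, x2, n_iter=100):
    """Toy EM for a bivariate normal (hypothetical helper, not qolmat code).

    x1 is fully observed; x2 may contain None for missing entries.
    Returns x2 with missing entries replaced by their conditional mean.
    """
    n = len(x1)
    missing = [v is None for v in x2]
    observed = [v for v in x2 if v is not None]

    # Parameters of x1 never change because x1 is fully observed.
    mu1 = sum(x1) / n
    s11 = sum((a - mu1) ** 2 for a in x1) / n

    # Initial guess: no correlation, x2 centred on its observed mean.
    mu2 = sum(observed) / len(observed)
    filled = [mu2 if m else v for v, m in zip(x2, missing)]
    s12 = 0.0
    s22 = sum((b - mu2) ** 2 for b in filled) / n

    for _ in range(n_iter):
        # E-step: conditional mean and variance of x2 given x1.
        slope = s12 / s11
        cond_var = s22 - s12 ** 2 / s11
        filled = [
            mu2 + slope * (a - mu1) if m else v
            for a, v, m in zip(x1, x2, missing)
        ]
        # M-step: update the parameters from expected sufficient statistics.
        mu2 = sum(filled) / n
        s12 = sum((a - mu1) * (b - mu2) for a, b in zip(x1, filled)) / n
        s22 = (sum((b - mu2) ** 2 for b in filled) + sum(missing) * cond_var) / n

    return filled


x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x2 = [1.0, 3.0, None, 7.0, 9.0, None, 13.0, 15.0]  # x2 = 1 + 2 * x1 where observed
imputed = em_impute_bivariate(x1, x2)
```

Because the observed pairs lie exactly on a line, the imputed values converge to the points on that line. This corresponds in spirit to the 'mle' imputation method; the 'sample' method would instead add posterior noise to the conditional mean.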
- Parameters
- groups : Tuple[str, …], default=()
List of column names to group by.
- model : {'multinormal', 'VAR'}, default='multinormal'
Assumption made about the data distribution. Possible values:
  - 'multinormal': the data points are independent and identically distributed following a multinormal distribution
  - 'VAR': the data is a time series modeled by a VAR(p) process
- columnwise : bool, default=False
If False, correlations between variables are used, which is advised. If True, each column is imputed independently: in the multinormal case, each value is imputed by the column mean plus a noise term of fixed amplitude; in the VAR case, the imputation is a noisy temporal interpolation.
- random_state : RandomSetting, optional
Controls the randomness of fit_transform. Defaults to None.
- method : {'mle', 'sample'}, default='sample'
Imputation method applied after EM convergence. Possible values:
  - 'mle': maximum likelihood estimation
  - 'sample': sample from the posterior distribution
- max_iter_em : int, default=200
Maximum number of EM iterations.
- n_iter_ou : int, default=50
Number of Ornstein-Uhlenbeck process iterations for sampling.
- ampli : float, default=1
Amplitude parameter for the Ornstein-Uhlenbeck process.
- dt : float, default=0.02
Time step for the Ornstein-Uhlenbeck process discretization.
- tolerance : float, default=1e-4
Convergence tolerance for EM algorithm.
- stagnation_threshold : float, default=5e-3
Threshold for element-wise stagnation detection in EM algorithm.
- stagnation_loglik : float, default=2
Threshold for log-likelihood stagnation in EM algorithm.
- period : int, default=1
If different from 1, the data is folded with respect to the given period before applying the imputation.
- verbose : bool, default=False
If True, print convergence information during fitting.
- p : int, optional
Order of the VAR process (only used when model='VAR'). Defaults to None.
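The `n_iter_ou`, `ampli`, and `dt` parameters refer to a discretized Ornstein-Uhlenbeck process used for sampling. The sketch below shows a generic one-dimensional Euler-Maruyama discretization of such a process, to illustrate what a step count, a time step, and a noise amplitude control. The function `ou_refine`, the mean-reversion rate `gamma`, and the way `ampli` scales the noise are assumptions made for illustration, not the library's actual update rule.

```python
import math
import random


def ou_refine(x0, mu, sigma, n_iter_ou=50, ampli=1.0, dt=0.02, gamma=1.0, seed=0):
    """Run a 1-D discretized Ornstein-Uhlenbeck chain (hypothetical helper).

    The chain drifts from x0 towards mu while Gaussian noise is injected at
    each step; gamma is an assumed mean-reversion rate.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(n_iter_ou):
        # Deterministic pull towards the target mean.
        drift = -gamma * (x - mu) * dt
        # Random kick; ampli is assumed to scale the noise level.
        noise = ampli * sigma * math.sqrt(2.0 * gamma * dt) * rng.gauss(0.0, 1.0)
        x = x + drift + noise
    return x


# With ampli=0 the chain is deterministic and decays geometrically towards mu.
noiseless = ou_refine(10.0, 0.0, 1.0, ampli=0.0)
# With ampli=1 the result is a random draw, reproducible for a fixed seed.
sampled = ou_refine(10.0, 0.0, 1.0, ampli=1.0, seed=42)
```

In this sketch, a smaller `dt` gives a finer but slower discretization, so more iterations are needed to move the same distance towards the target distribution.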
- __init__(groups: Tuple[str, ...] = (), model: Optional[str] = 'multinormal', columnwise: bool = False, random_state: Optional[Union[int, RandomState]] = None, method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, ampli: float = 1, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False, p: Union[None, int] = None)
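The `period` parameter folds the data before imputation. One plausible reading, sketched below for a single series, is that consecutive blocks of `period` values become rows, so that a joint model can relate positions within a period. The helper `fold` and the padding with None are assumptions for illustration; the library's internal layout may differ.

```python
def fold(values, period):
    """Reshape a flat series into rows of `period` consecutive values.

    Hypothetical helper: the last row is padded with None when the series
    length is not a multiple of the period.
    """
    pad = (-len(values)) % period
    padded = list(values) + [None] * pad
    return [padded[i:i + period] for i in range(0, len(padded), period)]


folded = fold([1, 2, 3, 4, 5, 6, 7], 3)
# folded == [[1, 2, 3], [4, 5, 6], [7, None, None]]
```

With `period=1` this folding leaves each value in its own row, which matches the documented default of applying no folding.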