qolmat.imputations.imputers.ImputerEM

class qolmat.imputations.imputers.ImputerEM(groups: Tuple[str, ...] = (), model: Optional[str] = 'multinormal', columnwise: bool = False, random_state: Optional[Union[int, RandomState]] = None, method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, ampli: float = 1, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False, p: Union[None, int] = None)

EM imputer.

This class implements an imputation method based on joint modelling, with inference performed by an Expectation-Maximization (EM) algorithm.
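As a rough illustration of the idea, and not qolmat's actual implementation, the sketch below runs a textbook EM for a multivariate normal on data whose missing entries are coded as NaN. The function name `em_multinormal` and the toy data are hypothetical. The E-step fills each missing block with its conditional mean given the observed entries, and the M-step re-estimates the mean and covariance, adding the conditional covariance of the filled-in values.

```python
import numpy as np


def em_multinormal(X, n_iter=50):
    """Textbook EM for a multivariate normal with NaN-coded missing entries."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mask = np.isnan(X)
    # Initialise with column means and a diagonal covariance.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    X_hat = np.where(mask, mu, X)
    for _ in range(n_iter):
        C = np.zeros((d, d))  # accumulated conditional covariances
        for i in range(n):
            m = mask[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the marginal.
                X_hat[i] = mu
                C += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_om = sigma[np.ix_(o, m)]
            # E-step: conditional mean and covariance of the missing block.
            K = np.linalg.solve(s_oo, s_om).T  # Sigma_mo @ inv(Sigma_oo)
            X_hat[i, m] = mu[m] + K @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - K @ s_om
        # M-step: update the parameters from the completed data.
        mu = X_hat.mean(axis=0)
        centered = X_hat - mu
        sigma = (centered.T @ centered + C) / n
    return X_hat, mu, sigma


# Toy data: two strongly correlated columns, about 20% missing in the second.
rng = np.random.default_rng(0)
cov_true = np.array([[1.0, 0.9], [0.9, 1.0]])
X_full = rng.multivariate_normal([0.0, 0.0], cov_true, size=300)
X_obs = X_full.copy()
X_obs[rng.random(300) < 0.2, 1] = np.nan

X_hat, mu, sigma = em_multinormal(X_obs)
```

Observed entries are left untouched; only the NaN positions are filled. The stopping rules described by `tolerance`, `stagnation_threshold` and `stagnation_loglik` are not reproduced here, so the sketch simply runs a fixed number of iterations.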

Parameters
groups : Tuple[str, …], default=()

List of column names to group by.

model : {‘multinormal’, ‘VAR’}, default=‘multinormal’

Hypothesis made on the data distribution. Possible values:
- ‘multinormal’ : the data points are independent and identically distributed, following a multinormal distribution
- ‘VAR’ : the data is a time series modeled by a VAR(p) process

columnwise : bool, default=False

If False, correlations between variables are used, which is advised. If True, each column is imputed independently: in the multinormal case, each value is imputed by the mean plus noise of fixed amplitude; in the VAR case, the imputation is a noisy temporal interpolation.

random_state : RandomSetting, optional

Controls the randomness of fit_transform. Defaults to None.

method : {‘mle’, ‘sample’}, default=‘sample’

Imputation method after EM convergence. Possible values:
- ‘mle’ : Maximum Likelihood Estimation
- ‘sample’ : Sample from the posterior distribution

max_iter_em : int, default=200

Maximum number of EM iterations.

n_iter_ou : int, default=50

Number of Ornstein-Uhlenbeck process iterations for sampling.

ampli : float, default=1

Amplitude parameter for the Ornstein-Uhlenbeck process.

dt : float, default=0.02

Time step for the Ornstein-Uhlenbeck process discretization.

tolerance : float, default=1e-4

Convergence tolerance for EM algorithm.

stagnation_threshold : float, default=5e-3

Threshold for element-wise stagnation detection in EM algorithm.

stagnation_loglik : float, default=2

Threshold for log-likelihood stagnation in EM algorithm.

period : int, default=1

If different from 1, the data is folded with respect to the given period before applying the imputation.

verbose : bool, default=False

If True, print convergence information during fitting.

p : int, optional

Order of the VAR process (only used when model=‘VAR’). Defaults to None.
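To make the two `method` options above concrete, here is a minimal sketch for the multinormal case, assuming the model parameters have already been estimated. It is an illustration with hypothetical names (`conditional_gaussian`, the toy `mu` and `sigma`), not the library's code path. Under a Gaussian model the most likely value of a missing entry is its conditional mean, which is the assumed reading of ‘mle’ here, while ‘sample’ is read as drawing from the conditional distribution.

```python
import numpy as np


def conditional_gaussian(x, mu, sigma):
    """Conditional mean and covariance of the NaN entries of x given the rest."""
    m = np.isnan(x)
    o = ~m
    s_om = sigma[np.ix_(o, m)]
    K = np.linalg.solve(sigma[np.ix_(o, o)], s_om).T  # Sigma_mo @ inv(Sigma_oo)
    cond_mean = mu[m] + K @ (x[o] - mu[o])
    cond_cov = sigma[np.ix_(m, m)] - K @ s_om
    return m, cond_mean, cond_cov


# Hypothetical fitted parameters and one row with a missing second entry.
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
x = np.array([1.5, np.nan])

m, cond_mean, cond_cov = conditional_gaussian(x, mu, sigma)

# 'mle'-style fill: deterministic, uses the conditional mean.
x_mle = x.copy()
x_mle[m] = cond_mean

# 'sample'-style fill: random draw that preserves the conditional variance.
rng = np.random.default_rng(0)
x_sample = x.copy()
x_sample[m] = rng.multivariate_normal(cond_mean, cond_cov)
```

The deterministic fill shrinks the variance of the imputed column, whereas the sampled fill keeps it closer to that of the fitted model, at the cost of run-to-run variability controlled by `random_state`.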

__init__(groups: Tuple[str, ...] = (), model: Optional[str] = 'multinormal', columnwise: bool = False, random_state: Optional[Union[int, RandomState]] = None, method: Literal['mle', 'sample'] = 'sample', max_iter_em: int = 200, n_iter_ou: int = 50, ampli: float = 1, dt: float = 0.02, tolerance: float = 0.0001, stagnation_threshold: float = 0.005, stagnation_loglik: float = 2, period: int = 1, verbose: bool = False, p: Union[None, int] = None)

get_model(**hyperparams) → EM

Get the underlying model of the imputer based on its attributes.

Returns
em_sampler.EM

EM model to be used in the fit and transform methods.
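The parameters `n_iter_ou`, `ampli` and `dt` indicate that the EM model samples with a discretized Ornstein-Uhlenbeck process. The sketch below shows one common form of such a sampler, an Euler-Maruyama discretization whose stationary distribution is approximately N(mu, sigma). The function `ou_sample`, the exact update rule, and the way `ampli` scales the noise are assumptions made for illustration and may differ from what the library does.

```python
import numpy as np


def ou_sample(x0, mu, sigma, n_iter_ou=50, dt=0.02, ampli=1.0, rng=None):
    """Euler-Maruyama steps of dX = -(X - mu) dt + sqrt(2) * L dW, L L^T = sigma."""
    rng = np.random.default_rng(rng)
    chol = np.linalg.cholesky(sigma)
    x = np.array(x0, dtype=float)
    for _ in range(n_iter_ou):
        # Correlated Gaussian noise with covariance sigma, one draw per row.
        noise = rng.standard_normal(x.shape) @ chol.T
        # Drift pulls x back to mu; the noise term keeps the spread near sigma.
        x = x - (x - mu) * dt + ampli * np.sqrt(2 * dt) * noise
    return x


mu = np.array([1.0, -1.0])
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])

# Run 5000 independent chains, all started at the mean.
x0 = np.tile(mu, (5000, 1))
samples = ou_sample(x0, mu, sigma, n_iter_ou=500, dt=0.02, rng=0)

emp_mean = samples.mean(axis=0)
emp_cov = np.cov(samples, rowvar=False)
```

In this sketch the chain forgets its starting point over a time of order 1, that is about 1/dt iterations, so a small `n_iter_ou` leaves the samples closer to their initial values. A smaller `dt` reduces the discretization bias but needs more iterations to mix.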

Examples using qolmat.imputations.imputers.ImputerEM

Benchmark for time series