qolmat.imputations.imputers.ImputerRegressor

class qolmat.imputations.imputers.ImputerRegressor(imputer_params: Tuple[str, ...] = ('handler_nan',), groups: Tuple[str, ...] = (), estimator: Optional[BaseEstimator] = None, handler_nan: str = 'row', random_state: Optional[Union[int, RandomState]] = None)

Regressor imputer.

This class implements a regression imputer for the multivariate case. Each column is imputed with a single fit-predict of the given estimator, using as features the columns that have no missing values.
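The per-column fit-predict loop described above can be sketched with plain pandas and scikit-learn. This is an illustrative sketch, not qolmat's implementation: the function name `impute_by_regression` is hypothetical, `LinearRegression` stands in for any estimator, and the `groups`, `handler_nan` and `random_state` options are ignored. Features are restricted to the fully observed columns, as the description states.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def impute_by_regression(df: pd.DataFrame, estimator=None) -> pd.DataFrame:
    """Hypothetical sketch: impute each column with one fit-predict,
    using only the columns that have no missing values as features."""
    estimator = LinearRegression() if estimator is None else estimator
    df_imputed = df.copy()
    # Features: columns that are fully observed in the input
    complete_cols = [c for c in df.columns if df[c].notna().all()]
    if not complete_cols:
        return df_imputed  # nothing to regress on in this simplified sketch
    X = df[complete_cols]
    for col in df.columns:
        is_na = df[col].isna()
        if not is_na.any():
            continue  # column already complete, nothing to impute
        model = clone(estimator)  # fresh, unfitted copy for each column
        # Single fit on the rows where `col` is observed ...
        model.fit(X[~is_na], df.loc[~is_na, col])
        # ... and a single predict on the rows where it is missing
        df_imputed.loc[is_na, col] = model.predict(X[is_na])
    return df_imputed


df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, np.nan, 8.0]})
result = impute_by_regression(df)
print(result)
```

Because `b = 2a` holds on every observed row, the missing `b` in row 2 is predicted as 6.0, up to floating-point error.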

Parameters
groups : Tuple[str, ...]

Column names to group by, by default ()

estimator : BaseEstimator, optional

Estimator used to impute a column from the other columns, by default None

handler_nan : str

Strategy for handling missing values in the features; one of "fit", "row" or "column":

- if "fit", the estimator is assumed to be robust to missing values;
- if "row", all incomplete rows are removed from the training set and are not used for inference;
- if "column", all incomplete columns are ignored.

By default "row".

random_state : Union[int, RandomState], optional

Controls the randomness of the fit_transform, by default None
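The three `handler_nan` modes amount to different ways of building the feature matrix for the column being imputed. The helper below is a hypothetical sketch under that reading (`select_features` is not part of qolmat); it only shows which rows or columns each mode keeps, not how qolmat then fits or predicts.

```python
import numpy as np
import pandas as pd


def select_features(df: pd.DataFrame, col: str, handler_nan: str = "row") -> pd.DataFrame:
    """Hypothetical helper: feature matrix used to impute `col`
    under each `handler_nan` mode."""
    X = df.drop(columns=col)
    if handler_nan == "fit":
        # The estimator is assumed to handle NaNs itself: keep X as is
        return X
    if handler_nan == "row":
        # Discard every row with at least one missing feature
        return X.dropna(axis=0, how="any")
    if handler_nan == "column":
        # Discard every feature column with at least one missing value
        return X.dropna(axis=1, how="any")
    raise ValueError(f"handler_nan must be 'fit', 'row' or 'column', got {handler_nan!r}")


df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [1.0, np.nan, 3.0], "target": [np.nan, 5.0, 6.0]}
)
for mode in ("fit", "row", "column"):
    print(mode, select_features(df, "target", mode).shape)
```

Here "row" drops row 1 (missing `b`), "column" drops feature `b` entirely, and "fit" leaves the NaN for the estimator to deal with.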

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from qolmat.imputations import imputers
>>> from sklearn.ensemble import ExtraTreesRegressor
>>> imputer = imputers.ImputerRegressor(estimator=ExtraTreesRegressor())
>>> df = pd.DataFrame(
...     data=[
...         [1, 1, 1, 1],
...         [np.nan, np.nan, np.nan, np.nan],
...         [1, 2, 2, 5],
...         [2, 2, 2, 2],
...     ],
...     columns=["var1", "var2", "var3", "var4"],
... )
>>> imputer.fit_transform(df)
   var1  var2  var3  var4
0   1.0   1.0   1.0   1.0
1   1.0   2.0   2.0   2.0
2   1.0   2.0   2.0   5.0
3   2.0   2.0   2.0   2.0
__init__(imputer_params: Tuple[str, ...] = ('handler_nan',), groups: Tuple[str, ...] = (), estimator: Optional[BaseEstimator] = None, handler_nan: str = 'row', random_state: Optional[Union[int, RandomState]] = None)

get_Xy_valid(df: DataFrame, col: str) -> Tuple[DataFrame, Series]

Get a valid pair (X, y).

Parameters
df : pd.DataFrame

Input DataFrame

col : str

Column name.

Returns
Tuple[pd.DataFrame, pd.Series]

Valid X and y.

Raises
ValueError

_description_
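What counts as "valid" is not spelled out here, so the following is only a hypothetical sketch: it assumes a valid pair keeps the rows where the target `col` is observed and every other column is observed too, and it assumes `ValueError` is raised when no such row exists (the error condition is left undocumented above). The name `get_xy_valid_sketch` is illustrative, not qolmat's method.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def get_xy_valid_sketch(df: pd.DataFrame, col: str) -> Tuple[pd.DataFrame, pd.Series]:
    """Hypothetical sketch: keep rows where the target `col`
    and every feature are observed."""
    X = df.drop(columns=col)
    y = df[col]
    is_valid = y.notna() & X.notna().all(axis=1)
    if not is_valid.any():
        # Assumed error condition: no row can be used for training
        raise ValueError(f"No valid (X, y) rows to train an estimator for column {col!r}")
    return X.loc[is_valid], y.loc[is_valid]


df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0], "b": [1.0, 2.0, 3.0, np.nan], "c": [np.nan, 1.0, 2.0, 3.0]}
)
X, y = get_xy_valid_sketch(df, "c")
print(X.index.tolist(), y.tolist())
```

Only row 2 has `c`, `a` and `b` all observed, so it is the single row kept for training.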

Examples using qolmat.imputations.imputers.ImputerRegressor

Benchmark for categorical data
