qolmat.benchmark.missing_patterns.GroupedHoleGenerator

class qolmat.benchmark.missing_patterns.GroupedHoleGenerator(n_splits: int, subset: Optional[List[str]] = None, ratio_masked: float = 0.05, random_state: Optional[Union[int, RandomState]] = None, groups: Tuple[str, ...] = ())

GroupedHoleGenerator class.

This class generates holes in a dataframe. The holes are generated from groups, which the user specifies through the groups parameter.

Parameters
n_splits : int

Number of splits

subset : Optional[List[str]], optional

Names of the columns for which holes must be created, by default None

ratio_masked : float, optional

Ratio of values to mask, by default 0.05

random_state : int, RandomState instance or None, default=None

Controls the randomness. Pass an int for reproducible output across multiple function calls.

groups : Tuple[str, ...]

Names of the columns forming the groups, by default ()

__init__(n_splits: int, subset: Optional[List[str]] = None, ratio_masked: float = 0.05, random_state: Optional[Union[int, RandomState]] = None, groups: Tuple[str, ...] = ())
fit(X: DataFrame) -> GroupedHoleGenerator

Create the groups based on the column names (groups attribute).

Parameters
X : pd.DataFrame

Input dataframe

Returns
GroupedHoleGenerator

The model itself

Raises
An error is raised if the number of splits (n_splits) is greater than the number of groups.
split(X: DataFrame) -> List[DataFrame]
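
The condition under Raises can be pictured with a small, library-independent sketch. It is not qolmat's code: count_groups and check_n_splits are hypothetical helpers, rows are plain dicts rather than a DataFrame, and the use of ValueError is an assumption, since the exception type is not stated above.

```python
from typing import Dict, Sequence, Tuple


def count_groups(rows: Sequence[Dict[str, object]], groups: Tuple[str, ...]) -> int:
    """Count the distinct combinations of values in the grouping columns."""
    return len({tuple(row[name] for name in groups) for row in rows})


def check_n_splits(n_splits: int, n_groups: int) -> None:
    """Reject a configuration that asks for more splits than there are groups."""
    if n_splits > n_groups:
        # ValueError is an assumption; the documentation does not name the type.
        raise ValueError(
            f"n_splits={n_splits} exceeds the number of groups ({n_groups})."
        )


# Hypothetical data: groups are formed by the (station, year) pairs.
rows = [
    {"station": "A", "year": 2020, "temp": 11.2},
    {"station": "A", "year": 2021, "temp": 12.0},
    {"station": "B", "year": 2020, "temp": 9.5},
    {"station": "B", "year": 2020, "temp": 9.9},
]
n_groups = count_groups(rows, groups=("station", "year"))
check_n_splits(2, n_groups)  # passes silently: 2 splits, 3 groups
```

In this sketch, asking for four splits on the same data would raise, because only three distinct (station, year) groups exist.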

Create masked dataframes.

Parameters
X : pd.DataFrame

Input dataframe

Returns
List[pd.DataFrame]

List of masks, one per split
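
As a rough illustration of what "holes generated from groups" can mean, the sketch below builds one boolean row mask per split by hiding whole groups until roughly ratio_masked of the rows are covered. This is an assumption-laden toy, not the library's algorithm: grouped_masks is a hypothetical function, rows are plain dicts, masks are per row rather than per cell, and the subset parameter is ignored.

```python
import random
from typing import Dict, List, Sequence, Tuple


def grouped_masks(
    rows: Sequence[Dict[str, object]],
    groups: Tuple[str, ...],
    n_splits: int,
    ratio_masked: float,
    seed: int = 0,
) -> List[List[bool]]:
    """Return n_splits row masks; True marks a row whose values would be hidden."""
    rng = random.Random(seed)
    # Group key of every row, in row order.
    keys = [tuple(row[name] for name in groups) for row in rows]
    # Distinct keys, in order of first appearance (deterministic).
    unique_keys = list(dict.fromkeys(keys))
    target = ratio_masked * len(rows)

    masks = []
    for _ in range(n_splits):
        shuffled = unique_keys[:]
        rng.shuffle(shuffled)
        hidden = set()
        n_hidden = 0
        # Hide whole groups until the requested share of rows is reached.
        for key in shuffled:
            if n_hidden >= target:
                break
            hidden.add(key)
            n_hidden += keys.count(key)
        masks.append([key in hidden for key in keys])
    return masks


# Hypothetical data: six rows falling into three groups of sizes 2, 3 and 1.
example_rows = [{"g": label} for label in "aabbbc"]
example_masks = grouped_masks(example_rows, ("g",), n_splits=3, ratio_masked=0.3)
```

Because groups are hidden as a unit, the share of masked rows in such a scheme can exceed ratio_masked when groups are large; the real class may handle this differently.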

Examples using qolmat.benchmark.missing_patterns.GroupedHoleGenerator

Tutorial for hole generation in tabular data