qolmat.benchmark.missing_patterns.EmpiricalHoleGenerator

class qolmat.benchmark.missing_patterns.EmpiricalHoleGenerator(n_splits: int, subset: Optional[List[str]] = None, ratio_masked: float = 0.05, random_state: Optional[Union[int, RandomState]] = None, groups: Tuple[str, ...] = ())

EmpiricalHoleGenerator class.

This class generates holes (missing values) in a dataframe. The distribution of hole sizes is learned from the data, column by column.

Parameters
n_splits : int

Number of splits

subset : Optional[List[str]], optional

Names of the columns for which holes must be created, by default None

ratio_masked : float, optional

Ratio of masked values to add, by default 0.05.

random_state : int, RandomState instance or None, default=None

Controls the randomness. Pass an int for reproducible output across multiple function calls.

groups : Tuple[str, ...], optional

Column names used to group the data

__init__(n_splits: int, subset: Optional[List[str]] = None, ratio_masked: float = 0.05, random_state: Optional[Union[int, RandomState]] = None, groups: Tuple[str, ...] = ())

compute_distribution_holes(states: Series) -> Series

Compute the hole distribution.

Parameters
states : pd.Series

Series of states.

Returns
pd.Series

hole distribution
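
The idea of a hole distribution can be illustrated with a minimal sketch. It is not the library's implementation: it uses a plain Python sequence instead of a pd.Series, the helper name `hole_size_counts` is hypothetical, and it assumes that a state of True marks a missing value and that a hole is a maximal run of consecutive missing values.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, Sequence


def hole_size_counts(states: Sequence[bool]) -> Dict[int, int]:
    """Count how many holes of each length appear in `states`.

    Assumption: states[i] is True when the i-th value is missing, and a
    hole is a maximal run of consecutive True values.
    """
    sizes = [
        sum(1 for _ in run)  # length of the current run
        for is_missing, run in groupby(states)
        if is_missing  # keep only the runs of missing values
    ]
    # Map each hole size to the number of holes of that size.
    return dict(Counter(sizes))


# Three holes: two of length 2 and one of length 1.
states = [False, True, True, False, True, False, False, True, True]
counts = hole_size_counts(states)
```

Here `counts` maps each observed hole size to its number of occurrences. Dividing the counts by their sum would give an empirical probability for each size.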

fit(X: DataFrame) -> EmpiricalHoleGenerator

Compute the hole sizes of a dataframe.

Dataframe df has only one column.

Parameters
X : pd.DataFrame

data with holes

Returns
EmpiricalHoleGenerator

The model itself

sample_sizes(column, n_masked)

Create missing data based on the hole size distribution.

Parameters
column : str

name of the column to fill with holes

n_masked : int

number of masks

Returns
samples_sizes : List[int]
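
One way to draw hole sizes from an empirical distribution is sketched below. This is an illustration under stated assumptions, not the method used by the library: the function `sample_hole_sizes` and its `size_counts` and `rng` arguments are hypothetical, sizes are drawn with probability proportional to their observed counts, and the last hole is truncated so that the sizes sum exactly to `n_masked`.

```python
import random
from typing import Dict, List


def sample_hole_sizes(
    size_counts: Dict[int, int], n_masked: int, rng: random.Random
) -> List[int]:
    """Draw hole sizes until they sum to `n_masked`.

    Assumption: `size_counts` maps each observed hole size (>= 1) to the
    number of times it was observed.
    """
    if n_masked <= 0:
        return []
    if not size_counts or any(size < 1 for size in size_counts):
        raise ValueError("size_counts must contain only hole sizes >= 1")

    sizes = list(size_counts)
    weights = [size_counts[size] for size in sizes]

    sampled: List[int] = []
    remaining = n_masked
    while remaining > 0:
        # Draw one size with probability proportional to its count.
        size = rng.choices(sizes, weights=weights, k=1)[0]
        # Truncate the last hole so the total equals n_masked exactly.
        size = min(size, remaining)
        sampled.append(size)
        remaining -= size
    return sampled


# A seeded generator makes the draw reproducible.
sampled_sizes = sample_hole_sizes({1: 1, 2: 2}, n_masked=10, rng=random.Random(0))
```

Passing a seeded `random.Random` plays the same role as `random_state` in the class: two calls with the same seed return the same list of sizes.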

Examples using qolmat.benchmark.missing_patterns.EmpiricalHoleGenerator

Benchmark for time series

Tutorial for hole generation in tabular data
