qolmat.benchmark.missing_patterns.MultiMarkovHoleGenerator
- class qolmat.benchmark.missing_patterns.MultiMarkovHoleGenerator(n_splits: int, subset: Optional[List[str]] = None, ratio_masked: float = 0.05, random_state: Optional[Union[int, RandomState]] = None, groups: Tuple[str, ...] = ())
MultiMarkovHoleGenerator class.
This class generates holes (missing values) in a dataframe according to a Markov process. Each row of the dataframe mask (the pattern of np.nan entries) represents a state of the Markov chain.
- Parameters
- n_splits : int
Number of splits
- subset : Optional[List[str]], optional
Names of the columns for which holes must be created, by default None
- ratio_masked : float, optional
Ratio of masked values to add, by default 0.05
- random_state : int, RandomState instance or None, default=None
Controls the randomness. Pass an int for reproducible output across multiple function calls.
- groups : Tuple[str, ...]
Column names used to group the data
- __init__(n_splits: int, subset: Optional[List[str]] = None, ratio_masked: float = 0.05, random_state: Optional[Union[int, RandomState]] = None, groups: Tuple[str, ...] = ())
- fit(X: DataFrame) → MultiMarkovHoleGenerator
Get the transition matrix.
Get the transition matrix from a list of states. The transition matrix is a stochastic matrix with the current state in the index and the next state in the columns. Within a state, 1 denotes a missing value.
- Parameters
- X : pd.DataFrame
input dataframe
- Returns
- MultiMarkovHoleGenerator
The model itself
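The idea of a transition matrix over mask states can be illustrated with a minimal sketch. It is not qolmat's implementation: the helper `estimate_transitions` and the toy `mask_rows` are hypothetical, and each state is assumed to be one row of the mask stored as a tuple of booleans (True meaning missing). Transitions between consecutive rows are counted, then each row of counts is normalised to sum to 1.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One row of the mask: True means the value is missing (hypothetical encoding).
State = Tuple[bool, ...]


def estimate_transitions(mask_rows: List[State]) -> Dict[State, Dict[State, float]]:
    """Estimate P(next state | current state) from consecutive mask rows."""
    counts: Dict[State, Counter] = defaultdict(Counter)
    # Count each observed transition between row i and row i + 1.
    for current, nxt in zip(mask_rows, mask_rows[1:]):
        counts[current][nxt] += 1
    # Normalise each row of counts so that it sums to 1 (stochastic matrix).
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


# Toy mask with two columns; each tuple is the missingness pattern of one row.
mask_rows = [
    (False, False),
    (False, False),
    (True, False),
    (True, False),
    (False, False),
    (True, True),
    (False, False),
]
transitions = estimate_transitions(mask_rows)
```

Here the outer keys play the role of the index (current state) and the inner keys the role of the columns (next state).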
- generate_mask(X: DataFrame) → List[DataFrame]
Create missing data in an array-like object based on a Markov chain.
States of the Markov chain are the different masks of missing values: there are at most pow(2, X.shape[1]) possible states.
- Parameters
- X : pd.DataFrame
initial dataframe with missing (true) entries
- Returns
- Dict[str, pd.DataFrame]
the initial dataframe, the dataframe with additional missing entries and the created mask
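As a rough illustration of generating a mask from a Markov chain over row states, the sketch below enumerates the 2 ** n possible row masks and samples a sequence of states. It does not reproduce qolmat's code: `all_states`, `sample_states`, and the transition probabilities are hypothetical and chosen only for the example.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

# One row of the mask: True means the value is missing (hypothetical encoding).
State = Tuple[bool, ...]


def all_states(n_columns: int) -> List[State]:
    """Enumerate every possible row mask: 2 ** n_columns states."""
    return list(product([False, True], repeat=n_columns))


def sample_states(
    transitions: Dict[State, Dict[State, float]],
    start: State,
    n_rows: int,
    rng: random.Random,
) -> List[State]:
    """Sample n_rows (>= 1) successive row masks, beginning with `start`."""
    states = [start]
    while len(states) < n_rows:
        row = transitions[states[-1]]
        candidates = list(row.keys())
        weights = list(row.values())
        # Draw the next state according to the current state's probabilities.
        states.append(rng.choices(candidates, weights=weights, k=1)[0])
    return states


# Hypothetical transition probabilities for a two-column dataframe.
transitions = {
    (False, False): {(False, False): 0.8, (True, False): 0.1, (True, True): 0.1},
    (True, False): {(True, False): 0.5, (False, False): 0.5},
    (True, True): {(False, False): 1.0},
}
rng = random.Random(0)  # fixed seed for reproducible sampling
mask_rows = sample_states(transitions, (False, False), 10, rng)
```

Each element of `mask_rows` is one row of the generated mask. Because a state persists with some probability, missing values tend to appear in consecutive runs rather than as isolated points.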