qolmat.imputations.imputers_pytorch.ImputerDiffusion

class qolmat.imputations.imputers_pytorch.ImputerDiffusion(model: str = 'TabDDPM', groups: Tuple[str, ...] = (), epochs: int = 100, batch_size: int = 100, x_valid: DataFrame = None, print_valid: bool = False, metrics_valid: Tuple[Callable, ...] = (metrics.mean_absolute_error, metrics.dist_wasserstein), round: int = 10, cols_imputed: Tuple[str, ...] = (), index_datetime: str = '', freq_str: str = '1D', random_state: Optional[Union[int, RandomState]] = None, num_noise_steps: int = 50, beta_start: float = 0.0001, beta_end: float = 0.02, lr: float = 0.001, ratio_masked: float = 0.1, dim_embedding: int = 128, dim_feedforward: int = 64, num_blocks: int = 1, nheads_feature: int = 5, nheads_time: int = 8, num_layers_transformer: int = 1, p_dropout: float = 0.0, num_sampling: int = 1, is_rolling: bool = False)

Imputer based on diffusion models.

This class inherits from the class _Imputer. It is a wrapper for imputers based on diffusion models.

__init__(model: str = 'TabDDPM', groups: Tuple[str, ...] = (), epochs: int = 100, batch_size: int = 100, x_valid: DataFrame = None, print_valid: bool = False, metrics_valid: Tuple[Callable, ...] = (metrics.mean_absolute_error, metrics.dist_wasserstein), round: int = 10, cols_imputed: Tuple[str, ...] = (), index_datetime: str = '', freq_str: str = '1D', random_state: Optional[Union[int, RandomState]] = None, num_noise_steps: int = 50, beta_start: float = 0.0001, beta_end: float = 0.02, lr: float = 0.001, ratio_masked: float = 0.1, dim_embedding: int = 128, dim_feedforward: int = 64, num_blocks: int = 1, nheads_feature: int = 5, nheads_time: int = 8, num_layers_transformer: int = 1, p_dropout: float = 0.0, num_sampling: int = 1, is_rolling: bool = False)

Init ImputerDiffusion.

Parameters
groupsTuple[str, …], optional

List of column names to group by, by default ()

modelstr

Name of the imputer based on diffusion models (e.g., TabDDPM, TsDDPM), by default TabDDPM

epochsint, optional

Number of epochs, by default 100

batch_sizeint, optional

Batch size, by default 100

x_validpd.DataFrame, optional

Dataframe for validation, by default None

print_validbool, optional

Print model performance after several epochs, by default False

metrics_validTuple[Callable, …], optional

Set of validation metrics, by default (metrics.mean_absolute_error, metrics.dist_wasserstein)

roundint, optional

Number of decimal places to round to, for better displaying model performance, by default 10

cols_imputedTuple[str, …], optional

Name of columns that need to be imputed, by default ()

index_datetimestr

Name of the datetime-like index. It is used to process time-series data in diffusion models such as TsDDPM, by default ''

freq_strstr

Frequency string of a pandas DateOffset. It is used to process time-series data in diffusion models such as TsDDPM, by default '1D'

random_stateRandomSetting, optional

Controls the randomness of fit_transform, by default None

num_noise_stepsint, optional

Number of noise steps, by default 50

beta_startfloat, optional

Lower bound of the range of beta (noise scale value), by default 1e-4

beta_endfloat, optional

Upper bound of the range of beta (noise scale value), by default 0.02

lrfloat, optional

Learning rate, by default 0.001

ratio_maskedfloat, optional

Ratio of values artificially set to NaN for training and validation, by default 0.1

dim_embeddingint, optional

Embedding dimension, by default 128

dim_feedforwardint, optional

Feedforward layer dimension in Transformers, by default 64

num_blocksint, optional

Number of residual blocks, by default 1

nheads_featureint, optional

Number of heads to encode feature-based context, by default 5

nheads_timeint, optional

Number of heads to encode time-based context, by default 8

num_layers_transformerint, optional

Number of transformer layers, by default 1

p_dropoutfloat, optional

Dropout probability, by default 0.0

num_samplingint, optional

Number of samples generated for each cell, by default 1

is_rollingbool, optional

Use pandas.DataFrame.rolling for preprocessing data, by default False
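
The parameters num_noise_steps, beta_start and beta_end together describe a noise schedule. The sketch below shows one common way such a schedule can be built, assuming the beta values are spaced linearly between the two bounds. This page does not state how the model actually spaces them, so treat the linear spacing as an assumption. The helper names linear_beta_schedule and cumulative_alphas are hypothetical and are not part of qolmat.

```python
from itertools import accumulate
from operator import mul


def linear_beta_schedule(num_noise_steps=50, beta_start=1e-4, beta_end=0.02):
    """Return num_noise_steps betas evenly spaced from beta_start to beta_end.

    Hypothetical helper: linear spacing is an assumption of this sketch.
    """
    if num_noise_steps < 1:
        raise ValueError("num_noise_steps must be at least 1")
    if num_noise_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_noise_steps - 1)
    return [beta_start + i * step for i in range(num_noise_steps)]


def cumulative_alphas(betas):
    """Return the running product of (1 - beta_t) over the noise steps.

    This quantity is often written alpha_bar_t in DDPM formulations.
    """
    return list(accumulate((1.0 - b for b in betas), mul))


# Defaults mirror the constructor defaults listed above.
betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
print(len(betas), betas[0], betas[-1])
print(alpha_bars[-1])
```

Under this assumption, a larger num_noise_steps or a larger beta_end drives the final running product closer to zero, meaning that more of the original signal is replaced by noise at the last step.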

Examples

>>> import numpy as np
>>> from qolmat.imputations.imputers_pytorch import ImputerDiffusion
>>>
>>> X = np.array(
...     [
...         [1, 1, 1, 1],
...         [np.nan, np.nan, 3, 2],
...         [1, 2, 2, 1],
...         [2, 2, 2, 2],
...     ]
... )
>>> imputer = ImputerDiffusion(epochs=50, batch_size=1, random_state=11)
>>>
>>> df_imputed = imputer.fit_transform(X)
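
To illustrate what ratio_masked refers to, the following sketch hides a share of the observed cells of the same array so that a model could be trained and scored on values whose true content is known. It uses only the standard library. The helper mask_observed_cells is hypothetical, and the way qolmat actually draws its mask is not described on this page, so uniform sampling without replacement is an assumption.

```python
import math
import random


def mask_observed_cells(rows, ratio_masked=0.1, seed=0):
    """Return a copy of rows with a share of the observed cells set to NaN.

    Hypothetical helper for illustration only. It also returns the hidden
    positions so that predictions can later be compared with the originals.
    """
    rng = random.Random(seed)
    # Positions of the cells that are not already missing.
    observed = [
        (i, j)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if not math.isnan(value)
    ]
    n_hidden = round(ratio_masked * len(observed))
    hidden = rng.sample(observed, n_hidden)
    # Copy the rows so that the input is left untouched.
    masked = [list(row) for row in rows]
    for i, j in hidden:
        masked[i][j] = math.nan
    return masked, hidden


X = [
    [1, 1, 1, 1],
    [math.nan, math.nan, 3, 2],
    [1, 2, 2, 1],
    [2, 2, 2, 2],
]
X_masked, hidden = mask_observed_cells(X, ratio_masked=0.1, seed=11)
print(hidden)
```

With 14 observed cells and ratio_masked=0.1, this sketch hides a single cell. On data this small, a higher ratio would be needed to obtain more than one artificial gap.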

get_model() -> TabDDPM

Get the underlying model of the imputer based on its attributes.

Returns
ddpms.TabDDPM

TabDDPM model to be used in the fit and transform methods.

get_params_model() -> dict

Get parameters for creating a DDPM model.

Returns
dict

A dictionary containing the parameters required to create a model of type TabDDPM or TsDDPM.

get_summary_architecture() -> Dict

Get the summary of the architecture.

Returns
Dict

Summary of the architecture

get_summary_training() -> Dict

Get the summary of the training.

Returns
Dict

Summary of the training

Examples using qolmat.imputations.imputers_pytorch.ImputerDiffusion

Tutorial for imputers based on diffusion models