qolmat.imputations.diffusions.ddpms.TabDDPM

class qolmat.imputations.diffusions.ddpms.TabDDPM(num_noise_steps: int = 50, beta_start: float = 0.0001, beta_end: float = 0.02, lr: float = 0.001, ratio_masked: float = 0.1, dim_embedding: int = 128, num_blocks: int = 1, p_dropout: float = 0.0, num_sampling: int = 1, is_clip: bool = True, random_state: Optional[Union[int, RandomState]] = None)

Tab DDPM.

Diffusion model for tabular data based on Denoising Diffusion Probabilistic Models (DDPM) by Ho et al., 2020 (https://arxiv.org/abs/2006.11239) and Tashiro et al., 2021 (https://arxiv.org/abs/2107.03502). This implementation follows those found in https://github.com/quickgrid/pytorch-diffusion/tree/main and https://github.com/ermongroup/CSDI/tree/main
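
As a rough illustration of the DDPM forward (noising) process described by Ho et al., the sketch below builds a noise schedule from `num_noise_steps`, `beta_start`, and `beta_end`, then noises a single value in closed form. It assumes linearly spaced betas, as in the original paper. The helper names (`linear_beta_schedule`, `cumulative_alphas`, `noise_value`) are hypothetical, and this is not qolmat's actual code.

```python
import math
import random


def linear_beta_schedule(num_noise_steps=50, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced betas from beta_start to beta_end (assumed linear schedule)."""
    if num_noise_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_noise_steps - 1)
    return [beta_start + i * step for i in range(num_noise_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_value(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in closed form for a scalar x0."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps


betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
# Noise a toy value at the last step of the schedule.
x_noisy = noise_value(1.5, t=len(betas) - 1, alpha_bars=alpha_bars, rng=rng)
```

With the default values, the share of the original signal kept (`alpha_bars[t]`) shrinks at every step, so later steps are noisier than earlier ones.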

__init__(num_noise_steps: int = 50, beta_start: float = 0.0001, beta_end: float = 0.02, lr: float = 0.001, ratio_masked: float = 0.1, dim_embedding: int = 128, num_blocks: int = 1, p_dropout: float = 0.0, num_sampling: int = 1, is_clip: bool = True, random_state: Optional[Union[int, RandomState]] = None)

Init function.

Parameters
num_noise_steps : int, optional

Number of noise steps, by default 50

beta_start : float, optional

Lower bound of the beta (noise scale) range, by default 1e-4

beta_end : float, optional

Upper bound of the beta (noise scale) range, by default 0.02

lr : float, optional

Learning rate, by default 0.001

ratio_masked : float, optional

Ratio of values artificially masked as NaN for training and validation, by default 0.1

dim_embedding : int, optional

Embedding dimension, by default 128

num_blocks : int, optional

Number of residual blocks in the epsilon model, by default 1

p_dropout : float, optional

Dropout probability, by default 0.0

num_sampling : int, optional

Number of samples generated for each cell, by default 1

is_clip : bool, optional

Whether values are clipped, by default True

random_state : int, RandomState instance or None, default=None

Controls the randomness. Pass an int for reproducible output across multiple function calls.
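
To make `ratio_masked` and `random_state` more concrete, here is a minimal sketch of hiding a fraction of the observed cells so a model can be trained and validated against known values. It only illustrates the idea. `mask_observed` is a hypothetical helper, and qolmat's own masking strategy may differ.

```python
import math
import random


def mask_observed(rows, ratio_masked=0.1, seed=None):
    """Hide a fraction of the observed cells (hypothetical helper).

    Returns a masked copy of `rows` and the sorted (row, col) positions hidden.
    """
    rng = random.Random(seed)
    # Only cells that are actually observed can be hidden artificially.
    observed = [
        (i, j)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if not math.isnan(value)
    ]
    n_hidden = round(ratio_masked * len(observed))
    hidden = sorted(rng.sample(observed, n_hidden))
    masked = [list(row) for row in rows]  # copy, so the input stays untouched
    for i, j in hidden:
        masked[i][j] = math.nan
    return masked, hidden


data = [[float(i + j) for j in range(4)] for i in range(10)]
data[0][0] = math.nan  # one genuinely missing cell
masked, hidden = mask_observed(data, ratio_masked=0.1, seed=42)
```

Passing the same integer seed yields the same hidden positions on every call, which is the kind of reproducibility an integer `random_state` is meant to provide.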

fit(x: pd.DataFrame, epochs: int = 10, batch_size: int = 100, print_valid: bool = False, x_valid: pd.DataFrame = None, metrics_valid: Tuple[Callable, ...] = (mean_absolute_error, dist_wasserstein), round: int = 10, cols_imputed: Tuple[str, ...] = ()) → TabDDPM

Fit data.

Parameters
x : pd.DataFrame

Input dataframe

epochs : int, optional

Number of epochs, by default 10

batch_size : int, optional

Batch size, by default 100

print_valid : bool, optional

Print model performance after several epochs, by default False

x_valid : pd.DataFrame, optional

Dataframe for validation, by default None

metrics_valid : Tuple[Callable, ...], optional

Set of validation metrics, by default (metrics.mean_absolute_error, metrics.dist_wasserstein)

round : int, optional

Number of decimal places to round to when displaying model performance, by default 10

cols_imputed : Tuple[str, ...], optional

Names of the columns to impute, by default ()

Returns
Self

The fitted TabDDPM instance

Raises
ValueError

If the batch size is larger than the data size
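
The batch-size constraint above can be sketched as a guard that runs before the rows are split into mini-batches. `make_batches` is a hypothetical helper written for illustration; it is not a qolmat function, and the library's own batching logic may differ.

```python
def make_batches(n_rows, batch_size=100):
    """Split row indices into consecutive mini-batches (hypothetical helper)."""
    if batch_size > n_rows:
        # Mirrors the documented failure mode of fit().
        raise ValueError(
            f"Batch size {batch_size} is larger than data size {n_rows}."
        )
    return [
        range(start, min(start + batch_size, n_rows))
        for start in range(0, n_rows, batch_size)
    ]


batches = make_batches(250, batch_size=100)
batch_lengths = [len(batch) for batch in batches]
```

In this sketch the last batch may be smaller than `batch_size`, while a dataset with fewer rows than `batch_size` is rejected outright.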

get_num_params() → int

Compute the number of parameters of the underlying model.

Returns
int

Number of parameters if the model has been fitted, 0 otherwise.
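
As an illustration of what such a count involves, the sketch below sums the number of entries across a list of parameter shapes and returns 0 when no model exists yet. `count_params` and the example shapes are hypothetical. For a PyTorch module, the usual equivalent is `sum(p.numel() for p in model.parameters())`.

```python
from math import prod


def count_params(shapes):
    """Total number of scalar parameters, or 0 if there is no model yet."""
    if shapes is None:
        return 0
    return sum(prod(shape) for shape in shapes)


# Hypothetical linear layer mapping 128 features to 128: weight plus bias.
layer_shapes = [(128, 128), (128,)]
n_params = count_params(layer_shapes)
n_params_unfitted = count_params(None)
```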

predict(x: DataFrame) → DataFrame

Predict/impute data.

Parameters
x : pd.DataFrame

Data to be imputed

Returns
pd.DataFrame

Imputed data

Examples using qolmat.imputations.diffusions.ddpms.TabDDPM

Tutorial for imputers based on diffusion models
