qolmat.imputations.diffusions.ddpms.TsDDPM

class qolmat.imputations.diffusions.ddpms.TsDDPM(num_noise_steps: int = 50, beta_start: float = 0.0001, beta_end: float = 0.02, lr: float = 0.001, ratio_masked: float = 0.1, dim_embedding: int = 128, dim_feedforward: int = 64, num_blocks: int = 1, nheads_feature: int = 5, nheads_time: int = 8, num_layers_transformer: int = 1, p_dropout: float = 0.0, num_sampling: int = 1, is_rolling: bool = False, random_state: Optional[Union[int, RandomState]] = None)

Time series DDPM.

Diffusion model for time-series data, based on the Denoising Diffusion Probabilistic Models (DDPMs) of Ho et al., 2020 (https://arxiv.org/abs/2006.11239) and the CSDI model of Tashiro et al., 2021 (https://arxiv.org/abs/2107.03502). The implementation follows https://github.com/quickgrid/pytorch-diffusion/tree/main and https://github.com/ermongroup/CSDI/tree/main

__init__(num_noise_steps: int = 50, beta_start: float = 0.0001, beta_end: float = 0.02, lr: float = 0.001, ratio_masked: float = 0.1, dim_embedding: int = 128, dim_feedforward: int = 64, num_blocks: int = 1, nheads_feature: int = 5, nheads_time: int = 8, num_layers_transformer: int = 1, p_dropout: float = 0.0, num_sampling: int = 1, is_rolling: bool = False, random_state: Optional[Union[int, RandomState]] = None)

Init function.

Parameters
num_noise_steps : int, optional

Number of noise steps, by default 50

beta_start : float, optional

Start of the beta range (noise scale value), by default 1e-4

beta_end : float, optional

End of the beta range (noise scale value), by default 0.02

lr : float, optional

Learning rate, by default 0.001

ratio_masked : float, optional

Ratio of artificially masked (NaN) values used for training and validation, by default 0.1

dim_embedding : int, optional

Embedding dimension, by default 128

dim_feedforward : int, optional

Feedforward layer dimension in Transformers, by default 64

num_blocks : int, optional

Number of residual blocks, by default 1

nheads_feature : int, optional

Number of heads to encode feature-based context, by default 5

nheads_time : int, optional

Number of heads to encode time-based context, by default 8

num_layers_transformer : int, optional

Number of transformer layers, by default 1

p_dropout : float, optional

Dropout probability, by default 0.0

num_sampling : int, optional

Number of samples generated for each cell, by default 1

is_rolling : bool, optional

Use pandas.DataFrame.rolling for preprocessing data, by default False

random_state : int, RandomState instance or None, default=None

Controls the randomness. Pass an int for reproducible output across multiple function calls.
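
The roles of num_noise_steps, beta_start and beta_end can be illustrated with a small standard-library sketch. It assumes a linear beta schedule and the closed-form forward noising step of Ho et al., 2020; whether TsDDPM builds its schedule exactly this way is an assumption, and the helper names linear_beta_schedule, cumulative_alphas and add_noise are hypothetical, not part of qolmat.

```python
import math
import random


def linear_beta_schedule(num_noise_steps=50, beta_start=1e-4, beta_end=0.02):
    """Return betas spaced evenly from beta_start to beta_end (assumed linear)."""
    if num_noise_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_noise_steps - 1)
    return [beta_start + i * step for i in range(num_noise_steps)]


def cumulative_alphas(betas):
    """Return alpha_hat_t, the running product of (1 - beta_s) up to step t."""
    alpha_hats, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_hats.append(running)
    return alpha_hats


def add_noise(x0, t, alpha_hats, rng):
    """Sample x_t = sqrt(alpha_hat_t) * x0 + sqrt(1 - alpha_hat_t) * eps."""
    eps = rng.gauss(0.0, 1.0)
    x_t = math.sqrt(alpha_hats[t]) * x0 + math.sqrt(1.0 - alpha_hats[t]) * eps
    return x_t, eps


# Hypothetical illustration using the documented default values.
betas = linear_beta_schedule()
alpha_hats = cumulative_alphas(betas)
rng = random.Random(0)
x_t, eps = add_noise(1.5, t=len(betas) - 1, alpha_hats=alpha_hats, rng=rng)
```

Under these assumptions, a larger beta_end or more noise steps drives alpha_hat closer to zero at the last step, so the final noised value retains less of the original signal.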

fit(x: pandas.DataFrame, epochs: int = 10, batch_size: int = 100, print_valid: bool = False, x_valid: pandas.DataFrame = None, metrics_valid: Tuple[Callable, ...] = (mean_absolute_error, dist_wasserstein), round: int = 10, cols_imputed: Tuple[str, ...] = (), index_datetime: str = '', freq_str: str = '1D') -> TsDDPM

Fit data.

Parameters
x : pd.DataFrame

Input dataframe

epochs : int, optional

Number of epochs, by default 10

batch_size : int, optional

Batch size, by default 100

print_valid : bool, optional

Print model performance after several epochs, by default False

x_valid : pd.DataFrame, optional

Dataframe for validation, by default None

metrics_valid : Tuple[Callable, …], optional

Set of validation metrics, by default (metrics.mean_absolute_error, metrics.dist_wasserstein)

round : int, optional

Number of decimal places to round to, by default 10

cols_imputed : Tuple[str, …], optional

Name of columns that need to be imputed, by default ()

index_datetime : str

Name of the datetime-like index, by default ''

freq_str : str

pandas DateOffset frequency string, by default '1D'

Returns
TsDDPM

The fitted instance (self)

Raises
ValueError

If the batch size is larger than the data size
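
Two of the behaviours described above, artificial masking controlled by ratio_masked and the ValueError raised for an oversized batch_size, can be sketched in plain Python. This is only an illustration under stated assumptions: the helpers mask_observed and iter_batches are hypothetical, and how TsDDPM actually selects masked cells or splits batches is not specified here.

```python
import math
import random


def mask_observed(rows, ratio_masked=0.1, seed=0):
    """Hide a fraction of the observed cells; return masked rows and hidden positions."""
    rng = random.Random(seed)
    # Only cells that currently hold a value are candidates for masking.
    observed = [
        (i, j)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if not math.isnan(value)
    ]
    n_hidden = int(len(observed) * ratio_masked)
    hidden = set(rng.sample(observed, n_hidden))
    masked = [
        [math.nan if (i, j) in hidden else value for j, value in enumerate(row)]
        for i, row in enumerate(rows)
    ]
    return masked, hidden


def iter_batches(rows, batch_size=100):
    """Yield consecutive batches; fail if the batch size exceeds the data size."""
    # The check runs when iteration starts, because this is a generator.
    if batch_size > len(rows):
        raise ValueError("Batch size is larger than data size")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Hypothetical data: 20 rows, 2 columns, one genuinely missing value.
rows = [[float(i), float(i) * 2.0] for i in range(20)]
rows[3][1] = math.nan
masked, hidden = mask_observed(rows, ratio_masked=0.1)
batches = list(iter_batches(masked, batch_size=8))
```

The hidden positions keep their true values in rows, so a model's reconstruction of those cells can be scored against them, which is the purpose of masking a share of the data for training and validation.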

Examples using qolmat.imputations.diffusions.ddpms.TsDDPM

Tutorial for imputers based on diffusion models