qolmat.imputations.diffusions.ddpms.TsDDPM
- class qolmat.imputations.diffusions.ddpms.TsDDPM(num_noise_steps: int = 50, beta_start: float = 0.0001, beta_end: float = 0.02, lr: float = 0.001, ratio_masked: float = 0.1, dim_embedding: int = 128, dim_feedforward: int = 64, num_blocks: int = 1, nheads_feature: int = 5, nheads_time: int = 8, num_layers_transformer: int = 1, p_dropout: float = 0.0, num_sampling: int = 1, is_rolling: bool = False, random_state: Optional[Union[int, RandomState]] = None)
Time series DDPM.
Diffusion model for time-series imputation, based on the Denoising Diffusion Probabilistic Models (DDPMs) of Ho et al., 2020 (https://arxiv.org/abs/2006.11239) and the Conditional Score-based Diffusion model for Imputation (CSDI) of Tashiro et al., 2021 (https://arxiv.org/abs/2107.03502). This implementation follows https://github.com/quickgrid/pytorch-diffusion/tree/main and https://github.com/ermongroup/CSDI/tree/main.
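For intuition, the forward (noising) process of a DDPM can be sketched with NumPy. This is a minimal sketch, not qolmat's implementation: it assumes a linear schedule of num_noise_steps betas between beta_start and beta_end, and the helper names linear_beta_schedule and forward_noise are hypothetical. It uses the closed form from Ho et al., 2020: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the cumulative product of (1 - beta) up to step t.

```python
import numpy as np


def linear_beta_schedule(num_noise_steps=50, beta_start=1e-4, beta_end=0.02):
    """Hypothetical helper: linear beta schedule (an assumption) and alpha_bar."""
    betas = np.linspace(beta_start, beta_end, num_noise_steps)
    alphas = 1.0 - betas
    # alpha_bar_t = prod_{s <= t} (1 - beta_s), shrinking towards 0 as t grows
    alpha_bars = np.cumprod(alphas)
    return betas, alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Hypothetical helper: sample x_t ~ q(x_t | x_0) in closed form at step t."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps


rng = np.random.default_rng(0)
betas, alpha_bars = linear_beta_schedule()
x0 = rng.standard_normal((8, 3))  # toy series: 8 time steps, 3 features
x_t, eps = forward_noise(x0, t=49, alpha_bars=alpha_bars, rng=rng)
```

In Ho et al., 2020, a network is trained to predict eps from x_t and t, and new values are produced by running the learned reverse process; in the CSDI setting, that network is additionally conditioned on the observed values of the series.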
- __init__(num_noise_steps: int = 50, beta_start: float = 0.0001, beta_end: float = 0.02, lr: float = 0.001, ratio_masked: float = 0.1, dim_embedding: int = 128, dim_feedforward: int = 64, num_blocks: int = 1, nheads_feature: int = 5, nheads_time: int = 8, num_layers_transformer: int = 1, p_dropout: float = 0.0, num_sampling: int = 1, is_rolling: bool = False, random_state: Optional[Union[int, RandomState]] = None)
Initialize the model.
- Parameters
- num_noise_steps : int, optional
Number of noise (diffusion) steps, by default 50
- beta_start : float, optional
Lower bound of the range of beta (noise scale value), by default 1e-4
- beta_end : float, optional
Upper bound of the range of beta (noise scale value), by default 0.02
- lr : float, optional
Learning rate, by default 0.001
- ratio_masked : float, optional
Ratio of artificially masked values (NaN) used for training and validation, by default 0.1
- dim_embedding : int, optional
Embedding dimension, by default 128
- dim_feedforward : int, optional
Dimension of the feedforward layers in the Transformers, by default 64
- num_blocks : int, optional
Number of residual blocks, by default 1
- nheads_feature : int, optional
Number of attention heads used to encode feature-based context, by default 5
- nheads_time : int, optional
Number of attention heads used to encode time-based context, by default 8
- num_layers_transformer : int, optional
Number of Transformer layers, by default 1
- p_dropout : float, optional
Dropout probability, by default 0.0
- num_sampling : int, optional
Number of samples generated for each cell, by default 1
- is_rolling : bool, optional
Use pandas.DataFrame.rolling to preprocess the data, by default False
- random_state : int, RandomState instance or None, default=None
Controls the randomness. Pass an int for reproducible output across multiple function calls.
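What ratio_masked controls can be illustrated with self-supervised masking, the training strategy used by CSDI (Tashiro et al., 2021): a fraction of the observed cells is hidden and treated as imputation targets. This is a sketch under assumptions, not TsDDPM's code: the helper mask_observed is hypothetical, it hides exactly round(ratio_masked * n_observed) cells drawn uniformly at random, and it seeds a NumPy Generator instead of accepting a RandomState instance.

```python
import numpy as np
import pandas as pd


def mask_observed(df, ratio_masked=0.1, random_state=None):
    """Hypothetical helper: hide a fraction of the observed cells of df.

    Returns the masked dataframe and a boolean mask marking the hidden cells,
    which can then serve as training or validation targets.
    """
    rng = np.random.default_rng(random_state)  # seeds a NumPy Generator
    values = df.to_numpy(dtype=float, copy=True)
    observed = np.flatnonzero(~np.isnan(values))  # flat indices of non-NaN cells
    n_hidden = int(round(ratio_masked * observed.size))
    hidden = rng.choice(observed, size=n_hidden, replace=False)
    mask = np.zeros(values.size, dtype=bool)
    mask[hidden] = True
    mask = mask.reshape(values.shape)
    values[mask] = np.nan  # artificial NaN on top of the real missing values
    return pd.DataFrame(values, index=df.index, columns=df.columns), mask


df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0, 5.0], "b": [0.5, np.nan, 1.5, 2.0, 2.5]})
df_masked, mask = mask_observed(df, ratio_masked=0.25, random_state=0)
```

The original values at the hidden positions (df.to_numpy()[mask]) are the targets a model would be scored against, while the genuinely missing cells stay unknown.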
- fit(x: pandas.DataFrame, epochs: int = 10, batch_size: int = 100, print_valid: bool = False, x_valid: pandas.DataFrame = None, metrics_valid: Tuple[Callable, ...] = (metrics.mean_absolute_error, metrics.dist_wasserstein), round: int = 10, cols_imputed: Tuple[str, ...] = (), index_datetime: str = '', freq_str: str = '1D') -> TsDDPM
Fit the model to the data.
- Parameters
- x : pd.DataFrame
Input dataframe
- epochs : int, optional
Number of epochs, by default 10
- batch_size : int, optional
Batch size, by default 100
- print_valid : bool, optional
Print the model performance after every few epochs, by default False
- x_valid : pd.DataFrame, optional
Dataframe used for validation, by default None
- metrics_valid : Tuple[Callable, …], optional
Set of validation metrics, by default (metrics.mean_absolute_error, metrics.dist_wasserstein)
- round : int, optional
Number of decimal places to round to, by default 10
- cols_imputed : Tuple[str, …], optional
Names of the columns to impute, by default ()
- index_datetime : str, optional
Name of the datetime-like index, by default ''
- freq_str : str, optional
Pandas frequency string (a DateOffset alias such as '1D'), by default '1D'
- Returns
- Self
The fitted model (self).
- Raises
- ValueError
If the batch size is larger than the data size.
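The documented ValueError and the meaning of freq_str can be illustrated with pandas. In the hypothetical helper check_fit_inputs below, the batch-size check mirrors the Raises entry, and pandas.tseries.frequencies.to_offset shows how an alias such as '1D' maps to a DateOffset. How TsDDPM itself uses index_datetime and freq_str is not described on this page, so treat this as an illustration only.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def check_fit_inputs(df: pd.DataFrame, batch_size: int = 100, freq_str: str = "1D"):
    """Hypothetical checks mirroring the documented fit() inputs.

    Returns the DateOffset parsed from freq_str.
    """
    # Mirrors the documented ValueError: a batch cannot hold more rows than exist.
    if batch_size > len(df):
        raise ValueError(
            f"batch_size ({batch_size}) is larger than the data size ({len(df)})"
        )
    # to_offset parses a pandas frequency alias ('1D', '7D', ...) into a
    # DateOffset; it raises ValueError for an invalid alias.
    return to_offset(freq_str)


# Daily series with a datetime index named "date" and some missing values.
index = pd.date_range("2023-01-01", periods=6, freq="1D", name="date")
df = pd.DataFrame({"x": [1.0, None, 3.0, 4.0, None, 6.0]}, index=index)

offset = check_fit_inputs(df, batch_size=4, freq_str="1D")
print(offset == to_offset(pd.infer_freq(df.index)))  # True: the index is daily
```

Here the index is named "date", the kind of name index_datetime refers to, and pd.infer_freq recovers the daily frequency that '1D' denotes.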