qolmat.benchmark.metrics.kl_divergence

qolmat.benchmark.metrics.kl_divergence(df1: DataFrame, df2: DataFrame, df_mask: DataFrame, method: str = 'columnwise', min_n_rows: int = 10) → Series

Estimate the KL divergence.

Estimation of the Kullback-Leibler divergence between two empirical distributions. The following methods are implemented:
- columnwise, relying on uniform binning and taking only the marginals into account (https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence),
- gaussian, relying on a Gaussian approximation.
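
To make the columnwise idea concrete, here is a minimal sketch of a histogram-based estimate computed independently for each column. It is not qolmat's implementation: the function name kl_columnwise_sketch, the n_bins and eps parameters, and the smoothing scheme are assumptions chosen for illustration, and only NumPy and pandas are used.

```python
import numpy as np
import pandas as pd


def kl_columnwise_sketch(df1, df2, df_mask, n_bins=20, eps=1e-10):
    """Histogram-based KL(df1 || df2), one value per column (illustrative only)."""
    result = {}
    for col in df1.columns:
        # Keep only the entries selected by the mask for this column.
        x = df1.loc[df_mask[col], col].to_numpy(dtype=float)
        y = df2.loc[df_mask[col], col].to_numpy(dtype=float)
        # Shared, uniformly spaced bin edges covering both samples.
        edges = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), n_bins + 1)
        p, _ = np.histogram(x, bins=edges)
        q, _ = np.histogram(y, bins=edges)
        # Turn counts into probabilities; eps avoids log(0) and division by zero.
        p = (p + eps) / (p + eps).sum()
        q = (q + eps) / (q + eps).sum()
        result[col] = float(np.sum(p * np.log(p / q)))
    return pd.Series(result)


# Toy data: column "a" is identical in both frames, column "b" is shifted.
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"a": rng.normal(size=500), "b": rng.normal(size=500)})
df2 = pd.DataFrame({"a": df1["a"], "b": df1["b"] + 3.0})
df_mask = pd.DataFrame(True, index=df1.index, columns=df1.columns)
kl = kl_columnwise_sketch(df1, df2, df_mask)
```

Because each column is handled on its own, this kind of estimate ignores dependencies between columns, which is the trade-off implied by "only taking marginals into account".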

Parameters
df1: pd.DataFrame

First empirical distribution

df2: pd.DataFrame

Second empirical distribution

df_mask: pd.DataFrame

Mask indicating on which values the divergence should be computed

method: str

Method used to compute the divergence on multivariate datasets with missing values. Possible values are columnwise and gaussian.

min_n_rows: int

Minimum number of rows for a KL estimation

Returns
pd.Series

Kullback-Leibler divergence

Raises
AssertionError

If the empirical distributions do not have enough samples to estimate a KL divergence. Consider using a larger dataset or lowering the parameter min_n_rows.
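
A row minimum matters in particular for a Gaussian approximation, since the covariance estimate degrades when there are few samples. The sketch below fits a Gaussian to each frame and applies the standard closed-form KL divergence between two multivariate Gaussians, with an assertion that mimics the documented error. It is an illustration under stated assumptions, not qolmat's code: the name kl_gaussian_sketch is hypothetical, the mask is ignored, how rows are counted is a guess, and a single float is returned rather than a Series.

```python
import numpy as np
import pandas as pd


def kl_gaussian_sketch(df1, df2, min_n_rows=10):
    """KL(N1 || N2) for Gaussians fitted to df1 and df2 (illustrative only)."""
    # Assumed guard: refuse to estimate when either sample is too small.
    assert min(len(df1), len(df2)) >= min_n_rows, "not enough rows to estimate a KL divergence"
    k = df1.shape[1]
    mu1, mu2 = df1.mean().to_numpy(), df2.mean().to_numpy()
    # Sample covariances; atleast_2d keeps the single-column case as a 1x1 matrix.
    cov1 = np.atleast_2d(np.cov(df1.to_numpy(), rowvar=False))
    cov2 = np.atleast_2d(np.cov(df2.to_numpy(), rowvar=False))
    cov2_inv = np.linalg.inv(cov2)
    diff = mu2 - mu1
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    # 0.5 * [tr(S2^-1 S1) + (m2 - m1)^T S2^-1 (m2 - m1) - k + ln(det S2 / det S1)]
    return 0.5 * (np.trace(cov2_inv @ cov1) + diff @ cov2_inv @ diff - k + logdet2 - logdet1)


rng = np.random.default_rng(1)
g1 = pd.DataFrame(rng.normal(size=(200, 2)), columns=["a", "b"])
g2 = g1 + np.array([1.0, 0.0])  # same spread, mean of "a" shifted by 1

kl_same = kl_gaussian_sketch(g1, g1)
kl_shift = kl_gaussian_sketch(g1, g2)
```

Calling the sketch with fewer rows than min_n_rows, for example kl_gaussian_sketch(g1.head(3), g2.head(3)), triggers the AssertionError.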