qolmat.benchmark.metrics.kl_divergence
- qolmat.benchmark.metrics.kl_divergence(df1: DataFrame, df2: DataFrame, df_mask: DataFrame, method: str = 'columnwise', min_n_rows: int = 10) → Series
Estimate the KL divergence.
Estimation of the Kullback-Leibler divergence between two empirical distributions. Two methods are implemented:
- columnwise, relying on uniform binning and taking only the marginals into account (https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence),
- gaussian, relying on a Gaussian approximation.
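To make the two estimation strategies concrete, here is a minimal NumPy sketch of each idea. It is not qolmat's implementation: the function names, the `n_bins` and `eps` parameters, and the smoothing step are assumptions made for illustration, and the Gaussian variant simply applies the textbook closed-form KL divergence between two fitted multivariate normals.

```python
import numpy as np


def kl_columnwise_sketch(x, y, n_bins=10, eps=1e-10):
    """KL(P || Q) for two 1-D samples using shared, equally spaced bins.

    Hypothetical helper: n_bins and eps are illustrative choices.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Use the same uniform bin edges for both samples.
    edges = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), n_bins + 1)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    # Turn counts into probabilities; eps avoids log(0) on empty bins.
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def kl_gaussian_sketch(X, Y):
    """KL(N0 || N1) where N0 and N1 are Gaussians fitted to the rows of X and Y.

    Hypothetical helper: X and Y are (n_samples, n_features) arrays.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    k = X.shape[1]
    mu0, mu1 = X.mean(axis=0), Y.mean(axis=0)
    cov0 = np.asarray(np.cov(X, rowvar=False)).reshape(k, k)
    cov1 = np.asarray(np.cov(Y, rowvar=False)).reshape(k, k)
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # slogdet is numerically safer than log(det(...)).
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return float(
        0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - k + logdet1 - logdet0)
    )
```

In the columnwise spirit, the first helper would be applied to each column separately, which is why only marginal distributions are compared; the Gaussian variant instead accounts for correlations between columns through the covariance matrices.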
- Parameters
- df1: pd.DataFrame
First empirical distribution
- df2: pd.DataFrame
Second empirical distribution
- df_mask: pd.DataFrame
Mask indicating the values on which the divergence should be computed
- method: str
Method used to compute the divergence on multivariate datasets with missing values. Possible values are columnwise and gaussian.
- min_n_rows: int
Minimum number of rows for a KL estimation
- Returns
- pd.Series
Kullback-Leibler divergence
- Raises
- AssertionError
If the empirical distributions do not have enough samples to estimate a KL divergence. Consider using a larger dataset or lowering the parameter min_n_rows.
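The interaction between the mask and min_n_rows can be pictured with the following sketch. It is only an assumption about the general mechanism (keep the masked values, then check that enough remain); the helper name `select_masked_values` is hypothetical and the library's actual checks may differ.

```python
import numpy as np


def select_masked_values(values, mask, min_n_rows=10):
    """Keep the entries where mask is True and require at least min_n_rows of them.

    Hypothetical helper illustrating the documented AssertionError.
    """
    values = np.asarray(values, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    selected = values[mask]
    assert len(selected) >= min_n_rows, (
        f"Only {len(selected)} samples selected; at least {min_n_rows} are required."
    )
    return selected


values = np.arange(20.0)
mask = np.zeros(20, dtype=bool)
mask[:12] = True  # 12 masked entries: enough for the default min_n_rows=10
selected = select_masked_values(values, mask)
```

With fewer than min_n_rows masked entries, the same call would raise an AssertionError, which matches the remedy suggested above: supply more data or lower min_n_rows.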