`qolmat.imputations.rpca.rpca_noisy`.RpcaNoisy

class qolmat.imputations.rpca.rpca_noisy.RpcaNoisy(random_state: Optional[Union[int, RandomState]] = None, rank: Optional[int] = None, mu: Optional[float] = None, tau: Optional[float] = None, lam: Optional[float] = None, list_periods: List[int] = [], list_etas: List[float] = [], max_iterations: int = 10000, tolerance: float = 1e-06, norm: str = 'L2', verbose: bool = True)

Class for a noisy version of the so-called ‘improved RPCA’.

Parameters

random_state: Optional[Union[int, RandomState]]: The seed of the pseudo-random number generator to use, for reproducibility.
rank: Optional[int]: Upper bound on the rank to be estimated.
mu: Optional[float]: Initial stiffness parameter for the constraint M = L Q.
tau: Optional[float]: Penalizing parameter for the nuclear norm.
lam: Optional[float]: Penalizing parameter for the sparse matrix.
list_periods: List[int]: List of periods, linked to the Toeplitz matrices.
list_etas: List[float]: List of penalizing parameters, one for each period in list_periods.
max_iterations: int: Stopping criterion, maximum number of iterations. Defaults to 10_000.
tolerance: float: Stopping criterion, minimum difference between two consecutive iterations. Defaults to 1e-6.
norm: str: Error norm, either “L1” or “L2”. Defaults to “L2”.
verbose: bool: Verbosity level; if False, warnings are silenced. Defaults to True.
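Each entry of list_periods is paired with the entry of list_etas at the same position. As a hedged illustration of what a Toeplitz matrix attached to a period could look like, the sketch below builds a banded difference operator H of shape (m - p, m) with (H @ M)[i] = M[i + p] - M[i], so that penalizing H @ M favours low-rank signals that repeat every p rows. The helper `period_difference_matrix` is hypothetical and not part of qolmat; the exact matrices used by the library may differ.

```python
import numpy as np


def period_difference_matrix(period: int, n_rows: int) -> np.ndarray:
    """Hypothetical helper: H of shape (n_rows - period, n_rows) such that
    (H @ M)[i] = M[i + period] - M[i]. H is a banded (Toeplitz) matrix."""
    if not 0 < period < n_rows:
        raise ValueError("period must satisfy 0 < period < n_rows")
    H = np.zeros((n_rows - period, n_rows))
    idx = np.arange(n_rows - period)
    H[idx, idx] = -1.0           # coefficient of M[i]
    H[idx, idx + period] = 1.0   # coefficient of M[i + period]
    return H


# A signal with period 3 along the rows (time axis) is annihilated by H.
t = np.arange(12)
M = np.column_stack([np.sin(2 * np.pi * t / 3), np.cos(2 * np.pi * t / 3)])
H = period_difference_matrix(3, M.shape[0])
print(H.shape)                  # (9, 12)
print(np.allclose(H @ M, 0.0))  # True
```

Under this reading, a larger eta for a given period pushes the low-rank signal towards exact periodicity at that lag.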

References

Wang, Xuehui, et al. “An improved robust principal component analysis model for anomalies detection of subway passenger flow.” Journal of Advanced Transportation (2018).

Chen, Yuxin, et al. “Bridging convex and nonconvex optimization in robust PCA: Noise, outliers and missing data.” The Annals of Statistics 49.5 (2021): 2948-2971.

__init__(random_state: Optional[Union[int, RandomState]] = None, rank: Optional[int] = None, mu: Optional[float] = None, tau: Optional[float] = None, lam: Optional[float] = None, list_periods: List[int] = [], list_etas: List[float] = [], max_iterations: int = 10000, tolerance: float = 1e-06, norm: str = 'L2', verbose: bool = True) → None

static cost_function(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], M: ndarray[tuple[int, ...], dtype[_ScalarType_co]], A: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]], tau: float, lam: float, list_periods: List[int] = [], list_etas: List[float] = [], norm: str = 'L2')

Evaluate the cost function of the noisy RPCA algorithm.

Parameters

D: NDArray: Matrix of observations.
M: NDArray: Low-rank signal.
A: NDArray: Anomalies.
Omega: NDArray: Mask of the observed entries of D.
tau: float: Penalizing parameter for the nuclear norm.
lam: float: Penalizing parameter for the sparse matrix.
list_periods: List[int], optional: List of periods, linked to the Toeplitz matrices. Defaults to [].
list_etas: List[float], optional: List of penalizing parameters, one for each period in list_periods. Defaults to [].
norm: str, optional: Error norm, either “L1” or “L2”. Defaults to “L2”.

Returns

float: Value of the cost function minimized by the RPCA
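The parameter list above suggests an objective made of a masked data-fit term, a nuclear norm on M, an L1 norm on A and one temporal term per period. A minimal sketch, assuming the form 0.5 ||Omega * (D - M - A)||_F^2 + tau ||M||_* + lam ||A||_1 + sum_k eta_k ||M[p_k:] - M[:-p_k]||, where the last norm is the entrywise L1 norm or the Frobenius norm depending on `norm`. The name `noisy_rpca_cost` is hypothetical, and qolmat's actual weighting of each term may differ.

```python
import numpy as np


def noisy_rpca_cost(D, M, A, Omega, tau, lam, list_periods=(), list_etas=(), norm="L2"):
    """Sketch of an assumed noisy-RPCA objective (not qolmat's code)."""
    if norm not in ("L1", "L2"):
        raise ValueError("norm must be 'L1' or 'L2'")
    Omega = np.asarray(Omega, dtype=bool)
    residual = np.where(Omega, D - M - A, 0.0)  # unobserved (possibly NaN) entries dropped
    cost = 0.5 * np.sum(residual ** 2)
    cost += tau * np.linalg.norm(M, "nuc")      # sum of singular values of M
    cost += lam * np.sum(np.abs(A))             # sparsity of the anomalies
    for period, eta in zip(list_periods, list_etas):
        diff = M[period:] - M[:-period]         # row-wise difference at lag `period`
        if norm == "L1":
            cost += eta * np.sum(np.abs(diff))
        else:
            cost += eta * np.linalg.norm(diff, "fro")
    return float(cost)


D = np.array([[1.0, 2.0], [3.0, np.nan]])
Omega = ~np.isnan(D)
zeros = np.zeros_like(D)
print(noisy_rpca_cost(D, zeros, zeros, Omega, tau=1.0, lam=1.0))  # 7.0
```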

decompose(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Compute the noisy RPCA with L1 or L2 time penalisation.

Parameters

D: NDArray: Matrix of the observations
Omega: NDArray: Boolean mask of the observed entries of D

Returns

M: NDArray: Low-rank signal
A: NDArray: Anomalies
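One way to feed real data to decompose is to encode missing values as NaN and derive Omega from them. The sketch below assumes that Omega is True on observed entries (as the minimise_loss parameters describe) and shows one possible way to use the returned low-rank signal M to fill the gaps. `impute_from_low_rank` is a hypothetical helper, not a qolmat function, and the `M` array is a hand-written stand-in for the output of decompose.

```python
import numpy as np


def impute_from_low_rank(D: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Hypothetical helper: keep observed entries of D, fill NaNs from M."""
    Omega = ~np.isnan(D)  # True where D is observed
    return np.where(Omega, D, M)


D = np.array([[1.0, np.nan], [np.nan, 4.0]])
M = np.array([[1.1, 2.0], [3.0, 3.9]])  # stand-in for the low-rank signal
print(impute_from_low_rank(D, M))
# [[1. 2.]
#  [3. 4.]]
```

Keeping the observed values of D untouched is a design choice of this sketch; replacing them with M (or M + A) would instead denoise the observed entries as well.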

decompose_on_basis(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Q: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Decompose the matrix D, observed on the entries flagged by Omega.

It uses the noisy RPCA algorithm with a fixed reduced basis given by the matrix Q. This makes it possible to impute new data without solving the optimization problem again on the whole dataset.

Parameters

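D: NDArray: Matrix of the observations
Omega: NDArray: Boolean mask of the observed entries of D
Q: NDArray: Fixed reduced basis of the low-rank matrix, e.g. as returned by decompose_with_basis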

Returns

Tuple[NDArray, NDArray]: A tuple representing the decomposition of D, with:
- M: low-rank matrix
- A: sparse matrix
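To illustrate why a fixed basis avoids re-solving the whole problem, here is a deliberately simplified sketch: with Q fixed, each row of D only needs its own coefficient vector, fitted by least squares on that row's observed entries, and M = L Q. It ignores the anomalies A and every penalty (tau, lam, list_etas), so it is not the noisy RPCA update performed by decompose_on_basis; `project_on_basis` is a hypothetical name.

```python
import numpy as np


def project_on_basis(D: np.ndarray, Omega: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Simplified sketch: per-row least squares on observed entries, M = L @ Q.
    Ignores anomalies and all penalties."""
    m, rank = D.shape[0], Q.shape[0]
    L = np.zeros((m, rank))
    for i in range(m):
        obs = Omega[i]
        if obs.any():
            # Solve min_l ||D[i, obs] - l @ Q[:, obs]||^2 for the row coefficients.
            L[i], *_ = np.linalg.lstsq(Q[:, obs].T, D[i, obs], rcond=None)
    return L @ Q


rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 6))          # fixed basis: rank 2, 6 columns
D = rng.normal(size=(4, 2)) @ Q      # new rows lying in the span of Q
Omega = np.ones_like(D, dtype=bool)
Omega[0, :2] = False                 # two entries of row 0 are unobserved
M = project_on_basis(np.where(Omega, D, np.nan), Omega, Q)
print(np.allclose(M, D))             # True
```

Each row is solved independently, so the cost grows linearly with the number of new rows instead of requiring a new factorization of the full dataset.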

decompose_with_basis(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Compute the noisy RPCA with L1 or L2 time penalisation.

In addition to M and A, it returns the factors L and Q of the low-rank matrix, with M = L Q.

Parameters

D: NDArray: Matrix of the observations
Omega: NDArray: Boolean mask of the observed entries of D

Returns

M: NDArray: Low-rank signal
A: NDArray: Anomalies
L: NDArray: Coefficients of the low-rank matrix in the reduced basis
Q: NDArray: Reduced basis of the low-rank matrix
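decompose_with_basis returns the low-rank signal together with its factors, with L of shape (m, rank) and Q of shape (rank, n) as documented for minimise_loss. The sketch below obtains such a factorization from a truncated SVD, a swapped-in technique chosen for illustration (the library obtains L and Q through its own optimization loop); `low_rank_factors` is hypothetical.

```python
import numpy as np


def low_rank_factors(M: np.ndarray, rank: int):
    """Return L (m, rank) and Q (rank, n) such that L @ Q is the best
    rank-`rank` approximation of M; Q has orthonormal rows."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = U[:, :rank] * s[:rank]  # left singular vectors scaled by singular values
    Q = Vt[:rank]
    return L, Q


rng = np.random.default_rng(1)
M = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 5))  # exactly rank 2
L, Q = low_rank_factors(M, rank=2)
print(L.shape, Q.shape)                  # (8, 2) (2, 5)
print(np.allclose(L @ Q, M))             # True
print(np.allclose(Q @ Q.T, np.eye(2)))   # True
```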

get_params_scale(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Dict[str, float]

Get parameters for scaling in RPCA based on the input data.

Parameters

D: np.ndarray: Input data matrix of shape (m, n).

Returns

dict: A dictionary containing the following parameters:

“rank”: int
Rank estimate for the low-rank matrix decomposition.
“tau”: float
Regularization parameter for the nuclear norm.
“lam”: float
Regularization parameter for the L1 norm of the sparse matrix.
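The formulas behind get_params_scale are not given here. As a hedged illustration only, the sketch below combines two common heuristics: tau = lam = 1/sqrt(max(m, n)), the weight commonly used for the L1 term in principal component pursuit, and a rank chosen as the smallest number of singular values capturing 99% of the spectral energy. Both heuristics, the column-mean filling of missing values and the name `params_scale_sketch` are assumptions, not qolmat's implementation.

```python
import numpy as np


def params_scale_sketch(D: np.ndarray, energy: float = 0.99) -> dict:
    """Hypothetical heuristic returning {"rank", "tau", "lam"} for RPCA."""
    D = np.asarray(D, dtype=float)
    # Fill missing entries with column means so that the SVD is defined (assumption).
    D_filled = np.where(np.isnan(D), np.nanmean(D, axis=0), D)
    s = np.linalg.svd(D_filled, compute_uv=False)
    cum_energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest rank whose leading singular values capture `energy` of the total.
    rank = min(int(np.searchsorted(cum_energy, energy) + 1), len(s))
    scale = 1.0 / np.sqrt(max(D.shape))
    return {"rank": rank, "tau": scale, "lam": scale}


U, _ = np.linalg.qr(np.random.default_rng(2).normal(size=(20, 2)))
V, _ = np.linalg.qr(np.random.default_rng(3).normal(size=(10, 2)))
D = U @ np.diag([3.0, 2.0]) @ V.T  # singular values exactly 3 and 2
params = params_scale_sketch(D)
print(params["rank"])              # 2
```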

static minimise_loss(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]], rank: int, tau: float, lam: float, mu: float = 0.01, list_periods: List[int] = [], list_etas: List[float] = [], max_iterations: int = 10000, tolerance: float = 1e-06, norm: str = 'L2', verbose: bool = False) → Tuple

Compute the noisy RPCA with an L2 time penalisation.

This function computes the noisy Robust Principal Component Analysis (RPCA) using an L2 time penalisation. It iteratively minimizes a loss function to separate the low-rank and sparse components of the input data matrix.
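minimise_loss works with the factorization M = L Q and a stiffness parameter mu, which the sketch below does not reproduce. To illustrate the kind of iteration involved, here is a simpler, convex variant for the objective 0.5 ||Omega * (D - M - A)||_F^2 + tau ||M||_* + lam ||A||_1 without temporal terms: it alternates a proximal gradient step on M (singular value thresholding) with an exact soft-thresholding update of A. It is a swapped-in technique, not qolmat's algorithm, and all function names are hypothetical.

```python
import numpy as np


def soft_threshold(X: np.ndarray, t: float) -> np.ndarray:
    """Entrywise proximal operator of t * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)


def singular_value_threshold(X: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_* (shrinks singular values by t)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt


def objective(D0, M, A, Omega, tau, lam):
    return (0.5 * np.sum((Omega * (D0 - M - A)) ** 2)
            + tau * np.linalg.norm(M, "nuc") + lam * np.sum(np.abs(A)))


def noisy_rpca_sketch(D, Omega, tau, lam, max_iterations=10_000, tolerance=1e-6):
    """Simplified noisy RPCA: no L, Q factorization, no temporal penalties."""
    Omega = np.asarray(Omega, dtype=bool)
    D0 = np.where(Omega, D, 0.0)  # unobserved entries never enter the loss
    M = np.zeros_like(D0)
    A = np.zeros_like(D0)
    for _ in range(max_iterations):
        # Proximal gradient step on M; the smooth term has a 1-Lipschitz gradient.
        M_new = singular_value_threshold(M + Omega * (D0 - M - A), tau)
        # Exact minimization in A: soft-threshold observed residuals, zero elsewhere.
        A_new = Omega * soft_threshold(D0 - M_new, lam)
        delta = np.linalg.norm(M_new - M) + np.linalg.norm(A_new - A)
        M, A = M_new, A_new
        if delta < tolerance:  # stopping criterion on consecutive iterates
            break
    return M, A


rng = np.random.default_rng(0)
D = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 20))  # rank-3 signal
D[rng.random(D.shape) < 0.05] += 10.0                     # sparse anomalies
Omega = rng.random(D.shape) > 0.1                         # about 10% missing
tau, lam = 1.0, 1.0 / np.sqrt(max(D.shape))
M, A = noisy_rpca_sketch(D, Omega, tau, lam, max_iterations=500)
D0 = np.where(Omega, D, 0.0)
zeros = np.zeros_like(D0)
print(objective(D0, M, A, Omega, tau, lam) <= objective(D0, zeros, zeros, Omega, tau, lam))  # True
```

Both updates can only decrease the objective (a proximal step with step size 1/L, then an exact block minimization), which is why the final comparison holds.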

Parameters

D: np.ndarray: Observations matrix of shape (m, n).
Omega: np.ndarray: Binary matrix indicating the observed entries of D, shape (m, n).
rank: int: Estimated rank of the matrix D.
tau: float: Penalizing parameter for the nuclear norm.
lam: float: Penalizing parameter for the sparse matrix.
mu: float, optional: Initial stiffness parameter for the constraint M = L Q. Defaults to 1e-2.
list_periods: List[int], optional: List of periods linked to the Toeplitz matrices. Defaults to [].
list_etas: List[float], optional: List of penalizing parameters, one for each period in list_periods. Defaults to [].
max_iterations: int, optional: Stopping criterion, maximum number of iterations. Defaults to 10000.
tolerance: float, optional: Stopping criterion, minimum difference between two consecutive iterations. Defaults to 1e-6.
norm: str, optional: Error norm, either “L1” or “L2”. Defaults to “L2”.
verbose: bool, optional: Verbosity level; if False, warnings are silenced. Defaults to False.

Returns

Tuple: A tuple containing the following elements:
- M: np.ndarray: Low-rank signal matrix of shape (m, n).
- A: np.ndarray: Anomalies matrix of shape (m, n).
- L: np.ndarray: Basis unitary array of shape (m, rank).
- Q: np.ndarray: Basis unitary array of shape (rank, n).

Raises

ValueError: If any period in list_periods is not smaller than the number of rows of D.
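A sketch of the documented check on list_periods, written as a hypothetical standalone function `check_periods`. The length check between list_periods and list_etas is an extra assumption based on the parameter descriptions, not a documented behaviour.

```python
import numpy as np


def check_periods(D: np.ndarray, list_periods, list_etas) -> None:
    """Hypothetical validation mirroring the documented ValueError."""
    n_rows = D.shape[0]
    if len(list_periods) != len(list_etas):
        # Assumption: one eta per period, as the parameter descriptions suggest.
        raise ValueError("list_periods and list_etas must have the same length")
    for period in list_periods:
        if period >= n_rows:
            raise ValueError(
                f"period {period} must be smaller than the number of rows ({n_rows})"
            )


D = np.zeros((24, 3))
check_periods(D, [7, 12], [1.0, 0.5])  # passes silently
try:
    check_periods(D, [24], [1.0])
except ValueError as err:
    print(err)  # period 24 must be smaller than the number of rows (24)
```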
