qolmat.imputations.rpca.rpca_noisy.RpcaNoisy
- class qolmat.imputations.rpca.rpca_noisy.RpcaNoisy(random_state: Optional[Union[int, RandomState]] = None, rank: Optional[int] = None, mu: Optional[float] = None, tau: Optional[float] = None, lam: Optional[float] = None, list_periods: List[int] = [], list_etas: List[float] = [], max_iterations: int = 10000, tolerance: float = 1e-06, norm: str = 'L2', verbose: bool = True)
Class for a noisy version of the so-called ‘improved RPCA’.
- Parameters
- random_state: int, optional
The seed of the pseudo random number generator to use, for reproducibility.
- rank: Optional[int]
Upper bound of the rank to be estimated
- mu: Optional[float]
initial stiffness parameter for the constraint M = L Q
- tau: Optional[float]
penalizing parameter for the nuclear norm
- lam: Optional[float]
penalizing parameter for the sparse matrix
- list_periods: Optional[List[int]]
list of periods, linked to the Toeplitz matrices
- list_etas: Optional[List[float]]
list of penalizing parameters for the corresponding period in list_periods
- max_iterations: Optional[int]
stopping criterion, maximum number of iterations. By default, the value is set to 10_000
- tolerance: Optional[float]
stopping criterion, minimum difference between two consecutive iterations. By default, the value is set to 1e-6
- norm: Optional[str]
error norm, can be “L1” or “L2”. By default, the value is set to “L2”
- verbose: Optional[bool]
verbosity level, if False the warnings are silenced
References
Wang, Xuehui, et al. “An improved robust principal component analysis model for anomalies detection of subway passenger flow.” Journal of advanced transportation (2018).
Chen, Yuxin, et al. “Bridging convex and nonconvex optimization in robust PCA: Noise, outliers and missing data.” The Annals of Statistics 49.5 (2021): 2948-2971.
- __init__(random_state: Optional[Union[int, RandomState]] = None, rank: Optional[int] = None, mu: Optional[float] = None, tau: Optional[float] = None, lam: Optional[float] = None, list_periods: List[int] = [], list_etas: List[float] = [], max_iterations: int = 10000, tolerance: float = 1e-06, norm: str = 'L2', verbose: bool = True) -> None
- static cost_function(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], M: ndarray[tuple[int, ...], dtype[_ScalarType_co]], A: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]], tau: float, lam: float, list_periods: List[int] = [], list_etas: List[float] = [], norm: str = 'L2')
Estimate cost function for the noisy RPCA algorithm.
- Parameters
- D: NDArray
Matrix of observations
- M: NDArray
Low-rank signal
- A: NDArray
Anomalies
- Omega: NDArray
Mask for observations
- tau: Optional[float]
penalizing parameter for the nuclear norm
- lam: Optional[float]
penalizing parameter for the sparse matrix
- list_periods: Optional[List[int]]
list of periods, linked to the Toeplitz matrices
- list_etas: Optional[List[float]]
list of penalizing parameters for the corresponding period in list_periods
- norm: Optional[str]
error norm, can be “L1” or “L2”. By default, the value is set to “L2”
- Returns
- float
Value of the cost function minimized by the RPCA
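As a rough illustration of how the terms described above could combine, the sketch below evaluates a simplified objective in NumPy: a data-fit term on the observed entries, the nuclear norm of M weighted by tau, and the L1 norm of A weighted by lam. The helper name `sketch_cost` is hypothetical, the temporal terms driven by list_periods and list_etas are left out, and the exact weighting of each term in the library is an assumption here.

```python
import numpy as np


def sketch_cost(D, M, A, Omega, tau, lam):
    """Hypothetical simplified objective (no temporal penalisation)."""
    residual = Omega * (D - M - A)       # only observed entries contribute
    fit = 0.5 * np.sum(residual ** 2)    # half the squared Frobenius norm
    nuclear = np.linalg.norm(M, "nuc")   # sum of the singular values of M
    sparsity = np.sum(np.abs(A))         # entrywise L1 norm of the anomalies
    return float(fit + tau * nuclear + lam * sparsity)


D = np.array([[1.0, 2.0], [2.0, 4.0]])   # rank-1 observations
Omega = np.ones_like(D, dtype=bool)      # every entry observed
M = D.copy()                             # candidate low-rank signal
A = np.zeros_like(D)                     # no anomalies
cost_value = sketch_cost(D, M, A, Omega, tau=1.0, lam=0.5)
```

With a perfect fit and no anomalies, only the nuclear-norm term remains, so the value reflects how tau trades rank against fidelity.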
- decompose(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
Compute the noisy RPCA with L1 or L2 time penalisation.
- Parameters
- D: NDArray
Matrix of the observations
- Omega: NDArray
Matrix of missingness, with boolean data
- Returns
- M: NDArray
Low-rank signal
- A: NDArray
Anomalies
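The sketch below shows one way the inputs and outputs of a decomposition might be handled for imputation. It assumes that True in Omega marks an observed entry, as the description of Omega under minimise_loss suggests, and it uses a hand-written M as a stand-in for the low-rank signal rather than a value computed by the library.

```python
import numpy as np

# Observations with one missing entry encoded as NaN.
D = np.array([[1.0, np.nan], [2.0, 4.0]])

# Boolean mask: True where the entry is observed (assumed convention).
Omega = ~np.isnan(D)

# Hypothetical low-rank signal, standing in for the first output of decompose.
M = np.array([[1.0, 2.0], [2.0, 4.0]])

# Keep observed values and fill the missing ones from the low-rank signal.
D_imputed = np.where(Omega, D, M)
```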
- decompose_on_basis(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Q: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
Decompose the matrix D with an observation matrix Omega.
It uses the noisy RPCA algorithm with a fixed reduced basis given by the matrix Q. This makes it possible to impute new data without re-solving the optimization problem on the whole dataset.
- Parameters
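- D: NDArray
Matrix of the observations
- Omega: NDArray
Mask for observations, with boolean data
- Q: NDArray
Reduced basis of the low-rank matrix, kept fixed during the decomposition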
- Returns
- Tuple[NDArray, NDArray]
A tuple representing the decomposition of D with:
- M: low-rank matrix
- A: sparse matrix
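To make the idea of a fixed reduced basis concrete, here is a minimal sketch that fits new rows on a given Q by least squares over the observed entries only. It is a simplification and not the library's method: anomalies and all penalisation terms are ignored, `project_on_basis` is a hypothetical helper, and True in Omega is assumed to mark an observed entry.

```python
import numpy as np


def project_on_basis(D, Omega, Q):
    """Fit each row of D on the rows of Q using observed entries only.

    Assumes every row has at least as many observed entries as Q has rows.
    """
    n_rows, rank = D.shape[0], Q.shape[0]
    L = np.zeros((n_rows, rank))
    for i in range(n_rows):
        observed = Omega[i]
        # Least squares on the observed columns: D[i, obs] ~ L[i] @ Q[:, obs]
        coeffs, *_ = np.linalg.lstsq(Q[:, observed].T, D[i, observed], rcond=None)
        L[i] = coeffs
    return L @ Q, L


Q = np.array([[1.0, 2.0, 3.0]])            # basis of shape (rank, n)
D_new = np.array([[2.0, 4.0, np.nan]])     # new row with a missing value
Omega_new = ~np.isnan(D_new)
M_new, L_new = project_on_basis(D_new, Omega_new, Q)
```

Because Q is reused, only the small coefficient matrix has to be estimated for the new rows, which is what makes this cheaper than a full decomposition.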
- decompose_with_basis(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]
Compute the noisy RPCA with L1 or L2 time penalisation.
It returns the decomposition of the low-rank matrix.
- Parameters
- D: NDArray
Matrix of the observations
- Omega: NDArray
Matrix of missingness, with boolean data
- Returns
- M: NDArray
Low-rank signal
- A: NDArray
Anomalies
- L: NDArray
Coefficients of the low-rank matrix in the reduced basis
- Q: NDArray
Reduced basis of the low-rank matrix
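The relation M = L Q between the low-rank signal, its coefficients and the reduced basis can be illustrated with a truncated SVD. This is only a sketch of the factorisation itself: the library obtains L and Q through its own iterative scheme, and `factorise` is a hypothetical helper.

```python
import numpy as np


def factorise(M, rank):
    """Split M into coefficients L (m, rank) and a basis Q (rank, n)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = U[:, :rank] * s[:rank]   # scale the leading left vectors by their singular values
    Q = Vt[:rank]                # leading right singular vectors, orthonormal rows
    return L, Q


M = np.outer([1.0, 2.0, 3.0], [1.0, 0.5])   # exactly rank 1, shape (3, 2)
L, Q = factorise(M, rank=1)
M_rebuilt = L @ Q
```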
- get_params_scale(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) -> Dict[str, float]
Get parameters for scaling in RPCA based on the input data.
- Parameters
- D: np.ndarray
Input data matrix of shape (m, n).
- Returns
- dict
- A dictionary containing the following parameters:
- “rank”: int
Rank estimate for low-rank matrix decomposition.
- “tau”: float
Regularization parameter for the nuclear norm.
- “lam”: float
Regularization parameter for the L1 norm.
- static minimise_loss(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]], rank: int, tau: float, lam: float, mu: float = 0.01, list_periods: List[int] = [], list_etas: List[float] = [], max_iterations: int = 10000, tolerance: float = 1e-06, norm: str = 'L2', verbose: bool = False) -> Tuple
Compute the noisy RPCA with an L1 or L2 time penalisation.
This function computes the noisy Robust Principal Component Analysis (RPCA) using the time penalisation selected by norm (“L1” or “L2”). It iteratively minimizes a loss function to separate the low-rank and sparse components of the input data matrix.
- Parameters
- D: np.ndarray
Observations matrix of shape (m, n).
- Omega: np.ndarray
Binary matrix indicating the observed entries of D, shape (m, n).
- rank: int
Estimated rank of the low-rank component of D.
- tau: float
Penalizing parameter for the nuclear norm.
- lam: float
Penalizing parameter for the sparse matrix.
- mu: float, optional
Initial stiffness parameter for the constraint M = L Q. Defaults to 1e-2.
- list_periods: List[int], optional
List of periods linked to the Toeplitz matrices. Defaults to [].
- list_etas: List[float], optional
List of penalizing parameters for the corresponding periods in list_periods. Defaults to [].
- max_iterations: int, optional
Stopping criterion, maximum number of iterations. Defaults to 10000.
- tolerance: float, optional
Stopping criterion, minimum difference between two consecutive iterations. Defaults to 1e-6.
- norm: str, optional
Error norm, can be “L1” or “L2”. Defaults to “L2”.
- verbose: bool, optional
Verbosity level, if False the warnings are silenced. Defaults to False.
- Returns
- Tuple
A tuple containing the following elements:
- M: np.ndarray
Low-rank signal matrix of shape (m, n).
- A: np.ndarray
Anomalies matrix of shape (m, n).
- L: np.ndarray
Basis unitary array of shape (m, rank).
- Q: np.ndarray
Basis unitary array of shape (rank, n).
- Raises
- ValueError
If any period in list_periods is not smaller than the number of rows of the matrix.
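One plausible reading of the link between periods and Toeplitz matrices is a difference operator that compares rows separated by a given period, which also explains why a period must be smaller than the number of rows. The sketch below builds such an operator; both the construction and the helper name `difference_operator` are assumptions for illustration, not the library's actual matrices.

```python
import numpy as np


def difference_operator(period, n_rows):
    """Hypothetical Toeplitz operator H such that (H @ M)[i] = M[i] - M[i + period]."""
    if period >= n_rows:
        raise ValueError(
            f"period ({period}) must be smaller than the number of rows ({n_rows})"
        )
    size = n_rows - period
    # Ones on the main diagonal, minus ones on the diagonal shifted by `period`.
    return np.eye(size, n_rows) - np.eye(size, n_rows, k=period)


# A signal that repeats every 2 rows is left with zero differences at period 2.
M = np.tile(np.array([[1.0], [2.0]]), (3, 1))   # shape (6, 1)
H = difference_operator(2, M.shape[0])
differences = H @ M
```

Under this reading, each entry of list_etas weights how strongly departures from the matching period are penalised.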