qolmat.imputations.rpca.rpca_noisy.RpcaNoisy

class qolmat.imputations.rpca.rpca_noisy.RpcaNoisy(random_state: Optional[Union[int, RandomState]] = None, rank: Optional[int] = None, mu: Optional[float] = None, tau: Optional[float] = None, lam: Optional[float] = None, list_periods: List[int] = [], list_etas: List[float] = [], max_iterations: int = 10000, tolerance: float = 1e-06, norm: str = 'L2', verbose: bool = True)

Noisy variant of the so-called ‘improved RPCA’, which decomposes the observations into a low-rank signal and sparse anomalies.

Parameters
random_state: int or RandomState, optional

Seed of the pseudo-random number generator, or a RandomState instance, for reproducibility.

rank: Optional[int]

Upper bound on the rank to be estimated.

mu: Optional[float]

Initial stiffness parameter for the constraint M = LQ.

tau: Optional[float]

Penalizing parameter for the nuclear norm.

lam: Optional[float]

Penalizing parameter for the sparse matrix.

list_periods: List[int], optional

List of periods, each linked to a Toeplitz matrix.

list_etas: List[float], optional

List of penalizing parameters, one for each period in list_periods.

max_iterations: int, optional

Stopping criterion: maximum number of iterations. By default, the value is set to 10_000.

tolerance: float, optional

Stopping criterion: minimum difference between two consecutive iterations. By default, the value is set to 1e-6.

norm: str, optional

Error norm, can be “L1” or “L2”. By default, the value is set to “L2”.

verbose: bool, optional

Verbosity flag; if False, warnings are silenced. By default, the value is set to True.
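The role of list_periods can be illustrated with a small NumPy sketch. It assumes that each period T is associated with a difference matrix H_T of shape (n_rows - T, n_rows), with 1 on the diagonal and -1 on the T-th superdiagonal, so that H_T @ M compares each row of M with the row T steps later. The helper name `toeplitz_difference` is hypothetical and is not part of the qolmat API. Under this assumption a period only makes sense when it is smaller than the number of rows, which is consistent with the ValueError raised by minimise_loss.

```python
import numpy as np


def toeplitz_difference(period: int, n_rows: int) -> np.ndarray:
    """Hypothetical helper: difference matrix linking rows that are `period` apart.

    Assumption: H[i, i] = 1 and H[i, i + period] = -1, so (H @ M)[i] = M[i] - M[i + period].
    Non-positive periods are also rejected here (an extra assumption of this sketch).
    """
    if not 0 < period < n_rows:
        raise ValueError(f"period must satisfy 0 < period < n_rows, got {period} for {n_rows} rows")
    n_lags = n_rows - period
    # np.eye(N, M, k) puts ones on the k-th diagonal of an N x M matrix.
    return np.eye(n_lags, n_rows) - np.eye(n_lags, n_rows, k=period)


# A signal that repeats every 3 rows (time steps) has zero periodic differences.
rows = np.arange(12) % 3
M = np.column_stack([rows, 2.0 * rows])
H = toeplitz_difference(3, M.shape[0])
print(np.abs(H @ M).max())  # 0.0
```

With such matrices, a large eta for a given period pushes the low-rank signal towards being periodic with that period.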

References

Wang, Xuehui, et al. “An improved robust principal component analysis model for anomalies detection of subway passenger flow.” Journal of Advanced Transportation (2018).

Chen, Yuxin, et al. “Bridging convex and nonconvex optimization in robust PCA: Noise, outliers and missing data.” The Annals of Statistics 49.5 (2021): 2948-2971.

__init__(random_state: Optional[Union[int, RandomState]] = None, rank: Optional[int] = None, mu: Optional[float] = None, tau: Optional[float] = None, lam: Optional[float] = None, list_periods: List[int] = [], list_etas: List[float] = [], max_iterations: int = 10000, tolerance: float = 1e-06, norm: str = 'L2', verbose: bool = True) → None
static cost_function(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], M: ndarray[tuple[int, ...], dtype[_ScalarType_co]], A: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]], tau: float, lam: float, list_periods: List[int] = [], list_etas: List[float] = [], norm: str = 'L2')

Compute the value of the cost function minimized by the noisy RPCA algorithm.

Parameters
D: NDArray

Matrix of observations.

M: NDArray

Low-rank signal.

A: NDArray

Anomalies.

Omega: NDArray

Mask of the observed entries of D.

tau: float

Penalizing parameter for the nuclear norm.

lam: float

Penalizing parameter for the sparse matrix.

list_periods: List[int], optional

List of periods, each linked to a Toeplitz matrix.

list_etas: List[float], optional

List of penalizing parameters, one for each period in list_periods.

norm: str, optional

Error norm, can be “L1” or “L2”. By default, the value is set to “L2”.

Returns
float

Value of the cost function minimized by the RPCA
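As a rough illustration of what cost_function evaluates, the sketch below assumes an objective of the form 1/2 ||Omega * (D - M - A)||_F^2 + tau ||M||_* + lam ||A||_1, plus, for each (period, eta) pair, eta times either the L1 norm or the squared Frobenius norm of the periodic row differences of M, depending on norm. This form is an assumption inferred from the parameter descriptions above, not a copy of the qolmat implementation, and the function name `noisy_rpca_cost` is hypothetical.

```python
import numpy as np


def noisy_rpca_cost(D, M, A, Omega, tau, lam, list_periods=(), list_etas=(), norm="L2"):
    """Hypothetical sketch of a noisy RPCA objective (assumed form, see lead-in)."""
    # Data-fit term restricted to observed entries (Omega is True where D is observed).
    residual = np.where(Omega, D - M - A, 0.0)
    cost = 0.5 * np.sum(residual ** 2)
    # Nuclear norm of the low-rank part and L1 norm of the anomalies.
    cost += tau * np.linalg.norm(M, ord="nuc")
    cost += lam * np.sum(np.abs(A))
    # Assumed temporal penalties on differences between rows that are `period` apart.
    for period, eta in zip(list_periods, list_etas):
        diff = M[:-period] - M[period:]
        if norm == "L1":
            cost += eta * np.sum(np.abs(diff))
        elif norm == "L2":
            cost += eta * np.sum(diff ** 2)  # squared Frobenius norm (assumption)
        else:
            raise ValueError(f"norm must be 'L1' or 'L2', got {norm!r}")
    return float(cost)


D = np.array([[1.0, 2.0], [3.0, np.nan]])
Omega = ~np.isnan(D)  # boolean mask of observed entries
M = np.array([[1.0, 2.0], [3.0, 6.0]])  # rank-1 signal matching the observed entries
A = np.zeros_like(D)
print(noisy_rpca_cost(D, M, A, Omega, tau=0.0, lam=1.0))  # 0.0
```

The missing entry of D never contributes to the data-fit term, which is what allows M to take any value there.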

decompose(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Compute the noisy RPCA with L1 or L2 time penalisation.

Parameters
D: NDArray

Matrix of the observations

Omega: NDArray

Boolean mask of the observed entries of D

Returns
M: NDArray

Low-rank signal

A: NDArray

Anomalies

decompose_on_basis(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Q: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Decompose the matrix D with an observation matrix Omega.

It uses the noisy RPCA algorithm with a fixed reduced basis given by the matrix Q. This makes it possible to impute new data without solving the optimization problem again on the whole dataset.

Parameters
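D: NDArray

Matrix of the observations

Omega: NDArray

Boolean mask of the observed entries of D

Q: NDArray

Fixed reduced basis of the low-rank matrix, such as the Q returned by decompose_with_basis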

Returns
Tuple[NDArray, NDArray]

A tuple representing the decomposition of D with:
  • M: low-rank matrix
  • A: sparse matrix
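The benefit of a fixed basis Q can be seen in a simplified setting. The sketch below ignores the anomaly matrix A and all penalties: for each new row it solves an ordinary least-squares problem on the observed entries only, so that the reconstructed row is a linear combination of the rows of Q. It illustrates the idea of reusing a basis, not the algorithm actually used by decompose_on_basis, and `project_on_basis` is a hypothetical name.

```python
import numpy as np


def project_on_basis(D_new: np.ndarray, Omega_new: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: reconstruct each row of D_new as coefs @ Q from observed entries."""
    M_new = np.empty(D_new.shape, dtype=float)
    for i in range(D_new.shape[0]):
        obs = Omega_new[i]
        # Least squares on observed columns: minimise ||D_new[i, obs] - coefs @ Q[:, obs]||_2.
        coefs, *_ = np.linalg.lstsq(Q[:, obs].T, D_new[i, obs], rcond=None)
        # The full row, including unobserved columns, is rebuilt from the basis.
        M_new[i] = coefs @ Q
    return M_new


Q = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])  # basis of rank 2, n = 4
D_new = np.array([[2.0, 3.0, np.nan, 3.0]])
Omega_new = ~np.isnan(D_new)
M_new = project_on_basis(D_new, Omega_new, Q)
```

Here the coefficients are (2, 3), so the missing third entry is filled with 2; only a small problem of size rank is solved per row, instead of refitting the whole decomposition.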

decompose_with_basis(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]], ndarray[tuple[int, ...], dtype[_ScalarType_co]]]

Compute the noisy RPCA with L1 or L2 time penalisation.

In addition to M and A, it returns the factors L and Q of the low-rank matrix, with M ≈ LQ.

Parameters
D: NDArray

Matrix of the observations

Omega: NDArray

Boolean mask of the observed entries of D

Returns
M: NDArray

Low-rank signal

A: NDArray

Anomalies

L: NDArray

Coefficients of the low-rank matrix in the reduced basis

Q: NDArray

Reduced basis of the low-rank matrix
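The relationship between the returned arrays can be illustrated with a truncated SVD. Assuming M has rank at most `rank`, one factorization with the shapes documented for minimise_loss is L = U_r diag(s_r), of shape (m, rank), and Q = V_r^T, of shape (rank, n), so that M = L @ Q and Q has orthonormal rows. This only shows the shapes and the identity; the factors computed by decompose_with_basis need not coincide with an SVD, and `factorize_low_rank` is a hypothetical name.

```python
import numpy as np


def factorize_low_rank(M: np.ndarray, rank: int):
    """Sketch: split M into L (m, rank) and Q (rank, n) with M ≈ L @ Q via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = U[:, :rank] * s[:rank]  # coefficients in the reduced basis, scaled by singular values
    Q = Vt[:rank]  # reduced basis with orthonormal rows
    return L, Q


rng = np.random.default_rng(0)
M = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))  # a rank-2 matrix of shape (6, 5)
L, Q = factorize_low_rank(M, rank=2)
print(L.shape, Q.shape)  # (6, 2) (2, 5)
print(np.allclose(L @ Q, M))  # True
```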

get_params_scale(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Dict[str, float]

Compute default RPCA parameters adapted to the scale of the input data.

Parameters
D: np.ndarray

Input data matrix of shape (m, n).

Returns
dict
A dictionary containing the following parameters:
  • “rank”: int

    Rank estimate for low-rank matrix decomposition.

  • “tau”: float

    Regularization parameter for the nuclear norm.

  • “lam”: float

    Regularization parameter for the L1 norm.
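A possible way to derive scale-dependent defaults is sketched below. It assumes the value lam = 1 / sqrt(max(m, n)) commonly used in principal component pursuit, reuses it for tau, and estimates the rank as the smallest number of singular values capturing a hypothetical 95% share of the squared spectrum; missing entries are first replaced by column means. None of these choices is confirmed to match qolmat's get_params_scale, and `sketch_params_scale` and `energy` are hypothetical names.

```python
import numpy as np


def sketch_params_scale(D: np.ndarray, energy: float = 0.95) -> dict:
    """Hypothetical heuristic for scale-dependent RPCA parameters (see assumptions above)."""
    # Fill missing entries with column means so that the SVD is defined.
    D_filled = np.where(np.isnan(D), np.nanmean(D, axis=0), D)
    s = np.linalg.svd(D_filled, compute_uv=False)
    # Smallest rank whose singular values capture `energy` of the squared spectrum.
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    rank = int(min(np.searchsorted(cumulative, energy) + 1, s.size))
    # Penalty commonly used in principal component pursuit, reused here for tau.
    lam = 1.0 / np.sqrt(max(D.shape))
    return {"rank": rank, "tau": lam, "lam": lam}


D = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 2.0])  # rank-1 matrix of shape (4, 3)
params = sketch_params_scale(D)
```

Tying tau and lam to 1 / sqrt(max(m, n)) keeps the penalties comparable as the matrix grows, which is the usual motivation for such scale-dependent defaults.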

static minimise_loss(D: ndarray[tuple[int, ...], dtype[_ScalarType_co]], Omega: ndarray[tuple[int, ...], dtype[_ScalarType_co]], rank: int, tau: float, lam: float, mu: float = 0.01, list_periods: List[int] = [], list_etas: List[float] = [], max_iterations: int = 10000, tolerance: float = 1e-06, norm: str = 'L2', verbose: bool = False) → Tuple

Compute the noisy RPCA with an L1 or L2 time penalisation.

This function computes the noisy Robust Principal Component Analysis (RPCA) with an L1 or L2 time penalisation, selected by norm. It iteratively minimizes a loss function to separate the low-rank and sparse components of the input data matrix.

Parameters
D: np.ndarray

Observations matrix of shape (m, n).

Omega: np.ndarray

Binary matrix indicating the observed entries of D, shape (m, n).

rank: int

Estimated rank of the low-rank component of D.

tau: float

Penalizing parameter for the nuclear norm.

lam: float

Penalizing parameter for the sparse matrix.

mu: float, optional

Initial stiffness parameter for the constraint M = LQ. Defaults to 1e-2.

list_periods: List[int], optional

List of periods linked to the Toeplitz matrices. Defaults to [].

list_etas: List[float], optional

List of penalizing parameters for the corresponding periods in list_periods. Defaults to [].

max_iterations: int, optional

Stopping criterion: maximum number of iterations. Defaults to 10000.

tolerance: float, optional

Stopping criterion: minimum difference between two consecutive iterations. Defaults to 1e-6.

norm: str, optional

Error norm, can be “L1” or “L2”. Defaults to “L2”.

verbose: bool, optional

Verbosity flag; if False, warnings are silenced. Defaults to False.

Returns
Tuple

A tuple containing the following elements:
  • M: np.ndarray. Low-rank signal matrix of shape (m, n).
  • A: np.ndarray. Anomalies matrix of shape (m, n).
  • L: np.ndarray. Basis unitary array of shape (m, rank).
  • Q: np.ndarray. Basis unitary array of shape (rank, n).

Raises
ValueError

If any period in list_periods is not smaller than the number of rows of D.
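To make the iterative minimization described above concrete, here is a deliberately simplified block-coordinate scheme for the objective 1/2 ||Omega * (D - M - A)||_F^2 + tau ||M||_* + lam ||A||_1. It leaves out the temporal penalties, the M = LQ factorization and the stiffness parameter mu used by minimise_loss. The A-step is soft-thresholding of the observed residual and the M-step is one proximal-gradient step (singular value thresholding) with unit step size. The stopping rule, based on the relative change of M and A between two consecutive iterations, is an assumed reading of tolerance. All names are hypothetical and this is not qolmat's algorithm.

```python
import numpy as np


def soft_threshold(X: np.ndarray, threshold: float) -> np.ndarray:
    """Entrywise proximal operator of threshold * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - threshold, 0.0)


def svt(X: np.ndarray, threshold: float) -> np.ndarray:
    """Singular value thresholding: proximal operator of threshold * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - threshold, 0.0)) @ Vt


def simple_noisy_rpca(D, Omega, tau, lam, max_iterations=10_000, tolerance=1e-6):
    """Hypothetical sketch: alternate exact A-updates and proximal-gradient M-updates."""
    D_obs = np.where(Omega, D, 0.0)  # unobserved entries never enter the data-fit term
    M = np.zeros(D.shape)
    A = np.zeros(D.shape)
    for _ in range(max_iterations):
        M_prev, A_prev = M, A
        # A-step: exact minimiser over A; anomalies are only estimated on observed entries.
        A = np.where(Omega, soft_threshold(D_obs - M, lam), 0.0)
        # M-step: the data-fit gradient is -Omega * (D - M - A), whose Lipschitz constant is 1.
        M = svt(M + np.where(Omega, D_obs - M - A, 0.0), tau)
        # Assumed stopping rule: relative change between two consecutive iterations.
        change = np.linalg.norm(M - M_prev) + np.linalg.norm(A - A_prev)
        scale = max(np.linalg.norm(M_prev) + np.linalg.norm(A_prev), 1.0)
        if change / scale < tolerance:
            break
    return M, A


rng = np.random.default_rng(0)
D = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 10))  # rank-2 signal
D[5, 3] += 8.0  # one anomaly
Omega = rng.random(D.shape) > 0.2  # about 80% of entries observed
D[~Omega] = np.nan
M, A = simple_noisy_rpca(D, Omega, tau=0.5, lam=1.0, max_iterations=500)
```

Keeping the unit step size in the M-step guarantees that no iteration increases this simplified objective, since the data-fit gradient is 1-Lipschitz.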