1. Introduction
Matrix denoising plays an important role in many scientific endeavors, such as data cleansing for big data, image processing, information security, and so on [1-3]. Suppose that we observe a 𝑝 × 𝑛 signal-plus-noise matrix
\(\tilde {S } =S+Z \) (1)
where 𝑆 is the true signal matrix and 𝑍 is the noise matrix. In the classic setting, when the dimension of the data is much smaller than the sample size, the truncated singular value decomposition (TSVD) [4] is the default technique for estimating 𝑆 from \(\tilde {S }\). Two popular methods for choosing the truncation level are soft-thresholding [6] and hard-thresholding [7].
The advance of technology has led to high dimensional data sets whose dimensionality diverges with the sample size 𝑛. In this regime, classic multivariate analysis [8] loses its validity and Random Matrix Theory (RMT) [9] serves as a powerful technical tool. In this paper, we study the high dimensional setting where 𝑝 is comparable to 𝑛, i.e. there exists some small constant 𝜏 > 0 such that
\(\tau \le c_{n}\le \tau^{-1},\quad c_{n}:=\frac{p}{n}\) (2)
It is remarkable that, unlike the standard results in RMT [10], we only require the boundedness of \(c_n\) rather than its convergence. This makes our algorithm more adaptive to the data set at hand.
In this paper, we consider the estimation of 𝑆 from its noisy observation \(\tilde S\) in the high dimensional setting where (2) holds. A popular and practical assumption is that 𝑆 is simultaneously low-rank and sparse, in the sense that 𝑆 has a finite number of nonzero singular values and sparse singular vectors. This type of data set is commonly encountered in many scientific disciplines [11-13]. A typical example comes from the study of gene expression data. A microarray experiment typically assesses a large number of DNA sequences (genes, cDNA clones, or expressed sequence tags) under multiple conditions. The gene expression data from a microarray experiment can be represented by a real-valued expression matrix 𝑆, where the rows of 𝑆 correspond to expression patterns (e.g., of cancer patients) and the columns correspond to gene levels. A subset of expression patterns can be clustered together as a subtype of the same pattern, which in turn is determined by a subset of genes. The original gene expression matrix obtained from a scanning process contains noise, missing values, and systematic variations arising from the experimental procedure. Therefore, the model considered here is an idealized description of gene expression data.
In the literature on low-rank matrix estimation, nuclear norm minimization (NNM) [14,15] has proved useful. However, since the matrix also has a sparse structure here, we expect to obtain a better estimate. To exploit this structure, one research line adds further regularization terms to the optimization beyond the nuclear norm, for instance an 𝑙1 penalty, to capture the sparsity [16,17]. The other research line applies the two-way iterative thresholding method [12] to iteratively explore the low-rank and sparse structure of 𝑆. However, the method of [12] requires estimating the variance of the noise from prior information, which is usually very difficult in practice.
In the present paper, we assume that 𝑆 has a sparse structure in the sense that its singular vectors are sparse. As a consequence, the nonzero entries of 𝑆 are confined to some small blocks and hence 𝑆 is highly structured. In Fig. 1, we illustrate this property using a synthetic data set. It can be seen that the non-zero entries are confined to a few blocks of small size.
Fig. 1. Synthetic data set. The left panel is the image of 𝑆, whereas the right panel is that of \(\tilde S\). 𝑆 is a rank-two matrix and its singular vectors have only 10% non-zero entries. 𝑍 is a random Gaussian matrix with unit variance.
For this type of data set, instead of targeting the sparsity of 𝑆 itself, a better solution is to exploit the structure of its singular vectors, i.e. the positions of their non-zero entries. However, none of the existing methods explores the structure of the singular vectors directly.
In this paper, we propose ASSVD to estimate the singular values and singular vectors separately. ASSVD explores the positions of the non-zero entries of the singular vectors directly. From the point of view of matrix decomposition, our method is rather straightforward and provides estimates of both the singular values and the singular vectors. Moreover, with the recent technical inputs from [5], we do not need to estimate the variance of the noise. We also prove that ASSVD gives a consistent estimator when the signal 𝑆 is strong (see Assumption 5 below). Numerical simulations show that ASSVD outperforms many state-of-the-art algorithms even when 𝑆 is neither strong nor sparse.
We point out that similar problems have been studied in [5] when the noise variance is known to be one. Our ASSVD can deal with a general noise level without estimating the noise variance. Moreover, in this paper, we propose a novel adaptive estimator for the singular values. This estimator uses only the singular values of \(\tilde S\).
The contributions of this paper can be summarized as follows:
• We propose ASSVD, an adaptive and simple algorithm that enables the estimation of a simultaneously low-rank and sparse matrix in the presence of high dimensional noise. ASSVD does not need to estimate the variance of the noise and is adaptive to the data matrix.
• We show theoretically and numerically that ASSVD estimates the high dimensional data matrix well and outperforms many state-of-the-art algorithms.
• As a byproduct, ASSVD produces estimates of the singular values and vectors, which may be of independent interest.
The rest of this paper is organized as follows. In Section 2, we introduce the main assumptions and the proposed ASSVD. In Section 3, we design Monte-Carlo simulations to illustrate the use of ASSVD and compare it with some state-of-the-art algorithms. In Section 4, we prove the theoretical properties of ASSVD. Finally, we conclude in Section 5.
2. Adaptive matrix denoising
In this section, we introduce the main assumptions that will be used throughout the paper and then present the algorithm, ASSVD.
2.1 Main assumptions
We assume that the entries of the white noise matrix 𝑍 = (𝑧𝑖j) are i.i.d. random variables such that
\(\mathbb{E} z_{i j}=0, \mathbb{E} z_{i j}^{2}=\frac{\sigma^{2}}{n}.\) (3)
and that the noise variance is bounded, i.e.,
\(\sigma <\infty .\) (4)
Moreover, there exists a large constant 𝐶 > 0 such that for each 3 ≤ 𝑘 ≤ 𝐶 there exists a constant 𝜇𝑘 > 0 with
\(\mathbb{E}\left|\sqrt{n} z_{i j}\right|^{k} \leq \mu_{k}, 3 \leq k \leq C\) (5)
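For concreteness, Gaussian noise with the normalization in (3) is one admissible choice, since all Gaussian moments are finite and hence (5) holds. The following minimal Python/NumPy sketch illustrates the scaling; the values of σ, 𝑝 and 𝑛 are example choices, not prescribed by the paper.

```python
import numpy as np

# One admissible noise model: i.i.d. Gaussian entries with the normalization (3).
# Gaussian entries also satisfy the moment bound (5), since all their moments are finite.
# sigma, p, n are example values chosen for illustration only.
sigma, p, n = 1.0, 200, 400
rng = np.random.default_rng(0)
Z = rng.normal(loc=0.0, scale=sigma / np.sqrt(n), size=(p, n))
print(n * np.mean(Z ** 2))   # empirical check: close to sigma**2
```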
Denote the singular value decomposition of 𝑆 as
\(S=\sum_{i=1}^{r} d_{i} u_{i} v_{i}^{\mathrm{T}}, d_{1}>d_{2}>\cdots>d_{r}\) (6)
where 𝑟 > 0 is a fixed integer, 𝑑𝑖, 𝑢𝑖 and 𝑣𝑖 are the singular values, left and right singular vectors of 𝑆, respectively. We assume that
\(0<{ d }_{ i }<\infty ,\quad 1\le i\le r \) (7)
Moreover, we assume that 𝑢𝑖, 𝑣𝑖 are sparse. Specifically, let \({ m }_{ u }^{ i }\) and \({ m }_{ v }^{ i }\) be the number of non-zero entries of 𝑢𝑖 and 𝑣𝑖, respectively. Denote
\(w=\max _{1 \leq i \leq r}\left\{m_{u}^{i}, m_{v}^{i}\right\}\)
then assume that there exists some constant 𝐶1 > 0 such that
\(w\le { C }_{ 1 }. \) (8)
In light of (2), we define the sparsity level of 𝑆 as
\(s=\frac { w }{ n } \)
We conclude from (8) that 𝑠 → 0 when 𝑛 → ∞.
For future reference, we summarize the assumptions of the present paper.
Assumption 1. For the model (1), we assume that (2), (3), (4), (5), (6), (7) and (8) hold true.
2.2 Adaptive sparse singular value decomposition (ASSVD)
We now introduce our algorithm, ASSVD. As mentioned in Section 1, our algorithm estimates the singular values and vectors, separately.
We first introduce some notation. Denote the eigenvalues and eigenvectors of \(\tilde S \tilde S^{\mathrm{T}}\) as \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p\) and \(\tilde u_1, \ldots, \tilde u_p\), respectively. Similarly, denote the eigenvalues and eigenvectors of \(\tilde S^{\mathrm{T}} \tilde S\) as \(\mu_1 \ge \mu_2 \ge \cdots \ge \mu_n\) and \(\tilde v_1, \ldots, \tilde v_n\), respectively. Since \(\tilde S \tilde S^{\mathrm{T}}\) and \(\tilde S^{\mathrm{T}} \tilde S\) share the same non-zero eigenvalues, to avoid confusion we denote them as \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{p \wedge n}\), where 𝑝 ∧ 𝑛 = min{𝑝, 𝑛}. Note that {\(\tilde u_i\)} and {\(\tilde v_i\)} are the left and right singular vectors of \(\tilde S\), respectively. We next make precise the notion of a strong signal.
Definition 2. For 1 ≤ 𝑖 ≤ 𝑟, we say 𝑑𝑖 is a strong signal if
\(\frac{d_{i}}{\sigma} > c_{n}^{1/4} + \kappa,\)
where 𝜅 is a fixed small constant.
Next, we define some statistics. For a given parameter 𝑞 and for 𝑖 ≤ 𝑞, denote
\(\hat{m}_{1}\left(\lambda_{i}\right)=\frac{1}{p} \sum_{j=q+1}^{p} \frac{1}{\lambda_{i}-\lambda_{j}}, \quad \hat{m}_{2}\left(\lambda_{i}\right)=\frac{1}{n} \sum_{j=q+1}^{n} \frac{1}{\lambda_{i}-\mu_{j}}.\)
It will be seen later that 𝑞 is used to estimate the number of strong signals, and we refer to it as the rank estimate. Its definition and construction are discussed in Section 2.3. Armed with the above preparation, we introduce ASSVD as Algorithm 1.
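To make the role of these statistics concrete, the following minimal Python/NumPy sketch computes \(\hat m_1(\lambda_i)\) and \(\hat m_2(\lambda_i)\) from the observed matrix alone; how they then enter the estimates (9) and (10) is specified in Algorithm 1, so this snippet is illustrative rather than a full implementation of ASSVD.

```python
import numpy as np

def m_hat(S_tilde, q):
    # Statistics m1_hat(lambda_i), m2_hat(lambda_i) for i = 1, ..., q,
    # computed from the observed matrix only.
    p, n = S_tilde.shape
    sv = np.linalg.svd(S_tilde, compute_uv=False)   # min(p, n) singular values, descending
    lam = np.zeros(p); lam[:sv.size] = sv ** 2      # eigenvalues of S_tilde S_tilde^T
    mu = np.zeros(n); mu[:sv.size] = sv ** 2        # eigenvalues of S_tilde^T S_tilde
    m1 = np.array([np.sum(1.0 / (lam[i] - lam[q:])) / p for i in range(q)])
    m2 = np.array([np.sum(1.0 / (lam[i] - mu[q:])) / n for i in range(q)])
    return m1, m2
```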
Remark 3. First of all, from the above procedure, when 𝑑𝑖 is a strong signal satisfying Definition 2, we conclude that \(\hat d_i\), \(\hat u_i\) and \(\hat v_i\) are estimates of 𝑑𝑖, 𝑢𝑖 and 𝑣𝑖, respectively. Secondly, ASSVD relies only on the data matrix and does not need to estimate the variance of the noise.
2.3 Choice of parameter
As seen from the ASSVD algorithm, the number of strong signals of 𝑆 needs to be estimated separately through the parameter 𝑞. In this paper, we employ the resampling procedure of [21] to choose 𝑞. The main idea behind the construction is to use the magnitudes of the singular values of \(\tilde S\). Heuristically, as we can see from Theorem 6 later, if 𝑑𝑖 and 𝑑𝑖+1 are both strong signals, then the ratio of their corresponding eigenvalues 𝜆𝑖/𝜆𝑖+1 will be well-separated from one. On the other hand, if both of them are weak signals, in the sense that Definition 2 fails, their ratio will be close to one. Hence, there exists a transition point for the ratio of consecutive eigenvalues of \(\tilde S \tilde S^{\mathrm{T}}\), occurring between the 𝑞-th and (𝑞 + 1)-th eigenvalues, and this information is used to construct the statistic.
Remark 4. It can be concluded from the above algorithm that, with probability 𝛽 (say 𝛽 = 0.98), 𝑞 is a reasonable estimate of the number of strong signals. The parameter 𝜍 quantifies what it means for a ratio to be well-separated from one.
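The following snippet illustrates the ratio heuristic described above. It uses a fixed threshold \(1+\varsigma\) and a cap on the number of candidate signals, both of which are illustrative placeholders for the resampling calibration of [21], so it should be read as a sketch rather than the paper's exact rank estimator.

```python
import numpy as np

def estimate_q(S_tilde, varsigma=0.1, q_max=10):
    # Ratio heuristic for the rank estimate q: find how far down the spectrum
    # consecutive eigenvalue ratios stay well-separated from one.
    # varsigma and q_max are illustrative placeholders; the paper calibrates
    # the threshold via the resampling procedure of [21].
    sv = np.linalg.svd(S_tilde, compute_uv=False)
    lam = sv ** 2                                   # non-zero eigenvalues, descending
    ratios = lam[:q_max] / lam[1:q_max + 1]         # lambda_i / lambda_{i+1}
    exceed = np.where(ratios > 1.0 + varsigma)[0]
    # The transition point sits between the q-th and (q+1)-th eigenvalues.
    return int(exceed[-1]) + 1 if exceed.size else 0
```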
Table 1. Estimates of the singular value 𝑑1 = 10 using (9). We report the average estimate over \(10^4\) simulations.
3. Simulations
3.1 Performance of the estimates \(\hat d_i\), \(\hat u_i\) and \(\hat v_i\)
As mentioned before, ASSVD estimates the singular values and vectors separately. We first study the performance of these estimators using a rank-two example under various noise levels when Assumption 5 holds. To generate sparse singular vectors, we use the R package R1magic. The noise matrix is chosen to be a random Gaussian matrix generated from the R package mvtnorm. In the simulations below, we set
\(S={ d }_{ 1 }{ u }_{ 1 }{ v }_{ 1 }^{ T }+{ d }_{ 2 }{ u }_{ 2 }{ v }_{ 2 }^{ T }\).
Here \(u_{i} \in \mathbb{R}^{p}, v_{i} \in \mathbb{R}^{n}, i=1,2\) are generated using R1magic with 𝑠 = 0.1, 𝑢1 ⊥ 𝑢2, 𝑣1 ⊥ 𝑣2, and 𝑑1 = 10, 𝑑2 = 7. In Table 1, we report the estimates of 𝑑1 using (9) for a variety of noise levels and combinations of (𝑝, 𝑛). It can be seen that our estimator is robust across all such combinations.
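For readers who prefer a self-contained script, the following Python/NumPy sketch reproduces this setup approximately (the paper itself uses the R packages R1magic and mvtnorm). The sparse-vector generator and the random seed are illustrative choices, and exact orthogonality of the singular vectors is not enforced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_unit_vector(dim, s, rng):
    # Unit vector with a fraction s of non-zero entries; a stand-in for the
    # sparse vectors produced by the R package R1magic in the paper.
    k = max(1, int(s * dim))
    v = np.zeros(dim)
    support = rng.choice(dim, size=k, replace=False)
    v[support] = rng.normal(size=k)
    return v / np.linalg.norm(v)

p, n, sigma, s = 200, 400, 2.0, 0.1                    # one of the (p, n) combinations
d1, d2 = 10.0, 7.0
u1, u2 = sparse_unit_vector(p, s, rng), sparse_unit_vector(p, s, rng)
v1, v2 = sparse_unit_vector(n, s, rng), sparse_unit_vector(n, s, rng)
# Note: exact orthogonality u1 ⟂ u2 and v1 ⟂ v2 is not enforced in this sketch,
# although it holds in the paper's construction.
S = d1 * np.outer(u1, v1) + d2 * np.outer(u2, v2)      # rank-two sparse signal
Z = rng.normal(scale=sigma / np.sqrt(n), size=(p, n))  # noise satisfying (3)
S_tilde = S + Z                                        # observation model (1)
```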
Next, we consider the accuracy of the estimated singular vectors. We report the results for the left and right singular vectors at a fixed noise level in Fig. 2 and Fig. 3, respectively. It can be seen that our estimates are quite accurate.
Fig. 2. Estimation of the left singular subspace for σ = 2, p = 200, n = 400. The left panel is the true subspace, whereas the right panel is the estimated subspace.
Fig. 3. Estimation of the right singular subspace for σ = 2, p = 200, n = 400. The left panel is the true subspace, whereas the right panel is the estimated subspace.
3.2 Comparison with other algorithms
In this section, we compare ASSVD with some state-of-the-art algorithms. Specifically, we compare with the sparse singular value decomposition (SSVD) in [12], the nuclear norm minimization with 𝑙1 penalty [16] (NSNM), optimal shrinkage of singular values (OSSVD) in [18], shrinkage estimates (OptShrink) in [19] and the truncated SVD (TSVD). For the implementation of SSVD, we use an R package ssvd contributed by the first author of [12].
For the shrinkage estimates in [18,19], the Matlab codes can be found on the author’s websites.
Table 2. Comparison of different algorithms in Frobenius norm. We record the estimation errors of the different methods averaged over \(10^4\) simulations; the smallest error in each setting is highlighted.
First of all, we study the performance of the various methods at a fixed noise level σ = 1. We use the same setup as in Section 3.1, varying the sparsity level 𝑠 between 0.05 and 0.45. From Table 2 we conclude that: (i) ASSVD outperforms the other algorithms at all sparsity levels and all combinations of 𝑝 and 𝑛; (ii) even though we assume (8), and consequently 𝑠 → 0 asymptotically, the simulations indicate that our estimate remains reasonably accurate when 𝑆 is not very sparse; (iii) TSVD has the worst performance and deteriorates as the dimension increases, although it is stable under variation of the sparsity; (iv) SSVD has stable and relatively small errors at all sparsity levels, but we will show later that it becomes worse when 𝑑1 and 𝑑2 increase; (v) the penalty method deteriorates as the sparsity level increases.
We mention that, in this setting, both 𝑑1 and 𝑑2 are strong signals.
4. Theoretical properties
In this section, we state the main statistical properties of ASSVD. The key ingredients of our analysis are the convergence limits and rates for the singular values and vectors of \(\tilde S\).
4.1 Convergence of singular values and vectors of \(\tilde S\)
In [5], the author computed the convergence limits and rates for the singular values and vectors when 𝜎² = 1. We extend these results to a general noise level 𝜎. Define
\(\theta (d):=\frac{(d^{2}+\sigma^{2})(d^{2}+c_{n}\sigma^{2})}{d^{2}}\)
and
\(a_{1}(d):=\frac{d^{4}-c_{n}\sigma^{4}}{d^{2}(d^{2}+c_{n}\sigma^{2})},\quad a_{2}(d):=\frac{d^{4}-c_{n}\sigma^{4}}{d^{2}(d^{2}+\sigma^{2})}.\)
We next introduce the assumptions on the strength of the signals 𝑑𝑖, 1 ≤ 𝑖 ≤ 𝑟.
Assumption 5. Suppose that for some 1 ≤ 𝑟+ ≤ 𝑟 and some small constant 𝜅 > 0, we have
\(d_{i}>\sigma c_{n}^{1 / 4}+\kappa,\quad\left|d_{i}-d_{j}\right| \geq \kappa, \quad 1 \leq i \neq j \leq r^{+}.\)
Moreover, when 𝑟+ + 1 ≤ 𝑘 ≤ 𝑟, we assume
\(d_{k}<\sigma c_{n}^{1/4}.\)
We next state the results for the singular values and vectors.
Theorem 6. Suppose that Assumptions 1 and 5 hold true. For any given small 𝜖 > 0, there exists a large constant 𝐷 ≡ 𝐷(𝜖) > 0, such that for sufficiently large 𝑛, with probability at least 1 − 𝑛−𝐷, we have
\(\left|\lambda_{i}-\theta\left(d_{i}\right)\right| \leq n^{-1 / 2+\epsilon}, 1 \leq i \leq r^{+},\) (11)
and
\(\left|\left\langle u_{j}, \tilde{u}_{i}\right\rangle^{2}-\delta_{i j} a_{1}\left(d_{i}\right)\right| \leq\left(\delta_{i j} n^{-1 / 2+\epsilon}+n^{-1+\epsilon}\right),\)
where 𝛿ij = 1 when 𝑖 = 𝑗 and 𝛿ij = 0 otherwise. Furthermore, for 𝑟+ + 1 ≤ 𝑘 ≤ 𝑟, we have
\(\left|\lambda_{k}-\sigma^{2}\left(1+c_{n}^{1 / 2}\right)^{2}\right| \leq n^{-2 / 3+\epsilon}\)
and
\(\left\langle u_{l}, \tilde{u}_{k}\right\rangle^{2} \leq n^{-1+\epsilon},\left\langle v_{l}, \tilde{v}_{k}\right\rangle^{2} \leq n^{-1+\epsilon}, 1 \leq l \leq r\)
Proof. Denote \(\tilde S_1\) = \(\tilde S\) /𝜎, 𝑆1 = 𝑆/𝜎 and 𝑍1 = 𝑍/𝜎. The results for the model
\(\tilde S_1 =S_1 +Z_1\)
have been established in [5, Theorems 2.2 and 2.3]. Note that \(\tilde S_1\) and \(\tilde S\) have the same singular vectors, and the singular values of \(\tilde S\) equal 𝜎 times those of \(\tilde S_1\). We can therefore conclude the proof using [5, Theorems 2.2 and 2.3].
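For concreteness, the rescaling computation behind this step can be written out under the definitions of \(\theta(d)\), \(a_1(d)\) and \(a_2(d)\) above: for a strong signal \(d_i\), applying the \(\sigma=1\) result to \(\tilde S_1 = S_1 + Z_1\), whose signal singular values are \(d_i/\sigma\), gives (up to the \(n^{-1/2+\epsilon}\) error in (11))
\(\lambda_{i}(\tilde S\tilde S^{\mathrm{T}})=\sigma^{2}\lambda_{i}(\tilde S_{1}\tilde S_{1}^{\mathrm{T}})\approx\sigma^{2}\,\frac{\bigl((d_{i}/\sigma)^{2}+1\bigr)\bigl((d_{i}/\sigma)^{2}+c_{n}\bigr)}{(d_{i}/\sigma)^{2}}=\frac{(d_{i}^{2}+\sigma^{2})(d_{i}^{2}+c_{n}\sigma^{2})}{d_{i}^{2}}=\theta(d_{i}).\)
The singular vector overlaps are unchanged by the rescaling, which similarly yields \(a_1\) and \(a_2\) when the \(\sigma=1\) limits are evaluated at \(d_i/\sigma\).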
We remark that the almost sure convergence results have been established in [20] using Free Probability Theory. We provide convergence rates in the above theorem.
4.2 Convergence of the estimator \(\hat S_{assvd}\)
With the preparation of Theorem 6, we next establish the properties of our estimator \(\hat S_{assvd}\) under the Frobenius norm. Recall (10).
Theorem 7. Suppose that Assumptions 1 and 5 hold true. Then for some small constant 𝜖 > 0, there exists a large constant 𝐷 ≡ 𝐷(𝜖) > 0, such that for a sufficiently large 𝑛, with probability at least 1 − 𝑛−𝐷, we have
\(|| S-\hat{S}_{\text {assvd }}||_{F} \leq n^{-1 / 2+\epsilon}+\sqrt{\sum_{i=r^{+}+1}^{r} d_{i}^{2}}.\)
Proof. We decompose 𝑆 = 𝑆𝑜 + 𝑆𝑏, where
\(S_{o}=\sum_{i=1}^{r^{+}} d_{i} u_{i} v_{i}^{\mathrm{T}}, S_{b}=\sum_{i=r^{+}+1}^{r} d_{i} u_{i} v_{i}^{\mathrm{T}}.\)
By the triangle inequality and the fact that \(\|S_{b}\|_{F}=\sqrt{\sum_{i=r^{+}+1}^{r} d_{i}^{2}}\) (the singular vectors being orthonormal), it is easy to see that
\(|| S-\hat{S}_{a s s v d}||_{F} \leq|| S_{o}-\hat{S}_{a s s v d}||_{F}+\sqrt{\sum_{i=r^{+}+1}^{r} d_{i}^{2}}.\)
From the proof of [5, Theorem 3.4] (see equation (5) there), we find that with probability at least 1 − 𝑛−D
\(|| S_{o}-\hat{S}_{a s s v d}||_{F}^{2} \leq n^{-1+2 \epsilon}+2 \sum_{i=1}^{r^{+}}\left(\hat{d}_{i}-d_{i}\right)^{2}.\)
Moreover, by [5, Proposition 3.3], we find that 𝑞 = 𝑟+ with probability at least 1 − 𝑛−𝐷. Therefore, the proof follows from the following lemma, whose proof can be found in the appendix.
Lemma 8. Recall the estimate \(\hat d_i\) in (9). Assume that the assumptions of Theorem 7 hold. Then with probability at least 1 − 𝑛−𝐷, we have
\(\left|\hat{d}_{i}-d_{i}\right| \leq n^{-1 / 2+\epsilon}, i \leq r^{+}.\)
We conclude from Theorem 7 that when 𝑟+ = 𝑟, i.e. all signals are strong, ASSVD provides a consistent estimator. In this situation, however, the shrinkage algorithms (OSSVD [18] and OptShrink [19]) only attain the bound
\(\sqrt{\sum_{i=1}^{r} d_{i}^{2}\left(1-a_{1}\left(d_{i}\right) a_{2}\left(d_{i}\right)\right)}>0\)
since 0 < 𝑎1(𝑑𝑖), 𝑎2(𝑑𝑖) < 1.
For the iterative thresholding method SSVD, even though it has the same theoretical rate as ours, the numerical simulations show that ASSVD performs better. Moreover, since our algorithm does not involve any iteration, ASSVD is simpler and faster to implement.
For the penalty methods, optimal bounds have not been established in the literature. However, as can be seen from the minimax bound in [17], the error is controlled by the nuclear norm, which is strictly positive.
5. Conclusions and discussions
In this paper, we study the problem of estimating a simultaneously low-rank and sparse matrix from a high dimensional noisy observation. We propose an efficient algorithm, adaptive sparse singular value decomposition (ASSVD), which exploits the structure of the singular values and vectors. The inputs of ASSVD are based on recent developments in Random Matrix Theory. A main advantage is that we do not need to estimate the variance of the noise. Theoretical analysis shows that ASSVD outperforms many existing methods. Extensive experimental results demonstrate the efficiency and efficacy of the proposed method. Moreover, ASSVD still works well even when the data matrix is not very sparse. One future direction is to generalize this idea to incorporate high dimensional heteroskedastic noise. It is also of interest to explore the situation when the rank of 𝑆 diverges with 𝑛.
References
- Z. H. Xia, L. H. Lu, T. Qiu, H. J. Shim, X. Y. Chen, and Byeungwoo Jeon, "A Privacy-Preserving Image Retrieval Based on AC-Coefficients and Color Histograms in Cloud Environment," Computers, Materials & Continua, Vol. 58, No. 1, pp. 27-43, 2019. https://doi.org/10.32604/cmc.2019.02688
- X. Y. Chen, H. D. Zhong, and Z. F. Bao, “A GLCM-Feature-Based Approach for Reversible Image Transformation,” Computers, Materials & Continua, Vol. 59, No. 1, pp. 239-255, 2019. https://doi.org/10.32604/cmc.2019.03572
- L. Z. Xiong, and Y. Q. Shi, "On the Privacy-Preserving Outsourcing Scheme of Reversible Data Hiding Over Encrypted Image Data in Cloud Computing," Computers, Materials & Continua, Vol.55, No.3, pp.523-539, 2018.
- Dennis I. Merino, "Topics in Matrix Analysis," Cambridge University Press, 2008.
- X.C. Ding, "High dimensional deformed rectangular matrix with applications in matrix denoising," Bernoulli, Vol.26, pp.387-417, 2020.
- D. Donoho, "De-noising by soft-thresdholding," IEEE Trans. Inf. Theory, vol. 41, pp.613-627,1995. https://doi.org/10.1109/18.382009
- M. Gavish, and D. L. Donoho, "The Optimal Hard Threshold for Singular Values is 4/√3," IEEE Trans. Inf. Theory, vol. 60, pp.5040-5053, 2014. https://doi.org/10.1109/TIT.2014.2323359
- N. H. Timm, Applied Multivariate Analysis, Springer Texts in Statistics, Springer Science & Business Media, 2007.
- L. Erdős, and H.-T. Yau, "A Dynamical Approach to Random Matrix Theory," Courant Lecture Notes, American Mathematical Soc, Vol. 28, 2017.
- Z. D. Bai, and J.W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2nd Edition, Springer Series in Statistics, Springer-Verlag New York, 2010.
- B. Pontes, R. Giraldez, and J. Aguilar-Ruiz, "Biclustering on expression data: A review," J. Biomed. Inform., vol. 57, pp. 163-180, 2015. https://doi.org/10.1016/j.jbi.2015.06.028
- D. Yang, Z. Ma, and A. Buja, "Rate optimal denoising of simultaneously sparse and low rank matrices," J. Mach. Learn. Res., vol. 17, pp. 1-27, 2016.
- R. Otazo, E. Candès, and D. K. Sodickson, "Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components," Magn Reson Med., vol. 73, pp. 1125-1136, 2014. https://doi.org/10.1002/mrm.25240
- J. F. Cai, E. J. Candès, and Z. W. Shen, "A Singular Value Thresholding Algorithm for Matrix Completion," SIAM J. Optim., vol. 20, pp.1956-1982, 2010. https://doi.org/10.1137/080738970
- M. Gavish, and D. L. Donoho, "Minimax risk of matrix denoising by singular value thresholding," Ann. Statist., vol. 42, pp. 2413-2440, 2014. https://doi.org/10.1214/14-AOS1257
- P. V. Giampouras, K. E. Themelis, A. A. Rontogiannis, and K. D. Koutroumbas, "Simultaneously Sparse and Low-Rank Abundance Matrix Estimation for Hyperspectral Image Unmixing," IEEE Transactions on Geoscience and Remote Sensing, vol 54, pp.4775-4789, 2016. https://doi.org/10.1109/TGRS.2016.2551327
- E. Richard, P-A. Savalle, and N. Vayatis, "Estimation of simultaneously sparse and low rank matrices," in Proc. of ICML'12, pp.51-58, 2012.
- M. Gavish, and D. L. Donoho, "Optimal shrinkage of singular values," IEEE Trans. Inf. Theory, vol. 63, pp.2137-2152, 2017. https://doi.org/10.1109/TIT.2017.2653801
- R. R. Nadakuditi, "OptShrink: An Algorithm for Improved Low-Rank Signal Matrix Denoising by Optimal, Data-Driven Singular Value Shrinkage," IEEE Trans. Inf. Theory, vol. 60, pp.3002-3018, 2014. https://doi.org/10.1109/TIT.2014.2311661
- F. Benaych-Georges, and R. R. Nadakuditi, "The singular values and vectors of low rank perturbations of large rectangular random matrices," J. Multivariate Anal., vol.111, pp. 120-135, 2012. https://doi.org/10.1016/j.jmva.2012.04.019
- D. Passemier, and J. F. Yao, "Estimation of the number of spikes, possibly equal, in the high-dimensional case," J. Multivariate Anal., vol. 127, pp.173-183, 2014. https://doi.org/10.1016/j.jmva.2014.02.017
- A. Knowles, and J. Yin, "Anisotropic local laws for random matrices," Probab. Theory Relat. Fields, vol. 169, pp.257-352, 2017. https://doi.org/10.1007/s00440-016-0730-4
- W. Li, B. Zhu, "A 2k-vertex kernel for Vertex Cover based on Crown Decomposition," Theoretical computer science, vol 739, pp.80-85, 2018. https://doi.org/10.1016/j.tcs.2018.05.004