Non-Iterative Threshold based Recovery Algorithm (NITRA) for Compressively Sensed Images and Videos

  • Poovathy, J. Florence Gnana (Department of ECE, SSN College of Engineering) ;
  • Radha, S. (Department of ECE, SSN College of Engineering)
  • Submitted : 2015.03.30
  • Reviewed : 2015.08.20
  • Published : 2015.10.31

Abstract

Data compression, particularly image and video compression, has come a long way since the introduction of Compressive Sensing (CS), which compresses sparse signals such as images and videos to very few samples, i.e. M < N measurements. At the receiver end, a robust and efficient recovery algorithm estimates the original image or video. Many prominent algorithms solve a least squares problem (LSP) iteratively to reconstruct the signal and hence consume more processing time. In this paper, a non-iterative threshold based recovery algorithm (NITRA) is proposed for the recovery of images and videos without solving an LSP, offering reduced complexity and better reconstruction quality. The elapsed time for images and videos using NITRA is in the µs range, which is about 100 times less than that of other existing algorithms. The peak signal to noise ratio (PSNR) is above 30 dB, and the structural similarity (SSIM) and structural content (SC) are 99%.

1. Introduction

With the rapid growth of data flow in wireless sensor networks (WSNs), data compression becomes essential, especially for images and videos. Traditional compression methods require all samples of an image or video for perfect reconstruction. In 2004, however, it was found that signals can be perfectly reconstructed from very few samples, which led to the development of Compressed Sensing (CS) [1-4]. CS compresses the original information into measurements from which the original signal can be recovered efficiently at the receiver side with better quality and minimal reconstruction delay, thus making CS applicable to WSNs [2].

Greedy algorithms such as Orthogonal Matching Pursuit (OMP) [5], Compressive Sampling Matching Pursuit (CoSaMP) [6], Stagewise Orthogonal Matching Pursuit (StOMP) [7], Enhanced Orthogonal Matching Pursuit (EOMP) [8] and Iterative Hard Thresholding (IHT) [9] are prominent reconstruction techniques that solve an LSP iteratively. The larger the number of iterations, the greater the reconstruction delay and complexity, which is a disadvantage when dealing with large amounts of data. This paper proposes NITRA, which uses no iterations but only a threshold operation to estimate the signal from the compressed measurements, giving reduced complexity and less elapsed time. The reconstructed image/video has good PSNR, SSIM and perceptual quality, with low error. The new algorithm is benchmarked by comparing parameters such as PSNR, SSIM and SC with those of other currently used algorithms.

Section 2 describes the related work on compressed sensing and reconstruction, and Section 3 gives the notations used in this paper. Section 4 explains the basic compressed sensing procedure, Section 5 presents the NITRA procedure and its mathematical error bound, Section 6 discusses the results and comparisons, and Section 7 concludes the work.

 

2. Related Work

Many researchers have contributed to the field of CS, building efficient and robust procedures to compress and recover signals, images and videos. Dana Mackenzie [1], in the paper “Compressed Sensing Makes Every Pixel Count”, describes the early stages and ideas in the history of compressed sensing. Masiero et al. [2] explain the Bayesian analysis of CS and present empirical evidence that CS can be used as a practical compression and recovery technique, using real-time data collected with a WSN testbed. Emmanuel Candès et al. and David L. Donoho describe in their respective papers [3, 4] the reconstruction of data from very few frequency components or from incomplete frequency samples.

OMP [5], as analyzed by T. Tony Cai and Lie Wang, selects the exact support of the signal with high probability, allowing perfect reconstruction. Thomas Blumensath [9] introduced another efficient algorithm for reconstructing data from measurements, the Iterative Hard Thresholding (IHT) algorithm, in which a hard threshold operator selects only the values that are sufficient for reconstruction of the compressed data. Zhilin Zhang [10] compared some of the prominent algorithms in terms of failure rate and mean square error and showed that T-MSBL is superior to them. Many parameters are available to benchmark the quality of reconstructed data, such as PSNR, MSE, average difference and maximum difference; these measures were presented by Marta Mrak et al. in [11]. Needell and Tropp [6] proposed lemmas based on the assumption that the measurement matrix follows the restricted isometry property (RIP), and these lemmas are used in this paper.

 

3. Notations

The following notations are used in this paper. 'x' is the input signal vector of size (N×1), where N = n² and n is the number of rows/columns of a block. 'y' is the M-dimensional measurement vector, and 'xs' is an N-dimensional vector that has been sparsified in some transform domain using a suitable basis function, hence the name ‘sparse vector’. The sparsity of the sparse vector is represented by 's'. The ‘measurement matrix’ is denoted by 'φ' and is of dimension (M×N). Its transpose is denoted by 'φT' and is of size (N×M). The thresholding operator used in this algorithm is 'β', which is calculated from the measurements 'y'. The estimated signal is represented by 'xr' and the residual by rr = xs - xr, which is the error between the input sparse signal and the estimated signal. The vector obtained before thresholding is denoted by 'xbt'. 'e' denotes the observation error, and φaug represents the augmented measurement matrix formed from identity and zero matrices.

 

4. Theory of Compressed Sensing

Consider a real valued input vector ‘x’ of size N × 1, i.e. x ∈ ℜ^m, where ‘m’ is the dimension of the real space (here m = N). CS can be applied only to data that are sparse, i.e. data that contain only a small number of non-zero values. A vector ‘xs’ is ‘s’ sparse if it has ‘s’ non-zero values. However, not all signals are naturally sparse, and hence they are intentionally sparsified using a transform basis. The transformation and sparsification are carried out using an orthonormal basis function, denoted by ‘ψ’, which may be the DCT, DWT, DFT etc. depending upon the application. Equation (1) depicts the sparse vector formation [12]:

$$x_s = \psi x \qquad (1)$$

The sparsified signal is reduced in dimension to form the measurements ‘y’ using the measurement matrix ‘φ’, which is of size M×N. Gaussian, Binary, Toeplitz, Hankel and Kronecker product matrices and their combinations are widely used as measurement matrices. ‘y’ is of dimension (M×1) and is used to reconstruct the input image at the receiver end. The transmitter provides the receiver with both ‘y’ and ‘φ’, from which it reconstructs the input data. Since xs = ψx, the measurement vector is calculated from the sparsified vector as given by equation (2):

$$y = \phi x_s = \phi \psi x \qquad (2)$$

4.1 General Compressed Sensing Framework

Fig. 1 describes the CS framework at both transmitter and receiver sides and is common for both images and videos [13, 14]. The test videos considered here are in uncompressed YUV format and only the luminance component is used for further processing.

Fig. 1. General framework for Compressed Sensing

The n×n pixel sub-matrices are transformed into another domain using the DCT and the resultant coefficients are sparsified. The measurement vector is calculated by multiplying the sparse vector by the measurement matrix φ, which may be any random matrix. The disadvantage of selecting a random matrix is that its values change on every execution, resulting in variations in execution time, output quality, efficiency, PSNR etc. Generating a fixed measurement matrix solves this problem.
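The transform-and-sparsify step can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: it uses a 1-D orthonormal DCT-II on a single hypothetical pixel row instead of the 2-D DCT on n×n blocks, and it assumes that sparsification keeps the s largest-magnitude coefficients, which the paper does not state explicitly. The names `dct_matrix`, `matvec` and `sparsify` are illustrative.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis psi (n x n), so that psi * psi^T = I."""
    psi = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        psi.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                    for j in range(n)])
    return psi

def matvec(a, v):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def sparsify(coeffs, s):
    """Assumed rule: keep the s largest-magnitude coefficients, zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:s])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

x = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # hypothetical pixel row
psi = dct_matrix(len(x))
xs = sparsify(matvec(psi, x), s=3)                     # s-sparse vector
nonzero = sum(1 for c in xs if c != 0.0)
```

Because ψ is orthonormal, the inverse transform is simply multiplication by the transpose of `psi`, which is what the receiver's inverse DCT relies on.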

After many trials, an augmented matrix of size M×N, given by equation (3) and formed by combining an identity matrix of size M×M with a zero matrix of size M×(N-M), was found to extract the necessary information selectively, since it has unity values on the leading diagonal. The advantages of the augmented matrix are that it is compact, easy to express and suited to fast implementations, and that it yields the same output on every execution.

$$\phi_{aug} = \begin{bmatrix} I_{M \times M} & Z_{M \times (N-M)} \end{bmatrix} \qquad (3)$$

where Z represents the null (zero) matrix. The augmented matrix aids the compression and reconstruction processes since it follows the restricted isometry property (RIP) and the principle of transform sparsity. The second and third lemmas below were proposed by Needell and Tropp [6] and are used in this paper to prove the error bound of the proposed algorithm.
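The augmented matrix described above, an M×M identity followed by an M×(N-M) zero block, can be sketched in a few lines. The helper names `augmented_matrix` and `measure` and the example values are illustrative, not taken from the paper.

```python
def augmented_matrix(m, n):
    """Return [I_m | Z] as a list of m rows of length n (requires 0 < m <= n)."""
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    return [[1.0 if j == i else 0.0 for j in range(n)] for i in range(m)]

def measure(phi, xs):
    """Measurement vector y = phi * xs."""
    return [sum(p * v for p, v in zip(row, xs)) for row in phi]

phi = augmented_matrix(3, 5)
xs = [9.0, -4.0, 2.5, 0.0, 1.0]   # hypothetical sparse coefficient vector
y = measure(phi, xs)              # first M entries of xs
```

With this matrix the product reduces to keeping the first M entries of xs and discarding the rest, which is why the result is identical on every execution.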

Lemma 1: For a measurement matrix φ which satisfies RIP with sparsity s

And also

A perfect measurement matrix should satisfy the following lemmas proposed by Needell and Tropp (Proposition 3.5 in [6]).

Lemma 2: If φ satisfies the RIP ║φxs║2 ≤ ║xs║2 , ∀x : ║xs║0 ≤ s, then

Lemma 3: For any ‘x’ let ‘xs’ be the best approximation to ‘x’. Let xr = x − xs . Let . If the RIP holds for sparsity ‘s’, then the error can be bounded by

where ║e║2 is the observation error, which is taken to be zero. Now that a φ satisfying the RIP has been obtained, the measurement vector y is calculated by multiplying φ (M×N) by xs (N×1). The resultant vector y (M×1) is transmitted to the receiver along with φ. Transmission of the whole input image block is thus reduced to just M < N measurements, which considerably reduces execution time and complexity. If there are p blocks in a frame/image, the total number of measurements required to reconstruct the frame/image is pM.
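As a worked example of the pM count, the sketch below assumes 8×8 blocks (N = 64) with M = 20, which matches the 31.25% ratio quoted in the results section, and a hypothetical 512×512 frame. The block size and frame size are assumptions made for illustration.

```python
def total_measurements(height, width, n, m):
    """Return (p, p*m) for a frame split into non-overlapping n x n blocks."""
    if height % n or width % n:
        raise ValueError("frame dimensions must be multiples of the block size")
    p = (height // n) * (width // n)
    return p, p * m

# Assumed values: 512 x 512 frame, 8 x 8 blocks, M = 20 measurements per block.
p, total = total_measurements(512, 512, n=8, m=20)
ratio = total / (512 * 512)   # fraction of the original samples transmitted
```

Under these assumptions the frame has p = 4096 blocks and needs pM = 81920 measurements instead of 262144 pixel values.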

 

5. Non-Iterative Threshold based Recovery Algorithm

NITRA and the other algorithms used in this paper were developed as MATLAB scripts. A Dell Inspiron laptop with an Intel Core i5 processor was used to execute all the algorithms mentioned in this paper and to compare their objective measures. Upon receiving y and φ, the receiver applies a robust reconstruction algorithm to these inputs to recover the original input image/video. Many CS reconstruction algorithms, as in [5-9], solve an LSP iteratively, where the number of iterations depends either on the sparsity or on a conveniently chosen fixed number. The proposed algorithm, NITRA, is named after its procedure, which involves no iteration to find the best match. It uses only the transpose operation and a thresholding operator β, where β depends upon the measurement vector y. The threshold operator is calculated using equation (10).

where s is the sparsity given as s = ║x║0. y is considered as an important metric to find the threshold because among the three inputs that are to be provided to the receiver namely, φ, y and s, y carries the information about the pixel values of the image or video frame in the form of coefficients. Table 1 provides the NITRA algorithm which uses these inputs to recover the images and videos.

Table 1. NITRA for Reconstruction of Image/Video at the Receiver

At the transmitter end, the measurement vector y is obtained as the product of φ and xs, as explained in Section 4. In NITRA, φ is fixed for every block, and hence it is sufficient to transmit φ to the receiver only once, which consumes less memory.

According to the NITRA procedure given in Table 1, the receiver estimates the original image by taking the product of φT and y. Only the values that satisfy the threshold condition β are selected for reconstruction, and the others are set to zero, thereby reducing the number of computations. NITRA has no iterations within a block, which reduces computational complexity and execution time and makes it suitable for WSNs. After the threshold operation, the resultant values are transformed directly back to the pixel domain by taking the 2D-IDCT. Since NITRA uses the augmented matrix of equation (3) as the sensing matrix, which has unity on the leading diagonal and a large number of zeros, it senses only the information necessary for reconstruction. Unlike conventional CS recovery algorithms, which use random matrices and must therefore solve an LSP, NITRA does not need to solve an LSP to find the best solution. Also, in existing algorithms the sensing matrix is generated separately for each block, so the overall performance cannot be judged reliably from a single execution of the algorithm. The feasibility of reconstructing images and videos using NITRA, its error bound and its accuracy are established mathematically in the following theorems and lemmas.
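The receiver-side steps described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' Table 1: the threshold β is passed in as a parameter because its formula (equation (10)) is not reproduced here, the keep rule |value| ≥ β is an assumption, and the final 2D-IDCT step is omitted. The name `nitra_recover` is hypothetical.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matvec(a, v):
    """Plain matrix-vector product."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def nitra_recover(phi, y, beta):
    """Single-pass estimate: x_bt = phi^T y, then zero entries below beta.

    The comparison |value| >= beta is an assumed form of the threshold test.
    """
    x_bt = matvec(transpose(phi), y)              # estimate before thresholding
    return [v if abs(v) >= beta else 0.0 for v in x_bt]

# Augmented sensing matrix [I_3 | Z] for a length-5 coefficient vector.
phi = [[1.0 if j == i else 0.0 for j in range(5)] for i in range(3)]
y = [9.0, -4.0, 0.2]                              # hypothetical measurements
xr = nitra_recover(phi, y, beta=0.5)              # 0.2 falls below beta
```

There is no loop over candidate supports and no least squares solve: one transpose product and one pass over the N entries produce the estimate, which would then be inverse-transformed to pixels.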

5.1 Error bound of the proposed method

Since NITRA is a lossy technique for recovering compressively sensed images and video frames, the quality of reconstruction must be verified mathematically by checking the reconstruction error. Using lemmas from [6], the following theorems are proposed to prove that the error in reconstructing images and videos with NITRA stays within a minimal range. Theorem 1 gives the condition to be satisfied for perfect reconstruction. The norm of the difference between the original input vector x and the estimated vector xr must be less than or equal to the sum of the second norm of the sparse vector xs and the estimated error. The estimated error depends upon x, xs and e as in equation (12), where e is the assumed error, which can be neglected since it tends to zero.

Theorem 1: Consider a noisy observation y = φx + e, where x is a vector. Let xs be the sparse vector with s non-zero elements. NITRA will recover the estimated signal xr of the input x by satisfying the following condition:

The accuracy of NITRA for estimating x can be represented by

Equation (13) gives the accuracy of NITRA in estimating the original image from the compressed form. The error between the original input vector and the estimated input vector is less than the sum of first and second norms of the difference between original and the sparse vector. NITRA satisfies this equation exactly thus providing greater accuracy.

Proof of error bound in Theorem 1

Initially, NITRA satisfies the condition given by equation (14)

where x is any input vector. Since the RHS of equations (11) and (14) are equal, the LHS can be equated, and the estimated error is substituted from equation (12). The result is

Substituting equation (15) in equation (14), equation (16) is obtained.

In order to apply Lemma 3 to represent the error bound , the term is multiplied to equation (16) in the following fashion:

Multiplication of to is avoided since δs << 1. Substituting equation (9) in equation (17), the error between the original vector and the reconstructed vector is found to prove equation (11), hence proving that the error by using NITRA for reconstruction is found to be minimum. The steps are as follows:

When the difference metric δs << 1, the denominator of equation (18) tends to unity and hence the final equation will be,

This proves that the error is far less than the combined errors obtained by adding ║xs║2 and the difference terms ║x - xs║2 and .

Theorem 2: Given a noisy observation y = φx + e , where xs is s sparse vector, if φ has RIP, then NITRA will recover an approximation xr satisfying

The accuracy of the estimation is

Proof of error bound in Theorem 2

The estimated error depends on the term ║xs - xbt║2, the difference between the sparse vector and the estimate before the thresholding operation. With the help of the triangle inequality, the error can be expressed as

After thresholding, xr is a better approximation to xbt than xs is, which means

and thus the error will not exceed twice the value of ║xs - xbt║. The error is expressed as,

But xbt =φTy + e = φTφxs + e . Using all the above findings, the error is represented as follows:

From Lemma 1, it can be written that, and ║(I - φTφ)xs║2 ≤ δs║xs║2

The term ║xs - xr║2 is nothing but the residue which can be denoted by ║rr║2. Therefore,

This can also be written as

where, a = 2δs and . The range of δs is 0 < δs < 1 . Hence ║rr║2 can be approximated as

Equation (28) proves that the error between the sparse vector and the estimated vector is less than or equal to the sum of norms of sparse vector and the observation error. On the assumption that the observation error is zero, i.e. ║e║2 = 0, we can write equation (28) as ║xs - xr║2 ≤ ║xs║2 or ║rr║2 ≤ ║xs║2 .
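The zero-noise bound ║rr║2 ≤ ║xs║2 can be checked numerically for the augmented matrix. The sketch below assumes e = 0, a hard threshold that leaves retained values unchanged, and hypothetical values for xs, M and β; it is a sanity check on one example, not a proof.

```python
import math

def l2(v):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(c * c for c in v))

xs = [9.0, -4.0, 0.2, 0.0, 1.0]   # hypothetical sparse vector, N = 5
m, beta = 3, 0.5                  # assumed measurement count and threshold

# With phi = [I_M | Z] and e = 0, phi^T * phi * xs keeps the first M entries.
x_bt = xs[:m] + [0.0] * (len(xs) - m)
xr = [v if abs(v) >= beta else 0.0 for v in x_bt]   # assumed hard threshold
rr = [a - b for a, b in zip(xs, xr)]                # residual xs - xr

residual_norm = l2(rr)
signal_norm = l2(xs)
```

Under these assumptions every entry of the residual is either zero or an original entry of xs, so its norm cannot exceed ║xs║2, whatever values are chosen.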

 

6. Results and Discussions

The quality of a recovery algorithm can be benchmarked in two ways: by analyzing the algorithm itself and by analyzing its results. Algorithm analysis can be carried out using metrics such as complexity and time consumption. The results of the algorithm can be validated using quality measures such as PSNR, SSIM and SC. The perceptual and objective quality measures of NITRA are calculated as described in the following subsections. The images are in uncompressed portable network graphics (PNG)/tagged image file format (TIFF) and the videos are in YUV format. More than 10 built-in MATLAB test images and 10 video sequences taken from [15] were used to validate the efficiency of NITRA. The test images shown in this paper are lena, peppers, onion, coins and autumn, and the test videos are foreman, akiyo, bus, stefan and mother-daughter. All comparisons and calculations presented here are for the minimum number of measurements, i.e. M = 20 (only 31.25% of the original information). The objective measures, which give a numerical estimate of output quality, considered for qualifying NITRA are peak signal to noise ratio (PSNR), mean square error (MSE), structural similarity (SSIM), structural content (SC), normalized cross correlation (NK), maximum difference (MD), mean absolute error (MAE) and normalized absolute error (NAE) [11].
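Most of the objective measures listed above can be computed in a few lines. The sketch below uses commonly cited definitions (for example SC as the ratio of summed squared pixel values and MD as the largest absolute pixel difference) and an 8-bit peak of 255; the exact formulas in [11] may differ in detail, and SSIM is left out because it needs windowed statistics. The function name and sample values are illustrative.

```python
import math

def quality_measures(orig, recon, peak=255.0):
    """Return common full-reference measures for two equal-length pixel lists.

    Assumes orig and recon are not all zero (NAE, SC and NK divide by sums).
    """
    if len(orig) != len(recon) or not orig:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(orig)
    diffs = [a - b for a, b in zip(orig, recon)]
    mse = sum(d * d for d in diffs) / n
    psnr = math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)
    return {
        "MSE": mse,
        "PSNR": psnr,                                   # in dB
        "MAE": sum(abs(d) for d in diffs) / n,
        "MD": max(abs(d) for d in diffs),
        "NAE": sum(abs(d) for d in diffs) / sum(abs(a) for a in orig),
        "SC": sum(a * a for a in orig) / sum(b * b for b in recon),
        "NK": sum(a * b for a, b in zip(orig, recon)) / sum(a * a for a in orig),
    }

orig = [52.0, 55.0, 61.0, 66.0]    # hypothetical original pixels
recon = [54.0, 55.0, 60.0, 62.0]   # hypothetical reconstructed pixels
q = quality_measures(orig, recon)
```

For video, the same function would be applied per frame and the results averaged over the frames, as described for Table 3.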

6.1 Perceptual quality

As far as image and video processing is concerned, an algorithm is considered efficient when its results are perceptually good. Fig. 2 and Fig. 3 show that the perceptual quality of the recovered images/videos is conserved to a great extent, along with good PSNR, for only 20 measurements. For applications that require clarity together with reduced computation time and complexity, NITRA is a suitable choice. It is important to note that the clarity can be increased further by increasing the number of measurements used during the recovery process.

Fig. 2. Reconstructed images using NITRA with number of measurements and their corresponding PSNR in dB.

Fig. 3. Reconstructed video frames using NITRA with number of measurements and their corresponding PSNR in dB.

6.2 Comparison of perceptual quality of NITRA with LSP based algorithms

The perceptual quality of NITRA can be assessed by comparing its output images and video frames with those obtained from other algorithms. The LSP based algorithms considered for comparison are OMP and StOMP. StOMP, a variation of OMP, has been shown to be the best among the greedy algorithms for image and video reconstruction, and hence these two LSP algorithms are used to compare the performance of NITRA. Fig. 4 gives the quality comparison in terms of visual perception for both images and videos.

Fig. 4. Reconstructed image and video frame (frame no. 4): a) and d) are the inputs, b) and e) are reconstructed using NITRA, c) and f) are reconstructed using OMP, and d) and g) are reconstructed using StOMP. The PSNR values displayed here are calculated for M = 20.

From Fig. 4, it is evident that NITRA gives better visual quality than OMP and StOMP for the least number of measurements, i.e. M = 20. The clarity of the image and video frame as a whole is high with NITRA. OMP and StOMP display blockiness artifacts, which NITRA avoids because the augmented matrix senses the appropriate values from the sparse vector. The PSNR obtained by the LSP based algorithms is around 24 dB, while NITRA exhibits a PSNR of 31 dB with the least information. Thus there is an increase of approximately 7 dB in PSNR when using the proposed algorithm. The appearance of the reconstructed image is also better than with the LSP based algorithms.

6.3 Objective measures

The ability of NITRA to reconstruct the image or video can be benchmarked by calculating various other quality measures and comparing them with those of algorithms such as OMP and StOMP, as shown in Table 2.

Table 2. Objective Measures for the Quality of Reconstructed Images using NITRA, OMP and StOMP

The average PSNR obtained is around 30 dB for images, which is higher than that of StOMP and OMP. The SSIM is approximately 92% for images using NITRA, while it is 64% and 77% using OMP and StOMP respectively. Though the reconstruction depends upon the pixels contained in the input images or video frames, the results produced by NITRA are superior to those of the iterative algorithms considered for comparison. It is evident from Table 2 that NITRA reaches a maximum PSNR of 40 dB for images similar to ‘onion.png’, while the LSP based algorithms give a maximum PSNR of only about 28 dB. The MSE is around 45 for NITRA, while it is of the order of a few hundred for the other algorithms. Similarly, the maximum difference in pixel value is around 75 for NITRA, while for OMP and StOMP it is above 200.

The same process was carried out for videos. The quality measures were calculated for individual frames and were averaged over the total number of frames considered for the reconstruction process. Table 3 exhibits the objective measures for standard test video series, akiyo and foreman.

Table 3. Objective Measures for the Quality of Reconstructed Videos using NITRA, OMP and StOMP

For videos too, the PSNR is above 30 dB for 31.25% of the input information (M/N = 0.3125), which is an optimum value for lossy compression techniques. The average PSNR obtained using NITRA is more than 30 dB, which is greater than the PSNRs obtained using StOMP and OMP. The SSIM is 94% for NITRA, while OMP and StOMP yield only 79% and 86% for akiyo and 68% and 64% for foreman, respectively. The MD in pixel values is low, approximately 77, while the other algorithms have a difference above 200, as in the case of images, which indicates that pixels are recovered with high accuracy by NITRA. The stefan video has higher entropy in its pixel distribution, and hence reconstruction using NITRA gives lower PSNR and perceptual quality. Yet NITRA still produces a higher PSNR of 25 dB, while that of OMP and StOMP is approximately 20 dB. These results show the efficiency of NITRA in reconstructing images and videos with higher perceptual quality.

When the number of measurements increases, there is an obvious increase in the PSNR and other quality measures, since more information about the input image or video is provided to the recovery algorithm. However, Table 2 and Table 3 show that NITRA estimates the input images and videos more accurately than the iterative algorithms even for the least number of measurements.

6.4 Big O (O)

Big O notation describes the behavior of a function or algorithm as the number of arguments or trials grows very large or tends to infinity. The complexity of reconstruction algorithms can therefore also be expressed in Big O notation. NITRA shows reduced complexity because it uses only simple arithmetic operators. Arithmetic operators are linear in their operands, and hence the complexity contributed by them can be neglected when larger loops are considered. The Big O for NITRA is of order N, since there is only one loop, which performs the threshold operation N times. Thus NITRA has a complexity of O(N), unlike existing greedy algorithms such as OMP and IHT [9], which have a complexity of O(MN) since they solve a least squares problem iteratively. The number of iterations depends upon the sparsity or a fixed value. Here M refers to the length of the M×1 measurement vector and N to the number of pixels in the n×n input block.

6.5 Elapsed Time and Total Execution Time

Elapsed time and total run times were calculated using MATLAB scripts. The computing resource for these experiments was a Dell Inspiron laptop with a 64-bit operating system and an Intel Core i5 processor. Since NITRA has no iterations, the time taken to execute it for each block is considerably reduced. On average, NITRA takes approximately 25.97 µs per block, while CoSaMP, StOMP and OMP take approximately 0.51 ms, 3.72 ms and 3.23 ms respectively. Fig. 5 and Fig. 6 show that NITRA takes only tens of µs for the recovery of both images and videos.

Fig. 5. Comparison of elapsed time per block for image using NITRA and various other algorithms

Fig. 6. Comparison of elapsed time per block for video using NITRA and various other algorithms

Fig. 6 represents the case of video and shows that NITRA takes less time to execute than the other greedy algorithms. NITRA takes an average of 26.65 µs per block, while CoSaMP, OMP and StOMP consume approximately 0.458 ms, 2.86 ms and 4.1 ms respectively. This corresponds to a 94.18%, 99.06% and 99.35% reduction in per-block execution time when using NITRA instead of CoSaMP, OMP and StOMP. Since the measurement matrix is not fixed for each block of the image for CoSaMP, OMP and StOMP, the overall perceptual quality of the image/video frame is diminished, because the algorithm forms a different set of equations to be solved for every single block. This disadvantage is avoided in NITRA since the sensing matrix is fixed. The comparison of average elapsed time over ten blocks, for images and videos, using NITRA and algorithms such as OMP, CoSaMP and StOMP is shown in Table 4.
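The percentage reductions quoted for video can be reproduced from the rounded per-block times given in the text. The sketch below uses only those published figures; small differences in the last digit (for example for OMP) come from the rounding of the inputs. The helper name `percent_reduction` is illustrative.

```python
def percent_reduction(baseline_ms, new_ms):
    """Percentage by which new_ms is smaller than baseline_ms."""
    return 100.0 * (baseline_ms - new_ms) / baseline_ms

nitra_ms = 26.65e-3   # 26.65 microseconds expressed in milliseconds
others_ms = {"CoSaMP": 0.458, "OMP": 2.86, "StOMP": 4.1}

reductions = {name: percent_reduction(t, nitra_ms) for name, t in others_ms.items()}
speedups = {name: t / nitra_ms for name, t in others_ms.items()}
```

With these inputs the speed-up factors come out at roughly 17 for CoSaMP, 107 for OMP and 154 for StOMP, so the "about 100 times" figure applies to the OMP-family algorithms rather than to CoSaMP.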

Table 4. Averaged Elapsed Time (ms) for Image/Video using NITRA and Other Reconstruction Algorithms

These values are almost the same for the blocks of both images and videos. The total execution time of the code for NITRA and the other algorithms considered for comparison is shown in Table 5. NITRA takes an average of 1.3 s and 10.45 s for the combined procedure of compression and reconstruction for images and videos respectively. For images, NITRA shows a decrease in total runtime of 58.75%, 88.94% and 93.31% when compared with CoSaMP, OMP and StOMP respectively. For videos too, NITRA shows a reduction in runtime of 29%, 85.73% and 91.62% with respect to CoSaMP, OMP and StOMP respectively.

Table 5. Averaged Run Time (s) of NITRA and Other Algorithms for Reconstruction of Images and Videos

Another advantage of NITRA is that the fixed measurement matrix yields the exact PSNR value in a single execution. In other algorithms, since the values of the random matrix change, the PSNR obtained in one execution will not be the same when the algorithm is executed a second time. Hence such an algorithm must be run a number of times and the PSNRs averaged over the total number of executions to obtain an approximate PSNR.

 

7. Conclusion and Future Work

Wireless Sensor Networks (WSNs) require efficient reconstruction algorithms to recover the compressed data without delay at the receiver end. NITRA is a recovery algorithm that uses simple arithmetic to reconstruct the original input from M < N measurements. The augmented measurement matrix and the less complex mathematical expressions used in the proposed algorithm contribute to a low complexity of O(N). The algorithmic steps involved in NITRA are simple, which reduces the elapsed time by a factor of approximately 100 when compared with LSP based CS recovery techniques. The total run time of NITRA is reduced by nearly 90% for both videos and images when compared with other reconstruction algorithms such as OMP, StOMP and CoSaMP, which is useful when dealing with large images and videos on a large scale. On the basis of the above results, NITRA is well suited to the reconstruction of images and videos with high output quality. In a nutshell, images and videos that are compressed to M < N measurements using CS and transmitted over WSNs can be reconstructed by the non-iterative threshold based recovery algorithm (NITRA) with reduced delay and better performance in terms of PSNR, SSIM, SC etc., accompanied by better perceptual quality. In future, a modified version of NITRA is to be applied to the chrominance part of images and videos. Embedding an enhancement technique within the recovery algorithm is also to be explored to improve the perceptual quality of images and videos reconstructed from meager information.

Cited by

  1. Non-iterative CS recovery algorithm for surveillance applications: subjective and real-time experience vol.30, pp.2, 2015, https://doi.org/10.1007/s11045-018-0584-2