Weighted DCT-IF for Image up Scaling

  • Lee, Jae-Yung (Department of Electrical Engineering, Sejong University) ;
  • Yoon, Sung-Jun (SK Telecom) ;
  • Kim, Jae-Gon (School of Electronics and Information Engineering, Korea Aerospace University) ;
  • Han, Jong-Ki (Department of Electrical Engineering, Sejong University)
  • Received : 2018.04.25
  • Accepted : 2018.08.07
  • Published : 2019.02.28

Abstract

The design of an efficient scaler to enhance the edge data is one of the most important issues in video signal applications, because the perceptual quality of the processed image is sensitively affected by the degradation of edge data. Various conventional scaling schemes have been proposed to enhance the edge data. In this paper, we propose an efficient scaling algorithm for this purpose. The proposed method is based on the discrete cosine transform-based interpolation filter (DCT-IF) because it outperforms other scaling algorithms in various configurations. The proposed DCT-IF incorporates weighting parameters that are optimized for training data. Simulation results show that the quality of the resized image produced by the proposed DCT-IF is much higher than that of images produced by conventional schemes, although the proposed DCT-IF is more complex than other conventional scaling algorithms.

1. Introduction

 Scalers are typically found inside consumer electronics devices such as HD/UHD televisions, mobile devices, DVD and Blu-ray disc players, and video game consoles. Scaler modules are often combined with other video processing devices or algorithms to create a video processing module that improves the apparent definition of video signals. In some systems, scaling is performed by video processing chips installed in digital display devices, where the role of the video processing chip is to upscale and process the video signal to fit the display resolution. As the resolutions supported by these systems become more diverse, the design of an efficient scaler module to increase the quality of the resized image becomes more important.

 A scaling scheme based on interpolation is used to reconstruct the intermediate values between discrete pixel values. Over the last several decades, various interpolation methods, including the nearest neighbor [1], bilinear [1], cubic B-spline [2], cubic convolution [3], interpolation using the wavelet transform [4], and discrete cosine transform-based interpolation filter (DCT-IF) [5] methods, have been studied to increase the quality of the resized image. Because edge data are visually important information in the image, the enhancement of edge information is one of the most crucial issues in applications using images and video signals. While edge data contain detailed information in the spatial domain, such data also contain high-frequency components in the frequency domain. Because the kernels of various scalers cause blurring effects like a low pass filter (LPF), blurred edge data are more frequently found in the resized image as the scaling ratio increases. To overcome this problem, various interpolation methods to enhance the edge data have been researched [6]-[15]. The directional cubic convolution [6] proposed by Zhou et al. preserves sharp edges and details by measuring the direction of the edges and combining weighted data at weak edge locations. Whereas the directional cubic convolution [6] supports only an integer scaling ratio, the scheme proposed by Rui and Qiong [7] resizes an image with an arbitrary scaling ratio; the authors analyzed the relationship between the polynomial coefficients of the bicubic interpolation method and proposed a new kernel to obtain a high-resolution image by enhancing the edge information. The curvature interpolation method [8] was designed by estimating the curvature of a low-resolution (LR) image, with Jacobi iteration used to solve the related equations. The interpolation scheme based on the discrete wavelet transform (DWT) and stationary wavelet transform (SWT) proposed by Demirel et al. [9] also resizes an image considering frequency subband images. LR images are first decomposed into different subband images using the DWT. After calibrating the decomposed high-frequency subband images with the high-frequency subband images obtained by the SWT, all of these subbands are combined to generate a new high-resolution image using the inverse DWT (IDWT). The scheme is easy to implement, but the quality of the obtained HR images tends to be poor. Huang et al. [10] proposed a novel interpolation chip that uses the interpolation error theorem to enhance the edge features of the resized image. On the other hand, unlike conventional interpolation schemes that concentrate on the local linear relationship among neighboring pixels to reconstruct a high-resolution (HR) image from a corresponding LR input image, Cao et al. proposed a novel interpolation scheme based on low-rank matrix completion to determine the local order of the linear model implicitly. Owing to the low-rank matrix completion, the scheme can efficiently obtain an HR image and handle noisy data, as in image denoising [11]. Chae et al. concentrated on solving a scaling problem to minimize the information loss resulting from the resizing process. They proposed an optimized cubic convolution kernel considering the loss in both the spatial and frequency domains. The scheme improves the performance in terms of peak signal-to-noise ratio (PSNR) without a large increase in computational complexity [12]. Zhu et al. noticed that interpolation schemes based on solving the minimum mean-square error (MMSE) for local geometric similarity between a low- and high-resolution image cannot generate a reliable high-resolution image. To solve this problem, they considered nonlocal geometric similarity, which combines the MMSE with a least-squares error built upon a number of dual-reference patches drawn from the given LR image and regularized by the directional gradients of these patches [13].

 In this paper, we propose a weighted DCT-IF (W-DCT-IF) method, where the weighting parameters are optimized to enhance the edge information. The weighting parameters are calculated using an optimization algorithm similar to the procedure used to derive a Wiener filter.

 This paper is organized as follows. In Section 2, the DCT-IF is briefly explained. The problem to be solved is formulated in Section 3, where the weighted DCT-IF is modeled, and the adaptive edge enhancement scheme is explained. The optimization schemes for the parameters of the weighted DCT-IF are proposed in Section 4, where one-dimensional and two-dimensional models are optimized for training data. In Section 5, the performance of the proposed technique is evaluated and compared with those of several conventional schemes. Section 6 concludes this paper.

2. DCT-IF

 DCT-IF is an interpolation scheme whose kernel is constructed by combining the coefficients of the DCT and inverse DCT. Among the various types of DCT, the most common is type-II because it is asymptotically equivalent to the optimal Karhunen-Loève transform (KLT) [16]. DCT-IF is one of the best interpolation methods and provides highly accurate interpolation [5]. In particular, DCT-IF shows remarkable performance in comparison with other conventional interpolation methods, such as bilinear and cubic convolution, at an arbitrary scaling ratio.

 When the given pixels are denoted by \(f(i)\), the value \(\hat{f}(s)\) interpolated by DCT-IF is represented as follows [17], [18].

\(\hat{f}(s)=\sum_{i=0}^{M-1} f(i) \sum_{u=0}^{M-1} a^{2}(u) C_{2 M}^{(2 s+1) u} C_{2 M}^{(2 i+1) u}\)       (1)

where

\(a(u)=\left\{\begin{array}{ll} \sqrt{\frac{1}{M}}, & \text{if } u=0 \\ \sqrt{\frac{2}{M}}, & \text{if } u \neq 0 \end{array}\right.\)       (2)

\(C_{k}^{l}=\cos \left(\frac{\pi l}{k}\right)\)       (3)

Note that s is a position to be interpolated, and M is the number of taps.
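As an illustration, the 1D DCT-IF of (1)-(3) can be sketched directly in NumPy. This is only a minimal reference implementation of the equations above, not the fixed-coefficient filter tables used in HEVC; the function name and vectorized form are ours.

```python
import numpy as np

def dct_if_1d(f, s):
    """1D DCT-IF of Eq. (1): interpolate the M given pixels f(0)..f(M-1)
    at the fractional position s (in pixel units)."""
    M = len(f)
    u = np.arange(M)
    a2 = np.full(M, 2.0 / M)          # a^2(u) = 2/M for u != 0, Eq. (2)
    a2[0] = 1.0 / M                   # a^2(0) = 1/M
    i = np.arange(M)
    # C_k^l = cos(pi * l / k), Eq. (3), evaluated at k = 2M
    C_i = np.cos(np.pi * np.outer(2 * i + 1, u) / (2 * M))  # C_{2M}^{(2i+1)u}
    C_s = np.cos(np.pi * (2 * s + 1) * u / (2 * M))         # C_{2M}^{(2s+1)u}
    kernel = C_i @ (a2 * C_s)         # inner sum over u, one tap per pixel i
    return float(f @ kernel)

# Example: interpolate the half-pixel position between f(1) and f(2).
print(dct_if_1d(np.array([10.0, 20.0, 30.0, 40.0]), s=1.5))
```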

 When DCT-IF is applied to a 2D signal such as a picture, the DCT-IF can be extended to 2D DCT-IF as follows.

\(\hat{f}(s, z)=\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f(i, j)\left\{\begin{array}{c} \sum_{u=0}^{M-1} a^{2}(u) C_{2 M}^{(2 s+1) u} C_{2 M}^{(2 i+1) u} \\ \times \sum_{v=0}^{N-1} a^{2}(v) C_{2 N}^{(2 z+1) v} C_{2 N}^{(2 j+1) v} \end{array}\right\}\)       (4)

where

\(a(u)=\left\{\begin{array}{ll} \sqrt{\frac{1}{M}}, & \text{if } u=0 \\ \sqrt{\frac{2}{M}}, & \text{if } u \neq 0 \end{array}\right.\)       (5)

\(a(v)=\left\{\begin{array}{ll} \sqrt{\frac{1}{N}}, & \text{if } v=0 \\ \sqrt{\frac{2}{N}}, & \text{if } v \neq 0 \end{array}\right.\)       (6)

 In (4), s and z are the vertical and horizontal positions to be interpolated, and M and N are the numbers of taps in the vertical and horizontal directions, respectively. (s, z) indicates an arbitrary interpolation point between the locations of the given pixels. The interpolated pixel value \(\hat{f}(s, z)\) is found using the weighted sum of the neighboring pixels around (s, z) in the given image. Note that DCT-IF is a separable process [19]. Thus, 2D DCT-IF can be implemented by consecutively applying the 1D DCT-IF of (1) in the horizontal and vertical directions.
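Exploiting this separability, a sketch of the 2D DCT-IF of (4) can reuse the 1D routine from the previous snippet:

```python
import numpy as np

def dct_if_2d(f, s, z):
    """2D DCT-IF of Eq. (4) as two separable 1D passes (dct_if_1d above)."""
    M, N = f.shape
    col = np.array([dct_if_1d(f[i, :], z) for i in range(M)])  # horizontal pass per row
    return dct_if_1d(col, s)                                   # vertical pass on the results
```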

 DCT-IF has been employed for motion compensation in High Efficiency Video Coding (HEVC) [5], [20], where fixed DCT-IF coefficients are prepared for the 1/2 and 1/4 pixel positions because motion vectors are searched at those resolutions in HEVC. However, in various applications using up/down scalers, the interpolation filter must resize the original image at an arbitrary scaling ratio, where the coefficients of DCT-IF should be calculated at each interpolation position. In this paper, we modify the coefficients of DCT-IF to enhance the edge information of the resized image. Because edges contain important information related to perceptual quality, optimizing the coefficients of DCT-IF to enhance the edge data is a crucial problem in a variety of image processing applications.

3. Problem Formulation

3.1 Interpolation system using weighted DCT-IF

 

Fig. 1. (a) Weighted DCT-IF (W-DCT-IF) and (b) optimization process of w(u, v)

 Fig. 1 (a) shows the interpolation system using DCT-IF, where the coefficients of DCT-IF are weighted to enhance the edge information. The W-DCT-IF is derived as follows. The DCT data of the original image \(f_{O}(i, j)\) are calculated by the following equation.

\(F_{O}(u, v)=a(u) a(v) \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f_{O}(i, j) C_{2 M}^{(2 i+1) u} C_{2 N}^{(2 j+1) v}\)       (7)

If the DCT data \(F_{O}(u, v)\) are returned to the pixel domain after applying the weighting parameters w(u, v) to the DCT data, the reconstructed pixel data are

\(\hat{f}(s, z)=\sum_{u=0}^{M-1} \sum_{v=0}^{N-1} a(u) a(v) w(u, v) F_{O}(u, v) C_{2 M}^{(2 s+1) u} C_{2 N}^{(2 z+1) v}\)       (8)

where w(u, v) are the weighting parameters for the corresponding frequencies (u, v). Combining (7) and (8) produces

\(\hat{f}(s, z)=\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f_{O}(i, j)\left[\sum_{u=0}^{M-1} \sum_{v=0}^{N-1} a^{2}(u) a^{2}(v) w(u, v)\times C_{2 M}^{(2 s+1) u} C_{2 N}^{(2 z+1) v} C_{2 M}^{(2 i+1) u} C_{2 N}^{(2 j+1) v}\right]\)       (9)

Equation (9) represents the 2D W-DCT-IF scheme proposed in this paper. We optimize the weighting parameters w(u, v) to enhance the edge information of the resized image.
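For illustration, (9) folds the forward DCT of (7), the per-frequency weighting, and the inverse transform of (8) into a single (M, N) kernel. A NumPy sketch under that reading (the function name is ours; w is assumed to be an M×N array of weighting parameters):

```python
import numpy as np

def w_dct_if_2d(f, s, z, w):
    """Weighted 2D DCT-IF of Eq. (9) applied to the (M, N) block f."""
    M, N = f.shape
    u, v = np.arange(M), np.arange(N)
    a2u = np.full(M, 2.0 / M); a2u[0] = 1.0 / M   # a^2(u)
    a2v = np.full(N, 2.0 / N); a2v[0] = 1.0 / N   # a^2(v)
    C_i = np.cos(np.pi * np.outer(2 * np.arange(M) + 1, u) / (2 * M))
    C_j = np.cos(np.pi * np.outer(2 * np.arange(N) + 1, v) / (2 * N))
    C_s = np.cos(np.pi * (2 * s + 1) * u / (2 * M))
    C_z = np.cos(np.pi * (2 * z + 1) * v / (2 * N))
    # K[i, j] = sum_{u,v} a^2(u) a^2(v) w(u,v) C_s(u) C_z(v) C_i(i,u) C_j(j,v)
    K = C_i @ (w * np.outer(a2u * C_s, a2v * C_z)) @ C_j.T
    return float(np.sum(f * K))
```

With w(u, v) ≡ 1, this reduces to the conventional 2D DCT-IF of (4).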

 In the optimization process for the weighting parameters w(u, v) shown in Fig. 1 (b), the weighting parameters are calculated using various training images. Because the main goal of this paper is to modify the coefficients of DCT-IF to enhance the edge information of the resized image, the proposed algorithm is applied to the grayscale signal. When we resize a color image, the algorithm is applied to the Y signal only, where Y is a component of the YUV system. The training images are represented as \(f_{T}(t, n, m)\), where t is an index for the training images, and n and m are the indexes for the row and column locations of each pixel in the tth training image. When the size of the DCT is M×N, each training image is divided into disjoint blocks of size M×N. In Fig. 1 (b), \(g_{T}(t, p, q, i, j)\) denotes the (ith, jth) pixel value in the (pth, qth) block of the tth training image \(f_{T}(t, n, m)\). p and q are the indexes of the blocks in the vertical and horizontal directions, respectively. The relation is represented as follows:

\(g_{T}(t, p, q, i, j)=f_{T}(t, M p+i, N q+j)\)       (10)

Here, \(f_{E}(t, n, m)\) is the tth edge-enhanced training image created by applying an edge enhancement scheme to the tth training image \(f_{T}(t, n, m)\). In this figure, \(g_{E}(t, p, q, i, j)\) denotes the (ith, jth) pixel value in the (pth, qth) block of the tth edge-enhanced image \(f_{E}(t, n, m)\). The relation is represented as follows:

\(g_{E}(t, p, q, i, j)=f_{E}(t, M p+i, N q+j)\)       (11)
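In implementation terms, (10) and (11) simply tile each image into disjoint M×N blocks; a minimal NumPy sketch (the helper name blocks is ours):

```python
import numpy as np

def blocks(img, M, N):
    """Disjoint M x N blocks of Eqs. (10)-(11): g(p, q, i, j) = f(M*p + i, N*q + j)."""
    P, Q = img.shape[0] // M, img.shape[1] // N
    return img[:P * M, :Q * N].reshape(P, M, Q, N).swapaxes(1, 2)  # shape (P, Q, M, N)
```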

To analyze the property of the training data, each block is transformed by DCT as follows.

\(G_{T}(t, p, q, u, v)=a(u) a(v) \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} g_{T}(t, p, q, i, j) C_{2 M}^{(2 i+1) u} C_{2 N}^{(2 j+1) v}\)       (12)

\(G_{E}(t, p, q, u, v)=a(u) a(v) \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} g_{E}(t, p, q, i, j) C_{2 M}^{(2 i+1) u} C_{2 N}^{(2 j+1) v}\)       (13)

 In Fig. 1 (b), the weighting parameters w(u, v) are optimized to minimize the total difference between the DCT domain data \(G_{E}(t, p, q, u, v)\) of the edge-enhanced images and the weighted DCT domain data \(w(u, v) G_{T}(t, p, q, u, v)\) of the training images. That is, w(u, v) is chosen to minimize the following cost function.

\(D=\sum_{t=0}^{L-1} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1}\left\{\begin{array}{c} G_{E}(t, p, q, u, v) \\ -w(u, v) G_{T}(t, p, q, u, v) \end{array}\right\}^{2}\)       (14)

where L is the total number of training images. P and Q denote the numbers of the transform blocks in the vertical and horizontal directions, respectively. The optimization procedure is represented as follows.

\(w(u, v)=\underset{w(u, v)}{\arg \min } D\)       (15)

The optimized w(u,v) can be used in (9) to enhance the edge information of the resized image in the proposed system.

3.2 Edge Enhancement Scheme

 In this subsection, we explain the edge enhancement technique used in Fig. 1 (b), which makes the image subjectively clearer by improving the contrast effect in the edge region [21], [22]. The enhancement process can be represented as follows:

\(f_{E}(t, n, m)=f_{T}(t, n, m)+h\left(f_{T}(t, n, m)\right)\)       (16)

where \(h\left(f_{T}(t, n, m)\right)\) denotes the enhancement component at the location (n, m). It depends on the properties of the pixel values in the local area that includes (n, m). To obtain the edge information from the image, the Sobel edge detector [21] is applied to each training image \(f_{T}(t, n, m)\), which produces an edge-detected image, as shown in Fig. 2.

In (16), we design the enhancement component \(h\left(f_{T}(t, n, m)\right)\) as follows.

\(h\left(f_{T}(t, n, m)\right)=P\left(e_{n, m}\right) \times\left(f_{T}(t, n, m)-f_{L P F}(t, n, m)\right)\)       (17)

where \(f_{L P F}(t, n, m)\) is the image produced by applying the LPF to \(f_{T}(t, n, m)\). The LPF used in (17) is

\(L P F=\frac{1}{16}\left[\begin{array}{lll} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{array}\right]\)       (18)
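A sketch of the enhancement step (16)-(18) follows, assuming the per-pixel blur probabilities \(P(e_{n, m})\) of (19), defined below, have been precomputed and set to zero at pixels where the Sobel detector found no edge; scipy.ndimage.convolve is used for the 3×3 LPF:

```python
import numpy as np
from scipy.ndimage import convolve

LPF = np.array([[1, 2, 1],
                [2, 4, 2],
                [1, 2, 1]], dtype=float) / 16.0    # Eq. (18)

def enhance_edges(f_T, P_blur):
    """Edge enhancement of Eqs. (16)-(17): f_E = f_T + P(e) * (f_T - LPF(f_T)).
    P_blur holds the blur probabilities of Eq. (19), zero off-edge."""
    f_lpf = convolve(f_T, LPF, mode='nearest')     # low-pass filtered image
    return f_T + P_blur * (f_T - f_lpf)
```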

 

Fig. 2. Diagram of adaptive edge enhancement

 In (17), \(P\left(e_{n, m}\right)\) denotes the probability that the edge pixel \(e_{n, m}\) at (n,m) will be blurred. \(P\left(e_{n, m}\right)\) is determined by the psychometric function [23] as follows.

\(P\left(e_{n, m}\right)=1-\exp \left(-\left|\frac{w\left(e_{n, m}\right)}{w_{J N B}\left(e_{n, m}\right)}\right|\right)\)       (19)

where \(w\left(e_{n, m}\right)\) and \(w_{J N B}\left(e_{n, m}\right)\) are the widths of the edge \(e_{n, m}\) and of the edge with Just Noticeable Blur (JNB), respectively. Note that \(w_{J N B}\left(e_{n, m}\right)\) depends on the local contrast in the neighborhood of the edge \(e_{n, m}\). In the calculation of \(w_{J N B}\left(e_{n, m}\right)\), the width around the edge is measured by counting the number of pixels with increasing or decreasing gray-scale values in one direction of the edge pixel. Whereas the authors of [23] used the equation for \(P\left(e_{n, m}\right)\) shown in (19) to propose an objective metric for evaluating image sharpness, we use this concept to calculate the amount of alteration of the pixel values for enhancing the edge data. Based on the probability \(P\left(e_{n, m}\right)\), if the blurring will be unnoticeable, the pixel data are not altered. On the other hand, if the probability that the blurring will be noticeable is high, the pixel values are modified, with the amount of alteration proportional to \(P\left(e_{n, m}\right)\).
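Given measured edge widths, (19) is straightforward to evaluate; a one-line sketch:

```python
import numpy as np

def blur_probability(w_edge, w_jnb):
    """Psychometric function of Eq. (19): probability that an edge of
    measured width w_edge is perceived as blurred, given the JNB width."""
    return 1.0 - np.exp(-np.abs(w_edge / w_jnb))
```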

4. Optimization of Weighting Parameters

 In this section, we explain the optimization process for the parameters w(u, v) using various training images. Because the training images are used, the parameters are referred to as the Parameters optimized for Training Images (PTI). The cost function of (14) is used in the optimization of the PTI. This cost function can be minimized by setting its partial derivative with respect to w(u, v) to zero, as follows.

\(\frac{\partial D}{\partial w(u, v)}=-2 \times \sum_{t=0}^{L-1} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1}\left\{\left(\begin{array}{l} G_{E}(t, p, q, u, v) \\ -w(u, v) G_{T}(t, p, q, u, v) \end{array}\right) \times G_{T}(t, p, q, u, v)\right\}=0\)       (20)

Equation (20) gives a solution for the weighting parameters as follows.

\(w(u, v)=\frac{\sum_{t=0}^{L-1} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} G_{E}(t, p, q, u, v) G_{T}(t, p, q, u, v)}{\sum_{t=0}^{L-1} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} G_{T}(t, p, q, u, v) G_{T}(t, p, q, u, v)}\)       (21)

The weighting parameter in (21) is used in (9) to implement the 2D W-DCT-IF.
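Because (12) and (13) are the orthonormal 2D DCT-II, the closed-form solution (21) can be computed with a standard transform library. A training sketch under that assumption, reusing the blocks helper from Section 3.1 (function names are ours):

```python
import numpy as np
from scipy.fft import dctn

def optimize_weights_2d(train_imgs, enhanced_imgs, M=8, N=8):
    """Closed-form PTI of Eq. (21): per-frequency least-squares ratio of the
    DCT coefficients of edge-enhanced and original training blocks."""
    num, den = np.zeros((M, N)), np.zeros((M, N))
    for f_T, f_E in zip(train_imgs, enhanced_imgs):
        for g_T, g_E in zip(blocks(f_T, M, N).reshape(-1, M, N),
                            blocks(f_E, M, N).reshape(-1, M, N)):
            G_T = dctn(g_T, type=2, norm='ortho')   # Eq. (12)
            G_E = dctn(g_E, type=2, norm='ortho')   # Eq. (13)
            num += G_E * G_T
            den += G_T * G_T
    return num / den                                 # Eq. (21)
```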

 When the PTI is derived using the 1D DCT-IF, the parameters are independently optimized for the horizontal and vertical directions because the properties of the data in these two directions differ from each other. The 1D DCT-IF scheme and the parameters optimized for the horizontal and vertical directions are represented as follows.

\(\hat{f}(s, j)=\sum_{i=0}^{M-1} f_{o}(i, j)\left\{\sum_{u=0}^{M-1} a^{2}(u) w_{V}(u) C_{2 M}^{(2 s+1) u} C_{2 M}^{(2 i+1) u}\right\}\)       (22)

\(\hat{f}(i, z)=\sum_{j=0}^{N-1} f_{O}(i, j)\left\{\sum_{v=0}^{N-1} a^{2}(v) w_{H}(v) C_{2 N}^{(2 z+1) v} C_{2 N}^{(2 j+1) v}\right\}\)       (23)

\(w_{H}(v)=\frac{\sum_{t=0}^{L-1} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{i=0}^{M-1} A_{H}(t, p, q, i, v) B_{H}(t, p, q, i, v)}{\sum_{t=0}^{L-1} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{i=0}^{M-1} B_{H}(t, p, q, i, v) B_{H}(t, p, q, i, v)}\)       (24)

\(w_{V}(u)=\frac{\sum_{t=0}^{L-1} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{j=0}^{N-1} A_{V}(t, p, q, u, j) B_{V}(t, p, q, u, j)}{\sum_{t=0}^{L-1} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{j=0}^{N-1} B_{V}(t, p, q, u, j) B_{V}(t, p, q, u, j)}\)       (25)

where

\(A_{H}(t, p, q, i, v)=a(v) \sum_{j=0}^{N-1} f_{E}(t, M p+i, N q+j) C_{2 N}^{(2 j+1) v}\)       (26)

\(B_{H}(t, p, q, i, v)=a(v) \sum_{j=0}^{N-1} f_{T}(t, M p+i, N q+j) C_{2 N}^{(2 j+1) v}\)       (27)

\(A_{V}(t, p, q, u, j)=a(u) \sum_{i=0}^{M-1} f_{E}(t, M p+i, N q+j) C_{2 M}^{(2 i+1) u}\)       (28)

\(B_{V}(t, p, q, u, j)=a(u) \sum_{i=0}^{M-1} f_{T}(t, M p+i, N q+j) C_{2 M}^{(2 i+1) u}\)       (29)
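Under the same orthonormal-DCT assumption, the horizontal weights of (24), (26), and (27) reduce to row-wise 1D DCTs, and \(w_{V}(u)\) of (25) follows by applying the same routine to the transposed images. A sketch, again reusing the blocks helper:

```python
import numpy as np
from scipy.fft import dct

def optimize_weights_h(train_imgs, enhanced_imgs, M=8, N=8):
    """Horizontal PTI w_H(v) of Eq. (24): row-wise 1D DCTs of Eqs. (26)-(27),
    accumulated over all rows i and all blocks of all training images."""
    num, den = np.zeros(N), np.zeros(N)
    for f_T, f_E in zip(train_imgs, enhanced_imgs):
        for g_T, g_E in zip(blocks(f_T, M, N).reshape(-1, M, N),
                            blocks(f_E, M, N).reshape(-1, M, N)):
            A_H = dct(g_E, type=2, norm='ortho', axis=1)   # Eq. (26)
            B_H = dct(g_T, type=2, norm='ortho', axis=1)   # Eq. (27)
            num += (A_H * B_H).sum(axis=0)
            den += (B_H * B_H).sum(axis=0)
    return num / den                                        # Eq. (24)
```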

 The PTI is made by considering all of the edge information in several training images. Thus, once optimized for the training images, the PTI can generally be used for various test images to be scaled. Note that the test image to be scaled is not included in the set of training images. Because the PTI optimization is an offline process, it is not included in the scaling process. Thus, the computational complexity of a scaler system using the PTI is similar to that of the conventional DCT-IF technique.

5. Simulation Results

5.1 Implementation

 A scaler based on the proposed algorithm can be implemented using the Verilog hardware description language and fitted on an FPGA-based system. Alternatively, a software version of the scaler optimized for GPU computing can be implemented on GPU-based devices. The software scaler is capable of displaying Full HD (1,920×1,080) video output at roughly 60 fps on a device incorporating a 3.4 GHz 6-core processor and 8 GB of main memory.

5.2 Configuration to Evaluate Schemes

 We evaluate the performance of the proposed method in this section, where the W-DCT-IF using the PTI is compared with various conventional interpolation methods [1]-[3], [7], [8], [10], [17] from the viewpoint of edge enhancement. When the PTI is used, the number (L) of training images is set to 20. We employed 20 training images from a dataset of 100 standard 512 × 512 grayscale test images, shown in Fig. 4 of [24], to train the proposed weighted DCT-IF. Specifically, the training images were selected randomly from the dataset to avoid any bias in the set of training images.

 When the weighting coefficients are trained, as shown in Fig. 1 (b), each training image is divided into disjoint blocks of size M×N. When the resolution of each training image is 512×512 and the block size is 8×8, the number of training blocks made from the 20 training images is 81,920, which is sufficient for our simulations.

 We utilize various measurement tools, such as PSNR, SSIM, the image sharpness metric, the edge map, and subjective quality, for objective and subjective evaluations. When PSNR and SSIM are used as measurement tools, we need a reference image with which the images resized by the various scaling methods are compared. Fig. 3 shows a block diagram of the PSNR and SSIM measurement. Various scaling algorithms are used to enlarge the low-resolution (LR) image to the resolution of the original image, where the LR image is made by applying the LPF and bilinear interpolation to the original image.

 

Fig. 3. Block diagram of PSNR and SSIM measurement

 To evaluate the performances of the techniques for various test images, we use 10 images: 'FourPeople', 'Johnny', 'KristenAndSara', 'SlideEditing', and 'SlideShow' (720p YUV 4:2:0 images), and 'BasketballDrive', 'BQTerrace', 'Cactus', 'Kimono1', and 'ParkScene' (1080p YUV 4:2:0 images).

5.3 Evaluation based on PSNR

The PSNRs of the images resulting from the proposed and various conventional schemes are compared in Tables 1 and 2. The PSNR is defined as follows.

\(P S N R=10 \cdot \log _{10}\left(\frac{M A X^{2}}{M S E}\right)[\mathrm{dB}]\)       (30)

where MAX represents the maximum possible pixel value, which is 255 when a pixel is represented with 8 bits. MSE is the mean squared error between the reference picture and the picture resized by one of the various schemes. As the PSNR increases, the objective quality of the resized image increases. For YUV 4:2:0 images, the PSNR is calculated on the Y component. 2D W-DCT-IF denotes the proposed algorithm using the weighting parameters optimized for 2D data, as in (21). 1D W-DCT-IF denotes the algorithm using (22)-(25).
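For reference, a short sketch of (30) as used here (MSE over the Y component, MAX = 255 for 8-bit data):

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    """PSNR of Eq. (30) between a reference and a resized image (Y channel)."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```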

 As observed in Tables 1 and 2, the PSNRs of the proposed methods are always much higher than those of the conventional methods [1]-[3], [7], [8], [10], [17]. This is because the PTI has been optimized for the training images. In the comparison between the 1D W-DCT-IF and 2D W-DCT-IF implementations of the proposed algorithm, the performance of the 1D implementation is slightly lower than that of the 2D implementation, because the filter trained with the separate parameters \(w_{H}(v)\) and \(w_{V}(u)\) is constrained to be separable. Thus, there are some cases where the optimized parameters \(w_{H}(v)\) and \(w_{V}(u)\) are mismatched for the data to be scaled.

 To show the performance tendencies for various scaling ratios, Fig. 4 presents the PSNRs of the schemes for a variety of ratios. Based on the results of Tables 1 and 2 and Fig. 4, the proposed algorithm outperforms the other conventional schemes in relation to the objective quality of the resized image.

Table 1. PSNR Comparison of Interpolation Algorithms for 1.79 Times Zoom

 

Table 2. PSNR Comparison of Interpolation Algorithms for 1.58 Times Zoom

 

 

Fig. 4. PSNR comparison at different scaling ratios  

5.4 Evaluation based on SSIM

 SSIM is one of the most important tools for evaluating the perceptual quality of test images [25]. It evaluates the structural similarity between a reference image and a resized image and is defined as follows.

\(S S I M=\frac{\left(2 \mu_{O} \mu_{R}+c_{1}\right)\left(2 \sigma_{O R}+c_{2}\right)}{\left(\mu_{O}^{2}+\mu_{R}^{2}+c_{1}\right)\left(\sigma_{O}^{2}+\sigma_{R}^{2}+c_{2}\right)}\)       (31)

where µO and µR are the averages of the pixel values in the reference and resized images, respectively; σO² and σR² are the variances of the pixel values in these images; and σOR is the covariance of the pixel values in the reference and resized images. As the SSIM increases, the structural similarity between the resized image and the reference picture increases.
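A sketch of (31) from whole-image statistics follows; note that common practice, following [25], evaluates (31) over local windows and averages the results, and the constants c1 = (0.01·MAX)² and c2 = (0.03·MAX)² are the usual choices rather than values specified in this paper:

```python
import numpy as np

def ssim_global(ref, img, max_val=255.0):
    """SSIM of Eq. (31) computed from whole-image statistics."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_o, mu_r = ref.mean(), img.mean()          # averages
    var_o, var_r = ref.var(), img.var()          # variances
    cov = ((ref - mu_o) * (img - mu_r)).mean()   # covariance
    return ((2 * mu_o * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_o ** 2 + mu_r ** 2 + c1) * (var_o + var_r + c2))
```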

Table 3. SSIM Comparison of Interpolation Algorithms for 1.79 Times Zoom

 

 The SSIMs produced by the various scaling schemes are compared in Tables 3 and 4 and Fig. 5. These results show that the proposed algorithms outperform the various conventional methods in relation to SSIM. The performance tendencies of the algorithms are similar to those observed in the PSNR-based simulation.

Table 4. SSIM Comparison of Interpolation Algorithms for 1.58 Times Zoom

 

 

Fig. 5. SSIM comparison for 1.58 times zoom

5.5 Evaluation based on Image Sharpness Metric

 To objectively evaluate the sharpness of the resized images, we consider the sharpness metric of [23]. The sharpness metric based on the JNB is used to evaluate the amount of perceived sharpness without any reference image. Note that a higher sharpness metric value indicates that more edges in the image are preserved after up- or down-scaling. The sharpness values of the test images are listed in Table 5, where the scaling ratio is set to 2.3. As seen in Table 5, the sharpness values of the proposed methods are much higher than those of the various conventional schemes.

Table 5. Sharpness Metric Comparison of Interpolation Algorithms for 2.3 Times Zoom

 

5.6 Evaluation based on Edge Map

 In this subsection, we consider the edge maps of specific regions of the Cactus image that have been magnified with a scaling ratio of 4.0. The edge information of the enlarged images is found using the Canny edge detector [26], which is well known as a powerful tool for detecting edges. In Fig. 6, the edge map of the image up-scaled by the W-DCT-IF method contains more edge data than the others. This implies that the edges of an image scaled by the proposed algorithm are enhanced more significantly than those interpolated by the conventional methods.

 

Fig. 6. Edge maps of the enlarged Cactus images when the scaling ratio is 4.0: (a) Bilinear, (b) CIM, (c) CIM5, (d) NIC, (e) ISAI, (f) 2D W-DCT-IF (PTI)

5.7 Evaluation based on Subjective Quality

 The subjective quality of the scaled image is evaluated in Fig. 7, where the Cactus image is magnified with a scaling ratio of 4.0. As can be observed from this figure, the images enlarged by the proposed algorithms provide a higher subjective quality than others. Because the edge provides important data in a subjective quality evaluation, the quality of the edge-enhanced image is perceptually higher.

 

Fig. 7. Particular regions of the enlarged Cactus images when the scaling ratio is 4.0: (a) Bilinear, (b) CIM, (c) CIM5, (d) NIC, (e) ISAI, (f) 2D W-DCT-IF (PTI).

Table 6. Complexity Comparison of Interpolation Algorithms for 2.3 Times Zoom

 

 To provide simulation results for the perceptual evaluation, the Single Stimulus Continuous Quality Evaluation (SSCQE) method [27] is chosen because the resolutions of the scaled images are different from those of the original images. The Mean Opinion Score (MOS) is a simple measurement for obtaining viewers' opinions, providing a numerical indication of the quality perceived by viewers. The MOS is expressed as a single number in the range of 1 to 5, where 1 means bad and 5 means excellent. In this test, the enlarged Cactus, Kimono1, and ParkScene images with a scaling ratio of 2.3 were evaluated and scored by 15 viewers who were college students. Fig. 8 shows the MOS values. The MOS values of the proposed methods are always much higher than those of the conventional methods [1]-[3], [7], [8], [10], [17].

 

Fig. 8. Mean opinion score (MOS) values of three enlarged images with a scaling ratio of 2.3: (a) ranges of MOS values for the three enlarged images and (b) average MOS values for the three enlarged images.

5.8 Evaluation based on Complexity

 Complexity is a crucial measurement because a scaler module is used in real-time applications. In this simulation, the complexity is measured by the consumed CPU time, where the algorithms are tested on a personal computer with a 3.4 GHz 6-core processor and 8 GB of main memory.

 The consumed CPU times are presented in Table 6, where the numbers for the PTI do not include the training complexity. In this table, the complexity of 2D W-DCT-IF (PTI) is the highest among all methods, because 2D W-DCT-IF (PTI), implemented by (9), requires a complex iterative process. Although 2D W-DCT-IF (PTI) is much more complex than the other conventional schemes [1]-[3], [7], [8], [10], [17], the scheme produces a gain from the viewpoint of enhancing the subjective and objective quality. In contrast, the computational complexity of 1D W-DCT-IF (PTI) of (22) and (23) is almost equal to that of the conventional 1D DCT-IF [17]. Note that the weighting parameters are determined before the scaling process is applied. Thus, the complexity of the optimization process for the PTI is not included in these results. To reduce the complexity of the scaling system, 1D W-DCT-IF (PTI) can be employed instead of 2D W-DCT-IF (PTI).

6. Conclusions

 In this paper, we designed the W-DCT-IF to enhance the edge information of a resized image. Because DCT-IF is known to be an outstanding method for resizing images, the optimization of DCT-IF is an important issue in image and video processing systems. Although the proposed 2D W-DCT-IF is more complex than other conventional scaling algorithms, the quality of the resized image produced by the proposed 2D W-DCT-IF is significantly improved. To reduce the complexity of the proposed method, the 2D W-DCT-IF can be implemented using the 1D W-DCT-IF. While the performance of the 1D W-DCT-IF is similar to that of the 2D W-DCT-IF, the complexity of the 1D W-DCT-IF is approximately equal to that of the conventional 1D DCT-IF. By employing the proposed algorithm, we can construct an efficient scaling system without increasing the complexity.

References

  1. E. Meijering, "A chronology of interpolation: from ancient astronomy to modern signal and image processing," Proceedings of IEEE, vol. 90, no. 3, pp. 319-342, Mar. 2002. https://doi.org/10.1109/5.993400
  2. M. Unser, A. Aldroubi, and M. Eden, "B-spline signal processing. I. Theory," IEEE Trans. on Signal Processing, vol. 41, no. 2, pp. 821-833, Feb. 1993. https://doi.org/10.1109/78.193220
  3. R. G. Keys, "Cubic convolution interpolation for digital image processing," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 29, no. 6, pp. 1153-1160, Dec. 1981. https://doi.org/10.1109/TASSP.1981.1163711
  4. H. Demirel and G. Anbarjafari, "Discrete Wavelet transform-based satellite image resolution enhancement," IEEE Trans. on Geoscience and Remote Sensing, vol. 49, no. 6, pp. 1997-2004, Jan. 2011. https://doi.org/10.1109/TGRS.2010.2100401
  5. K. McCann, W.-J. Han, and I.-K. Kim, "Samsung's response to the call for proposals on video compression technology," JCTVC-A124, Dresden, Germany, Apr. 2010.
  6. D. Zhou, X. Shen, and W. Dong, "Image zooming using directional cubic convolution interpolation," IET Image Processing, vol. 6, no. 6, pp. 627-634, Aug. 2012. https://doi.org/10.1049/iet-ipr.2011.0534
  7. L. Rui and L. Qiong, "Image sharpening algorithm based on a variety of interpolation methods," in Proc. of IEEE Int. Conf. Image Analysis and Signal Processing (IASP), pp. 1-4, Nov. 2012.
  8. H. Kim, Y.-J. Cha, and S.-J. Kim, "Curvature interpolation method for image zooming," IEEE Trans. on Image Processing, vol. 20, no. 7, pp. 1895-1903, Jan. 2011. https://doi.org/10.1109/TIP.2011.2107523
  9. H. Demirel and G. Anbarjafari, "Image resolution enhancement by using discrete and stationary wavelet decomposition," IEEE Trans. on Image Processing, vol. 20, no. 5, pp. 1458-1460, May 2011. https://doi.org/10.1109/TIP.2010.2087767
  10. C.-C. Huang, P.-Y. Chen, and C.-H. Ma, "A novel interpolation chip for real-time multimedia applications," IEEE Trans. on Circuits and Systems for Video Technology, vol. 22, no. 10, pp. 1512-1525, Oct. 2012. https://doi.org/10.1109/TCSVT.2012.2202080
  11. F. Cao, M. Cai, and Y. Tan, "Image Interpolation via Low-Rank Matrix Completion and Recovery," IEEE Trans. on Circuits and Systems for Video Technology, vol. 25, no. 8, pp. 1261-1270, Aug. 2015. https://doi.org/10.1109/TCSVT.2014.2372351
  12. J. -K. Chae, J. -Y. Lee, M. -H. Lee, J. -K. Han, T. Q. Nguyen, and W. -Y. Yeo, "Cubic Convolution Scaler Optimized for Local Property of Image Data," IEEE Trans. on Image Processing, vol. 24, no. 12, pp. 4796-4809, Dec. 2015. https://doi.org/10.1109/TIP.2015.2468176
  13. S. Zhu, B. Zeng, L. Zeng, and M. Gabbouj, "Image Interpolation Based on Non-local Geometric Similarities and Directional Gradients," IEEE Trans. on Multimedia, vol. 18, no. 9, pp. 1707-1719, Sept. 2016. https://doi.org/10.1109/TMM.2016.2593039
  14. S.-Y. Yang, Y.-H. Kim, and J.-C. Jeong, "Fine edge-preserving technique for display devices," IEEE Trans. on Consumer Electronics, vol. 54, no. 4, pp. 1761-1769, Nov. 2008. https://doi.org/10.1109/TCE.2008.4711232
  15. J.-W. Han, J.-H. Kim, S.-H. Cheon, and J.-O. Kim, "A Novel Image Interpolation Method Using the Bilateral Filter," IEEE Transactions on Consumer Electronics, vol. 56, no. 1, pp. 175-181, Feb. 2010. https://doi.org/10.1109/TCE.2010.5439142
  16. I.-H. Shin and H.-W. Park, "Adaptive up-sampling method using DCT for spatial scalability of scalable video coding," IEEE Trans. on Circuits and Systems for Video Technology, vol. 19, no. 2, pp. 206-214, Feb. 2009. https://doi.org/10.1109/TCSVT.2008.2009256
  17. E. Alshina, J. Chen, A. Alshin, N. Shlyakhov, and W. J. Han, "CE3: experimental results of DCTIF by Samsung," JCTVC-D344, Daegu, Korea, Jan. 2011.
  18. S. Matsuo, S. Takamura, and H. Jozawa, "Improved intra angular prediction by DCT-based interpolation filter," in Signal Processing Conference (EUSIPCO), pp. 1568-1572, Aug. 2012.
  19. H. Lv, R. Wang, X. Xie, H. Jia, and W. Gao, "A comparison of fractional-pel interpolation filters in HEVC and H.264/AVC," in IEEE Int. Conf. Visual Communications and Image Processing (VCIP), pp. 1-6, Nov. 2012.
  20. E. Alshina, J.-H Park, and W.-J. Han, "DCT based interpolation filter for motion compensation in HEVC," in Proceedings of Applications of Digital Image Processing XXXV, vol. 8499, pp. 13, Oct. 2012.
  21. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd edition, Prentice-Hall, 2008, pp. 165-168.
  22. R. Crane, A Simplified Approach to Image Processing in C, Prentice-Hall, 1997, pp. 79-93.
  23. R. Ferzli and L. J. Karam, "A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB)," IEEE Trans. on Image Processing, vol. 18, no. 4, pp. 717-728, Apr. 2009. https://doi.org/10.1109/TIP.2008.2011760
  24. J. A. Garcia, Rosa Rodriguez-Sanchez, and J. Fdez-Valdivia, "The emergence of a region-based approach to image transmission," Optical Engineering, vol. 44, no. 6, June 2005.
  25. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004. https://doi.org/10.1109/TIP.2003.819861
  26. J. Canny, "A computational approach to edge detection," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, Nov. 1986.
  27. Rec. ITU-R BT.500-13, "Methodology for the Subjective Assessment of the Quality of Television Pictures," Jan. 2012.
