1. Introduction
High Efficiency Video Coding (HEVC), the newest video coding standard, has been shown to achieve higher coding efficiency than its predecessors [1]. Several new techniques, such as larger coding tree unit (CTU) sizes and improved parallel processing methods, have been adopted in HEVC to address two trends that the prior standard H.264/AVC [2] does not handle well: increasing video resolution and the growing use of parallel processing architectures.
In general, a quantization parameter (QP) must be specified when an HEVC codec encodes a video stream. A target bitrate is also often given as an encoding parameter, depending on the application environment. To regulate the encoded bit stream so that the best video quality is achieved without violating the constraints imposed by the encoder/decoder buffer size and the available channel bandwidth, rate control (RC) is widely adopted in standard-based video encoders, such as MPEG-4 [3,4], H.264/AVC [2] and HEVC [1]. The new characteristics of HEVC pose new challenges for designing accurate and robust RC and keep RC an active research topic.
In Section 2, we review previous work on RC. Section 3 establishes a model that reveals the relationship between bits per pixel (bpp), the bitrate of the intra frame and the bitrate of the subsequent inter frames in a group of pictures (GOP), and details how to determine the target bitrate of the first intra frame. A robust and adaptive frame-level RC scheme is given in Section 4. Simulation results are presented in Section 5, followed by the conclusion in Section 6.
2. Related Work
There have been a number of investigations into RC. The existing RC schemes can be roughly categorized into three classes: Q-domain models, ρ-domain models, and λ-domain models. The Q-domain model builds a direct relationship between bitrate and QP. In [5], a Cauchy-distribution-based R-Q model was used to determine QP. In [6], a Laplace-distribution-based CTU-level RC algorithm for HEVC was proposed. To overcome the high computational complexity, Choi et al. presented a pixel-wise unified R-Q model for multi-level RC [7]. In the ρ-domain-based RC algorithms [8,9], bitrate was modeled as a linear function of ρ, the percentage of zeros among the quantized discrete cosine transform (DCT) coefficients. Since a one-to-one relationship between ρ and QP is assumed, a suitable QP that meets the target bitrate can be determined through ρ. In fact, both Q-domain and ρ-domain RC models rely on a close relationship between bitrate and QP. However, this relationship becomes harder to characterize accurately as video coding schemes become more flexible. In [10], a λ-domain RC algorithm for HEVC was proposed for inter-frame coding. Since it achieves high coding performance, it has already been adopted by the Joint Collaborative Team on Video Coding (JCT-VC) and integrated into the state-of-the-art RC scheme of the HEVC encoder HM-16.0 [11], together with the λ-domain-based RC scheme for intra-frame coding proposed in [12].
In the state-of-the-art RC scheme for HM-16.0, the sum of absolute transformed differences (SATD) is employed to measure the complexity of intra frames [12], and the mean absolute difference (MAD) of the CTU at the same position in the previous decoded frame is used to predict the complexity of the current CTU [10]. These frame complexity measures are very simple, but they perform poorly in allocating the target bits. In our experiments, a large number of bits are over-spent on the intra frame, and not enough target bits are left for the subsequent inter frames in the same GOP. This unavoidably degrades the visual quality of these inter frames and causes undesirable fluctuation in the actual frame-level bitrate allocation. To solve this problem, efforts have been made to find a more accurate content complexity measure. Several edge-based [13] and gradient-based [14] content complexity measures have been developed for H.264/AVC. However, since the coding characteristics of HEVC are quite different from those of H.264/AVC, these methods are no longer applicable to HEVC. In [15,16], variance-based methods were proposed to measure the complexity for HEVC intra prediction. In [17,18], the gradient was used to represent the picture content complexity for HEVC intra-frame RC. Sun et al. proposed an edge-based frame complexity measure using the Gaussian gradient operator [19]. In [20], a model considering spatial-temporal correlations was developed to measure texture complexity, in which spatial complexity and temporal complexity refer to the texture similarities inside a single video frame and the stillness between consecutive frames in the temporal dimension, respectively.
One goal of RC is to avoid undesirable fluctuation in bit allocation. To provide good perceived quality, it is also very important to avoid fluctuation in video quality. In [21], an RC algorithm was proposed to keep consistent objective quality for HEVC, where distortion-quantization and rate-quantization models were derived using the Laplacian function. Unfortunately, this algorithm was developed in the Q-domain, which is not appropriate for the state-of-the-art λ-domain RC scheme of HM-16.0.
To better address the issues mentioned above, this paper takes a different approach to achieving more accurate bit allocation and consistent objective video quality for HEVC. Unlike the complexity-based methods, we first investigate the proportion of a GOP's bits allocated to the intra frame. Based on the results, a novel frame-level bit allocation algorithm is developed, which provides a robust bit balancing scheme between the intra frame and the inter frames in a GOP to achieve smooth visual quality throughout the whole video sequence. Note that the structural similarity (SSIM) index [22] is employed to measure image quality in this paper, since it has been shown to be effective and well matched to perceived quality [23].
3. Initial Target Bitrate of the First Intra Frame
An accurate estimate of the initial target bitrate of the first intra frame is vital to the overall performance of RC. The more accurate the estimate is, the less time it takes for the bit cost to settle into a steady state. However, the initialization scheme in HM-16.0 only takes bpp into consideration, which is not accurate enough [24]. The complexity-based methods mentioned in Section 2 are not appropriate for HM-16.0 either, since new characteristics have been adopted in the state-of-the-art HEVC. In this section, we develop a novel but simple model to estimate the initial target bitrate of the first intra frame. The model has two characteristics: (1) it assumes that the GOP structure is IB…B (an I frame followed by n B frames), and (2) it is built on the relationship between bpp, the bitrate of the I frame and the bitrate of the B frames in a GOP.
Denote $R_{GOP}$ as the target bitrate of a GOP, then
$$R_{GOP} = R_I + n \cdot \bar{R}_B \tag{1}$$
where $R_I$ is the target bitrate of the I frame and $\bar{R}_B$ is the average target bitrate of the B frames in the GOP. Let
Then from Eqs. (1) and (2), we can obtain
i.e.
To discover the relationship between y and bpp, experiments have been performed in HM-16.0 with n ∈ {3, 7, 11, 15} and a flat QP (QP ∈ {17, 22, 27, 32, 37, 42}). By performing curve fitting on extensive data, we find that this relationship can be accurately modeled by a hyperbolic function as follows, which is represented by the colored curves in Fig. 1:
where a and b are the model parameters, and bpp can be calculated by:
$$bpp = \frac{R}{f \cdot w \cdot h} \tag{6}$$
where R is the bitrate of the sequence, f is the frame rate, and w and h are the width and height of the picture, respectively.
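As a quick numerical illustration of the bpp definition (the bitrate divided by the number of pixels coded per second), the following Python sketch computes bpp for one operating point. The function name and the example values are hypothetical, and the bitrate is assumed to be given in bits per second.

```python
def bits_per_pixel(bitrate_bps: float, frame_rate: float, width: int, height: int) -> float:
    """Return bpp = R / (f * w * h), with R assumed to be in bits per second."""
    if frame_rate <= 0 or width <= 0 or height <= 0:
        raise ValueError("frame rate and picture dimensions must be positive")
    return bitrate_bps / (frame_rate * width * height)


# Hypothetical operating point: a 1920x1080 sequence at 50 fps coded at 4 Mbps.
bpp = bits_per_pixel(4_000_000, 50, 1920, 1080)
print(f"bpp = {bpp:.4f}")
```

Because bpp normalizes the bitrate by resolution and frame rate, the same target bitrate corresponds to very different bpp values for sequences of different classes.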
Fig. 1. Fitting of the bpp-y curves according to Eq. (5)
Note that $R^2$ in Fig. 1 is the coefficient of determination, which lies between 0 and 1. The larger $R^2$ is, the closer the fitted curve is to the actual data. The curve fitting results in Fig. 1 show that the model fits the actual data points very well. According to Eqs. (4) and (5), $R_I$ can be obtained by:
Once the model parameters a and b in Eq. (5) are determined, $R_I$ can be well estimated. It can be observed from Fig. 1 that a and b differ considerably between video sequences, but for the same sequence, a and b are quite similar across GOP sizes. Therefore, a simple procedure is developed to obtain the parameters a and b for a given sequence:
(1) Pre-encode the first GOP using the original HM-16.0 with RC off.
(2) Compute the average QP of the B frames in the first GOP, and round it to the nearest integer, which is denoted as $QP_0$.
(3) Use $QP_0$ to encode the first two GOPs, from which two pairs of actual bpp and y are obtained and denoted as $(bpp_1, y_1)$ and $(bpp_2, y_2)$, respectively.
(4) According to Eq. (5), the two sets of bpp and y from step (3) yield two equations:
Hence, the parameters a and b can be determined by solving Eq. (8), which gives:
Let $R_{I1}$ denote the target bitrate of the first intra frame; then, according to Eq. (7),
where $R_{GOP1}$ is the target bitrate of the first GOP.
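The two-point parameter estimation and the resulting intra-frame target can be sketched in Python as follows. This is only an illustration under two loudly stated assumptions, because the corresponding equations are not reproduced in this text: (i) the fitted model is taken to have the power-law form y = a · bpp^b, and (ii) y is taken to be the ratio of the I-frame bitrate to the average B-frame bitrate, so that the GOP budget splits as R_I = R_GOP · y / (y + n). The function names and sample values are hypothetical and are not data from the paper.

```python
import math


def fit_two_points(bpp1, y1, bpp2, y2):
    """Solve y = a * bpp**b through two (bpp, y) samples.

    ASSUMPTION: the power-law model form is chosen for illustration only.
    """
    if min(bpp1, bpp2, y1, y2) <= 0 or bpp1 == bpp2:
        raise ValueError("need two distinct positive bpp values and positive y values")
    # Taking logarithms turns the model into a straight line: ln y = ln a + b * ln bpp.
    b = math.log(y1 / y2) / math.log(bpp1 / bpp2)
    a = y1 / bpp1 ** b
    return a, b


def intra_target_bitrate(r_gop, y, n):
    """Split a GOP budget between the I frame and n B frames.

    ASSUMPTION: y = R_I / mean(R_B), combined with R_GOP = R_I + n * mean(R_B).
    """
    if y <= 0 or n <= 0:
        raise ValueError("y and n must be positive")
    return r_gop * y / (y + n)


# Hypothetical measurements from two pre-encoded GOPs.
a, b = fit_two_points(0.05, 8.0, 0.10, 5.0)
y_first = a * 0.08 ** b  # model prediction at a hypothetical target bpp of 0.08
r_i1 = intra_target_bitrate(2000.0, y_first, n=3)
print(f"a = {a:.3f}, b = {b:.3f}, y = {y_first:.3f}, R_I1 = {r_i1:.1f}")
```

With only two samples the fit is exact by construction, so its quality depends entirely on how representative the two pre-encoded GOPs are of the rest of the sequence.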
4. Proposed Frame-Level RC Algorithm
In Section 3, an algorithm to estimate the target bitrate of the first intra frame is proposed, in which some frames need to be pre-encoded to obtain the model parameters. If all the intra frames in the video sequence were encoded in this way, the extra complexity would be detrimental to real-time applications. Furthermore, the RC scheme should adapt to the video content and achieve smooth visual quality throughout the whole video sequence. To intelligently balance bit allocation between intra and inter frames in a GOP, y should be dynamically updated. If the visual quality of the coded intra frame is higher than the average quality of the inter frames in the same GOP, relatively more bits have been allocated to the intra frame, and y of the next GOP should be decreased. Otherwise, more bits should be allocated to the next intra frame to improve its quality. After encoding the ith GOP, y of the next GOP, denoted as $y_{i+1}$, can be updated as follows:
where is the actual bitrate of the ith GOP, is the actual bitrate of the intra frame in the ith GOP, k is an adjustment factor, is the average SSIM value of the inter frames in the ith GOP, and SSIMIi is the SSIM value of the intra frame in the ith GOP. Note that k is empirically set as follows:
By combining the y-updating strategy and the initial target bitrate estimation for the first intra frame described in Section 3, our proposed frame-level RC algorithm can be summarized as follows:
(1) Obtain a, b, and $R_{I1}$, as described in Section 3, and set the initial $y_1$ as follows:
(2) For the ith GOP
(3) Repeat step (2) until the end of the sequence.
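The direction of the y update described above can be illustrated with a short Python sketch. It is not the paper's update rule: the update equation and the empirical setting of k are not reproduced in this text, so the additive form below, the definition of y as the ratio of the I-frame bitrate to the average B-frame bitrate, the lower bound on y, and the value of k are all assumptions chosen only to show how an SSIM gap steers bits between intra and inter frames.

```python
def update_y(r_gop_act, r_intra_act, n, ssim_intra, ssim_inter_avg, k, y_min=1e-3):
    """Return an illustrative y for the next GOP.

    ASSUMED form: y_next = y_observed + k * (SSIM_B_avg - SSIM_I), where
    y_observed is the I/B bitrate ratio actually produced by the coded GOP.
    """
    if n <= 0 or r_intra_act <= 0 or r_gop_act <= r_intra_act:
        raise ValueError("need n > 0 and 0 < intra bitrate < GOP bitrate")
    mean_b_act = (r_gop_act - r_intra_act) / n
    y_observed = r_intra_act / mean_b_act
    # Intra quality above the inter average gives a negative correction (smaller y).
    y_next = y_observed + k * (ssim_inter_avg - ssim_intra)
    return max(y_next, y_min)  # hypothetical guard that keeps y positive


# Hypothetical GOP statistics: the intra frame looks better than the B frames.
y_next = update_y(r_gop_act=1000.0, r_intra_act=600.0, n=4,
                  ssim_intra=0.95, ssim_inter_avg=0.90, k=10.0)
print(f"y for the next GOP: {y_next:.3f}")
```

In this sketch, a larger k makes the allocation react faster to quality differences, at the cost of larger swings in the intra-frame share from one GOP to the next.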
5. Experimental Results and Analysis
To evaluate the performance of the proposed algorithm, numerous experiments have been conducted. Twenty sequences from Classes A, B, C, D, and F as specified in [25] are used for simulation. Detailed information on the test sequences is summarized in Table 1. In the experiments, the Random Access (RA) Main Profile configuration is used. To conduct a fair comparison between our algorithm and the original RC scheme [10,12] in HM-16.0, we assign each sequence a target bitrate obtained by running the original HM-16.0 with RC disabled under the HEVC common test conditions, and then run the two RC methods with the same configuration. Note that the standard deviation of SSIM is used to measure the variation of video quality, and the bitrate error in Eq. (14) is calculated to measure the RC accuracy:
$$\text{Bitrate error} = \frac{|R_{act} - R_{tar}|}{R_{tar}} \times 100\% \tag{14}$$
where $R_{act}$ is the actual bitrate and $R_{tar}$ is the target bitrate of the sequence.
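The two evaluation measures can be computed as in the following Python sketch. It assumes that the bitrate error is the absolute difference between the actual and target bitrates, relative to the target and expressed in percent, and that the SSIM variation is the population standard deviation of the per-frame SSIM values; the paper does not state whether the population or the sample form is used, so that choice is an assumption. Function names and sample values are hypothetical.

```python
import statistics


def bitrate_error_percent(r_act, r_tar):
    """Relative deviation of the actual bitrate from the target, in percent."""
    if r_tar <= 0:
        raise ValueError("target bitrate must be positive")
    return abs(r_act - r_tar) / r_tar * 100.0


def ssim_variation(frame_ssim):
    """Standard deviation of per-frame SSIM (population form, an assumption)."""
    return statistics.pstdev(frame_ssim)


# Hypothetical results for one sequence (not data from the paper).
print(f"bitrate error: {bitrate_error_percent(1046.0, 1000.0):.2f}%")
print(f"SSIM variation: {ssim_variation([0.93, 0.91, 0.92, 0.90]):.4f}")
```

A small bitrate error indicates that the encoder hits its target, while a small SSIM variation indicates that quality stays steady from frame to frame; the two measures are independent, which is why both are reported.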
Table 1. Information of test sequences used for simulation
First, we perform experiments with the coding structure IBBBIBBB… and a GOP size of 4. All the sequences in Table 1 are tested. Ten sequences (two from each class) are used as examples in Tables 2 and 3 to show the efficiency of the proposed method, and the overall results are given in Table 4. Tables 2, 3 and 4 show that, compared with the HM-16.0 RC method, our method significantly reduces the bitrate error: the average bitrate error of the HM-16.0 RC method is 7.16%, while that of our proposed method is only 0.46%. Meanwhile, our algorithm saves more than 428 kbps of bitrate on average compared with the HM-16.0 RC method. In addition, our proposed algorithm provides better visual quality, with an SSIM improvement of up to 0.027440. Moreover, the SSIM variation values of our proposed method are much smaller than those of the HM-16.0 RC method, which shows that the proposed method keeps the objective video quality more consistent.
Table 2. Simulation results on RC (coding structure: IBBBIBBB…, GOP size: 4)
Table 3. Simulation results on objective quality (coding structure: IBBBIBBB…, GOP size: 4)
Table 4. Overall results of simulation (coding structure: IBBBIBBB…, GOP size: 4)
Next, we perform experiments with the coding structure IBBBBBBBIBBBBBBB… and a GOP size of 8 to evaluate the proposed method with a larger GOP size. All the sequences in Classes C and D are tested. The overall results are given in Table 5. In this configuration, the proposed algorithm also outperforms the HM-16.0 RC method: its bitrate error and SSIM variation are far lower than those of the HM-16.0 RC method. Meanwhile, the proposed RC scheme saves up to 793.50 kbps of bitrate compared with the HM-16.0 RC method. Besides the bitrate reduction, the proposed method also achieves an SSIM gain of 0.001087 on average, and up to 0.009122, over the HM-16.0 RC scheme.
Table 5. Overall results of simulation (coding structure: IBBBBBBBIBBBBBBB…, GOP size: 8)
Fig. 2 presents the frame-level bit allocation of the two RC methods. Compared with the HM-16.0 RC method, the proposed RC method clearly keeps the bit cost of different pictures in a video sequence within a narrower range.
Fig. 2. Bit cost comparison between the HM-16.0 RC method and our proposed algorithm (coding structure: IBBBIBBB…, GOP size: 4)
Fig. 3 shows the frame-level objective visual quality of the two RC methods. The SSIM curves of our proposed algorithm are consistently higher than those of the HM-16.0 RC method throughout the sequences. The HM-16.0 RC method leads to an obvious degradation of visual quality in the later part of the sequences. In contrast, under the same conditions, our proposed algorithm performs well without much fluctuation in the objective visual quality of the reconstructed frames.
Fig. 3. Objective visual quality comparison between the HM-16.0 RC method and our proposed algorithm (coding structure: IBBBIBBB…, GOP size: 4)
6. Conclusion
This paper presents a novel and efficient frame-level rate control algorithm for HEVC. First, a model for accurately estimating the initial target bitrate of the first intra frame is proposed. Then, a balanced frame-level bit allocation strategy is designed to improve the overall performance of the RC scheme for HM-16.0. The simulation results show that, compared with the HM-16.0 RC method, the proposed algorithm achieves more accurate RC and obtains better and smoother visual quality of the reconstructed pictures at a lower bitrate.
In future work, to further enhance the overall performance of the RC scheme, we will explore perceptual approaches to basic-unit-level bit allocation.
References
[1] J. R. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan and T. Wiegand, "Comparison of the coding efficiency of video coding standards-including High Efficiency Video Coding (HEVC)," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1669-1684, December, 2012. https://doi.org/10.1109/TCSVT.2012.2221192
[2] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July, 2003. https://doi.org/10.1109/TCSVT.2003.815165
[3] A. Vetro, H. Sun and Y. Wang, "MPEG-4 rate control for multiple video objects," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 1, pp. 186-199, February, 1999. https://doi.org/10.1109/76.744285
[4] H. J. Lee, T. H. Chiang and Y. Q. Zhang, "Scalable rate control for MPEG-4 video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 6, pp. 878-894, September, 2000. https://doi.org/10.1109/76.867926
[5] Y. J. Yoon, H. Kim, S. H. Jung, D. S. Jun, Y. H. Kim, J. S. Choi and S. J. Ko, "A new rate control method for hierarchical video coding in HEVC," in Proc. of IEEE International Symp. on Broadband Multimedia Systems and Broadcasting (BMSB), pp. 1-4, June 27-29, 2012.
[6] J. J. Si, S. W. Ma, S. Q. Wang and W. Gao, "Laplace distribution based CTU level rate control for HEVC," in Proc. of Visual Communications and Image Processing (VCIP), pp. 1-67, November 17-20, 2013.
[7] H. M. Choi, J. H. Yoo, J. H. Nam, D. Y. Sim and I. V. Bajić, "Pixel-wise unified rate-quantization model for multi-level rate control," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1112-1123, December, 2013. https://doi.org/10.1109/JSTSP.2013.2272241
[8] T. Biatek, M. Raulet, J. F. Travers and O. Deforges, "Efficient quantization parameter estimation in HEVC based on ρ-domain," in Proc. of the 22nd European Signal Processing Conference (EUSIPCO), pp. 296-300, September 1-5, 2014.
[9] S. S. Wang, S. W. Ma, S. Q. Wang, D. B. Zhao and W. Gao, "Quadratic ρ-domain based rate control algorithm for HEVC," in Proc. of IEEE International Conf. of Acoustics, Speech and Signal Processing (ICASSP), pp. 1695-1699, May 26-31, 2013.
[10] B. Li, H. Li, L. Li and J. Zhang, "Rate control by R-lambda model for HEVC," ITU-T SG16 Contribution, JCTVC-K0103, Shanghai, October, 2012.
[11] JCT-VC of ISO/IEC MPEG and ITU-T VCEG, "HM Reference Software 16.0 [Online]."
[12] M. Karczewicz and X. L. Wang, "Intra frame rate control based on SATD," in Proc. of 13th Meeting of JCTVC-M0257, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Incheon, KR, April 18-26, 2013.
[13] Z. G. Cui and X. C. Zhu, "Image complexity adaptive intra-frame rate control algorithm for H.264," Journal of Electronics & Information Technology, vol. 32, no. 11, pp. 2547-2553, November, 2010. https://doi.org/10.3724/SP.J.1146.2009.01431
[14] Y. M. Zhou, Y. Sun, Z. D. Feng and S. X. Sun, "New rate-distortion modeling and efficient rate control for H.264/AVC video coding," Signal Processing: Image Communication, vol. 24, no. 5, pp. 345-356, May, 2009. https://doi.org/10.1016/j.image.2009.02.014
[15] W. Q. Zhao, L. Q. Shen, Z. M. Cao and Z. Y. Zhang, "Texture and correlation based fast intra prediction algorithm for HEVC," in Proc. of 9th International Forum on Digital TV and Wireless Multimedia Communication, pp. 284-291, November 9-10, 2012.
[16] G. F. Tian and S. Goto, "Content adaptive prediction unit size decision algorithm for HEVC intra coding," in Proc. of 2012 Picture Coding Symposium, pp. 405-408, May 7-9, 2012.
[17] L. Tian, Y. M. Zhou and X. J. Cao, "A new rate-complexity-QP algorithm (RCQA) for HEVC intra-picture rate control," in Proc. of 2014 International Conf. on Computing, Networking and Communications, pp. 375-380, February 3-6, 2014.
[18] M. H. Wang, K. N. Ngan and H. L. Li, "An efficient frame-content based intra frame rate control for High Efficiency Video Coding," IEEE Signal Processing Letters, vol. 22, no. 7, pp. 896-900, July, 2015. https://doi.org/10.1109/LSP.2014.2377032
[19] L. Sun, O. C. Au, W. Dai, Y. F. Guo and R. B. Zou, "An adaptive frame complexity based rate quantization model for intra-frame rate control of High Efficiency Video Coding (HEVC)," in Proc. of 2012 Asia-Pacific Signal & Information Processing Association Annual Summit and Conference, pp. 1-6, December 3-6, 2012.
[20] H. Sun, S. S. Gao and C. Zhang, "Adaptive bit allocation scheme for rate control in High Efficiency Video Coding with initial quantization parameter determination," Signal Processing: Image Communication, vol. 29, no. 10, pp. 1029-1045, November, 2014. https://doi.org/10.1016/j.image.2014.09.006
[21] C. W. Seo, J. H. Moon and J. K. Han, "Rate control for consistent objective quality in High Efficiency Video Coding," IEEE Transactions on Image Processing, vol. 22, no. 6, pp. 2442-2454, June, 2013. https://doi.org/10.1109/TIP.2013.2251647
[22] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April, 2004. https://doi.org/10.1109/TIP.2003.819861
[23] Z. Wang and A. C. Bovik, "Mean squared error: love it or leave it? A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, January, 2009. https://doi.org/10.1109/MSP.2008.930649
[24] H. Choi, J. Nam, J. Yoo, D. Sim and I. Bajić, "Improvement of the rate control based on pixel-based URQ model for HEVC," in Proc. of 9th Meeting of JCTVC-I0094, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 27 April - 7 May, 2012.
[25] F. Bossen, "HM 8 common test conditions and software reference configurations," ITU-T SG16 Contribution, JCTVC-J1100, Stockholm, July, 2012.