1. Introduction
Many factors can impair the quality of network video, including the video source, the performance of the network and of the transceiver terminals, and the expectations of the user. Video coding determines the compression ratio, block effect and blur degree, all of which may degrade image quality. Network performance indexes such as bandwidth, packet loss rate and bit error rate can cause frame loss and playback pauses. The performance of the transceiver terminals, such as the server, the terminal processor and the operating system, affects how the video is displayed. In addition, the viewer’s expectations and viewing environment may also affect the perceived quality. It is therefore necessary to assess video quality in order to ensure it.
Because many factors may impair video quality, modeling the video quality assessment method is very important. The International Telecommunication Union (ITU) defines the quality of experience (QoE) as the user’s comprehensive subjective feeling about the equipment, network and system, as well as the application service quality [1]. Most current assessment methods focus on the QoE [2]. Assessment methods are divided into two categories. The first is subjective quality assessment (SQA): users watch videos in a controlled environment, and the subjective scores they give are then analyzed [3]. The second is objective quality assessment (OQA), which uses a mathematical model to assess the video quality [4].
SSCQE, DSCQS and DSIS are typical SQA methods. Many non-professional viewers are arranged to watch videos, and the subjective results are calculated by certain rules [5]. However, these methods consume a lot of resources, steps and time, so they cannot be widely used in real time [6].
OQA methods can be classified in two ways. (1) According to how much of the original video is used, there are three kinds of methods: full-reference (FR), reduced-reference (RR) and no-reference (NR). (2) According to the input parameters, there are the parametric programming approach, packet layer, bit-stream layer, media layer and hybrid layer methods.
(1) The first category
FR methods use both the original and the distorted video to assess video quality. SSIM, PSNR and MSE are typical FR methods. However, they cannot account for visual and physiological characteristics, and videos with different degrees of impairment may sometimes receive the same objective value. Sudeng Hu designed a low-pass filter to retain the perceptible space-time domain features, and compensated and modified the relationship between MSE and video quality [7]. Xiwu Shang used PSNRs of the YCbCr components at different levels to construct the correlation between the subjective scores and the combined PSNR [8]. Christos G. Bampis proposed two improved feature fusion methods, which integrate space-time domain features [9]. Chathura Galkandage introduced a novel human visual system model to assess video quality [10]. In practice, however, the original video is usually not available, which limits the FR methods.
RR methods do not use all the information of the original video; they only extract some of its characteristics. The National Telecommunications and Information Administration video quality model (NTIA VQM) is a typical RR method [11]. These methods still need characteristic features of the original video [12].
At present, NR methods use the distortion characteristics and the bit stream to assess video quality [13]. Geng Yang discussed the effect of the visible duration of visual impairment on the QoE and constructed a Bayesian network that included hierarchical objective indexes and subjective evaluation results [14]. Nabajeet Barman analyzed many impairment factors and proposed two NR methods for gaming video [15]. Joshua Peter Ebenezer used space-time chips to build a no-reference video quality assessment model [16].
(2) The second category
The parametric programming approach needs pre-set video and network parameters to predict video quality. Deutsche Telekom proposed a network parameter planning method for video services, which considered the impact of video coding distortion and packet loss distortion on video quality [17]. Jiarun Song also proposed a method that combined channel and video characteristics to assess the video quality [18]. However, the real network environment is complex and the related parameters are usually unpredictable, which limits this method.
The packet layer uses the packet header information to assess the video quality. Because only the packet header is needed, this kind of method can be applied in network nodes to monitor the video quality. Building on the packet header analysis, the bit-stream layer further analyzes the information in the video packets, such as frame, macro-block and pixel information. Compared with the packet layer, it can obtain more network and video information. In [19], a bit-stream layer method was proposed to better assess the video quality. In [20], an NR method was proposed that extracted related information from the bit stream and considered three key factors affecting video quality, which improved the accuracy of the objective method. Because the above two kinds of method use more video information, they easily lead to high computational complexity.
The media layer uses the video pixel information to predict the video quality. Because the reconstructed video can be obtained, this method has been widely studied. In [21], an objective method based on the three-dimensional discrete cosine transform was proposed: a group of spatiotemporal features was extracted, and linear regression was used to build the objective model. Compared with the above methods, this method uses only the pixel values and omits other information, so it has certain limitations in actual use.
The hybrid layer not only uses the bit-stream information, but also uses the pixel information to assess the video quality. Juncai Yao comprehensively considered the characteristics of video and bit-stream information, and then added some weight coefficients to synthesize the hybrid method [22]. This method uses multiple influence parameters synthetically, so the process of extraction and calculation is relatively complex. How to use all kinds of information reasonably needs careful consideration.
In addition, some scholars use artificial intelligence methods to build assessment models. Xiaoming Tao proposed a video quality assessment model based on deep learning [23]. Ali Al-Naji proposed an assessment method based on a fuzzy interface system [24].
All the above OQA methods have their own advantages and disadvantages. Because many factors may impair the video quality, the analysis process is complex, and the methods share the following problems. 1. “Weak comprehensiveness”. Many methods do not comprehensively consider the factors that impair the video quality: some focus on the pixels of video frames, others on the network performance. 2. “Weak robustness”. A fixed mathematical model may lead to weak robustness. 3. “Low accuracy”. The structure of many methods cannot be adjusted to improve accuracy.
This paper comprehensively analyses many impairment factors of video and introduces the Mamdani and Takagi-Sugeno fuzzy neural networks to build the objective assessment model. The objective results are calculated by adjusting the network structure and optimizing the assessment model, and the advantages and disadvantages of the two models are compared. Finally, objective results are computed with several other methods, and the accuracy of the different methods is tested experimentally.
The main innovations of this work are as follows. 1. For the problem of “weak comprehensiveness”, different factors that impair the video quality, such as the application index and the image index, are considered comprehensively in one model. 2. For the problem of “weak robustness”, two fuzzy neural networks are introduced, which can flexibly adjust the inference process and make the assessment method more applicable. 3. For the problem of “low accuracy”, the fuzzy neural networks can improve the accuracy of the assessment model by increasing the training times and adjusting their structure.
The introduction is the first section. The second section will give the impairment factors and experimental process in detail. The third section briefly introduces the principle of fuzzy neural network and proposes the assessment model. The fourth section analyzes the experimental results and compares different methods. This paper is concluded in the fifth section.
2. Impairment Factors and Experimental Process
2.1 Impairment Factors
This paper comprehensively considers the impairment factors to improve the comprehensiveness of the OQA method. The first factor is the application index, which includes the mean re-buffering duration and the re-buffering frequency [25]. The second factor is the image index of the video, which includes the blur degree and the block effect. All the indexes are considered in one objective model. The mean re-buffering duration is calculated by equation (1).
\(\begin{aligned}T_{\text {rebuf }}=\left\{\begin{array}{c}0, \text { if } \beta \geq \lambda \\ \left(B_{\text {full }}-B_{\text {empty }}\right) \lambda / \beta, \text { if } \beta<\lambda\end{array}\right.\end{aligned}\) (1)
where \(T_{\text{rebuf}}\) represents the mean re-buffering duration, \(B_{\text{full}}\) is the video buffer’s size, \(B_{\text{empty}}\) is the remaining length of the buffered video, λ is the video’s bit-rate, and β is the average transmission control protocol (TCP) good-put. The TCP good-put is assumed to be roughly the same as the network bandwidth.
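Equation (1) can be evaluated directly. The following is a minimal Python sketch, not the authors’ implementation; the function and argument names are illustrative, and it assumes that the buffer quantities are expressed in seconds of video and that the bit-rate and good-put share one unit.

```python
def mean_rebuffering_duration(b_full, b_empty, bitrate, goodput):
    """Mean re-buffering duration T_rebuf, following equation (1).

    b_full  -- video buffer size (B_full), assumed in seconds of video
    b_empty -- remaining length of buffered video (B_empty), same unit
    bitrate -- video bit-rate (lambda)
    goodput -- average TCP good-put (beta), same unit as bitrate
    """
    if goodput <= 0:
        raise ValueError("goodput must be positive")
    if goodput >= bitrate:
        # The network delivers data at least as fast as it is played.
        return 0.0
    return (b_full - b_empty) * bitrate / goodput


# Hypothetical values: 10 s buffer, 2 s left, 2000 kbit/s video, 1000 kbit/s good-put.
t_rebuf = mean_rebuffering_duration(10, 2, 2000, 1000)
```

With these hypothetical values the buffer has to refill 8 s of video at half the playback rate, so the sketch returns 16 s.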
If the length of video is l, the re-buffering frequency is calculated by equation (2).
\(\begin{aligned}F_{\text {rebuf }}=\left\{\begin{array}{c}0, \text { if } \beta \geq \lambda \\ n_{\text {rebuf }} / l, \text { if } \beta<\lambda\end{array}\right.\end{aligned}\) (2)
where \(F_{\text{rebuf}}\) represents the re-buffering frequency, \(\begin{aligned}n_{\text {rebuf }}=\left\lceil\frac{l^{\prime}}{b_{\text {rebuf }}}\right\rceil, l^{\prime}=l-B_{\text {full }} /\left(1-\frac{\beta}{\lambda}\right)\end{aligned}\), and \(\begin{aligned}b_{\text {rebuf }}=\left(B_{\text {full }}-B_{\text {empty }}\right) /\left(1-\frac{\beta}{\lambda}\right)\end{aligned}\).
When the video’s bit rate is higher than the average TCP good-put, playback is repeatedly interrupted by re-buffering. Here \(l^{\prime}\) is the remaining length of the video, and \(b_{\text{rebuf}}\) is the length of the played video.
When β ≪ λ, the ratio β/λ approaches 0, and the maximum re-buffering frequency is calculated by:
\(\begin{aligned}\max \left(F_{\text {rebuf }}\right)=\frac{1}{l}\left\lceil\frac{l-B_{\text {full }}}{B_{\text {full }}-B_{\text {empty }}}\right\rceil\end{aligned}\) (3)
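The re-buffering frequency of equation (2) and its limit in equation (3) can be sketched in Python as follows. The names are illustrative, and two points are assumptions rather than statements from the text: the rounding in equation (3) is taken to be the same ceiling used for the number of re-buffering events, and the frequency is set to zero when the video ends before the initial buffer drains.

```python
import math


def rebuffering_frequency(length, b_full, b_empty, bitrate, goodput):
    """Re-buffering frequency F_rebuf, following equation (2)."""
    if goodput >= bitrate:
        return 0.0
    drain = 1.0 - goodput / bitrate          # 1 - beta / lambda
    remaining = length - b_full / drain      # l': remaining length of video
    if remaining <= 0:
        # Assumption: the video ends before the first buffer runs empty.
        return 0.0
    played = (b_full - b_empty) / drain      # b_rebuf
    n_rebuf = math.ceil(remaining / played)  # number of re-buffering events
    return n_rebuf / length


def max_rebuffering_frequency(length, b_full, b_empty):
    """Limit of F_rebuf when beta << lambda, following equation (3)."""
    return math.ceil((length - b_full) / (b_full - b_empty)) / length


# Hypothetical values: 60 s video, 10 s buffer, 2 s left at re-buffering.
f_half = rebuffering_frequency(60, 10, 2, 2000, 1000)  # good-put is half the bit-rate
f_max = max_rebuffering_frequency(60, 10, 2)
```

For the hypothetical 60 s video, a good-put of half the bit-rate gives three re-buffering events (0.05 per second), while the limit of equation (3) gives seven events.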
Blur degree (Bd) and block effect (Be) are used to analyze the video quality from the image perspective. Bd reflects the change of image detail, and the temporal and spatial variation of Bd across the frames of a video reflects the change of video quality. Following reference [26], the Bd of the horizontal and vertical directions is calculated separately, and their weighted average gives the Bd of the video.
Be appears at the edges of flat areas and moving objects. Following reference [27], the Be of each frame is calculated from the periodicity of the block edges and is denoted by M. The per-frame values are then weighted and averaged to obtain the Be of the video.
From the above analysis, the TCP good-put is a key network parameter that affects the application index. We therefore vary the network bandwidth to simulate the real network environment, and test the videos to measure the application index.
2.2 Experimental Process
Fig. 1 shows the experimental system. The server stores many video clips. Different network environments are simulated by the switch, and the network video is transmitted to the client through the router. The experimental videos, with different scenes and bit rates, are shown in Fig. 2, and their information is given in Table 1. The network bandwidth is then changed to test the different videos.
Fig. 1. The system of experiment
Fig. 2. The test video
Table 1. Information of videos
Each video is watched for 1 minute, and each video is watched 10 times by 26 people. Following DSIS, these viewers give the subjective scores, which are averaged and expressed on a scale from 0 to 5, so that every video has a subjective score. Meanwhile, the impairment factors of the different videos are extracted.
Table 2 shows the test results of Football. The experimental data show that the application indexes increase as the network bandwidth decreases. In particular, when \(F_{\text{rebuf}}\) increases, the user cannot watch the video or has to wait a long time. In addition, Bd and Be lead to a decline in video quality. At the same time, the mean opinion score (MOS) decreases, which reflects the deterioration of video quality.
Table 2. The test results of Football
3. The Assessment Model
This paper introduces the fuzzy neural network to construct the assessment model. The model combines a neural network with a fuzzy system and is used to improve the robustness and accuracy of the OQA model. It has both the ability of the fuzzy system to deal with uncertain problems and the adaptive learning function of the neural network [28]. Next, the Mamdani and Takagi-Sugeno fuzzy neural networks are introduced to construct the objective assessment model.
3.1 The Design Process
Fig. 3 shows the layers of the Mamdani fuzzy neural network. The impairment factors \(x=[x_{1}, x_{2}, \ldots, x_{n}]^{T}\), such as the application and image indexes, are inputted into the 1st layer. In the 2nd layer, a Gaussian membership function is used to assign weights to the indexes.
Fig. 3. Mamdani fuzzy neural network
\(\begin{aligned}\mu_{i}^{j}=e^{-\frac{\left(x_{i}-c_{i j}\right)^{2}}{\sigma_{i j}^{2}}} \quad(i=1 \ldots n, j=1 \ldots m)\end{aligned}\) (4)
where \(c_{ij}\) and \(\sigma_{ij}\) are the parameters of the membership function (its centre and width), and \(\mu_{i}^{j}\) is the weight of each index. Every node in the 3rd layer represents a fuzzy rule: \(\alpha_{j}=\mu_{1}^{i_{1}} \mu_{2}^{i_{2}} \cdots \mu_{n}^{i_{n}}\). The 4th layer normalizes the outputs of the 3rd layer.
\(\begin{aligned}\overline{\alpha_{j}}=\alpha_{j} / \sum_{i=1}^{m} \alpha_{i}\end{aligned}\) (5)
The 5th layer is the output. This proposed model only sets one output.
\(\begin{aligned}y_{i}=\sum_{j=1}^{m} \omega_{i j} \overline{\alpha_{j}} \quad(i=1 \ldots r)\end{aligned}\) (6)
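The forward pass through the five layers can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors’ implementation: the Gaussian membership uses the standard squared distance, each rule j is assumed to use the j-th fuzzy set of every input (one set per input per rule), the model has a single output, and all names and parameter values are hypothetical.

```python
import math


def gaussian_membership(x, c, sigma):
    """Equation (4): membership of x in a fuzzy set with centre c and width sigma."""
    return math.exp(-((x - c) ** 2) / sigma ** 2)


def normalized_firing_strengths(x, centers, sigmas):
    """Layers 2 to 4: rule strengths (product of memberships), then equation (5).

    centers[i][j] and sigmas[i][j] belong to input i and rule j.
    """
    n, m = len(x), len(centers[0])
    alphas = []
    for j in range(m):
        alpha = 1.0
        for i in range(n):
            alpha *= gaussian_membership(x[i], centers[i][j], sigmas[i][j])
        alphas.append(alpha)
    total = sum(alphas)
    return [a / total for a in alphas]


def mamdani_output(x, centers, sigmas, weights):
    """Layer 5, equation (6) with one output: weighted sum of normalized strengths."""
    alpha_bar = normalized_firing_strengths(x, centers, sigmas)
    return sum(w * a for w, a in zip(weights, alpha_bar))


# Hypothetical one-input, two-rule example: the input lies midway between
# the two centres, so both rules fire equally and the output is the mean weight.
score = mamdani_output([0.5], [[0.0, 1.0]], [[1.0, 1.0]], [4.0, 2.0])
```

In this example both rules receive a normalized strength of 0.5, so the output is 3.0.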
Fig. 4 shows the layers of the Takagi-Sugeno fuzzy neural network. It includes the antecedent network and the consequent network. The antecedent network works in the same way as the Mamdani fuzzy neural network. The consequent network is formed from many parallel sub-networks with the same structure, and every sub-network gives one output.
Fig. 4. Takagi-Sugeno fuzzy neural network
All the impairment factors, including the application and image indexes, are inputted into the 1st layer. The 2nd layer has m nodes, and every node represents a fuzzy rule. This layer computes the consequent of every rule:
\(\begin{aligned}y_{i j}=p_{j 0}^{i}+p_{j 1}^{i} x_{1}+\cdots+p_{j n}^{i} x_{n} \quad(j=1 \ldots m ; i=1 \ldots r)\end{aligned}\) (7)
The 3rd layer is the output. The proposed model also only sets one output.
\(\begin{aligned}y_{i}=\sum_{j=1}^{m} \overline{\alpha_{j}} y_{i j} \quad(i=1 \ldots r)\end{aligned}\) (8)
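Equations (7) and (8) can be illustrated with the following Python sketch. It is not the authors’ implementation: it assumes a single output, a Gaussian membership with squared distance in the antecedent network, one fuzzy set per input per rule, and hypothetical parameter values and names.

```python
import math


def firing_strengths(x, centers, sigmas):
    """Antecedent network: normalized rule strengths.

    centers[i][j] and sigmas[i][j] belong to input i and rule j.
    """
    m = len(centers[0])
    alphas = [
        math.prod(
            math.exp(-((xi - c_row[j]) ** 2) / s_row[j] ** 2)
            for xi, c_row, s_row in zip(x, centers, sigmas)
        )
        for j in range(m)
    ]
    total = sum(alphas)
    return [a / total for a in alphas]


def takagi_sugeno_output(x, centers, sigmas, coeffs):
    """Consequent network with one output.

    coeffs[j] = [p_j0, p_j1, ..., p_jn] is the linear consequent of rule j.
    """
    alpha_bar = firing_strengths(x, centers, sigmas)
    # Equation (7): each rule output is a linear function of the inputs.
    rule_outputs = [
        p[0] + sum(pk * xk for pk, xk in zip(p[1:], x)) for p in coeffs
    ]
    # Equation (8): combine the rule outputs with the normalized strengths.
    return sum(a * y for a, y in zip(alpha_bar, rule_outputs))


# Hypothetical one-input, two-rule example with equal firing strengths.
score = takagi_sugeno_output(
    [0.5], [[0.0, 1.0]], [[1.0, 1.0]], [[1.0, 2.0], [3.0, 0.0]]
)
```

Here the rule outputs are 2.0 and 3.0 and both rules fire equally, so the result is 2.5. The only structural difference from the Mamdani sketch is that each rule contributes a linear function of the inputs instead of a constant weight.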
3.2 The Training Process
The training process of the proposed method is shown in Fig. 5. It consists of three nested cycles. The 3rd (innermost) cycle inputs the training sample set into the Takagi-Sugeno or Mamdani fuzzy neural network. If the prediction error is within the set range, the current output and parameters are retained and the next training samples are substituted into the calculation; otherwise, the parameters and membership functions are updated. The 2nd cycle counts the training times: each time the 3rd cycle is completed, the count is increased by one, and when it reaches the initial set value, this cycle exits. The 1st cycle compares the accuracy of the predicted output with the initial set accuracy rate and exits when the set value is reached. Finally, the parameters and the test sample set are substituted into the calculation to obtain the assessment accuracy on the test set.
Fig. 5. Training process
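The control flow of the three cycles can be sketched as follows. This is only an outline under stated assumptions: the text does not give the parameter update rule or the exact definition of the accuracy rate, so the sketch takes `predict` and `update` as caller-supplied functions, defines accuracy as the share of samples predicted within the error tolerance, and adds a hypothetical `max_rounds` limit so that the outer cycle always terminates. A one-parameter linear model stands in for the fuzzy neural network.

```python
def train(samples, predict, update, params,
          error_tol, max_epochs, target_accuracy, max_rounds=10):
    """Run the three nested cycles; return (params, accuracy on samples)."""

    def accuracy(p):
        # Assumed definition: share of samples predicted within error_tol.
        hits = sum(abs(predict(p, x) - y) <= error_tol for x, y in samples)
        return hits / len(samples)

    for _ in range(max_rounds):          # 1st cycle: check the accuracy rate
        for _ in range(max_epochs):      # 2nd cycle: count the training times
            for x, y in samples:         # 3rd cycle: feed the training samples
                if abs(predict(params, x) - y) > error_tol:
                    params = update(params, x, y)  # error too large: update
                # otherwise the current parameters are retained
        if accuracy(params) >= target_accuracy:
            break
    return params, accuracy(params)


# Toy stand-in for the network: y = w * x with a gradient-style update.
# Both functions are illustrative assumptions.
def predict(w, x):
    return w * x


def update(w, x, y, rate=0.05):
    return w + rate * (y - w * x) * x


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, acc = train(samples, predict, update, params=0.0,
               error_tol=0.05, max_epochs=50, target_accuracy=0.95)
```

With the toy data the parameter settles close to 2 and every sample falls within the tolerance. In the proposed method, `predict` would be the forward pass of the Mamdani or Takagi-Sugeno network and `update` would adjust its parameters and membership functions.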
The biggest difference between Takagi-Sugeno and Mamdani fuzzy neural network is that the former deals with the input value linearly. The next section will focus on analysing the changes of accuracy and running time under different training times.
4. Analysis of Experimental Results
40 groups of videos are selected for training, and the remaining 20 groups are used for testing. Each training group includes the application index, the image index and the subjective score. These experiments are used to verify the proposed model.
The experimental results of Mamdani and Takagi Sugeno models under different training times are recorded, and each training time is repeated five times.
With the increase of training times, the accuracy of both the Mamdani and Takagi-Sugeno fuzzy neural networks improves, but the running time also increases. Once the training times reach 1000, the accuracy of the Mamdani model changes very little, whereas the Takagi-Sugeno model continues to improve until the training times reach 4000. Increasing the training times further raises the computational complexity and makes the running time too long.
Comparing the data in Table 3 and Table 4 shows that the accuracy of the Takagi-Sugeno model is higher than that of the Mamdani model, but the Takagi-Sugeno model needs more training times and a longer running time.
Table 3. The test results of Mamdani
Table 4. The test results of Takagi-Sugeno
To further verify the method, the Mamdani fuzzy neural network trained 1000 times and the Takagi-Sugeno fuzzy neural network trained 4000 times are tested on three public standard video databases (LIVE, CSIQ and IVP) and compared with other methods. The Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC) are used to compare the accuracy of every method.
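For reference, the two correlation measures can be computed as in the following Python sketch. It uses the standard definitions (Pearson correlation, and Pearson correlation of average ranks for SROCC); the score lists are made-up illustrative values, not results from this paper.

```python
import math


def plcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman rank-order correlation: PLCC computed on the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Made-up scores: the objective score grows monotonically but not linearly.
subjective = [1.0, 2.0, 3.0, 4.0, 5.0]
objective = [1.0, 4.0, 9.0, 16.0, 25.0]
linear_corr = plcc(subjective, objective)
rank_corr = srocc(subjective, objective)
```

In this example the rank correlation is 1 because the ordering is preserved, while the linear correlation is slightly lower (about 0.98). PLCC therefore measures how linear the relation between subjective and objective scores is, and SROCC measures how well the ordering is preserved.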
Table 5. The accuracy of different methods
The experimental results show that the proposed method brings the objective scores closer to the subjective scores than the compared methods. The subjective and objective scores of the different methods are shown in Fig. 6. The fuzzy neural network models improve the linear relationship between the subjective and objective scores. They give more accurate objective scores, especially the Takagi-Sugeno network.
Fig. 6. Different method’s subjective and objective scores
5. Conclusions
In this paper, Mamdani and Takagi Sugeno fuzzy neural networks are introduced, and two models based on them are proposed. By adjusting the network structure, the objective results are optimized, and the existing assessment methods are compared. The test results show that the proposed methods effectively improve the accuracy.
The main advantages of the proposed method are as follows: 1. To address the "weak comprehensiveness" of assessment methods, a variety of impairment factors are comprehensively considered and integrated into one model, so that the model reflects the video quality more comprehensively. 2. To address "weak robustness", the applicability of the method is adjusted through the structure of the fuzzy neural network. 3. To address "low accuracy", the training times of the fuzzy neural network are increased to improve the accuracy.
In future work, more impairment factors and more complex network structures will be considered to improve the accuracy of the proposed method.
Conflicts of interests
There is no conflict in this paper.
References
- Zhe Zhu, Hantao Liu, Jiaming Lu, et al, "A Metric for Video Blending Quality Assessment," IEEE Transactions on Image Processing, vol.29, pp.3014-3022, Nov., 2019. https://doi.org/10.1109/tip.2019.2955294
- Deepti Ghadiyaram, Janice Pan, Alan C.Bovik, "A Subjective and Objective Study of Stalling Events in Mobile Streaming Videos," IEEE Transactions on Circuits and Systems for Video Technology, vol.29, no.1, pp. 183-197, Jan. 2019. https://doi.org/10.1109/TCSVT.2017.2768542
- Zhiming Shi, Chengti Huang, "Network Video Quality Assessment Method using Fuzzy Decision Tree," IET Communications, vol.13, no.14, pp. 2192-2198, Aug.2019. https://doi.org/10.1049/iet-com.2019.0062
- Shyamprasad Chikkerur, Vijay Sundaram, Martin Reisslein, et al, "Objective Video Quality Assessment Methods: A Classification, Review and Performance Comparison," IEEE Transactions on Broadcasting, vol.57, no.2, pp. 165-182, June. 2011. https://doi.org/10.1109/TBC.2011.2104671
- Xiaoming Tao, Yiping Duan, Mai Xu, Zhishen Meng, Jianhua Lu, "Learning QoE of Mobile Video Transmission With Deep Neural Network: A Data-Driven Approach," IEEE Journal on Selected Areas in Communications, vol.37, no.6, pp.1337-1348, June.2019. https://doi.org/10.1109/jsac.2019.2904359
- Balasubramanyam Appina, Sathya Veera Reddy Dendi, K. Manasa, et al, "Study of Subjective Quality and Objective Blind Quality Prediction of Stereoscopic Videos," IEEE Transactions on Image Processing, vol.28, no.10, pp.5027-5040, Oct.2019. https://doi.org/10.1109/tip.2019.2914950
- Sudeng Hu, Lina Jin, Hanli Wang, et al, "Objective Video Quality Assessment Based on Perceptually Weighted Mean Squared Error," IEEE Transactions on Circuits and Systems for Video Technology, vol.27, no.9, pp.1844-1855, Sept. 2017. https://doi.org/10.1109/TCSVT.2016.2556499
- Xiwu Shang, Jie Liang, Guozhong Wang, et al, "Color-Sensitivity-Based Combined PSNR for Objective Video Quality Assessment," IEEE Transactions on Circuits and Systems for Video Technology, vol.29, no.5, pp.1239-1250, May.2019. https://doi.org/10.1109/tcsvt.2018.2836974
- Christos G. Bampis, Zhi Li, Alan C. Bovik, "Spatiotemporal Feature Integration and Model Fusion for Full Reference Video Quality Assessment," IEEE Transactions on Circuits and Systems for Video Technology, vol.29, no.8, pp.2256-2270, Aug.2019. https://doi.org/10.1109/tcsvt.2018.2868262
- Chathura Galkandage, Janko Calic, Safak Dogan, et al, "Full-Reference Stereoscopic Video Quality Assessment Using a Motion Sensitive HVS Model," IEEE Transactions on Circuits and Systems for Video Technology, vol.31, no.2, pp.452-466, Feb., 2021. https://doi.org/10.1109/TCSVT.2020.2981248
- Pinson M H, Wolf S, "A New Standardized Method for Objectively Measuring Video Quality," IEEE Transactions on Broadcasting, vol.50, no.3, pp.312-322, Sept. 2004. https://doi.org/10.1109/TBC.2004.834028
- Min Liu, Ke Gu, Guangtao Zhai, et al, "Perceptual Reduced-Reference Visual Quality Assessment for Contrast Alteration," IEEE Transactions on Broadcasting, vol.63, no.1, pp.71-81, Mar.2017. https://doi.org/10.1109/TBC.2016.2597545
- Zhiming Shi, Chengti Huang, "Quality of Experience Models for Network Video Quality," The Journal of China Universities of Posts and Telecommunications, vol.26, no.4, pp.80-88, Aug.2019.
- Geng Yang, Shao Sujie, Guo Shaoyong, et al, "Bayesian network-based video QoE assessment method using image sustained damage analysis," Journal on Communications, vol.38, no.6, pp.1-6, June, 2017.
- Nabajeet Barman, Emmanuel Jammeh, Seyed Ali Ghorashi, et al, "No-Reference Video Quality Estimation Based on Machine Learning for Passive Gaming Video Streaming Applications," IEEE Access, vol.7, pp. 74511-74527, June.2019. https://doi.org/10.1109/access.2019.2920477
- Joshua Peter Ebenezer, Zaixi Shang, Yongjun Wu, et al, "ChipQA: No-Reference Video Quality Prediction via Space-Time Chips," IEEE Transactions on Image Processing, vol.30, pp.8059-8074, Sep.2021. https://doi.org/10.1109/TIP.2021.3112055
- Raake A, Garcia M N, Moller S, et al, "TV-model: Parameter-based Prediction of IPTV Quality," in Proc. of 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, NV, USA, pp. 1149-1152, 31 March - 4 April, 2008.
- Jiarun Song, Fuzheng Yang, Yicong Zhou, et al, "Parametric Planning Model for Video Quality Evaluation of IPTV Services Combining Channel and Video Characteristics," IEEE Transactions on Multimedia, vol.19, no.5, pp.1015-1029, May.2017. https://doi.org/10.1109/TMM.2016.2638621
- Zhibo Chen, Ning Liao, Xiaodong Gu, et al, "Hybrid Distortion Ranking Tuned Bitstream-Layer Video Quality Assessment," IEEE Transactions on Circuits and Systems for Video Technology, vol.26, no.6, pp.1029-1043, June. 2016. https://doi.org/10.1109/TCSVT.2015.2441432
- Fuzheng Yang, Shuai Wan, Qingpeng Xie, et al, "No-Reference Quality Assessment for Networked Video via Primary Analysis of Bit Stream," IEEE Transactions on Circuits and Systems for Video Technology, vol.20, no.11, pp.1544-1554, Oct.2010. https://doi.org/10.1109/TCSVT.2010.2087433
- Xuelong Li, Qun Guo, Xiaoqiang Lu, "Spatiotemporal Statistics for Video Quality Assessment," IEEE Transactions on Image Processing, vol.25, no.7, pp. 3329-3342, July, 2016. https://doi.org/10.1109/TIP.2016.2568752
- Juncai Yao, Guizhong Liu, "Bitrate-Based No-Reference Video Quality Assessment Combining the Visual Perception of Video Contents," IEEE Transactions on Broadcasting, vol.65, no.3, pp.546-557, Sept. 2019. https://doi.org/10.1109/tbc.2018.2878360
- Xiaoming Tao, Yiping Duan, Mai Xu, et al, "Learning QoE of Mobile Video Transmission With Deep Neural Network: A Data-Driven Approach," IEEE Journal on Selected Areas in Communications, vol.37, no.6, pp.1337-1348, Mar.2019. https://doi.org/10.1109/jsac.2019.2904359
- Ali Al-Naji, Sang-Heon Lee, Javaan Chahl, "Quality Index Evaluation of Videos Based on Fuzzy Interface System," IET Image Processing, vol.11, no.5, pp. 292-300, Sept.2017. https://doi.org/10.1049/iet-ipr.2016.0569
- Ricky K. P. Mok, Edmond W. W. Chan, Rocky K.C. Chang, "Measuring the Quality of Experience of HTTP Video Streaming," in Proc. of 2011 12th IFIP/IEEE International Symposium on Integrated Network Management and Workshops, Dublin, Ireland, pp.485-492, May 23-27, 2011.
- Taichi Kawano, Kazuhisa Yamagishi, Keishiro Watanabe, et al, "No Reference Video Quality Assessment Model For Video Streaming Services," in Proc. of 2010 18th International Packet Video Workshop, Hong Kong, China, pp. 158-164, Dec 13-14, 2010.
- Xu Zheng, Bo Yang, Yanwen Liu, et al, "Blockiness Evaluation for Reducing Blocking Artifacts in Compressed Images," in Proc. of 2009 Digest of Technical Papers International Conference Electronics, Las Vegas, NV, USA, pp.1-2, Jan 10-14 2009.
- Huiming Guo, Weiming Zeng, Yuhu Shi, et al, "Kernel Granger Causality Based on Back Propagation Neural Network Fuzzy Inference System on fMRI Data," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol.28, no.5, pp.1049-1058, May, 2020. https://doi.org/10.1109/tnsre.2020.2984519