1. Introduction
Recently, the demand for mobile virtual reality (VR) streaming services has increased significantly with the widespread deployment of commercial VR products such as head-mounted displays, 360-degree cameras, real-time stitching equipment, and headsets [1-3]. Moreover, a 4K ultra-high definition (UHD) VR live streaming service was first realized in South Korea in 2016. VR streaming is further expected to be used in the medical and defense industries in addition to broadcasting. To improve the quality of such rich multimedia streaming for consumer electronic devices in a mobile environment, various approaches aiming at the optimal adjustment of data rates have been investigated in the literature. This is important because selecting data rates that match the network conditions directly shortens the startup delay (the time required to begin a video streaming service) and reduces re-buffering. From the operators' point of view, this leads to less user abandonment and more user engagement.
One of the most popular methods is for a client to select encoding bit rates at the application level by monitoring dynamic network conditions and then feed the corresponding video data rate back to a server, which effectively prevents re-buffering [4-8]. In the case of VR streaming, because the videos cover a much wider range of view points (e.g., 360-degree views) than conventional videos, the service requires higher data rates (e.g., 4K UHD streaming). Hence, designing a bandwidth-efficient VR streaming scheme that improves the satisfaction of consumer electronic device users has been an important and challenging task.
Considering that in VR streaming only part of the entire view is displayed while the remaining parts are decoded but not displayed, various approaches have been proposed to design bandwidth-efficient VR streaming [2], [9-14]. However, most of these studies focus on video-on-demand VR streaming rather than live streaming. Consequently, they concentrate on enhancing viewport-adaptive streaming algorithms for bandwidth reduction without considering the latency-induced synchronization problem among multiple clients. Another limitation is that no detailed measurement-based evaluation on commercial networks has been reported.
This paper proposes a dynamic-tiling-based bandwidth-efficient (DTBE) VR streaming scheme. We consider multiple tiles with different encoding rates (e.g., high-efficiency video coding (HEVC) tile modes [15] for normal and high quality), such that tiles with high encoding rates are selectively assigned to the focused view, while tiles with low encoding rates are assigned to the remaining view points. To this end, we introduce a view-point tracking (VPT) message as explicit feedback for real-time view point tracking, which enables the proposed scheme to perform dynamic tiling in real time. For rapid view point adaptation, we design a novel mechanism that uses frame indexing to selectively deliver an I-frame when the view point is updated. This mechanism is useful for live VR streaming because it reduces the view adaptation delay during head movement. The contributions of this paper are summarized as follows.
- A novel DTBE VR streaming scheme is proposed, which consists of dynamic tiling and a novel rapid view adaptation mechanism. We implement the proposed scheme on a commercial VR testbed that adopts the MPEG media transport (MMT) standard with the HEVC tile mode.
- Using this VR testbed, we first evaluate the instantaneous data usage of the proposed scheme compared with the conventional HTTP live streaming (HLS) scheme. The results demonstrate that the proposed scheme reduces average data usage by almost 65.2% and outperforms the conventional scheme under good channel conditions, bad channel conditions, and nomadic cases.
- We also conduct a measurement study on view adaptation delay over multiple clients. The performance evaluation reveals that the proposed scheme reduces the view adaptation delay in live VR streaming by almost 57.7% compared with the conventional HLS scheme.
Through the commercial VR testbed, we demonstrate that the proposed scheme (i) is easily implemented on consumer electronic devices and on the server side, and (ii) improves the satisfaction of consumer electronic device users by reducing both data usage and view adaptation delay.
The rest of this paper is organized as follows. Section 2 reviews previous studies on current VR streaming technologies. Section 3 describes the proposed DTBE VR streaming scheme in detail. The measurement-based performance evaluation of the proposed scheme is provided in Section 4. Finally, Section 5 concludes this paper.
2. Related Work
As an initial study on VR streaming services, motion-prediction-based partial view point transmission was proposed in [9]. Ideally, this scheme can reduce bandwidth consumption by 80%. However, since motion prediction is subject to prediction errors, it is likely to degrade the quality of VR streaming services. Afterwards, allocating different amounts of information (i.e., different encoding rates) to different regions has been considered for VR streaming services. This concept had already been introduced and widely adopted in bandwidth-limited medical and military services as region-of-interest (ROI) video coding [16-18], which allocates disproportionately more rate to the ROI than to the background to meet limited bandwidth requirements. The work in [10] suggested smoothing the top and bottom regions of a video before encoding by exploiting the characteristics of the equirectangular mapping. Zare et al. [2] proposed HEVC-compliant tile-based streaming for VR services, aiming to reduce data usage by assigning a high encoding rate only to the focused view. Similarly, Corbillon et al. [11] proposed viewport-adaptive streaming with a cube-map projection by introducing a quality emphasized region (QER). As discussed in [11], developing VR live streaming and optimizing adaptation algorithms are still open problems. Recently, Zhou et al. [12] conducted a measurement study of 360-degree video streaming through reverse engineering. They revealed that commercial VR devices use a combination of quality level adaptation and view orientation adaptation. In particular, the authors verified that assigning more pixels to the focused view causes distortion of the spherical surface.
Nasrabadi et al. [13] adopted layered encoding for VR streaming services, in which the base layer is allocated to all view points while enhancement layers are allocated only to the focused view points. Even though the work in [13] presents an interesting concept for VR streaming, a detailed implementation and evaluation of the scheme are not provided. In a similar approach to [13], multi-view coding and scalable video coding are applied to design viewport-adaptive streaming in [14]. However, although such layered and multi-view coding approaches could provide insight into novel algorithms, their high encoding and decoding complexity hinders their application to practical systems.
To the best of our knowledge, this is the first evaluation of live VR streaming on a commercial network. In particular, we reveal that synchronization among multiple clients is a very important criterion for live VR streaming services. For instance, when there are frequent head movements, the focused view with a high encoding rate must be updated by monitoring the instantaneous view point. This update incurs a delay in adjusting the view point, which we refer to as the view adaptation delay. This delay significantly impacts the quality of live VR streaming services and can cause asynchronous VR live streaming among clients. Nevertheless, previous works have not considered reducing this delay to improve the quality of live VR streaming services, which is a limitation of those works.
3. Proposed DTBE VR Streaming Scheme and Its Implementation on Commercial Testbed
In this section, we first explain the proposed scheme as illustrated in Fig. 1. We consider that VR contents are encoded at multiple encoding levels, such as a low encoding rate \(R_l\) and a high encoding rate \(R_h\), for 4K VR contents (3840x1920). The encoded content consists of \(M\) tiles. Thus, the low and high encoding rates per tile are denoted as \(TR_l\) and \(TR_h\), respectively, where \(TR_l = R_l/M\) and \(TR_h = R_h/M\). The novel dynamic tiling method adopted in the proposed scheme assigns the high encoding rate \(TR_h\) to \(N\) tiles in the focused view, where \(N = M \cdot \eta\). Here, \(\eta\) \((0<\eta \leq 1)\) is the ratio of the number of focused-view tiles to the total number of tiles, such that \(N/M=\eta\). Furthermore, the low encoding rate \(TR_l\) is assigned to the \(H\) tiles in the remaining views, where \(H = M \cdot (1-\eta)\). Thus, the total data usage \(R_{\text{total}}\) for VR streaming with respect to \(\eta\) is given by
\(R_{\text {total}}=T R_{l} \cdot H+T R_{h} \cdot N=T R_{l} \cdot M \cdot(1-\eta)+T R_{h} \cdot M \cdot \eta\)
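Substituting the per-tile rates \(TR_l = R_l/M\) and \(TR_h = R_h/M\) into this expression makes the dependence on \(\eta\) explicit. The short derivation below follows only from the definitions above; the comparison against sending every tile at \(TR_h\) is our own reference point, added for illustration:

\[
\begin{aligned}
R_{\text{total}} &= \frac{R_l}{M}\,M(1-\eta) + \frac{R_h}{M}\,M\eta = R_l(1-\eta) + R_h\,\eta = R_l + \eta\,(R_h - R_l),\\
R_h - R_{\text{total}} &= (1-\eta)\,(R_h - R_l).
\end{aligned}
\]

Under this model the total depends only on \(\eta\), not on \(M\) itself. For example, with \(R_h = 10\) Mbps, \(R_l = 4\) Mbps, and \(\eta = 0.25\), \(R_{\text{total}} = 4 + 0.25 \times 6 = 5.5\) Mbps, a saving of \(0.75 \times 6 = 4.5\) Mbps relative to sending all tiles at the high rate.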
Fig. 2 shows a simple numerical analysis of the number of tiles and the data usage for VR streaming, where \(M = 48\), \(R_h = 10\) Mbps, and \(R_l = 4\) Mbps. The lower the value of \(\eta\), the higher the data usage reduction gain. However, a smaller focused view results in more frequent view point adaptations, so that clients must wait for the updated view point to be refreshed with \(TR_h\). We refer to this waiting time as the view adaptation delay. This phenomenon degrades the quality of VR streaming services, especially when clients move their heads quickly. Owing to this trade-off between the reduction of data usage and the quality of VR streaming, \(\eta\) should be carefully managed by the operator. Setting the optimal \(\eta\) is out of the scope of this paper, and its value depends on the operator's policy. To alleviate this effect, the proposed scheme employs a rapid view point adaptation mechanism.
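The numerical analysis in Fig. 2 can be reproduced in a few lines. The Python sketch below is illustrative only (the function name `tile_budget` is ours, not the authors'): it evaluates \(N\), \(H\), and \(R_{\text{total}}\) from the equation above for several values of \(\eta\), using \(M = 48\), \(R_h = 10\) Mbps, and \(R_l = 4\) Mbps, and it assumes that \(\eta\) must yield a whole number of focused tiles.

```python
def tile_budget(eta: float, m: int = 48, r_h: float = 10.0, r_l: float = 4.0):
    """Return (N, H, R_total in Mbps) for focus ratio eta.

    Follows R_total = TR_l * H + TR_h * N with TR_h = R_h / M and TR_l = R_l / M.
    Hypothetical helper for illustration; not the authors' code.
    """
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must satisfy 0 < eta <= 1")
    n = round(m * eta)
    if abs(n - m * eta) > 1e-9:
        # Assumption: the focused view must cover a whole number of tiles.
        raise ValueError("m * eta must be a whole number of tiles")
    h = m - n                      # equals M * (1 - eta)
    tr_h, tr_l = r_h / m, r_l / m  # per-tile rates TR_h and TR_l
    return n, h, tr_l * h + tr_h * n


for eta in (0.25, 0.5, 0.75, 1.0):
    n, h, total = tile_budget(eta)
    print(f"eta={eta:.2f}  N={n:2d}  H={h:2d}  R_total={total:.2f} Mbps")
```

For \(\eta = 0.25\) this yields \(N = 12\), \(H = 36\), and \(R_{\text{total}} = 5.5\) Mbps, which is consistent with the configuration used in Section 4.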
Fig. 1. Proposed DTBE VR streaming
Fig. 2. Total data usage, N and H with respect to \(\eta\)
The rapid view point adaptation mechanism proceeds as shown in Fig. 3. The server generates one packet per encoded frame in real time; for instance, I-, P-, and B-frames are mapped to distinct packets with different packet sequence numbers. An I-frame indexer at the server saves only the packet sequence numbers of I-frame packets. Whenever the view point is updated owing to the client's head movement, the client sends a view adaptation request to the dynamic tiling server. After receiving this request, the server refers to the I-frame indexer to search for an I-frame packet sequence number within 2 seconds of the requested time, where 2 seconds corresponds to the client's buffer size. Finally, the server finds the packet sequence number of a nearby I-frame and selectively delivers the corresponding I-frame packets to the client for rapid view adaptation. By utilizing this mechanism, the proposed scheme can send an I-frame packet as soon as possible. In contrast, the conventional scheme has no mapping between packet sequence numbers and encoded frames and therefore cannot selectively deliver an I-frame packet; it requires almost one group of pictures (GOP) duration (e.g., 2 s) for view adaptation. Owing to this mechanism, the proposed scheme takes at most 0.675 s for view adaptation, whereas the conventional scheme requires up to 1.765 s.
Fig. 3. Overall procedure of rapid view point adaptation mechanism
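The server-side I-frame lookup described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the testbed implementation: the class and method names are hypothetical, packets are assumed to be indexed in increasing timestamp order (as in live streaming), and "within 2 seconds of the requested time" is interpreted as the most recent I-frame that is no older than the 2 s client buffer when the request arrives.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IFrameIndexer:
    """Hypothetical I-frame indexer: stores sequence numbers of I-frame packets only."""

    window_s: float = 2.0  # assumed search window, equal to the client buffer size
    _times: List[float] = field(default_factory=list, init=False, repr=False)
    _seqs: List[int] = field(default_factory=list, init=False, repr=False)

    def on_packet(self, seq: int, frame_type: str, timestamp: float) -> None:
        """Called for every packet the server emits (one packet per encoded frame)."""
        if frame_type == "I":  # P- and B-frame packets are not indexed
            self._times.append(timestamp)
            self._seqs.append(seq)

    def lookup(self, request_time: float) -> Optional[int]:
        """Return the sequence number of the latest I-frame packet at or before
        request_time that is still inside the window, or None if there is none."""
        i = bisect.bisect_right(self._times, request_time) - 1
        if i >= 0 and request_time - self._times[i] <= self.window_s:
            return self._seqs[i]
        return None


# Toy stream with a 2 s GOP: (frame type, timestamp in seconds).
indexer = IFrameIndexer()
stream = [("I", 0.0), ("P", 0.5), ("B", 1.0), ("I", 2.0), ("P", 2.5)]
for seq, (ftype, ts) in enumerate(stream):
    indexer.on_packet(seq, ftype, ts)
print(indexer.lookup(2.7))  # -> 3, the I-frame packet generated at t = 2.0 s
```

Because the index holds only I-frame timestamps in sorted order, each lookup is a binary search, so the server can answer a view adaptation request without scanning the whole packet history.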
We implement the proposed scheme on a commercial VR testbed, whose configuration is illustrated in Fig. 4. Compared with legacy live streaming protocols such as HLS, MMT provides low start-up latency owing to its small buffer size, which is achieved by fragmenting media data into very small chunks and using a short signaling interval [19-20]. Further, we have customized the MMT protocol to support live VR streaming. In the testbed, a VR 360-degree camera is connected to a live stitching system. The camera records all angles of a scene, and the live stitching system then creates a 360-degree video by stitching these angles together. The process in the 4K real-time stitching server used in the proposed testbed is depicted in Fig. 5. First, to find the matching lines among the six video inputs from the 360-degree camera, the server calibrates the multiple sources; this is called the calibration process. Next, it corrects the brightness, contrast, and color of the six video sources to the same values so that they can be stitched smoothly; this is the exposure compensation process. Finally, blending and rendering processes are conducted on the GPU to stitch the multiple input sources together and render the aggregated 360-degree video content in an equirectangular/spherical format. We adopt the mobile MMT standard as the streaming protocol. The live HEVC encoder encodes the 360-degree VR contents at low and high encoding rates into HEVC streams, which are delivered to the proposed dynamic tiling server. The dynamic tiling server manipulates the multiple tiles of the encoded content such that the tiles with high encoding rates are selectively assigned to the focused view, while the tiles with low encoding rates are assigned to the remaining view points. For real-time tracking of the focused view, a VPT message is newly introduced as explicit feedback information.
Subsequently, the dynamic tiling server is informed of this VPT via an application signaling message from the client, and it manages the tile allocation accordingly. Notably, the VPT message is dynamically generated at the client, and its format is shown in Table 1. Finally, a VR player at the client consumes the 360-degree VR contents in a bandwidth-efficient manner. The proposed DTBE VR streaming scheme was applied to various use cases, such as a driving test, a mobile streaming service, and a VR live sports streaming service, as shown in Fig. 6.
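To make the tile allocation step concrete, the sketch below maps a viewing direction to focused tiles and per-tile rates. It is an assumption-laden Python illustration, not the testbed's dynamic tiling server: the contents of the VPT message (Table 1) are not reproduced here, so the view point is assumed to arrive as hypothetical yaw/pitch angles, and an equirectangular grid of 6 rows x 8 columns with a 3 x 4 focused window (M = 48, N = 12, as in Section 4) is assumed. All names are illustrative.

```python
from typing import Dict, Set

ROWS, COLS = 6, 8              # M = 48 tiles (assumed 6 rows x 8 columns)
FOCUS_ROWS, FOCUS_COLS = 3, 4  # N = 12 focused tiles, i.e., eta = 0.25
R_H, R_L = 10.0, 4.0           # full-frame high / low encoding rates in Mbps


def focused_tiles(yaw_deg: float, pitch_deg: float) -> Set[int]:
    """Map a viewing direction to the set of focused tile indices (row-major).

    Assumes an equirectangular layout: yaw in [0, 360) spans the columns and
    pitch in [-90, 90] spans the rows (top row = +90 degrees).
    """
    col_width = 360.0 / COLS
    row_height = 180.0 / ROWS
    center_col = int((yaw_deg % 360.0) // col_width)
    center_row = min(int((90.0 - pitch_deg) // row_height), ROWS - 1)
    first_col = center_col - FOCUS_COLS // 2
    # Rows are clamped at the poles; columns wrap around horizontally.
    first_row = max(0, min(center_row - FOCUS_ROWS // 2, ROWS - FOCUS_ROWS))
    return {
        r * COLS + (first_col + c) % COLS
        for r in range(first_row, first_row + FOCUS_ROWS)
        for c in range(FOCUS_COLS)
    }


def allocate_rates(yaw_deg: float, pitch_deg: float) -> Dict[int, float]:
    """Assign TR_h = R_h / M to focused tiles and TR_l = R_l / M to all others."""
    m = ROWS * COLS
    focus = focused_tiles(yaw_deg, pitch_deg)
    return {t: (R_H if t in focus else R_L) / m for t in range(m)}


rates = allocate_rates(yaw_deg=0.0, pitch_deg=0.0)
total = sum(rates.values())  # about 5.5 Mbps: 2.5 (12 focused) + 3.0 (36 others)
```

Wrapping columns but clamping rows reflects the equirectangular projection, which is continuous across the left and right edges (yaw) but not across the top and bottom (pitch). Whenever a new view point arrives, the server would recompute this allocation.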
Fig. 4. Test bed configuration of the proposed DTBE VR streaming
Fig. 5. The process in 4K Real-time stitching server in the proposed test bed
Table 1. Proposed VPT Message Format
Fig. 6. Use cases of the proposed DTBE VR streaming for (a) driving test, (b) mobile streaming service, and (c) VR live sport streaming service
4. Performance Evaluation
In this section, we evaluate the performance of the proposed DTBE VR streaming scheme in a commercial Long-Term Evolution (LTE) environment, comparing it with the conventional HLS-based VR streaming scheme. For the proposed scheme, we set the number of tiles \(M\) to 48 (= 6 x 8) and \(\eta\) to 0.25 for 4K 360-degree VR contents (3840 x 1920). Accordingly, \(N = 12\) (= 4 x 3, i.e., 1920 x 960 pixels) and \(H = 36\). We set the high bit rate \(R_h\) to 10 Mbps and the low bit rate \(R_l\) to 4 Mbps, based on commercial 4K mobile streaming data rates. The high bit rate per tile \(TR_h\) and the low bit rate per tile \(TR_l\) are then 10/48 = 0.21 Mbps and 4/48 = 0.083 Mbps, respectively. The \(N\) focused tiles thus account for 2.5 Mbps (\(= N \times TR_h = R_h \times \eta\)), and the \(H\) remaining tiles account for 3 Mbps (\(= H \times TR_l = R_l \times (1-\eta)\)). As discussed before, the lower the value of \(\eta\), the smaller the focused view, which means that \(N\) decreases and \(H\) increases. Accordingly, the service provider can achieve a larger data reduction gain by allocating high encoding rates to fewer tiles. However, such a reduced focused view degrades the quality of VR streaming services depending on the clients' head movement. Therefore, \(\eta\), which determines the portion of tiles encoded at the high rate, should be carefully managed. Optimizing the operation of VR streaming by considering the clients' head movement patterns and their relationship with the view adaptation delay is a promising direction for future research. Moreover, the proposed scheme requires additional signaling to deliver VPT messages to the server, as well as dynamic tiling servers to manage the tiles. This additional overhead and the need to carefully select \(\eta\) can be considered limitations of the proposed work.
For extensive measurement-based evaluation, we use a smartphone equipped with the proposed DTBE VR streaming scheme as a VR client. This VR client requests a 2-minute 360-degree VR content under various wireless conditions (e.g., average data rates of 15 Mbps to 80 Mbps). The proposed dynamic tiling server is implemented on a Linux CentOS 6 server (2U) with a 64GHz multi-core CPU (24-core Xeon) and a 600 GB SSD. An HEVC encoder conforming to version 1 (Main Profile, Level 5.1), the first approved version of the HEVC/H.265 standard (2013), is adopted for tiling support.
Fig. 7 illustrates the instantaneous data usage while streaming 360-degree live VR contents at the same quality of VR services, where the quality of VR services refers to the encoding rate of the view point on which the client focuses. As shown in Fig. 7(a), the proposed scheme consumes almost 6.4 Mbps on average under good channel conditions, whereas the conventional scheme consumes almost 16.5 Mbps on average because it delivers all view points at high data rates. Notably, the conventional scheme requires 16 Mbps for general 4K VR 360 streaming. Hence, the proposed scheme reduces the data rate by almost 65.2% through dynamic view point adaptation. Moreover, owing to the adoption of the mobile MMT standard with a live HEVC encoder, the data usage can be further reduced with less encoding overhead. An interesting result of this measurement evaluation is that the proposed scheme also outperforms the conventional scheme in terms of peak data usage. Specifically, the peak data rate of the proposed scheme is only 9 Mbps, whereas that of the conventional scheme is almost 70 Mbps. This indicates that the conventional scheme may easily cause a bandwidth bottleneck when serving multiple VR clients. The difference arises because MMT-based VR streaming fragments media data into very small chunks with a short signaling interval, so that it can smoothly fill the buffer with VR chunks and keep the buffer at a low level. In contrast, the conventional HLS-based scheme aggressively fills the buffer to a high level. We observe similar data usage patterns under different wireless conditions. Fig. 7(b) shows the data usage pattern under a bad channel condition. The only difference from the previous results is that the peak data usage of the conventional scheme is reduced owing to the bad channel condition, while the proposed scheme still operates stably with low peak data usage for live VR streaming.
Moreover, in Fig. 7(c), we evaluate the stability of the proposed scheme under nomadic conditions. The result demonstrates that the proposed scheme performs well even under dynamic channel conditions when the client is moving. As shown in Fig. 7(a)-(c), the conventional scheme exhibits lower instantaneous data usage than the proposed scheme after approximately 100 seconds. This is because, owing to aggressive buffer filling, the conventional scheme has already downloaded all the data for streaming by that time.
Fig. 7. Comparison of instantaneous data usages for the proposed DTBE and conventional scheme over time under (a) good channel condition (b) bad channel condition and (c) nomadic cases
Subsequently, as depicted in Fig. 8, we evaluate the view adaptation delay of multiple clients when each client moves their head independently. In this evaluation, we measured the view adaptation delay of 20 clients using the proposed DTBE VR streaming scheme and the conventional HLS-based scheme. Specifically, the maximum view adaptation delay of the proposed scheme is bounded by 0.675 s, whereas that of the conventional scheme reaches 1.765 s. The proposed scheme also outperforms the conventional scheme in terms of the minimum view adaptation delay. In terms of standard deviation, the conventional scheme exhibits a more fluctuating view adaptation delay depending on the user's head-tracking behavior (from 0.860 s to 1.765 s). Conversely, the proposed scheme has a small standard deviation, which means that its view adaptation delay is bounded within a narrow range (i.e., from 0.461 s to 0.675 s). Finally, we verify that the proposed scheme reduces the view adaptation delay by almost 57.648% compared with the conventional scheme. The results on view adaptation delay are summarized in Table 2.
Fig. 8. Comparison of view adaptation delay of multiple clients for the proposed DTBE and conventional scheme
Table 2. Summary of View Adaptation Delay
5. Conclusion
We have proposed a dynamic-tiling-based bandwidth-efficient VR streaming scheme and presented its implementation on a commercial VR testbed for bandwidth reduction and view adaptation delay analysis. The results demonstrate that the proposed scheme reduces both the average data usage and the view adaptation delay compared with the conventional scheme. In future work, the effect of such delay on the quality of VR services and its psychological impact can be measured and analyzed.
Acknowledgement
The authors would like to thank the anonymous reviewers for their valuable comments that contributed to the improved quality of this paper.
References
- K. K. Sreedhar, A. Aminlou, M. M. Hannuksela and M. Gabbouj, "Viewport-Adaptive Encoding and Streaming of 360-Degree Video for Virtual Reality Applications," in Proc. of IEEE ISM '16, San Jose, CA, USA, 2016.
- A. Zare, A. Aminlou, M. M. Hannuksela and M. Gabbouj, "HEVC compliant Tile-based Streaming of Panoramic Video for Virtual Reality Applications," in Proc. of ACM MM, Amsterdam, Netherlands, 2016, pp. 601-605.
- S. N. Yao, "Headphone-Based Immersive Audio for Virtual Reality Headsets," IEEE Trans. Consumer Electronics, vol. 63, no. 3, pp. 300-308, Aug. 2017. https://doi.org/10.1109/TCE.2017.014951
- S. Akhshabi, A. C. Begen and C. Dovrolis, "An Experimental Evaluation of Rate-Adaptation Algorithms in Adaptive Streaming over HTTP," in Proc. of ACM MMSys, San Jose, CA, USA, 2011, pp. 157-168.
- T. Y. Huang, N. Handigol, B. Heller, N. McKeown and R. Johari, "Confused, timid, and unstable: picking a video streaming rate is hard," in Proc. of ACM IMC, Boston, Massachusetts, USA, 2012, pp. 225-238.
- T. Y. Huang, R. Johari, N. McKeown, M. Trunnell and M. Watson, "A buffer-based approach to rate adaptation: evidence from a large video streaming service," in Proc. of ACM SIGCOMM, Chicago, Illinois, USA, 2014, pp. 187-198.
- Y.-M. Hsiao, C.-H. Chen, J.-F. Lee, and Y.-S. Chu, "Designing and implementing a scalable video-streaming system using an adaptive control scheme," IEEE Trans. Consumer Electronics, vol. 58, no. 4, pp. 1314-1322, Nov. 2012. https://doi.org/10.1109/TCE.2012.6415001
- J. Hwang, J. Lee, N. Choi and C. Yoo, "HAVS: Hybrid Adaptive Video Streaming for Mobile Devices," IEEE Trans. Consumer Electronics, vol. 60, no. 2, pp. 210-216, May 2014. https://doi.org/10.1109/TCE.2014.6851996
- Y. Bao, H. Wu, A. A. Ramli, B. Wang and X. Liu, "Viewing 360 Degree Videos: Motion Prediction and Bandwidth Optimization," in Proc. of IEEE ICNP, Singapore, 2016.
- M. Budagavi, J. Furton, G. Jin, A. Saxena, J. Wilkinson and A. Dickerson, "360 degrees video coding using region adaptive smoothing," in Proc. of IEEE ICIP, Quebec City, QC, Canada, 2015.
- X. Corbillon, G. Simon, A. Devlic and J. Chakareski, "Viewport-Adaptive Navigable 360-Degree Video Delivery," in Proc. of IEEE ICC, Paris, France, 2017.
- C. Zhou, Z. Li and Y. Liu, "A Measurement Study of Oculus 360 Degree Video Streaming," in Proc. of ACM MMSys, Taipei, Taiwan, 2017.
- A. T. Nasrabadi, A. Mahzari, J. D. Beshay and R. Prakash, "Adaptive 360-Degree Video Streaming using Layered Video Coding," in Proc. of IEEE VR, Los Angeles, CA, USA, 2017.
- E. Kurutepe, M. R. Civanlar and A. M. Tekalp, "Selective Streaming of Multi-View Video for Head-Tracking 3D Displays," in Proc. of IEEE ICIP, San Antonio, TX, USA, 2007.
- Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 1: MPEG media transport (MMT), Amendment 2: Enhancements for Mobile Environments, MPEG Standard, 2017.
- M. Makar, A. Mavlankar, P. Agrawal and B. Girod, "Real-time video streaming with interactive region-of-interest," in Proc. of IEEE ICIP, Hong Kong, China, 2010.
- S. Khire, S. Robertson, N. Jayant, E. A. Wood and M. E. Stachura, "Region-of-interest video coding for enabling surgical telementoring in low-bandwidth scenarios," in Proc. of IEEE MILCOM, Orlando, FL, USA, 2012.
- Y. Feng, G. Cheung, W. Tan, P. Le Callet and Y. Ji, "Low-Cost Eye Gaze Prediction System for Interactive Networked Video Streaming," IEEE Trans. Multimedia, vol. 15, no. 8, pp. 1865-1879, Dec. 2013. https://doi.org/10.1109/TMM.2013.2272918
- S. Cho, J. Lee and K. Park, "Low delayed Mobile Live Streaming method and its implementation," in Proc. of IEEE ICMEW, Turin, Italy, 2015.
- H. Jang, J. Lee, H. Choi and S. Cho, "Study on the field test result of mobile MMT trial service over LTE network at open dense area, subway and high speed train," in Proc. of IEEE ICME, Seattle, WA, USA, 2016.