A Study of Users' Preferred Playing Speeds on Categorized Video Content Using the WSOLA Method

An Analysis of Content-Specific Playback Speed Preferences for a Fine-Speed-Control Video Playback Service Using WSOLA

  • Kim, I-Gil (KT Institute of Convergence Technology)
  • Received : 2015.03.13
  • Accepted : 2015.04.30
  • Published : 2015.04.30

Abstract

In a fast-paced information technology environment, consumption of video content is shifting from one-way television viewing to VOD (Video on Demand) playback anywhere, anytime, on any device. Because the video is digital, this shift also gives users the ability to adjust playback speed freely. Many video players now provide a fine-speed-control function that can speed up the video to skip a boring part or slow it down to focus on an exciting scene. Audio is just as important as the picture for understanding speed-controlled video, so a number of audio-processing algorithms have been proposed to reduce the pitch distortion that arises when playback is faster or slower than normal. In this study, WSOLA (Waveform-Similarity-Based Overlap-Add), a well-known technique for prosodic modification of speech signals, is applied to analyze users' needs for fine-speed-control video playback. Based on a survey of users' preferred speeds for categorized video content and an analysis of the results, this paper proposes that a range of fine speed adjustments is needed to accommodate users' video consumption preferences.

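In today's rapidly evolving IT environment, video content is no longer limited to one-way viewing on television; it is evolving into VOD (Video on Demand) that can be watched anytime, anywhere, on a variety of devices. Because the content is digital, this change in viewing behavior gives users the additional advantage of being able to adjust the playback speed freely. Fine-speed-control playback, which lets viewers speed through dull content and slow down for interesting scenes, is now offered by many video players. For viewers to understand speed-controlled video accurately, hearing the audio is as important as seeing the picture, so fine-speed-control audio techniques that reduce the speech distortion arising at faster or slower than normal playback have been developed steadily in the speech-processing field. This paper examines one of the better-performing of these algorithms, WSOLA, and analyzes how well offering such a function during video viewing actually meets users' needs. In particular, video content is classified by the user's purpose for consuming it, playback-speed preferences are surveyed for each category, and the results are analyzed. On this basis, the paper proposes that fine-speed-control playback should offer playback speeds suited to the consumption purpose of each content type.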
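
The abstract names WSOLA but does not describe its steps. As a rough illustration only, and not the implementation used in the study, the sketch below follows the general scheme introduced by Verhelst and Roelands: output frames are laid down at a fixed hop, and each input frame is taken from near the position dictated by the speed factor, shifted within a small tolerance so that it best matches the natural continuation of the previously copied frame. The function names `hann` and `wsola`, the frame length, the tolerance, and the use of plain (unnormalized) cross-correlation as the similarity measure are all assumptions made for this example.

```python
import math


def hann(n):
    """Periodic Hann window; at 50% overlap its shifted copies sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def wsola(x, speed, frame=256, tolerance=64):
    """Time-scale samples x by `speed` (>1 plays faster) without resampling.

    frame and tolerance are illustrative defaults, not values from the paper.
    """
    if speed <= 0 or len(x) < frame:
        raise ValueError("need speed > 0 and at least one full frame")
    hop = frame // 2                      # output (synthesis) hop, 50% overlap
    win = hann(frame)
    out_len = int(len(x) / speed)
    y = [0.0] * (out_len + frame)
    prev = 0                              # input start of the previous frame
    k = 0
    while k * hop + frame <= len(y):
        start = 0
        if k > 0:
            nominal = int(round(k * hop * speed))  # where the speed says to read
            target = prev + hop                    # natural continuation of prev
            lo = max(0, nominal - tolerance)
            hi = min(len(x) - frame, nominal + tolerance)
            if lo > hi or target + frame > len(x):
                break
            # Pick the candidate most similar to the natural continuation.
            best_score = -math.inf
            for cand in range(lo, hi + 1):
                score = sum(x[cand + i] * x[target + i] for i in range(frame))
                if score > best_score:
                    best_score, start = score, cand
        # Overlap-add the windowed frame at the fixed output position.
        for i in range(frame):
            y[k * hop + i] += win[i] * x[start + i]
        prev = start
        k += 1
    return y[:out_len]


# Example: a 250 Hz tone sampled at 8 kHz, played at 1.5x speed.
tone = [math.sin(2 * math.pi * 250 * n / 8000 + 0.3) for n in range(4000)]
faster = wsola(tone, speed=1.5)
print(len(tone), len(faster))
```

The output is shorter than the input by roughly the speed factor, while the tone keeps its original period because frames are copied rather than resampled. A production implementation would typically use a normalized similarity measure and a faster search; this version favors readability.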
