Towards the Generation of Language-based Sound Summaries Using Electroencephalogram Measurements

  • Received: 2019.08.19
  • Reviewed: 2019.09.24
  • Published: 2019.09.30

Abstract

This study constructed a cognitive information-processing model to explain how viewers identify the topic of a sound material and to understand the characteristics of the sound. It then established research hypotheses for generating sound summaries by incorporating the N400 and P600 components of the event-related potential (ERP), which appear across the anterior and posterior regions of the brain, into the language representation of that model. ERP experiments conducted to test the hypotheses showed that the P600 is the key component for sound summaries, being crucial in screening topic-relevant shots from topic-irrelevant shots. The results can be applied to the design of classification algorithms, which can in turn be used to generate content-based metadata such as generic or personalized media summaries (sound summaries and video skims).
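
As a rough illustration of how a late-positivity measure could feed a classification step of this kind, the following Python sketch averages each segment's amplitude over a late time window and keeps the segments that exceed a threshold. It is not the authors' algorithm. The function names, the 0.5 to 0.8 s window, the 2.0 microvolt threshold, and the toy data are all assumptions made for the example.

```python
from statistics import mean


def mean_window_amplitude(epoch, sfreq, start_s, end_s):
    """Mean amplitude of one epoch between start_s and end_s after stimulus onset.

    epoch is a list of samples for a single segment, sfreq is the sampling
    rate in Hz. Both the data layout and the units are assumptions.
    """
    lo = int(round(start_s * sfreq))
    hi = int(round(end_s * sfreq))
    window = epoch[lo:hi]
    if not window:
        raise ValueError("window falls outside the epoch")
    return mean(window)


def select_segments(epochs, sfreq, start_s=0.5, end_s=0.8, threshold_uv=2.0):
    """Return indices of segments whose late-window mean exceeds the threshold.

    The window and threshold defaults are hypothetical placeholders,
    not values reported by the study.
    """
    return [
        i
        for i, epoch in enumerate(epochs)
        if mean_window_amplitude(epoch, sfreq, start_s, end_s) > threshold_uv
    ]


# Toy data: three 1-second epochs sampled at 10 Hz (made up for illustration).
sfreq = 10
flat = [0.0] * 10
late_positive = [0.0] * 5 + [4.0] * 3 + [0.0] * 2

print(select_segments([flat, late_positive, flat], sfreq))  # prints [1]
```

In practice the window, the electrode sites, and the decision rule would have to come from the experimental findings themselves. A fixed threshold is used here only to keep the sketch short.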

Funding Information

Research project host organization: National Research Foundation of Korea
