Korean Sentence Symbol Preprocess System for the Improvement of Speech Synthesis Quality

음성 합성 시스템의 품질 향상을 위한 한국어 문장 기호 전처리 시스템

  • Lee, Ho-Joon (Department of Smart IT, Youngdong University)
  • 이호준 (영동대학교 스마트IT학과)
  • Received : 2015.01.20
  • Accepted : 2015.02.11
  • Published : 2015.02.28

Abstract

In this paper, we propose a Korean sentence symbol preprocessor for a speech synthesis system that supports SSML (Speech Synthesis Markup Language), with the aim of improving the quality of the synthesized output. Based on an analysis of Korean Wikipedia documents, we define 8 categories for the meaning of sentence symbols and 11 regular expressions for classifying symbols into those categories. The resulting Korean sentence symbol preprocessing system achieved a precision of 56% and a recall of 71.45% on 63,000 sentences.

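This paper discusses a method for generating natural-sounding synthesized speech by processing Korean sentence symbols. To this end, we analyze Korean Wikipedia documents, classify the use of sentence symbols into 8 types, and present an approach that processes the symbols with 11 regular expression rules. The system achieves 56% precision and 71.45% recall on 63,000 sentences, and we propose converting the symbol processing results into an SSML-based speech synthesis representation to improve the quality of the synthesized output.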
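
As a rough illustration of the approach the abstract describes (regular expressions that classify sentence symbols, followed by conversion to SSML), the following is a minimal Python sketch. The two rules, their category names (`range`, `parenthetical`), the spoken alias, and the 200 ms pause are all hypothetical and chosen only for illustration; the paper's actual 8 categories and 11 regular expressions are not listed in the abstract. Only the standard SSML elements `<speak>`, `<sub alias="...">`, and `<break time="..."/>` are used.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rules for illustration only; these are NOT the paper's
# 8 categories or 11 regular expressions.
# Each rule: (category name, compiled pattern, function building the SSML).
RULES = [
    (
        "range",
        re.compile(r"(\d+)\s*~\s*(\d+)"),
        # Read "A~B" aloud via an alias (illustrative reading "A에서 B").
        lambda m: f'<sub alias="{m.group(1)}에서 {m.group(2)}">{m.group(0)}</sub>',
    ),
    (
        "parenthetical",
        re.compile(r"\s*\(([^()]*)\)"),
        # Drop the parentheses and set the content off with short pauses.
        lambda m: f'<break time="200ms"/>{m.group(1)}<break time="200ms"/>',
    ),
]


def classify(sentence: str) -> list[str]:
    """Return the names of all rule categories that occur in the sentence."""
    return [name for name, pattern, _ in RULES if pattern.search(sentence)]


def to_ssml(sentence: str) -> str:
    """Escape XML special characters, then rewrite matched symbols as SSML."""
    text = escape(sentence)  # escapes &, <, > before any tags are inserted
    for _name, pattern, render in RULES:
        text = pattern.sub(render, text)
    return f"<speak>{text}</speak>"


if __name__ == "__main__":
    example = "1990~1995년 (추정)"
    print(classify(example))
    print(to_ssml(example))
```

Escaping the input before applying the rules keeps any `<`, `>`, or `&` in the source sentence from being confused with the SSML tags the rules insert. A real preprocessor would also need to resolve cases where the same symbol carries different meanings depending on context, which is the classification problem the paper addresses.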
