Vocabulary Retrieve System using Improve Levenshtein Distance algorithm

Vocabulary Retrieval System Using an Improved Levenshtein Distance Algorithm

  • Lee, Jong-Sub (Dept. of General Education, Semyung University)
  • Oh, Sang-Yeob (Dept. of Computer Media Convergence, College of IT, Gachon University)
  • Received : 2013.09.12
  • Accepted : 2013.11.20
  • Published : 2013.11.28

Abstract

The conventional Levenshtein distance algorithm is applied when no order is defined among vocabulary items, so it cannot distinguish the relative importance of candidates during vocabulary retrieval. In this paper, we propose an improved Levenshtein method that retrieves vocabulary according to frequency of use and assigns weights that impose an order among vocabulary items. As a result, even as the vocabulary grows, the proposed method addresses the drop in recognition rate, shortens recognition time, and manages the search space efficiently. In our evaluation, the system achieved a vocabulary-dependent recognition rate of 97.81% and a vocabulary-independent recognition rate of 96.91% in an indoor environment, and 91.11% and 90.01%, respectively, in an outdoor environment.
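
The idea of ordering retrieval candidates by usage frequency can be illustrated with a minimal sketch. The abstract does not give the authors' weighting formula, so the ranking rule below (edit distance first, usage count as a tie-breaker) is an assumption made purely for illustration, and the names `retrieve` and `usage_counts` as well as the sample counts are hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def retrieve(query: str, usage_counts: Counter, top_k: int = 3) -> list[str]:
    """Rank vocabulary by edit distance, then by usage frequency.

    Assumption: frequency only breaks ties between equally distant words.
    The paper's actual weighting scheme may differ.
    """
    ranked = sorted(
        usage_counts,
        key=lambda word: (levenshtein(query, word), -usage_counts[word], word),
    )
    return ranked[:top_k]


# Hypothetical vocabulary with made-up usage counts.
usage_counts = Counter({"speech": 40, "speed": 5, "spec": 12, "peach": 2})
print(retrieve("speach", usage_counts))  # ['speech', 'peach', 'spec']
```

Here "speech" and "peach" are both one edit away from the query, and plain Levenshtein distance gives no reason to prefer either. The usage count supplies the missing order, which is the kind of disambiguation the abstract attributes to the proposed weights.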
