A STUDY ON THE IMPLEMENTATION OF ARTIFICIAL NEURAL NET MODELS WITH FEATURE SET INPUT FOR RECOGNITION OF KOREAN PLOSIVE CONSONANTS

한국어 파열음 인식을 위한 피쳐 셋 입력 인공 신경망 모델에 관한 연구

  • Kim, Ki-Seok (Dept. of Computer Engineering, Seoul National University) ;
  • Kim, In-Bum (Dept. of Computer Engineering, Seoul National University) ;
  • Hwang, Hee-Yeung (Dept. of Computer Engineering, Seoul National University)
  • 김기석 (서울대학교 컴퓨터공학과) ;
  • 김인범 (서울대학교 컴퓨터공학과) ;
  • 황희융 (서울대학교 컴퓨터공학과)
  • Published : 1990.07.05

Abstract

The main problem in speech recognition is the enormous variability in acoustic signals caused by complex but predictable contextual effects. For plosive consonants in particular, these effects make it very difficult to find invariant cues, yet human listeners use the same contextual effects as helpful information when recognizing plosives. In this paper we experimented with three artificial neural net models for the recognition of plosive consonants. Model I used a "Multi-layer Perceptron". Model II used a variation of the "Self-organizing Feature Map Model". Model III used the "Interactive and Competitive Model" to examine contextual effects. The recognition experiment was performed on nine Korean plosive consonants. We used VCV speech chains to study contextual effects. Each speech chain consists of one of the Korean plosive consonants /g, d, b, K, T, P, k, t, p/ (/ㄱ, ㄷ, ㅂ, ㄲ, ㄸ, ㅃ, ㅋ, ㅌ, ㅍ/) and the eight Korean monophthongs. The inputs to the neural net models were cues extracted from the acoustic signals: several temporal cues (the durations of the silence, the transition, and the voice onset time (VOT)), the extent of the VC formant transitions, the presence of voicing energy during closure, burst intensity, the presence of aspiration, the amount of low-frequency energy present at voicing onset, and the extent of the CV formant transitions. The recognition rate was about 55-67% for Model I, about 60% for Model II, and about 67% for Model III.
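
As an illustration of what a "feature set input" could look like, the sketch below packs the cues named in the abstract into a nine-element vector and passes it through a small multi-layer perceptron with one output per plosive. It is only a sketch under stated assumptions, not the authors' implementation: the field names, units, scaling constants, hidden-layer size, random (untrained) weights, and the example token are all hypothetical, since the abstract does not specify them.

```python
import math
import random
from dataclasses import dataclass

# The nine plosive labels as romanized in the abstract.
PLOSIVES = ["g", "d", "b", "K", "T", "P", "k", "t", "p"]


@dataclass
class PlosiveCues:
    """Cues for one VCV token. Names, units and scaling are assumptions."""
    silence_ms: float                # duration of the closure silence
    transition_ms: float             # duration of the formant transition
    vot_ms: float                    # voice onset time
    vc_transition_extent: float      # extent of the VC formant transition
    closure_voicing: bool            # voicing energy present during closure
    burst_intensity: float           # intensity of the release burst
    aspiration: bool                 # aspiration present
    low_freq_energy_at_onset: float  # low-frequency energy at voicing onset
    cv_transition_extent: float      # extent of the CV formant transition

    def to_vector(self):
        # Crude scaling so that all inputs are of order one (assumed).
        return [
            self.silence_ms / 100.0,
            self.transition_ms / 100.0,
            self.vot_ms / 100.0,
            self.vc_transition_extent,
            1.0 if self.closure_voicing else 0.0,
            self.burst_intensity,
            1.0 if self.aspiration else 0.0,
            self.low_freq_energy_at_onset,
            self.cv_transition_extent,
        ]


def init_layer(n_in, n_out, rng):
    # One weight row per output unit; the last weight in each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs, activation):
    extended = inputs + [1.0]  # constant input that multiplies the bias
    return [activation(sum(w * x for w, x in zip(row, extended)))
            for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(cues, hidden_w, output_w):
    hidden = layer_forward(hidden_w, cues.to_vector(), sigmoid)
    scores = layer_forward(output_w, hidden, lambda z: z)
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PLOSIVES[best], probs


rng = random.Random(0)
hidden_w = init_layer(9, 12, rng)             # 12 hidden units: assumed size
output_w = init_layer(12, len(PLOSIVES), rng)

# Made-up token for illustration only; these are not measured values.
example = PlosiveCues(80.0, 40.0, 60.0, 0.3, False, 0.7, True, 0.2, 0.4)
label, probs = classify(example, hidden_w, output_w)
```

Because the weights are random, the predicted label carries no meaning; the point is only the shape of the data flow, from nine acoustic cues to nine class probabilities. A trained network would replace `init_layer` with weights learned from labelled VCV tokens.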

Keywords