Speech Recognition and Its Learning by Neural Networks

신경회로망을 이용한 음성인식과 그 학습

  • Published : 1991.04.01

Abstract

A neural-network-based speech recognition system for telephone number services was tested. Because two cardinal number systems are in use in Korea, a native Korean one and a Sino-Korean one, the system must recognize 22 discrete words. Two network structures were tested: a two-layer network, and a three-layer network with one hidden layer of 11, 22, or 44 hidden units. The back-propagation (BP) algorithm was applied during the learning phase. Learning can be influenced by the choice of learning factor and by the learning method (for instance, random or cyclic). The best speaker-independent recognition rate obtained with the two-layer network was 96%. Overtraining lowered the recognition rate, and this phenomenon appeared more clearly with the three-layer network. This paper describes these phenomena in more detail, examining in particular the influence of the network structure and the successive states of the learning phase.

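This paper discusses the results of speech recognition experiments using neural networks, and the phenomena observed during learning, for the 22 spoken number words from zero (영) to 일 in the two number systems used in telephone number services (native Korean counting numbers and Sino-Korean reading numbers). Two networks were used: a two-layer structure with only an input layer and an output layer, and a three-layer structure with one hidden layer. The number of hidden-layer neurons was varied among 11, 22, and 44, and the networks were trained with the BP (back-propagation) algorithm. During training, the learning factor, the learning method (for example, random or cycle), and the momentum were adjusted in search of the best learning process. In the experiments, the two-layer network achieved a maximum recognition rate of 96% in the speaker-independent case. Excessive training actually lowered the recognition rate, and this effect was more pronounced in the three-layer network.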
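
The abstract names back-propagation with a learning factor, momentum, and random or cyclic learning, but gives no implementation details. The following is only a minimal sketch of how those pieces might fit together, assuming sigmoid units, a squared-error criterion, and per-sample weight updates (none of which the abstract confirms). The names `train` and `predict`, the default parameter values, and the reading of "random or cycle" as the order in which samples are presented are hypothetical choices made for illustration. In the paper's terminology, `n_hidden=None` corresponds to the two-layer network and an integer `n_hidden` to the three-layer network.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    # One weight row per output unit; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layers, x):
    # Returns the activations of every layer, starting with the input itself.
    acts = [list(x)]
    for w in layers:
        inp = acts[-1] + [1.0]  # constant input for the bias weight
        acts.append([sigmoid(sum(wi * xi for wi, xi in zip(row, inp))) for row in w])
    return acts


def train(samples, n_in, n_out, n_hidden=None, rate=0.5, momentum=0.5,
          order="random", epochs=200, seed=0):
    # Assumed setup: sigmoid units, squared error, one update per sample.
    rng = random.Random(seed)
    sizes = [n_in] + ([n_hidden] if n_hidden else []) + [n_out]
    layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    prev = [[[0.0] * len(row) for row in w] for w in layers]  # previous updates
    for _ in range(epochs):
        idx = list(range(len(samples)))
        if order == "random":  # "cycle" keeps the fixed presentation order
            rng.shuffle(idx)
        for i in idx:
            x, target = samples[i]
            acts = forward(layers, x)
            # Error terms of the output units.
            delta = [(t - o) * o * (1 - o) for t, o in zip(target, acts[-1])]
            for k in range(len(layers) - 1, -1, -1):
                inp = acts[k] + [1.0]
                if k > 0:
                    # Back-propagate the error before this layer's weights change.
                    below = [acts[k][j] * (1 - acts[k][j]) *
                             sum(delta[u] * layers[k][u][j] for u in range(len(delta)))
                             for j in range(len(acts[k]))]
                for u, d in enumerate(delta):
                    for j, xj in enumerate(inp):
                        # Gradient step plus a momentum share of the last step.
                        step = rate * d * xj + momentum * prev[k][u][j]
                        layers[k][u][j] += step
                        prev[k][u][j] = step
                if k > 0:
                    delta = below
    return layers


def predict(layers, x):
    out = forward(layers, x)[-1]
    return out.index(max(out))
```

A call such as `train(samples, n_in=4, n_out=4, order="cycle")` trains the two-layer variant on a list of `(input, target)` pairs. Tracking the recognition rate on held-out speakers after each epoch would be one way to observe the overtraining effect the abstract describes.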

Keywords