• Title/Summary/Keyword: recognition-rate

The Effect of the Number of Training Data on Speech Recognition

  • Lee, Chang-Young
    • The Journal of the Acoustical Society of Korea / v.28 no.2E / pp.66-71 / 2009
  • In practical applications of speech recognition, a fundamental question is how much training data should be provided for a specific task. Although plentiful training data undoubtedly improves system performance, it comes at a heavy cost. It is therefore crucial to determine the smallest amount of training data that affords a given level of accuracy. To this end, we investigate the effect of the number of training data on speaker-independent recognition of isolated words using FVQ/HMM. The results show that the error rate is roughly inversely proportional to the number of training data and grows linearly with the vocabulary size.
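
The scaling reported in this abstract can be turned into a rough planning rule. The sketch below assumes the simple model error ≈ k · V / N, where V is the vocabulary size and N the number of training data; the constant `k` and both function names are hypothetical and would have to be fitted to measured error rates, since the abstract gives no numerical values.

```python
import math


def predicted_error_rate(n_train: int, vocab_size: int, k: float) -> float:
    """Error rate under the assumed model error = k * vocab_size / n_train."""
    return k * vocab_size / n_train


def min_training_count(target_error: float, vocab_size: int, k: float) -> int:
    """Smallest n_train whose predicted error does not exceed target_error."""
    return math.ceil(k * vocab_size / target_error)


if __name__ == "__main__":
    k = 0.5  # hypothetical constant, not taken from the paper
    print(predicted_error_rate(100, 50, k))
    print(min_training_count(0.25, 50, k))
```

Under this model, doubling the training set halves the predicted error, while doubling the vocabulary doubles it.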

A Time-Domain Parameter Extraction Method for Speech Recognition using the Local Peak-to-Peak Interval Information (국소 극대-극소점 간의 간격정보를 이용한 시간영역에서의 음성인식을 위한 파라미터 추출 방법)

  • 임재열;김형일;안수길
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.2 / pp.28-34 / 1994
  • In this paper, a new time-domain parameter extraction method for speech recognition is proposed. The method is based on the fact that the local peak-to-peak interval, i.e., the interval between maxima and minima of the speech waveform, is closely related to the frequency content of the speech signal. The parameterization is achieved by a kind of filter-bank technique in the time domain. To test the proposed method, an isolated-word recognizer based on vector quantization and a hidden Markov model was constructed. Using 22 words spoken by ten male speakers as test material, a recognition rate of 92.9% was obtained. This result indicates that the new parameter extraction method can be used in speech recognition systems. Because the method operates in the time domain, real-time parameter extraction can be implemented on a personal computer equipped only with an A/D converter, without any DSP board.
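
The idea behind the peak-to-peak interval can be illustrated with a small sketch: short intervals between successive extrema correspond to high-frequency content, long intervals to low-frequency content, so counting intervals per range acts like a crude filter bank in the time domain. This is not the authors' implementation; the function names and the bin edges are assumptions made for illustration.

```python
def extrema_indices(samples):
    """Indices of strict local maxima and minima of a sampled waveform."""
    idx = []
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            idx.append(i)
    return idx


def interval_histogram(samples, bin_edges):
    """Count extremum-to-extremum intervals falling in each [lo, hi) bin."""
    idx = extrema_indices(samples)
    intervals = [b - a for a, b in zip(idx, idx[1:])]
    counts = [0] * (len(bin_edges) - 1)
    for d in intervals:
        for k in range(len(counts)):
            if bin_edges[k] <= d < bin_edges[k + 1]:
                counts[k] += 1
                break
    return counts


if __name__ == "__main__":
    fast = [0, 1, 0, -1, 0, 1, 0, -1, 0]
    slow = [0, 1, 2, 1, 0, -1, -2, -1, 0, 1, 2, 1, 0]
    edges = [1, 3, 6]  # hypothetical bins: "short" and "long" intervals
    print(interval_histogram(fast, edges))
    print(interval_histogram(slow, edges))
```

The per-bin counts for one analysis frame would then serve as the feature vector passed to the vector quantizer.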

Speech Feature Extraction Based on the Human Hearing Model

  • Chung, Kwang-Woo;Kim, Paul;Hong, Kwang-Seok
    • Proceedings of the KSPS conference / 1996.10a / pp.435-447 / 1996
  • In this paper, we propose a method that extracts speech features using a hearing model implemented through signal processing techniques. The proposed method consists of the following steps: normalization of the short-time speech block by its maximum value, multi-resolution analysis using the discrete wavelet transform and re-synthesis using the inverse discrete wavelet transform, differentiation after analysis and synthesis, full-wave rectification, and integration. To verify the performance of the proposed speech feature in a speech recognition task, Korean digit recognition experiments were carried out using both DTW and VQ-HMM. With DTW, the recognition rates were 99.79% and 90.33% for the speaker-dependent and speaker-independent tasks, respectively; with VQ-HMM, the rates were 96.5% and 81.5%, respectively. These results indicate that the proposed speech feature has potential as a simple and efficient feature for recognition tasks.
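
The processing chain listed in this abstract can be sketched as follows. This is only a simplified illustration: it assumes a single-level Haar wavelet (the abstract does not name the wavelet or the number of levels), an even-length block, and one feature per re-synthesized band. All function names are hypothetical.

```python
import math


def haar_analysis(x):
    """One-level Haar DWT of an even-length sequence: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_synthesis(approx, detail):
    """Inverse of haar_analysis."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def band_features(block):
    """Normalize, split into two bands, then differentiate, rectify, integrate."""
    peak = max(abs(v) for v in block) or 1.0  # avoid dividing by zero on silence
    x = [v / peak for v in block]
    approx, detail = haar_analysis(x)
    zeros = [0.0] * len(approx)
    # Re-synthesize each band separately by zeroing the other coefficients.
    bands = [haar_synthesis(approx, zeros), haar_synthesis(zeros, detail)]
    features = []
    for band in bands:
        diff = [b - a for a, b in zip(band, band[1:])]  # differentiation
        features.append(sum(abs(d) for d in diff))      # rectify and integrate
    return features


if __name__ == "__main__":
    print(band_features([2, -2, 2, -2]))
    print(band_features([1, 1, 1, 1]))
```

A rapidly alternating block puts its energy into the detail band, while a constant block yields zero in both bands after differentiation.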

Face Detection and Recognition Using Ellipsoidal Information and Wavelet Packet Analysis (타원형 정보와 웨이블렛 패킷 분석을 이용한 얼굴 검출 및 인식)

  • 정명호;김은태;박민용
    • Proceedings of the IEEK Conference / 2003.07e / pp.2327-2330 / 2003
  • This paper deals with face detection and recognition using ellipsoidal information and wavelet packet analysis. We propose two methods. First, the face detection method uses general ellipsoidal information about the human face contour and locates the eye positions in wavelet-transformed face images. Second, a novel method for recognizing views of human faces under roughly constant illumination is presented; the proposed face recognition scheme is based on the analysis of a wavelet packet decomposition of the face images. Each face image is first located and then described by a subset of band-filtered images containing wavelet coefficients. From these wavelet coefficients, which characterize the face texture, the Euclidean distance is used to classify the face feature vectors into person classes. Experimental results are presented using images from the FERET and MIT FACES databases. The efficiency of the proposed approach is analyzed according to the FERET evaluation procedure and by comparing our results with those obtained using the well-known Eigenfaces method. The proposed system achieved a recognition rate of 97% (MIT data) and 95.8% (FERET database).
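
The final classification step, assigning a feature vector to a person class by Euclidean distance, can be sketched as a nearest-neighbor search. The gallery layout, labels, and function names below are assumptions for illustration; the abstract does not specify how reference vectors are stored or compared beyond the distance measure.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(feature, gallery):
    """Return the label whose reference vector is closest to `feature`.

    `gallery` maps a person label to a list of reference feature vectors.
    """
    best_label, best_dist = None, float("inf")
    for label, vectors in gallery.items():
        for ref in vectors:
            d = euclidean(feature, ref)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


if __name__ == "__main__":
    # Toy two-dimensional features; real vectors would hold wavelet statistics.
    gallery = {"person_a": [[0.0, 0.0], [0.0, 1.0]], "person_b": [[5.0, 5.0]]}
    print(classify([4.0, 4.5], gallery))
    print(classify([0.2, 0.4], gallery))
```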

Face Recognition using wavelet transform and PCA/LDA (웨이브릿 변환과 PCA/LDA를 이용한 얼굴 인식)

  • 송영준;김영길;문성원;권혁봉
    • Proceedings of the Korea Contents Association Conference / 2004.05a / pp.392-395 / 2004
  • With advances in computing, face recognition has recently attracted attention for use in security systems. Face recognition methods use either geometrical features or statistical features. The proposed method uses the k-level LL, LH, HL, and HH subband images obtained by the wavelet transform and applies PCA/LDA to these subband images. Simulation results show that the recognition rate obtained with the wavelet subband images is higher than that obtained with the full-size image.
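
To make the LL, LH, HL, and HH subbands concrete, the sketch below computes one level of a 2-D Haar decomposition on a small grayscale image. The choice of the Haar wavelet and the averaging normalization are assumptions (the abstract names neither), and the LH/HL naming convention varies between references. Applying the function again to the LL band would give the next level.

```python
def haar2d_level(image):
    """One level of a 2-D Haar transform on an even-sized grayscale image.

    Returns four half-size subbands in a dict keyed "LL", "LH", "HL", "HH".
    Here LH holds differences between rows and HL differences between columns.
    """
    names = ("LL", "LH", "HL", "HH")
    bands = {name: [] for name in names}
    for r in range(0, len(image), 2):
        row_out = {name: [] for name in names}
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            row_out["LL"].append((a + b + d + e) / 4.0)  # local average
            row_out["LH"].append((a + b - d - e) / 4.0)  # top minus bottom
            row_out["HL"].append((a - b + d - e) / 4.0)  # left minus right
            row_out["HH"].append((a - b - d + e) / 4.0)  # diagonal difference
        for name in names:
            bands[name].append(row_out[name])
    return bands


if __name__ == "__main__":
    image = [[1, 3], [1, 3]]
    for name, band in haar2d_level(image).items():
        print(name, band)
```

Each subband has a quarter of the pixels of its input, which is one reason a subband can be cheaper to feed into PCA/LDA than the full-size image.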

Development of Infants Music Education Application Using Augmented Reality

  • Yeon, Seunguk;Seo, Sukyong
    • Journal of Korea Multimedia Society / v.21 no.1 / pp.69-76 / 2018
  • Augmented Reality (AR) technology has rapidly been applied to various application areas, including e-learning and e-education. Focusing on the design and development of an Android tablet application, this study aimed to develop an infant music education application using AR technology. We used a tablet instead of a personal computer because it is more accessible and more convenient. Our system lets infant users play with teaching aids such as blocks or puzzles, turning musical play into a game. The user sets a puzzle piece on the playground in front of the tablet and presses the play button. The system then extracts a region of interest from the images acquired by the internal camera and separates the foreground from the background. The block recognition software analyzes and recognizes the block and shows the result using AR technology. To obtain a reasonable recognition rate, we ran experiments on more than 5,000 frames from actual playing scenarios. We found that a recognition rate of up to 95% can be achieved when the threshold values are chosen well for the various condition parameters.

A Two-stage Recognition Approach Based on Error Pattern Hypotheses for Connected Digit Recognition

  • Oh, Wook-Kwon;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.15 no.3E / pp.31-36 / 1996
  • In this paper, a two-stage recognition approach based on error pattern hypotheses is proposed to reduce the errors of a connected digit recognizer. In this approach, a conventional recognizer is first used to produce N-best candidate strings, and error patterns are then hypothesized by examining the candidate strings. For substitution error pattern hypotheses, error-pattern-dependent classifiers with more discriminative power than the first-stage classifier are used; for insertion and deletion errors, word duration and energy contour information are exploited to discriminate confusing pairs. Simulation results showed that the proposed approach achieves a 15% decrease in word error rate for speaker-independent Korean connected digit recognition when a hidden Markov model-based recognizer is used as the first-stage classifier.
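
The three error types named in this abstract (substitution, insertion, deletion) are the ones counted by the usual word error rate. The sketch below shows the standard edit-distance definition, assuming a non-empty reference string; it is offered as background and is not claimed to be the scoring procedure used in the paper.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / len(reference).

    Both arguments are lists of words (here, digits); `reference` must be
    non-empty. The minimum number of edits is found by dynamic programming.
    """
    n, m = len(reference), len(hypothesis)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all remaining reference words
    for j in range(m + 1):
        dist[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = dist[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[n][m] / n


if __name__ == "__main__":
    ref = "1 2 3 4".split()
    print(word_error_rate(ref, "1 2 4".split()))      # one deletion
    print(word_error_rate(ref, "1 5 3 4".split()))    # one substitution
    print(word_error_rate(ref, "1 2 3 3 4".split()))  # one insertion
```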

A Korean Flight Reservation System Using Continuous Speech Recognition

  • Choi, Jong-Ryong;Kim, Bum-Koog;Chung, Hyun-Yeol;Nakagawa, Seiichi
    • The Journal of the Acoustical Society of Korea / v.15 no.3E / pp.60-65 / 1996
  • This paper describes a Korean continuous speech recognition system for flight reservation. It adopts a frame-synchronous One-Pass DP search algorithm driven by the syntactic constraints of a context-free grammar (CFG). For recognition, 48 phoneme-like units (PLUs) were defined and used as the basic units for acoustic modeling of Korean. The modeling was carried out using an HMM technique in which each model has four states, three continuous output probability distributions, and three discrete duration distributions. Language modeling by CFG was also applied to the flight reservation task domain, which consisted of 346 words and 422 rewriting rules. In the tests, a sentence recognition rate of 62.6% was obtained after speaker adaptation.

Recognition of Handwritten Numerals using Eigenvectors (고유벡터를 이용한 필기체 숫자인식)

  • 박중조;김경민;송명현
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.6 / pp.986-991 / 2002
  • This paper presents an off-line handwritten numeral recognition method using eigenvectors. In this method, numeral features are extracted statistically using eigenvectors obtained through the KL transform, and the input numeral is recognized in the feature space by a nearest-neighbor classifier. In our feature extraction method, the basis vectors that best express the properties of each numeral type within an extensive database of sample numeral images are calculated, and the numeral features are obtained using these basis vectors. In experiments with the unconstrained handwritten numeral database of Concordia University, we achieved a recognition rate of 96.2%.

Recognition of Noise Quantity by Neural Network using Linear Predictive Coefficient (선형예측계수를 사용한 신경회로망에 의한 잡음량의 인식)

  • Choi, Jae-Seung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.379-382 / 2008
  • To reduce the amount of noise in a conversation in a noisy environment, the signal processing system must adapt its processing to the noise quantity in order to improve performance. Therefore, this paper presents a method for recognizing the noise quantity from linear predictive coefficients using a three-layered neural network, which is trained on three kinds of speech degraded by various background noises. In the experiments, the average recognition results were 97.6% or more for various noises using the Aurora2 database.
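
The linear predictive coefficients that feed the neural network can be computed per frame with the autocorrelation method and the Levinson-Durbin recursion, sketched below. The abstract does not state the prediction order or the analysis method, so both are assumptions here, and the function names are hypothetical. The frame must not be all zeros.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one signal frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def lpc(x, order):
    """Predictor coefficients a[1..order] such that x[n] ~ sum_k a[k] * x[n-k].

    Solved with the Levinson-Durbin recursion on the autocorrelation sequence.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy; zero only for a silent frame
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


if __name__ == "__main__":
    # A decaying exponential obeys x[n] = 0.5 * x[n-1], so a[1] is close to 0.5.
    frame = [0.5 ** n for n in range(50)]
    print(lpc(frame, 2))
```

The resulting coefficient vector for each frame would be the input layer of the network; the network itself is not sketched here because the abstract gives no details beyond the number of layers.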
