• Title/Summary/Keyword: connected digit recognition

Search Result 48, Processing Time 0.024 seconds

Comparison of the recognition performance of Korean connected digit telephone speech depending on channel compensation methods and feature parameters (채널보상기법 및 특징파라미터에 따른 한국어 연속숫자음 전화음성의 인식성능 비교)

  • Jung Sung Yun;Kim Min Sung;Son Jong Mok;Bae Keun Sung;Kim Sang Hun
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.201-204
    • /
    • 2002
  • As a preliminary study for improving recognition performance of the connected digit telephone speech, we investigate feature parameters as well as channel compensation methods of telephone speech. The CMN and RTCN are examined for telephone channel compensation, and the MFCC, DWFBA, SSC and their delta-features are examined as feature parameters. Recognition experiments with database we collected show that in feature level DWFBA is better than MFCC and for channel compensation RTCN is better than CMN. The DWFBA+Delta_ Mel-SSC feature shows the highest recognition rate.

  • PDF

The Optimal and Complete Prompts Lists Generation Algorithm for Connected Spoken Word Speech Corpus (연결 단어 음성 인식기 학습용 음성DB 녹음을 위한 최적의 대본 작성 알고리즘)

  • 유하진
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.2
    • /
    • pp.187-191
    • /
    • 2004
  • This paper describes an efficient algorithm to generate compact and complete prompts lists for connected spoken words speech corpus. In building a connected spoken digit recognizer, we have to acquire speech data in various contexts. However, in many speech databases the lists are made by using random generators. We provide an efficient algorithm that can generate compact and complete lists of digits in various contexts. This paper includes the proof of optimality and completeness of the algorithm.

7-Segment Optical Character Recognition Using Template Matching (템플릿 매칭을 이용한 7-세그먼트 광학 문자 인식)

  • Jung, Min Chul
    • Journal of the Semiconductor & Display Technology
    • /
    • v.19 no.4
    • /
    • pp.130-134
    • /
    • 2020
  • This paper proposes a new method for the digit recognition on a 7-segment display. The proposed method uses morphological processing that dilates segments of digits and connects them into strokes. The digits are extracted by connected component analysis and finally, template matching method recognizes the extracted digits. The proposed method is implemented using C language in Raspberry Pi 4 system with a camera module for a real-time image processing. Experiments were conducted by using various 7-segment LED displays and 7-segment mono LCD displays. The results show that the proposed method is successful for the digit recognition on the 7-segment displays.

A Study on Korean 4-connected Digit Recognition Using Demi-syllable Context-dependent Models (반음절 문맥종속 모델을 이용한 한국어 4 연숫자음 인식에 관한 연구)

  • 이기영;최성호;이호영;배명진
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.3
    • /
    • pp.175-181
    • /
    • 2003
  • Because a word of Korean digits is a syllable and deeply coarticulatied in connected digits, some recognition models based on demisyllables have been proposed by researchers. However, they could not show an excellent recognition results yet. This paper proposes a recognition model based on extended and context-dependent demisyllables, such as a tri-demisyllable like a tri-phone, for the Korean 4-connected digits recognition. For experiments, we use a toolkit of HTK 3.0 for building this model of continuous HMMs using training Korean connected digits from SiTEC database and for recognizing unknown ones. The results show that the recognition rate is 92% and this model has an ability to improve the recognition performance of Korean connected digits.

Recognition experiment of Korean connected digit telephone speech using the temporal filter based on training speech data (훈련데이터 기반의 temporal filter를 적용한 한국어 4연숫자 전화음성의 인식실험)

  • Jung Sung Yun;Kim Min Sung;Son Jong Mok;Bae Keun Sung;Kang Jeom Ja
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.149-152
    • /
    • 2003
  • In this paper, data-driven temporal filter methods[1] are investigated for robust feature extraction. A principal component analysis technique is applied to the time trajectories of feature sequences of training speech data to get appropriate temporal filters. We did recognition experiments on the Korean connected digit telephone speech database released by SITEC, with data-driven temporal filters. Experimental results are discussed with our findings.

  • PDF

A study on the recognition performance of connected digit telephone speech for MFCC feature parameters obtained from the filter bank adapted to training speech database (훈련음성 데이터에 적응시킨 필터뱅크 기반의 MFCC 특징파라미터를 이용한 전화음성 연속숫자음의 인식성능 향상에 관한 연구)

  • Jung Sung Yun;Kim Min Sung;Son Jong Mok;Bae Keun Sung;Kang Jeom Ja
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.119-122
    • /
    • 2003
  • In general, triangular shape filters are used in the filter bank when we get the MFCCs from the spectrum of speech signal. In [1], a new feature extraction approach is proposed, which uses specific filter shapes in the filter bank that are obtained from the spectrum of training speech data. In this approach, principal component analysis technique is applied to the spectrum of the training data to get the filter coefficients. In this paper, we carry out speech recognition experiments, using the new approach given in [1], for a large amount of telephone speech data, that is, the telephone speech database of Korean connected digit released by SITEC. Experimental results are discussed with our findings.

  • PDF

A study on Effective Feature Parameters Comparison for Speaker Recognition (화자인식에 효과적인 특징벡터에 관한 비교연구)

  • Park TaeSun;Kim Sang-Jin;Kwang Moon;Hahn Minsoo
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.145-148
    • /
    • 2003
  • In this paper, we carried out comparative study about various feature parameters for the effective speaker recognition such as LPC, LPCC, MFCC, Log Area Ratio, Reflection Coefficients, Inverse Sine, and Delta Parameter. We also adopted cepstral liftering and cepstral mean subtraction methods to check their usefulness. Our recognition system is HMM based one with 4 connected-Korean-digit speech database. Various experimental results will help to select the most effective parameter for speaker recognition.

  • PDF

A study on the connected-digit recognition using MLP-VQ and Weighted DHMM (MLP-VQ와 가중 DHMM을 이용한 연결 숫자음 인식에 관한 연구)

  • Chung, Kwang-Woo;Hong, Kwang-Seok
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.8
    • /
    • pp.96-105
    • /
    • 1998
  • The aim of this paper is to propose the method of WDHMM(Weighted DHMM), using the MLP-VQ for the improvement of speaker-independent connect-digit recognition system. MLP neural-network output distribution shows a probability distribution that presents the degree of similarity between each pattern by the non-linear mapping among the input patterns and learning patterns. MLP-VQ is proposed in this paper. It generates codewords by using the output node index which can reach the highest level within MLP neural-network output distribution. Different from the old VQ, the true characteristics of this new MLP-VQ lie in that the degree of similarity between present input patterns and each learned class pattern could be reflected for the recognition model. WDHMM is also proposed. It can use the MLP neural-network output distribution as the way of weighing the symbol generation probability of DHMMs. This newly-suggested method could shorten the time of HMM parameter estimation and recognition. The reason is that it is not necessary to regard symbol generation probability as multi-dimensional normal distribution, as opposed to the old SCHMM. This could also improve the recognition ability by 14.7% higher than DHMM, owing to the increase of small caculation amount. Because it can reflect phone class relations to the recognition model. The result of my research shows that speaker-independent connected-digit recognition, using MLP-VQ and WDHMM, is 84.22%.

  • PDF

ON IMPROVING THE PERFORMANCE OF CODED SPECTRAL PARAMETERS FOR SPEECH RECOGNITION

  • Choi, Seung-Ho;Kim, Hong-Kook;Lee, Hwang-Soo
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.08a
    • /
    • pp.250-253
    • /
    • 1998
  • In digital communicatioin networks, speech recognition systems conventionally reconstruct speech followed by extracting feature [parameters. In this paper, we consider a useful approach by incorporating speech coding parameters into the speech recognizer. Most speech coders employed in the networks represent line spectral pairs as spectral parameters. In order to improve the recognition performance of the LSP-based speech recognizer, we introduce two different ways: one is to devise weighed distance measures of LSPs and the other is to transform LSPs into a new feature set, named a pseudo-cepstrum. Experiments on speaker-independent connected-digit recognition showed that the weighted distance measures significantly improved the recognition accuracy than the unweighted one of LSPs. Especially we could obtain more improved performance by using PCEP. Compared to the conventional methods employing mel-frequency cepstral coefficients, the proposed methods achieved higher performance in recognition accuracies.

  • PDF

A study of broad board classification of korean digits using symbol processing (심볼을 이용한 한국어 숫자음의 광역 음소군 분류에 관한 연구)

  • Lee, Bong-Gu;Lee, Guk;Hhwang, Hee-Yoong
    • Proceedings of the KIEE Conference
    • /
    • 1989.07a
    • /
    • pp.481-485
    • /
    • 1989
  • The object of this parer is on the design of an broad board classifier for connected. Korean digit. Many approaches have been applied in speech recognition systems: parametric vector quantization, dynamic programming and hiden Markov model. In the 80's the neural network method, which is expected to solve complex speech recognition problems, came bach. We have chosen the rule based system for our model. The phoneme-groups that we wish to classify are vowel_like, plosive_like fricative_like, and stop_like.The data used are 1380 connected digits spoken by three untrained male speakers. We have seen 91.5% classification rate.

  • PDF