• Title/Summary/Keyword: Speech recognition model

GMM-Based Maghreb Dialect Identification System

  • Nour-Eddine, Lachachi;Abdelkader, Adla
    • Journal of Information Processing Systems
    • /
    • v.11 no.1
    • /
    • pp.22-38
    • /
    • 2015
  • While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the major communication mode for everyday life. Therefore, identifying a speaker's dialect is critical in the Arabic-speaking world for speech processing tasks, such as automatic speech recognition or identification. In this paper, we examine two approaches that reduce the Universal Background Model (UBM) in the automatic dialect identification system across the following five Arabic Maghreb dialects: Moroccan, Tunisian, and three dialects of the western (Oranian), central (Algiersian), and eastern (Constantinian) regions of Algeria. We applied our approaches to the Maghreb dialect detection domain, which contains a collection of 10-second utterances, and compared the precision obtained on the dialect samples by a baseline GMM-UBM system with that of our improved GMM-UBM system, which uses a Reduced UBM algorithm. Our experiments show that our approaches significantly improve identification performance over purely acoustic features, with an identification rate of 80.49%.

LFMMI-based acoustic modeling by using external knowledge (External knowledge를 사용한 LFMMI 기반 음향 모델링)

  • Park, Hosung;Kang, Yoseb;Lim, Minkyu;Lee, Donghyun;Oh, Junseok;Kim, Ji-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.5
    • /
    • pp.607-613
    • /
    • 2019
  • This paper proposes LF-MMI (Lattice-Free Maximum Mutual Information)-based acoustic modeling using external knowledge for speech recognition. Here, external knowledge refers to text data other than the training data used for the acoustic model. LF-MMI, an objective function for training DNNs (Deep Neural Networks), performs well in discriminative training. In LF-MMI, a phoneme probability is used as the prior probability when predicting the posterior probability of the DNN-based acoustic model. We propose using external knowledge to train the prior probability model in order to improve the DNN-based acoustic model. The proposed model achieves a relative improvement of 14% over the conventional LF-MMI-based model.

Design of Intelligent Emotion Recognition Model (지능형 감정인식 모델설계)

  • 김이곤;김서영;하종필
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.12a
    • /
    • pp.46-50
    • /
    • 2001
  • Voice is one of the most efficient communication media, and it carries several kinds of information about the speaker, the context, emotion, and so on. Human emotion is expressed in speech, gesture, and physiological phenomena (breathing, pulse rate, etc.). In this paper, a method for recognizing emotion from a person's voice signal is presented and simulated using a neuro-fuzzy model.

Phoneme-based Recognition of Korean Speech Using HMM(Hidden Markov Model) and Genetic Algorithm (HMM과 GA를 이용한 한국어 음성의 음소단위 인식)

  • 박준하;조성원
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1997.10a
    • /
    • pp.291-295
    • /
    • 1997
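  • Most speech recognition systems that are currently being developed and beginning to be commercialized are based on word recognition. Their recognition range can be extended by increasing the vocabulary, but as the number of words to be searched grows, the overall speed and performance of the system tend to degrade. To overcome this drawback, this paper implements a phoneme-level recognition system for Korean speech using an HMM (Hidden Markov Model) and a GA (Genetic Algorithm). LPC cepstrum coefficients are used as the speech features. At recognition time, the GA segments each target word into its phonemes, and HMM parameters trained at the phoneme level are applied, so that recognition is performed phoneme by phoneme.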

A New Speaker Adaptation Technique using Maximum Model Distance

  • Tahk, Min-Jea
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2001.10a
    • /
    • pp.154.2-154
    • /
    • 2001
  • This paper presents an adaptation approach based on the maximum model distance (MMD) method. The method shares the same framework as that used for training speech recognizers with abundant training data. The MMD method can adapt all the models, with or without adaptation data. If a large amount of adaptation data is available, the adapted models gradually approach the speaker-dependent ones. The approach is evaluated on a phoneme recognition task using the TIMIT corpus. In the speaker adaptation experiments, up to 65.55% phoneme error reduction is achieved. The MMD could reduce phoneme error by 16.91% even when ...

A New Speaker Adaptation Technique using Maximum Model Distance

  • Lee, Man-Hyung;Hong, Suh-Il
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2001.10a
    • /
    • pp.99.1-99
    • /
    • 2001
  • This paper presents an adaptation approach based on the maximum model distance (MMD) method. The method shares the same framework as that used for training speech recognizers with abundant training data. The MMD method can adapt all the models, with or without adaptation data. If a large amount of adaptation data is available, the adapted models gradually approach the speaker-dependent ones. The approach is evaluated on a phoneme recognition task using the TIMIT corpus. In the speaker adaptation experiments, up to 65.55% phoneme error reduction is achieved. The MMD method reduces phoneme error by 16.91% even when only one adaptation utterance is used.

Speech Recognition in Noisy environment using Transition Constrained HMM (천이 제한 HMM을 이용한 잡음 환경에서의 음성 인식)

  • Kim, Weon-Goo;Shin, Won-Ho;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.2
    • /
    • pp.85-89
    • /
    • 1996
  • In this paper, a transition constrained Hidden Markov Model (HMM), in which transitions between states occur only within prescribed time slots, is proposed, and its performance is evaluated in a noisy environment. The transition constrained HMM can explicitly limit the state durations and describe the temporal structure of the speech signal accurately, simply, and efficiently. The transition constrained HMM is not only superior to the conventional HMM but also requires much less computation time. To evaluate the performance of the transition constrained HMM, speaker-independent isolated word recognition experiments were conducted using a semi-continuous HMM with noisy speech at 20, 10, and 0 dB SNR. Experimental results show that the proposed method is robust to environmental noise. When two kinds of noise are added at 10 dB SNR, the 81.08% and 75.36% word recognition rates of the conventional HMM are increased by 7.31% and 10.35%, respectively, by using the transition constrained HMM.

Speaker Recognition Performance Improvement by Voiced/Unvoiced Classification and Heterogeneous Feature Combination (유/무성음 구분 및 이종적 특징 파라미터 결합을 이용한 화자인식 성능 개선)

  • Kang, Jihoon;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.6
    • /
    • pp.1294-1301
    • /
    • 2014
  • In this paper, separate probabilistic distribution models for voiced and unvoiced speech are estimated and used to improve speaker recognition performance. In addition to the conventional mel-frequency cepstral coefficients, skewness, kurtosis, and harmonic-to-noise ratio are extracted and used for voiced speech intervals. The two kinds of scores, for voiced and unvoiced speech, are linearly fused with the optimal weight found by exhaustive search. The performance of the proposed speaker recognizer is compared with that of a conventional recognizer that uses mel-frequency cepstral coefficients and a unified probabilistic distribution function based on the Gaussian mixture model. Experimental results show that the lower the number of Gaussian mixtures, the greater the performance improvement obtained by the proposed algorithm.
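
The linear fusion with an exhaustively searched weight mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the grid resolution, the accuracy criterion used to pick the weight, and the toy scores are all assumptions made for illustration.

```python
def fuse(voiced, unvoiced, w):
    """Linear fusion of two scores: w * voiced + (1 - w) * unvoiced."""
    return w * voiced + (1.0 - w) * unvoiced


def accuracy(trials, w):
    """Fraction of trials whose highest fused score belongs to the true speaker.

    Each trial is (voiced_scores, unvoiced_scores, true_speaker_index),
    with one score per candidate speaker (hypothetical data layout).
    """
    correct = 0
    for voiced_scores, unvoiced_scores, true_speaker in trials:
        fused = [fuse(v, u, w) for v, u in zip(voiced_scores, unvoiced_scores)]
        if fused.index(max(fused)) == true_speaker:
            correct += 1
    return correct / len(trials)


def best_weight(trials, steps=100):
    """Exhaustive search over an evenly spaced grid of weights in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: accuracy(trials, w))


# Toy development trials with two candidate speakers (made-up scores).
trials = [
    ([-10.0, -12.0], [-8.0, -5.0], 0),
    ([-9.0, -7.0], [-2.0, -20.0], 0),
]
w_opt = best_weight(trials)
```

In this toy set neither score stream alone classifies both trials correctly, while an intermediate weight does, which is the situation in which fusion helps.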

Performance Improvement in Speech Recognition by Weighting HMM Likelihood (은닉 마코프 모델 확률 보정을 이용한 음성 인식 성능 향상)

  • 권태희;고한석
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.2
    • /
    • pp.145-152
    • /
    • 2003
  • In this paper, assuming that the score of a speech utterance is the product of the HMM log likelihood and the HMM weight, we propose a new method in which HMM weights are adapted iteratively, as in general MCE (minimum classification error) training. The proposed method adjusts the HMM weights for better performance using a delta coefficient defined in terms of the misclassification measure. The parameter estimation and Viterbi algorithms of the conventional HMM can therefore be easily applied to the proposed model by constraining the sum of the HMM weights to the number of HMMs in an HMM set. Compared with the general segmental MCE training approach, computing time decreases because fewer parameters are estimated and gradient calculation through the optimal state sequence is avoided. To evaluate the performance of an HMM-based speech recognizer with weighted HMM likelihoods, we perform Korean isolated digit recognition experiments. The experimental results show better performance than the MCE algorithm with state weighting.
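
The scoring rule stated in this abstract (score equals HMM weight times HMM log likelihood, with the weights summing to the number of HMMs) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the numbers are hypothetical, and the iterative MCE-style weight update itself is not shown because the abstract does not specify it.

```python
def normalize_weights(weights):
    """Rescale weights so that they sum to the number of HMMs in the set."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


def recognize(log_likelihoods, weights):
    """Return the index of the HMM with the highest weighted score.

    The score of each model is weight * log likelihood. Log likelihoods are
    negative, so a larger weight lowers that model's score.
    """
    scores = [w * ll for w, ll in zip(weights, log_likelihoods)]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up log likelihoods of one utterance under three word HMMs.
log_likelihoods = [-100.0, -98.0, -120.0]

uniform = normalize_weights([1.0, 1.0, 1.0])   # same as an unweighted recognizer
adapted = normalize_weights([0.9, 1.05, 1.05])  # hypothetical adapted weights

unweighted_choice = recognize(log_likelihoods, uniform)
weighted_choice = recognize(log_likelihoods, adapted)
```

With uniform weights the rule reduces to ordinary maximum-likelihood selection, so the weighting only changes decisions where the adapted weights differ across models.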

Multi-layer Speech Processing System for Point-Of-Interest Recognition in the Car Navigation System (차량용 항법장치에서의 관심지 인식을 위한 다단계 음성 처리 시스템)

  • Bhang, Ki-Duck;Kang, Chul-Ho
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.1
    • /
    • pp.16-25
    • /
    • 2009
  • In the car environment, where safety is the first priority, a large-vocabulary isolated word recognition system for the POI domain is required as the optimal HMI technique. For a telematics terminal with highly limited processing time and memory capacity, it is impossible to process more than 100,000 words in the terminal with general speech recognition methods. Therefore, we propose a phoneme recognizer using a phonetic GMM, together with the PDM Levenshtein distance in a multi-layer architecture, for POI recognition on the telematics terminal. With the proposed methods, we obtained high performance on a telematics terminal with low processing speed and small memory capacity: a maximum recognition rate of 94.8% in an indoor environment and 92.4% in the car navigation environment.
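
To illustrate how a recognized phoneme sequence can be matched against a POI lexicon by edit distance, here is a minimal Python sketch. It uses the plain Levenshtein distance with unit costs, not the PDM variant named in the abstract, whose details are not given here. The lexicon entries, phoneme symbols, and function names are hypothetical.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_entry(recognized, lexicon):
    """Return the lexicon key whose phoneme sequence is closest to the input."""
    return min(lexicon, key=lambda name: levenshtein(recognized, lexicon[name]))


# Hypothetical POI lexicon: name -> phoneme sequence.
lexicon = {
    "seoul station": ["s", "eo", "u", "l", "y", "eo", "k"],
    "city hall": ["s", "i", "ch", "eo", "ng"],
}

# Recognizer output with one substituted phoneme ("r" instead of "l").
recognized = ["s", "eo", "u", "r", "y", "eo", "k"]
match = nearest_entry(recognized, lexicon)
```

A linear scan like this is only practical for small lexicons; the abstract's multi-layer architecture presumably exists to avoid scanning more than 100,000 entries this way.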
