• Title/Abstract/Keywords: Korean speech recognition

Comparison of HMM models and various cepstral coefficients for Korean whispered speech recognition

  • 박찬응
    • 전자공학회논문지 IE / Vol. 43, No. 2 / pp. 22-29 / 2006
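  • Motivated by the growing use of whispered speech in mobile environments, this paper evaluates feature vectors commonly used in speech recognition with hidden Markov models (HMMs) on three models: a normal-speech model, a whispered-speech model, and a combined normal-plus-whispered model. The results are analyzed to find the most suitable recognition system. The experiments show that recognizing whispered speech with the normal-speech model gives a recognition rate too low to be practical, while a separate whispered-speech model gives the highest recognition rate, above 85%. The 'normal + whispered' model has a somewhat lower recognition rate, but its feasibility was confirmed. As for feature vectors, MFCC and PLCC give nearly equally high recognition rates with the whispered-speech model, whereas with the 'normal + whispered' model PLCC gives the best whispered speech recognition results.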

Speech Recognition in Noise Environment by Independent Component Analysis and Spectral Enhancement

  • 최승호
    • 대한음성학회지:말소리 / No. 48 / pp. 81-91 / 2003
  • In this paper, we propose a speech recognition method based on independent component analysis (ICA) and spectral enhancement techniques. While ICA tries to separate the speech signal from noisy speech using multiple channels, some noise remains because of its algorithmic limitations. Spectral enhancement techniques can compensate for this shortfall in ICA's signal separation ability. In speech recognition experiments with instantaneous and convolved mixing environments, we show that the proposed approach gives much higher recognition accuracy than conventional methods.
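
The abstract does not say which spectral enhancement technique is used. As one illustration of the general idea, the sketch below applies classical magnitude spectral subtraction with a spectral floor to a single frame. The function name, the `over_subtraction` and `floor` parameters, and the sample values are all assumptions made for this example, not details from the paper.

```python
from typing import List


def spectral_subtract(noisy_mag: List[float], noise_mag: List[float],
                      over_subtraction: float = 1.0,
                      floor: float = 0.01) -> List[float]:
    """Subtract a noise magnitude estimate from one frame, bin by bin.

    Each output bin is clamped to at least `floor` times the noisy
    magnitude so that the result never becomes negative.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        cleaned = y - over_subtraction * n
        enhanced.append(max(cleaned, floor * y))
    return enhanced


# Hypothetical three-bin magnitude spectra (illustrative values only).
residual_noisy = [1.0, 0.5, 0.2]   # e.g. one ICA output frame with leftover noise
noise_estimate = [0.2, 0.2, 0.3]   # e.g. averaged over speech-free frames
enhanced_frame = spectral_subtract(residual_noisy, noise_estimate)
print(enhanced_frame)
```

In the third bin the noise estimate exceeds the observed magnitude, so the floor takes over instead of producing a negative value.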

An Efficient Model Parameter Compensation Method for Robust Speech Recognition

  • 정용주
    • 대한음성학회지:말소리 / No. 45 / pp. 107-115 / 2003
  • An efficient method that compensates the HMM parameters for noisy speech recognition is proposed. Instead of relying on analytical approximations as in parallel model combination (PMC), the proposed method directly re-estimates the HMM parameters with the segmental k-means algorithm. The proposed method showed improved results compared with the conventional PMC method at reduced computational cost.

Multi-stage Speech Recognition Using Confidence Vector

  • 전형배;황규웅;정훈;김승희;박준;이윤근
    • 대한음성학회지:말소리 / No. 63 / pp. 113-124 / 2007
  • In this paper, we propose using a confidence vector as an intermediate input feature in a multi-stage speech recognition architecture to improve recognition accuracy. The multi-stage structure was introduced to reduce the computational complexity of decoding and thereby speed up recognition. Conventional multi-stage speech recognition usually consists of three stages: acoustic search, lexical search, and acoustic re-scoring. Here we focus on improving the accuracy of lexical decoding by using a confidence vector as the input feature instead of the phoneme that is typically used. Experiments on a 220K Korean Point-of-Interest (POI) domain show that the proposed method contributes to improving accuracy.

Design of a Korean Speech Recognition Platform

  • 권오욱;김회린;유창동;김봉완;이용주
    • 대한음성학회지:말소리 / No. 51 / pp. 151-165 / 2004
  • A Korean speech recognition platform is designed for educational and research purposes. It is based on an object-oriented architecture and can be easily modified, so researchers can readily evaluate the performance of a recognition algorithm of interest. The platform will save development time for many who are interested in speech recognition. It includes the following modules: noise reduction, end-point detection, mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP)-based feature extraction, hidden Markov model (HMM)-based acoustic modeling, n-gram language modeling, n-best search, and Korean language processing. The decoder can handle both lexical search trees for large-vocabulary speech recognition and finite-state networks for small-to-medium-vocabulary speech recognition. It performs a word-dependent n-best search with a bigram language model in the first forward search stage, then extracts a word lattice and rescores each lattice path with a trigram language model in the second stage.
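
The second-stage trigram pass described above can be sketched roughly as follows. This is a simplified illustration, not the platform's code: the lattice is flattened into an explicit list of paths, and the toy language model, the `unseen` penalty, the `lm_weight` parameter, and all scores are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Trigram = Tuple[str, str, str]


def trigram_logprob(words: Sequence[str], lm: Dict[Trigram, float],
                    unseen: float = -10.0) -> float:
    """Sum trigram log-probabilities over a sentence with boundary padding.

    Trigrams missing from the toy table get a fixed `unseen` penalty
    instead of a real backoff estimate.
    """
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        total += lm.get((padded[i - 2], padded[i - 1], padded[i]), unseen)
    return total


def rescore(paths: List[Tuple[List[str], float]], lm: Dict[Trigram, float],
            lm_weight: float = 1.0) -> List[Tuple[float, List[str]]]:
    """Combine each path's acoustic score with its trigram score, best first."""
    scored = [(acoustic + lm_weight * trigram_logprob(words, lm), words)
              for words, acoustic in paths]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical first-pass output: (word sequence, acoustic log-score).
first_pass_paths = [
    (["wreck", "a", "nice", "beach"], -11.5),
    (["recognize", "speech"], -12.0),
]
# Hypothetical trigram log-probabilities.
toy_trigram_lm = {
    ("<s>", "<s>", "recognize"): -1.0,
    ("<s>", "recognize", "speech"): -0.5,
    ("recognize", "speech", "</s>"): -0.7,
}
best_score, best_words = rescore(first_pass_paths, toy_trigram_lm)[0]
print(best_words, round(best_score, 2))
```

Although the first path has the better acoustic score, the trigram model prefers the second, so the ranking flips after rescoring.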

Noise Robust Speech Recognition Based on Noisy Speech Acoustic Model Adaptation

  • 정용주
    • 말소리와 음성과학 / Vol. 6, No. 2 / pp. 29-34 / 2014
  • In Vector Taylor Series (VTS)-based noisy speech recognition methods, Hidden Markov Models (HMMs) are usually trained with clean speech. However, better performance is expected when the HMMs are trained with noisy speech. In a previous study, we found that Minimum Mean Square Error (MMSE) estimation of the training noisy speech in the log-spectrum domain produces improved recognition results, but because that algorithm operates in the log-spectrum domain, it could not be used for HMM adaptation. In this paper, we modify the previous algorithm to derive a novel mathematical relation between test and training noisy speech in the cepstrum domain, and use it to adapt the mean and covariance of the noisy speech HMM trained by Multi-condition TRaining (MTR). In noisy speech recognition experiments on the Aurora 2 database, the proposed method produced a 10.6% relative improvement in Word Error Rate (WER) over the MTR method, whereas the previous MMSE estimation of the training noisy speech produced a 4.3% relative improvement, which shows the superiority of the proposed method.
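
For readers unfamiliar with the metric, a relative WER improvement is the WER reduction divided by the baseline WER. The sketch below shows only that arithmetic. The WER values in it are hypothetical and are not the paper's results, since the abstract reports relative figures only.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent, chosen only to illustrate the formula.
baseline_wer = 20.0   # assumed baseline system
adapted_wer = 18.0    # assumed improved system
print(relative_improvement(baseline_wer, adapted_wer))  # 10.0
```

A drop from 20.0% to 18.0% is 2 points absolute but 10% relative, which is why the two kinds of figures should not be compared directly.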

Comparison of Speech Intelligibility & Performance of Speech Recognition in Real Driving Environments

  • 이광현;최대림;김영일;김봉완;이용주
    • 대한음성학회지:말소리 / No. 50 / pp. 99-110 / 2004
  • The normal transmission characteristics of sound are hard to obtain in a running car because of various noises and structural factors. The resulting channel distortion of the original source sound recorded by microphones seriously degrades speech recognition performance in real driving environments. In this paper we analyze the degree of intelligibility under various channel distortion conditions according to driving speed, using the speech transmission index (STI), and compare the STI with speech recognition rates. We examine the correlation between intelligibility measures, which depend on sound pick-up patterns, and speech recognition performance, and from this we consider the optimal location of a microphone in a single-channel environment. In the experiments we find a high correlation between STI and speech recognition rates.
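
The abstract does not state which correlation measure was used. Assuming a Pearson correlation coefficient, the comparison between STI and recognition rate could be computed as in the sketch below. The data points are invented for illustration and are not measurements from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pairs, one per microphone position and driving speed.
sti_values = [0.45, 0.52, 0.60, 0.68, 0.75]
recognition_rates = [61.0, 70.5, 78.0, 84.5, 90.0]   # percent
r = pearson(sti_values, recognition_rates)
print(f"r = {r:.3f}")
```

A value of r close to 1 would correspond to the "high correlation" the abstract reports.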

A Study on Phoneme Recognition using Neural Networks and Fuzzy logic

  • 한정현;최두일
    • 대한전기학회:학술대회논문집 / Proceedings of the 대한전기학회 1998 Summer Conference, G / pp. 2265-2267 / 1998
  • This paper studies fast speaker-adaptation speech recognition. To analyze the speech signal efficiently in the time domain and the time-frequency domain, it uses SCONN[1] together with speech signal processing suited to fast speaker adaptation. It then examines recognition performance to investigate how the system adapts when speech data are input after a speaker-dependent recognition test.

Emotion Robust Speech Recognition using Speech Transformation

  • 김원구
    • 한국지능시스템학회논문지 / Vol. 20, No. 5 / pp. 683-687 / 2010
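  • To build a speech recognition system that is robust to changes in human emotion, this paper studies frequency warping, one of several speech transformation methods. Using a speech database containing various emotions, we observed that the speech spectrum changes with emotion and that this change is one of the causes of degraded recognition performance. To reduce this variation, we propose using frequency warping in the training process, implement an emotion-robust speech recognition system with it, and compare its performance with a method based on vocal tract length normalization. In isolated word recognition experiments using HMMs, the proposed training method reduced the recognition error on emotional data compared with the conventional method.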
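
The abstract does not give the warping function or how it enters training. As a minimal sketch of what frequency warping does to a spectrum, the code below assumes a simple linear warp, where output bin k reads the input at position alpha * k with linear interpolation. The function name, the `alpha` parameter, and the sample spectrum are assumptions for this example. Practical systems often use piecewise-linear or bilinear warps instead.

```python
from typing import List, Sequence


def warp_spectrum(spectrum: Sequence[float], alpha: float) -> List[float]:
    """Linearly warp the frequency axis of a magnitude spectrum.

    Output bin k takes the linearly interpolated input value at
    position alpha * k. Positions past the last bin are clamped to it.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(spectrum)
    warped = []
    for k in range(n):
        src = min(alpha * k, n - 1)   # clamp to the last valid bin
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


# Hypothetical four-bin spectrum (illustrative values only).
toy_spectrum = [0.0, 1.0, 2.0, 3.0]
print(warp_spectrum(toy_spectrum, 0.5))   # stretches the spectrum upward
print(warp_spectrum(toy_spectrum, 2.0))   # compresses it toward low bins
```

With alpha = 1.0 the spectrum is returned unchanged, so a search over alpha near 1.0 can be seen as looking for the small shift that best matches emotional speech to the trained models.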

A Study on Noise-Robust Methods for Broadcast News Speech Recognition

  • 정용주
    • 대한음성학회지:말소리 / No. 50 / pp. 71-83 / 2004
  • Recently, broadcast news speech recognition has become one of the most attractive research areas. If we can automatically transcribe broadcast news and store its contents as text instead of as the video or audio signal itself, it becomes much easier to search multimedia databases for what we need. However, the desired speech signals in broadcast news are usually affected by interfering signals such as background noise and music. In addition, the speech of a reporter speaking over the telephone or through a poor-quality microphone is severely distorted by channel effects. Such interfered or distorted speech may be the main reason for poor performance in broadcast news speech recognition. In this paper, we investigate several methods to cope with these problems and observe some performance improvements in noisy broadcast news speech recognition.