Search | Korea Science

A Study on the Performance Improvement of Connected Digit Telephone Speech Recognition (연속 숫자음 전화음성의 인식 성능 향상에 관한 연구)

Kim Min Sung;Jung Sung Yun;Son Jong Mok;Bae Keun Sung
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.143-146 / 2002
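For telephone speech, recognition performance degrades considerably compared with microphone speech because of the limited channel bandwidth of the telephone line and channel characteristics that change each time a call path is established. In this study, we apply channel distortion compensation techniques to improve the recognition rate of connected-digit telephone speech and compare recognition performance across the techniques through HTK-based recognition experiments. We apply CMN, RASTA, and RTCN as channel distortion compensation techniques, and present the results of recognition experiments in which the number of HMM states and mixtures is varied for each technique.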
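As an illustration of one of the compensation techniques named in this abstract, CMN (cepstral mean normalization) can be sketched as below. This is a minimal sketch, not the authors' HTK configuration: the function name, the toy feature values, and the constant channel offset are all assumptions made for the example. It relies on the standard observation that a fixed linear channel adds an approximately constant offset in the cepstral domain, so subtracting the per-utterance mean removes it.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient."""
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Toy data (hypothetical): three 2-dimensional cepstral frames.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
# A fixed channel is modeled here as a constant additive cepstral offset.
channel_offset = [0.5, -1.0]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_distorted = cepstral_mean_normalize(distorted)
```

In this toy case the normalized clean and distorted features coincide, which is the effect CMN is meant to have on a stationary channel.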

Parallel Gaussian Processes for Gait and Phase Analysis (보행 방향 및 상태 분석을 위한 병렬 가우스 과정)

Sin, Bong-Kee
- Journal of KIISE / v.42 no.6 / pp.748-754 / 2015
This paper proposes a sequential state estimation model consisting of continuous and discrete variables, as a way of generalizing the all-discrete-state factorial HMM, and presents a design of a gait motion model based on this idea. The discrete state variable implements a Markov chain that models the gait dynamics, and for each state of the Markov chain, we create a Gaussian process over the space of the continuous variable. The Markov chain controls the switching among Gaussian processes, each of which models the rotation or various views of a gait state. A particle filter-based algorithm is then presented to give an approximate filtering solution. Given an input vector sequence presented over time, it finds a trajectory that follows a Gaussian process and occasionally switches dynamically to another. Experimental results show that the proposed model can provide a very intuitive interpretation of video-based gait as a sequence of poses and a sequence of posture states.
https://doi.org/10.5626/JOK.2015.42.6.748

A Study on Gesture Recognition Using Principal Factor Analysis (주 인자 분석을 이용한 제스처 인식에 관한 연구)

Lee, Yong-Jae;Lee, Chil-Woo
- Journal of Korea Multimedia Society / v.10 no.8 / pp.981-996 / 2007
In this paper, we describe a method that recognizes gestures by extracting motion feature information from sequential gesture images with principal factor analysis. First, a two-dimensional silhouette region containing the human gesture is segmented, and geometric features are extracted from it. Principal factor analysis then selects global features that serve as meaningful key features for expressing gestures effectively. Motion history information, which represents the time variation of gestures, is obtained from the extracted features and used to construct a gesture subspace. Finally, model feature values projected into the gesture space are transformed into specific state symbols by a grouping algorithm for use as input symbols of an HMM, and the input gesture is recognized as one of the model gestures with high probability. The proposed method achieves a higher recognition rate than methods that use only shape information of the human body, as in appearance-based methods, or that extract features intuitively from complicated gestures, because it constructs gesture models from feature factors with high contribution rates identified by principal factor analysis.

Speech Recognition in the Pager System displaying Defined Sentences (문자출력 무선호출기를 위한 음성인식 시스템)

Park, Gyu-Bong;Park, Jeon-Gue;Suh, Sang-Weon;Hwang, Doo-Sung;Kim, Hyun-Bin;Han, Mun-Sung
- Annual Conference on Human and Language Technology / 1996.10a / pp.158-162 / 1996
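This paper describes a specialized speech recognition system that applies speech recognition technology to a pager capable of displaying text. In operation, once a caller connects to the speech recognition server, the server recognizes the caller's natural spoken input and outputs the result as a sentence on the called party's pager terminal. The system adopts a statistical speech recognition approach and models each word with a continuous HMM. Each model uses Gaussian mixture probability density functions and is trained with the Baum-Welch algorithm, one of the traditional HMM training methods; at recognition time, Viterbi beam search is applied to the models to obtain the best result. A 26-dimensional feature vector combining MFCC and power is extracted from each frame, and models are built for 83 domain vocabulary words and for special vocabulary items such as silence. These models are combined with an FSN that performs both syntactic and semantic functions to form a continuous speech recognition system for spontaneous speech. Beyond the above, the paper examines factors external to the system that directly affect its performance, such as the speech database and labeling, describes the various features implemented in the system, and discusses experimental results and directions for future improvement.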
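The Viterbi beam search mentioned in this abstract can be illustrated with a small log-domain sketch. It is not the authors' decoder: the function name, the two-state toy model, and the beam width are assumptions for the example, and a real system would score frames with Gaussian mixture densities and constrain transitions with the FSN rather than using fixed emission tables.

```python
import math
from typing import Dict, List

NEG_INF = -math.inf


def viterbi_beam(log_init: List[float],
                 log_trans: List[List[float]],
                 log_emit: List[List[float]],
                 beam: float) -> List[int]:
    """Return the best state path; states scoring more than `beam` below
    the current best are pruned before each transition step."""
    n_states = len(log_init)
    scores: Dict[int, float] = {
        s: log_init[s] + log_emit[0][s] for s in range(n_states)
    }
    backpointers: List[Dict[int, int]] = []
    for frame in log_emit[1:]:
        best = max(scores.values())
        # Beam pruning: keep only hypotheses close to the current best.
        active = {s: v for s, v in scores.items() if v >= best - beam}
        new_scores: Dict[int, float] = {}
        pointers: Dict[int, int] = {}
        for t in range(n_states):
            candidates = [(v + log_trans[s][t], s)
                          for s, v in active.items()
                          if log_trans[s][t] > NEG_INF]
            if not candidates:
                continue
            score, prev = max(candidates)
            new_scores[t] = score + frame[t]
            pointers[t] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(scores, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy left-to-right model (hypothetical values): state 0 then state 1.
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
log_emit = [[math.log(0.9), math.log(0.1)],
            [math.log(0.9), math.log(0.1)],
            [math.log(0.1), math.log(0.9)],
            [math.log(0.1), math.log(0.9)]]
best_path = viterbi_beam(log_init, log_trans, log_emit, beam=5.0)
```

A narrower beam reduces computation at the risk of pruning the path that would eventually win, which is the usual trade-off in beam search.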

A Study on Variation and Determination of Gaussian function Using SNR Criteria Function for Robust Speech Recognition (잡음에 강한 음성 인식에서 SNR 기준 함수를 사용한 가우시안 함수 변형 및 결정에 관한 연구)

전선도;강철호
- The Journal of the Acoustical Society of Korea / v.18 no.7 / pp.112-117 / 1999
Spectral subtraction in noise-robust speech recognition systems often causes loss of the speech signal. In this study, we propose a method in which the Gaussian function in a semi-continuous HMM (Hidden Markov Model) is varied and determined on the basis of an SNR criteria function, where SNR means the signal-to-noise ratio between the estimated noise and the subtracted signal per frame. To demonstrate the effectiveness of this method, we show through signal waveforms that the estimation error is related to the magnitude of the estimated noise. For this reason, the Gaussian function is varied and determined by SNR. In computer-simulated recognition tests under the noise environment of a car driving at over 80 km/h, the proposed SNR-based Gaussian decision method yields a higher recognition rate than both the frequency-subtracted and non-subtracted cases.

An Implementation of Speech Recognition System for Car's Control (자동차 제어용 음성 인식시스템 구현)

이광석;김현덕
- Journal of the Korea Institute of Information and Communication Engineering / v.5 no.3 / pp.451-458 / 2001
In this paper, we propose a speech control system that operates various control devices in a car using real-time control speech. The real-time speech control system detects start and end points in speech data digitized by A/D conversion and recognizes the speech with the one-pass dynamic programming method. The results are displayed on a monitor, and control data are sent to the control interfaces. The HMM is trained on continuous control speech consisting of control speech and digit speech for controlling the various devices in the car. The average recognition rate is 97.3% for word and control speech and 96.3% for digit speech.

Performance Improvement in the Multi-Model Based Speech Recognizer for Continuous Noisy Speech Recognition (연속 잡음 음성 인식을 위한 다 모델 기반 인식기의 성능 향상에 대한 연구)

Chung, Yong-Joo
- Speech Sciences / v.15 no.2 / pp.55-65 / 2008
Recently, the multi-model based speech recognizer has been used quite successfully for noisy speech recognition. For the selection of the reference HMM (hidden Markov model) which best matches the noise type and SNR (signal to noise ratio) of the input testing speech, the estimation of the SNR value using the VAD (voice activity detection) algorithm and the classification of the noise type based on the GMM (Gaussian mixture model) have been done separately in the multi-model framework. As the SNR estimation process is vulnerable to errors, we propose an efficient method which can classify simultaneously the SNR values and noise types. The KL (Kullback-Leibler) distance between the single Gaussian distributions for the noise signal during the training and testing is utilized for the classification. The recognition experiments have been done on the Aurora 2 database showing the usefulness of the model compensation method in the multi-model based speech recognizer. We could also see that further performance improvement was achievable by combining the probability density function of the MCT (multi-condition training) with that of the reference HMM compensated by the D-JA (data-driven Jacobian adaptation) in the multi-model based speech recognizer.
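The classification step described in this abstract, choosing a reference condition by the KL distance between single Gaussians, might look like the following sketch. It assumes diagonal covariances and the direction KL(test || reference); neither detail is stated in the abstract, and the condition labels and numeric values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A diagonal Gaussian as (means, variances); an assumption of this sketch.
Gaussian = Tuple[List[float], List[float]]


def kl_diag_gaussians(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for two diagonal-covariance Gaussians."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def classify_condition(test: Gaussian,
                       references: Dict[str, Gaussian]) -> str:
    """Pick the reference noise condition closest to the test noise."""
    return min(references,
               key=lambda name: kl_diag_gaussians(test, references[name]))


# Hypothetical reference noise models, one per (noise type, SNR) pair.
references = {
    "car_5dB": ([0.0, 0.0], [1.0, 1.0]),
    "babble_10dB": ([3.0, 3.0], [2.0, 2.0]),
}
# Gaussian estimated from the noise portion of a test utterance.
test_noise = ([0.2, -0.1], [1.1, 0.9])
selected = classify_condition(test_noise, references)
```

Because each label encodes both the noise type and the SNR, a single nearest-model lookup classifies the two jointly, which is the point the abstract makes against estimating them separately.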

Speaker Adaptation Using Linear Transformation Network in Speech Recognition (선형 변환망을 이용한 화자적응 음성인식)

이기희
- Journal of the Korea Society of Computer and Information / v.5 no.2 / pp.90-97 / 2000
This paper describes a speaker-adaptive speech recognition system that recognizes speech reliably for new speakers. In the proposed method, the speech spectrum of a new speaker is adapted to the reference speech spectrum using the parameters of a 1st linear transformation network placed in front of a phoneme classification neural network. The recognition system is based on a semi-continuous HMM (hidden Markov model) that uses a multilayer perceptron as a fuzzy vector quantizer. Isolated word recognition experiments are performed to measure the recognition rate of the system. With speaker adaptation, the recognition rate improves significantly over the unadapted recognition system.

Gaussian Model Optimization using Configuration Thread Control In CHMM Vocabulary Recognition (CHMM 어휘 인식에서 형상 형성 제어를 이용한 가우시안 모델 최적화)

Ahn, Chan-Shik;Oh, Sang-Yeob
- Journal of Digital Convergence / v.10 no.7 / pp.167-172 / 2012
In vocabulary recognition, an HMM (Hidden Markov Model) that models observations with a discrete probability distribution has the advantage of low computational complexity, but it has the disadvantages of a relatively low recognition rate and the need for a sophisticated smoothing process. To address these drawbacks, a CHMM (Continuous Hidden Markov Model) that uses Gaussian mixtures as a continuous probability density is proposed for optimizing the library system. This paper presents a system that applies configuration thread control to the Gaussian mixture model in order to optimize CHMM vocabulary recognition. Applying the proposed system yielded a recognition rate of 98.1% in vocabulary recognition.
https://doi.org/10.14400/JDPM.2012.10.7.167

Robust Speech Recognition Using Missing Data Theory (손실 데이터 이론을 이용한 강인한 음성 인식)

김락용;조훈영;오영환
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.56-62 / 2001
In this paper, we apply missing data theory to speech recognition. It can be used to maintain high recognizer performance when missing data occur. In general, the hidden Markov model (HMM) is used as a stochastic classifier for speech recognition tasks. In a continuous density HMM (CDHMM), acoustic events are represented by continuous probability density functions, and missing data theory has the advantage of being easily applicable to the CDHMM. A marginalization method is used to process missing data because it has low complexity and is easy to apply to automatic speech recognition (ASR). Spectral subtraction is used to detect missing data: if the difference between the energy of the speech and that of the background noise is below a given threshold, we decide that the data are missing. We propose a new method that examines the reliability of detected missing data using voicing probability. The voicing probability is used to find voiced frames, so that missing data are processed in voiced regions, which carry more redundant information than consonants. Experimental results show that our method outperforms a baseline system that uses spectral subtraction only. In a 452-word isolated word recognition experiment, the proposed method using voicing probability reduced the average word error rate by 12% in a typical noise situation.
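The detection rule and the marginalization step described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes a single diagonal-covariance Gaussian per state, for which marginalizing over missing components reduces to dropping them from the likelihood. The function names, threshold, and toy values are hypothetical, and the voicing-probability check is not modeled.

```python
import math
from typing import List


def reliability_mask(speech_energy: List[float],
                     noise_energy: List[float],
                     threshold: float) -> List[bool]:
    """Mark a component as missing (False) when the speech energy exceeds
    the noise energy by less than the threshold."""
    return [(s - n) >= threshold
            for s, n in zip(speech_energy, noise_energy)]


def log_gauss_1d(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def marginal_log_likelihood(x: List[float], reliable: List[bool],
                            means: List[float],
                            variances: List[float]) -> float:
    """Log-likelihood of the reliable components only. With a diagonal
    covariance, integrating out a missing component contributes a factor
    of 1, so it is simply skipped."""
    return sum(log_gauss_1d(xi, m, v)
               for xi, r, m, v in zip(x, reliable, means, variances) if r)


# Toy frame (hypothetical): the second component is masked by noise.
mask = reliability_mask([5.0, 1.0], [1.0, 0.8], threshold=1.0)
score = marginal_log_likelihood([0.3, 9.9], mask, [0.0, 0.0], [1.0, 1.0])
```

For a Gaussian mixture state, the same per-component skipping would be applied inside each mixture component before combining the components with their weights.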

Search Result 150, Processing Time 0.018 seconds
