Search | Korea Science

Acoustic Modeling and Energy-Based Postprocessing for Automatic Speech Segmentation (자동 음성 분할을 위한 음향 모델링 및 에너지 기반 후처리)

Park Hyeyoung;Kim Hyungsoon
- MALSORI / no.43 / pp.137-150 / 2002
Speech segmentation at the phoneme level is important for corpus-based text-to-speech synthesis. In this paper, we examine acoustic modeling methods to improve the performance of an automatic speech segmentation system based on the Hidden Markov Model (HMM). We compare monophone and triphone models and evaluate several model training approaches. In addition, we employ an energy-based postprocessing scheme to correct frequent boundary location errors between silence and speech sounds. Experimental results show that our system places 71.3% and 84.2% of boundaries correctly within tolerances of 10 ms and 20 ms, respectively.
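The tolerance-based figures quoted in this abstract can be illustrated with a minimal sketch. It assumes that reference and predicted boundaries are already aligned one-to-one; the function name `boundary_accuracy` and the sample times are hypothetical and are not taken from the paper.

```python
def boundary_accuracy(reference_ms, predicted_ms, tolerance_ms):
    """Fraction of predicted boundaries within tolerance_ms of their reference."""
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("boundary lists must be aligned one-to-one")
    if not reference_ms:
        return 0.0
    hits = sum(
        1
        for ref, hyp in zip(reference_ms, predicted_ms)
        if abs(ref - hyp) <= tolerance_ms
    )
    return hits / len(reference_ms)


# Hypothetical boundary times in milliseconds (illustrative only).
reference = [120, 310, 455, 600, 790]
predicted = [125, 298, 470, 602, 815]

acc_10 = boundary_accuracy(reference, predicted, 10)  # 2 of 5 boundaries
acc_20 = boundary_accuracy(reference, predicted, 20)  # 4 of 5 boundaries
```

Widening the tolerance can only keep or raise the score, which is consistent with the 20 ms figure in the abstract being higher than the 10 ms figure.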

Spoken Document Retrieval Based on Phone Sequence Strings Decoded by PVDHMM (PVDHMM을 이용한 음소열 기반의 SDR 응용)

Choi, Dae-Lim;Kim, Bong-Wan;Kim, Chong-Kyo;Lee, Yong-Ju
- MALSORI / no.62 / pp.133-147 / 2007
In this paper, we introduce a phone vector discrete HMM (PVDHMM) that decodes a phone sequence string, and demonstrate its applicability to spoken document retrieval. The PVDHMM treats a phone recognizer or large vocabulary continuous speech recognizer (LVCSR) as a vector quantizer whose codebook size is equal to the size of its phone set. We apply the PVDHMM to decode the phone sequence strings and compare the outputs with those of a continuous speech recognizer (CSR). We also carry out a spoken document retrieval experiment using a PVDHMM word spotter on phone sequence strings generated by a phone recognizer or an LVCSR, and compare its results with those of retrieval through a phone-based vector space model.

Performance Comparison of Feature Parameters and Classifiers for Speech/Music Discrimination (음성/음악 판별을 위한 특징 파라미터와 분류기의 성능비교)

Kim Hyung Soon;Kim Su Mi
- MALSORI / no.46 / pp.37-50 / 2003
In this paper, we evaluate and compare the performance of speech/music discrimination based on various feature parameters and classifiers. As feature parameters, we consider High Zero Crossing Rate Ratio (HZCRR), Low Short Time Energy Ratio (LSTER), Spectral Flux (SF), Line Spectral Pair (LSP) distance, entropy, and dynamism. We also examine three classifiers: k Nearest Neighbor (k-NN), Gaussian Mixture Model (GMM), and Hidden Markov Model (HMM). According to our experiments, LSP distance and the phoneme-recognizer-based feature set (entropy and dynamism) show good performance, while performance differences due to different classifiers are not significant. When all six feature parameters are employed, an average speech/music discrimination accuracy of up to 96.6% is achieved.
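Two of the listed features, HZCRR and LSTER, can be sketched as follows. The abstract does not state the thresholds, so this sketch assumes the commonly cited definitions: HZCRR is the fraction of frames whose zero crossing rate exceeds 1.5 times the window average, and LSTER is the fraction of frames whose short-time energy falls below 0.5 times the window average. The function names and the toy frames are hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def hzcrr(frames):
    """Fraction of frames with ZCR above 1.5x the window mean (assumed threshold)."""
    zcrs = [zero_crossing_rate(f) for f in frames]
    mean_zcr = sum(zcrs) / len(zcrs)
    return sum(1 for z in zcrs if z > 1.5 * mean_zcr) / len(zcrs)


def lster(frames):
    """Fraction of frames with energy below 0.5x the window mean (assumed threshold)."""
    energies = [short_time_energy(f) for f in frames]
    mean_energy = sum(energies) / len(energies)
    return sum(1 for e in energies if e < 0.5 * mean_energy) / len(energies)


# Toy analysis window of four short frames (illustrative values only).
frames = [
    [1, -1, 1, -1, 1],            # many sign changes, high energy
    [1, 1, 1, 1, 1],              # no sign changes, high energy
    [0.1, 0.1, 0.1, 0.1, 0.1],    # no sign changes, low energy
    [1, 1, -1, -1, 1],            # some sign changes, high energy
]

window_hzcrr = hzcrr(frames)
window_lster = lster(frames)
```

Both ratios tend to be higher for speech than for music, because speech alternates between voiced, unvoiced, and silent frames more often; that contrast is what makes them candidate inputs to a classifier.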

Weighted filter bank analysis and model adaptation for improving the recognition performance of partially corrupted speech (부분 손상된 음성의 인식성능 향상을 위한 가중 필터뱅크 분석 및 모델 적응)

Cho Hoon-Young;Oh Yung-Hwan
- MALSORI / no.44 / pp.157-169 / 2002
We propose a weighted filter bank analysis and model adaptation (WFBA-MA) scheme to improve the utilization of uncorrupted or less severely corrupted frequency regions for robust speech recognition. A weighted mel-frequency cepstral coefficient is obtained by weighting log filter bank energies with reliability coefficients, and hidden Markov models are also modified to reflect the local reliabilities. Experimental results on the TIDIGITS database corrupted by band-limited noises and car noise indicate that the proposed WFBA-MA scheme utilizes the uncorrupted speech information well, significantly improving recognition performance in comparison to multi-band speech recognition systems.
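The weighting step described in this abstract can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: it assumes each log filter bank energy is multiplied by a reliability coefficient in [0, 1] before the usual DCT-II that produces cepstral coefficients. How the paper estimates the reliabilities, and how the HMMs are adapted, is not given in the abstract and is not modeled here. The function name `weighted_cepstrum` and the sample values are hypothetical.

```python
import math


def weighted_cepstrum(filterbank_energies, reliabilities, num_ceps):
    """Cepstral coefficients from reliability-weighted log filter bank energies."""
    if len(filterbank_energies) != len(reliabilities):
        raise ValueError("one reliability coefficient is needed per band")
    if any(e <= 0 for e in filterbank_energies):
        raise ValueError("filter bank energies must be positive")
    num_bands = len(filterbank_energies)
    # Scale each band's log energy by its reliability (assumed weighting form).
    weighted_log = [
        w * math.log(e) for w, e in zip(reliabilities, filterbank_energies)
    ]
    # Unnormalized DCT-II over the weighted log energies.
    return [
        sum(
            value * math.cos(math.pi * k * (j + 0.5) / num_bands)
            for j, value in enumerate(weighted_log)
        )
        for k in range(num_ceps)
    ]


# Hypothetical four-band frame: the two upper bands are treated as fully corrupted.
energies = [math.e, math.e, 1000.0, 1000.0]
weights = [1.0, 1.0, 0.0, 0.0]

ceps = weighted_cepstrum(energies, weights, 2)
```

With zero weight on the two upper bands, their large energies no longer influence the coefficients, which is the intended effect of emphasizing less corrupted frequency regions.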

An Interface Method Using EMG Signals for a Rehabilitation Assistance System for the Elderly and Disabled

장영건;신철규;이은실;권장우;홍승홍
- Proceedings of the ESK Conference / 1997.04a / pp.107-113 / 1997
In this paper, an interfacing method for controlling a rehabilitation assistance system with bio-signals is proposed. Control with EMG signals has certain advantages in signal collection, but has drawbacks in the function resolution of EMG signals because the data processing is not efficient. To improve function resolution and to increase the efficiency of EMG signal interfacing with the rehabilitation assistance system, a multi-layer perceptron (MLP), which is highly effective with static signals, and a hidden Markov model, which resolves dynamic signals, are fused together. In the proposed method, the direction and average speed of the rehabilitation assistance system are controlled by trajectory control and by the estimate of the moving direction obtained from the fused model. In the experiment, the proposed GMM and 2-level MLP hybrid classifier yielded an 8.6% recognition error rate, improving function resolution. A new acceleration control method constructed with three nested linear filters produced continuous acceleration paths without information about the destination point. Thus, the mass output caused by non-continuous acceleration and deceleration was eliminated. In the simulation, the number of multiplications required was reduced by 11.54%.

Phoneme-Model Word Recognizer on RASTA-PLP (RASTA-PLP의 음소 모델 단어 인식기 적용)

허창원
- Proceedings of the Acoustical Society of Korea Conference / 1997.06a / pp.9-12 / 1997
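Most speech parameter estimation techniques are easily affected by the frequency response of the communication channel. In this paper, we extract parameters using RASTA-PLP, a method that is more robust to such steady-state spectral factors in speech, and use them as input to a continuous HMM recognizer in order to find the optimal model while training context-independent phoneme models. The goal is to find the number of re-estimation iterations and the number of mixtures that give the best performance when RASTA-PLP is applied to the ETRI 445 DB. The context-independent phoneme models are based on Korean phonetics and, with silence added, a total of 40 models are defined. Each context-independent phoneme model is trained using a typical three-state left-to-right CHMM (Continuous Hidden Markov Model). To reduce training time, Viterbi beam search is applied.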

Discriminative Training of Stochastic Segment Model Based on HMM Segmentation for Continuous Speech Recognition

Chung, Yong-Joo;Un, Chong-Kwan
- The Journal of the Acoustical Society of Korea / v.15 no.4E / pp.21-27 / 1996
In this paper, we propose a discriminative training algorithm for the stochastic segment model (SSM) in continuous speech recognition. As the SSM is usually trained by maximum likelihood estimation (MLE), a discriminative training algorithm is required to improve the recognition performance. Since the SSM does not assume the conditional independence of the observation sequence, as is done in hidden Markov models (HMMs), the search space for decoding an unknown input utterance is increased considerably. To reduce the computational complexity and the search space in an iterative training algorithm for discriminative SSMs, a hybrid architecture of SSMs and HMMs is used, in which segmentation is performed using HMMs. Given the segment boundaries, the parameters of the SSM are discriminatively trained by the minimum classification error criterion based on a generalized probabilistic descent (GPD) method. With the discriminative training of the SSM, the word error rate is reduced by 17% compared with the MLE-trained SSM in speaker-independent continuous speech recognition.

Emotion Recognition using Prosodic Feature Vector and Gaussian Mixture Model (운율 특성 벡터와 가우시안 혼합 모델을 이용한 감정인식)

Kwak, Hyun-Suk;Kim, Soo-Hyun;Kwak, Yoon-Keun
- Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2002.11a / pp.375.2-375 / 2002
This paper describes an emotion recognition algorithm using the HMM (Hidden Markov Model) method. The relation between mechanical systems and humans has so far been unilateral. This is why people do not want to become familiar with multi-service robots. If an emotion recognition function is given to the robot system, the concept of the machine will change considerably. (omitted)

Robust Action Recognition Using Multiple View Image Sequences (다중 시점 영상 시퀀스를 이용한 강인한 행동 인식)

Ahmad, Mohiuddin;Lee, Seong-Whan
- Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.509-514 / 2006
Human action recognition is an active research area in computer vision. In this paper, we present a robust method for human action recognition that combines human body shape and motion information from multi-view image sequences. Principal component analysis is used to extract the shape features of the human body, and multiple block motion of the human body is used to extract the motion features. Combining this information across multiple view sequences enhances the recognition of human actions. We represent each action using a set of hidden Markov models, and we model each action from multiple views. This enables human action recognition from arbitrary view information. Several daily actions of elderly persons are modeled and tested using this approach, and they are correctly classified, which indicates the robustness of our method.

Biped Walking Robot Control using Hand Gesture (손 제스처를 사용한 보행로봇 제어)

Seo, In-Gyo;Jang, Sang-Su;Kim, Hang-Joon
- Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.577-579 / 2005
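In this paper, we propose a method for controlling a biped walking robot using hand gestures. To detect the user's hand in a continuous input image sequence, the proposed method uses skin color information and Hue invariant moment information. The detected hand region is tracked using an Active Contour Model. To recognize hand gestures, the shape of the detected hand is determined from the Hue invariant moment information, and the result is assigned to one of a set of predefined symbols. The resulting sequence of symbols is recognized by an HMM (Hidden Markov Model) recognizer, which outputs a robot command, and the robot is controlled according to that command. To demonstrate the effectiveness of the proposed method, we applied it to a remote walking control system in which a user controls the walking of a remotely located, self-built biped robot (KAI) using six hand gestures. In the experiments, the system achieved a recognition rate of 94%.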
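The final recognition step, scoring a symbol sequence against per-gesture HMMs, can be sketched with the standard forward algorithm for discrete HMMs. This is a minimal sketch under assumptions: the abstract does not give the model topology, symbol set, or parameters, so the two toy models below (named `walk_forward` and `stop`, over two symbols) are entirely hypothetical, whereas the paper uses six gestures.

```python
def forward_likelihood(symbols, initial, transition, emission):
    """P(symbols | model) for a discrete HMM; symbols must be non-empty."""
    n_states = len(initial)
    # Initialization with the first observed symbol.
    alpha = [initial[s] * emission[s][symbols[0]] for s in range(n_states)]
    # Induction over the remaining symbols.
    for sym in symbols[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n_states))
            * emission[s][sym]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(symbols, models):
    """Return the name of the model that assigns the highest likelihood."""
    return max(
        models, key=lambda name: forward_likelihood(symbols, *models[name])
    )


# Hypothetical two-state left-to-right models: (initial, transition, emission).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    # Tends to emit symbol 0 first, then symbol 1.
    "walk_forward": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    # Tends to emit symbol 1 first, then symbol 0.
    "stop": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

command_a = recognize([0, 0, 1, 1], models)
command_b = recognize([1, 1, 0, 0], models)
```

For long sequences the raw probabilities underflow, so a practical version would work with scaled or log-domain values; that refinement is omitted here for brevity.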
