Search | Korea Science

Frame Reliability Weighting for Robust Speech Recognition (프레임 신뢰도 가중에 의한 강인한 음성인식)

조훈영;김락용;오영환
- The Journal of the Acoustical Society of Korea
- /
- v.21 no.3
- /
- pp.323-329
- /
- 2002
This paper proposes a frame reliability weighting method to compensate for a time-selective noise that occurs at random positions of speech signal contaminating certain parts of the speech signal. Speech frames have different degrees of reliability and the reliability is proportional to SNR (signal-to noise ratio). While it is feasible to estimate frame Sl? by using the noise information from non-speech interval under a stationary noisy situation, it is difficult to obtain noise spectrum for a time-selective noise. Therefore, we used statistical models of clean speech for the estimation of the frame reliability. The proposed MFR (model-based frame reliability) approximates frame SNR values using filterbank energy vectors that are obtained by the inverse transformation of input MFCC (mal-frequency cepstral coefficient) vectors and mean vectors of a reference model. Experiments on various burnt noises revealed that the proposed method could represent the frame reliability effectively. We could improve the recognition performance by using MFR values as weighting factors at the likelihood calculation step.
PDF KSCI

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

Shekofteh, Yasser;Almasganj, Farshad
- ETRI Journal
- /
- v.35 no.1
- /
- pp.100-108
- /
- 2013
In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.
https://doi.org/10.4218/etrij.13.0112.0074 인용 PDF KSCI

Content-based music retrieval using temporal characteristics (Temporal 특성을 이용한 내용기반 음악 정보 검색)

Park Chuleui;Park Mansoo;Kim Sungtak;Kim Hoi-Rin;Kang Kyeongok
- Proceedings of the Acoustical Society of Korea Conference
- /
- autumn
- /
- pp.299-302
- /
- 2004
본 논문에서는 내용 기반 음악 정보 검색에 음악의 temporal 특징을 이용한 검색 방법을 제안한다. 방송환경에 적용하기 위해 검색 범위를 드라마나 영화의 배경 음악으로 사용되는 OST 앨범으로 제한하였다. 오디오의 특징 벡터로써 UFCC(Mel Frequency Cepstral Coefficient)를 사용하였으며 이 특징 벡터를 이용하여 VQ(Vector Quantization)로 부호화한 codeword로 오디오 신호의 시변 특성을 표현한다. 본 논문에서는 제안한 음악의 temporal 특성을 반영한 codeword-sequence를 이용하는 방법을 pitch-histogram을 기반으로 하는 방법 및 MFCC codeword-histogram을 기반으로 하는 방법과 비교하고 성능 개선을 보여주었다.
PDF

Identification of Underwater Ambient Noise Sources Using MFCC (MFCC를 이용한 수중소음원의 식별)

Hwang, Do-Jin;Kim, Jea-Soo
- Proceedings of the Korea Committee for Ocean Resources and Engineering Conference
- /
- 2006.11a
- /
- pp.307-310
- /
- 2006
Underwater ambient noise originating from the geophysical, biological, and man-made acoustic sources contains much information on the sources and the ocean environment affecting the performance of the sonar equipments. In this paper, a set of feature vectors of the ambient noises using MFCC is proposed and extracted to form a data base for the purpose of identifying the noise sources. The developed algorithm for the pattern recognition is applied to the observed ocean data, and the initial results are presented and discussed.
PDF

Implementation of Hidden Markov Model based Speech Recognition System for Teaching Autonomous Mobile Robot (자율이동로봇의 명령 교시를 위한 HMM 기반 음성인식시스템의 구현)

조현수;박민규;이민철
- 제어로봇시스템학회:학술대회논문집
- /
- 2000.10a
- /
- pp.281-281
- /
- 2000
This paper presents an implementation of speech recognition system for teaching an autonomous mobile robot. The use of human speech as the teaching method provides more convenient user-interface for the mobile robot. In this study, for easily teaching the mobile robot, a study on the autonomous mobile robot with the function of speech recognition is tried. In speech recognition system, a speech recognition algorithm using HMM(Hidden Markov Model) is presented to recognize Korean word. Filter-bank analysis model is used to extract of features as the spectral analysis method. A recognized word is converted to command for the control of robot navigation.
PDF

Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks

Farhadipour, Aref;Veisi, Hadi;Asgari, Mohammad;Keyvanrad, Mohammad Ali
- ETRI Journal
- /
- v.40 no.5
- /
- pp.643-652
- /
- 2018
Dysarthria is a degenerative disorder of the central nervous system that affects the control of articulation and pitch; therefore, it affects the uniqueness of sound produced by the speaker. Hence, dysarthric speaker recognition is a challenging task. In this paper, a feature-extraction method based on deep belief networks is presented for the task of identifying a speaker suffering from dysarthria. The effectiveness of the proposed method is demonstrated and compared with well-known Mel-frequency cepstral coefficient features. For classification purposes, the use of a multi-layer perceptron neural network is proposed with two structures. Our evaluations using the universal access speech database produced promising results and outperformed other baseline methods. In addition, speaker identification under both text-dependent and text-independent conditions are explored. The highest accuracy achieved using the proposed system is 97.3%.
https://doi.org/10.4218/etrij.2017-0260 인용 PDF KSCI

Enhancement of Mobile Authentication System Performance based on Multimodal Biometrics (다중 생체인식 기반의 모바일 인증 시스템 성능 개선)

Jeong, Kanghun;Kim, Sanghoon;Moon, Hyeonjoon
- Proceedings of the Korea Information Processing Society Conference
- /
- 2013.05a
- /
- pp.342-345
- /
- 2013
본 논문은 모바일 환경에서의 다중생체인식을 통한 개인인증 시스템을 제안한다. 다중생체인식을 위하여 얼굴인식과 화자인식을 선택하였으며, 시스템의 인식 시나리오는 다음을 따른다. 얼굴인식을 위하여 Modified census transform (MCT) 기반의 얼굴검출과 k-means 클러스터 분석 (cluster analysis) 알고리즘 기반의 눈 검출을 통해 얼굴영역 전처리를 수행하고, principal component analysis (PCA) 기반의 얼굴인증 시스템을 구현한다. 화자인식을 위하여 음성의 끝점 추출과 Mel frequency cepstral coefficient(MFCC) 특징을 추출하고, dynamic time warping (DTW) 기반의 화자 인증 시스템을 구현한다. 그리고 각각의 생체인식을 본 논문에서 제안된 방법을 기반으로 융합하여 인식률을 향상시킨다.
https://doi.org/10.3745/PKIPS.y2013m05a.342 인용 PDF

Music Genre Classification Based on Timbral Texture and Rhythmic Content Features

Baniya, Babu Kaji;Ghimire, Deepak;Lee, Joonwhon
- Proceedings of the Korea Information Processing Society Conference
- /
- 2013.05a
- /
- pp.204-207
- /
- 2013
Music genre classification is an essential component for music information retrieval system. There are two important components to be considered for better genre classification, which are audio feature extraction and classifier. This paper incorporates two different kinds of features for genre classification, timbral texture and rhythmic content features. Timbral texture contains several spectral and Mel-frequency Cepstral Coefficient (MFCC) features. Before choosing a timbral feature we explore which feature contributes less significant role on genre discrimination. This facilitates the reduction of feature dimension. For the timbral features up to the 4-th order central moments and the covariance components of mutual features are considered to improve the overall classification result. For the rhythmic content the features extracted from beat histogram are selected. In the paper Extreme Learning Machine (ELM) with bagging is used as classifier for classifying the genres. Based on the proposed feature sets and classifier, experiment is performed with well-known datasets: GTZAN databases with ten different music genres, respectively. The proposed method acquires the better classification accuracy than the existing approaches.
https://doi.org/10.3745/PKIPS.y2013m05a.204 인용 PDF

Speech Emotion Recognition with SVM, KNN and DSVM

Hadhami Aouani ;Yassine Ben Ayed
- International Journal of Computer Science & Network Security
- /
- v.23 no.8
- /
- pp.40-48
- /
- 2023
Speech Emotions recognition has become the active research theme in speech processing and in applications based on human-machine interaction. In this work, our system is a two-stage approach, namely feature extraction and classification engine. Firstly, two sets of feature are investigated which are: the first one is extracting only 13 Mel-frequency Cepstral Coefficient (MFCC) from emotional speech samples and the second one is applying features fusions between the three features: Zero Crossing Rate (ZCR), Teager Energy Operator (TEO), and Harmonic to Noise Rate (HNR) and MFCC features. Secondly, we use two types of classification techniques which are: the Support Vector Machines (SVM) and the k-Nearest Neighbor (k-NN) to show the performance between them. Besides that, we investigate the importance of the recent advances in machine learning including the deep kernel learning. A large set of experiments are conducted on Surrey Audio-Visual Expressed Emotion (SAVEE) dataset for seven emotions. The results of our experiments showed given good accuracy compared with the previous studies.
https://doi.org/10.22937/IJCSNS.2023.23.8.6 인용 PDF

Channel-attentive MFCC for Improved Recognition of Partially Corrupted Speech (부분 손상된 음성의 인식 향상을 위한 채널집중 MFCC 기법)

조훈영;지상문;오영환
- The Journal of the Acoustical Society of Korea
- /
- v.22 no.4
- /
- pp.315-322
- /
- 2003
We propose a channel-attentive Mel frequency cepstral coefficient (CAMFCC) extraction method to improve the recognition performance of speech that is partially corrupted in the frequency domain. This method introduces weighting terms both at the filter bank analysis step and at the output probability calculation of decoding step. The weights are obtained for each frequency channel of filter bank such that the more reliable channel is emphasized by a higher weight value. Experimental results on TIDIGITS database corrupted by various frequency-selective noises indicated that the proposed CAMFCC method utilizes the uncorrupted speech information well, improving the recognition performance by 11.2% on average in comparison to a multi-band speech recognition system.
PDF KSCI

Search Result 67, Processing Time 0.03 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)