• Title/Summary/Keyword: speech feature extraction


Phoneme-Boundary-Detection and Phoneme Recognition Research using Neural Network (음소경계검출과 신경망을 이용한 음소인식 연구)

  • 임유두;강민구;최영호
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 1999.11a / pp.224-229 / 1999
  • In the field of speech recognition, research can be classified into two categories: one concerned with developing phoneme-level recognition systems, and the other with improving the efficiency of word-level recognition systems. A practical phoneme-level recognition system should detect phoneme boundaries accurately and, as a result, achieve better recognition performance. Traditional LPC methods detect phoneme boundaries with the Itakura-Saito method, which measures the distance between the LPC coefficients of reference phoneme data and those of the target speech frame. MFCC methods, which treat spectral transitions as phoneme boundaries, lack adaptability. In this paper, we present a new speech recognition system that uses an autocorrelation method for phoneme boundary detection and a multi-layer feed-forward neural network for recognition. The proposed system outperforms the traditional methods in adaptability, and it has the further advantage that the feature-extraction stage is independent of the recognition process. The results suggest that a frame-unit phoneme recognition system can be implemented.
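The Itakura-Saito comparison used by the traditional LPC methods can be sketched as follows. This is a minimal illustration, not the authors' autocorrelation-based detector: it assumes each frame is already represented as a strictly positive power spectrum sampled on a shared frequency grid (for LPC, the spectrum of the all-pole model), and the names `itakura_saito` and `closest_phoneme` as well as the reference dictionary are hypothetical.

```python
import math


def itakura_saito(p, p_hat):
    """Itakura-Saito divergence D(p, p_hat) between two positive power spectra.

    D = (1/N) * sum(r - log(r) - 1), with r = p[k] / p_hat[k].
    It is zero only when the spectra match and it is not symmetric.
    """
    if not p or len(p) != len(p_hat):
        raise ValueError("spectra must be non-empty and the same length")
    total = 0.0
    for a, b in zip(p, p_hat):
        if a <= 0 or b <= 0:
            raise ValueError("power spectra must be strictly positive")
        r = a / b
        total += r - math.log(r) - 1.0
    return total / len(p)


def closest_phoneme(frame_spectrum, references):
    """Label of the reference spectrum closest to the frame.

    `references` is a hypothetical dict mapping phoneme labels to spectra.
    """
    return min(references, key=lambda label: itakura_saito(frame_spectrum, references[label]))
```

For example, `closest_phoneme([1.1, 2.0, 3.5], {"a": [1.0, 2.0, 4.0], "i": [4.0, 1.0, 1.0]})` picks `"a"`. Because the divergence is asymmetric, the argument order (observed frame first, reference model second) is a design choice that should be kept consistent.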


Mel-Frequency Cepstral Coefficients Using Formants-Based Gaussian Distribution Filterbank (포만트 기반의 가우시안 분포를 가지는 필터뱅크를 이용한 멜-주파수 켑스트럴 계수)

  • Son, Young-Woo;Hong, Jae-Keun
    • The Journal of the Acoustical Society of Korea / v.25 no.8 / pp.370-374 / 2006
  • Mel-frequency cepstral coefficients (MFCCs) are widely used as features for speech recognition. In MFCC extraction, the spectrum obtained by Fourier-transforming the input speech signal is divided into mel-frequency bands, and the energy of each band is computed. The coefficients are then obtained by applying the discrete cosine transform to these band energies. In this paper, we compute the output energy of each mel-scaled bandpass filter using a weighting function: a Gaussian function centered at the formant frequency. In the experiments, the proposed method performs comparably to standard MFCCs under clean conditions and better under degraded conditions.
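The MFCC pipeline described above (band energies on a mel scale, log, DCT) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the input is a one-sided power spectrum of length `n_fft // 2 + 1`, the filters are triangular with the common 2595·log10(1 + f/700) mel formula, and `gaussian_weight` is a hypothetical reading of the formant-centered weighting (the abstract does not specify the width `sigma_hz` or how multiple formants are handled).

```python
import math


def hz_to_mel(f):
    """Convert a frequency in Hz to mels."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_energies(power, sample_rate, n_filters, weight=None):
    """Band energies of a one-sided power spectrum (length n_fft // 2 + 1).

    Triangular filters are spaced uniformly on the mel scale. If `weight` is
    given, it maps a frequency in Hz to a factor applied to that bin.
    """
    n_fft = 2 * (len(power) - 1)
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    energies = []
    for m in range(1, n_filters + 1):
        lo, center, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for k, p in enumerate(power):
            f = k * sample_rate / n_fft
            if lo < f < hi:
                tri = (f - lo) / (center - lo) if f <= center else (hi - f) / (hi - center)
                total += tri * (weight(f) if weight else 1.0) * p
        energies.append(total)
    return energies


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
            for k in range(n_coeffs)]


def mfcc(power, sample_rate, n_filters=20, n_coeffs=12, weight=None, floor=1e-10):
    """Log band energies followed by a DCT; `floor` avoids log(0)."""
    energies = mel_band_energies(power, sample_rate, n_filters, weight)
    return dct_ii([math.log(max(e, floor)) for e in energies], n_coeffs)


def gaussian_weight(formant_hz, sigma_hz):
    """Hypothetical weighting: a Gaussian centered on one formant frequency."""
    return lambda f: math.exp(-0.5 * ((f - formant_hz) / sigma_hz) ** 2)
```

For a 512-point FFT at 16 kHz, `mfcc(spectrum, 16000, weight=gaussian_weight(700.0, 300.0))` emphasizes bins near an assumed 700 Hz formant. Since the weight never exceeds 1, every weighted band energy is at most its unweighted counterpart.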

Terms Based Sentiment Classification for Online Review Using Support Vector Machine (Support Vector Machine을 이용한 온라인 리뷰의 용어기반 감성분류모형)

  • Lee, Taewon;Hong, Taeho
    • Information Systems Review / v.17 no.1 / pp.49-64 / 2015
  • Customer reviews containing subjective opinions about products or services in online stores are being generated rapidly, and their influence on customers has grown enormously with the widespread use of SNS. Accordingly, many studies have focused on opinion mining to analyze positive and negative opinions and to improve customer support and sales. For opinion mining, it is very important to select the key terms that reflect customers' sentiment in reviews. We propose a document-level, term-based sentiment classification model that selects optimal terms using part-of-speech (POS) tags. Support vector machines (SVMs) are used to build the opinion-mining predictor, and a combination of POS tags and four term-extraction methods is used for SVM feature selection. To validate the proposed model, we applied it to customer reviews on Amazon. After crawling 80,000 reviews, we removed meaningless terms (stopwords) and extracted useful terms by part-of-speech tagging. The extracted terms were ranked by document frequency, TF-IDF, information gain, and the chi-squared statistic, and the top 20 ranked terms were used as features of the SVM model. Our experimental results show that the SVM model using four POS tags outperforms the benchmark model, which is built from adjective terms only. In addition, among the SVM models using the four term-extraction methods, the model based on the chi-squared statistic performs best. The proposed opinion-mining model is expected to improve customer service and help online stores gain a competitive advantage.
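Ranking terms by the chi-squared statistic, the best-performing selection method above, can be sketched with a 2×2 term/class contingency table per term. This is an illustrative sketch, assuming each review is already reduced to a list of POS-filtered terms and that the task is binary; `top_terms_by_chi2` is a hypothetical helper, and the selected terms would then become SVM features (the SVM itself is not shown).

```python
def chi_squared(n11, n10, n01, n00):
    """Chi-squared statistic of a 2x2 term/class contingency table.

    n11: in-class documents containing the term
    n10: out-of-class documents containing the term
    n01: in-class documents without the term
    n00: out-of-class documents without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_terms_by_chi2(docs, labels, positive_label, k=20):
    """Rank terms by chi-squared association with `positive_label`.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        n11 = n10 = n01 = n00 = 0
        for terms, label in zip(doc_sets, labels):
            has_term = term in terms
            in_class = label == positive_label
            if has_term and in_class:
                n11 += 1
            elif has_term:
                n10 += 1
            elif in_class:
                n01 += 1
            else:
                n00 += 1
        scores[term] = chi_squared(n11, n10, n01, n00)
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]
```

Note that the statistic is high for terms strongly associated with either class: in a toy set where "great" appears only in positive reviews and "bad" only in negative ones, both receive the same top score.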

Performance Improvement of Speaker Recognition by MCE-based Score Combination of Multiple Feature Parameters (MCE기반의 다중 특징 파라미터 스코어의 결합을 통한 화자인식 성능 향상)

  • Kang, Ji Hoon;Kim, Bo Ram;Kim, Kyu Young;Lee, Sang Hoon
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.6 / pp.679-686 / 2020
  • In this paper, we propose an enhanced feature-extraction method for vocal source signals and a score-combination method that uses MCE (minimum classification error)-based weight estimation over the scores of multiple feature vectors, in order to improve the performance of speaker recognition systems. The proposed feature vector consists of perceptual linear predictive (PLP) cepstral coefficients, skewness, and kurtosis extracted from lowpass-filtered glottal flow signals; the lowpass filtering removes the flat spectral region, which carries no meaningful information. The proposed feature is used to improve a conventional speaker recognition system that uses Gaussian mixture models with mel-frequency cepstral coefficients and PLP cepstral coefficients extracted from the speech signals. In addition, to increase the reliability of the estimated scores, instead of estimating the weights from the probability distribution of the conventional scores, the scores from the conventional vocal-tract features and from the proposed feature are fused by the MCE-based score combination method to find the optimal speaker. The experimental results show that the proposed feature vectors contain valid information for recognizing speakers. Moreover, when speaker recognition combines the MCE-based scores of multiple feature parameters, the system outperforms the conventional one, particularly when the number of Gaussian mixtures is small.
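Two pieces of the description lend themselves to a short sketch: the skewness and kurtosis features, and a weighted combination of per-speaker scores from several feature streams. This is an assumption-laden illustration: the moments use population (biased) estimators with non-excess kurtosis, the scores are assumed to be per-speaker log-likelihoods (e.g. from GMMs), and the fusion weights are fixed placeholders. In the paper they are estimated by MCE training, which is not shown here.

```python
def skewness_kurtosis(x):
    """Skewness and (non-excess) kurtosis from population central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("signal has zero variance")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def fuse_scores(score_sets, weights):
    """Weighted sum of per-speaker scores from several feature streams.

    score_sets: list of dicts {speaker: score}, one dict per feature stream.
    weights:    one weight per stream (placeholders here, MCE-trained in the paper).
    """
    speakers = score_sets[0].keys()
    return {s: sum(w * scores[s] for w, scores in zip(weights, score_sets))
            for s in speakers}


def identify_speaker(score_sets, weights):
    """Return the speaker with the highest fused score."""
    fused = fuse_scores(score_sets, weights)
    return max(fused, key=fused.get)
```

With vocal-tract scores `{"A": -10.0, "B": -12.0}` and glottal-feature scores `{"A": -9.0, "B": -5.0}`, weights `[0.7, 0.3]` select speaker A while `[0.3, 0.7]` select B, which is why estimating the weights carefully matters.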

Effective Feature Vector for Isolated-Word Recognizer using Vocal Cord Signal (성대신호 기반의 명령어인식기를 위한 특징벡터 연구)

  • Jung, Young-Giu;Han, Mun-Sung;Lee, Sang-Jo
    • Journal of KIISE:Software and Applications / v.34 no.3 / pp.226-234 / 2007
  • In this paper, we develop a speech recognition system using a throat microphone. This kind of microphone minimizes the impact of environmental noise. However, because high frequencies are absent and formant frequencies are partially lost, previous systems built with such devices have shown lower recognition rates than systems using standard microphone signals. This problem has led researchers to use throat microphone signals only as supplementary data supporting standard microphone signals. We present a high-performance ASR system developed using only a throat microphone, by taking advantage of Korean Phonological Feature Theory and a detailed analysis of the throat signal. Analyzing the FFT spectrum of the throat microphone signal, we find that the conventional MFCC feature vector, which uses critical-band filters, does not characterize throat microphone signals well. We also describe the conditions that make a feature extraction algorithm best suited to throat microphone signal analysis: (1) a sensitive band-pass filter and (2) a feature vector suitable for voice/non-voice classification. We show experimentally that the ZCPA (Zero-Crossings with Peak Amplitudes) algorithm, designed to meet these conditions, improves the recognizer's performance by approximately 16%, and that an additional noise-canceling algorithm such as RASTA yields a further 2% improvement.
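The core ZCPA idea, using intervals between zero crossings as frequency estimates and weighting them by peak amplitude, can be sketched for a single channel. This is a simplified sketch, not the authors' implementation: it assumes the signal has already passed through one band-pass filter (the filterbank is omitted), uses integer crossing positions without interpolation, and applies `log1p` compression to the peak; `zcpa_histogram` and `bin_edges_hz` are hypothetical names.

```python
import math


def zcpa_histogram(signal, sample_rate, bin_edges_hz):
    """Zero-Crossings with Peak Amplitudes for one band-passed channel.

    Each pair of successive upward zero crossings gives a frequency estimate
    (sample_rate / interval). The log-compressed peak amplitude inside that
    interval is added to the histogram bin containing the frequency.
    """
    hist = [0.0] * (len(bin_edges_hz) - 1)
    crossings = [i for i in range(1, len(signal)) if signal[i - 1] < 0 <= signal[i]]
    for start, end in zip(crossings, crossings[1:]):
        freq = sample_rate / (end - start)
        peak = max(abs(v) for v in signal[start:end])
        for b in range(len(hist)):
            if bin_edges_hz[b] <= freq < bin_edges_hz[b + 1]:
                hist[b] += math.log1p(peak)
                break
    return hist
```

In a full ZCPA front end, one such histogram per band-pass channel would be summed into a single feature vector per frame. Because the frequency estimate comes from zero-crossing timing rather than spectral magnitude, the method tends to be less sensitive to additive noise.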

Crack Detection of Rotating Blade using Hidden Markov Model (회전 블레이드의 크랙 발생 예측을 위한 은닉 마르코프모델을 이용한 해석)

  • Lee, Seung-Kyu;Yoo, Hong-Hee
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2009.10a / pp.99-105 / 2009
  • This paper proposes a crack detection method for rotating blades. A rotating blade was modeled as a cantilever beam attached to a hub undergoing rotational motion. The existence and location of a crack could be identified from the vertical response at the tip of the rotating cantilever beam by employing a Discrete Hidden Markov Model (DHMM) and Empirical Mode Decomposition (EMD). The DHMM is a well-known stochastic method in speech recognition, but recent research has shown that it can also be used for machine health monitoring. EMD, proposed by Huang et al., decomposes a random signal into several mono-component signals. In this paper, EMD is used to extract feature vectors, a key step in developing the DHMM. The DHMMs developed for crack detection in rotating blades showed good crack detection ability.
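Classification with DHMMs rests on scoring an observation sequence under each trained model and picking the most likely one. The sketch below shows the scaled forward algorithm for that scoring step; it assumes the EMD-based feature vectors have already been vector-quantized into integer codebook symbols and that model parameters are given (Baum-Welch training is not shown). The model names such as `"intact"` and `"cracked"` are hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | model) for a discrete HMM via the scaled forward algorithm.

    start[i]    initial probability of state i
    trans[i][j] probability of a transition from state i to state j
    emit[i][k]  probability that state i emits codebook symbol k
    """
    n = len(start)
    log_prob = 0.0
    alpha = []
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[i] * emit[i][symbol] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def classify_sequence(obs, models):
    """Name of the model with the highest log-likelihood.

    models: dict name -> (start, trans, emit)
    """
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

Summing the logs of the per-step scale factors recovers log P(obs) exactly while keeping the intermediate values in a safe numeric range, which matters for long vibration records.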


Analyzing the Acoustic Elements and Emotion Recognition from Speech Signal Based on DRNN (음향적 요소분석과 DRNN을 이용한 음성신호의 감성 인식)

  • Sim, Kwee-Bo;Park, Chang-Hyun;Joo, Young-Hoon
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.1 / pp.45-50 / 2003
  • Robot technology has advanced remarkably in recent years, and emotion recognition is necessary to build robots that can interact closely with people. This paper presents a simulator, and its simulation results, that recognizes or classifies emotions by learning pitch patterns. Because pitch alone is not sufficient for recognizing emotion, we also add other acoustic elements and analyze the relationship between emotion and these acoustic elements. The simulator consists of a DRNN (Dynamic Recurrent Neural Network) and a feature-extraction module; the DRNN is the learning algorithm for pitch patterns.
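The pitch patterns the DRNN learns must first be extracted frame by frame. A common way to do this, shown here as a sketch, is to pick the lag with the largest autocorrelation within a plausible pitch range. The abstract does not say which pitch tracker was used, so the autocorrelation method and the 60 to 400 Hz search range are assumptions.

```python
import math


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Pitch in Hz from the autocorrelation peak, or None if no positive peak.

    Only lags corresponding to f_min..f_max are searched. The frame mean is
    removed first so a DC offset does not dominate the autocorrelation.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag is not None else None


if __name__ == "__main__":
    # 50 ms of a 200 Hz tone sampled at 8 kHz.
    tone = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(400)]
    print(estimate_pitch(tone, 8000))
```

Applying this to successive frames yields the pitch contour that would serve as the network's input sequence; returning `None` for silent or unvoiced frames leaves the choice of how to fill those gaps to the caller.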

Audio Segmentation and Classification Using Support Vector Machine and Fuzzy C-Means Clustering Techniques (서포트 벡터 머신과 퍼지 클러스터링 기법을 이용한 오디오 분할 및 분류)

  • Nguyen, Ngoc;Kang, Myeong-Su;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions:PartB / v.19B no.1 / pp.19-26 / 2012
  • The rapid increase in information imposes new demands on content management, and automatic audio segmentation and classification aim to meet the growing need for efficient content management. To this end, this paper proposes a high-accuracy algorithm that segments audio signals and classifies the segments into classes such as speech, music, silence, and environmental sounds. The proposed algorithm uses a support vector machine (SVM) on the parameter sequence to detect audio cuts, which are boundaries between different kinds of sounds. We then extract feature vectors composed of statistical data and use them as input to a fuzzy c-means (FCM) classifier that partitions the audio segments into classes. To evaluate the proposed SVM-FCM algorithm, we use precision and recall rates for segmentation and accuracy for classification. Furthermore, we compare the segmentation performance of the proposed algorithm with other methods, including binary and FCM classifiers. Experimental results show that the proposed algorithm outperforms the other methods in both precision and recall.
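The fuzzy c-means step can be sketched in a few dozen lines. This is a generic FCM implementation, not the authors' code: it assumes each audio segment is already summarized as a fixed-length feature vector, uses Euclidean distance with the usual fuzzifier `m = 2`, and initializes memberships randomly with a fixed seed for reproducibility.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means on a list of equal-length feature vectors.

    Returns (centers, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Centers: membership-weighted means with weights u^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [max(sum((p[d] - cen[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                    for cen in centers]
            new_u.append([1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                          for j in range(c)])
        shift = max(abs(a - b) for new_row, old_row in zip(new_u, u)
                    for a, b in zip(new_row, old_row))
        u = new_u
        if shift < tol:
            break
    return centers, u


def hard_labels(memberships):
    """Assign each point to its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Unlike k-means, each segment keeps a graded membership in every class, so segments that sit between, for example, speech and music remain identifiable by their low maximum membership before being assigned a hard label.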

Front-End Processing for Speech Recognition in the Telephone Network (전화망에서의 음성인식을 위한 전처리 연구)

  • Jun, Won-Suk;Shin, Won-Ho;Yang, Tae-Young;Kim, Weon-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.16 no.4 / pp.57-63 / 1997
  • In this paper, we study efficient feature vector extraction and front-end processing methods to improve the performance of a speech recognition system, using the KT (Korea Telecommunication) database collected over various telephone channels. First, we compare the recognition performance of feature vectors known to be robust to noise and environmental variation, and we verify the performance gain obtained with weighted cepstral distance measures. The experimental results show that the recognition rate increases when PLP (Perceptual Linear Prediction) and MFCC (Mel-Frequency Cepstral Coefficient) features are used instead of the LPC cepstrum used in the KT recognition system. For cepstral distance measurement, weighting functions such as RPS (Root Power Sums) and BPL (Band-Pass Lifter) improve recognition. Spectral subtraction decreases the recognition rate because of the distortion it introduces, whereas RASTA (RelAtive SpecTrAl) processing, CMS (Cepstral Mean Subtraction), and SBR (Signal Bias Removal) improve recognition performance. In particular, CMS is simple yet yields a large improvement. Finally, we compare modified CMS methods for real-time implementation and suggest an improved method that prevents performance degradation.
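CMS works because a fixed linear channel, such as a telephone line, multiplies the speech spectrum, which becomes an additive constant in the log-spectral and cepstral domains; subtracting each coefficient's mean removes that constant. The sketch below shows batch CMS and one common frame-synchronous variant based on an exponentially weighted running mean. The running version is an assumption for illustration: the abstract does not describe the authors' modified real-time methods, and `alpha` and `init_mean` are hypothetical parameters.

```python
def cepstral_mean_subtraction(frames):
    """Batch CMS: subtract the per-utterance mean of each cepstral coefficient."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


def running_cms(frames, alpha=0.995, init_mean=None):
    """Frame-synchronous CMS using an exponentially weighted running mean.

    alpha close to 1 gives a slowly adapting mean; init_mean (if given) seeds
    the estimate, otherwise the first frame is used.
    """
    mean = list(init_mean) if init_mean is not None else list(frames[0])
    out = []
    for f in frames:
        mean = [alpha * mu + (1.0 - alpha) * v for mu, v in zip(mean, f)]
        out.append([v - mu for v, mu in zip(f, mean)])
    return out
```

Adding the same offset to every frame, as a stationary channel would, leaves the batch CMS output unchanged. The running variant trades that exactness for zero look-ahead, which is one reason naive real-time CMS can degrade performance at the start of an utterance.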


A Comparative Study on Using SentiWordNet for English Twitter Sentiment Analysis (영어 트위터 감성 분석을 위한 SentiWordNet 활용 기법 비교)

  • Kang, In-Su
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.4 / pp.317-324 / 2013
  • Twitter sentiment analysis aims to classify a tweet (message) into the positive or negative sentiment class. This study deals with SentiWordNet (SWN)-based Twitter sentiment analysis. SWN is a sentiment dictionary in which each sense of an English word has a positive and a negative sentiment strength. A variety of SWN-based sentiment feature extraction methods exist; they typically first determine the sentiment orientation (SO) of each term in a document and then determine the SO of the document from those terms' SO values. For example, to obtain the SO of a term, some studies take the maximum or average of the sentiment scores of its senses, while others compute the average difference between the positive and negative sentiment scores. For the SO of a document, many researchers use the maximum or average of the terms' SO values. In addition, this procedure may be applied to the whole set of parts of speech (adjectives, adverbs, nouns, and verbs) or to a subset of it. This work provides a comparative study of SWN-based sentiment feature extraction schemes, with performance evaluation on a well-known Twitter dataset.
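The two-stage procedure (term SO from its senses, then document SO from its terms) can be sketched with a toy lexicon. This is one possible reading of the schemes listed above, not the paper's exact definitions: the lexicon is a hypothetical dict mapping each term to a list of `(pos_score, neg_score)` pairs per sense (standing in for SentiWordNet), `"max"` at the term level takes the difference of the per-polarity maxima, and `"max"` at the document level takes the term SO with the largest magnitude.

```python
from statistics import mean


def term_so(senses, method="avg"):
    """Sentiment orientation of a term from its (pos, neg) sense scores."""
    if method == "avg":
        return mean(p - n for p, n in senses)
    if method == "max":
        return max(p for p, _ in senses) - max(n for _, n in senses)
    raise ValueError(f"unknown term method: {method}")


def document_so(tokens, lexicon, term_method="avg", doc_method="avg"):
    """Aggregate term SOs into a document SO; unknown tokens are ignored."""
    scores = [term_so(lexicon[t], term_method) for t in tokens if t in lexicon]
    if not scores:
        return 0.0
    if doc_method == "avg":
        return mean(scores)
    if doc_method == "max":
        return max(scores, key=abs)
    raise ValueError(f"unknown document method: {doc_method}")


def classify_tweet(tokens, lexicon, **kwargs):
    """Label a tokenized tweet; an SO of exactly 0 is treated as positive."""
    return "positive" if document_so(tokens, lexicon, **kwargs) >= 0 else "negative"
```

Restricting `tokens` to words with particular POS tags before calling `document_so` corresponds to applying the procedure to a subset of parts of speech. The different term and document methods can give different labels for the same tweet, which is what a comparative study like this one measures.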