• Title/Abstract/Keyword: speech feature extraction

155 search results

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Son Jongmok;Kwon Hongseok;Kim Siho;Bae Keunsung
    • Proceedings of the IEEK Conference / 2004 ICEIC: The International Conference on Electronics, Informations and Communications / pp.391-394 / 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle a vocabulary of several hundred words and is speaker independent. First, we develop an HMM-based speech recognition system on the PC that operates on a frame basis, with feature extraction and Viterbi decoding processed in parallel to keep the processing delay as small as possible. Techniques such as linear discriminant analysis, state-based Gaussian selection, and a phonetic tied-mixture model are employed to reduce the computational burden and memory size. The system is then optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486 kbytes of memory for data and acoustic models, and 24.5 kbytes for program code. A maximum processing time of 29.2 ms for a 32 ms speech frame validates real-time operation of the implemented system.
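
The frame-synchronous decoding described above can be illustrated with a minimal Viterbi update that consumes one frame of observation log-likelihoods at a time. The sketch below is a generic NumPy illustration, not the authors' TMS320C6711 implementation; pruning, Gaussian selection, and the tied-mixture structure are all omitted.

```python
import numpy as np

def viterbi_frame_sync(log_obs, log_trans, log_init):
    """Frame-synchronous Viterbi pass over an HMM.

    log_obs   : (T, N) log observation likelihoods per frame and state
    log_trans : (N, N) log transition probabilities
    log_init  : (N,)   log initial-state probabilities
    """
    T, N = log_obs.shape
    delta = log_init + log_obs[0]            # best partial path scores
    back = np.zeros((T, N), dtype=int)       # backpointers

    for t in range(1, T):                    # one update per incoming frame
        scores = delta[:, None] + log_trans  # (previous state, next state)
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(N)] + log_obs[t]

    path = [int(np.argmax(delta))]           # trace back the best state path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(delta))
```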


Classification of Sasang Constitution Taeumin by Comparative of Speech Signals Analysis (음성 분석 정보값 비교를 통한 사상체질 태음인의 분류)

  • Kim, Bong-Hyun;Lee, Se-Hwan;Cho, Dong-Uk
    • The KIPS Transactions: Part B / Vol.15B, No.1 / pp.17-24 / 2008
  • This paper proposes a Sasang constitution classification method based on the analysis and comparison of speech signal measurements. As a first step toward an objective index for Sasang constitution, and in connection with the classification of Soeumin through skin diagnosis in the overall system, we propose a method for classifying Taeumin from the output values produced by speech signal analysis. First, phonetic elements that clearly characterize each Sasang constitution group are extracted from the voice. Taeumin is then classified on the basis of the differences and similarities among the constitution groups' analysis values. Finally, the effectiveness of the method is verified through experiments.

A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews (강건한 한국어 상품평의 감정 분류를 위한 패턴 기반 자질 추출 방법)

  • Shin, Jun-Soo;Kim, Hark-Soo
    • Journal of KIISE: Software and Applications / Vol.37, No.12 / pp.946-950 / 2010
  • Many sentiment categorization systems based on machine learning use morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally do not perform well in the customer review domain because online customer reviews contain many spacing and spelling errors. The degraded performance of these underlying analyzers in turn lowers the performance of the sentiment categorization systems. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) and phoneme patterns. The two kinds of patterns are automatically constructed from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols containing content words such as nouns and verbs. Phoneme patterns consist of the leading consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system using an SVM (Support Vector Machine) as the machine learner. In experiments with Korean customer reviews, the sentiment categorization system using the proposed method outperformed one using a morphological analyzer as the feature extractor.
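
A minimal sketch of the longest-match idea, assuming toy pattern lists and standard Unicode arithmetic for Hangul syllables; the authors' actual pattern construction from a POS-tagged corpus is not reproduced here. The extracted features would then be vectorized (e.g. as a bag of features) and fed to an SVM such as sklearn.svm.SVC.

```python
def leading_cv(eojeol: str) -> str:
    """Strip each Hangul syllable down to its leading consonant + vowel."""
    out = []
    for ch in eojeol:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:                      # Hangul syllable block
            cho, jung = code // 588, (code % 588) // 28
            out.append(chr(0x1100 + cho) + chr(0x1161 + jung))
        else:
            out.append(ch)
    return "".join(out)

def extract_features(sentence, eojeol_patterns, phoneme_patterns):
    """Longest-prefix matching of Eojeol patterns, with a fall-back to
    leading consonant/vowel patterns that tolerate spelling errors."""
    feats = []
    for eojeol in sentence.split():
        matches = [p for p in eojeol_patterns if eojeol.startswith(p)]
        if matches:
            feats.append(max(matches, key=len))
            continue
        cv = leading_cv(eojeol)
        matches = [p for p in phoneme_patterns if cv.startswith(p)]
        if matches:
            feats.append("CV:" + max(matches, key=len))
    return feats
```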

Independent Component Analysis on a Subband Domain for Robust Speech Recognition (음성의 특징 단계에 독립 요소 해석 기법의 효율적 적용을 통한 잡음 음성 인식)

  • Park, Hyeong-Min;Jeong, Ho-Yeong;Lee, Tae-Won;Lee, Su-Yeong
    • Journal of the Institute of Electronics Engineers of Korea CI / Vol.37, No.6 / pp.22-31 / 2000
  • In this paper, we propose a method for removing noise components in the feature extraction process for robust speech recognition. The method is based on blind separation using independent component analysis (ICA). Given two noisy speech recordings, the algorithm linearly separates speech from the unwanted noise signal. To apply ICA as closely as possible to the feature level used for recognition, a new spectral analysis is presented: it modifies the computation of band energies by first averaging fast Fourier transform (FFT) points over several divided ranges within each mel-scaled band. A simple analysis of the sample variances of the band energies of speech and noise, together with recognition experiments, showed the noise robustness of this analysis. For noisy speech signals recorded in real environments, the proposed method, which applies ICA to the new spectral analysis, improved recognition performance considerably and was particularly effective at low signal-to-noise ratios (SNRs). The method gives some insight into applying ICA at the feature level and appears useful for robust speech recognition.
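
The modified spectral analysis, averaging FFT points over sub-ranges within each mel-scaled band before forming band energies, can be sketched as below. The band layout, window, and the use of scikit-learn's FastICA on two-channel band-energy tracks are simplifying assumptions, not the authors' exact formulation.

```python
import numpy as np
from sklearn.decomposition import FastICA

def modified_band_energies(frames, sr, n_bands=20, n_sub=4):
    """Band energies where FFT power is first averaged over n_sub
    sub-ranges inside each mel-scaled band (assumed band layout)."""
    spec = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)) ** 2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(0.0, mel(sr / 2.0), n_bands + 1))
    bins = np.floor(edges / (sr / 2.0) * (spec.shape[1] - 1)).astype(int)
    feats = np.zeros((frames.shape[0], n_bands))
    for b in range(n_bands):
        lo, hi = bins[b], max(bins[b + 1], bins[b] + n_sub)
        for sub in np.array_split(np.arange(lo, hi), n_sub):
            feats[:, b] += spec[:, sub].mean(axis=1)  # average, then accumulate
    return feats

def unmix_band(e_ch1, e_ch2):
    """Treat one band's energy tracks from two noisy recordings as mixtures
    and unmix them with ICA; one output component should be speech-dominant."""
    ica = FastICA(n_components=2, random_state=0)
    return ica.fit_transform(np.column_stack([e_ch1, e_ch2]))  # (T, 2)
```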


Voice Synthesis Detection Using Language Model-Based Speech Feature Extraction (언어 모델 기반 음성 특징 추출을 활용한 생성 음성 탐지)

  • Seung-min Kim;So-hee Park;Dae-seon Choi
    • Journal of the Korea Institute of Information Security & Cryptology / Vol.34, No.3 / pp.439-449 / 2024
  • Recent rapid advances in voice generation technology have made it possible to synthesize natural-sounding voices from text alone. However, this progress has also led to an increase in malicious activities, such as voice phishing (vishing), in which generated voices are exploited for criminal purposes. Numerous models have been developed to detect synthesized voices, typically by extracting features from the voice and using these features to determine the likelihood that the voice was generated. This paper proposes a new model for extracting voice features to address misuse cases arising from generated voices. It uses a deep-learning-based audio codec model and the pre-trained natural language processing model BERT to extract novel voice features. To assess the suitability of the proposed voice feature extraction model for voice detection, four generated-voice detection models were built using the extracted features and evaluated. For comparison, three voice detection models based on the Deepfeature approach proposed in previous studies were evaluated in terms of accuracy and EER. The model proposed in this paper achieved an accuracy of 88.08% and a low EER of 11.79%, outperforming the existing models. These results confirm that the voice feature extraction method introduced in this paper can be an effective tool for distinguishing between generated and real voices.
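
A hedged sketch of the pipeline: discrete tokens from a neural audio codec are embedded and passed through a pre-trained BERT encoder, and the pooled representation drives a real-vs-generated head. The codebook size, BERT checkpoint, and pooling strategy are illustrative assumptions; the paper's exact models and training setup may differ.

```python
import torch.nn as nn
from transformers import BertModel

class CodecBertDetector(nn.Module):
    """Sketch of a codec-token -> BERT -> real/generated classifier."""

    def __init__(self, codebook_size=1024, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Map discrete codec tokens into BERT's embedding space
        self.token_embed = nn.Embedding(codebook_size, hidden)
        self.head = nn.Linear(hidden, 2)            # real vs. generated

    def forward(self, codec_tokens, attention_mask=None):
        embeds = self.token_embed(codec_tokens)      # (B, T, hidden)
        out = self.bert(inputs_embeds=embeds, attention_mask=attention_mask)
        pooled = out.last_hidden_state.mean(dim=1)   # average-pool the features
        return self.head(pooled)                     # detection logits
```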

Emotion Recognition using Pitch Parameters of Speech (음성의 피치 파라메터를 사용한 감정 인식)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / Vol.25, No.3 / pp.272-278 / 2015
  • This paper studies various parameter extraction methods that use the pitch information of speech for the development of an emotion recognition system. For this purpose, pitch parameters were extracted from a Korean speech database containing various emotions using statistical information and numerical analysis techniques. A GMM-based emotion recognition system was used to compare the performance of the pitch parameters, and a sequential feature selection method was used to select the parameters showing the best emotion recognition performance. Experiments on recognizing four emotions showed a 63.5% recognition rate using a combination of 15 out of 56 pitch parameters. Experiments on detecting the presence of emotion showed an 80.3% recognition rate using a combination of 14 parameters.
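
A minimal sketch of the GMM-based recognizer: one diagonal-covariance GMM per emotion is trained on pitch-parameter vectors (here assumed to be the already-selected subset), and a test utterance is assigned to the emotion whose model scores highest. Mixture sizes are illustrative; the sequential selection step itself could be done with, e.g., scikit-learn's SequentialFeatureSelector.

```python
from sklearn.mixture import GaussianMixture

class GMMEmotionClassifier:
    """One GMM per emotion; classify by the highest average log-likelihood."""

    def __init__(self, n_components=4):
        self.n_components = n_components
        self.models = {}

    def fit(self, features_by_emotion):
        # features_by_emotion: {"anger": X, "joy": X, ...}, X of shape (n, d)
        for emotion, X in features_by_emotion.items():
            gmm = GaussianMixture(self.n_components, covariance_type="diag",
                                  random_state=0).fit(X)
            self.models[emotion] = gmm
        return self

    def predict(self, x):
        scores = {e: m.score(x.reshape(1, -1)) for e, m in self.models.items()}
        return max(scores, key=scores.get)
```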

Speech/Mixed Content Signal Classification Based on GMM Using MFCC (MFCC를 이용한 GMM 기반의 음성/혼합 신호 분류)

  • Kim, Ji-Eun;Lee, In-Sung
    • Journal of the Institute of Electronics and Information Engineers / Vol.50, No.2 / pp.185-192 / 2013
  • This paper proposes a method to improve the classification of speech and mixed-content signals using MFCCs and a GMM probability model, for use in the MPEG USAC (Unified Speech and Audio Coding) standard. For effective pattern recognition, the Gaussian mixture model (GMM) probability model is used, and the expectation-maximization (EM) algorithm is used to extract the optimal GMM parameters. The proposed classification algorithm is divided into two parts: the first extracts the optimal parameters for the GMM, and the second distinguishes between speech and mixed-content signals using MFCC feature parameters. The proposed classification algorithm shows better results than the conventionally implemented USAC scheme.
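
A sketch of the two parts under common assumptions (librosa MFCCs, diagonal-covariance GMMs fitted by EM inside scikit-learn); frame rates, mixture sizes, and the USAC integration are not taken from the paper.

```python
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Frame-level MFCC vectors for one file (librosa defaults)."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def train_classifier(speech_files, mixed_files, n_components=16):
    """Fit one GMM per class; EM runs inside GaussianMixture.fit()."""
    speech_gmm = GaussianMixture(n_components, covariance_type="diag", random_state=0)
    mixed_gmm = GaussianMixture(n_components, covariance_type="diag", random_state=0)
    speech_gmm.fit(np.vstack([mfcc_frames(f) for f in speech_files]))
    mixed_gmm.fit(np.vstack([mfcc_frames(f) for f in mixed_files]))
    return speech_gmm, mixed_gmm

def classify(path, speech_gmm, mixed_gmm):
    # Compare the average per-frame log-likelihood under each class model
    X = mfcc_frames(path)
    return "speech" if speech_gmm.score(X) > mixed_gmm.score(X) else "mixed"
```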

Extraction of MFCC feature parameters based on the PCA-optimized filter bank and Korean connected 4-digit telephone speech recognition (PCA-optimized 필터뱅크 기반의 MFCC 특징파라미터 추출 및 한국어 4연숫자 전화음성에 대한 인식실험)

  • 정성윤;김민성;손종목;배건성
    • Journal of the Institute of Electronics Engineers of Korea SP / Vol.41, No.6 / pp.279-283 / 2004
  • In general, triangular filters are used in the filter bank when MFCC feature parameters are extracted from the spectrum of the speech signal. A different approach, which uses filter shapes in the filter bank that are optimized to the spectrum of the training speech data, was proposed by Lee et al. to improve the recognition rate; a principal component analysis method is used to obtain the optimized filter coefficients. In this paper, using a large connected 4-digit telephone speech database, we compute MFCCs based on the PCA-optimized filter bank and compare the recognition performance with conventional MFCCs and with MFCCs based on a directly weighted filter bank. Experimental results show that the MFCCs based on the PCA-optimized filter bank give a slight improvement in recognition rate over the conventional MFCCs, but fail to outperform the MFCCs based on the directly weighted filter bank analysis. The experimental results are discussed together with our findings.
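
One plausible reading of the PCA-optimized filter bank, sketched below: for each mel band's FFT-bin range, the first principal component of the training power spectra is taken as the filter shape, and MFCCs follow from log filter-bank energies and a DCT. The normalization and band layout are assumptions, not the exact procedure of Lee et al.

```python
import numpy as np
from scipy.fft import dct
from sklearn.decomposition import PCA

def pca_filterbank(train_power_spectra, band_bins):
    """Derive one filter shape per band from training power spectra.
    band_bins: list of (lo, hi) FFT-bin ranges, one per mel band."""
    filters = []
    for lo, hi in band_bins:
        pca = PCA(n_components=1).fit(train_power_spectra[:, lo:hi])
        w = np.abs(pca.components_[0])          # first principal component
        filters.append((lo, hi, w / w.sum()))   # assumed unit-sum normalization
    return filters

def mfcc_from_filters(power_spectrum, filters, n_ceps=13):
    """Log filter-bank energies through the learned filters, then a DCT."""
    energies = np.array([np.dot(power_spectrum[lo:hi], w) for lo, hi, w in filters])
    return dct(np.log(energies + 1e-10), type=2, norm="ortho")[:n_ceps]
```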

Hate Speech Detection Using Modified Principal Component Analysis and Enhanced Convolution Neural Network on Twitter Dataset

  • Majed, Alowaidi
    • International Journal of Computer Science & Network Security / Vol.23, No.1 / pp.112-119 / 2023
  • Originally used for networking computers and communications, the Internet has been evolving since its beginning and is now the backbone for many things on the web, including social media. The concept of social networking, which started in the early 1990s, has grown along with the Internet. Social Networking Sites (SNSs) have become an important element of Internet usage, mainly because of the services they provide: sites such as Twitter and Facebook have become the primary means by which many individuals keep in touch and carry on substantive conversations, allowing the posting and sharing of photos, videos, and audio. Although attractive, these provisions have also created problems for these sites, such as the posting of offensive material; users sometimes promote hate through words or speech that is difficult to curtail once uploaded. Hence, this article outlines a process for extracting user reviews from a Twitter corpus in order to identify instances of hate speech, using MPCA (Modified Principal Component Analysis) and an ECNN (Enhanced Convolutional Neural Network). With NLP, a largely autonomous system for assessing syntax and meaning can be established. There is a strong emphasis on pre-processing, feature extraction, and classification. Normalization cleanses the text by removing extra spaces, punctuation, and stop words. The pre-processed text is then used for feature extraction, where the MPCA algorithm takes a set of related features and extracts those that are most informative about the given dataset. The proposed categorization method is then put forward as a means of detecting instances of hate speech or abusive language. It is argued that the ECNN is superior to other methods for identifying hateful content online: it can take in large amounts of data and quickly return accurate results, especially for larger datasets. As a result, the proposed MPCA+ECNN algorithm improves not only the F-measure but also the accuracy, precision, and recall.
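
A compact sketch of the normalization and feature-extraction stages; ordinary PCA over TF-IDF vectors stands in for the paper's MPCA, and the English stop-word list and dimensionalities are illustrative. The reduced vectors would then feed the CNN-style (ECNN) classifier.

```python
import re
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS

def normalize(tweet: str) -> str:
    """Cleansing step: strip punctuation, extra spaces, and stop words."""
    text = re.sub(r"[^\w\s]", " ", tweet.lower())
    text = re.sub(r"\s+", " ", text).strip()
    return " ".join(w for w in text.split() if w not in ENGLISH_STOP_WORDS)

def extract_features(tweets, n_components=50):
    """TF-IDF vectors reduced with PCA (a stand-in for the paper's MPCA)."""
    tfidf = TfidfVectorizer(max_features=2000)
    X = tfidf.fit_transform([normalize(t) for t in tweets]).toarray()
    n_components = min(n_components, X.shape[0], X.shape[1])
    return PCA(n_components=n_components).fit_transform(X)
```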

An Implementation of Real-Time Speaker Verification System on Telephone Voices Using DSP Board (DSP보드를 이용한 전화음성용 실시간 화자인증 시스템의 구현에 관한 연구)

  • Lee Hyeon Seung;Choi Hong Sub
    • MALSORI / No.49 / pp.145-158 / 2004
  • This paper aims at the implementation of a real-time speaker verification system on a DSP board. The Dialog/4 board, which is based on a microprocessor and a DSP processor, is selected so that telephone signals can be easily controlled and audio/voice signals processed. The speaker verification system performs signal processing and feature extraction after receiving the voice and the claimed speaker ID. Then, by computing the likelihood ratio of the claimed speaker model to the background model, it makes a real-time decision on acceptance or rejection. For the verification experiments, a total of 15 speaker models and 6 background models are used. The experimental results show a verification accuracy of 99.5% when using telephone-speech-based speaker models.
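
The accept/reject rule reduces to a log-likelihood-ratio test. A minimal sketch, assuming GMM speaker and background models (scikit-learn) and a placeholder threshold, is shown below.

```python
from sklearn.mixture import GaussianMixture

def verify(features, speaker_gmm: GaussianMixture,
           background_gmm: GaussianMixture, threshold=0.0):
    """Accept the claimed identity if the average log-likelihood ratio of the
    claimed speaker model over the background model exceeds a threshold.
    The threshold value here is an illustrative placeholder."""
    llr = speaker_gmm.score(features) - background_gmm.score(features)
    return ("accept" if llr > threshold else "reject"), llr
```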
