• Title/Summary/Keyword: 음성추출 (speech extraction)


Design of Multi-Purpose Preprocessor for Keyword Spotting and Continuous Language Support in Korean (한국어 핵심어 추출 및 연속 음성 인식을 위한 다목적 전처리 프로세서 설계)

  • Kim, Dong-Heon;Lee, Sang-Joon
    • Journal of Digital Convergence / v.11 no.1 / pp.225-236 / 2013
  • Research on voice recognition has progressed steadily, and the technology can now handle natural language rather than only isolated words. Interest in voice recognition grew after Siri, the iPhone-based voice recognition software, was presented in 2010. Some voice-enabled services have been implemented with Korean voice recognition software, but their accuracy is insufficient because of background noise and a lack of control over voice-related features. In this paper, we propose a multi-purpose preprocessor to improve this situation. In addition to noise filtering, it supports keyword spotting in continuous speech. It is designed to be independent of any particular voice recognition software, and it can extend its functionality to continuous speech by additionally identifying the pre-predicate and the post-predicate relative to the spotted keyword. Experiments validate the effectiveness of the noise filter, the keyword recognition rate, and the continuous speech recognition rate.

Target Speech Segregation Using Non-parametric Correlation Feature Extraction in CASA System (CASA 시스템의 비모수적 상관 특징 추출을 이용한 목적 음성 분리)

  • Choi, Tae-Woong;Kim, Soon-Hyub
    • The Journal of the Acoustical Society of Korea / v.32 no.1 / pp.79-85 / 2013
  • Feature extraction in a CASA system uses time continuity and channel similarity and builds a correlogram of auditory elements. When channel similarity is measured with the cross-correlation coefficient, quantifying the correlation is computationally expensive. This paper therefore proposes a feature extraction method that uses a non-parametric correlation coefficient to reduce the computational complexity of feature extraction, and tests it by segregating target speech with a CASA system. Measured by SNR (signal-to-noise ratio), the target speech segregation performance of the proposed method shows a slight average improvement of 0.14 dB over the conventional method.
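The abstract contrasts a cross-correlation coefficient with a non-parametric correlation coefficient for measuring channel similarity, but does not name the non-parametric coefficient. The minimal sketch below assumes Spearman's rank correlation and compares it with a Pearson-style coefficient on the normalised autocorrelations of two channels; the function names and test signals are illustrative only, and the sketch makes no claim about the computational savings reported in the paper.

```python
import numpy as np

def autocorr(x, max_lag):
    """Autocorrelation of one channel for lags 0..max_lag-1, normalised by lag 0."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(max_lag)])
    return r / r[0] if r[0] != 0 else r

def pearson(a, b):
    """Parametric (Pearson) correlation coefficient between two vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

def spearman(a, b):
    """Non-parametric (Spearman rank) correlation, assumed here for illustration.
    Uses the closed form 1 - 6*sum(d^2) / (n(n^2 - 1)), which is exact only
    without ties; argsort breaks any ties arbitrarily."""
    n = len(a)
    ra = np.argsort(np.argsort(a))   # rank of each element of a
    rb = np.argsort(np.argsort(b))   # rank of each element of b
    d = (ra - rb).astype(float)
    return float(1.0 - 6.0 * np.sum(d * d) / (n * (n * n - 1)))
```

Two channels driven by the same 200 Hz periodicity (for example `autocorr(np.sin(2 * np.pi * 200 * t), 80)` and the same with a phase shift) yield nearly identical autocorrelations, so both coefficients come out close to 1. Spearman only compares orderings, which makes it insensitive to monotonic distortions of channel amplitude.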

Speech Recognition Error Compensation using MFCC and LPC Feature Extraction Method (MFCC와 LPC 특징 추출 방법을 이용한 음성 인식 오류 보정)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence / v.11 no.6 / pp.137-142 / 2013
  • In a speech recognition system, inaccurate vocabulary can be produced when, because of feature extraction, phonemes are left unrecognized or are recognized as similar phonemes. Therefore, in this paper, we propose a speech recognition error correction method that uses a phoneme similarity rate and reliability measures based on phoneme characteristics. The phoneme similarity rate was obtained from phoneme learning models built with the MFCC and LPC feature extraction methods and was measured together with a reliability rate. Measuring the similar-phoneme rate and the reliability minimizes unrecognized errors, and error compensation was performed on speech identified as erroneous during recognition. Applying the proposed system yielded a speech recognition rate of 98.3% and an error compensation rate of 95.5%.
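LPC is one of the two feature extraction methods the abstract combines with MFCC. As a hedged illustration (not the author's code), the sketch below estimates LPC coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion; the predictor sign convention and the function name `lpc` are assumptions, and the phoneme similarity and reliability measures are not reproduced.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[1..order] via autocorrelation + Levinson-Durbin.
    Convention (assumed): x[n] is predicted as sum_k a[k] * x[n - k].
    Returns (coefficients, final prediction error energy)."""
    x = np.asarray(frame, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient k_i = (r_i - sum_j a_j r_{i-j}) / E_{i-1}
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = a_prev[j] - k * a_prev[i - j]
        err *= 1.0 - k * k           # E_i = (1 - k_i^2) E_{i-1}
    return a[1:], err
```

On a long synthetic AR(2) sequence generated with x[n] = 1.3 x[n-1] - 0.6 x[n-2] + noise, `lpc(x, 2)` recovers coefficients close to `[1.3, -0.6]`.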

A study on Gabor Filter Bank-based Feature Extraction Algorithm for Analysis of Acoustic data of Emergency Rescue (응급구조 음향데이터 분석을 위한 Gabor 필터뱅크 기반의 특징추출 알고리즘에 대한 연구)

  • Hwang, Inyoung;Chang, Joon-Hyuk
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.1345-1347 / 2015
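  • In this paper, to estimate a caller's surroundings from the ambient acoustic signal delivered to the call-taker when an emergency is reported, we introduce a Gabor filter bank-based feature vector extraction technique, which models the spectral characteristics of sound and their variation well, together with a deep neural network, which offers strong classification performance. The proposed Gabor filter bank-based feature extraction first separates speech from non-speech with a non-speech segment detector, extracts 23-dimensional Mel filter bank coefficients from the non-speech segments, and then applies Gabor filters to them to extract feature vectors for estimating the surrounding situation; the trained deep neural network then estimates the caller's location information from these features. The proposed technique was evaluated in various scenario environments and showed excellent classification performance.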
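The pipeline described above runs from 23 Mel filter bank coefficients on non-speech segments, through Gabor filtering, to a DNN classifier. A minimal sketch of the middle step, assuming real 2-D Gabor kernels (Gaussian envelope times a cosine carrier) applied to a frames-by-23 log-Mel matrix and averaged over time; the kernel sizes, carrier frequencies, and time pooling are hypothetical choices, and the non-speech detector and DNN are omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size_t, size_f, omega_t, omega_f):
    """Real 2-D Gabor kernel: Gaussian envelope times a cosine carrier.
    Rows index time frames, columns index Mel channels."""
    n = np.arange(size_t) - (size_t - 1) / 2
    k = np.arange(size_f) - (size_f - 1) / 2
    N, K = np.meshgrid(n, k, indexing="ij")
    envelope = np.exp(-(N ** 2) / (2 * (size_t / 4) ** 2)
                      - (K ** 2) / (2 * (size_f / 4) ** 2))
    g = envelope * np.cos(omega_t * N + omega_f * K)
    return g - g.mean()              # zero DC: flat regions give no response

def gabor_features(log_mel, kernels):
    """'Valid' 2-D correlation of a (frames x channels) log-Mel matrix with
    each kernel, averaged over time and concatenated into one vector."""
    feats = []
    for g in kernels:
        windows = sliding_window_view(log_mel, g.shape)   # (T', F', st, sf)
        response = np.einsum("tfij,ij->tf", windows, g)   # (T', F')
        feats.append(response.mean(axis=0))
    return np.concatenate(feats)
```

With two 9x7 kernels (one temporal, one spectral modulation) and a 100x23 input, each kernel contributes 17 values, giving a 34-dimensional vector that a classifier such as the DNN in the abstract could consume.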

A Merging Algorithm with the Discrete Wavelet Transform to Extract Valid Speech-Sounds (이산 웨이브렛 변환을 이용한 유효 음성 추출을 위한 머징 알고리즘)

  • Kim, Jin-Ok;Hwang, Dae-Jun;Paek, Han-Wook;Chung, Chin-Hyun
    • Journal of KIISE:Computing Practices and Letters / v.8 no.3 / pp.289-294 / 2002
  • A valid speech-sound block can be classified to provide important information for speech recognition. The classification of the speech-sound block relies on the MRA (multi-resolution analysis) property of the DWT (discrete wavelet transform), which is used to reduce the computation time of speech recognition preprocessing. A merging algorithm is proposed to extract valid speech-sounds in terms of position and frequency range. It requires some numerical methods for an adaptive DWT implementation and performs unvoiced/voiced classification and denoising. Since the merging algorithm can determine the processing parameters from the voice alone and is independent of system noise, it is useful for extracting valid speech-sounds. The merging algorithm adapts to arbitrary system noise and achieves an excellent denoising SNR (signal-to-noise ratio).
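The merging algorithm itself is not specified in the abstract. The sketch below only illustrates the ingredients it builds on: a multi-level orthonormal Haar DWT (the multi-resolution analysis) and wavelet-domain denoising. Soft thresholding with the universal threshold sigma * sqrt(2 ln N) is a standard textbook choice swapped in here, not the paper's adaptive method.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (expects even length) -> (approx, detail)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_idwt(approx, detail):
    """Inverse of haar_dwt (perfect reconstruction)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def soft_threshold(d, thr):
    """Shrink coefficients toward zero by thr; those below thr become 0."""
    return np.sign(d) * np.maximum(np.abs(d) - thr, 0.0)

def haar_denoise(x, levels=3):
    """Decompose, soft-threshold every detail band, reconstruct."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx, details = x, []
    for _ in range(levels):                       # multi-resolution analysis
        approx, d = haar_dwt(approx)
        details.append(d)                         # finest band first
    sigma = np.median(np.abs(details[0])) / 0.6745   # noise level estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))      # universal threshold
    for d in reversed(details):                   # coarsest band first
        approx = haar_idwt(approx, soft_threshold(d, thr))
    return approx
```

For a slowly varying signal in white noise, most noise energy lands in the detail bands and is removed, while the coarse approximation keeps the signal.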

A High Speed Pitch Extraction Method Based on Peak Detection and AMDF (Peak 검출과 AMDF에 의한 고속도 음성주기 추출방법)

  • 성원용;은종관
    • Journal of the Korean Institute of Telematics and Electronics / v.17 no.4 / pp.38-44 / 1980
  • We present a high-speed pitch estimation algorithm based on peak detection and the average magnitude difference function (AMDF). A few pitch candidates are first estimated from low-pass filtered (800 Hz) speech by a peak detection algorithm. The AMDF values of the pitch candidates are then calculated, and the candidate that yields the minimum AMDF value is chosen as the pitch period. The new method requires far less computation time than other pitch estimation algorithms while yielding fairly accurate results.
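The three steps in the abstract (peak detection on low-pass filtered speech, AMDF evaluation of the candidates, minimum-AMDF selection) can be sketched as follows. This is a hedged reconstruction, not the original algorithm: it assumes the frame is already low-pass filtered (the 800 Hz filter is not shown), takes candidates as distances from the first strong peak, and adds a hypothetical tolerance `tol` that prefers the shortest lag among near-equal AMDF minima, because the AMDF also dips at multiples of the true period.

```python
import numpy as np

def amdf(x, lag):
    """AMDF(lag) = mean |x[n] - x[n + lag]|."""
    return float(np.mean(np.abs(x[:-lag] - x[lag:])))

def pitch_peak_amdf(x, fs, f_min=60.0, f_max=400.0, rel_peak=0.5, tol=0.05):
    """Return an F0 estimate in Hz, or None if no candidate is found."""
    x = np.asarray(x, dtype=float)
    min_lag, max_lag = int(fs / f_max), int(fs / f_min)
    # 1) peak detection: local maxima above rel_peak * global maximum
    thr = rel_peak * np.max(x)
    peaks = [n for n in range(1, len(x) - 1)
             if x[n] > x[n - 1] and x[n] >= x[n + 1] and x[n] > thr]
    if len(peaks) < 2:
        return None
    # 2) candidate periods: distances from the first peak to later peaks
    cands = sorted({p - peaks[0] for p in peaks[1:]
                    if min_lag <= p - peaks[0] <= max_lag})
    if not cands:
        return None
    # 3) AMDF per candidate; shortest lag whose AMDF is near the minimum wins
    vals = [amdf(x, lag) for lag in cands]
    limit = min(vals) + tol * np.mean(np.abs(x))
    best = next(lag for lag, v in zip(cands, vals) if v <= limit)
    return fs / best
```

Only a handful of AMDF values are computed (one per candidate) instead of a full lag sweep, which reflects the speed argument made in the abstract.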

A study of speaker dependent speech recognition using neural network (신경회로망을 이용한 화자종속 음성인식 성능에 관한 연구)

  • 윤지원;이종수
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.153-156 / 2003
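  • The aim of this study is to improve the performance of speaker-dependent, small-vocabulary speech recognition. To obtain the speech features used for recognition, a 12th-order pattern was extracted per frame using a Wiener filter and LPC and cepstrum analysis. The recognizer that classifies the extracted feature patterns combines a fuzzy inference system, added to improve the recognition rate, with a conventional backpropagation neural network, which performs particularly well in small-vocabulary speech recognition. Experimental results showed that the recognition rate improved compared with using the neural network alone.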
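The features above combine 12th-order LPC with a cepstrum. One common way to obtain cepstral coefficients from LPC coefficients is the standard recursion for an all-pole model, sketched below; whether the authors used exactly this conversion is an assumption, as is the predictor convention (x[n] is predicted as the sum of a_k x[n-k]). The log-gain term c_0 is left out.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model
    1 / (1 - sum_k a_k z^-k), where a[0] holds a_1.
    c_n = a_n + sum_{k=max(1,n-p)}^{n-1} (k/n) c_k a_{n-k}, with a_n = 0 for n > p."""
    p = len(a)
    c = np.zeros(n_ceps + 1)          # c[0] (log-gain term) stays 0
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a first-order model with a_1 = 0.5 the recursion reproduces the closed form c_n = 0.5^n / n, since log(1 / (1 - a z^-1)) expands to the sum of a^n z^-n / n. A 12th-order LPC vector would be passed as `lpc_to_cepstrum(a, 12)`.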

DNN based Robust Speech Feature Extraction and Signal Noise Removal Method Using Improved Average Prediction LMS Filter for Speech Recognition (음성 인식을 위한 개선된 평균 예측 LMS 필터를 이용한 DNN 기반의 강인한 음성 특징 추출 및 신호 잡음 제거 기법)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.1-6 / 2021
  • As DNNs have been applied to speech recognition, its use has been increasing, but parallel DNN training requires more computation than conventional GMMs, and overfitting occurs when the amount of data is small. To solve this problem, we propose an efficient method for robust speech feature extraction and speech signal noise removal that works even when the amount of data is small. The feature extraction efficiently captures speech energy by using the difference in frame energy of speech together with the zero-crossing rate and level-crossing rate, which are affected by the speech signal. Noise is removed with an improved average-prediction LMS filter that loses little speech information and preserves the intrinsic characteristics of speech when the speech signal is detected. The improved LMS filter processes noise in the input speech signal by adjusting the active parameter threshold for the input signal. Compared with the conventional frame energy method, the proposed method gave an error rate of 7% at the speech start point and an 11% improvement in the error rate at the end point.
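The improved average-prediction LMS filter and its threshold adjustment are not detailed in the abstract. For reference, the sketch below is the standard LMS adaptive noise canceller that such filters build on; it assumes a separate noise reference input, which the abstract does not mention, and the tap count and step size `mu` are illustrative values.

```python
import numpy as np

def lms_cancel(primary, reference, n_taps=8, mu=0.01):
    """Standard LMS adaptive noise canceller (not the paper's improved variant).
    primary:   speech + noise picked up through an unknown path
    reference: noise source correlated with that noise (assumed available)
    Returns (error signal, final weights); the error converges toward the speech."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(n_taps)
    e = np.zeros_like(primary)
    buf = np.zeros(n_taps)            # reference[n], reference[n-1], ...
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        y = w @ buf                   # current noise estimate
        e[n] = primary[n] - y         # cleaned output sample
        w += 2 * mu * e[n] * buf      # LMS weight update
    return e, w
```

Because the speech is uncorrelated with the reference, minimising the output power removes only the noise component; `mu` trades convergence speed against residual misadjustment.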

Channel-attentive MFCC for Improved Recognition of Partially Corrupted Speech (부분 손상된 음성의 인식 향상을 위한 채널집중 MFCC 기법)

  • 조훈영;지상문;오영환
    • The Journal of the Acoustical Society of Korea / v.22 no.4 / pp.315-322 / 2003
  • We propose a channel-attentive Mel frequency cepstral coefficient (CAMFCC) extraction method to improve the recognition performance of speech that is partially corrupted in the frequency domain. This method introduces weighting terms both at the filter bank analysis step and at the output probability calculation of decoding step. The weights are obtained for each frequency channel of filter bank such that the more reliable channel is emphasized by a higher weight value. Experimental results on TIDIGITS database corrupted by various frequency-selective noises indicated that the proposed CAMFCC method utilizes the uncorrupted speech information well, improving the recognition performance by 11.2% on average in comparison to a multi-band speech recognition system.
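CAMFCC applies channel weights both at the filter bank analysis step and in the decoder's output probability calculation. The sketch below covers only the first part, under stated assumptions: each channel's reliability is supplied as a hypothetical per-channel SNR estimate, mapped to a weight with a sigmoid and normalised to mean 1, and the weighted log filter bank energies are passed through a DCT-II. The paper's actual weight estimation and the decoding-side weighting are not reproduced.

```python
import numpy as np

def dct_ii(x, n_ceps):
    """Unnormalised DCT-II along the last axis, keeping the first n_ceps terms."""
    m = x.shape[-1]
    k = np.arange(n_ceps)[:, None]
    j = np.arange(m)[None, :]
    basis = np.cos(np.pi * k * (j + 0.5) / m)     # (n_ceps, m)
    return x @ basis.T

def channel_weighted_mfcc(fbank_energy, channel_snr_db, n_ceps=13):
    """fbank_energy: (frames, channels) Mel filter bank energies.
    channel_snr_db: (channels,) hypothetical reliability estimate per channel."""
    # hypothetical mapping: more reliable (higher SNR) channel -> larger weight
    w = 1.0 / (1.0 + np.exp(-np.asarray(channel_snr_db, dtype=float) / 5.0))
    w = w / w.mean()                              # mean weight of 1
    log_e = np.log(np.maximum(fbank_energy, 1e-10))
    return dct_ii(log_e * w, n_ceps)
```

With equal reliability in every channel the weights are all 1 and the result reduces to plain MFCCs, so the weighting only changes the features when channels differ in reliability.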

An Emotion Recognition Technique Using Speech Signals (음성신호를 이용한 감정인식)

  • Jeong, Byeong-Uk;Cheon, Seong-Pyo;Kim, Yeon-Tae;Kim, Seong-Sin
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.123-126 / 2007
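  • This paper studies emotion recognition using speech signals. Research on emotion recognition serves human-machine interaction as human interface technology advances. In this study, we analyze emotion from speech signals, which first requires extracting features of the speech signal. We aimed at emotion recognition tailored to the individual speaker, so we extracted speech features with Perceptual Linear Prediction (PLP) analysis, a speech signal analysis technique widely used in speaker recognition. Using PLP analysis, we generated personalized emotion patterns and built a simple algorithm that can evaluate emotion from speech signals in real time.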
