• Title/Summary/Keyword: 음성추출

Search Result 987, Processing Time 0.029 seconds

A Study on Speech Period and Pitch Detection for Continuous Speech Recognition (연속음성인식을 위한 음성구간과 피치검출에 관한 연구)

  • Kim Tai Suk;Chang jong chil
    • Journal of Korea Multimedia Society
    • /
    • v.8 no.1
    • /
    • pp.56-61
    • /
    • 2005
  • In this thesis, propose speech period and pitch detection for continuous speech recognition. This mathod is distinguishes between vowel and consonant to frame unit in continuous speech, for distinguishable voice. Powerful extraction of speech period could threshold energy make use of input signal to real noise environment. Also algorithm of this method distinguish between vowel and consonant at the same time in voice make use of zero crossing rate and short time energy to extractible speech period.

  • PDF

Speaker Recognition Technique by Extracting Speech Feature Vector using Wiener Filter Method (위너필터 방법을 사용한 음성 특징 벡터 추출에 의한 화자인식 기법)

  • Choi, Jae-seung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.617-618
    • /
    • 2017
  • 음성인식의 적절한 성능을 구하기 위하여 잡음환경 하에서 최적인 음성의 특징 벡터를 선택할 필요가 있다. 본 논문에서는 위너필터 방법과 인간의 청각계의 특성을 활용한 멜 주파수 켑스트럼 계수를 사용한 음성인식 방법을 제안한다. 본 논문에서 제안하는 음성의 특징 벡터는 음성 중에서 배경잡음을 제거한 후에 깨끗한 음성신호의 벡터를 추출하는 방법이며, 다층 퍼셉트론 신경회로망에 멜 주파수 켑스트럼 계수를 입력하여 학습시킴으로써 음성인식을 구현한다. 본 실험에서는 멜 주파수 켑스트럼 계수의 특징 벡터를 사용하여 백색잡음이 혼합된 경우에 대하여 음성인식 실험을 실시하였다.

  • PDF

A Study on Pitch Extraction Method using FIR-STREAK Digital Filter (FIR-STREAK 디지털 필터를 사용한 피치추출 방법에 관한 연구)

  • Lee, Si-U
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.1
    • /
    • pp.247-252
    • /
    • 1999
  • In order to realize a speech coding at low bit rates, a pitch information is useful parameter. In case of extracting an average pitch information form continuous speech, the several pitch errors appear in a frame which consonant and vowel are coexistent; in the boundary between adjoining frames and beginning or ending of a sentence. In this paper, I propose an Individual Pitch (IP) extraction method using residual signals of the FIR-STREAK digital filter in order to restrict the pitch extraction errors. This method is based on not averaging pitch intervals in order to accomodate the changes in each pitch interval. As a result, in case of Ip extraction method suing FIR-STREAK digital filter, I can't find the pitch errors in a frame which consonant and vowel are consistent; in the boundary between adjoining frames and beginning or ending of a sentence. This method has the capability of being applied to many fields, such as speech coding, speech analysis, speech synthesis and speech recognition.

  • PDF

Speech Synthesis Based on CVC Speech Segments Extracted from Continuous Speech (연속 음성으로부터 추출한 CVC 음성세그먼트 기반의 음성합성)

  • 김재홍;조관선;이철희
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.7
    • /
    • pp.10-16
    • /
    • 1999
  • In this paper, we propose a concatenation-based speech synthesizer using CVC(consonant-vowel-consonant) speech segments extracted from an undesigned continuous speech corpus. Natural synthetic speech can be generated by a proper modelling of coarticulation effects between phonemes and the use of natural prosodic variations. In general, CVC synthesis unit shows smaller acoustic degradation of speech quality since concatenation points are located in the consonant region and it can properly model the coarticulation of vowels that are effected by surrounding consonants. In this paper, we analyze the characteristics and the number of required synthesis units of 4 types of speech synthesis methods that use CVC synthesis units. Furthermore, we compare the speech quality of the 4 types and propose a new synthesis method based on the most promising type in terms of speech quality and implementability. Then we implement the method using the speech corpus and synthesize various examples. The CVC speech segments that are not in the speech corpus are substituted by demonstrate speech segments. Experiments demonstrate that CVC speech segments extracted from about 100 Mbytes continuous speech corpus can produce high quality synthetic speech.

  • PDF

Feature Extraction from the Strange Attractor for Speaker Recognition (화자인식을 위한 어트랙터로 부터의 음성특징추출)

  • Kim, Tae-Sik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.2E
    • /
    • pp.26-31
    • /
    • 1994
  • A new feature extraction technique utilizing strange attractor and artificial neural network for speaker recognition is presented. Since many signals change their characteristics over long periods of time, simple time-domain processing techniques should e capable of providing useful information of signal features. In many cases, normal time series can be viewed as a dynamical system with a low-dimensional attractor that can be reconstructed from the time series using time delay. The reconstruction of strange attractor is described. In the technique, the raw signal will be reproduced into a geometric three dimensional attractor. Classification decision for speaker recognition is based upon the processing or sets of feature vectors that are derived from the attractor. Three different methods for feature extraction will be discussed. The methods include box-counting dimension, natural measure with regular hexahedron and plank-type box. An artificial neural network is designed for training the feature data generated by the method. The recognition rates are about 82%-96% depending on the extraction method.

  • PDF

Modified Mel Frequency Cepstral Coefficient for Korean Children's Speech Recognition (한국어 유아 음성인식을 위한 수정된 Mel 주파수 캡스트럼)

  • Yoo, Jae-Kwon;Lee, Kyoung-Mi
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.3
    • /
    • pp.1-8
    • /
    • 2013
  • This paper proposes a new feature extraction algorithm to improve children's speech recognition in Korean. The proposed feature extraction algorithm combines three methods. The first method is on the vocal tract length normalization to compensate acoustic features because the vocal tract length in children is shorter than in adults. The second method is to use the uniform bandwidth because children's voice is centered on high spectral regions. Finally, the proposed algorithm uses a smoothing filter for a robust speech recognizer in real environments. This paper shows the new feature extraction algorithm improves the children's speech recognition performance.

Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution (가우시안 분포에서 Maximum Log Likelihood를 이용한 벡터 양자화 기반 음성 인식 성능 향상)

  • Chung, Kyungyong;Oh, SangYeob
    • Journal of Digital Convergence
    • /
    • v.16 no.11
    • /
    • pp.335-340
    • /
    • 2018
  • Commercialized speech recognition systems that have an accuracy recognition rates are used a learning model from a type of speaker dependent isolated data. However, it has a problem that shows a decrease in the speech recognition performance according to the quantity of data in noise environments. In this paper, we proposed the vector quantization based speech recognition performance improvement using maximum log likelihood in Gaussian distribution. The proposed method is the best learning model configuration method for increasing the accuracy of speech recognition for similar speech using the vector quantization and Maximum Log Likelihood with speech characteristic extraction method. It is used a method of extracting a speech feature based on the hidden markov model. It can improve the accuracy of inaccurate speech model for speech models been produced at the existing system with the use of the proposed system may constitute a robust model for speech recognition. The proposed method shows the improved recognition accuracy in a speech recognition system.

Performance of analysis and extraction of speech feature using characteristics of basilar membrane (기저막 특성을 이용한 새로운 음성 특징 추출 및 성능 분석)

  • 이철희;신유식;정성환;김종교
    • Proceedings of the IEEK Conference
    • /
    • 2000.09a
    • /
    • pp.153-156
    • /
    • 2000
  • 본 논문에서는 음성 인식률 향상을 위한 여러 가지방법들 중에서 음성특징 파라미터 추출 방법에 관한 한가지 방법을 제시하였다. 본 논문에서는 청각 특성을 기반으로 한 MFCC(met frequency cepstrum coef-ficients)와 성능 향상을 위한 방법으로 GFCC (gamma-tone filter frequency cepstrum coefficients)를 제시하고 음성 인식을 수행하여 성능을 분석하였다. MFCC에서 일반적으로 사용하는 임계 대역 필터로 삼각 필터(triangular filter) 대신 청각 구조의 기저막(basilar membrane)특성을 묘사한 gammatone 대역 통과 필터를 이용하여 특징 파라미터를 추출하였다. DTW 알고리즘으로 인식률을 분석한 결과 삼각 대역 필터를 이용한 것보다 gammatone 대역 통과 필터를 이용한 추출법이 약 2∼3%의 성능 향상을 보였다.

  • PDF

A Study on the Adaptive Method for Extracting Optimum Features of Speech Signal (음성신호의 최적특징을 적응적으로 추출하는 방법에 관한 연구)

  • 장승관;차태호;최웅세;김창석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.2
    • /
    • pp.373-380
    • /
    • 1994
  • In this paper, we proposed a method of extracting optimum features of speech signal to adjust signal level. For extracting features of speech signal we used FRLS(Fast Recursive Least Square) algorithm, we adjusted each frames of equal to constant level, and extracted optimum features of speech signal by using equalized autocorrelation function proposed in this paper.

  • PDF

A study on pitch detection for RUI emotion classification based on voice (RUI용 음성신호기반의 감정분류를 위한 피치검출기에 관한 연구)

  • Byun, Sung-Woo;Lee, Seok-Pil
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2015.07a
    • /
    • pp.421-424
    • /
    • 2015
  • 컴퓨터 기술이 발전하고 컴퓨터 사용이 일반화 되면서 휴먼 인터페이스에 대한 많은 연구들이 진행되어 왔다. 휴먼 인터페이스에서 감정을 인식하는 기술은 컴퓨터와 사람간의 상호작용을 위해 중요한 기술이다. 감정을 인식하는 기술에서 분류 정확도를 높이기 위해 특징벡터를 정확하게 추출하는 것이 중요하다. 본 논문에서는 정확한 피치검출을 위하여 음성신호에서 음성 구간과 비 음성구간을 추출하였으며, Speech Processing 분야에서 사용되는 전 처리 기법인 저역 필터와 유성음 추출 기법, 후처리 기법인 Smoothing 기법을 사용하여 피치 검출을 수행하고 비교하였다. 그 결과, 전 처리 기법인 유성음 추출 기법과 후처리 기법인 Smoothing 기법은 피치 검출의 정확도를 높였고, 저역 필터를 사용한 경우는 피치 검출의 정확도가 떨어트렸다.

  • PDF