• Title/Summary/Keyword: 음성추출 (speech extraction)

Search Result 982, Processing Time 0.035 seconds

Normalized Recognition Method using Characteristic Vector of Speech Signal (음성의 특징벡터를 사용한 정규화 인식수법)

  • Choi, Jae-Seung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2011.10a
    • /
    • pp.616-618
    • /
    • 2011
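  • This paper proposes a recognition algorithm for speech recognition that extracts feature vectors from speech. The proposed method normalizes human speech and performs recognition with a time-delay neural network. The time-delay neural network is trained on the input speech information for a fixed period and then recognizes newly input information. The experiments confirm the effectiveness of the algorithm in terms of the speech recognition rate.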

A New Feature for Speech Segments Extraction with Hidden Markov Models (숨은마코프모형을 이용하는 음성구간 추출을 위한 특징벡터)

  • Hong, Jeong-Woo;Oh, Chang-Hyuck
    • Communications for Statistical Applications and Methods
    • /
    • v.15 no.2
    • /
    • pp.293-302
    • /
    • 2008
  • In this paper we propose a new feature, average power, for speech segment extraction with hidden Markov models; it is based on the mel frequencies of speech signals. The average power is compared with the mel-frequency cepstral coefficients (MFCC) and the power coefficient. To compare the performance of the three types of features, speech data are collected for words with plosives, which are generally known to be hard to detect. Experiments show that the average power is more accurate and efficient than MFCC and the power coefficient for speech segment extraction in environments with various levels of noise.

Korean Single-Vowel Recognition Using Cumulants in Color Noisy Environment (유색 잡음 환경하에서 Cumulant를 이용한 한국어 단모음 인식)

  • Lee, Hyung-Gun;Yang, Won-Young;Cho, Yong-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.2
    • /
    • pp.50-59
    • /
    • 1994
  • This paper presents a speech recognition method that uses third-order cumulants as a feature vector and a neural network for recognition. The use of higher-order cumulants provides desirable uncoupling between Gaussian noise and speech, which enables us to estimate the coefficients of the AR model without bias. Unlike the conventional method using second-order statistics, the proposed one exhibits low bias even at an SNR as low as 0 dB, at the expense of higher variance. Computer simulation confirms that the recognition rate for Korean single vowels with the cumulant-based method is much higher than that of the conventional method, even at low SNR.

Lip Detection using Color Distribution and Support Vector Machine for Visual Feature Extraction of Bimodal Speech Recognition System (바이모달 음성인식기의 시각 특징 추출을 위한 색상 분석자 SVM을 이용한 입술 위치 검출)

  • 정지년;양현승
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.4
    • /
    • pp.403-410
    • /
    • 2004
  • Bimodal speech recognition systems have been proposed to enhance the recognition rate of ASR in noisy environments. Visual feature extraction is very important in developing these systems, and extracting visual features requires detecting the exact lip position. This paper proposes a method that detects the lip position using a color similarity model and an SVM. The face/lip color distribution is learned and used to find the initial lip position. The exact lip position is then detected by scanning the neighboring area with the SVM. Experiments show that this method detects the lip position accurately and quickly.

Robust Distributed Speech Recognition under noise environment using MESS and EH-VAD (멀티밴드 스펙트럼 차감법과 엔트로피 하모닉을 이용한 잡음환경에 강인한 분산음성인식)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.1
    • /
    • pp.101-107
    • /
    • 2011
  • Background noise and channel distortion are major factors that hinder the practical use of speech recognition. Noise usually reduces the performance of a speech recognition system, and speech recognition based on DSR (Distributed Speech Recognition) also has difficulty improving performance for this reason. Therefore, to improve DSR-based speech recognition in noisy environments, this paper proposes a method that detects the speech region accurately in order to extract accurate features. The proposed method distinguishes speech from noise by using entropy and by detecting the spectral energy of speech. Speech detection based on spectral energy performs well at relatively high SNR (15 dB). However, when the noise environment varies, the threshold between speech and noise also varies, and speech detection performance degrades at low SNR (0 dB). The proposed method uses the spectral entropy and harmonics of speech for better speech detection. The performance of the AFE also improves with more precise speech detection. According to the experimental results, the proposed method shows better recognition performance in noisy environments.
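
The spectral entropy mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's EH-VAD: the harmonic component and any decision threshold are omitted, and the function names, the naive DFT, and the normalization to [0, 1] are assumptions made here for illustration. The idea is that a frame whose energy is concentrated in a few frequency bins (as in voiced speech) has lower entropy than a frame with a flat spectrum.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at each non-negative frequency bin, via a naive O(n^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # all-zero frame: convention assumed for this sketch
    probs = [p / total for p in power]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


# A pure tone concentrates energy in one bin; an impulse spreads it evenly.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63
print(spectral_entropy(tone) < spectral_entropy(impulse))
```

A detector built on this would compare the per-frame entropy against a threshold, which, as the abstract notes for energy-based detection, has to track the noise conditions.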

The Recognition of Korean Syllables using Parameter Based on Principal Component Analysis (PCA 기반 파라메타를 이용한 숫자음 인식)

  • 박경훈;표창수;김창근;허강인
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.181-184
    • /
    • 2000
  • A new feature extraction method is proposed that, unlike conventional voice feature extraction methods, considers the statistical characteristics of the human voice. The method applies PCA (Principal Component Analysis), which finds the axis direction with the greatest variance in the input dimension and then removes redundancy in the data. The new method is applied to real voice recognition to assess its performance. When the digit recognition results in this paper are compared with those of the conventional mel-cepstrum voice feature parameter, the recognition rates differ by 0.5%. A better recognition rate is expected in word or sentence recognition, because the method needs less convergence time than the conventional method when extracting voice features. A better recognition rate is also expected when the optimum vector is chosen according to the statistical characteristics of the data.
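
The PCA step described here, finding the direction of greatest variance and projecting onto it, can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's feature pipeline: the function names are hypothetical, only the first principal axis is computed, and power iteration stands in for a full eigendecomposition.

```python
import math


def first_principal_axis(rows, iterations=200):
    """Return (mean, unit axis) for the direction of greatest variance."""
    n = len(rows)
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration; assumes the start vector is not orthogonal to the axis.
    axis = [1.0] * dim
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # zero-variance data: keep the current axis
        axis = [v / norm for v in nxt]
    return mean, axis


def project(row, mean, axis):
    """Coordinate of one sample along the principal axis."""
    return sum((row[j] - mean[j]) * axis[j] for j in range(len(row)))


# Points lying on the line y = 2x: the axis should be parallel to (1, 2).
rows = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, axis = first_principal_axis(rows)
print(axis, project(rows[3], mean, axis))
```

Keeping only the leading projections is what removes the redundancy: correlated input dimensions collapse onto a few axes.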

A Study on Approximation-Synthesis of Transition Segment in Speech Signal (음성신호에서 천이구간의 근사합성에 관한 연구)

  • Lee See-Woo
    • The Journal of the Korea Contents Association
    • /
    • v.5 no.3
    • /
    • pp.167-173
    • /
    • 2005
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist in a frame. I therefore propose a TSIUVC (Transition Segment Including Unvoiced Consonant) extraction method that uses pitch pulses and the zero-crossing rate so that voiced and unvoiced consonants do not coexist in a frame. This paper also presents a TSIUVC approximation-synthesis method based on frequency band division. As a result, the method obtains a high-quality approximation-synthesis waveform within the TSIUVC by using frequency information below 0.547 kHz and above 2.813 kHz. The TSIUVC extraction rate was 91% for female voices and 96.2% for male voices. This method can be applied to a new Voiced/Silence/TSIUVC speech coding scheme, speech analysis, and speech synthesis.
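
Of the two cues named in this abstract, the zero-crossing rate is simple enough to sketch. The function name and the per-frame normalization are assumptions made for illustration; the abstract does not give the frame length, the thresholds, or how the rate is combined with pitch pulses.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(frame) - 1)


# Noise-like unvoiced sounds change sign often; voiced sounds change slowly.
print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))
print(zero_crossing_rate([1.0, 2.0, 3.0, 4.0]))
```

A high rate in a frame is a common hint that it contains an unvoiced consonant, which is why the measure is useful for locating transition segments.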

A Study on Multi-Pulse Speech Coding Method by using Selected Information in a Frequency Domain (주파수 영역의 선택정보를 이용한 멀티펄스 음성부호화 방식에 관한 연구)

  • Lee See-Woo
    • Journal of Internet Computing and Services
    • /
    • v.7 no.4
    • /
    • pp.57-66
    • /
    • 2006
  • In this paper, I propose a new multi-pulse speech coding method (FBD-MPC: Frequency Band Division MPC) that uses TSIUVC (Transition Segment Including UnVoiced Consonant) searching, extraction, and approximation-synthesis in the frequency domain. The TSIUVC extraction rates are 84.8% (plosive), 94.9% (fricative), and 92.3% (affricate) for female voices, and 88% (plosive), 94.9% (fricative), and 92.3% (affricate) for male voices. I also obtain high-quality approximation-synthesis waveforms within the TSIUVC by using frequency information below 0.547 kHz and above 2.813 kHz. I evaluate the MPC, which uses voiced/unvoiced switching information, and the FBD-MPC, which uses voiced/silence/TSIUVC switching information. The results show that the synthesized speech of the FBD-MPC has better speech quality than that of the MPC.

Speech emotion recognition for affective human robot interaction (감성적 인간 로봇 상호작용을 위한 음성감정 인식)

  • Jang, Kwang-Dong;Kwon, Oh-Wook
    • 한국HCI학회:학술대회논문집
    • /
    • 2006.02a
    • /
    • pp.555-558
    • /
    • 2006
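  • Speech that carries emotion is one of the elements that lets a listener grasp the speaker's psychological state. We present a method that extracts features and classifies emotions by recognizing the emotion contained in a speech signal, with the aim of smooth affective interaction between humans and robots. Basic features consisting of acoustic and prosodic information are extracted from the speech signal, and a feature vector of statistics computed from them is fed to a support vector machine (SVM) based pattern classifier, which classifies the speech into six emotions: angry, bored, happy, neutral, sad, and surprised. The SVM-based recognition experiment yielded a recognition rate of 51.4%, while human judgment yielded 60.4%. In addition, when the emotion labels in the database, originally judged by the speakers, were replaced with the emotional states judged by multiple listeners, the SVM classified emotions with 51.2% accuracy, which indicates that the basic features used for emotion recognition are effective.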

Speech Signal Processing for Performance Improvement of Text-Based Video Segmentation (문자정보 기반 비디오 분할에서 성능 향상을 위한 음성신호처리)

  • 이용주;손종목;강경옥;배건성
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 1999.11b
    • /
    • pp.187-191
    • /
    • 1999
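  • Text contained in the images of a video program can be used to segment the video for content retrieval and indexing. In general, text appearing within a scene has low resolution and varies in size and shape, so it is difficult to extract and recognize; it is also often unintended background text, which makes it hard to use for content-based retrieval. However, by detecting the frames where text starts and stops appearing and segmenting the video program at those points, content-based summary information can be created and used for content retrieval and indexing. Because audio is generally not considered at all when a video is segmented by text extraction, the speech signal starts and ends at arbitrary points within a word, phrase, or syllable when a segment is played back, which sounds unnatural. This paper therefore studies a method for news broadcast video programs that detects the speech signal in the interval around the start and end frames of video containing text, then processes it appropriately so that the speech sounds more natural when the segmented video is played back.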