• Title/Summary/Keyword: speech rate characteristic

A Study on the Improvement of DTW with Speech Silence Detection (음성의 묵음구간 검출을 통한 DTW의 성능개선에 관한 연구)

  • Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
    • Speech Sciences / v.10 no.4 / pp.117-124 / 2003
  • Speaker recognition is a technology that confirms a speaker's identity from the characteristics of his or her speech. It is divided into speaker identification and speaker verification: the former picks the speaker out of a preregistered group and recognizes the word, while the latter verifies the identity that a speaker claims. Because it extracts speaker information from speech to confirm individual identity, it has become one of the most efficient technologies as services over the telephone network have spread. Several problems, however, must be solved before real application. First, a safe method of rejecting impostors is needed, because recognition is not performed only for preregistered customers. Second, the characteristics of speech change over time, which severely degrades the recognition rate and inconveniences users as the number of times they must utter the text increases. Third, characteristics common to different speakers cause wrong recognition results. In addition, silent parts in the middle of the speech lower the identification rate. In this paper, we propose improving the identification rate by removing the silent parts before running the identification algorithm. The speech region is detected with two measures, the zero crossing rate and the signal energy, which locate the starting point and end point of the speech; the DTW algorithm is then applied. As a result, the proposed method improved the recognition rate by about 3% compared with conventional methods.
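
The pipeline this abstract describes (detect speech with energy and zero crossing rate, drop silence, then run DTW) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, frame length, thresholds, and the use of raw samples in place of per-frame feature vectors are all assumptions made for the example.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def remove_silence(signal, frame_len=4, energy_thresh=0.01,
                   zcr_thresh=0.5, energy_floor=0.001):
    """Keep frames that look like speech; drop silent ones, including
    silence in the middle of the utterance.

    A frame counts as speech if its energy is high, or if it has a high
    zero crossing rate with at least a little energy (a rough stand-in
    for weak unvoiced sounds). All thresholds are illustrative assumptions.
    """
    kept = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        energy = frame_energy(frame)
        is_speech = energy > energy_thresh or (
            zero_crossing_rate(frame) > zcr_thresh and energy > energy_floor
        )
        if is_speech:
            kept.extend(frame)
    return kept


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences,
    using absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # insertion
                                     cost[i][j - 1],      # deletion
                                     cost[i - 1][j - 1])  # match
    return cost[n][m]


# Toy data: the test utterance is the template with a silent gap in the middle.
template = [1.0, -1.0, 1.0, -1.0, 0.5, -0.5, 0.5, -0.5]
utterance = template[:4] + [0.0] * 4 + template[4:]

print(dtw_distance(template, utterance))                  # with the gap
print(dtw_distance(template, remove_silence(utterance)))  # gap removed
```

In this toy case the gap forces DTW to align silent samples against speech samples, so the distance grows; after silence removal the two sequences match exactly.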

Speech Rate and Pause Characteristics in Patients with Parkinson's Disease (파킨슨병 환자의 말 속도와 쉼 특성)

  • Ko, Yol-Mae;Kim, Deog-Young;Choi, Yae-Lin;Kim, Hyang-Hee
    • Phonetics and Speech Sciences / v.2 no.4 / pp.173-184 / 2010
  • The purpose of this study is to investigate the speech rate characteristics (whole speech rate, articulation speech rate, and articulation percentage) and the pause characteristics (pause duration, pause frequency, and pause percentage) of Korean-speaking patients with idiopathic Parkinson's disease (hereafter IPD). The study aims first to examine the differences in these measurements between a patient group with IPD and a group without IPD, and second to investigate how the measurements of the two groups change with sentence length. There were two groups of subjects. The first group consisted of 7 subjects between the ages of 50 and 60 who had been diagnosed with IPD of mild severity, and the second group consisted of 13 subjects without IPD who were matched to the first group in age and gender. Both groups were asked to read 8 sentences of different lengths at their habitual speed, and the speech rate and pause characteristics of the two groups were measured and compared. The following results were observed. First, regarding speech rate characteristics, the whole speech rate and the articulation speech rate of the patient group were within the normal range, as in the group without IPD. Regarding pause characteristics, however, the two groups differed: the patient group had shorter pause duration, lower pause frequency, lower pause percentage, and higher articulation percentage. Second, regarding sentence length, both groups showed a tendency for whole speech rate and articulation rate to increase as sentence length increased, but the pause characteristics differed between the two groups. While the group without IPD showed longer pause duration, higher pause frequency, and higher pause percentage as sentence length increased, the patient group showed no differences with sentence length. This study suggests that patients with IPD of mild severity retain a normal speech rate, and it shows that the pause characteristics of the patient group differ qualitatively from those of the group without IPD. Future studies are needed on the speech rate and pause characteristics of Korean-speaking patients with IPD of various severities.
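
To make the six measures named in this abstract concrete, here is a small sketch of how they might be computed from a time-aligned reading. The abstract does not give the formulas, so the definitions below (rates in syllables per second, pause duration as the mean pause length, pause frequency as a simple count) are assumptions, as are the function name and the segment format.

```python
def speech_rate_measures(segments):
    """Compute speech rate and pause measures for one utterance.

    segments: list of (kind, duration_s, syllables) tuples, where kind is
    "speech" or "pause" (hypothetical input format for this sketch).
    """
    total_time = sum(duration for _, duration, _ in segments)
    pauses = [duration for kind, duration, _ in segments if kind == "pause"]
    pause_time = sum(pauses)
    articulation_time = total_time - pause_time
    syllables = sum(n for kind, _, n in segments if kind == "speech")
    return {
        # syllables per second over the whole utterance, pauses included
        "whole_speech_rate": syllables / total_time,
        # syllables per second over articulated time only
        "articulation_rate": syllables / articulation_time,
        "articulation_percentage": 100.0 * articulation_time / total_time,
        "pause_frequency": len(pauses),
        "pause_duration": pause_time / len(pauses) if pauses else 0.0,
        "pause_percentage": 100.0 * pause_time / total_time,
    }


# Toy sentence: 10 syllables, a 0.5 s pause, then 8 more syllables.
reading = [("speech", 2.0, 10), ("pause", 0.5, 0), ("speech", 1.5, 8)]
for name, value in speech_rate_measures(reading).items():
    print(name, round(value, 3))
```

Under these definitions a speaker who pauses less has a higher articulation percentage even when the articulation rate is unchanged, which is the pattern the abstract reports for the patient group.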

Very Low Bit Rate Speech Coder of Analysis by Synthesis Structure Using ZINC Function Excitation (ZINC 함수 여기신호를 이용한 분석-합성 구조의 초 저속 음성 부호화기)

  • Seo, Sang-Won;Kim, Young-Jun;Kim, Jong-Hak;Kim, Young-Ju;Lee, In-Sung
    • Proceedings of the IEEK Conference / 2006.06a / pp.349-350 / 2006
  • This paper presents a very low bit rate speech coder, ZFE-CELP (ZINC Function Excitation-Code Excited Linear Prediction). The ZFE-CELP speech codec is based on ZINC function modeling and CELP modeling of the excitation signal, applied respectively to voiced and unvoiced frames according to the frame characteristic. The paper also suggests strategies to improve the speech quality of the very low bit rate speech coder.

Method of Speech Feature Parameter Extraction Using Modified-MFCC (Modified-MFCC를 이용한 음성 특징 파라미터 추출 방법)

  • 이상복;이철희;정성환;김종교
    • Proceedings of the IEEK Conference / 2001.06d / pp.269-272 / 2001
  • In speech recognition, every talker's utterances have particular resonant frequencies that depend on the shape of the talker's lips and the motion of the tongue, so utterances differ from talker to talker. We therefore need a superior method of speech feature parameter extraction that reflects the talker's characteristics well. This paper proposes a modified MFCC that combines the existing MFCC with a gammatone filter. In experiments with telephone speech data, it achieved a higher speech recognition rate than the other methods.

Extraction of Speaker Recognition Parameter Using Chaos Dimension (카오스차원에 의한 화자식별 파라미터 추출)

  • Yoo, Byong-Wook;Kim, Chang-Seok
    • Speech Sciences / v.1 / pp.285-293 / 1997
  • This paper investigates the strange attractor of speech, treating speech as chaotic in the sense that a random-looking signal arises from a deterministic system. The delay time needed to construct an attractor that fits the speech signal is derived from the AR model power spectrum. Applying Takens' embedding theorem with this delay time yields an exact correlation dimension. Treating speech in this way is found to give a parameter that carries more speaker-specific characteristics and achieves a high speaker discrimination rate.

Implementation of Wideband Waveform Interpolation Coder for TTS DB Compression (TTS DB 압축을 위한 광대역 파형보간 부호기 구현)

  • Yang, Hee-Sik;Hahn, Min-Soo
    • MALSORI / v.55 / pp.143-158 / 2005
  • An adequate compression algorithm is essential to achieving a high-quality embedded TTS system. In this paper, after investigating many speech coders, we propose a waveform interpolation coder for TTS corpus compression. Unlike in speech coders for communication systems, compression rate and quality matter more than other performance criteria in TTS DB compression. We therefore select the waveform interpolation algorithm because it provides good speech quality at a high compression rate, at the cost of complexity. The implemented coder has a bit rate of 6 kbps with a quality degradation of 0.47. This performance indicates that waveform interpolation is adequate for TTS DB compression, given some further study.

Korean Speech Recognition Based on Syllable (음절을 기반으로한 한국어 음성인식)

  • Lee, Young-Ho;Jeong, Hong
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.1 / pp.11-22 / 1994
  • In a conventional word-based system, it is very difficult to enlarge the vocabulary. To cope with this problem, we must use more fundamental units of speech, such as syllables and phonemes. Korean speech consists of initial consonants, middle vowels and final consonants, and it has the characteristic that syllables can be obtained from speech easily. In this paper, we present a speech recognition system that takes advantage of the syllable characteristics peculiar to Korean speech. The recognition algorithm is the Time Delay Neural Network. To handle the many recognition units, the system consists of separate recognition neural networks for initial consonants, middle vowels and final consonants. The system first recognizes initial consonants, middle vowels and final consonants, and then uses these results to recognize isolated words. In experiments, we obtained recognition rates of 85.12% on 2735 initial consonant data, 86.95% on 3110 middle vowel data, and 90.58% on 1615 final consonant data, and a 71.2% recognition rate on 250 isolated word data.

Performance of GMM and ANN as a Classifier for Pathological Voice

  • Wang, Jianglin;Jo, Cheol-Woo
    • Speech Sciences / v.14 no.1 / pp.151-162 / 2007
  • This study focuses on the classification of pathological voice using a GMM (Gaussian Mixture Model) and compares the results with previous work that used an ANN (Artificial Neural Network). Speech data from normal people and patients were collected, then diagnosed and classified into two categories. Six characteristic parameters (Jitter, Shimmer, NHR, SPI, APQ and RAP) were chosen. Classification methods based on the artificial neural network and the Gaussian mixture model were then employed to discriminate the data into normal and pathological speech. The GMM method attained a 98.4% average correct classification rate on training data and a 95.2% average correct classification rate on test data. Different numbers of mixtures (3 to 15) were used in the GMM to find an optimal condition for classification. We also compared the average classification rates of GMM, ANN and HMM. The proper number of mixtures for the Gaussian model needs to be investigated in future work.

Isolated-Word Speech Recognition in Telephone Environment Using Perceptual Auditory Characteristic (인지적 청각 특성을 이용한 고립 단어 전화 음성 인식)

  • Choi, Hyung-Ki;Park, Ki-Young;Kim, Chong-Kyo
    • Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.2 / pp.60-65 / 2002
  • In this paper, we propose the GFCC (gammatone filter frequency cepstrum coefficient) parameter, which is based on auditory characteristics, to achieve a better speech recognition rate. Speech recognition experiments were performed on isolated words acquired from the telephone network. To compare the GFCC parameter with other parameters, recognition experiments were also carried out using the MFCC and LPCC parameters. In addition, each parameter was tested with and without CMS (cepstral mean subtraction), which compensates for channel distortion in the telephone network. The experimental results show that the recognition rate with the GFCC parameter is better than with the other parameters.
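
The CMS step mentioned in this abstract can be illustrated with a short sketch. A fixed telephone channel acts as a convolution in time, which becomes an approximately constant additive offset on every cepstral frame; subtracting the per-coefficient mean over the utterance removes that offset. The function name and the toy two-coefficient frames below are assumptions for illustration and do not come from the paper.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of an utterance.

    frames: list of equal-length cepstral vectors (e.g. GFCC, MFCC or LPCC),
    one per analysis frame.
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(frame[k] for frame in frames) / n_frames
             for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)]
            for frame in frames]


# Toy example: the same utterance with and without a constant channel offset.
clean = [[1.0, 2.0], [3.0, 4.0]]
channel = [0.5, -0.25]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

print(cepstral_mean_subtraction(clean))
print(cepstral_mean_subtraction(distorted))
```

Both calls print the same normalized frames, which is the sense in which CMS compensates for a stationary channel. It does not help with distortion that varies within the utterance.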

Performance Improvement of Speech Recognizer in Noisy Environments Based on Auditory Modeling (청각 구조를 이용한 잡음 음성의 인식 성능 향상)

  • Jung, Ho-Young;Kim, Do-Yeong;Un, Chong-Kwan;Lee, Soo-Young
    • The Journal of the Acoustical Society of Korea / v.14 no.5 / pp.51-57 / 1995
  • In this paper, we study a noise-robust method of extracting speech signal features based on auditory modeling. The auditory model consists of a basilar membrane model, a hair cell model and a spectrum output stage. The basilar membrane model describes the response characteristic of the membrane to vibration in the speech wave and is represented as a band-pass filter bank. The hair cell model describes neural transduction in response to displacements of the basilar membrane. It responds adaptively to relative values of the input and plays an important role in noise robustness. The spectrum output stage constructs a mean rate spectrum from the average firing rate of each channel, and feature vectors are extracted from this mean rate spectrum. Simulation results show that auditory-based feature extraction improves speech recognition performance in noisy environments compared with other feature extraction methods.
