• Title/Summary/Keyword: Speech Data

Search Result 1,392, Processing Time 0.03 seconds

Parts-Based Feature Extraction of Spectrum of Speech Signal Using Non-Negative Matrix Factorization

  • Park, Jeong-Won;Kim, Chang-Keun;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Journal of information and communication convergence engineering
    • /
    • v.1 no.4
    • /
    • pp.209-212
    • /
    • 2003
  • In this paper, we proposed new speech feature parameter through parts-based feature extraction of speech spectrum using Non-Negative Matrix Factorization (NMF). NMF can effectively reduce dimension for multi-dimensional data through matrix factorization under the non-negativity constraints, and dimensionally reduced data should be presented parts-based features of input data. For speech feature extraction, we applied Mel-scaled filter bank outputs to inputs of NMF, than used outputs of NMF for inputs of speech recognizer. From recognition experiment result, we could confirm that proposed feature parameter is superior in recognition performance than mel frequency cepstral coefficient (MFCC) that is used generally.

Performance improvement of text-dependent speaker verification system using blind speech segmentation and energy weight (Blind speech segmentation과 에너지 가중치를 이용한 문장 종속형 화자인식기의 성능 향상)

  • Kim Jung-Gon;Kim Hyung Soon
    • MALSORI
    • /
    • no.47
    • /
    • pp.131-140
    • /
    • 2003
  • We propose a new method of generating client models for HMM based text-dependent speaker verification system with only a small amount of training data. To make a client model, statistical methods such as segmental K-means algorithm are widely used, but they do not guarantee the quality or reliability of a model when only limited data are avaliable. In this paper, we propose a blind speech segmentation based on level building DTW algorithm as an alternative method to make a client model with limited data. In addition, considering the fact that voiced sounds have much more speaker-specific information than unvoiced sounds and energy of the former is higher than that of the latter, we also propose a new score evaluation method using the observation probability raised to the power of weighting factor estimated from the normalized log energy. Our experiment shows that the proposed methods are superior to conventional HMM based speaker verification system.

  • PDF

A Longitudinal Case Study of Late Babble and Early Speech in Southern Mandarin

  • Chen, Xiaoxiang
    • Cross-Cultural Studies
    • /
    • v.20
    • /
    • pp.5-27
    • /
    • 2010
  • This paper studies the relation between canonical/variegated babble (CB/VB) and early speech in an infant acquiring Mandarin Chinese from 9 to 17 months. The infant was audio-and video-taped in her home almost every week. The data analyzed here come from 1,621 utterances extracted from 23 sessions ranging from 30 minutes to one hour, from age 00:09;07 to 01:05;27. The data was digitized, and segments from 23 sessions were transcribed in narrow IPA and coded for analysis. Babble was coded from age 00:09;07 to 01:00;00, and words were coded from 01:00;00 to 01:05;27, proto-words appeared at 11 months, and some babble was still present after 01:10;00. 3821 segments were counted in CB/VB utterances, plus the segments found in 899 word tokens. The data transcription was completed and checked by the author and was rechecked by two other researchers who majored in Chinese phonetics in order to ensure the reliability, we reached an agreement of 95.65%. Mandarin Chinese is phonetically very rich in consonants, especially affricates: it has aspirated and unaspirated stops in labial, alveolar, and velar places of articulation; affricates and fricatives in alveolar, retroflex, and palatal places; /f/; labial, alveolar, and velar nasals; a lateral;[h]; and labiovelar and palatal glides. In the child's pre-speech phonetic repertoire, 7 different consonants and 10 vowels were transcribed at 00:09;07. By 00:10;16, the number of phones was more than doubled (17 consonants, 25 vowels), but the rate of increase slowed after 11 months of age. The phones from babbling remained active throughout the child's early and subsequent speech. The rank order of the occurrence of the major class types for both CB and early speech was: stops, approximants, nasals, affricates, fricatives and lateral. As expected, unaspirated stops outnumbered aspirated stops, and front stops and nasals were more frequent than back sounds in both types of utterances. The fact that affricates outnumbered fricatives in the child's late babble indicates the pre-speech influence of the ambient language. The analysis of the data also showed that: 1) the phonetic characteristics of CB/VB and early meaningful speech are extremely similar. The similarities of CB/VB and speech prove that the two are deeply related; 2) The infant has demonstrated similar preferences for certain types of sounds in the two stages; 3) The infant's babbling was patterned at segmental level, and this regularity was similarly evident in the early speech of children. The three types being coronal plus front vowel; labial plus central and dorsal plus back vowel exhibited much overlap in the phonetic forms of CB/ VB and early speech. So the child's CB/ VB at this stage already shared the basic architecture, composition and representation of early speech. The evidence of similarity between CB/VB and early speech leaves no doubt that phones present in CB/VB are indeed precursors to early speech.

Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution (가우시안 분포에서 Maximum Log Likelihood를 이용한 벡터 양자화 기반 음성 인식 성능 향상)

  • Chung, Kyungyong;Oh, SangYeob
    • Journal of Digital Convergence
    • /
    • v.16 no.11
    • /
    • pp.335-340
    • /
    • 2018
  • Commercialized speech recognition systems that have an accuracy recognition rates are used a learning model from a type of speaker dependent isolated data. However, it has a problem that shows a decrease in the speech recognition performance according to the quantity of data in noise environments. In this paper, we proposed the vector quantization based speech recognition performance improvement using maximum log likelihood in Gaussian distribution. The proposed method is the best learning model configuration method for increasing the accuracy of speech recognition for similar speech using the vector quantization and Maximum Log Likelihood with speech characteristic extraction method. It is used a method of extracting a speech feature based on the hidden markov model. It can improve the accuracy of inaccurate speech model for speech models been produced at the existing system with the use of the proposed system may constitute a robust model for speech recognition. The proposed method shows the improved recognition accuracy in a speech recognition system.

Effect of Speech Tasks on Habitual Pitch (발화 유형에 따른 습관적 음도의 차이)

  • Lim, Hye-Jin;Han, Ji-Yeon
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.55-58
    • /
    • 2007
  • This study was investigated the effect of speech tasks on habitual pitch. Seven male and female young adult speakers participated in this study. The experiment consisted of seven different speech tasks: counting, reading, sustained phonation /a/, prolonged /i:/, answering /ne/. Data was analyzed via Visi-pitch IV. The results showed that there was no significant F0 difference among speech tasks.

  • PDF

A Study on Phoneme Recognition using Neural Networks and Fuzzy logic (신경망과 퍼지논리를 이용한 음소인식에 관한 연구)

  • Han, Jung-Hyun;Choi, Doo-Il
    • Proceedings of the KIEE Conference
    • /
    • 1998.07g
    • /
    • pp.2265-2267
    • /
    • 1998
  • This paper deals with study of Fast Speaker Adaptation Type Speech Recognition, and to analyze speech signal efficiently in time domain and time-frequency domain, utilizes SCONN[1] with Speech Signal Process suffices for Fast Speaker Adaptation Type Speech Recognition, and examined Speech Recognition to investigate adaptation of system, which has speech data input after speaker dependent recognition test.

  • PDF

Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic

  • Chung, Hoon;Lee, Sung Joo;Lee, Yun Keun
    • ETRI Journal
    • /
    • v.36 no.5
    • /
    • pp.714-720
    • /
    • 2014
  • In this paper, we propose the use of data-driven probabilistic utterance-level decision logic to improve Weighted Finite State Transducer (WFST)-based endpoint detection. In general, endpoint detection is dealt with using two cascaded decision processes. The first process is frame-level speech/non-speech classification based on statistical hypothesis testing, and the second process is a heuristic-knowledge-based utterance-level speech boundary decision. To handle these two processes within a unified framework, we propose a WFST-based approach. However, a WFST-based approach has the same limitations as conventional approaches in that the utterance-level decision is based on heuristic knowledge and the decision parameters are tuned sequentially. Therefore, to obtain decision knowledge from a speech corpus and optimize the parameters at the same time, we propose the use of data-driven probabilistic utterance-level decision logic. The proposed method reduces the average detection failure rate by about 14% for various noisy-speech corpora collected for an endpoint detection evaluation.

Recognition of Korean Connected Digit Telephone Speech Using the Training Data Based Temporal Filter (훈련데이터 기반의 temporal filter를 적용한 4연숫자 전화음성 인식)

  • Jung, Sung-Yun;Bae, Keun-Sung
    • MALSORI
    • /
    • no.53
    • /
    • pp.93-102
    • /
    • 2005
  • The performance of a speech recognition system is generally degraded in telephone environment because of distortions caused by background noise and various channel characteristics. In this paper, data-driven temporal filters are investigated to improve the performance of a specific recognition task such as telephone speech. Three different temporal filtering methods are presented with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from the cepstral domain feature vectors using the principal component analysis. According to experimental results, the proposed temporal filtering method has shown slightly better performance than the previous ones.

  • PDF

Automated Speech Analysis Applied to Sasang Constitution Classification (음성을 이용한 사상체질 분류 알고리즘)

  • Kang, Jae-Hwan;Yoo, Jong-Hyang;Lee, Hae-Jung;Kim, Jong-Yeol
    • Phonetics and Speech Sciences
    • /
    • v.1 no.3
    • /
    • pp.155-163
    • /
    • 2009
  • This paper introduces an automatic voice classification system for the diagnosis of individual constitution based on Sasang Constitutional Medicine (SCM) in Traditional Korean Medicine (TKM). For the developing of this algorithm, we used the voices of 473 speakers and extracted a total of 144 speech features from the speech data consisting of five sustained vowels and one sentence. The classification system, based on a rule-based algorithm that is derived from a non parametric statistical method, presents binary negative decisions. In conclusion, 55.7% of the speech data were diagnosed by this system, of which 72.8% were correct negative decisions.

  • PDF

Implementation of Dual Rate G.723 ADPCM Speech codec (16Kbps와 40Kbps의 Dual Rate G.723 ADPCM 음성 codec 구현)

  • Kim, Jae-Ohe;Han, Kyong-Ho
    • Proceedings of the KIEE Conference
    • /
    • 1998.07g
    • /
    • pp.2480-2482
    • /
    • 1998
  • In this paper, the implementation of dual rate ADPCM using G.723 16Kbps and 40Kbps speech codec algorithm is handled. For small signals, the low rate 16Kbps coding algorithm shows the same SNR as the high rate 40Kbps coding algorithm, while the low rate 16Kbps coding algorithm shows the lower SNR than the high rate 40Kbps coding algorithm for large signal. To obtain the good trade-off between the data rate and synthesized speech quality, we applied low rate 16Kbps for the small signal and high rate 40Kbps for the large signal. Various threshold values determining the rate are tested for good trade off data rate and speech quality. Also the low pass filter effect of speech input and output devices is simulated at several cut-off frequencies. To simulation result shows the good speech quality at a low rate comparing with 16Kbps & 40Kbps.

  • PDF