• Title/Summary/Keyword: TIMIT

A corpus-based study on the effects of voicing and gender on American English Fricatives (성대진동 및 성별이 미국영어 마찰음에 미치는 효과에 관한 코퍼스 기반 연구)

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / v.10 no.2 / pp.7-14 / 2018
  • The paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of voicing in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, and comprises more than five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender, voicing, and place of articulation as independent factors. The results of the acoustic analyses revealed that acoustic signals interact in a complex way to signal the gender, place, and voicing of fricatives. Classification experiments using a multiclass support vector machine (SVM) revealed that 78.7% of fricatives are correctly classified. The majority of errors stem from the misclassification of /θ/ as [f] and /ʒ/ as [z]. The average accuracy of gender classification is 78.7%. Most errors result from the classification of female speakers as male speakers. The paper contributes to the understanding of the effects of voicing and gender on fricatives in a large-scale speech corpus.

A Corpus-based study on the Effects of Gender on Voiceless Fricatives in American English

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / v.7 no.1 / pp.117-124 / 2015
  • This paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of gender in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, comprising over five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender as an independent factor. The analyses revealed that most acoustic properties of voiceless sibilants differed between male and female speakers, whereas those of voiceless non-sibilants did not. A classification experiment using linear discriminant analysis (LDA) revealed that 85.73% of voiceless fricatives are correctly classified. The sibilants are 88.61% correctly classified, whereas the non-sibilants are only 57.91% correctly classified. The majority of the errors stem from the misclassification of /θ/ as [f]. The average accuracy of gender classification is 77.67%. Most of the errors arise from the classification of female speakers in non-sibilants. The results are accounted for by appealing to biological differences as well as macro-social factors. The paper contributes to the understanding of the role of gender in a large-scale speech corpus.

On the Use of Various Resolution Filterbanks for Speaker Identification

  • Lee, Bong-Jin;Kang, Hong-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.26 no.3E / pp.80-86 / 2007
  • In this paper, we utilize generalized warped filterbanks to improve the performance of speaker recognition systems. First, the performance of speaker identification systems is analyzed by varying the type of warped filterbank. Because the error pattern of the recognition system differs depending on the type of filterbank used, we combine the likelihood values of statistical models built on features extracted from multiple warped filterbanks. Simulation results on the TIMIT and NTIMIT databases verify that the proposed system achieves relative improvements in identification rate of 31.47% and 15.14% over the conventional system.
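The abstract above does not say how the likelihood values are combined. As a minimal sketch, assuming an equal-weight sum of per-speaker log-likelihoods (the function name, the filterbank labels, and the scores are all hypothetical), score fusion across filterbanks could look like this:

```python
def fuse_and_identify(loglik_by_filterbank):
    """Sum per-speaker log-likelihoods over filterbanks and pick the best speaker.

    loglik_by_filterbank: list of dicts {speaker_id: log_likelihood},
    one dict per filterbank. Equal weighting is an assumption, not the
    paper's stated rule.
    """
    speakers = loglik_by_filterbank[0].keys()
    fused = {s: sum(scores[s] for scores in loglik_by_filterbank) for s in speakers}
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical scores: the two filterbanks disagree on the top speaker.
scores_warp_a = {"spk1": -10.0, "spk2": -10.5}
scores_warp_b = {"spk1": -12.0, "spk2": -9.0}
best, fused = fuse_and_identify([scores_warp_a, scores_warp_b])
print(best, fused)
```

Summing log-likelihoods treats the feature streams as independent; weighted sums are a common alternative when one stream is known to be more reliable.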

The bootstrap VQ model for automatic speaker recognition system (VQ 방식의 화자인식 시스템 성능 향상을 위한 부쓰트랩 방식 적용)

  • Kyung YounJeong;Lee Jin-Ick;Lee Hwang-Soo
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.39-42 / 2000
  • A bootstrap aggregating (bagging) vector quantization (VQ) classifier is proposed for speaker recognition. This method obtains multiple training data sets by resampling the original training data set, and then integrates the corresponding multiple classifiers into a single classifier. Experiments with a closed-set, text-independent speaker identification system are carried out using the TIMIT database. The proposed bagging VQ classifier shows considerably improved performance over the conventional VQ classifier.
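The resample-then-aggregate idea can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: codebooks are trained with plain k-means, the bagged codebooks are aggregated by averaging their distortions, and all names and the synthetic 2-D data are hypothetical.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k, rng, iters=10):
    """Train a k-entry VQ codebook with plain k-means."""
    codebook = rng.sample(vectors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: dist2(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                codebook[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return codebook


def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(dist2(v, c) for c in codebook) for v in vectors) / len(vectors)


def train_bagged(vectors, k, n_bags, rng):
    """Train one codebook per bootstrap resample (sampling with replacement)."""
    bags = []
    for _ in range(n_bags):
        resample = [rng.choice(vectors) for _ in vectors]
        bags.append(train_codebook(resample, k, rng))
    return bags


def identify(test_vectors, models):
    """Pick the speaker whose bagged codebooks give the lowest mean distortion."""
    scores = {
        spk: sum(distortion(test_vectors, cb) for cb in cbs) / len(cbs)
        for spk, cbs in models.items()
    }
    return min(scores, key=scores.get)


# Synthetic 2-D "features" for two hypothetical speakers.
rng = random.Random(0)
train_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
train_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(40)]
models = {
    "A": train_bagged(train_a, k=4, n_bags=5, rng=rng),
    "B": train_bagged(train_b, k=4, n_bags=5, rng=rng),
}
test_utterance = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]
predicted = identify(test_utterance, models)
print(predicted)
```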

A Study on the Simple Algorithm for Discrimination of Voiced Sounds (유성음 구간 검출을 위한 간단한 알고리즘에 관한 연구)

  • 장규철;우수영;박용규;유창동
    • The Journal of the Acoustical Society of Korea / v.21 no.8 / pp.727-734 / 2002
  • This paper proposes a simple algorithm for discriminating voiced sounds in speech. In addition to low-frequency energy and zero-crossing rate (ZCR), both of which have been widely used for identifying voiced sounds, the proposed algorithm incorporates pitch variation to improve the discrimination rate. Evaluation on the TIMIT corpus shows a 13% improvement in the discrimination of voiced phonemes over the traditional algorithm, which uses only energy and ZCR.
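The two traditional features named above are easy to illustrate. The sketch below covers only ZCR and low-frequency energy, not the pitch-variation term the paper adds. The moving-average low-pass filter, the threshold values, and the 8 kHz sampling rate are assumptions chosen for the example.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def low_frequency_energy(frame, taps=8):
    """Mean power after a crude moving-average low-pass filter (assumed filter)."""
    smoothed = [sum(frame[i:i + taps]) / taps for i in range(len(frame) - taps + 1)]
    return sum(v * v for v in smoothed) / len(smoothed)


def is_voiced(frame, zcr_max=0.1, energy_min=0.01):
    """Voiced frames tend to have low ZCR and high low-frequency energy.

    Both thresholds are illustrative, not values from the paper.
    """
    return zero_crossing_rate(frame) <= zcr_max and low_frequency_energy(frame) >= energy_min


fs = 8000  # assumed sampling rate in Hz
voiced_like = [math.sin(2 * math.pi * 100 * n / fs) for n in range(240)]
unvoiced_like = [0.1 * (-1) ** n for n in range(240)]  # energy at the Nyquist frequency
print(is_voiced(voiced_like), is_voiced(unvoiced_like))
```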

A Single Channel Voice Activity Detection for Noisy Environments Using Wavelet Packet Decomposition and Teager Energy (웨이블렛 패킷 변환과 Teager 에너지를 이용한 잡음 환경에서의 단일 채널 음성 판별)

  • Koo, Boneung
    • The Journal of the Acoustical Society of Korea / v.33 no.2 / pp.139-145 / 2014
  • In this paper, a feature parameter is obtained by applying the Teager energy to the wavelet packet decomposition (WPD) coefficients. The threshold value is obtained from the means and standard deviations of nonspeech frames. Experimental results using the TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm is superior to the typical VAD algorithm. Receiver operating characteristic (ROC) curves are used to compare the performance of VAD algorithms for SNR values ranging from 10 to -10 dB.
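The discrete Teager energy operator is ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below applies it to a generic coefficient sequence and omits the WPD step. The abstract says only that the threshold uses the mean and standard deviation of nonspeech frames, so the `mean + k * std` rule, the value of `k`, and the function names are assumptions.

```python
import statistics


def teager_energy(x):
    """Discrete Teager energy: x[n]^2 - x[n-1] * x[n+1] for interior samples."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_feature(coeffs):
    """One scalar per frame: mean absolute Teager energy of its coefficients."""
    te = teager_energy(coeffs)
    return sum(abs(v) for v in te) / len(te)


def noise_threshold(nonspeech_features, k=3.0):
    """Assumed rule: mean + k * standard deviation over nonspeech frames."""
    mu = statistics.mean(nonspeech_features)
    sigma = statistics.pstdev(nonspeech_features)
    return mu + k * sigma


def detect(features, threshold):
    """Mark a frame as speech when its feature exceeds the threshold."""
    return [f > threshold for f in features]


# Hypothetical per-frame features: the first three frames are known nonspeech.
threshold = noise_threshold([1.0, 2.0, 3.0], k=1.0)
decisions = detect([2.5, 3.0], threshold)
print(threshold, decisions)
```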

Noise-Robust Speaker Recognition Using Subband Likelihoods and Reliable-Feature Selection

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • ETRI Journal / v.30 no.1 / pp.89-100 / 2008
  • We consider the feature recombination technique in a multiband approach to speaker identification and verification. To overcome the ineffectiveness of conventional feature recombination in broadband noisy environments, we propose a new subband feature recombination which uses subband likelihoods and a subband reliable-feature selection technique with an adaptive noise model. In the decision step of speaker recognition, a few very low likelihood scores from unreliable features can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable-feature selection adjusts the likelihood scores of an unreliable feature by comparison with those of an adaptive noise model, which is estimated by the maximum a posteriori adaptation technique using noise features obtained directly from noisy test speech. To evaluate the effectiveness of the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, which is the corresponding telephone version of the TIMIT database. The proposed subband feature recombination with subband reliable-feature selection achieves better performance than the conventional feature recombination system with reliable-feature selection.

Speaker Identification with Estimating the Number of Cluster Based on Boundary Subtractive Clustering (경계 차감 클러스터링에 기반한 클러스터 개수 추정 화자식별)

  • Lee, Youn-Jeong;Choi, Min-Jung;Seo, Chang-Woo;Hahn, Hern-Soo
    • The Journal of the Acoustical Society of Korea / v.26 no.5 / pp.199-206 / 2007
  • In this paper, we propose a new algorithm for clustering feature vectors for speaker identification. Unlike typical clustering approaches, the proposed method requires neither initial guesses of the cluster center locations nor a priori information about the number of clusters. Cluster centers are obtained incrementally, one at a time, through the boundary subtractive clustering algorithm. The number of clusters is determined by investigating the mutual relationship between clusters. Experimental results on artificial data and the TIMIT database show the effectiveness of the proposed algorithm compared with conventional methods.

Nasal Place Detection with Acoustic Phonetic Parameters (음향음성학 파라미터를 사용한 비음 위치 검출)

  • Lee, Suk-Myung;Choi, Jeung-Yoon;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.31 no.6 / pp.353-358 / 2012
  • This paper describes acoustic phonetic parameters for detecting nasal place in a knowledge-based speech recognition system. Initial acoustic phonetic parameters are selected by studying nasal production mechanisms, which involve radiation of sound through the nasal cavity. Nasals are produced with differing articulatory configurations, which can be classified by measuring acoustic phonetic parameters such as band energy ratios, band energy differences, formants, and formant differences. These acoustic phonetic parameters were tested in a classification experiment among labial, alveolar, and velar nasals. An overall classification rate of 57.5% is obtained using the proposed acoustic phonetic parameters on the TIMIT database.

Classification of Diphthongs using Acoustic Phonetic Parameters (음향음성학 파라메터를 이용한 이중모음의 분류)

  • Lee, Suk-Myung;Choi, Jeung-Yoon
    • The Journal of the Acoustical Society of Korea / v.32 no.2 / pp.167-173 / 2013
  • This work examines classification of diphthongs, as part of a distinctive feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) results show that vowel duration, energy trajectory, and formant variation are significant. A balanced error rate of 17.8% is obtained for 2-way diphthong classification on the TIMIT database, and error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, for 4-way classification, respectively. Adding the acoustic features to widely used Mel-frequency cepstral coefficients also improves classification.
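The abstract does not define its balanced error rate. A common definition, assumed here, is the unweighted mean of the per-class error rates, which keeps a frequent class from dominating the score. The labels and counts below are invented for illustration.

```python
from collections import Counter


def balanced_error_rate(y_true, y_pred):
    """Unweighted mean of per-class error rates (assumed definition)."""
    totals = Counter(y_true)
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


# Hypothetical imbalanced test set: 8 tokens of one class, 2 of another.
y_true = ["aw"] * 8 + ["ay"] * 2
y_pred = ["aw"] * 8 + ["ay", "aw"]
ber = balanced_error_rate(y_true, y_pred)
plain_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(ber, plain_error)
```

In this toy case the plain error rate is 0.1, while the balanced rate is 0.25 because half of the rarer class is misclassified.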