Search | Korea Science

A corpus-based study on the effects of voicing and gender on American English Fricatives (성대진동 및 성별이 미국영어 마찰음에 미치는 효과에 관한 코퍼스 기반 연구)

Yoon, Tae-Jin
- Phonetics and Speech Sciences
- /
- v.10 no.2
- /
- pp.7-14
- /
- 2018
The paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of voicing in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, and comprises more than five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender, voicing, and place of articulation as independent factors. The results of the acoustic analyses revealed that acoustic signals interact in a complex way to signal the gender, place, and voicing of fricatives. Classification experiments using a multiclass support vector machine (SVM) revealed that 78.7% of fricatives are correctly classified. The majority of errors stem from the misclassification of /θ/ as [f] and /ʒ/ as [z]. The average accuracy of gender classification is 78.7%. Most errors result from the classification of female speakers as male speakers. The paper contributes to the understanding of the effects of voicing and gender on fricatives in a large-scale speech corpus.
https://doi.org/10.13064/KSSS.2018.10.2.007 인용 PDF KSCI

A Corpus-based study on the Effects of Gender on Voiceless Fricatives in American English

Yoon, Tae-Jin
- Phonetics and Speech Sciences
- /
- v.7 no.1
- /
- pp.117-124
- /
- 2015
This paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of gender in rendering fricatives in American English. The TIMIT database includes 630 talkers and 2342 different sentences, comprising over five hours of speech. Acoustic analyses are conducted in the domain of spectral and temporal properties by treating gender as an independent factor. The results of acoustic analyses revealed that the most acoustic properties of voiceless sibilants turned out to be different between male and female speakers, but those of voiceless non-sibilants did not show differences. A classification experiment using linear discriminant analysis (LDA) revealed that 85.73% of voiceless fricatives are correctly classified. The sibilants are 88.61% correctly classified, whereas the non-sibilants are only 57.91% correctly classified. The majority of the errors are from the misclassification of /ɵ/ as [f]. The average accuracy of gender classification is 77.67%. Most of the inaccuracy results are from the classification of female speakers in non-sibilants. The results are accounted for by resorting to biological differences as well as macro-social factors. The paper contributes to the understanding of the role of gender in a large-scale speech corpus.
https://doi.org/10.13064/KSSS.2015.7.1.117 인용 PDF KSCI

On the Use of Various Resolution Filterbanks for Speaker Identification

Lee, Bong-Jin;Kang, Hong-Goo;Youn, Dae-Hee
- The Journal of the Acoustical Society of Korea
- /
- v.26 no.3E
- /
- pp.80-86
- /
- 2007
In this paper, we utilize generalized warped filterbanks to improve the performance of speaker recognition systems. At first, the performance of speaker identification systems is analyzed by varying the type of warped filterbanks. Based on the results that the error pattern of recognition system is different depending on the type of filterbank used, we combine the likelihood values of the statistical models that consist of the features extracting from multiple warped filterbanks. Simulation results with TIMIT and NTIMIT database verify that the proposed system shows relative improvement of identification rate by 31.47% and 15.14% comparing it to the conventional system.
PDF KSCI

The bootstrap VQ model for automatic speaker recognition system (VQ 방식의 화자인식 시스템 성능 향상을 위한 부쓰트랩 방식 적용)

Kyung YounJeong;Lee Jin-Ick;Lee Hwang-Soo
- Proceedings of the Acoustical Society of Korea Conference
- /
- spring
- /
- pp.39-42
- /
- 2000
A bootstrap and aggregating (bagging) vector quantization (VQ) classifier is proposed for speaker recognition. This method obtains multiple training data sets by resampling the original training data set, and then integrates the corresponding multiple classifiers into a single classifier. Experiments involving a closed set, text-independent and speaker identification system are carried out using the TIMIT database. The proposed bagging VQ classifier shows considerably improved performance over the conventional VQ classifier.
PDF

A Study on the Simple Algorithm for Discrimination of Voiced Sounds (유성음 구간 검출을 위한 간단한 알고리즘에 관한 연구)

장규철;우수영;박용규;유창동
- The Journal of the Acoustical Society of Korea
- /
- v.21 no.8
- /
- pp.727-734
- /
- 2002
A simple algorithm for discriminating voiced sounds in a speech is proposed in this paper. In addition to low-frequency energy and zero-crossing rate (ZCR), both of which have been widely used in the past for identifying voiced sounds, the proposed algorithm incorporates pitch variation to improve the discrimination rate. Based on TIMIT corpus, evaluation result shows an improvement of 13% in the discrimination of voiced phonemes over that of the traditional algorithm using only energy and ZCR.
PDF KSCI

A Single Channel Voice Activity Detection for Noisy Environments Using Wavelet Packet Decomposition and Teager Energy (웨이블렛 패킷 변환과 Teager 에너지를 이용한 잡음 환경에서의 단일 채널 음성 판별)

Koo, Boneung
- The Journal of the Acoustical Society of Korea
- /
- v.33 no.2
- /
- pp.139-145
- /
- 2014
In this paper, a feature parameter is obtained by applying the Teager energy to the WPD(Wavelet Packet Decomposition) coefficients. The threshold value is obtained based on means and standard deviations of nonspeech frames. Experimental results by using TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm is superior to the typical VAD algorithm. The ROC(Receiver Operating Characteristics) curves are used to compare performance of VAD's for SNR values of ranging from 10 to -10 dB.
https://doi.org/10.7776/ASK.2014.33.2.139 인용 PDF KSCI

Noise-Robust Speaker Recognition Using Subband Likelihoods and Reliable-Feature Selection

Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
- ETRI Journal
- /
- v.30 no.1
- /
- pp.89-100
- /
- 2008
We consider the feature recombination technique in a multiband approach to speaker identification and verification. To overcome the ineffectiveness of conventional feature recombination in broadband noisy environments, we propose a new subband feature recombination which uses subband likelihoods and a subband reliable-feature selection technique with an adaptive noise model. In the decision step of speaker recognition, a few very low unreliable feature likelihood scores can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable-feature selection adjusts the likelihood scores of an unreliable feature by comparison with those of an adaptive noise model, which is estimated by the maximum a posteriori adaptation technique using noise features directly obtained from noisy test speech. To evaluate the effectiveness of the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, which is the corresponding telephone version of TIMIT database. The proposed subband feature recombination with subband reliable-feature selection achieves better performance than the conventional feature recombination system with reliable-feature selection.
PDF

Nasal Place Detection with Acoustic Phonetic Parameters (음향음성학 파라미터를 사용한 비음 위치 검출)

Lee, Suk-Myung;Choi, Jeung-Yoon;Kang, Hong-Goo
- The Journal of the Acoustical Society of Korea
- /
- v.31 no.6
- /
- pp.353-358
- /
- 2012
This paper describes acoustic phonetic parameters for detecting nasal place in a knowledge-based speech recognition system. Initial acoustic phonetic parameters are selected by studying nasal production mechanisms which are radiation of the sound through the nasal cavity. Nasals are produced with differing articulatory configuration which can be classified by measuring acoustic phonetic parameters such as band energy ratio, band energy differences, formants and formant differences. These acoustic phonetic parameters were tested in a classification experiment among labial nasal, alveolar nasal and velar nasal. An overall classification rate of 57.5% is obtained using the proposed acoustic phonetic parameters on the TIMIT database.
https://doi.org/10.7776/ASK.2012.31.6.353 인용 PDF KSCI

Classification of Diphthongs using Acoustic Phonetic Parameters (음향음성학 파라메터를 이용한 이중모음의 분류)

Lee, Suk-Myung;Choi, Jeung-Yoon
- The Journal of the Acoustical Society of Korea
- /
- v.32 no.2
- /
- pp.167-173
- /
- 2013
This work examines classification of diphthongs, as part of a distinctive feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) results show that vowel duration, energy trajectory, and formant variation are significant. A balanced error rate of 17.8% is obtained for 2-way diphthong classification on the TIMIT database, and error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, for 4-way classification, respectively. Adding the acoustic features to widely used Mel-frequency cepstral coefficients also improves classification.
https://doi.org/10.7776/ASK.2013.32.2.167 인용 PDF KSCI

A Parametric Voice Activity Detection Based on the SPD-TE for Nonstationary Noises (비정체성 잡음을 위한 SPD-TE 기반 계수형 음성 활동 탐지)

Koo, Boneung
- The Journal of the Acoustical Society of Korea
- /
- v.34 no.4
- /
- pp.310-315
- /
- 2015
A single channel VAD (Voice Activity Detection) algorithm for nonstationary noise environment is proposed in this paper. Threshold values of the feature parameter for VAD decision are updated adaptively based on estimates of means and standard deviations of past non-speech frames. The feature parameter, SPD-TE (Spectral Power Difference-Teager Energy), is obtained by applying the Teager energy to the WPD (Wavelet Packet Decomposition) coefficients. It was reported previously that the SPD-TE is robust to noise as a feature for VAD. Experimental results by using TIMIT speech and NOISEX-92 noise databases show that decision accuracy of the proposed algorithm is comparable to several typical VAD algorithms including standards for SNR values ranging from 10 to -10 dB.
https://doi.org/10.7776/ASK.2015.34.4.310 인용 PDF KSCI

Search Result 43, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)