• Title/Summary/Keyword: Speech signals

A Comparison of Front-Ends for Robust Speech Recognition

  • Kim, Doh-Suk; Jeong, Jae-Hoon; Lee, Soo-Young; Kil, Rhee M.
    • The Journal of the Acoustical Society of Korea / v.17 no.3E / pp.3-11 / 1998
  • The Zero-Crossings with Peak Amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, the performance of the ZCPA model is further improved by incorporating conventional speech processing techniques into the model output. Spectral and cepstral representations of the ZCPA model output are compared, and the incorporation of dynamic features computed with several different lengths of time-derivative window is evaluated. Comparative evaluations with other front-ends in real-world noisy environments also demonstrate the superiority of the ZCPA model.
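
The dynamic (time-derivative) features mentioned in the abstract are typically obtained with a regression window over neighbouring frames. The following is a minimal NumPy sketch of how such deltas with a configurable window length might be appended to a static front-end output; the function name and the window parameter K are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def delta_features(feats, K=2):
    """Append time-derivative (delta) features computed with a regression
    window of +/- K frames on top of static front-end features.

    feats : (num_frames, num_coeffs) array of static features.
    """
    num_frames, num_coeffs = feats.shape
    # Repeat edge frames so every frame has a full +/- K window.
    padded = np.pad(feats, ((K, K), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    deltas = np.zeros((num_frames, num_coeffs), dtype=float)
    for t in range(num_frames):
        acc = np.zeros(num_coeffs)
        for k in range(1, K + 1):
            acc += k * (padded[t + K + k] - padded[t + K - k])
        deltas[t] = acc / denom
    return np.hstack([feats, deltas])

# Example: 100 frames of 16-dimensional static features.
static = np.random.randn(100, 16)
augmented = delta_features(static, K=3)   # try different derivative window lengths
print(augmented.shape)                    # (100, 32)
```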

Noise Spectrum Estimation Using Line Spectral Frequencies for Robust Speech Recognition

  • Jang, Gil-Jin; Park, Jeong-Sik; Kim, Sang-Hun
    • The Journal of the Acoustical Society of Korea / v.31 no.3 / pp.179-187 / 2012
  • This paper presents a novel method for estimating a reliable noise spectral magnitude for acoustic background noise suppression when only a single-microphone recording is available. The proposed method derives noise estimates from the spectral magnitudes measured at line spectral frequencies (LSFs), based on the observation that adjacent LSFs lie near the peak frequencies of the LPC spectrum while isolated LSFs lie close to its relatively flat valleys. The parameters used in the proposed method are the LPC coefficients, their corresponding LSFs, and the gain of the LPC residual signal, so the method is well suited to LPC-based speech coders.
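
For illustration only, the sketch below shows one standard way to obtain LSFs from LPC coefficients (via the roots of the symmetric and antisymmetric polynomials) and to read the LPC spectral magnitude at those frequencies. It is a generic construction, not the authors' estimator; the function names and the toy filter are assumptions.

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral
    frequencies (angles in (0, pi)) via the roots of P(z) and Q(z)."""
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])        # A(z) padded to degree p+1
    a_rev = np.concatenate([[0.0], a[::-1]])  # z^-(p+1) * A(1/z)
    roots = np.concatenate([np.roots(a_ext + a_rev),   # P(z), symmetric
                            np.roots(a_ext - a_rev)])  # Q(z), antisymmetric
    angles = np.angle(roots)
    # Keep one representative per conjugate pair, excluding 0 and pi.
    return np.sort(angles[(angles > 1e-6) & (angles < np.pi - 1e-6)])

def lpc_magnitude_at(a, gain, omega):
    """LPC spectral magnitude gain / |A(e^{jw})| at frequencies omega.
    |polyval(a, e^{jw})| equals |A(e^{jw})| because |e^{jwp}| = 1."""
    z = np.exp(1j * np.asarray(omega))
    return gain / np.abs(np.polyval(a, z))

# Toy second-order LPC filter: its two LSFs and the magnitudes there.
a = np.array([1.0, -1.3, 0.8])
lsf = lpc_to_lsf(a)
print(lsf, lpc_magnitude_at(a, 1.0, lsf))
```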

Implementation of the Auditory Sense for the Smart Robot: Speaker/Speech Recognition (로봇 시스템에의 적용을 위한 음성 및 화자인식 알고리즘)

  • Jo, Hyun; Kim, Gyeong-Ho; Park, Young-Jin
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.05a / pp.1074-1079 / 2007
  • We introduce a speech/speaker recognition algorithm for isolated words. In the general case of speaker verification, a Gaussian Mixture Model (GMM) is used to model the feature vectors of reference speech signals. On the other hand, Dynamic Time Warping (DTW) based template matching was proposed for isolated word recognition several years ago. We combine these two concepts in a single method and implement it in a real-time speaker/speech recognition system. With the proposed method, a small number of reference utterances (5 or 6 training repetitions) is enough to build a reference model that achieves 90% recognition performance.
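
The DTW-based template matching referred to above can be sketched as a textbook DTW distance over frame-level feature vectors; this is not the combined GMM/DTW system of the paper, and the template names and feature dimensions are made up for the example.

```python
import numpy as np

def dtw_distance(ref, test):
    """Dynamic Time Warping distance between two feature sequences
    (e.g., frame-by-frame MFCC vectors of a reference and a test word)."""
    n, m = len(ref), len(test)
    # Local frame-to-frame Euclidean distances, shape (n, m).
    local = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                  acc[i, j - 1],      # deletion
                                                  acc[i - 1, j - 1])  # match
    return acc[n, m] / (n + m)   # length-normalised distance

# Example: assign the test word to the template with the smallest distance.
templates = {"open": np.random.randn(40, 13), "close": np.random.randn(55, 13)}
test = np.random.randn(48, 13)
best = min(templates, key=lambda w: dtw_distance(templates[w], test))
print(best)
```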

On Detecting the Transition Regions of Phonemes by Using the Asymmetrical Rate of Speech Waveforms (음성파형의 비대칭율을 이용한 음소의 전이구간 검출)

  • Bae, Myung-Jin; Lee, Eul-jae; Ann, Sou-Guil
    • The Journal of the Acoustical Society of Korea / v.9 no.4 / pp.55-65 / 1990
  • To recognize continuous speech, it is necessary to segment the connected acoustic signal into phonetic units. In this paper, we propose a new asymmetrical rate as a parameter for detecting transition regions in continuous speech. The proposed rate represents the rate of change of the magnitude of the speech signal. By comparing this rate with that of adjacent frames, each frame can be classified as either steady state or transient state.
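
The paper defines its own asymmetrical rate, and the abstract does not reproduce the formula. The sketch below therefore only illustrates the general frame-by-frame procedure, with a stand-in asymmetry measure (positive- versus negative-sample energy) and arbitrary frame sizes and threshold, all of which are assumptions.

```python
import numpy as np

def frame_parameter(frame):
    """Illustrative asymmetry measure of one analysis frame: the imbalance
    between the energy of positive and negative samples (a stand-in for the
    paper's asymmetrical rate)."""
    pos = np.sum(frame[frame > 0] ** 2)
    neg = np.sum(frame[frame < 0] ** 2)
    return (pos - neg) / (pos + neg + 1e-12)

def label_transitions(signal, frame_len=240, hop=120, threshold=0.2):
    """Label each frame as steady (False) or transition (True) by comparing
    its parameter with that of the preceding frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    params = np.array([frame_parameter(f) for f in frames])
    change = np.abs(np.diff(params, prepend=params[0]))
    return change > threshold

# Toy signal: a tone that stops halfway, plus noise.
sr = 8000
t = np.arange(sr) / sr
speech_like = np.sin(2 * np.pi * 120 * t) * (t < 0.5) + 0.3 * np.random.randn(sr)
print(label_transitions(speech_like).astype(int))
```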

On Detecting the Steady State Segments of Speech Waveform by using the Normalized AMDF (규준화된 AMDF를 이용한 음성파형의 안정상태 구간검출)

  • Bae, Myung-Jin; Kim, Ul-Je; Ahn, Sou-Guil
    • The Journal of the Acoustical Society of Korea / v.10 no.3 / pp.44-50 / 1991
  • To recognize continuous speech, it is necessary to segment the connected acoustic signal into phonetic units. In this paper, we propose a new normalized AMDF as a parameter for detecting transition regions in continuous speech. The proposed parameter represents the rate of change of the magnitude of the speech signal. By comparing this value with those of adjacent frames, each frame can be graded between steady state and transient state.
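
The AMDF itself is a standard measure. A minimal sketch follows; the level normalisation by the mean frame magnitude is an assumption for illustration and not necessarily the normalisation defined in the paper.

```python
import numpy as np

def amdf(frame, max_lag):
    """Average Magnitude Difference Function:
    AMDF(tau) = mean_n |x[n] - x[n + tau]|, for tau = 1 .. max_lag."""
    n = len(frame)
    return np.array([np.mean(np.abs(frame[:n - tau] - frame[tau:]))
                     for tau in range(1, max_lag + 1)])

def normalized_amdf(frame, max_lag):
    """AMDF divided by the mean magnitude of the frame, so that frames with
    different signal levels can be compared directly (assumed normalisation)."""
    return amdf(frame, max_lag) / (np.mean(np.abs(frame)) + 1e-12)

# Adjacent frames with similar normalized-AMDF profiles would be labelled
# steady state; a large frame-to-frame change suggests a transient.
sr = 8000
t = np.arange(240) / sr
frame_a = np.sin(2 * np.pi * 150 * t)
frame_b = np.sin(2 * np.pi * 150 * t) + 0.5 * np.random.randn(240)
diff = np.mean(np.abs(normalized_amdf(frame_a, 80) - normalized_amdf(frame_b, 80)))
print(diff)
```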

Performance Analysis of A Variable Bit Rate Speech Coder (가변 비트율 음성 부호화기의 성능분석)

  • Iem, Byeong-Gwan
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.12 / pp.1750-1754 / 2013
  • A variable bit rate speech coder is presented. The coder is based on the observation that, over short time periods, a speech signal can be viewed as a combination of piecewise linear segments. The encoder detects the sample points where the slope of the signal changes, which are called inflection points in this paper. The coder transmits both the location and the value of each detected inflection sample, but only the location information for the non-inflection samples. In the decoder, the non-inflection samples are estimated by interpolating the received information. Several factors affecting the performance of the coder have been tested through simulation. Simulation results show that linear interpolation produces a 1~5 dB improvement over cubic spline interpolation, and that μ-law companding provides no benefit when applied before inflection detection. With low threshold values in the inflection point detection, the coder shows a better MOS and more than 16 dB improvement in SNR compared to continuously variable slope delta modulation (CVSDM).
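
As a rough illustration of the encode/decode idea (keep the samples where the slope changes, interpolate the rest), here is a toy sketch; the detection criterion, threshold, and SNR computation are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def inflection_indices(x, threshold=0.0):
    """Indices where the slope of the signal changes sign by more than
    `threshold` -- the 'inflection' samples whose values would be sent."""
    d = np.diff(x)
    idx = [0]
    for n in range(1, len(d)):
        if d[n] * d[n - 1] < 0 and abs(d[n] - d[n - 1]) > threshold:
            idx.append(n)
    idx.append(len(x) - 1)
    return np.array(idx)

def decode(idx, values, length):
    """Decoder: non-inflection samples are recovered by linear interpolation
    between the received inflection samples."""
    return np.interp(np.arange(length), idx, values)

# Toy round trip on a noisy sine wave.
t = np.linspace(0, 1, 400)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(400)
idx = inflection_indices(x, threshold=0.05)
x_hat = decode(idx, x[idx], len(x))
snr = 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))
print(len(idx), "samples kept, SNR = %.1f dB" % snr)
```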

Analysis of Indirect Uses of Interrogative Sentences Carrying Anger

  • Min, Hye-Jin; Park, Jong-C.
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.311-320 / 2007
  • Interrogative sentences are generally used to perform the speech acts of directly asking a question or making a request, but they are also used to convey such speech acts indirectly. In utterances, such indirect uses of interrogative sentences usually carry the speaker's emotion with a negative attitude, close to an expression of anger. The identification of such negative emotion is known to be a difficult problem that requires relevant information from syntax, semantics, discourse, pragmatics, and speech signals. In this paper, we argue that interrogatives used for indirect speech acts can serve as a dominant marker for identifying emotional attitudes such as anger, compared to other emotion-related markers such as discourse markers, adverbial words, and syntactic markers. To support this argument, we analyze dialogues collected from Korean soap operas and examine the individual and cooperative influences of the emotion-related markers on emotional realization. A user study shows that interrogatives can be utilized as a promising device for emotion identification.

A corpus-based study on the effects of voicing and gender on American English Fricatives (성대진동 및 성별이 미국영어 마찰음에 미치는 효과에 관한 코퍼스 기반 연구)

  • Yoon, Tae-Jin
    • Phonetics and Speech Sciences / v.10 no.2 / pp.7-14 / 2018
  • The paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of voicing in the realization of fricatives in American English. The TIMIT database includes 630 talkers and 2,342 different sentences, comprising more than five hours of speech. Acoustic analyses are conducted on spectral and temporal properties, treating gender, voicing, and place of articulation as independent factors. The results reveal that acoustic cues interact in a complex way to signal the gender, place, and voicing of fricatives. Classification experiments using a multiclass support vector machine (SVM) show that 78.7% of fricatives are correctly classified, with the majority of errors stemming from the misclassification of /θ/ as [f] and /ʒ/ as [z]. The average accuracy of gender classification is 78.7%, with most errors resulting from female speakers being classified as male. The paper contributes to the understanding of the effects of voicing and gender on fricatives in a large-scale speech corpus.
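
A multiclass SVM classification experiment of this kind can be set up as in the following sketch, shown here with scikit-learn and placeholder features and labels; the feature set and data are illustrative assumptions, not those of the study.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Illustrative feature matrix: one row per fricative token with spectral and
# temporal measurements (e.g., spectral moments and duration); labels use
# T, D, S, Z as ASCII stand-ins for the dental and postalveolar fricatives.
X = np.random.randn(1000, 5)                       # placeholder acoustic features
y = np.random.choice(list("fvszTDSZ"), size=1000)  # placeholder fricative labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```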

Computer-Based Fluency Evaluation of English Speaking Tests for Koreans (한국인을 위한 영어 말하기 시험의 컴퓨터 기반 유창성 평가)

  • Jang, Byeong-Yong; Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.6 no.2 / pp.9-20 / 2014
  • In this paper, we propose an automatic fluency evaluation algorithm for English speaking tests. In the proposed algorithm, acoustic features are extracted from an input spoken utterance, and a fluency score is then computed using support vector regression (SVR). The parameters of the feature model and the SVR are estimated from speech signals and the corresponding scores assigned by human raters. Correlation analysis shows that speech rate, articulation rate, and mean length of runs are the best measures for fluency evaluation. Experimental results show that the correlation between the human score and the SVR score is 0.87 across three speaking tests, which suggests the potential of the proposed algorithm as a secondary fluency evaluation tool.
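
A minimal sketch of SVR-based scoring against human ratings follows, assuming scikit-learn and synthetic data in place of the real speech-test features and rater scores; the feature names and coefficients are illustrative only.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder per-utterance features: speech rate, articulation rate, and
# mean length of runs (the three measures the paper reports as most useful),
# plus synthetic "human" fluency scores to regress against.
X = np.random.rand(300, 3)
human_scores = X @ np.array([2.0, 1.5, 1.0]) + 0.1 * np.random.randn(300)

reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
reg.fit(X[:250], human_scores[:250])
pred = reg.predict(X[250:])
corr = np.corrcoef(pred, human_scores[250:])[0, 1]
print("correlation with human scores: %.2f" % corr)
```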

Review of Standard Sound Quality Assessment Methods for the Transmitted and Processed Sounds (음질 평가법의 표준과 연구 동향 - 전송 처리음 분야)

  • Oh, Wongeun
    • The Journal of the Acoustical Society of Korea / v.32 no.3 / pp.214-226 / 2013
  • Assessing the quality of audio signals is an important consideration in producing high-quality sound, and various assessment methods have been developed. This paper provides a general framework for sound quality and a technical overview of the international standard methods described in the ITU-T, ITU-R, IEC, and ANSI recommendations covering speech intelligibility, speech quality, and audio quality. In addition, some recent findings and directions for future work are included.