• Title/Summary/Keyword: cepstral distance

A Method of Evaluating Korean Articulation Quality for Rehabilitation of Articulation Disorder in Children

  • Lee, Keonsoo;Nam, Yunyoung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.8 / pp.3257-3269 / 2020
  • Articulation disorders are characterized by an inability to achieve clear pronunciation due to misuse of the articulators. This paper proposes a method of detecting such disorders by comparing a child's speech with standard pronunciations. The standard pronunciations are defined by clustering the speech of normal children on three features: the Linear Predictive Cepstral Coefficient (LPCC), the Mel-Frequency Cepstral Coefficient (MFCC), and the Relative Spectral Analysis Perceptual Linear Prediction (RASTA-PLP). The distance between the centroid of the standard pronunciation and the input pronunciation is calculated, and disordered speech is detected when its features lie outside the cluster. A total of 89 children (58 normal children and 31 children with disorders) were recruited. Thirty-five U-TAP test words were selected; each word's standard pronunciation was built from the normal children and compared with each pronunciation by the children with disorders. In the experiments, the disordered pronunciations were successfully distinguished from the standard pronunciations.
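
The centroid-distance test described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the cluster boundary is set, so the radius rule (largest distance from the centroid to any reference vector), the function names, and the toy two-dimensional features are all assumptions. In practice the vectors would hold LPCC, MFCC, or RASTA-PLP features.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_standard(reference_vectors):
    """Return (centroid, radius) for one test word.

    Assumption: the cluster radius is the largest distance from the
    centroid to any reference vector; the abstract does not specify this.
    """
    n = len(reference_vectors)
    centroid = [sum(col) / n for col in zip(*reference_vectors)]
    radius = max(euclidean(v, centroid) for v in reference_vectors)
    return centroid, radius

def is_outside_cluster(features, centroid, radius):
    """True when the input pronunciation lies outside the standard cluster."""
    return euclidean(features, centroid) > radius

# Hypothetical feature vectors from normal speakers for one word.
normal = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
centroid, radius = fit_standard(normal)
```

A pronunciation close to the centroid, such as `[1.1, 2.1]`, falls inside the cluster, while one far away, such as `[3.0, 0.5]`, is flagged.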

Speech/Music Discrimination Using Multi-dimensional MMCD (다차원 MMCD를 이용한 음성/음악 판별)

  • Choi, Mu-Yeol;Song, Hwa-Jeon;Park, Seul-Han;Kim, Hyung-Soon
    • Proceedings of the KSPS conference / 2006.11a / pp.142-145 / 2006
  • Discrimination between speech and music is important in many multimedia applications. We previously proposed a new parameter for speech/music discrimination, the mean of minimum cepstral distances (MMCD), which outperformed the conventional parameters. One weakness is that its performance depends on the range of candidate frames used to compute the minimum cepstral distance, so the optimal range has to be selected experimentally. In this paper, to alleviate the problem, we propose a multi-dimensional MMCD parameter which consists of multiple MMCDs with different ranges of candidate frames. Experimental results show that the multi-dimensional MMCD parameter yields an error rate reduction of 22.5% compared with the optimally chosen one-dimensional MMCD parameter.
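
One plausible reading of the MMCD parameter is sketched below. The abstract does not give the exact definition, so several details are assumptions made only for illustration: candidate frames are taken to be the frames `lo` to `hi` positions ahead of the current frame, the cepstral distance is taken to be Euclidean, and the function names are hypothetical. The multi-dimensional variant then stacks one MMCD per candidate range.

```python
import math

def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mmcd(frames, lo, hi):
    """Mean over frames of the minimum distance to the candidate frames.

    Assumption: candidates for frame t are frames t+lo .. t+hi (lo >= 1).
    Frames with no candidate inside the signal are skipped.
    """
    minima = []
    for t in range(len(frames)):
        candidates = frames[t + lo : t + hi + 1]
        if candidates:
            minima.append(min(cepstral_distance(frames[t], c) for c in candidates))
    if not minima:
        raise ValueError("signal too short for the requested candidate range")
    return sum(minima) / len(minima)

def multi_dim_mmcd(frames, ranges):
    """One MMCD value per (lo, hi) candidate range."""
    return [mmcd(frames, lo, hi) for lo, hi in ranges]

# Toy one-dimensional "cepstra" that repeat every two frames.
frames = [[0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]
features = multi_dim_mmcd(frames, [(1, 1), (2, 2)])
```

With this toy signal the value depends strongly on the chosen range: a lag of one frame never finds a match, while a lag of two always does. That sensitivity is what the abstract cites as the motivation for combining several ranges.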

Locating the damaged storey of a building using distance measures of low-order AR models

  • Xing, Zhenhua;Mita, Akira
    • Smart Structures and Systems / v.6 no.9 / pp.991-1005 / 2010
  • The key to detecting damage to civil engineering structures is to find an effective damage indicator. The damage indicator should promptly reveal the location of the damage and accurately identify the state of the structure. We propose to use the distance measures of low-order AR models as a novel damage indicator. The AR model has been applied to parameterize dynamical responses, typically the acceleration response. The premise of this approach is that the distance between the models, fitting the dynamical responses from damaged and undamaged structures, may be correlated with the information about the damage, including its location and severity. Distance measures have been widely used in speech recognition. However, they have rarely been applied to civil engineering structures. This research attempts to improve on the distance measures that have been studied so far. The effect of varying the data length, number of parameters, and other factors was carefully studied.
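
The abstract does not say which distance measures were used. As an illustration only, the sketch below uses the cepstral distance between two AR models, a measure familiar from speech processing. The AR coefficients are assumed to follow the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the cepstrum of the all-pole model 1/A(z) is obtained with the standard recursion. Function names and the truncation length are hypothetical.

```python
import math

def ar_cepstrum(a, n_coeffs):
    """Cepstral coefficients c_1..c_n of 1/A(z), A(z) = 1 + sum_k a[k-1] z^-k.

    Recursion: c_n = -a_n - sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as 0 for n greater than the model order.
    """
    p = len(a)
    c = []
    for n in range(1, n_coeffs + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

def ar_cepstral_distance(a_ref, a_test, n_coeffs=20):
    """Euclidean distance between truncated cepstra of two AR models."""
    c_ref = ar_cepstrum(a_ref, n_coeffs)
    c_test = ar_cepstrum(a_test, n_coeffs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c_ref, c_test)))

# Hypothetical low-order AR models fitted to responses of one storey.
undamaged = [-1.2, 0.8]
damaged = [-1.1, 0.7]
indicator = ar_cepstral_distance(undamaged, damaged)
```

Under this sketch, a larger `indicator` for one storey than for the others would point to that storey as the likely damage location.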

Performance Comparison for Objective Measures of Speech Quality Evaluation in PCS Wireless Telephone Network (PCS 이동전화망에서의 객관적인 음질평가척도별 성능비교)

  • Kim Nag-Cheol;Kim Kwang-Soo;Jung Ho-Youl;Chung Hyun-Yeol
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.48-51 / 1999
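  • As a preliminary study toward developing an objective call-quality measure for PCS mobile telephony, this work applies the existing CD (Cepstral Distance), MSD (Mel Spectral Distance), BSD (Bark Spectral Distance), and PSQM (Perceptual Speech Quality Measure) measures and compares their performance. The measures were applied to PCS speech data collected in a real environment, and the correlation between their outputs and the MOS results obtained from listeners' ratings was examined. In the experiments, BSD and PSQM showed correlations of 0.81 and 0.84, respectively, outperforming CD and MSD.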
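
The comparison in this abstract rests on correlating each objective measure with listener MOS. A minimal sketch of that step, assuming the Pearson correlation coefficient is the statistic used (the abstract does not name it) and using made-up scores, could look like this:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical per-utterance scores, not data from the paper.
mos = [4.1, 3.6, 3.0, 2.2, 1.5]           # listener ratings
distortion = [0.8, 1.1, 1.9, 2.6, 3.4]    # an objective distance measure
r = pearson(distortion, mos)
```

Distance-type measures grow as quality drops, so `r` is negative here; comparisons between measures are then usually made on its magnitude.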

Performance Comparison of Objective Measures for Speech Quality for Evaluation in CDMA Mobile Telephone (CDMA 이동전화 통화품질평가를 위한 객관적 음질평가척도별 성능 비교)

  • 이준희;김광수;윤정오
    • Proceedings of the Korea Society for Industrial Systems Conference / 2001.05a / pp.256-260 / 2001
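  • In this paper, as a preliminary study toward developing an objective speech quality measure for telephone speech distorted by passing through a digital mobile telephone (CDMA) channel environment, we evaluate the performance of existing objective measures: CD (Cepstral Distance), MSD (Mel Spectral Distance), BSD (Bark Spectral Distance), Modified BSD, and PSQM (Perceptual Speech Quality Measure). The measures were applied to PCS speech data collected in a real mobile telephone environment, and their outputs were examined for correlation with MU, a subjective speech quality evaluation method. In the experiments, BSD, MBSD, and PSQM showed correlations of 0.80, 0.85, and 0.84, respectively, performing relatively better than CD and MSD.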

Formant-broadened CMS Using the Log-spectrum Transformed from the Cepstrum (켑스트럼으로부터 변환된 로그 스펙트럼을 이용한 포먼트 평활화 켑스트럴 평균 차감법)

  • 김유진;정혜경;정재호
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.361-373 / 2002
  • In this paper, we propose a channel normalization method that improves on CMS (cepstral mean subtraction), which is widely adopted to normalize channel variation in speech and speaker recognition. CMS estimates the channel effect by averaging the long-term cepstrum, and its weak point is that the estimated channel is biased by the formants of voiced speech, which carry useful speech information. The proposed Formant-broadened Cepstral Mean Subtraction (FBCMS) is based on two facts: the formants can be found easily in the log spectrum obtained from the cepstrum by a Fourier transform, and the formants correspond to the dominant poles of the all-pole model usually used to model the vocal tract. FBCMS evaluates only the poles to be broadened from the log spectrum, without polynomial factorization, and builds a formant-broadened cepstrum by broadening the bandwidths of the formant poles. The channel cepstrum can then be estimated effectively by averaging the formant-broadened cepstral coefficients. We performed experiments comparing FBCMS with CMS and pole-filtered CMS (PFCMS) using 4 simulated telephone channels. In the channel estimation experiment, we evaluated the distance between the cepstrum of the real channel and the cepstrum of the estimated channel and found that the mean cepstrum came closer to the channel cepstrum because the bias of the mean cepstrum toward speech was softened. In the text-independent speaker identification experiment, the proposed method was superior to conventional CMS and comparable to pole-filtered CMS. Consequently, we showed that the proposed method can efficiently normalize the channel variation on the basis of conventional CMS.
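
For orientation, the conventional CMS baseline that this abstract builds on can be sketched as below. This shows plain CMS only, not the proposed FBCMS; the function name and the toy cepstra are assumptions. A fixed channel is additive in the cepstral domain, so subtracting the long-term mean removes it.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient long-term mean from every frame.

    Returns (normalized_frames, mean), where mean is the channel estimate
    used by conventional CMS.
    """
    n = len(frames)
    mean = [sum(col) / n for col in zip(*frames)]
    normalized = [[x - m for x, m in zip(frame, mean)] for frame in frames]
    return normalized, mean

# Hypothetical clean cepstra and a fixed channel offset.
clean = [[1.0, -1.0], [3.0, 1.0]]
channel = [0.5, 0.25]
observed = [[x + h for x, h in zip(frame, channel)] for frame in clean]

normalized_clean, _ = cepstral_mean_subtraction(clean)
normalized_observed, channel_estimate = cepstral_mean_subtraction(observed)
```

Here `normalized_observed` matches `normalized_clean`, but `channel_estimate` is `[2.5, 0.25]` rather than the true `[0.5, 0.25]`: the mean of the speech itself leaks into the estimate, which is the kind of bias the abstract sets out to reduce.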

Robust Endpoint Detection Algorithm For Speaker Verification (화자인식을 위한 강인한 끝점 검출 알고리즘)

  • Jung Dae Sung;Kim Jung Gon;Kim Hyung Soon
    • Proceedings of the KSPS conference / 2003.05a / pp.137-140 / 2003
  • In this paper, we propose a robust endpoint detection algorithm for speaker verification. The proposed algorithm uses energy and cepstral distance parameters, and it replaces the detected endpoints with the endpoints of voiced speech when the estimated signal-to-noise ratio (SNR) is low. Experimental results show that the proposed algorithm is superior to an energy-based endpoint detection algorithm.

Matching Pursuit Sinusoidal Modeling with Damping Factor (Damping 요소를 첨가한 매칭 퍼슈잇 정현파 모델링)

  • Jeong, Gyu-Hyeok;Kim, Jong-Hark;Lim, Joung-Woo;Joo, Gi-Ho;Lee, In-Sung
    • Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.1 / pp.105-113 / 2007
  • In this paper, we propose matching pursuit with damping factors, a new sinusoidal model that improves on matching pursuit, for codecs based on the sinusoidal model. The proposed model defines damping factors from the correlation of parameters between the current and adjacent frames, estimates the sinusoidal parameters in the analysis frame more accurately by running matching pursuit with the damping factors, and synthesizes the final signal. This makes efficient modeling possible without interpolation schemes. The proposed model delivers better speech quality, without additional delay, than the conventional sinusoidal model with interpolation methods. We compare the performance of our model with that of matching pursuit using interpolation methods in terms of SNR (signal-to-noise ratio), MOS (Mean Opinion Score), LR (Itakura-Saito likelihood ratio), and CD (cepstral distance).

Classification of Underwater Transient Signals Using MFCC Feature Vector (MFCC 특징 벡터를 이용한 수중 천이 신호 식별)

  • Lim, Tae-Gyun;Hwang, Chan-Sik;Lee, Hyeong-Uk;Bae, Keun-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.8C / pp.675-680 / 2007
  • This paper presents a new method for classifying underwater transient signals that makes frame-based decisions with Mel Frequency Cepstral Coefficients (MFCC). For an input signal detected as a transient, the MFCC feature vector is extracted on a frame-by-frame basis, and Euclidean distances are calculated between it and all MFCC feature vectors in the reference database. Each frame of the detected input signal is then mapped to the class with the minimum Euclidean distance in the reference database. Finally, the input signal is assigned to the class with the maximum mapping rate in the reference database. Experimental results demonstrate that the proposed method is very promising for the classification of underwater transient signals.
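
The frame-based decision rule described in this abstract can be sketched as follows. Each frame is mapped to the class of its nearest reference vector, and the signal takes the class with the highest mapping rate. The function names, the layout of the reference database as `(label, vector)` pairs, and the toy two-dimensional vectors are assumptions; real inputs would be MFCC vectors.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_signal(frames, reference_db):
    """Return (label, mapping_rate) for a detected transient signal.

    reference_db is a list of (class_label, feature_vector) pairs.
    Each frame votes for the class of its nearest reference vector.
    """
    votes = Counter()
    for frame in frames:
        label, _ = min(reference_db, key=lambda item: euclidean(frame, item[1]))
        votes[label] += 1
    best_label, count = votes.most_common(1)[0]
    return best_label, count / len(frames)

# Hypothetical reference database and input frames.
reference_db = [("A", [0.0, 0.0]), ("B", [10.0, 10.0])]
frames = [[1.0, 1.0], [0.0, 2.0], [9.0, 9.0]]
label, rate = classify_signal(frames, reference_db)
```

Two of the three frames are nearest to class "A", so the signal is labelled "A" with a mapping rate of 2/3.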

ON IMPROVING THE PERFORMANCE OF CODED SPECTRAL PARAMETERS FOR SPEECH RECOGNITION

  • Choi, Seung-Ho;Kim, Hong-Kook;Lee, Hwang-Soo
    • Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.250-253 / 1998
  • In digital communication networks, speech recognition systems conventionally reconstruct the speech and then extract feature parameters. In this paper, we consider a useful approach that incorporates speech coding parameters into the speech recognizer. Most speech coders employed in the networks represent spectral parameters as line spectral pairs (LSPs). To improve the recognition performance of the LSP-based speech recognizer, we introduce two approaches: one is to devise weighted distance measures of LSPs, and the other is to transform LSPs into a new feature set, named a pseudo-cepstrum (PCEP). Experiments on speaker-independent connected-digit recognition showed that the weighted distance measures significantly improved the recognition accuracy compared with the unweighted LSP distance measure. Using PCEP improved performance further. Compared with the conventional methods employing mel-frequency cepstral coefficients, the proposed methods achieved higher recognition accuracy.
