• Title/Summary/Keyword: 켑스트럼 분석

Search Result 58, Processing Time 0.02 seconds

Estimation scatterer spacing with spectral response (주파수 응답특성을 이용한 산란체 간격 추정)

  • Kim Eunhye;Yoon Kwan-seob;Na Jungyul
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.447-450
    • /
    • 2002
  • 음향수조 내에 일정한 간격으로 배열된 단순 모양의 산란체들로부터 획득된 후방산란 신호를 분석하여 산란체 간격(scatterer spacing)을 추측 할 수 있는 방법을 연구하였다. 수신신호의 산란특성을 켑스트럼 피크(cepstral peaks)를 이용하여 산란체 간격으로 해석하였다. 임펄스 응답신호를 이용한 수치계산으로 산란체 간격 추정방법을 검증한 후, 수조 실험으로 획득한 후방 산란 신호에 적용해 그 결과를 비교해 보았다.

  • PDF

Comparison of Vowel and Text-Based Cepstral Analysis in Dysphonia Evaluation (발성장애 평가 시 /a/ 모음연장발성 및 문장검사의 켑스트럼 분석 비교)

  • Kim, Tae Hwan;Choi, Jeong Im;Lee, Sang Hyuk;Jin, Sung Min
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.26 no.2
    • /
    • pp.117-121
    • /
    • 2015
  • Background : Cepstral analysis which is obtained from Fourier transformation of spectrum has been known to be effective indicator to analyze the voice disorder. To evaluate the voice disorder, phonation of sustained vowel /a/ sound or continuous speech have been used but the former was limited to capture hoarseness properly. This study is aimed to compare the effectiveness in analysis of cepstrum between the sustained vowel /a/ sound and continuous speech. Methods : From March 2012 to December 2014, total 72 patients was enrolled in this study, including 24 unilateral vocal cord palsy, vocal nodule and vocal polyp patients, respectively. The entire patient evaluated their voice quality by VHI (Voice Handicap Index) before and after treatment. Phonation of sustained vowel /a/ sample and continuous speech using the first sentence of autumn paragraph was subjected by cepstral analysis and compare the pre-treatment group and post-treatment group. Results : The measured values of pre and post treatment in CPP-a (cepstral peak prominence in /a/ vowel sound) was 13.80, 13.91 in vocal cord palsy, 16.62, 17.99 in vocal cord nodule, 14.19, 18.50 in vocal cord polyp respectively. Values of CPP-s (cepstral peak prominence in text-based speech) in pre and post treatment was 11.11, 12.09 in vocal cord palsy, 12.11, 14.09 in vocal cord nodule, 12.63, 14.17 in vocal cord polyp. All 72 patients showed subjective improvement in VHI after treatment. CPP-a showed statistical improvement only in vocal polyp group, but CPP-s showed statistical improvement in all three groups (p<0.05). Conclusion : In analysis of cepstrum, text-based analysis is more representative in voice disorder than vowel sound speech. So when the acoustic analysis of voice by cepstrum, both phonation of sustained vowel /a/ sound and text based speech should be performed to obtain more accurate result.

  • PDF

Classification of muscle tension dysphonia (MTD) female speech and normal speech using cepstrum variables and random forest algorithm (켑스트럼 변수와 랜덤포레스트 알고리듬을 이용한 MTD(근긴장성 발성장애) 여성화자 음성과 정상음성 분류)

  • Yun, Joowon;Shim, Heejeong;Seong, Cheoljae
    • Phonetics and Speech Sciences
    • /
    • v.12 no.4
    • /
    • pp.91-98
    • /
    • 2020
  • This study investigated the acoustic characteristics of sustained vowel /a/ and sentence utterance produced by patients with muscle tension dysphonia (MTD) using cepstrum-based acoustic variables. 36 women diagnosed with MTD and the same number of women with normal voice participated in the study and the data were recorded and measured by ADSVTM. The results demonstrated that cepstral peak prominence (CPP) and CPP_F0 among all of the variables were statistically significantly lower than those of control group. When it comes to the GRBAS scale, overall severity (G) was most prominent, and roughness (R), breathiness (B), and strain (S) indices followed in order in the voice quality of MTD patients. As these characteristics increased, a statistically significant negative correlation was observed in CPP. We tried to classify MTD and control group using CPP and CPP_F0 variables. As a result of statistic modeling with a Random Forest machine learning algorithm, much higher classification accuracy (100% in training data and 83.3% in test data) was found in the sentence reading task, with CPP being proved to be playing a more crucial role in both vowel and sentence reading tasks.

Effects of vowel types and sentence positions in standard passage on auditory and cepstral and spectral measures in patients with voice disorders (모음 유형과 표준문단의 문장 위치가 음성장애 환자의 청지각적 및 켑스트럼 및 스펙트럼 분석에 미치는 효과)

  • Mi-Hyeon Choi;Seong Hee Choi
    • Phonetics and Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.81-90
    • /
    • 2023
  • Auditory perceptual assessment and acoustic analysis are commonly used in clinical practice for voice evaluation. This study aims to explore the effects of speech task context on auditory perceptual assessment and acoustic measures in patients with voice disorders. Sustained vowel phonations (/a/, /e/, /i/, /o/, /u/, /ɯ/, /ʌ/) and connected speech (a standardized paragraph 'kaeul' and nine sub-sentences) were obtained from a total of 22 patients with voice disorders. GRBAS ('G', 'R', 'B', 'A', 'S') and CAPE-V ('OS', 'R', 'B', 'S', 'P', 'L') auditory-perceptual assessment were evaluated by two certified speech language pathologists specializing in voice disorders using blind and random voice samples. Additionally, spectral and cepstral measures were analyzed using the analysis of dysphonia in speech and voice model (ADSV).When assessing voice quality with the GRBAS scale, it was not significantly affected by the vowel type except for 'B', while the 'OS', 'R' and 'B' in CAPE-V were affected by the vowel type (p<.05). In addition, measurements of CPP and L/H ratio were influenced by vowel types and sentence positions. CPP values in the standard paragraph showed significant negative correlations with all vowels, with the highest correlation observed for /e/ vowel (r=-.739). The CPP of the second sentence had the strongest correlation with all vowels. Depending on the speech stimulus, CAPE-V may have a greater impact on auditory-perceptual assessment than GRBAS, vowel types and sentence position with consonants influenced the 'B' scale, CPP, and L/H ratio. When using vowels in the voice assessment of patients with voice disorders, it would be beneficial to use not only /a/, but also the vowel /i/, which is acoustically highly correlated with 'breathy'. In addition, the /e/ vowel was highly correlated acoustically with the standardized passage and sub-sentences. Furthermore, given that most dysphonic signals are aperiodic, 2nd sentence of the 'kaeul' passage, which is the most acoustically correlated with all vowels, can be used with CPP. These results provide clinical evidence of the impact of speech tasks on auditory perceptual and acoustic measures, which may help to provide guidelines for voice evaluation in patients with voice disorders.

A New Power Spectrum Warping Approach to Speaker Warping (화자 정규화를 위한 새로운 파워 스펙트럼 Warping 방법)

  • 유일수;김동주;노용완;홍광석
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.4
    • /
    • pp.103-111
    • /
    • 2004
  • The method of speaker normalization has been known as the successful method for improving the accuracy of speech recognition at speaker independent speech recognition system. A frequency warping approach is widely used method based on maximum likelihood for speaker normalization. This paper propose a new power spectrum warping approach to making improvement of speaker normalization better than a frequency warping. Th power spectrum warping uses Mel-frequency cepstrum analysis(MFCC) and is a simple mechanism to performing speaker normalization by modifying the power spectrum of Mel filter bank in MFCC. Also, this paper propose the hybrid VTN combined the Power spectrum warping and a frequency warping. Experiment of this paper did a comparative analysis about the recognition performance of the SKKU PBW DB applied each speaker normalization approach on baseline system. The experiment results have shown that a frequency warping is 2.06%, the power spectrum is 3.06%, and hybrid VTN is 4.07% word error rate reduction as of word recognition performance of baseline system.

A Study on Spoken Digits Analysis and Recognition (숫자음 분석과 인식에 관한 연구)

  • 김득수;황철준
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.6 no.3
    • /
    • pp.107-114
    • /
    • 2001
  • This paper describes Connected Digit Recognition with Considering Acoustic Feature in Korea. The recognition rate of connected digit is usually lower than word recognition. Therefore, speech feature parameter and acoustic feature are employed to make robust model for digit, and we could confirm the effect of Considering. Acoustic Feature throughout the experience of recognition. We used KLE 4 connected digit as database and 19 continuous distributed HMM as PLUs(Phoneme Like Units) using phonetical rules. For recognition experience, we have tested two cases. The first case, we used usual method like using Mel-Cepstrum and Regressive Coefficient for constructing phoneme model. The second case, we used expanded feature parameter and acoustic feature for constructing phoneme model. In both case, we employed OPDP(One Pass Dynamic Programming) and FSA(Finite State Automata) for recognition tests. When appling FSN for recognition, we applied various acoustic features. As the result, we could get 55.4% recognition rate for Mel-Cepstrum, and 67.4% for Mel-Cepstrum and Regressive Coefficient. Also, we could get 74.3% recognition rate for expanded feature parameter, and 75.4% for applying acoustic feature. Since, the case of applying acoustic feature got better result than former method, we could make certain that suggested method is effective for connected digit recognition in korean.

  • PDF

Time-Synchronization Method for Dubbing Signal Using SOLA (SOLA를 이용한 더빙 신호의 시간축 동기화)

  • 이기승;지철근;차일환;윤대희
    • Journal of Broadcast Engineering
    • /
    • v.1 no.2
    • /
    • pp.85-95
    • /
    • 1996
  • The purpose of this paper Is to propose a dubbed signal time-synchroniztion technique based on the SOLA(Synchronized Over-Lap and Add) method which has been widely used to modify the time scale of speech signal. In broadcasting audio recording environments, the high degree of background noise requires dubbing process. Since the time difference between the original and the dubbed signal ranges about 200mili seconds, process is required to make the dubbed signal synchronize to the corresponding image. The proposed method finds he starting point of the dubbing signal using the short-time energy of the two signals. Thereafter, LPC cepstrum analysis and DTW(Dynamic Time Warping) process are applied to synchronize phoneme positions of the two signals. After determining the matched point by the minimum mean square error between orignal and dubbed LPC cepstrums, the SOLA method is applied to the dubbed signal, to maintain the consistency of the corresponding phase. Effectiveness of proposed method is verified by comparing the waveforms and the spectrograms of the original and the time synchronized dubbing signal.

  • PDF

Fault Analysis of the Wind Turbine Drive Train in the Quefrency Region (큐프렌시 영역 해석을 통한 드라이브 트레인 결함 분석)

  • Park, Yong-Hui;Shi, Wei;Park, Hyun-Chul
    • New & Renewable Energy
    • /
    • v.9 no.3
    • /
    • pp.5-13
    • /
    • 2013
  • In the previous research, dynamic results have been analyzed in the time and frequency regions. Time and frequency region can be transformed by the Fourier transform. This transform is very useful about analyzing system behaviors. However, because of coupling, it cannot give clear results in the real system including lots of defects. In this paper, we introduced the analysis based on quefrency region to represent physical means clearly from complicated results. We simulated the drive train system which has defects, and compared between frequency and quefrency region to show its excellence. To do this process, We established mathematical model. The equation of motion was derived by the Lagrange equation and constraint equations. The constraint equation included relationships about gear mesh, flexibility of shaft. About numerical analysis, the Newmark beta method was used to get results. And FFT (Fast Fourier Transform) which converts results from time domain to frequency, qufrequency was used.

Voice Personality Transformation Using a Multiple Response Classification and Regression Tree (다중 응답 분류회귀트리를 이용한 음성 개성 변환)

  • 이기승
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.3
    • /
    • pp.253-261
    • /
    • 2004
  • In this paper, a new voice personality transformation method is proposed. which modifies speaker-dependent feature variables in the speech signals. The proposed method takes the cepstrum vectors and pitch as the transformation paremeters, which represent vocal tract transfer function and excitation signals, respectively. To transform these parameters, a multiple response classification and regression tree (MR-CART) is employed. MR-CART is the vector extended version of a conventional CART, whose response is given by the vector form. We evaluated the performance of the proposed method by comparing with a previously proposed codebook mapping method. We also quantitatively analyzed the performance of voice transformation and the complexities according to various observations. From the experimental results for 4 speakers, the proposed method objectively outperforms a conventional codebook mapping method. and we also observed that the transformed speech sounds closer to target speech.

Cepstral and spectral analysis of voices with adductor spasmodic dysphonia (내전형연축성 발성장애 음성에 대한 켑스트럼과 스펙트럼 분석)

  • Shim, Hee Jeong;Jung, Hun;Lee, Sue Ann;Choi, Byung Heun;Heo, Jeong Hwa;Ko, Do-Heung
    • Phonetics and Speech Sciences
    • /
    • v.8 no.2
    • /
    • pp.73-80
    • /
    • 2016
  • The purpose of this study was to analyze perceptual and spectral/cepstral measurements in patients with adductor spasmodic dysphonia(ADSD). Sixty participants with gender and age matched individuals(30 ADSD and 30 controls) were recorded in reading a sentence and sustained the vowel /a/. Acoustic data were analyzed acoustically by measuring CPP, L/H ratio, mean CPP F0 and CSID, and auditory-perceptual ratings were measured using GRBAS. The main results can be summarized as below: (a) the CSID for the connected speech was significantly higher than for the sustained vowel (b) the G, R and S for the connected speech were significantly higher than for the sustained vowel (c) Spectral/cepstral parameters were significantly correlated with the perceptual parameters, and (d) the ROC analysis showed that the threshold of 13.491 for the CSID achieved a good classification for ADSD, with 86.7% sensitivity and 96.7% specificity. Spectral and cepstral analysis for the connected speech is especially meaningful on cases where perceptual analysis and clinical evaluation alone are insufficient.