• Title/Summary/Keyword: cepstral

Isolated-Word Speech Recognition in Telephone Environment Using Perceptual Auditory Characteristic (인지적 청각 특성을 이용한 고립 단어 전화 음성 인식)

  • Choi, Hyung-Ki;Park, Ki-Young;Kim, Chong-Kyo
    • Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.2 / pp.60-65 / 2002
  • In this paper, we propose the GFCC (gammatone filter frequency cepstrum coefficient) parameter, which is based on auditory characteristics, to achieve a better speech recognition rate. We carry out speech recognition experiments on isolated words acquired over the telephone network. To compare the GFCC parameter with other parameters, we also run the recognition experiments using MFCC and LPCC parameters. In addition, for each parameter, we test with and without CMS (cepstral mean subtraction), which is applied to compensate for channel distortion in the telephone network. The experimental results show that the recognition rate using the GFCC parameter is better than that of the other parameters.
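The CMS step mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes per-utterance mean subtraction, and the function name and toy numbers are hypothetical. It relies on the standard observation that a fixed linear channel shows up as a constant additive offset in the cepstral domain, so removing the per-coefficient mean removes that offset.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across time.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy cepstral vectors (made-up numbers): 3 frames x 2 coefficients.
clean = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
# Hypothetical fixed channel: the same offset is added to every frame.
channel_offset = [0.5, -1.0]
distorted = [[c + o for c, o in zip(f, channel_offset)] for f in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

In this toy case `normalized_clean` and `normalized_distorted` coincide, which is the sense in which CMS compensates for a constant channel.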

Music Genre Classification System Using Decorrelated Filter Bank (Decorrelated Filter Bank를 이용한 음악 장르 분류 시스템)

  • Lim, Shin-Cheol;Jang, Sei-Jin;Lee, Seok-Pil;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea / v.30 no.2 / pp.100-106 / 2011
  • Music recordings have been digitized, so that huge music databases are now available to the public. An automatic music genre classification system is therefore required to manage the growing music database effectively. The Mel-Frequency Cepstral Coefficient (MFCC) is a popular feature vector for genre classification. In this paper, a combined super-vector of Decorrelated Filter Bank (DFB) and Octave-based Spectral Contrast (OSC) features, computed using texture windows, is processed by a Support Vector Machine (SVM) for genre classification. Even with a lower-order feature vector, the proposed super-vector improves classification accuracy by 4.2% compared with the conventional Marsyas system.

Features for Figure Speech Recognition in Noise Environment (잡음환경에서의 숫자음 인식을 위한 특징파라메타)

  • Lee, Jae-Ki;Koh, Si-Young;Lee, Kwang-Suk;Hur, Kang-In
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.473-476 / 2005
  • This paper proposes various feature parameters that are robust in noise. The MFCC (Mel Frequency Cepstral Coefficient) feature parameter used in conventional speech recognition shows good performance. For more robust performance in noise, however, we transform the MFCC feature space using PCA (Principal Component Analysis) and ICA (Independent Component Analysis), and compare the transformed parameters with the conventional MFCC parameter. The results show that the feature parameter transformed by ICA performs better than both the parameter transformed by PCA and MFCC.

Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems / v.14 no.4 / pp.807-823 / 2018
  • Speaker verification system performance depends on the utterance of each speaker. To verify the speaker, important information has to be captured from the utterance. Under the constraint of limited data, speaker verification becomes a challenging task: the training and testing data are only a few seconds long. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers in speaker verification. This leads to poor speaker modeling during training and may not provide good decisions during testing. The problem is addressed by increasing the number of feature vectors extracted from training and testing data of the same duration. To that end, we use multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques for speaker verification under the limited data condition. These analysis techniques extract relatively more feature vectors during training and testing, and improve modeling and testing for limited data. To demonstrate this, we use mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features. The Gaussian mixture model (GMM) and the GMM-universal background model (GMM-UBM) are used for modeling the speaker. The database used is NIST-2003. The experimental results indicate that MFS, MFR, and MFSR analysis perform markedly better than SFSR analysis, and that LPCC-based MFSR analysis performs better than the other analysis and feature extraction techniques.
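The idea of obtaining more feature vectors from the same short recording can be illustrated with a small framing sketch. This is only an assumption about how MFS/MFR/MFSR pooling might look, not the paper's procedure: the function names, the 8 kHz sample rate, and the frame sizes and shifts are all hypothetical, and each frame is assumed to yield one feature vector (for example one MFCC or LPCC vector).

```python
def frame_signal(signal, frame_size, frame_shift):
    """Split a signal into overlapping frames (the incomplete tail is dropped)."""
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += frame_shift
    return frames


def multi_frame_analysis(signal, frame_sizes, frame_shifts):
    """Pool the frames from every (frame size, frame shift) combination."""
    pooled = []
    for size in frame_sizes:
        for shift in frame_shifts:
            pooled.extend(frame_signal(signal, size, shift))
    return pooled


# Hypothetical setup: 1 second of audio at 8 kHz (sample values are placeholders).
sample_rate = 8000
signal = [0.0] * sample_rate

# SFSR: a single 20 ms frame size with a single 10 ms shift.
sfsr = multi_frame_analysis(signal, [160], [80])
# MFSR: several frame sizes and shifts (values chosen for illustration only).
mfsr = multi_frame_analysis(signal, [120, 160, 200], [40, 80])

print(len(sfsr), len(mfsr))
```

Passing several sizes with one shift corresponds to an MFS-style analysis, and one size with several shifts to an MFR-style analysis; in both cases the pooled frame count grows without any additional audio.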

Performance Improvement of EMG-Pattern Recognition Using MFCC-HMM-GMM (MFCC-HMM-GMM을 이용한 근전도(EMG)신호 패턴인식의 성능 개선)

  • Choi, Heung-Ho;Kim, Jung-Ho;Kwon, Jang-Woo
    • Journal of Biomedical Engineering Research / v.27 no.5 / pp.237-244 / 2006
  • This study proposes an approach to improving the performance of EMG (electromyogram) pattern recognition. The MFCC (Mel-Frequency Cepstral Coefficients) approach is modeled after the characteristics of the human hearing organ. While it supplies the most typical frequency-domain feature, it has to be reorganized to detect the features of the EMG signal. The dynamic aspects of EMG are also important for tasks such as continuous prosthetic control or recognition of EMG signals of various time lengths, which most approaches have not successfully mastered. This paper therefore proposes a reorganized MFCC and an HMM-GMM, which is adaptable to the dynamic features of the signal. Moreover, an analysis of the most suitable system settings for EMG pattern recognition is required. To meet this requirement, this study balanced the recognition rate against the error rates produced by the various settings when learning from the EMG data for each motion.

Implementation of Speaker Independent Speech Recognition System Using Independent Component Analysis based on DSP (독립성분분석을 이용한 DSP 기반의 화자 독립 음성 인식 시스템의 구현)

  • 김창근;박진영;박정원;이광석;허강인
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.2 / pp.359-364 / 2004
  • In this paper, we implemented a real-time speaker-independent speech recognizer that is robust in noisy environments using a DSP (Digital Signal Processor). The implemented system is composed of a TMS320C32, a floating-point DSP from Texas Instruments Inc., and a CODEC for real-time speech input. Instead of MFCC (mel frequency cepstral coefficient), the speech recognizer uses a noise-robust feature parameter obtained by transforming the MFCC feature space using ICA (Independent Component Analysis). In the recognition results in noisy environments, we found that the recognition performance of the ICA feature parameter is superior to that of MFCC.

Effects of Semi-Occluded Vocal Tract Exercise in Patients with Functional Aphonia (반폐쇄성도훈련이 기능적 실성증 환자의 음성 개선에 미치는 효과)

  • Chae, Hye Rim;Kim, Ji sung;Lee, Dong Wook;Choi, Soeng Hee
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.30 no.1 / pp.48-52 / 2019
  • Background and Objectives: Functional aphonia is characterized by incomplete closure of the vocal folds. Semi-occluded vocal tract exercise (SOVTE) allows smooth vocal fold collision, without damage to the vocal fold tissues, to produce normal vocal intensity. The purpose of this study is to report the effect of SOVTE in patients with functional aphonia. Materials and Method: Seven patients diagnosed with functional aphonia were treated with 1-3 voice therapy sessions using SOVTE techniques (voiced lip trill, humming, and Lax Vox). To assess the effectiveness of SOVTE, cepstral analysis and auditory perceptual assessment were performed before and after voice therapy. Results: F0 (fundamental frequency), CPP (cepstral peak prominence), and L/H ratio (low/high spectral ratio) increased significantly, while the standard deviations of CPP and L/H ratio decreased. In addition, 'Grade', 'Breathiness', and 'Asthenia' on the GRBAS scale decreased significantly after SOVTE (p<0.05). Conclusion: In our study, SOVTE seemed to be effective in eliciting voice quickly and promoting vocal fold vibration without muscular effort in patients with functional aphonia.

The effect of the Modified Voiced Lip Trill (MVoLT) training on vocal changes of musical theater students (응용 입술 트릴 훈련이 뮤지컬 전공 학생의 음성 변화에 미치는 효과)

  • Lee, Seung Jin;Choi, Hong-Shik;Lim, Jae-Yol;Lee, Kwang Yong
    • Phonetics and Speech Sciences / v.10 no.4 / pp.135-146 / 2018
  • The Modified Voiced Lip Trill (MVoLT) training is a variant of voiced lip-trill training characterized by increased loudness, lowered laryngeal position, and lip contact facilitated with the fingers. The purpose of the current study was to assess the effect of the MVoLT training program on vocal changes in musical theater students. A total of 32 musical theater students (17 males and 15 females, aged 18 to 29) participated in the study. For about three months, each participant was tutored using a systematic program focusing on the MVoLT training, accompanied by certain facilitating strategies. Pre- and post-training multi-dimensional vocal characteristics were assessed and compared. Results showed that cepstral peak prominence during vowel phonation increased after training, while its standard deviation and the Cepstral Spectral Index of Dysphonia decreased. In the aerodynamic assessment, maximum phonation time, subglottal pressure, and mean airflow rate increased, while electroglottographic measures did not change. In addition, decreased psychometric measures, higher maximum pitch, and increased vocal range were noted after training. In conclusion, the MVoLT proved to have potential as an effective and safe training method for musical theater singing.

Gender Classification of Speakers Using SVM

  • Han, Sun-Hee;Cho, Kyu-Cheol
    • Journal of the Korea Society of Computer and Information / v.27 no.10 / pp.59-66 / 2022
  • This study classifies the gender of speakers by analyzing feature vectors extracted from voice data. It provides the convenience of automatically recognizing the gender of customers, without a manual classification process, when they request a service by voice, for example over a phone call. The study is also significant in that, after gender classification using a learning model, frequently requested services can be analyzed for each gender and customized recommendation services can be offered according to the analysis. Based on male and female voice data with the silent segments removed, the study extracts feature vectors from each recording using MFCC (Mel Frequency Cepstral Coefficient) and trains SVM (Support Vector Machine) models on them. Gender classification of the voice data using the learned model yielded a gender recognition rate of 94%.

GMM-Based Gender Identification Employing Group Delay (Group Delay를 이용한 GMM기반의 성별 인식 알고리즘)

  • Lee, Kye-Hwan;Lim, Woo-Hyung;Kim, Nam-Soo;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.243-249 / 2007
  • We propose an effective voice-based gender identification method using group delay (GD). Generally, features for speech recognition are composed of magnitude information rather than phase information. In our approach, we address the difference between male and female speakers in GD, which is a derivative of the Fourier transform phase. We also propose a novel feature fusion scheme based on a combination of GD and magnitude information such as mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, reflection coefficients, and formants. The experimental results indicate that GD is effective in discriminating gender and that performance is significantly improved when the proposed feature fusion technique is applied.
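As a rough illustration of the group delay feature named in this abstract, the sketch below uses the standard definition, the negative derivative of the Fourier phase with respect to frequency, computed without phase unwrapping through the identity tau(k) = Re(Y[k] conj(X[k])) / |X[k]|^2, where X is the DFT of x[n] and Y is the DFT of n·x[n]. The function names, the naive DFT, and the `eps` guard are assumptions made for this example; the paper's actual feature extraction is not specified here.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(x, eps=1e-12):
    """Group delay in samples for each DFT bin, without phase unwrapping."""
    spectrum = dft(x)
    # DFT of the time-weighted signal n * x[n].
    ramp_spectrum = dft([t * v for t, v in enumerate(x)])
    delays = []
    for xk, yk in zip(spectrum, ramp_spectrum):
        power = abs(xk) ** 2
        # eps guards against division by zero at spectral nulls.
        delays.append((yk * xk.conjugate()).real / max(power, eps))
    return delays


# Sanity check: an impulse delayed by 3 samples has a group delay of 3 in every bin.
impulse = [0.0] * 16
impulse[3] = 1.0
delays = group_delay(impulse)
```

For real speech frames the values vary strongly near spectral nulls, which is one reason smoothed or modified group delay variants are often used in practice.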