• Title/Summary/Keyword: Voice classification

Search Results: 150

Performance of GMM and ANN as a Classifier for Pathological Voice

  • Wang, Jianglin;Jo, Cheol-Woo
    • Speech Sciences / v.14 no.1 / pp.151-162 / 2007
  • This study focuses on the classification of pathological voice using a GMM (Gaussian Mixture Model) and compares the results with previous work done with an ANN (Artificial Neural Network). Speech data from normal speakers and patients were collected, diagnosed, and classified into two categories. Six characteristic parameters (Jitter, Shimmer, NHR, SPI, APQ and RAP) were chosen. Classification methods based on the artificial neural network and the Gaussian mixture model were then employed to discriminate the data into normal and pathological speech. The GMM method attained a 98.4% average correct classification rate on training data and 95.2% on test data. Different numbers of mixtures (3 to 15) were used in order to find an optimal condition for classification. We also compared the average classification rates of GMM, ANN and HMM. The proper number of mixtures for the Gaussian model remains to be investigated in future work.

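
The two-class GMM decision described in this abstract can be sketched in Python. This is a minimal illustration with synthetic stand-ins for the six acoustic parameters (Jitter, Shimmer, NHR, SPI, APQ, RAP) and a single diagonal Gaussian per class, i.e. a one-component GMM; the paper itself varied the number of mixtures from 3 to 15.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for 6-dim feature vectors
# (Jitter, Shimmer, NHR, SPI, APQ, RAP) -- not real clinical data.
normal = rng.normal(0.0, 1.0, size=(200, 6))
pathological = rng.normal(3.0, 1.0, size=(200, 6))

def fit_gaussian(X):
    """Fit one diagonal-covariance Gaussian (a one-component GMM) to a class."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    """Diagonal-Gaussian log-density of a single feature vector."""
    return -0.5 * float(np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

models = {"normal": fit_gaussian(normal),
          "pathological": fit_gaussian(pathological)}

def classify(x):
    # Maximum-likelihood decision between the two class models.
    return max(models, key=lambda c: log_likelihood(x, *models[c]))
```

With several mixture components per class, the same decision rule applies with each class score replaced by a log-sum over weighted components.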

Voice-Phishing Detection Algorithm Based on Minimum Classification Error Technique (최소 분류 오차 기법을 이용한 보이스 피싱 검출 알고리즘)

  • Lee, Kye-Hwan;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.3 / pp.138-142 / 2009
  • We propose an effective voice-phishing detection algorithm based on discriminative weight training. Detection is performed with a Gaussian mixture model (GMM) incorporating the minimum classification error (MCE) technique. The MCE technique operates on log-likelihoods computed from SMV (Selectable Mode Vocoder) parameters extracted directly from the decoding process in the mobile phone. Experimental results show that the proposed approach is effective for voice-phishing detection.
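
The MCE technique mentioned above can be sketched in its textbook form: a misclassification measure built from class discriminants (e.g. GMM log-likelihoods) is passed through a sigmoid to give a smooth, differentiable loss. This is a generic illustration, not the paper's SMV-parameter setup.

```python
import numpy as np

def mce_loss(g_true, g_others, gamma=1.0):
    """Smoothed 0-1 loss used in MCE training: a sigmoid applied to the
    misclassification measure d, which compares the discriminant of the
    correct class (e.g. its GMM log-likelihood) with the best competitor."""
    d = -g_true + float(np.max(g_others))      # d < 0 means correctly classified
    return 1.0 / (1.0 + np.exp(-gamma * d))    # near 0 when correct, near 1 when wrong
```

Discriminative weight training then adjusts model parameters by gradient descent on this loss rather than by maximum likelihood.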

Packet Loss Concealment Algorithm Based on Robust Voice Classification in Noise Environment (잡음환경에 강인한 음성분류기반의 패킷손실 은닉 알고리즘)

  • Kim, Hyoung-Gook;Ryu, Sang-Hyeon
    • The Journal of the Acoustical Society of Korea / v.33 no.1 / pp.75-80 / 2014
  • The quality of real-time Voice over Internet Protocol (VoIP) networks is affected by network impairments such as delay, jitter, and packet loss. This paper proposes a packet loss concealment algorithm based on voice classification for enhancing VoIP speech quality. In the proposed method, arriving packets are classified by an adaptive thresholding approach based on an analysis of multiple features of short signal segments, and the classification results are used in the packet loss concealment. In addition, linear prediction-based packet loss concealment delivers high voice quality by alleviating the metallic artifacts caused by concealing consecutive packet losses or recovering lost packets.

Artificial Intelligence for Clinical Research in Voice Disease (후두음성 질환에 대한 인공지능 연구)

  • Seok, Jungirl;Kwon, Tack-Kyun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.33 no.3 / pp.142-155 / 2022
  • Diagnosis using voice is non-invasive and can be implemented with a variety of voice recording devices; it can therefore serve as a screening or diagnostic assistant tool for laryngeal voice disease to help clinicians. The development of artificial intelligence algorithms such as machine learning, led by the latest deep learning technology, began with binary classification distinguishing normal from pathological voices and has since contributed to improving the accuracy of multi-class classification of various types of pathological voice. However, no conclusions that can be applied in the clinical field have yet been reached. Most studies on pathological voice classification have used the sustained vowel /ah/, which is easier to analyze than continuous or running speech. Continuous speech, however, has the potential to yield more accurate results, as additional information can be obtained from changes in the voice signal over time. In this review, terms related to artificial intelligence research are explained and the latest trends in machine learning and deep learning algorithms are reviewed; furthermore, the latest research results and their limitations are introduced to provide future directions for researchers.

Voice Personality Transformation Using an Optimum Classification and Transformation (최적 분류 변환을 이용한 음성 개성 변환)

  • Lee, Ki-Seung (이기승)
    • The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.400-409 / 2004
  • In this paper, a voice personality transformation method is proposed which makes one person's voice sound like another person's. To transform voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method makes the transformed speech closer to the target speaker's voice from both subjective and objective points of view. Conversion between vocal tract transfer functions is implemented by classifying the entire vector space and applying a linear transformation to each cluster, with the LPC cepstrum as the feature parameter. A joint classification and transformation method is proposed in which optimum clusters and transformation matrices are estimated simultaneously under a minimum mean square error criterion. To evaluate the performance of the proposed method, transformation rules were generated from 150 sentences uttered by three male and one female speaker. These rules were then applied to another 150 sentences uttered by the same speakers, and objective evaluations and subjective listening tests were performed.
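
The classification-plus-linear-transformation idea can be sketched as a simplified two-stage stand-in: cluster the source feature space with k-means, then fit one least-squares (MMSE) linear transform per cluster. The paper estimates clusters and transforms jointly; here they are estimated sequentially, and the 4-dimensional features and the exact linear source-to-target mapping are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 4-dim "spectral" features and an unknown linear source-to-target map.
src = rng.normal(size=(300, 4))
A = rng.normal(size=(4, 4))
tgt = src @ A

def kmeans(X, k, iters=20):
    """Plain k-means over the source feature space."""
    cent = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None, :] - cent[None]) ** 2).sum(-1), axis=1)
        cent = np.array([X[lab == c].mean(0) if np.any(lab == c) else cent[c]
                         for c in range(k)])
    return cent, lab

cent, lab = kmeans(src, 4)
# One least-squares (MMSE) linear transform per cluster.
W = [np.linalg.lstsq(src[lab == c], tgt[lab == c], rcond=None)[0] for c in range(4)]

def convert(x):
    """Map a source vector to the target space via its cluster's transform."""
    c = int(np.argmin(((cent - x) ** 2).sum(-1)))
    return x @ W[c]
```

Joint estimation, as in the paper, would alternate between reassigning vectors to the cluster whose transform gives the lowest conversion error and refitting the transforms.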

An Aerodynamic and Acoustic Analysis of the Breathy Voice of Thyroidectomy Patients (갑상선 수술 후 성대마비 환자의 기식 음성에 대한 공기역학적 및 음향적 분석)

  • Kang, Young-Ae;Yoon, Kyu-Chul;Kim, Jae-Ock
    • Phonetics and Speech Sciences / v.4 no.2 / pp.95-104 / 2012
  • Thyroidectomy patients may have vocal fold paralysis or paresis, resulting in a breathy voice. The aim of this study was to investigate the aerodynamic and acoustic characteristics of breathy voice in thyroidectomy patients. Thirty-five subjects with vocal fold paralysis after thyroidectomy participated in this study. Based on perceptual judgements by three speech-language pathologists and one phonetician, subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 14). Aerodynamic analysis comprised three tasks (Voicing Efficiency, Maximum Sustained Phonation, Vital Capacity), and acoustic measures were taken during the Maximum Sustained Phonation task. The breathy voice group had significantly higher subglottal pressure and more pathological voice characteristics than the non-breathy voice group. Logistic regression on the aerodynamic measures achieved 94.1% classification accuracy; the predictors of breathiness were maximum sound pressure level, sound pressure level range, and phonation time from the Maximum Sustained Phonation task, and pitch range, peak air pressure, and mean peak air pressure from the Voicing Efficiency task. Logistic regression on the acoustic measures achieved 88.6% accuracy, with five frequency perturbation parameters as predictors. Vocal fold paralysis creates air turbulence at the glottis, which perturbs frequency-related parameters and increases aspiration noise in high-frequency regions. These changes determine perceived breathiness.
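
The logistic-regression classification used in this study can be illustrated with a minimal gradient-descent sketch on synthetic data; the three features here are hypothetical stand-ins for the aerodynamic predictors, not the study's measurements.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical stand-ins for three aerodynamic predictors.
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.5, -2.0, 1.0]) > 0).astype(float)  # synthetic breathy labels

w = np.zeros(3)
for _ in range(2000):                          # gradient descent on the log-loss
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)

pred = (1.0 / (1.0 + np.exp(-(X @ w))) > 0.5).astype(float)
accuracy = float(np.mean(pred == y))
```

The fitted weights play the role of the study's predictor coefficients: their signs and magnitudes indicate how each measure pushes a case toward the breathy class.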

Improvement of Speech/Music Classification Based on RNN in EVS Codec for Hearing Aids (EVS 코덱에서 보청기를 위한 RNN 기반의 음성/음악 분류 성능 향상)

  • Kang, Sang-Ick;Lee, Sang Min
    • Journal of rehabilitation welfare engineering & assistive technology / v.11 no.2 / pp.143-146 / 2017
  • In this paper, a novel approach is proposed to improve the performance of speech/music classification using a recurrent neural network (RNN) in the enhanced voice services (EVS) codec of 3GPP for hearing aids. The feature vectors applied to the RNN are selected from the relevant EVS parameters for efficient speech/music classification. The performance of the proposed algorithm is evaluated on large speech/music datasets under various conditions, and it yields better results than the conventional scheme implemented in the EVS.
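
The recurrence at the heart of an RNN speech/music classifier can be sketched with an untrained Elman network in NumPy: per-frame feature vectors update a hidden state, and the final state feeds a two-way softmax. The dimensions and random weights here are arbitrary assumptions; this shows the mechanics only, not the trained EVS system.

```python
import numpy as np

rng = np.random.default_rng(3)
D, H = 4, 8   # per-frame feature dim and hidden size -- arbitrary choices
Wx = rng.normal(0, 0.3, (H, D))
Wh = rng.normal(0, 0.3, (H, H))
Wo = rng.normal(0, 0.3, (2, H))

def rnn_classify(frames):
    """Elman RNN forward pass: each frame updates the hidden state,
    and the final state feeds a two-way (speech/music) softmax."""
    h = np.zeros(H)
    for x in frames:
        h = np.tanh(Wx @ x + Wh @ h)
    logits = Wo @ h
    p = np.exp(logits - logits.max())
    return p / p.sum()   # [P(speech), P(music)]

probs = rnn_classify(rng.normal(size=(30, D)))  # 30 frames of random features
```

The recurrent state is what lets the classifier exploit temporal context that the frame-by-frame EVS decision cannot.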

A Study on the Gender and Age Classification of Speech Data Using CNN (CNN을 이용한 음성 데이터 성별 및 연령 분류 기술 연구)

  • Park, Dae-Seo;Bang, Joon-Il;Kim, Hwa-Jong;Ko, Young-Jun
    • The Journal of Korean Institute of Information Technology / v.16 no.11 / pp.11-21 / 2018
  • Research was carried out to categorize voices using deep learning technology. The study examines neural-network-based sound classification studies and proposes an improved neural network for voice classification. A related study addressed urban sound classification but showed poor performance with its shallow neural network. Therefore, in this paper we first preprocess the voice data and extract feature values. Next, the voice is categorized by feeding the feature values into both the previous sound classification network and the proposed neural network. Finally, the classification performance of the two networks is compared and evaluated. The neural network in this paper is organized deeper and wider so that learning proceeds better. The related study's network achieved 84.8% accuracy, while the proposed network achieved 91.4%, about 6 percentage points higher.
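
The basic building block behind such classifiers, a 1-D convolution bank followed by ReLU and global max pooling over each feature map, can be sketched in NumPy; the signal and kernels below are toy assumptions, not the paper's architecture.

```python
import numpy as np

def conv1d_block(x, kernels):
    """Valid-mode 1-D convolution bank, then ReLU and global max pooling:
    the basic building block of 1-D CNN audio classifiers."""
    out = []
    for k in kernels:
        n = len(x) - len(k) + 1
        fm = np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])  # feature map
        out.append(np.maximum(fm, 0.0).max())   # ReLU + global max pool
    return np.array(out)

signal = np.array([0.0, 1.0, 0.0, 1.0, 0.0])                 # toy input
features = conv1d_block(signal, [np.array([1.0, 1.0]),
                                 np.array([1.0, -1.0])])     # two toy kernels
```

Stacking more such blocks with more kernels is what "deeper and wider" means in the abstract: more layers of feature maps and more filters per layer.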

Exploiting Korean Language Model to Improve Korean Voice Phishing Detection (한국어 언어 모델을 활용한 보이스피싱 탐지 기능 개선)

  • Boussougou, Milandu Keith Moussavou;Park, Dong-Joo
    • KIPS Transactions on Software and Data Engineering / v.11 no.10 / pp.437-446 / 2022
  • Text classification, a Natural Language Processing (NLP) task, combined with state-of-the-art (SOTA) machine learning (ML) and deep learning (DL) algorithms as the core engine, is widely used to detect and classify voice-phishing call transcripts. While numerous studies on the classification of voice-phishing call transcripts have been conducted and have demonstrated good performance, the increase in non-face-to-face financial transactions leaves room for improvement using the latest NLP technologies. This paper benchmarks the Korean voice-phishing detection performance of the pre-trained Korean language model KoBERT against multiple other SOTA algorithms on the classification of transcripts from the labeled Korean voice-phishing dataset KorCCVi. The experimental results reveal that the KoBERT model outperforms all other models, with a test-set accuracy of 99.60%.
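
For contrast with a fine-tuned language model like KoBERT, the underlying transcript-classification task can be illustrated with the simplest possible baseline: multinomial naive Bayes over bag-of-words counts. The toy English transcripts below are invented stand-ins, not the KorCCVi data.

```python
import math
from collections import Counter

# Invented English toy transcripts (label 1 = phishing) -- not the KorCCVi data.
train = [("send your bank account number now", 1),
         ("transfer the verification code immediately", 1),
         ("let's have lunch tomorrow", 0),
         ("the meeting is moved to friday", 0)]

counts = {0: Counter(), 1: Counter()}
for text, label in train:
    counts[label].update(text.split())
vocab = set(counts[0]) | set(counts[1])

def predict(text):
    """Multinomial naive Bayes with add-one smoothing over the toy vocabulary."""
    scores = {}
    for c in (0, 1):
        total = sum(counts[c].values())
        scores[c] = sum(math.log((counts[c][w] + 1) / (total + len(vocab)))
                        for w in text.split())
    return max(scores, key=scores.get)
```

A pre-trained transformer replaces these independent word counts with contextual representations, which is the gap the paper's benchmark quantifies.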

Aerodynamic Evaluation of Voice Changes in Thyroid Surgery Extent (갑상선 수술 범위와 공기역학적 음성 지표 변화)

  • Jeong, Hee Seok;Kim, Joong Sun;Lee, Chang-Yoon;Son, Hee Young
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.29 no.1 / pp.24-29 / 2018
  • Background and Objectives: The purpose of this study was to evaluate the impact of surgical extent on voice using acoustic and aerodynamic measurements in serially followed thyroidectomy patients. Materials and Method: From October 2015 to January 2017, 108 patients who underwent thyroid surgery and received voice tests preoperatively and at 2, 3, and 6 months postoperatively were classified into five operative types. The preoperative radiological stage and the postoperative histopathological stage were classified according to invasion of the thyroid capsule and surrounding tissue. For each classification, the voice analysis results were compared across periods. Results: By surgical extent, radiological stage, and histopathologic stage, only maximal phonation time (MPT) differed significantly over time. In the analysis of the interaction between each classification and period, however, only phonation threshold pressure (PTP) showed significant results. Conclusion: Differences in imaging and histopathologic stage have no significant effect on recovery of voice symptoms after thyroid surgery. As the extent of the operation increases, the pressure required to initiate phonation is relatively higher, and it also varies with time after surgery.