• Title/Summary/Keyword: Voice classification

Search Result 150, Processing Time 0.033 seconds

Classification of pathological and normal voice based on dimension reduction of feature vectors (피처벡터 축소방법에 기반한 장애음성 분류)

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.123-126
    • /
    • 2007
  • This paper suggests a method to improve the performance of the pathological/normal voice classification. The effectiveness of the mel frequency-based filter bank energies using the fisher discriminant ratio (FDR) is analyzed. And mel frequency cepstrum coefficients (MFCCs) and the feature vectors through the linear discriminant analysis (LDA) transformation of the filter bank energies (FBE) are implemented. This paper shows that the FBE LDA-based GMM is more distinct method for the pathological/normal voice classification than the MFCC-based GMM.

  • PDF

A Voice Controlled Service Robot Using Support Vector Machine

  • Kim, Seong-Rock;Park, Jae-Suk;Park, Ju-Hyun;Lee, Suk-Gyu
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2004.08a
    • /
    • pp.1413-1415
    • /
    • 2004
  • This paper proposes a SVM(Support Vector Machine) training algorithm to control a service robot with voice command. The service robot with a stereo vision system and dual manipulators of four degrees of freedom implements a User-Dependent Voice Control System. The training of SVM algorithm that is one of the statistical learning theories leads to a QP(quadratic programming) problem. In this paper, we present an efficient SVM speech recognition scheme especially based on less learning data comparing with conventional approaches. SVM discriminator decides rejection or acceptance of user's extracted voice features by the MFCC(Mel Frequency Cepstrum Coefficient). Among several SVM kernels, the exponential RBF function gives the best classification and the accurate user recognition. The numerical simulation and the experiment verified the usefulness of the proposed algorithm.

  • PDF

A study on the voice command recognition at the motion control in the industrial robot (산업용 로보트의 동작제어 명령어의 인식에 관한 연구)

  • 이순요;권규식;김홍태
    • Journal of the Ergonomics Society of Korea
    • /
    • v.10 no.1
    • /
    • pp.3-10
    • /
    • 1991
  • The teach pendant and keyboard have been used as an input device of control command in human-robot sustem. But, many problems occur in case that the usef is a novice. So, speech recognition system is required to communicate between a human and the robot. In this study, Korean voice commands, eitht robot commands, and ten digits based on the broad phonetic analysis are described. Applying broad phonetic analysis, phonemes of voice commands are divided into phoneme groups, such as plosive, fricative, affricative, nasal, and glide sound, having similar features. And then, the feature parameters and their ranges to detect phoneme groups are found by minimax method. Classification rules are consisted of combination of the feature parameters, such as zero corssing rate(ZCR), log engery(LE), up and down(UD), formant frequency, and their ranges. Voice commands were recognized by the classification rules. The recognition rate was over 90 percent in this experiment. Also, this experiment showed that the recognition rate about digits was better than that about robot commands.

  • PDF

A Study on Sasang Constitution Classification Using Voice Analysis (음성 분석을 이용한 사상 체질 분류에 관한 연구)

  • Cho Dong-Uk;Kim Bong-Hyun;Lee Se-Hwan
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2005.11a
    • /
    • pp.513-517
    • /
    • 2005
  • Our country tradition medicine and competitive peculiar constitution medicine are Sasang medicine in world medicine market. Sasang medicine can approach easily and enjoy healthful life with correct life habit in constitution. In this paper, propose a classification method of Sasang constitution through voice analysis. For this, examined qualities that are appeared in each Sasang constitution by voice analysis, and through this, describe about methods that assort Sasang constitution only with human voice.

  • PDF

A Proposal of Sasang Constitution Classification in Middle-aged Women Using Image and Voice Signals Process (영상 및 음성 신호 처리를 이용한 장년기 여성의 사상체질 분류 방법의 제안)

  • Lee, Se-Hwan;Kim, Bong-Hyun;Ka, Min-Kyoung;Cho, Dong-Uk;Kwak, Ji-Hyun;Oh, Sang-Young;J.Bae, Young-Lae
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.9 no.5
    • /
    • pp.1210-1217
    • /
    • 2008
  • Sasang medicine is our country's unique traditional medicine based on the classification of individual physical constitution. In the Sasang medicine, what is considered the most important task is to categorize Sasang constitution exactly. Therefore, security of objective elements and diagnosis index is a problem awaiting solution in Sasang constitution classification. To this the paper abstracted result value from objectification, visualization, a fixed quantity of Sasang constitution to analyze face image signals and voice signals. And, comparing the differences constitution would like to develop system classification of Sasang constitution. Specially, image and voice signals are different because of gender, age, region so it composed Sasang constitution group to 40-50 years women in Seoul. To extract of these image and voice signals wanted to perform comparison, analysis in constitution. Finally, it would like to prove a significance of research result through experiment.

Voice Personality Transformation Using a Multiple Response Classification and Regression Tree (다중 응답 분류회귀트리를 이용한 음성 개성 변환)

  • 이기승
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.3
    • /
    • pp.253-261
    • /
    • 2004
  • In this paper, a new voice personality transformation method is proposed. which modifies speaker-dependent feature variables in the speech signals. The proposed method takes the cepstrum vectors and pitch as the transformation paremeters, which represent vocal tract transfer function and excitation signals, respectively. To transform these parameters, a multiple response classification and regression tree (MR-CART) is employed. MR-CART is the vector extended version of a conventional CART, whose response is given by the vector form. We evaluated the performance of the proposed method by comparing with a previously proposed codebook mapping method. We also quantitatively analyzed the performance of voice transformation and the complexities according to various observations. From the experimental results for 4 speakers, the proposed method objectively outperforms a conventional codebook mapping method. and we also observed that the transformed speech sounds closer to target speech.

Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.57-66
    • /
    • 2021
  • This study focuses on the issue of automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure that is affected by the features of multiple speech dimensions. However, most previous studies are restricted to using features from a single speech dimension. To effectively capture the characteristics of the speech disorder, we extracted features of multiple speech dimensions: voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, Harmonic to Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/med/25quartile/75 quartile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation contains Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralized Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted using various feature combinations. The experimental results indicate that using features from all three speech dimensions gives the best result, with a 80.15 F1-score, compared to using features from just one or two speech dimensions. The result implies voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.

Discriminative Feature Vector Selection for Emotion Classification Based on Speech (음성신호기반의 감정분석을 위한 특징벡터 선택)

  • Choi, Ha-Na;Byun, Sung-Woo;Lee, Seok-Pil
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.64 no.9
    • /
    • pp.1363-1368
    • /
    • 2015
  • Recently, computer form were smaller than before because of computing technique's development and many wearable device are formed. So, computer's cognition of human emotion has importantly considered, thus researches on analyzing the state of emotion are increasing. Human voice includes many information of human emotion. This paper proposes a discriminative feature vector selection for emotion classification based on speech. For this, we extract some feature vectors like Pitch, MFCC, LPC, LPCC from voice signals are divided into four emotion parts on happy, normal, sad, angry and compare a separability of the extracted feature vectors using Bhattacharyya distance. So more effective feature vectors are recommended for emotion classification.

Voice Analysis of Highest Falsetto and Lowest Modal Voice (가성구와 흉성구의 객관적인 음성분석)

  • 진성민;송윤경;권기환;이경철;반재호
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.13 no.2
    • /
    • pp.151-154
    • /
    • 2002
  • Background and Objectives : The pitch range of the human voice is variable, extending from chest register to falsetto register. Although numerous studies have investigated after laryngeal mechanism description of falsetto tone, systematic and objective studies were lack. The purpose of this study was to systematically analyze and compare modal with falsetto voice. Materials and Methods : Seven adult baritones were selected from a larger population of volunteers at choir. Simultaneous measurements of acoustic, electroglottographic and aerodynamic study were made during /e/ sustained in two vocal registers, lowest modal and highest falsetto. Statistical analysis was performed using Wilkoxson signed rankes test. Results : In the acoustic analysis, shimmer was increased in flasetto voice(p<0.05). In the electroglottographic analysis, closed quotient(CQ), speed quotient(SQ) at the modal voice were higher than at the falsetto voice(p<0.05). In the aerodynamic analysis, and airflow rate(MFR) of falsetto voice was higher than modal voice(p<0.05). Conclusions : In the results of the study indicate that, falsetto register ineffective, inefficient, generally unpleasant because it was produced by incomplete clousure of true vocal cord. We anticipated that further study with large samples can provide an objective criteria for status and classification of singer's modal and falsetto voice.

  • PDF

Classification of Pathological Voice Signal with Severe Noise Component

  • Li, Ta-O;Jo, Cheol-Woo
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.107-115
    • /
    • 2003
  • In this paper we tried to classify the pathological voice signal with severe noise component based on two different parameters, the spectral slope and the ratio of energies in the harmonic and noise components (HNR), The spectral slope is obtained by using a curve fitting method and the HNR is computed in cepstrum quefrency domain. Speech data from normal peoples and patients are collected, diagnosed and divided into three different classes (normal, relatively less noisy and severely noisy data), The mean values and the standard deviations of the spectral slope and the HNR are computed and compared with in the three kinds of data to characterize and classify the severely noisy pathological voice signals from others.

  • PDF