• Title/Summary/Keyword: voice quality features

Qualitative Classification of Voice Quality of Normal Speech and Derivation of its Correlation with Speech Features (정상 음성의 목소리 특성의 정성적 분류와 음성 특징과의 상관관계 도출)

  • Kim, Jungin;Kwon, Chulhong
    • Phonetics and Speech Sciences / v.6 no.1 / pp.71-76 / 2014
  • In this paper, the voice quality of normal speech is qualitatively classified into five components: breathy, creaky, rough, nasal, and thin/thick voice. To determine whether a correlation exists between subjective and objective measures of voice, each voice is perceptually evaluated on a 1/2/3 scale by speech processing specialists and acoustically analyzed using speech analysis tools such as Praat, MDVP, and VoiceSauce. The speech parameters include features related to the speech source and the vocal tract filter. Statistical analysis uses a two-independent-samples non-parametric test. The experimental results show a significant correlation between the speech feature parameters and the components of voice quality.
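
    A two-independent-samples non-parametric test of the kind mentioned above can be sketched as follows. The abstract does not name the specific test, so the Mann-Whitney U statistic is used here as an assumption; the function names and the sample values are hypothetical and do not come from the paper.

    ```python
    def average_ranks(values):
        """Return 1-based ranks; tied values share the mean of their ranks."""
        order = sorted(range(len(values)), key=lambda i: values[i])
        ranks = [0.0] * len(values)
        i = 0
        while i < len(order):
            j = i
            # Extend j over the run of values tied with the value at position i.
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1
            for k in range(i, j + 1):
                ranks[order[k]] = mean_rank
            i = j + 1
        return ranks


    def mann_whitney_u(group_a, group_b):
        """Return the U statistics (U_a, U_b) for two independent samples."""
        pooled = list(group_a) + list(group_b)
        ranks = average_ranks(pooled)
        n_a, n_b = len(group_a), len(group_b)
        rank_sum_a = sum(ranks[:n_a])
        u_a = rank_sum_a - n_a * (n_a + 1) / 2
        u_b = n_a * n_b - u_a
        return u_a, u_b


    # Hypothetical feature values for voices rated 1 and 3 on one component.
    rated_low = [1.0, 2.0, 3.0]
    rated_high = [4.0, 5.0, 6.0]
    u_low, u_high = mann_whitney_u(rated_low, rated_high)
    ```

    A U value near zero for one group means its values rank almost entirely below the other group's; a significance level would still have to be read from a U table or a normal approximation.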

Analysis of Voice Quality Features and Their Contribution to Emotion Recognition (음성감정인식에서 음색 특성 및 영향 분석)

  • Lee, Jung-In;Choi, Jeung-Yoon;Kang, Hong-Goo
    • Journal of Broadcast Engineering / v.18 no.5 / pp.771-774 / 2013
  • This study investigates the relationship between voice quality measurements and emotional states, in addition to conventional prosodic and cepstral features. Open quotient, harmonics-to-noise ratio, spectral tilt, spectral sharpness, and band energy are analyzed as voice quality features, and prosodic features related to fundamental frequency and energy are also examined. ANOVA tests and Sequential Forward Selection are used to evaluate significance and verify performance. Classification experiments show that using the proposed features increases overall accuracy and, in particular, reduces confusion between happy and angry. The results also show that adding voice quality features to conventional cepstral features leads to an increase in performance.
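
    Sequential Forward Selection, as named above, is a greedy search that repeatedly adds the single feature that most improves a score. The sketch below is a generic version under stated assumptions: the scoring function, the gain values, and the stopping rule (stop when no candidate improves the score) are hypothetical and are not taken from the paper.

    ```python
    def sequential_forward_selection(candidates, score, max_features=None):
        """Greedily add the feature that most improves score(subset).

        Stops when no remaining feature improves the score or when
        max_features features have been selected.
        """
        selected = []
        best_score = float("-inf")
        remaining = list(candidates)
        limit = len(remaining) if max_features is None else max_features
        while remaining and len(selected) < limit:
            # Score every one-feature extension of the current subset.
            trials = [(score(selected + [f]), f) for f in remaining]
            top_score, top_feature = max(trials, key=lambda t: t[0])
            if top_score <= best_score:
                break
            selected.append(top_feature)
            remaining.remove(top_feature)
            best_score = top_score
        return selected, best_score


    # Hypothetical per-feature accuracy gains; a real score would come from
    # training and validating a classifier on each candidate subset.
    TOY_GAINS = {
        "open_quotient": 0.10,
        "hnr": 0.06,
        "spectral_tilt": 0.03,
        "band_energy": -0.01,
    }


    def toy_score(subset):
        return 0.5 + sum(TOY_GAINS[f] for f in subset)


    chosen, final_score = sequential_forward_selection(list(TOY_GAINS), toy_score)
    ```

    With these made-up gains the search stops before adding the last feature, because including it would lower the score.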

Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences / v.13 no.2 / pp.57-66 / 2021
  • This study focuses on automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure that is affected by features from multiple speech dimensions. However, most previous studies are restricted to features from a single speech dimension. To effectively capture the characteristics of the speech disorder, we extracted features from multiple speech dimensions: voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, Harmonics-to-Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/median/first quartile/third quartile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation contains Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralized Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted using various feature combinations. The experimental results indicate that using features from all three speech dimensions gives the best result, with an F1-score of 80.15, compared to using features from just one or two speech dimensions. The result implies that voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.
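
    Two of the feature families listed above can be illustrated with their commonly used textbook definitions. The sketch assumes local jitter and local shimmer are the mean absolute difference between consecutive cycles divided by the mean value, and that nPVI is the normalized Pairwise Variability Index over successive interval durations; the paper may compute them differently, and the function names and input values are hypothetical.

    ```python
    def local_perturbation(values):
        """Mean absolute difference of consecutive values, relative to their mean.

        Applied to glottal period lengths this gives local jitter; applied to
        per-cycle peak amplitudes it gives local shimmer (both as fractions).
        """
        diffs = [abs(a - b) for a, b in zip(values, values[1:])]
        return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


    def npvi(durations):
        """Normalized Pairwise Variability Index of successive durations."""
        terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
        return 100 * sum(terms) / len(terms)


    # Hypothetical measurements: glottal periods in ms, vowel durations in ms.
    periods_ms = [8.0, 10.0, 12.0]
    vowel_durations_ms = [2.0, 2.0, 4.0]
    jitter_local = local_perturbation(periods_ms)
    rhythm_npvi = npvi(vowel_durations_ms)
    ```

    Both functions need at least two values, since they compare neighboring cycles or intervals.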

Analysis of the Voice Quality in Emotional Speech Using Acoustical Parameters (음향 파라미터에 의한 정서적 음성의 음질 분석)

  • Jo, Cheol-Woo;Li, Tao
    • MALSORI / v.55 / pp.119-130 / 2005
  • The aim of this paper is to investigate some acoustical characteristics of voice quality features in an emotional speech database. Six parameters are measured and compared across six emotions (normal, happiness, sadness, fear, anger, boredom) and six speakers. Inter-speaker and intra-speaker variability are measured. Some intra-speaker consistency in how the parameters change across emotions is observed, but no inter-speaker consistency is observed.

Screening of Voice Disorder using Source Parameter Model and Artificial Neural Network (음원 파라미터 모델과 인공신경망을 이용한 음성장애 검출)

  • Chytil, Pavel;Jo, Cheol-Woo;Pavel, Misha
    • Speech Sciences / v.15 no.2 / pp.89-97 / 2008
  • A number of clinical conditions directly or indirectly affect the physical properties of the vocal folds and thereby the pressure waveforms of elicited sounds. If the relationships between the clinical conditions and voice quality are sufficiently reliable, it should be possible to detect these diseases or disorders. The focus of this paper is to determine the set of features, and their values, that characterize the state of the speaker's vocal folds. To the extent that these features capture the anatomical, physiological, and neurological aspects of the speaker, they can potentially be used to support an unobtrusive approach to diagnosis. We present a new approach to this problem, supported by results obtained from two disordered voice corpora.

The Vocalization for Korean Traditional Song "Pansori" (국악(판소리) 발성법)

  • Hong, Ki-Hwan
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.22 no.2 / pp.111-114 / 2011
  • All singers can develop voice trouble secondary to vocal abuse and overuse, but it is well known that traditional Korean (pansori) singers develop voice disorders more frequently than Western-style singers. While laryngological concern for voice disorders arising in professional singers has received some attention, empirically motivated investigations of the underlying acoustic features of the singing voice have been relatively limited. Since all singers have a good knowledge of the voice and voice training, they would hardly consent to treatment by a doctor unless the doctor understood their desire to maximize their voice quality. This report covers breathing, basic elements, and vocalization, the essential factors for achieving a perfect voice for pansori. The breathing is based on hypogastric breathing. The main functions of breathing are providing the energy and power of utterance, setting the tempo of the rhythm, separating paragraphs, and controlling feelings according to the dramatic situation. Vocalization is based on general vocalization. Its main uses are maintaining the singer's tone and the harmony of cosmetic dual force.

Packet Loss Concealment Algorithm Based on Robust Voice Classification in Noise Environment (잡음환경에 강인한 음성분류기반의 패킷손실 은닉 알고리즘)

  • Kim, Hyoung-Gook;Ryu, Sang-Hyeon
    • The Journal of the Acoustical Society of Korea / v.33 no.1 / pp.75-80 / 2014
  • The quality of real-time Voice over Internet Protocol (VoIP) networks is affected by network impairments such as delay, jitter, and packet loss. This paper proposes a packet loss concealment algorithm based on voice classification for enhancing VoIP speech quality. In the proposed method, arriving packets are classified by an adaptive thresholding approach based on the analysis of multiple features of short signal segments. The classification results are then used in the packet loss concealment. Additionally, linear prediction-based packet loss concealment delivers high voice quality by alleviating the metallic artifacts caused by concealing consecutive packet losses or recovering lost packets.

The Comparison of the Acoustic and Aerodynamic Characteristics of PROVOX® Voice and Esophageal Voice Produced by the Same Laryngectomee (동일 후적자가 산출하는 기관식도 발성(PROVOX® 발성)과 식도 발성에 대한 음향학적 및 공기역학적 특성 비교)

  • Pyo, H.Y.;Choi, H.S.;Lim, S.E.;Choi, S.H.
    • Speech Sciences / v.5 no.1 / pp.121-139 / 1999
  • Our experimental subject was a laryngectomee who had undergone total laryngectomy with PROVOX® insertion and learned esophageal speech after the surgery, so he could produce both PROVOX® voice and esophageal voice. Using this subject's productions of PROVOX® and esophageal voice, we compared the acoustic and aerodynamic characteristics of the two voices under the same physical conditions of the same person. The fundamental frequency of esophageal voice was 137.2 Hz, and that of PROVOX® voice was 97.5 Hz. PROVOX® voice showed lower jitter, shimmer, and NHR than esophageal voice, which means that PROVOX® voice had better voice quality than esophageal voice. In spectrographic analysis, the formation of formants and pseudoformants was more distinct in esophageal voice, while several temporal acoustic features, such as VOT and closure duration, were more similar to normal voice in PROVOX® voice. During sentence utterance, esophageal voice showed longer pause or silence duration than PROVOX® voice. The maximum phonation time and mean flow rate of PROVOX® voice were much longer and larger, respectively, than those of esophageal voice, but the mean and range of sound pressure level, subglottic pressure, and voice efficiency were similar in the two voices. The glottal resistance of esophageal voice was much larger than that of PROVOX® voice, which in turn was still larger than that of normal voice.

A Study on the Perceptual Aspects of an Emotional Voice Using Prosody Transplantation (운율이식을 통해 나타난 감정인지 양상 연구)

  • Yi, So-Pae
    • MALSORI / no.62 / pp.19-32 / 2007
  • This study investigated the perception of emotional voices by transplanting some or all of the prosodic aspects (pitch, duration, and intensity) of utterances produced with emotional voices onto those produced with normal voices, and vice versa. A listening evaluation by 24 raters revealed that the prosodic effect was greater than the segmental and vocal quality effect on the perception of emotion. The degree of influence of prosody and that of segments and vocal quality varied according to the type of emotion. For fear, prosodic elements had far greater influence than segmental and vocal quality elements, whereas segmental and vocal quality elements had as much effect as prosody on the perception of happy voices. The prosodic features contributed to the perception of emotion in different amounts, in descending order of pitch, duration, and intensity. As for utterance length, emotion was perceived more effectively with long utterances than with short utterances.

GMM Based Voice Conversion Using Kernel PCA (Kernel PCA를 이용한 GMM 기반의 음성변환)

  • Han, Joon-Hee;Bae, Jae-Hyun;Oh, Yung-Hwan
    • MALSORI / no.67 / pp.167-180 / 2008
  • This paper describes a novel spectral envelope conversion method based on the Gaussian mixture model (GMM). The core idea is to rearrange source feature vectors from the input space into transformed feature vectors in the feature space, for better GMM modeling of source and target features. The quality of statistical modeling depends on the distribution and the dimension of the data. The proposed method transforms both the distribution and the dimension of the data, which makes it possible to model the same data with a different configuration. Because the converted feature vectors must lie in the input space, only the source feature vectors are rearranged in the feature space using KPCA, and the target feature vectors remain unchanged for the joint pdf of source and target features. The experimental results show that the proposed method outperforms the conventional GMM-based conversion method in various training environments.
