• Title/Summary/Keyword: Voice classification

Vocal Characteristics and Differences in Gender and Voice Classification among Classical Singers (성악가의 성별 및 성종에 따른 발성적 특징과 차이)

  • Nam, Do-Hyun;Kim, Wha-Soak
    • Phonetics and Speech Sciences / v.1 no.2 / pp.163-171 / 2009
  • This study investigated vocal characteristics and differences by gender and voice classification among classical singers. Twenty-three female singers (M = 23.1 yrs, SD = 3.6 yrs, average 6.3 yrs singing experience, all classified as sopranos) and twenty male singers (M = 25.2 yrs, SD = 3.6 yrs, average 6.3 yrs singing experience, 8 tenors, 12 baritones) were recruited. Speaking fundamental frequency (F0), closed quotient (CQ), maximum phonation time (MPT), breathing types, maximum inspiratory pressure (MIP), maximum expiratory pressure (MEP), and singers' formants were measured. In addition, vibratory patterns were observed using stroboscopy. Speaking F0, singing CQ, breathing types, formant frequency in singers' formants, MIP, MEP, and MPT differed significantly between genders. Generally, singers' formants were observed in male singers, and the pattern of singers' formants differed between tenors and baritones. In the female singers, singing CQ values were lower than speaking CQ values (P<.001). Furthermore, MEP, MIP, and singing CQ were significantly lower for female singers than for male singers (P<.001). MPT and speaking F0, however, were not significantly different between tenors and baritones.

Discriminative Weight Training for a Statistical Model-Based Voice Activity Detection (통계적 모델 기반의 음성 검출기를 위한 변별적 가중치 학습)

  • Kang, Sang-Ick;Jo, Q-Haing;Park, Seung-Seop;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.26 no.5 / pp.194-198 / 2007
  • In this paper, we apply discriminative weight training to statistical model-based voice activity detection (VAD). In our approach, the VAD decision rule is expressed as the geometric mean of optimally weighted likelihood ratios (LRs), with the weights obtained by a minimum classification error (MCE) method. Unlike previous works, a different weight is assigned to each frequency bin, which is considered more realistic. Experimental results show that the proposed approach is effective for statistical model-based VAD using the LR test.
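
The decision rule described in this abstract can be illustrated with a minimal Python sketch. It assumes the per-bin likelihood ratios and the weights are already available; the MCE training that would produce the weights is not shown, and the function name, the weight normalization, and the default threshold are hypothetical choices rather than details from the paper.

```python
import math


def weighted_lr_decision(likelihood_ratios, weights, threshold=1.0):
    """Compare the weighted geometric mean of per-bin LRs with a threshold.

    Returns (is_speech, log_statistic). All names are illustrative.
    """
    if len(likelihood_ratios) != len(weights):
        raise ValueError("one weight is needed per frequency bin")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Log of the weighted geometric mean: sum_k w_k * log(LR_k),
    # with the weights normalized so that they sum to 1 (an assumption).
    log_stat = sum(
        (w / total) * math.log(lr)
        for lr, w in zip(likelihood_ratios, weights)
    )
    return log_stat > math.log(threshold), log_stat
```

With equal weights this reduces to the ordinary geometric mean, which corresponds to the equally weighted case that the abstract contrasts with.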

Voice Activity Detection Based on Discriminative Weight Training with Feedback (궤환구조를 가지는 변별적 가중치 학습에 기반한 음성검출기)

  • Kang, Sang-Ick;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.27 no.8 / pp.443-449 / 2008
  • One of the key issues in practical speech processing is achieving Voice Activity Detection (VAD) that is robust against background noise. Most statistical model-based approaches employ equally weighted likelihood ratios (LRs), an assumption that deviates from real observations. Furthermore, voice activity in adjacent frames is strongly correlated; in other words, the current frame is highly correlated with the previous frame. In this paper, we propose an effective VAD approach based on a minimum classification error (MCE) method, which differs from previous works in that different weights are assigned to both the likelihood ratio of the current frame and the decision statistic of the previous frame.
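
One way to picture the feedback structure described above is a simple recursion over frames. The abstract does not give the exact formula, so the linear combination below, the function name, and the initial value are all assumptions made for illustration; in the paper the two weights would come from MCE training.

```python
def feedback_statistics(frame_log_lrs, w_current, w_previous, initial=0.0):
    """Combine each frame's log-LR with the previous decision statistic.

    Hypothetical linear form: D_t = w_current * log_lr_t + w_previous * D_{t-1}.
    """
    stats = []
    previous = initial
    for log_lr in frame_log_lrs:
        previous = w_current * log_lr + w_previous * previous
        stats.append(previous)
    return stats
```

A frame would then be labeled as speech when its statistic exceeds a chosen threshold, so a strong previous frame carries part of its evidence into the current decision.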

Korean Voice Phishing Text Classification Performance Analysis Using Machine Learning Techniques (머신러닝 기법을 이용한 한국어 보이스피싱 텍스트 분류 성능 분석)

  • Boussougou, Milandu Keith Moussavou;Jin, Sangyoon;Chang, Daeho;Park, Dong-Joo
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.297-299 / 2021
  • Text classification is one of the popular tasks in Natural Language Processing (NLP), used in applications such as sentiment analysis and email filtering. Nowadays, state-of-the-art (SOTA) Machine Learning (ML) and Deep Learning (DL) algorithms are the core engines for these classification tasks and achieve high accuracy. This paper benchmarks the performance of multiple SOTA algorithms on the first known labeled Korean voice phishing dataset, called KorCCVi. Experimental results on a test set of 366 samples reveal which algorithm performs best considering training time and metrics such as accuracy and F1 score.
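
For readers unfamiliar with the two metrics named in this abstract, the following sketch computes accuracy and F1 score for a binary task in plain Python. The label encoding (1 for the positive class) and the function name are assumptions; nothing here reproduces the paper's models or data.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; names are illustrative."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)
```

F1 is often reported next to accuracy because it stays informative when one class is much rarer than the other.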

A Study on Sasang Constitutional Classification Factor using Sasang Constitutional Analysis Tool 2 (사상체질진단툴 2를 활용한 사상체질 분류 인자 연구)

  • Kim, Eun-Ju;Seo, Seung-Ho;Park, Seong-Eun;Na, Chang-Su;Son, Hong-Seok
    • Journal of Sasang Constitutional Medicine / v.30 no.3 / pp.40-47 / 2018
  • Objectives: The purpose of this study is to analyze the factors contributing to the classification of Sasang Constitution using Sasang Constitutional Analysis Tool 2. Methods: A total of 99 subjects were assessed for the classification of Sasang Constitution using the four measurement factors (face, voice, body shape, and questionnaire information) of Sasang Constitutional Analysis Tool 2. Results: Taeeumin had significantly higher body weight and BMI. For the agreement between the judgments of the four measurement factors and the final judgment of Sasang Constitution, Soeumin had the highest degree of agreement, 2.6. Taeeumin, Soeumin, and Soyangin showed the highest agreement with the individual judgments of face, body shape and questionnaire, and body shape, respectively. Conclusions: It is difficult to conclude that any individual factor contributes significantly to the classification of Sasang Constitution. Further study of Sasang Constitutional Analysis Tool 2 involving more people is needed to determine the factors contributing to the classification of Sasang Constitution.

Greeting, Function, and Music: How Users Chat with Voice Assistants

  • Wang, Ji;Zhang, Han;Zhang, Cen;Xiao, Junjun;Lee, Seung Hee
    • Science of Emotion and Sensibility / v.23 no.2 / pp.61-74 / 2020
  • The voice user interface has become a commercially viable and widely used interaction mechanism with the development of voice assistants. Despite the popularity of voice assistants, the academic community does not fully understand what, when, and how users chat with them. Chatting with a voice assistant is crucial because it shapes how a user will seek the assistant's help in the future. This study aims to cover the essence and construct of conversational AI, to develop a classification method for user utterances, and, most importantly, to understand what, when, and how Chinese users chat with voice assistants. We collected user utterances from the real conversational database of a commercial voice assistant, NetEase Sing, in China. We also identified utterance categories on the basis of previous studies and real usage conditions and annotated the utterances with 17 labels. Furthermore, we found that the top three reasons for using voice assistants in China are (1) greeting, (2) function, and (3) music. Chinese users like to interact with voice assistants at night, from 7 PM to 10 PM, and they are polite toward the assistants. The overall percentage of negative feedback utterances is less than 6%, which is quite low. These findings appear to be useful for voice interaction design in intelligent hardware.

Classification of Pathological Voice from ARS using Neural Network (신경회로망을 이용한 ARS 장애음성의 식별에 관한 연구)

  • Jo, C.W.;Kim, K.I.;Kim, D.H.;Kwon, S.B.;Kim, K.R.;Kim, Y.J.;Jun, K.R.;Wang, S.G.
    • Speech Sciences / v.8 no.2 / pp.61-71 / 2001
  • Speech material collected from an ARS (Automatic Response System) was analyzed and classified into disease and non-disease states. The material includes 11 different kinds of diseases. Along with the ARS speech, DAT (Digital Audio Tape) speech was collected in parallel to serve as a benchmark. To analyze the speech material, analysis tools developed in the local laboratory were used to make the obtained parameters more robust. A multi-layered neural network was used to classify speech into disease and non-disease classes. Three different combinations of 3, 6, and 12 parameters were tested to determine the proper network size and to find the best performance. In the experiment, a classification rate of 92.5% was obtained.

Classification of Pathological Voice Using Artificial Neural Network with Normalized Parameters

  • Li, Tao;Bak, Il-Suh;Jo, Cheol-Woo
    • Speech Sciences / v.11 no.1 / pp.21-29 / 2004
  • In this paper, we examined the effect of normalization on classifying voices into normal and abnormal (pathological) classes using an artificial neural network. The average value of each parameter was used to normalize that parameter's set of values. Artificial neural networks were used as classifiers. The effect of normalization was evaluated by comparing the discrimination results obtained with the original and the normalized parameter sets.

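
The normalization step mentioned in the last abstract above (normalizing each parameter by its average value) might look like the following sketch. The abstract does not say whether the average is divided out or subtracted; division is assumed here, and the function name and data layout (one row per voice sample, one column per parameter) are hypothetical.

```python
def normalize_by_mean(samples):
    """Divide every parameter (column) by its average over all samples.

    Assumes a non-empty list of equally long rows of numbers.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    n_params = len(samples[0])
    means = [
        sum(row[j] for row in samples) / len(samples)
        for j in range(n_params)
    ]
    if any(m == 0 for m in means):
        raise ValueError("cannot normalize a parameter whose average is zero")
    return [[row[j] / means[j] for j in range(n_params)] for row in samples]
```

After this step each parameter has an average of 1, so parameters measured on very different scales contribute more evenly to the classifier's inputs.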