• Title/Abstract/Keywords: normal voices

46 search results, processing time 0.023 s

한국 성인 음성의 음도인식에 관한 연구 (A Study on Pitch Perception of Normal Korean)

  • 정옥란;김형순;김영태;서장수
    • 음성과학
    • /
    • Vol. 1
    • /
    • pp.315-323
    • /
    • 1997
  • This study attempts to determine the fundamental frequency level of male and female voices that Koreans perceive as normal. Seventy-three college students majoring in Speech Pathology participated in the study on a voluntary basis. The subjects listened to a male voice with a fundamental frequency of 60 Hz, 80 Hz, 100 Hz, 120 Hz, 140 Hz, 160 Hz, 180 Hz, and 200 Hz, and a female voice with a fundamental frequency of 140 Hz, 160 Hz, 180 Hz, 200 Hz, 220 Hz, 240 Hz, 260 Hz, and 280 Hz. The PSOLA (Pitch Synchronous Overlap and Add) method and a harmonic modeling method for speech signals were used to change pitch in 20 Hz steps. The voices were presented in random order to prevent listener bias. The results were as follows. First, 46.6% judged the male voice at 120 Hz as normal, 19.2% judged 140 Hz as normal, and another 19.2% judged 160 Hz as normal. Second, 50.7% perceived the female voice at 220 Hz as normal, while 32.9% and 30.1% chose 200 Hz and 240 Hz, respectively. The problems and recommendations for future investigation are discussed.


운율 변조 양상에 따른 청자의 연령 지각 (Listener's Age Estimation by Prosody Manipulation)

  • 김지연;성철재
    • 말소리와 음성과학
    • /
    • Vol. 6, No. 2
    • /
    • pp.81-88
    • /
    • 2014
  • The normal aging process affects speech production, and these changes are perceived by listeners. This study examined whether age perception by normal listeners changed under various conditions of prosodic manipulation, comparing the prosodic changes according to age and sex in adulthood. Older and younger voices were resynthesized by manipulating speaking rate and pitch to shift the perceived age of the two groups toward each other. Two-way repeated-measures ANOVAs were conducted to determine whether the type of resynthesized prosodic cue resulted in a significant shift in the perceived age of young and old voices. Manipulating the speaking rate resulted in a significant shift in perceived age for both the older and younger groups. No significant shift in age estimates was observed for the younger male group when pitch was manipulated. There were significant gender-by-age-group interactions for prosodic manipulation type. Age-related changes in the prosodic properties of speech may ultimately influence speech perception.

장애 음성 판별을 위한 의료/전자 융복합 소프트웨어 개발 (Development of medical/electrical convergence software for classification between normal and pathological voices)

  • 문지혜;이지연
    • 디지털융복합연구
    • /
    • Vol. 13, No. 12
    • /
    • pp.187-192
    • /
    • 2015
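  • Software that can identify pathological voices would be highly useful in many convergence fields, such as telemedicine and speech therapy. This paper implements a normal/pathological voice classification program that combines acoustic parameters, medical information representing the rate of change of vocal fold vibration, with parameters based on higher-order statistics from signal processing, and applies CART (Classification And Regression Trees) analysis. The acoustic parameters used are jitter (%) and shimmer (%). The higher-order-statistics parameters proposed in this study are the means and variances of skewness and kurtosis. Using /a/ phonations from 53 normal speakers and 173 speakers with pathological voices, randomly sampled from the Kay Elemetrics database, we implemented a decision-tree-based algorithm for pathological voice classification that achieves 83.15% performance on average. Based on these results, and with future commercialization in mind, we developed a pathological voice classification program that includes a convergence-type function for generating content through a user-friendly framework.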
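
The higher-order-statistics parameters named in this abstract can be illustrated with a minimal sketch. It assumes the skewness and kurtosis are computed per fixed-length frame and then summarized by their mean and variance across frames; the abstract does not state the framing, so `frame_len`, the function names, and the synthetic sine input are all hypothetical and are not the authors' implementation.

```python
import math
import statistics


def central_moment(frame, order):
    """Return the population central moment of the given order."""
    mean = sum(frame) / len(frame)
    return sum((v - mean) ** order for v in frame) / len(frame)


def skewness(frame):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(frame, 3) / central_moment(frame, 2) ** 1.5


def kurtosis(frame):
    """Fourth standardized moment; 3 for a Gaussian distribution."""
    return central_moment(frame, 4) / central_moment(frame, 2) ** 2


def hos_features(signal, frame_len=400):
    """Mean and variance of per-frame skewness and kurtosis.

    frame_len is a hypothetical choice. Frames must not be constant,
    otherwise the second moment is zero and the ratios are undefined.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    skews = [skewness(f) for f in frames]
    kurts = [kurtosis(f) for f in frames]
    return {
        "skew_mean": statistics.fmean(skews),
        "skew_var": statistics.pvariance(skews),
        "kurt_mean": statistics.fmean(kurts),
        "kurt_var": statistics.pvariance(kurts),
    }


# Synthetic stand-in for a sustained vowel: a sine with a 100-sample period.
signal = [math.sin(2 * math.pi * k / 100) for k in range(1200)]
features = hos_features(signal)
```

The four values in `features`, together with jitter (%) and shimmer (%), would form the input vector for a decision-tree classifier such as CART. The tree itself is omitted here.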

양성후두 질환의 지속모음을 대상으로 한 기존 피치 추정 방법들의 성능 비교 분석 (Comparative Analysis of Performance of Established Pitch Estimation Methods in Sustained Vowel of Benign Vocal Fold Lesions)

  • 장승진;김효민;최성희;박영철;최홍식;윤영로
    • 음성과학
    • /
    • Vol. 14, No. 4
    • /
    • pp.179-200
    • /
    • 2007
  • In voice pathology, various measurements calculated from pitch values have been proposed to describe voice quality. However, those measurements are frequently inaccurate and unreliable because they are based on incorrect pitch values estimated from pathological voice data. To address this problem, we compared several pitch estimation methods in order to propose a better one for pathological voices. Using a database of 99 pathological and 30 normal voice samples, pitch estimation errors were analyzed and compared between pathological and normal voice data, and among the vowels produced by patients with benign vocal fold lesions. The results showed that gross pitch errors occurred in the pathological voice data. When the pathological voices were classified by the degree of aperiodicity in the speech signal, pitch errors were closely related to the number of aperiodic segments. The autocorrelation approach was found to be the most robust pitch estimation method for the pathological voice data. Further research on more severely pathological voice data is desirable in order to reduce pitch estimation errors.
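
As an illustration of the autocorrelation approach mentioned in this abstract, the following is a generic textbook-style sketch, not the specific estimator evaluated in the paper. The search range (`f_min`, `f_max`), the function name, and the synthetic test frame are assumptions made for the example.

```python
import math


def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    Returns None for a frame with no energy. f_min and f_max bound the
    lag search and are hypothetical defaults.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]          # remove DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return None
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else None


# Synthetic periodic frame: 100 Hz sine, 0.1 s at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(800)]
f0 = autocorr_pitch(frame, sample_rate)
```

On aperiodic segments the autocorrelation peak becomes weak or ambiguous, which is one way gross pitch errors of the kind described above can arise; a practical estimator would usually add a voicing threshold on the peak value.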


한국 표준어 연속음성에서의 억양구와 강세구 자동 검출 (Automatic Detection of Intonational and Accentual Phrases in Korean Standard Continuous Speech)

  • 이기영;송민석
    • 음성과학
    • /
    • Vol. 7, No. 2
    • /
    • pp.209-224
    • /
    • 2000
  • This paper proposes a method for automatically detecting intonational and accentual phrases in continuous speech in standard Korean. We use pauses longer than 150 msec to detect intonational phrases, and extract accentual phrases from the intonational phrases by analyzing syllables and pitch contours. The speech data for the experiment consist of seven male voices and two female voices reading the text of the fable 'the ant and the grasshopper' and a newspaper article 'manmulsang' at normal speed in standard Korean. The results of the experiment show that the detection rate is 95% on average for intonational phrases and 73% for accentual phrases. This detection rate implies that continuous speech can be segmented into smaller units (i.e., prosodic phrases) by using prosodic information, so that the objects of speech recognition can be narrowed down to words or phrases in continuous speech.
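
The pause-based step for intonational phrases can be sketched as follows. This is a minimal illustration assuming frame-level energies are already available; the 10 ms frame step, the silence threshold, and the function name are hypothetical, and only the 150 msec pause criterion comes from the abstract. The accentual-phrase step (syllable and pitch-contour analysis) is not covered.

```python
def split_on_pauses(energies, frame_ms=10, silence_threshold=0.01,
                    min_pause_ms=150):
    """Split a frame-energy sequence into phrases at long pauses.

    Returns (start_frame, end_frame) pairs, end exclusive. A boundary is
    placed only where silence lasts longer than min_pause_ms.
    """
    phrases = []
    start = None        # first frame of the current phrase
    last_speech = None  # most recent non-silent frame
    silent_run = 0      # consecutive silent frames inside a phrase
    for i, energy in enumerate(energies):
        if energy > silence_threshold:
            if start is None:
                start = i
            last_speech = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run * frame_ms > min_pause_ms:
                phrases.append((start, last_speech + 1))
                start = None
    if start is not None:
        phrases.append((start, last_speech + 1))
    return phrases


# Hypothetical energies: speech, 100 ms pause, speech, 200 ms pause, speech.
energies = [1.0] * 20 + [0.0] * 10 + [1.0] * 20 + [0.0] * 20 + [1.0] * 5
phrases = split_on_pauses(energies)
```

In this example the 100 ms pause stays inside the first phrase, while the 200 ms pause produces a boundary.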


연령증가에 따른 정상 노인의 음향분석학적 특징 (Acoustic and Stroboscopic Characteristics of Normal Person's Voices with Advancing Age)

  • 진성민;권기환;강현국
    • 대한후두음성언어의학회지
    • /
    • Vol. 8, No. 1
    • /
    • pp.44-48
    • /
    • 1997
  • Anatomic and physiological changes of the larynx with advancing age result in morphologic changes of the vocal fold and reduced control of the phonatory mechanism in elderly individuals, and are reflected in increased instability of fundamental frequency (Fo). The purpose of this study is to increase current understanding of the acoustic and stroboscopic characteristics of normal elderly persons' voices. First, phonated /a/ vowel productions by 40 normal adults (20 to 40 years, 20 men and 20 women) and 40 normal elderly persons (60 to 80 years, 20 men and 20 women) were analyzed, using CSL (model 4300B) acoustic analysis software, to obtain acoustic measures related to fundamental frequency stability and vocal resonance characteristics. Second, stroboscopic images of vocal fold behavior in all subjects were analyzed by experienced specialists. In the men, fundamental frequency variation (vFo) (p<0.01), jitter (p<0.05), and shimmer (p<0.05) for the older group were significantly higher than the values for the adult group. In the stroboscopic findings, vocal fold edema was a significant finding in aged men (15%). In the women, vFo (p<0.05), jitter (p<0.05), and noise to harmonic ratio (NHR) (p<0.05) for the older group were significantly higher than the values for the adult group, and first formant frequency (F1) (p<0.01) and second formant frequency (F2) (p<0.01) for the older group were significantly lower than the values for the adult group. In the stroboscopic findings, vocal fold atrophy was a significant finding in aged women (25%). Frequency stability, as reflected by vFo, jitter, shimmer, and NHR, decreases with advancing age in men and women, and spectral analysis of phonated /a/ vowel productions reveals a lowering of F1 and F2 with advancing age, especially in aged women. Change in the mass of the vocal folds, due to atrophy or edema, is considered to be the greatest factor in these acoustic changes.
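
For readers unfamiliar with the perturbation measures reported here, a minimal sketch of the common "local" definition follows: the mean absolute difference between consecutive cycles, divided by the mean value, in percent. Applied to pitch periods it gives jitter (%); applied to cycle peak amplitudes it gives shimmer (%). Whether the CSL software used in the study computes exactly this variant is an assumption, and the input values below are made up for illustration.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference relative to the mean, in %.

    Pass pitch periods for jitter (%) or peak amplitudes for shimmer (%).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_diff / mean_value


# Hypothetical cycle-by-cycle measurements (not data from the study).
periods_ms = [10.0, 10.2, 9.8, 10.0]
amplitudes = [1.00, 0.96, 1.02, 0.98]
jitter_pct = local_perturbation_percent(periods_ms)
shimmer_pct = local_perturbation_percent(amplitudes)
```

A perfectly periodic signal gives 0% for both measures, so higher values correspond to the reduced frequency stability described in the abstract.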


켑스트럼 파라미터를 이용한 후두암 검진 (Laryngeal Cancer Screening using Cepstral Parameters)

  • 이원범;전경명;권순복;전계록;김수미;김형순;양병곤;조철우;왕수건
    • 대한후두음성언어의학회지
    • /
    • Vol. 14, No. 2
    • /
    • pp.110-116
    • /
    • 2003
  • Background and Objectives : Laryngeal cancer discrimination using voice signals is a non-invasive method that can carry out the examination rapidly and simply without causing discomfort to patients. When appropriate analysis parameters and classifiers are developed, this method can be used effectively in various applications including telemedicine. This study examines voice analysis parameters used for laryngeal disease discrimination to help discriminate laryngeal diseases by voice signal analysis. The study also evaluates the laryngeal cancer discrimination performance of the Gaussian mixture model (GMM) classifier based on the statistical modelling of voice analysis parameters. Materials and Methods : The Multi-Dimensional Voice Program (MDVP) parameters, which have been widely used for the analysis of laryngeal cancer voice, sometimes fail to analyze the voice of a laryngeal cancer patient whose periodicity is seriously damaged. Accordingly, it is necessary to develop a new method that enables highly reliable analysis of voice signals that cannot be analyzed by the MDVP. To conduct the laryngeal cancer discrimination experiments, the authors used three types of voices collected at the Department of Otorhinolaryngology, Pusan National University Hospital: 50 voices of normal males, 50 voices of males with benign laryngeal diseases, and 105 voices of males with laryngeal cancer. In addition, the experiment included 11 voices of males with laryngeal cancer that cannot be analyzed by the MDVP. Only the monosyllabic vowel /a/ was used as voice data. Since there were only 11 voices of laryngeal cancer patients that cannot be analyzed by the MDVP, those voices were used only for discrimination. This study examined the linear predictive cepstral coefficients (LPCC) and the mel-frequency cepstral coefficients (MFCC), the two major cepstrum analysis methods in the area of acoustic recognition.
Results : The results showed that the mel frequency scaling process was effective in acoustic recognition but not useful for laryngeal cancer discrimination. Accordingly, the linear frequency cepstral coefficients (LFCC), which exclude the mel frequency scaling from the MFCC, were introduced. The LFCC showed better discrimination performance than the MFCC in predicting laryngeal cancer. Conclusion : In conclusion, the parameters applied in this study could accurately discriminate even terminal laryngeal cancer in which periodicity is disturbed. It is also thought that future studies on various classification algorithms and parameters representing the pathophysiology of the vocal cords will make it possible to discriminate benign laryngeal diseases as well, in addition to laryngeal cancer.
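
The difference between mel-scaled and linear filterbanks, which separates MFCC from LFCC, can be sketched by comparing filter center frequencies. This example assumes the widely used formula mel = 2595 * log10(1 + f / 700); the abstract does not state which mel formula or how many filters the authors used, so the formula choice, `n_filters`, the band limits, and the function names are all hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using a common formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def linear_centers(n_filters, f_min, f_max):
    """Filter centers equally spaced in Hz (LFCC-style filterbank)."""
    step = (f_max - f_min) / (n_filters + 1)
    return [f_min + step * k for k in range(1, n_filters + 1)]


def mel_centers(n_filters, f_min, f_max):
    """Filter centers equally spaced in mel (MFCC-style filterbank)."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * k) for k in range(1, n_filters + 1)]


lin = linear_centers(3, 0.0, 4000.0)
mel = mel_centers(3, 0.0, 4000.0)
```

Mel spacing packs filters densely at low frequencies and sparsely at high frequencies, whereas linear spacing gives every band equal resolution. The abstract reports that the linear variant discriminated laryngeal cancer better; this sketch only shows the spacing difference and does not reproduce that result.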


병적 음성과 정상 음성의 음향학적 파라미터 분포에 대한 통계적 분석 (An analysis of a statistical difference of acoustic Parameters' distribution between normal voice and pathological voice)

  • 김용주;권순복;김기련;신민철;조철우;왕수건
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2001년도 하계종합학술대회 논문집(4)
    • /
    • pp.249-252
    • /
    • 2001
  • The most basic means of communication among humans is the voice. Even setting aside voice technologies, the voice is important and convenient in everyday life. However, speech recognition systems cannot always expect a normal voice as the input signal. Generally speaking, a pathological voice, as opposed to a normal one, is a voice with a problem in the larynx, and it can be a special case of input voice. Although the distortion of a speech signal by environmental effects, i.e., noise or the transmission channel, has already been raised as a problem, we take up pathological voices with laryngeal disease, which is an intrinsic distortion factor in the voice. In this research, we seek to find the difference in the distribution of acoustic parameters between normal and pathological voices by a statistical method.


정상인과 식도발성 음성에서의 공기역학적 비교 연구 (The Aerodynamic Analysis between Normal Voice and Esophageal Voice)

  • 박국진;최홍식;정형진;유신영;박준호;김한수
    • 대한후두음성언어의학회지
    • /
    • Vol. 9, No. 1
    • /
    • pp.5-10
    • /
    • 1998
  • Voice rehabilitation is a very important concern for laryngectomees. Esophageal speech is a common and widely used method of voice restoration. However, until now there have been no reliable data showing the aerodynamic characteristics of esophageal speech. In order to evaluate the vocal quality of normal laryngeal and esophageal speech, several aerodynamic parameters were measured in 13 adults with normal laryngeal voice and 2 excellent esophageal speakers using the Aerophone II voice function analyzer. The examined parameters were maximal flow rate, mean airflow rate, subglottic pressure, vocal efficiency, glottic resistance, maximal phonation time, and mean sound pressure level. There was no difference in vocal efficiency between the two groups, but marked differences were shown in the other parameters for esophageal speakers, especially in mean resistance. The results indicate that esophageal speakers produce efficient voices under poor aerodynamic conditions compared with normal laryngeal speakers.


개별화자의 음성파라미터 추출에 관한 연구: 음성파라미터의 상관관계를 중심으로 (A Study of Extracting Acoustic Parameters for Individual Speakers)

  • 고도흥
    • 음성과학
    • /
    • Vol. 10, No. 2
    • /
    • pp.129-143
    • /
    • 2003
  • Fundamental frequency (Fo), jitter, shimmer, and noise-to-harmonic ratio (NHR) were measured with the Multi-Dimensional Voice Program (MDVP) to examine the interactions among the parameters. One hundred normal Korean adults (50 males and 50 females) ranging from their early 20s to their early 30s produced eight sustained vowels, /a/, /i/, /u/, /c/, /e/, /ε/, /i/, and /e/. The subjects were asked to read each vowel five times in isolation at intervals of five seconds. Male voices showed, on average, 130.7 Hz in Fo, 0.6696% in jitter, 1.8151% in shimmer, and 0.12 in NHR, while female voices showed 232.8 Hz in Fo, 0.9222% in jitter, 1.9199% in shimmer, and 0.1098 in NHR. As to the correlation coefficients, jitter vs. shimmer, shimmer vs. NHR, Fo vs. shimmer, and Fo vs. NHR were statistically significant for male speakers. For female subjects, jitter vs. shimmer and Fo vs. shimmer were statistically significant. However, it is concluded that the correlation coefficients for females are not meaningful in a practical sense, even though they are statistically significant.
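
The pairwise comparisons described here (for example jitter vs. shimmer) rest on a correlation coefficient. The sketch below assumes the Pearson product-moment coefficient, which the abstract does not name explicitly, and uses made-up per-speaker values that are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical per-speaker values, for illustration only.
jitter = [0.52, 0.61, 0.70, 0.66, 0.85]
shimmer = [1.60, 1.75, 1.90, 1.82, 2.10]
r = pearson_r(jitter, shimmer)
```

A coefficient can be statistically significant with a large sample even when its magnitude is small, which is consistent with the abstract's caution that the female-speaker correlations are significant but not practically meaningful.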
