Search | Korea Science

A Study of Extracting Acoustic Parameters for Individual Speakers (개별화자의 음성파라미터 추출에 관한 연구: 음성파라미터의 상관관계를 중심으로)

Ko, Do-Heung
Speech Sciences / v.10 no.2 / pp.129-143 / 2003
Fundamental frequency (Fo), jitter, shimmer, and noise-to-harmonics ratio (NHR) were measured with the Multi-Dimensional Voice Program (MDVP) to examine how the parameters interact. One hundred normal Korean adults (50 males and 50 females), ranging from their early 20s to their early 30s, produced the eight sustained vowels including /a/, /i/, /u/, /c/, /e/, /ɛ/, /i/, and /e/. The subjects read each vowel five times in isolation at five-second intervals. On average, male voices showed 130.7 Hz in Fo, 0.6696% in jitter, 1.8151% in shimmer, and 0.12 in NHR, while female voices showed 232.8 Hz in Fo, 0.9222% in jitter, 1.9199% in shimmer, and 0.1098 in NHR. For male speakers, the correlations for jitter vs. shimmer, shimmer vs. NHR, Fo vs. shimmer, and Fo vs. NHR were statistically significant. For female speakers, the correlations for jitter vs. shimmer and Fo vs. shimmer were statistically significant. However, the study concludes that the correlations for females, although statistically significant, are not meaningful in practice.

A Corpus-based study on the Effects of Gender on Voiceless Fricatives in American English

Yoon, Tae-Jin
Phonetics and Speech Sciences / v.7 no.1 / pp.117-124 / 2015
This paper investigates the acoustic characteristics of English fricatives in the TIMIT corpus, with a special focus on the role of gender in the realization of fricatives in American English. The TIMIT database includes 630 talkers and 2342 different sentences, comprising over five hours of speech. Spectral and temporal properties are analyzed with gender treated as an independent factor. The analyses show that most acoustic properties of voiceless sibilants differ between male and female speakers, whereas those of voiceless non-sibilants do not. A classification experiment using linear discriminant analysis (LDA) correctly classified 85.73% of voiceless fricatives. The sibilants were 88.61% correctly classified, whereas the non-sibilants were only 57.91% correctly classified. The majority of the errors come from the misclassification of /θ/ as [f]. The average accuracy of gender classification is 77.67%, and most of the errors come from the classification of female speakers in non-sibilants. The results are accounted for in terms of biological differences as well as macro-social factors. The paper contributes to the understanding of the role of gender in a large-scale speech corpus.
https://doi.org/10.13064/KSSS.2015.7.1.117

Comparison of Male/Female Speech Features and Improvement of Recognition Performance by Gender-Specific Speech Recognition (남성과 여성의 음성 특징 비교 및 성별 음성인식에 의한 인식 성능의 향상)

Lee, Chang-Young
The Journal of the Korea institute of electronic communication sciences / v.5 no.6 / pp.568-574 / 2010
In an effort to improve the speech recognition rate, we compared the performance of speaker-independent and gender-specific speech recognition. For this purpose, 20 male and 20 female speakers each pronounced 300 isolated Korean words, and the recordings were divided into 4 groups: female, male, and two mixed-gender groups. To examine the validity of gender-specific speech recognition, Fourier spectra and MFCC feature vectors averaged separately over male and female speakers were examined. The results showed a distinction between the two genders, which supports the motivation for gender-specific speech recognition. In the recognition experiments, the error rate for the gender-specific case was less than 50% of that for the speaker-independent case. These results suggest that hierarchical recognition, in which gender is recognized first and speech second, might perform better than the current method of speech recognition.

A Study on Speaker Identification by Difference Sum and Correlation Coefficient of Intensity Levels from Band-pass Filtered Sounds (대역별로 여과한 음성 강도의 차이값과 상관계수에 의한 화자확인 연구)

Yang, Byung-Gon
Speech Sciences / v.10 no.2 / pp.249-258 / 2003
This study examined a speaker identification method that uses the difference sum and the correlation coefficient determined from a pair of intensity level matrices of band-pass-filtered numeric sounds produced by ten female speakers of similar age and height. Subjects recorded three-digit numbers in a quiet room at a sampling rate of 22 kHz on a personal computer. The collected data were band-pass-filtered at five different band ranges. Then, matrices of five intensity levels at 100 proportional time points were obtained. Pearson correlation coefficients and the sum of absolute intensity differences between a pair of given matrices were determined within and across the speakers. Results showed that a very high correlation coefficient and a small difference sum generally occurred within each speaker, although some individual variation was also observed. Thus, the matrix pair with a higher coefficient and a smaller difference sum was averaged to form each individual's model. Comparison among the speakers yielded generally low coefficients and large differences, which suggests successful speaker identification, but there were a few cases with very high coefficients and small differences. Future studies will focus on finer band ranges and additional spectral parameters at some peak points of the intensity contour in a low frequency band.
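The comparison step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names and the toy 2-band by 4-point matrices are hypothetical (the study used five bands at 100 time points), and both measures are assumed to be computed over all cells of the two matrices.

```python
import math


def _flatten(matrix):
    """Turn a list of rows (one row per frequency band) into a flat list."""
    return [value for row in matrix for value in row]


def difference_sum(a, b):
    """Sum of absolute intensity differences between two equal-shaped matrices."""
    fa, fb = _flatten(a), _flatten(b)
    if len(fa) != len(fb):
        raise ValueError("matrices must have the same number of cells")
    return sum(abs(x - y) for x, y in zip(fa, fb))


def pearson(a, b):
    """Pearson correlation coefficient over all cells of two matrices."""
    fa, fb = _flatten(a), _flatten(b)
    if len(fa) != len(fb):
        raise ValueError("matrices must have the same number of cells")
    n = len(fa)
    mean_a, mean_b = sum(fa) / n, sum(fb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    var_a = sum((x - mean_a) ** 2 for x in fa)
    var_b = sum((y - mean_b) ** 2 for y in fb)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical intensity levels (dB): 2 bands x 4 time points per recording.
take_1 = [[60, 62, 64, 66], [50, 51, 52, 53]]
take_2 = [[61, 63, 65, 67], [51, 52, 53, 54]]  # same contour, 1 dB louder

print(difference_sum(take_1, take_2))
print(pearson(take_1, take_2))
```

A pair of recordings with the same contour gives a coefficient near 1 and a small difference sum, which is the within-speaker pattern the abstract reports.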

The Study of Breath Group Based on Oral Airflow in Reading by Healthy Speakers (구강기류 분석에 근거한 정상 성인의 문단 읽기 시 호흡그룹의 특징)

Han, Ji-Yeon;Lee, Ok-Bun;Shim, Lee-Seul
Speech Sciences / v.15 no.4 / pp.135-146 / 2008
A breath group generally refers to one of the units of speech production. It is an integral component of the structural and contextual features of utterances and is partly responsible for fluctuations in speech intelligibility. The purpose of this study was to identify the characteristics of breath groups in reading passages spoken by healthy speakers, specifically from an aerodynamic point of view. Eighteen healthy female speakers aged 20 to 30 years without communication problems participated in this study. The PAS (Phonatory Aerodynamic System) was used for aerodynamic measurement of breath groups. Results showed that the mean number of breath groups in reading tasks was 16.03 per minute (SD=3.1), and the mean number of spoken syllables per breath group was 17.95. The mean time (m) of a breath group was 3.06 (SD=0.62), and the ratio of exhalation and inhalation appeared as 1:5. The results need to be discussed in terms of normative values for breath groups and from a clinical viewpoint, especially their potential implications for speech intelligibility problems caused by brain damage.

The effects of length of residence (LOR) on voice onset time (VOT)

Kim, Mi-Ryoung
Phonetics and Speech Sciences / v.12 no.4 / pp.9-17 / 2020
Changes in the first language (L1) sound system as a result of acquiring a second language (L2) (i.e., phonetic drift) have received considerable attention across a variety of speakers, settings, and environments. Less attention has been given to phonetic drift in adult speakers' L2 learning as their length of residence in America (LOR) increases. This study examines the effects of LOR on voice onset time (VOT) in L1 Korean stops. Three different groups of Korean adult learners of L2 English were compared to assess how malleable their L1 representations are in terms of LOR and whether there is any relationship between L1 change and L2 acquisition. The results showed that the effect of LOR was linguistically unimportant in the production of Korean stops. However, VOT merger as evidence of sound change in Korean stops was robust in the speech production of most of the female speakers across the groups. The results suggest that L2 English may not be the primary cause of L1 sound change. For generalizability, further study is necessary to see whether other acoustic cues show a similar pattern.
https://doi.org/10.13064/KSSS.2020.12.4.009

The Characteristics and Significance of 'Nim' Texts in the Late Choson Period: Focused on Saseol-sijo and Chap-ga (조선후기 '님' 담론의 특성과 그 의미 : 사설시조와 잡가를 중심으로)

Shin Eun-Kyung
Sijohaknonchong / v.20 / pp.113-139 / 2004
This article aims to illuminate how the men who were the leading agents of Saseol-sijo in the Late Choson period (musical performers, lyricists, patrons, composers, compilers of Sijo anthologies, audiences, and so on) viewed women and how their understanding of women was reflected in the texts. Working with texts on the theme of 'Love,' the article begins by distinguishing two types of love: the first, 'lovelorn heart,' centers on unilateral pining for a single lover who is now absent, and the second, 'physical love,' centers on bilateral sexual intercourse. In addition to the type of love, the gender of the poetic speaker, as distinct from the real poet, is vital to characterizing the discourse of love. According to these two factors, the texts in question fall into four groups: texts in which a female speaker displays her lovelorn heart ('Type 1'), those in which she speaks about her sexual experiences ('Type 2'), those in which a male speaker sings of his lovelorn heart ('Type 3'), and those in which he describes his sexual experiences ('Type 4'). Of these, 'Type 2' and 'Type 3' are key to understanding the men's view of women. With respect to the configuration of the theme of 'Love,' it should be noted that in Korean literary history the nim, or 'sweetheart,' had signified the totality of value, a perfect entity that makes one's life meaningful, and that 'Type 1,' the pattern in which a female subject expresses her love for a male nim, had been the traditional way to convey the theme of 'Love.' Given this connotation of nim, the remarkable increase of 'Type 3,' which implies an increase in male speakers, reveals the extent to which women, the male speakers' nim, entered a 'sacred area,' the position of nim that only men had occupied; women are focused on and centralized. This article considers this phenomenon a sign of the increased significance and weight of women in Late Choson society and an index of 'modernity.'
Meanwhile, given that most Saseol-sijo poets are men, the emergence of 'Type 2' texts, in which male poets have female speakers disclose their sexual experiences, is a representative example of women being degraded to a means of men's pleasure, because this situation gives men more pleasure than when male speakers reveal their own sexual experiences. The same holds not only for 'Type 2' but also for the group of texts that basically belongs to 'Type 1' and conveys the theme of 'Loyalty' through the female voice by substituting the ruler-subject relation for the man-woman relation, since men employ the female voice as a poetic device in order to stress the theme of 'Loyalty.' This article regards this phenomenon as an index of 'pre-modernity,' in the sense that in a pre-modern society, specifically in Early Choson, a male-oriented value system dominates and thereby alienates women. As is well known, the Late Choson was a period of transition from a pre-modern society to a modern society. The pre-modern and the modern can therefore be found mixed ambivalently in every segment of the society. The dual aspects of the masculine view of women in Saseol-sijo constitute one example. The significance of Saseol-sijo in Korean literary history can be found in this phenomenon.

The fundamental frequency (f0) distribution of Korean speakers in a dialogue corpus using Praat and R (Praat과 R로 분석한 한국인 대화 음성 말뭉치의 fundamental frequency(f0)값 분포)

Byunggon Yang
Phonetics and Speech Sciences / v.15 no.3 / pp.17-25 / 2023
This study examines the fundamental frequency (f0) distribution of 2,740 Korean speakers in a dialogue speech corpus. Praat and R were used to collect and analyze the acoustic f0 data after removing extreme values based on the interquartile f0 range of the intonational phrases produced by each individual speaker. Results showed that the average f0 value of all speakers was 185 Hz and the median value was 187 Hz. The f0 data showed a skewness of 0.11 (slightly positive) and a kurtosis of -0.09, which is close to the normal distribution. The pitch values of daily conversations varied over a range of 238 Hz. Further examination of the male and female groups showed distinct median f0 values: 114 Hz for males and 199 Hz for females. A t-test between the two groups yielded a significant difference. The skewness representing the distribution shape was 1.24 for the male group and 0.58 for the female group. The kurtosis was 5.21 and 3.88 for the male and female groups, respectively, and the male group values appeared leptokurtic. A regression analysis between the median f0 and age yielded a slope of 0.15 for the male group and -0.586 for the female group, which indicated a divergent relationship. In conclusion, a normative f0 distribution of different Korean age and sex groups can be examined in a conversational speech corpus recorded by a massive number of participants. However, more rigorous data might be required to define a relation between age and f0 values.
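The distribution statistics mentioned in this abstract can be illustrated with a short sketch. The study itself used Praat and R; the Python below is only an illustration under stated assumptions: the function names are hypothetical, the 1.5 x IQR fence is an assumed rule (the abstract does not give the exact cutoff), and skewness and kurtosis are computed with the simple population-moment formulas, with kurtosis reported as excess kurtosis (0 for a normal distribution).

```python
import statistics


def remove_extremes_iqr(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed Tukey-style fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (positive means a longer right tail)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


# Hypothetical per-phrase f0 values (Hz) for one speaker, with one tracking error.
f0_values = [100, 110, 120, 130, 140, 1000]
cleaned = remove_extremes_iqr(f0_values)

print(cleaned)
print(skewness(cleaned), excess_kurtosis(cleaned))
```

Statistical packages differ in whether they apply bias corrections and whether they report raw or excess kurtosis, so values from this sketch need not match those produced by a particular R function.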
https://doi.org/10.13064/KSSS.2023.15.3.017

Acoustic Cues in Spoken French for the Pronunciation Assessment Multimedia System (발음평가용 멀티미디어 시스템 구현을 위한 구어 프랑스어의 음향학적 단서)

Lee, Eun-Yung;Song, Mi-Young
Speech Sciences / v.12 no.3 / pp.185-200 / 2005
The objective of this study is to examine acoustic cues in spoken French for the assessment of pronunciation, which is necessary for implementing the multimedia system. The corpus is composed of simple expressions that include all phonemes of the French phonological system. The experiment was conducted with 4 male and female native French speakers and 20 Korean speakers, university students who had studied French for more than two years. We analyzed the recorded data with a spectrograph and measured the comparative features numerically. First, we found the mean and the deviation of all phonemes, and then chose the features that had a high error frequency and large differences between French and Korean pronunciations. The selected data were simplified and compared with one another. After judging, in articulatory and auditory terms, whether each Korean speaker's pronunciation problems were utterance mistakes or interference from the mother tongue, we tried to find acoustic features that were as simple as possible. From this experiment, we were able to extract acoustic cues for the construction of the French pronunciation training system.

The Effects of the Methods of Disguised Voice on the Aural Decision (위장 발화 방법의 차이가 청취 판단에 미치는 영향)

Song Min-Chang;Shin Jiyoung;Kang SunMee
MALSORI / no.46 / pp.25-35 / 2003
This study deals with the disguised voice (or voice disguise) in the field of forensic phonetics. In particular, we studied the effects of voice disguise methods on aural decisions. Within the area of nonelectronic, deliberate voice disguise, the methods include lowered pitch, pinched nostrils, falsetto, and whisper. Ten Seoul speakers (5 male, 5 female) recorded 16 sentences. In the aural test, 30 subjects listened to normal and disguised voices and were asked to decide whether or not the speakers were identical. The results are as follows: speaker verification was more difficult for falsetto and whisper than for lowered pitch and pinched nostrils.
