Search | Korea Science

Voice Personality Transformation Using an Optimum Classification and Transformation (최적 분류 변환을 이용한 음성 개성 변환)

이기승
- The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.400-409 / 2004
In this paper, a voice personality transformation method is proposed, which makes one person's voice sound like another person's voice. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method makes the transformed speech closer to the target speaker's voice from both subjective and objective points of view. Conversion between vocal tract transfer functions is implemented by classifying the entire vector space and then applying a linear transformation to each cluster. The LPC cepstrum is used as the feature parameter. A joint classification and transformation method is proposed, in which the optimum clusters and transformation matrices are estimated simultaneously under a minimum mean square error criterion. To evaluate the performance of the proposed method, transformation rules are generated from 150 sentences uttered by three male speakers and one female speaker. These rules are then applied to another 150 sentences uttered by the same speakers, and objective evaluations and subjective listening tests are performed.
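The joint classification-and-transformation idea can be illustrated with a minimal sketch. It is not the author's implementation: it assumes scalar features standing in for LPC-cepstrum vectors, an affine map y = a*x + b per cluster in place of a transformation matrix, a deterministic initial split by source value, and nearest-centroid classification of unseen source frames (the abstract does not say how new frames are classified). The names `fit_affine`, `train_joint`, `convert`, and `training_mse` are hypothetical.

```python
# Sketch: joint clustering + per-cluster linear transformation under an
# MMSE criterion (clusterwise linear regression).
# Assumptions (not from the paper): scalar features instead of LPC-cepstrum
# vectors, affine maps instead of matrices, nearest-centroid classification.


def fit_affine(pairs):
    """Least-squares fit of y ~ a*x + b over a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def train_joint(pairs, k, iters=50):
    """Alternately refit one affine map per cluster and reassign each
    training pair to the cluster whose map gives the smallest squared error.
    Neither step can increase the total squared error."""
    n = len(pairs)
    # Deterministic start: split pairs into k contiguous chunks ordered by x.
    order = sorted(range(n), key=lambda i: pairs[i][0])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n

    def refit(current):
        maps = []
        for c in range(k):
            members = [p for p, lab in zip(pairs, current) if lab == c]
            # An empty cluster keeps an identity map and contributes no error.
            maps.append(fit_affine(members) if members else (1.0, 0.0))
        return maps

    for _ in range(iters):
        maps = refit(labels)
        new = [min(range(k), key=lambda c: (maps[c][0] * x + maps[c][1] - y) ** 2)
               for x, y in pairs]
        if new == labels:
            break
        labels = new
    maps = refit(labels)
    # Source-space centroids classify frames whose target is unknown.
    centroids = []
    for c in range(k):
        xs = [x for (x, _), lab in zip(pairs, labels) if lab == c]
        centroids.append(sum(xs) / len(xs) if xs else float("inf"))
    return centroids, maps, labels


def convert(x, model):
    """Classify a source value by nearest centroid, then apply that cluster's map."""
    centroids, maps, _ = model
    c = min(range(len(maps)), key=lambda j: abs(x - centroids[j]))
    a, b = maps[c]
    return a * x + b


def training_mse(pairs, model):
    """Mean squared error of the training pairs under their final cluster labels."""
    _, maps, labels = model
    return sum((maps[lab][0] * x + maps[lab][1] - y) ** 2
               for (x, y), lab in zip(pairs, labels)) / len(pairs)


if __name__ == "__main__":
    demo = [(x, 2 * x + 1) for x in range(5)] + [(x, 20 - x) for x in range(5, 10)]
    model = train_joint(demo, 2)
    print(convert(2, model), convert(8, model))  # 5.0 12.0
```

Because reassignment and refitting each minimize squared error given the other, the training error never increases; like k-means, the procedure can still stop at a local optimum.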

The pattern of use by gender and age of the discourse markers 'a', 'eo', and 'eum' (담화표지 '아', '어', '음'의 성별과 연령별 사용 양상)

Song, Youngsook;Shim, Jisu;Oh, Jeahyuk
- Phonetics and Speech Sciences / v.12 no.4 / pp.37-45 / 2020
This paper quantitatively measured the frequency of use of the discourse markers 'a', 'eo', and 'eum' and their durations using the Seoul Corpus, a spontaneous speech corpus. Sound durations were measured with Praat, the Seoul Corpus was analyzed with EmEditor, and the results were analyzed statistically in R. Based on the corpus analysis, the study investigated whether a particular factor is preferred by speakers of particular categories. The most prominent finding is that female speakers produced longer durations than male speakers when using the discourse marker 'eum' in final position. With respect to age, teenagers uttered 'a' more often than 'eo' in initial position compared with speakers in their 40s. This study is significant in that it quantitatively analyzed the discourse markers 'a', 'eo', and 'eum' by gender and age. To continue the discussion, more precise research that takes context into account is needed. In addition, similar markers can be found in Japanese "e" and "ma" (Watanabe & Ishi, 2000) and in English 'uh' and 'um' (Gries, 2013). Future cross-linguistic analyses of these discourse markers could identify their commonalities and differences.
https://doi.org/10.13064/KSSS.2020.12.4.037

Perceptual cues for /o/ and /u/ in Seoul Korean (서울말 /ㅗ/와 /ㅜ/의 지각특성)

Byun, Hi-Gyung
- Phonetics and Speech Sciences / v.12 no.3 / pp.1-14 / 2020
Previous studies have confirmed that /o/ and /u/ in Seoul Korean are undergoing a merger in the F1/F2 space, especially for female speakers. It has been reported that female speakers use phonation differences (H1-H2) as a substitute for formants to distinguish /o/ from /u/. This study aimed to explore whether H1-H2 values are used as perceptual cues for /o/-/u/. A perception test was conducted with 35 college students using /o/ and /u/ tokens spoken by 41 female speakers, which overlap considerably in the vowel space. An acoustic analysis of the 182 stimuli was also conducted to see whether production and perception correspond. The identification rate was 89% on average: 86% for /o/ and 91% for /u/. The results confirmed that when /o/ and /u/ are too close to be distinguished in the F1/F2 space, H1-H2 differences contribute significantly to separating the two vowels in production. In perception, however, this was not the case: H1-H2 values were not significantly involved in the identification process, and the formants (especially F2) remained the dominant cues. The study also showed that although H1-H2 differences are apparent in female speakers' production, male speakers do not use H1-H2 in their production, and neither female nor male listeners use H1-H2 in perception. It is presumed that H1-H2 has not yet developed into a perceptual cue for /o/ and /u/.
https://doi.org/10.13064/KSSS.2020.12.3.001
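H1-H2 is the level difference, in dB, between the first and second harmonics of the voice source. A minimal sketch of the measurement, assuming f0 is already known (for example from a pitch tracker), a Hann window, and no formant correction (the corrected H1*-H2* variant is not shown); the abstract does not describe the study's exact procedure, and the function names are hypothetical.

```python
import math


def harmonic_amplitude(signal, fs, freq):
    """Magnitude of a single Hann-windowed DFT component at `freq` Hz."""
    n = len(signal)
    re = im = 0.0
    for i, s in enumerate(signal):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))  # Hann window
        phase = 2 * math.pi * freq * i / fs
        re += w * s * math.cos(phase)
        im -= w * s * math.sin(phase)
    return math.hypot(re, im)


def h1_minus_h2(signal, fs, f0):
    """H1-H2 in dB: level of the 1st harmonic (f0) minus the 2nd (2*f0)."""
    h1 = harmonic_amplitude(signal, fs, f0)
    h2 = harmonic_amplitude(signal, fs, 2 * f0)
    return 20 * math.log10(h1 / h2)
```

A breathier phonation typically shows a stronger first harmonic relative to the second, so a larger H1-H2 value; this is the production difference the study found in female speakers' /o/ and /u/.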

Impact of face masks on spectral and cepstral measures of speech: A case study of two Korean voice actors (한국어 스펙트럼과 캡스트럼 측정시 안면마스크의 영향: 남녀 성우 2인 사례 연구)

Wonyoung Yang;Miji Kwon
- The Journal of the Acoustical Society of Korea / v.43 no.4 / pp.422-435 / 2024
This study intended to verify the effects of face masks on Korean speech in terms of acoustic, aerodynamic, and formant parameters. We selected all types of face masks available in Korea, classified by filter performance and folding type. Two professional voice actors (one male, one female), native speakers of standard Korean with more than 20 years of experience, participated in this study as the speakers of the voice data. Face masks attenuated the high-frequency range, resulting in decreased Vowel Space Area (VSA) and Vowel Articulation Index (VAI) scores and an increased low-to-high spectral ratio (L/H ratio) in all voice samples. This can result in lower speech intelligibility. However, the degree of increase and decrease depended on the voice characteristics. For the female speaker, the Speech Level (SL) and Cepstral Peak Prominence (CPP) increased with increasing face mask thickness. In this study, the presence and filter performance of a face mask were found to affect speech acoustic parameters depending on the speech characteristics. Face masks provoked vocal effort when the vocal intensity was not sufficiently strong or when the environment was less reverberant. Further research is needed on the vocal effort induced by face masks in order to overcome the acoustic modifications that occur when wearing masks.
https://doi.org/10.7776/ASK.2024.43.4.422
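VSA and VAI are both computed from corner-vowel formants. A minimal sketch, assuming a triangular VSA over /i/, /a/, /u/ (some studies use a quadrilateral or other vowel sets, and the abstract does not say which was used) and the commonly cited VAI definition (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/); the function names and the `{vowel: (F1, F2)}` input format are hypothetical.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon from ordered (F1, F2) vertices."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def vowel_space_area(f):
    """Triangular VSA (Hz^2) spanned by corner vowels; f maps vowel -> (F1, F2)."""
    return polygon_area([f["i"], f["a"], f["u"]])


def vowel_articulation_index(f):
    """VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a)."""
    f1i, f2i = f["i"]
    f1a, f2a = f["a"]
    f1u, f2u = f["u"]
    return (f2i + f1a) / (f1i + f1u + f2u + f2a)
```

A smaller VSA and a lower VAI both indicate a more centralized, less distinct vowel space, which is the direction the abstract reports for masked speech.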

Phonation Threshold Flow and Phonation Threshold Pressure in Patients with Adductor Spasmodic Dysphonia

Choi, Seong-Hee;Jiang, Jack J.;Yun, Bo-Ram;Lee, Ji-Yeoun;Lim, Sung-Eun;Choi, Hong-Shik
- Phonetics and Speech Sciences / v.2 no.3 / pp.157-164 / 2010
This study investigated the characteristics of two aerodynamic indices, phonation threshold pressure (PTP) and phonation threshold flow (PTF), in patients with adductor spasmodic dysphonia (ADSD), and examined whether these two new aerodynamic indices can differentiate between normal and ADSD groups. Additionally, PTP and PTF values were compared according to the overall severity of ADSD in the patient group. The severity of ADSD was rated on a 7-point scale by two experienced speech-language pathologists. The Kay Elemetrics Phonatory Aerodynamic System (PAS) (Kay Elemetrics Corp., Lincoln Park, NJ) was used to collect PTP and PTF measurements from 16 normal female subjects and 31 female patients with ADSD. Significantly lower PTF values (p < 0.05) were observed in the ADSD group than in the normal controls, and significantly lower PTF values were also found in patients with severe ADSD (p < 0.001). However, PTP could not distinguish patients with ADSD from controls (p = 0.119) or among ADSD groups by severity (p = 0.177). Consequently, PTF was more sensitive than PTP in differentiating normal speakers from those with ADSD and among different severity levels within ADSD, suggesting that PTF could be a useful diagnostic parameter for measuring aerodynamic function and reflecting neurolaryngeal dysfunction in patients with ADSD.

Displacement of Modernism: Edna St. Vincent Millay's Rewriting Carpe Diem Tradition (모더니즘의 일탈 -에드나 세인트 빈센 밀레이의 카르페 디엠 전통 다시 쓰기)

Park, Jooyoung
- Journal of English Language & Literature / v.56 no.5 / pp.797-821 / 2010
This paper explores how Millay's love sonnets rewrite the carpe diem tradition in complicated ways. It redirects critical attention away from Millay's individual experience and inner self toward the scene of literary history, suggesting that there may be more historical consciousness in Millay's sentimental and feminine "gesture." In rewriting the carpe diem tradition, Millay's sonnets reveal an awareness that the discursive logic of carpe diem poems depends on the woman's coyness: such poems cannot accomplish their triumph over the woman or over time (death) without her posited reluctance. Unlike the mistress in Andrew Marvell's "To His Coy Mistress," the speakers of Millay's sonnets could never be accused of sexual coyness; they are outspoken in their defiance of both death and lovers whose possessiveness resembles death's embrace. Moreover, as Stacy Carson Hubbard points out, by converting female sexual experience from a one-time closural event into a repeatable one, and hence into an opportunity for the general and emotional irritability productive of narrative, Millay seizes for the woman the power of "dilation" in both its sexual and its verbal forms. Furthermore, this paper argues that in Millay's sonnets the woman's sex no longer invites analogies to things secret and sealed, preserved or ruined. The woman's promiscuity implies a rejection of monumentalizing love, as well as a refusal of the fixing inherent in the carpe diem's fearful invocation of the movement of time. Throughout the love sonnets, the speaker's sexualized body produces nothing but ephemera. For Millay, this body spends its powers in hopes of having them, and the force of this spending is a perpetual and willful forgetting, which makes possible the repetition of love's story. Ultimately, Millay disturbs our critical categories by rendering permeable the boundaries between modern literature and the dead forms of classical literature, and between the female speaker and the male speaker.

Recurrent Neural Network with Backpropagation Through Time Learning Algorithm for Arabic Phoneme Recognition

Ismail, Saliza;Ahmad, Abdul Manan
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2004.08a / pp.1033-1036 / 2004
Speech recognition and understanding have been studied for many years. In this paper, we propose a new type of recurrent neural network architecture for speech recognition, in which each output unit is connected to itself and is also fully connected to the other output units and to all hidden units [1]. We pair this architecture with a well-suited learning algorithm for recurrent neural networks, Backpropagation Through Time (BPTT). The aim of the study was to observe the differences among the Arabic letters from "alif" to "ya". The purpose of this research is to improve people's knowledge and understanding of Arabic letters and words by using a Recurrent Neural Network (RNN) with the Backpropagation Through Time (BPTT) learning algorithm. The network was trained on speech from four speakers (a mixture of male and female) recorded in a quiet environment. Neural networks are well known for their ability to classify nonlinear problems, and much research today applies them to speech recognition [2], including for Arabic. The Arabic language poses a number of challenges for speech recognition [3]. Even though continuing studies have obtained positive results, research on minimizing the error rate still attracts much attention. This research uses the recurrent neural network, one neural network technique, to observe the differences among the letters from "alif" to "ya".
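Backpropagation Through Time unrolls the recurrence over the input sequence and accumulates gradients of the shared weights while walking backwards through the time steps. A minimal sketch with a single scalar hidden unit and a squared-error loss on the final output; this is a plain Elman-style recurrence, not the paper's architecture with self- and cross-connected output units, and all names and parameter values are hypothetical.

```python
import math


def forward(params, xs, h0=0.0):
    """Scalar RNN: h_t = tanh(w*h_{t-1} + u*x_t + b); output y = v*h_T."""
    w, u, b, v = params
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x + b))
    return hs, v * hs[-1]


def loss_and_grads(params, xs, target):
    """Loss 0.5*(y - target)^2 and its gradients (dw, du, db, dv) via BPTT."""
    w, u, b, v = params
    hs, y = forward(params, xs)
    err = y - target
    loss = 0.5 * err ** 2
    gv = err * hs[-1]
    gw = gu = gb = 0.0
    dh = err * v  # dL/dh_T
    # Walk backwards through time; w, u, b are shared across all steps.
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)  # back through tanh
        gw += da * hs[t - 1]
        gu += da * xs[t - 1]
        gb += da
        dh = da * w  # propagate to h_{t-1}
    return loss, (gw, gu, gb, gv)


def sgd_step(params, xs, target, lr=0.1):
    """One gradient-descent update; returns (new_params, loss_before_update)."""
    loss, grads = loss_and_grads(params, xs, target)
    return tuple(p - lr * g for p, g in zip(params, grads)), loss
```

Comparing `loss_and_grads` against central finite differences of the loss is a simple way to confirm that a BPTT implementation like this one is correct before using it for training.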

Perception of Transplanted English Prosody by American and Korean Listeners

Yi, So-Pae
- Speech Sciences / v.14 no.1 / pp.73-89 / 2007
This study explored the perception of transplanted English prosody by thirty American and Korean listeners, both male and female. English utterances of various sentence types produced by Korean and American male speakers were used to transplant the American prosodic contours onto the Korean speakers' English utterances. The thirty subjects were then instructed to rate the transplanted prosodic components. Results showed that the interactions among the three factors (i.e., rater group & transplantation type; transplantation type & sentence type; rater group & transplantation type & sentence type) were significant. Both Americans and Koreans perceived the combined transplantation of duration and pitch, or of duration, pitch, and intensity, as effective. However, when perceiving individual prosodic components, Americans and Koreans gave different ratings. For the overall prosody change, Americans perceived the change in intensity as significant but Koreans did not, because intensity is not a crucial semantic factor in Korean. Americans rated the transplantation of duration alone as ineffective, while Koreans did not. This was explained by differences between English and Korean. The difference in perception also varied significantly with sentence type, especially for the three sentence types whose speech rates were slower than those of the others. A slower speech rate intensified the mismatch between the transplanted duration and the original pitch, creating a negative impression on American listeners but not affecting Korean listeners. Pedagogical implications of the findings are discussed.

Formant Measurements of Complex Waves and Vowels Produced by Students (복합음과 대학생이 발음한 모음 포먼트 측정)

Yang, Byung-Gon
- Speech Sciences / v.15 no.3 / pp.39-51 / 2008
Formant measurements are among the most important means of objectively testing cross-linguistic differences among vowels produced by speakers of any given language. However, many speech analysis programs produce erroneous estimates, and some researchers use them without any verification procedure. The purposes of this paper are to examine formant measurements of complex waves synthesized from the average formant values of five Korean vowels using three default methods in Praat, and to verify the measured values of the five vowels produced by 20 students using one of those methods. Variation along the time axis is discussed after computing the sum of absolute differences from the point at 1/3 of the vowel duration. Results show smaller measurement errors with the burg method and greater errors with the sl or lpc methods, mostly caused by inappropriate formant settings. Formant measurement deviations were greater for the vowels produced by the female students than for those produced by the male students, mostly because of the settings for the vowels /o, u/. Formant settings are best corrected by setting the number of formants to the number of dark bands visible on the spectrogram. These results suggest that researchers should check the validity of the estimates from speech analysis software. Further studies are recommended on perception tests comparing the original sound with sound synthesized from the estimated formant values.
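The abstract does not say how the complex waves were synthesized from the average formant values. One common approach, assumed here, is a Klatt-style cascade of second-order digital resonators excited by an impulse train; the bandwidths, f0, sampling rate, and duration are illustrative defaults, and the function names are hypothetical.

```python
import math


def resonator_coeffs(freq, bw, fs):
    """Klatt-style second-order resonator coefficients (unity gain at 0 Hz)."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * freq * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(x, freq, bw, fs):
    """Filter x through y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(freq, bw, fs)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, bandwidths, f0=120.0, fs=16000, dur=0.5):
    """Pass an impulse train at f0 (Hz) through one resonator per formant."""
    n = int(fs * dur)
    period = int(round(fs / f0))
    x = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for f, bw in zip(formants, bandwidths):
        x = resonate(x, f, bw, fs)
    return x
```

A wave synthesized this way has known formant targets, so formant tracks measured from it (for example with Praat's burg, sl, or lpc methods) can be compared directly against the input values, which is the kind of verification the paper argues for.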

The Effects of Pitch Increasing Training (PIT) on Voice and Speech of a Patient with Parkinson's Disease: A Pilot Study

Lee, Ok-Bun;Jeong, Ok-Ran;Shim, Hong-Im;Jeong, Han-Jin
- Speech Sciences / v.13 no.1 / pp.95-105 / 2006
The primary goal of therapeutic intervention for dysarthric speakers is to increase speech intelligibility. Identifying the critical features that increase intelligibility is very important in speech therapy. The purpose of this study was to examine the effects of pitch increasing training (PIT) on the speech of a subject with Parkinson's disease (PD). The PIT program focuses on increasing pitch while a vowel is sustained at the same loudness, which is somewhat higher than the habitual loudness. A 67-year-old female with PD participated in the study. Speech therapy was conducted in 4 sessions (200 minutes in total) over one week. Before and after the treatment, acoustic, perceptual, and speech naturalness evaluations were performed for data analysis. The Speech and Voice Satisfaction Index (SVSI) was obtained after the treatment. Results showed improvements in voice quality and speech naturalness. In addition, the patient's satisfaction ratings (SVSI) indicated a positive relationship between improved speech production and the satisfaction of the patient and caregivers.
