Title/Summary/Keyword: casual speech

Search Result 12

The perception of clear and casual English speech under different speed conditions (다른 발화 속도의 또렷한 음성과 대화체로 발화한 영어문장 인지)

  • Yi, So Pae
    • Phonetics and Speech Sciences / v.10 no.2 / pp.33-37 / 2018
  • Korean students with much exposure to the relatively slow and clear speech used in most English classes in Korea can be expected to have difficulty understanding the casual style that is common in the everyday speech of English speakers. This research attempted to investigate an effective way to utilize casual speech in English education, by exploring the way different speech styles (clear vs. casual) affect Korean learners' comprehension of spoken English. Twenty Korean university students and two native speakers of English participated in a listening session. The English utterances were produced in different speech styles (clear slow, casual slow, clear fast, and casual fast). The Korean students were divided into two groups by English proficiency level. The results showed that the Korean students achieved 69.4% comprehension accuracy, while the native speakers of English demonstrated almost perfect results. The Korean students (especially the low-proficiency group) had more problems perceiving function words than they did perceiving content words. Responding to the different speech styles, the high-proficiency group had more difficulty listening to utterances with phonological variation than they did listening to utterances produced at a faster speed. The low-proficiency group, however, struggled with utterances produced at a faster speed more than they did with utterances with phonological variation. The pedagogical implications of the results are discussed in the concluding section.

Korean speakers hyperarticulate vowels in polite speech

  • Oh, Eunhae;Winter, Bodo;Idemaru, Kaori
    • Phonetics and Speech Sciences / v.13 no.3 / pp.15-20 / 2021
  • In line with recent attention to the multimodal expression of politeness, the present study examined the association between polite speech and acoustic features through the analysis of vowels produced in casual and polite speech contexts in Korean. Fourteen adult native speakers of Seoul Korean produced utterances in two social conditions designed to elicit polite (professor) and casual (friend) speech. Vowel duration and the first (F1) and second (F2) formants of seven sentence- and phrase-initial monophthongs were measured. The results showed that polite speech shares acoustic similarities with vowel production in clear speech: speakers showed greater vowel space expansion in polite than in casual speech, in an effort to enhance perceptual intelligibility. In particular, female speakers hyperarticulated (front) vowels in polite speech, independent of speech rate. The implications for the acoustic encoding of social stance in polite speech are further discussed.
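
The "vowel space expansion" measure mentioned above can be sketched numerically: one common proxy is the area of the polygon spanned by mean (F1, F2) points of corner vowels. The sketch below uses the shoelace formula with entirely hypothetical formant values; the function name and the numbers are illustrative, not the study's data.

```python
def vowel_space_area(formants):
    """Area (shoelace formula) of the polygon traced by (F1, F2) vowel
    means listed in perimeter order; a larger area = a more expanded space."""
    pts = [(f2, f1) for f1, f2 in formants]  # plot convention: F2 on x, F1 on y
    area = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Hypothetical mean (F1, F2) values in Hz for four corner vowels,
# listed in perimeter order (not the study's measurements):
casual = [(300, 2200), (700, 1700), (650, 1100), (320, 800)]
polite = [(270, 2450), (780, 1750), (700, 1050), (300, 750)]
print(vowel_space_area(polite) > vowel_space_area(casual))  # prints True
```

With more peripheral corner vowels, the polite polygon encloses a larger area, which is the sense in which the abstract speaks of expansion.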

Google speech recognition of an English paragraph produced by college students in clear or casual speech styles (대학생들이 또렷한 음성과 대화체로 발화한 영어문단의 구글음성인식)

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.9 no.4 / pp.43-50 / 2017
  • These days, voice models of speech recognition software are sophisticated enough to process the natural speech of people without any previous training. However, not much research has been reported on the use of speech recognition tools in the field of pronunciation education. This paper examined Google speech recognition of a short English paragraph produced by Korean college students in clear and casual speech styles in order to diagnose and resolve students' pronunciation problems. Thirty-three Korean college students participated in the recording of the English paragraph. The Google soundwriter was employed to collect data on the word recognition rates of the paragraph. Results showed that the total word recognition rate was 73% with a standard deviation of 11.5%. The word recognition rate of clear speech was around 77.3% while that of casual speech amounted to 68.7%. The low recognition rate of casual speech was attributed both to individual pronunciation errors and to the software itself, as shown in its fricative recognition. Various distributions of unrecognized words were observed depending on the individual participants and proficiency groups. From the results, the author concludes that speech recognition software is useful for diagnosing each individual's or group's pronunciation problems. Further studies on progressive improvement of learners' erroneous pronunciations would be desirable.
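
A word recognition rate like the one reported above could be approximated by aligning the reference paragraph with the recognizer's transcript. A minimal stdlib sketch, assuming word-level alignment with difflib; the sample sentences are invented, not the paper's paragraph:

```python
from difflib import SequenceMatcher

def word_recognition_rate(reference, hypothesis):
    """Share of reference words matched in the recognizer output
    (a rough proxy; the paper's exact scoring may differ)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matched = sum(block.size
                  for block in SequenceMatcher(None, ref, hyp).get_matching_blocks())
    return matched / len(ref)

# Invented reference/transcript pair with two misrecognized words:
ref = "please call stella ask her to bring these things with her from the store"
hyp = "please call stella ask her to bring this thing with her from the store"
rate = word_recognition_rate(ref, hyp)
print(round(rate, 3))  # 12 of 14 reference words matched
```

Per-speaker rates computed this way could then be averaged to produce the kind of group-level figures (77.3% clear vs. 68.7% casual) the abstract reports.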

Aerodynamic and acoustic characteristics of Clear Speech in patients with Parkinson's disease (파킨슨 환자의 클리어 스피치 전후 음향학적 공기역학적 특성)

  • Shin, Hee Baek;Ko, Do-Heung
    • Phonetics and Speech Sciences / v.9 no.3 / pp.67-74 / 2017
  • An increase in speech intelligibility has been found in Clear Speech compared to conversational speech. Clear Speech is characterized by decreased articulation rates and increased frequency and length of pauses. The objective of the present study was to investigate improvement in immediate speech intelligibility in 10 patients with Parkinson's disease (age range: 46 to 75 years) using Clear Speech. The experiment was performed using the Phonatory Aerodynamic System 6600 as the participants read the first sentence of a Sanchaek passage and the "List for Adults 1" of the Sentence Recognition Test (SRT) in casual speech and in Clear Speech. Acoustic and aerodynamic parameters that affect speech intelligibility were measured, including mean F0, F0 range, intensity, speaking rate, mean airflow rate, and respiratory rate. In the Sanchaek passage, use of Clear Speech resulted in significant differences in mean F0, F0 range, speaking rate, and respiratory rate compared with casual speech. In the SRT list, significant differences were seen in mean F0, F0 range, and speaking rate. Based on these findings, it is claimed that speech intelligibility can be affected by adjusting breathing and tone in Clear Speech. Future studies should identify the benefits of Clear Speech through auditory-perceptual studies and evaluate programs that use Clear Speech to increase intelligibility.

Characteristics of voice quality on clear versus casual speech in individuals with Parkinson's disease (명료발화와 보통발화에서 파킨슨병환자 음성의 켑스트럼 및 스펙트럼 분석)

  • Shin, Hee-Baek;Shim, Hee-Jeong;Jung, Hun;Ko, Do-Heung
    • Phonetics and Speech Sciences / v.10 no.2 / pp.77-84 / 2018
  • The purpose of this study is to examine the acoustic characteristics of Parkinsonian speech, with respect to different utterance conditions, by employing acoustic/auditory-perceptual analysis. The subjects of the study were 15 patients (M=7, F=8) with Parkinson's disease who were asked to read out sentences under different utterance conditions (clear/casual). The sentences read out by each subject were recorded, and the recorded speech was subjected to cepstrum and spectrum analysis using Analysis of Dysphonia in Speech and Voice (ADSV). Additionally, auditory-perceptual evaluation of the recorded speech was conducted with respect to breathiness and loudness. Results indicate that in the case of clear speech, there was a statistically significant increase in the cepstral peak prominence (CPP), and a decrease in the L/H ratio SD (ratio of low to high frequency spectral energy SD) and CPP F0 SD values. In the auditory-perceptual evaluation, a decrease in breathiness and an increase in loudness were noted. Furthermore, CPP was found to be highly correlated to breathiness and loudness. This provides objective evidence of the immediate usefulness of clear speech intervention in improving the voice quality of Parkinsonian speech.
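
Cepstral peak prominence (CPP), the key measure above, can be illustrated in miniature: take the log magnitude spectrum of a frame, transform it again, and measure how far the cepstral peak near the pitch period rises above a regression line fitted to the cepstrum. The following is a toy, stdlib-only sketch using a naive DFT and synthetic frames; ADSV's actual CPP computation differs in detail.

```python
import cmath
import math
import random

def dft_mag(x):
    """Naive DFT magnitude spectrum (O(n^2), fine for one short demo frame)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def cpp(frame, q_lo, q_hi):
    """Simplified CPP: height of the largest cepstral peak in the quefrency
    band [q_lo, q_hi] above a least-squares line fitted over that band."""
    log_spec = [math.log(m + 1e-12) for m in dft_mag(frame)]
    cep = dft_mag(log_spec)                      # cepstrum of the log spectrum
    qs = list(range(q_lo, q_hi + 1))
    ys = [cep[q] for q in qs]
    n = len(qs)
    mx, my = sum(qs) / n, sum(ys) / n
    slope = (sum((q - mx) * (y - my) for q, y in zip(qs, ys))
             / sum((q - mx) ** 2 for q in qs))
    peak_q = max(qs, key=lambda q: cep[q])
    baseline = my + slope * (peak_q - mx)        # regression line at the peak
    return cep[peak_q] - baseline, peak_q

N, period = 256, 32  # at 8 kHz this period corresponds to a 250 Hz F0
voiced = [sum(math.sin(2 * math.pi * h * t / period) / h for h in range(1, 5))
          for t in range(N)]
random.seed(0)
noise = [random.uniform(-1, 1) for _ in range(N)]

cpp_voiced, q_peak = cpp(voiced, 20, 48)  # band around the expected period
cpp_noise, _ = cpp(noise, 20, 48)
print(q_peak)                   # cepstral peak falls at the pitch period
print(cpp_voiced > cpp_noise)   # the voiced frame has the stronger peak
```

A breathier, noisier voice behaves like the noise frame (flat cepstrum, low CPP), which is why the abstract finds CPP correlated with perceived breathiness.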

An Analysis of Phonetic Parameters for Individual Speakers (개별화자 음성의 특징 파라미터 분석)

  • Ko, Do-Heung
    • Speech Sciences / v.7 no.2 / pp.177-189 / 2000
  • This paper investigates how individual speakers' speech can be distinguished using acoustic parameters such as amplitude, pitch, and formant frequencies. Word samples from fifteen male speakers in their 20s from three different regions were recorded in two different modes (i.e., casual and clear speech) in quiet settings, and were analyzed with a Praat macro script. In order to determine individual speakers' acoustical values, the total duration of the voicing segments was measured at five different timepoints. Results showed that a high correlation coefficient between F1 and F2 was found among the speakers, although there was little correlation between amplitude and pitch. Statistical grouping shows that individual speakers' voices did not pattern by regional dialect for either casual or clear speech. In addition, the difference between maximum and minimum amplitude was about 10 dB, which is a perceptually audible degree. These acoustic data can give some meaningful guidelines for implementing algorithms for speaker identification and speaker verification.
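
The correlation analysis described above amounts to computing a Pearson coefficient over per-speaker formant means. A small stdlib sketch with made-up formant values; the numbers are illustrative, not the study's measurements:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-speaker mean formants (Hz) for one vowel:
f1 = [310, 325, 298, 340, 315, 330, 305]
f2 = [2150, 2230, 2080, 2310, 2160, 2250, 2120]
print(round(pearson(f1, f2), 3))
```

A coefficient near 1 here corresponds to the "high correlation between F1 and F2" the abstract reports, whereas amplitude-vs-pitch pairs would yield a value near 0.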


Deep Level Situation Understanding for Casual Communication in Humans-Robots Interaction

  • Tang, Yongkang;Dong, Fangyan;Yoichi, Yamazaki;Shibata, Takanori;Hirota, Kaoru
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.1 / pp.1-11 / 2015
  • A concept of Deep Level Situation Understanding is proposed to realize human-like natural communication (called casual communication) among multiple agents (e.g., humans and robots/machines). Deep level situation understanding consists of surface level understanding (such as gesture/posture understanding, facial expression understanding, and speech/voice understanding), emotion understanding, intention understanding, and atmosphere understanding, achieved by applying the customized knowledge of each agent and by taking thoughtfulness into consideration. The proposal aims to reduce the burden on humans in human-robot interaction, so as to realize harmonious communication by excluding unnecessary troubles or misunderstandings among agents, and ultimately to help create a peaceful, happy, and prosperous human-robot society. A simulated experiment is carried out to validate the deep level situation understanding system on a scenario in which a meeting-room reservation is made between a human employee and a secretary robot. The proposed deep level situation understanding system is intended to be applied in service robot systems to smooth communication and avoid misunderstanding among agents.

Speech Recognition of the Korean Vowel 'ㅗ' Based on Time Domain Waveform Patterns (시간 영역 파형 패턴에 기반한 한국어 모음 'ㅗ'의 음성 인식)

  • Lee, Jae Won
    • KIISE Transactions on Computing Practices / v.22 no.11 / pp.583-590 / 2016
  • Recently, the rapidly increasing interest in IoT in almost all areas of casual human life has led to wide acceptance of speech recognition as a means of HCI. At the same time, the demand for speech recognition systems for mobile environments is increasing rapidly. Server-based speech recognition systems are typically fast and show high recognition rates; however, an internet connection is necessary, and complicated server computation is required, since a voice is recognized in units of words stored in server databases. In this paper, we present a novel method for recognizing the Korean vowel 'ㅗ' as part of a phoneme-based Korean speech recognition system. The proposed method involves analyses of waveform patterns in the time domain instead of the frequency domain, with a consequent reduction in computational cost. Elementary algorithms for detecting typical waveform patterns of 'ㅗ' are presented and combined to make final decisions. The experimental results show that the proposed method can achieve 89.9% recognition accuracy.
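
The idea of combining elementary time-domain detectors can be illustrated with a toy example: cheap per-frame statistics such as zero-crossing rate and normalized autocorrelation, combined by majority vote. The detectors and thresholds below are invented for illustration; the paper's actual pattern rules for 'ㅗ' are more specific.

```python
import math

def zcr(x):
    """Zero-crossing rate: low for vowel-like, strongly voiced frames."""
    return sum((a < 0) != (b < 0) for a, b in zip(x, x[1:])) / (len(x) - 1)

def periodicity(x, lag):
    """Normalized autocorrelation at a candidate pitch lag (near 1 if periodic)."""
    den = sum(v * v for v in x)
    if den == 0:
        return 0.0
    return sum(x[t] * x[t + lag] for t in range(len(x) - lag)) / den

def looks_like_vowel(frame, lag):
    """Majority vote over three elementary time-domain detectors
    (illustrative stand-ins for the paper's waveform-pattern algorithms)."""
    votes = [zcr(frame) < 0.1,
             periodicity(frame, lag) > 0.5,
             max(abs(v) for v in frame) > 0.1]
    return sum(votes) >= 2

frame = [math.sin(2 * math.pi * t / 40) for t in range(400)]
print(looks_like_vowel(frame, 40))  # prints True
```

Each detector needs only additions and comparisons over raw samples, which is the source of the computational saving over frequency-domain analysis that the abstract claims.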

An Analysis of Short and Long Syllables of Sino-Korean Words Produced by College Students with Kyungsang Dialect (경상방언 대학생들이 발음한 국어 한자어 장단음 분석)

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.7 no.4 / pp.131-138 / 2015
  • The initial syllables of a pair of Sino-Korean words are generally differentiated in meaning by either short or long durations. They are realized differently depending on the dialect and generation of speakers. Recent research has reported that the temporal distinction has gradually faded away. The aim of this study is to examine whether college students with Kyungsang dialect make the distinction temporally, using the statistical method of Mixed Effects Models. Thirty students participated in the recording of five pairs of Korean words in clear or casual speaking styles. The author then measured the durations of the initial syllables of the words, made a descriptive analysis of the data, and applied Mixed Effects Models to the data, setting gender, length, and style as fixed effects and subject and syllable as random effects, to test their effects on the initial syllable durations. Results showed that college students with Kyungsang dialect did not produce the long and short syllables distinctively, with no statistically significant difference between them. Secondly, there was a significant difference in the duration of the initial syllables between male and female students. Thirdly, there was also a significant difference in the duration of the initial syllables produced in the clear and casual styles. The author concluded that college students with Kyungsang dialect do not produce long and short Sino-Korean syllables distinctively, and that any statistical analysis of the temporal aspect should be made carefully, considering both fixed and random effects. Further studies would be desirable to examine production and perception of the initial syllables by speakers of various dialects, generations, and age groups.
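
The analysis pipeline above, descriptive statistics followed by a mixed model with fixed and random effects, can be sketched as follows. The duration values are fabricated for illustration, and the mixed-model call is only indicated in a comment: it would require statsmodels, and the exact random-effects specification is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Fabricated rows: (subject, gender, length, style, duration_ms)
rows = [
    ("s1", "M", "long",  "clear",  182), ("s1", "M", "short", "clear",  175),
    ("s1", "M", "long",  "casual", 141), ("s1", "M", "short", "casual", 138),
    ("s2", "F", "long",  "clear",  201), ("s2", "F", "short", "clear",  196),
    ("s2", "F", "long",  "casual", 158), ("s2", "F", "short", "casual", 150),
]

def cell_means(rows, index):
    """Descriptive step: mean duration per level of one factor."""
    cells = defaultdict(list)
    for r in rows:
        cells[r[index]].append(r[-1])
    return {level: mean(durs) for level, durs in cells.items()}

by_style = cell_means(rows, 3)    # style effect: clear vs. casual
by_length = cell_means(rows, 2)   # length effect: long vs. short
print(by_style, by_length)

# Inferential step (not run here): with statsmodels it would look roughly like
#   MixedLM.from_formula("duration ~ gender + length + style",
#                        groups="subject", data=df).fit()
# plus a variance component for syllable as a second random effect.
```

The point of the model is that a raw long-vs-short gap in the cell means can vanish once subject- and syllable-level variation is accounted for, which matches the paper's caution about fixed and random effects.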

Speech Recognition of the Korean Vowel 'ㅜ' Based on Time Domain Bulk Indicators (시간 영역 벌크 지표에 기반한 한국어 모음 'ㅜ'의 음성 인식)

  • Lee, Jae Won
    • KIISE Transactions on Computing Practices / v.22 no.11 / pp.591-600 / 2016
  • As computing technologies develop further, they are increasingly applied throughout the networks of everyday human environments. In addition, the rapidly increasing interest in IoT has led to the wide acceptance of speech recognition as a means of HCI. In this study, we present a novel method for recognizing the Korean vowel 'ㅜ' as part of a phoneme-based Korean speech recognition system. The proposed method involves analyses of bulk indicators calculated in the time domain instead of analysis in the frequency domain, with a consequent reduction in computational cost. Four elementary algorithms for detecting typical waveform patterns of 'ㅜ' using bulk indicators are presented and combined to make final decisions. The experimental results show that the proposed method can achieve 90.1% recognition accuracy and a recognition speed of 0.68 msec per syllable.
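
"Bulk indicators" in this context are cheap frame-level statistics computed directly on the samples. A hypothetical sketch; these particular indicators are stand-ins, not the four the paper defines:

```python
import math

def bulk_indicators(frame):
    """A few cheap time-domain statistics for one frame (illustrative
    stand-ins for the paper's bulk indicators)."""
    n = len(frame)
    return {
        "mean_abs": sum(abs(v) for v in frame) / n,
        "peak": max(abs(v) for v in frame),
        "zero_crossings": sum((a < 0) != (b < 0)
                              for a, b in zip(frame, frame[1:])),
    }

# A back vowel like 'ㅜ' concentrates energy at low frequencies, so a low
# zero-crossing count is one plausible discriminating indicator.
low = [math.sin(2 * math.pi * t / 80) for t in range(400)]   # low-frequency frame
high = [math.sin(2 * math.pi * t / 8) for t in range(400)]   # high-frequency frame
print(bulk_indicators(low)["zero_crossings"] <
      bulk_indicators(high)["zero_crossings"])  # prints True
```

Because each indicator is a single pass over the frame, per-syllable decision times on the order of the 0.68 msec reported above are plausible without any transform to the frequency domain.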