• Title/Summary/Keyword: Reading Speech

A Performance Improvement Method using Variable Break in Corpus Based Japanese Text-to-Speech System (가변 Break를 이용한 코퍼스 기반 일본어 음성 합성기의 성능 향상 방법)

  • Na, Deok-Su;Min, So-Yeon;Lee, Jong-Seok;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.155-163 / 2009
  • In text-to-speech systems, the conversion of text into prosodic parameters consists of three steps: the placement of prosodic boundaries, the determination of segmental durations, and the specification of fundamental frequency contours. Prosodic boundaries, as the most important and basic parameter, affect the estimation of durations and fundamental frequency. Break prediction is therefore an important step in text-to-speech systems, since break indices (BIs) strongly influence how correctly prosodic phrase boundaries are represented. However, accurate prediction is difficult because BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, predicting an accentual phrase boundary (APB) and a major phrase boundary (MPB) is particularly difficult. This paper therefore presents a method to compensate for APB and MPB prediction errors. First, we define a subtle BI, for which it is difficult to decide clearly between an APB and an MPB, as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using a classification and regression tree, and multiple prosodic targets for pitch and duration are then generated. Finally, unit selection is conducted using the multiple prosodic targets. In the MOS test, the original speech scored 4.99, while the proposed method scored 4.25 and the conventional method scored 4.01. The experimental results show that the proposed method improves the naturalness of synthesized speech.

A Study of Acoustic Analysis in the Chinese' Korean Language Learners (중국인 한국어 학습자 음성의 음향학적 특성 연구)

  • Kim, Hyun-Ji;You, Jae-Yeon
    • Phonetics and Speech Sciences / v.2 no.3 / pp.75-80 / 2010
  • The present research investigated voice characteristics across genders and nationalities by measuring acoustic parameters in Korean and Chinese students. Sound Forge was used to collect voice samples, and Praat was used to measure and analyze jitter, shimmer, NHR, $sF_0$, and pitch range. The results are as follows. First, during prolongation of the vowels, there was no significant difference in $F_0$ between Korean and Chinese males or between Korean and Chinese females, although Korean males and females had higher $F_0$ values than Chinese males and females. Second, during sentence reading, there was no significant difference in $sF_0$ between Korean and Chinese males, but there was a significant difference between the female groups. Third, during sentence reading, the pitch range of Korean males was significantly narrower than that of Korean and Chinese females, who had a wider pitch range. Fourth, jitter in the five vowels /a, i, u, e, o/ was higher in Chinese than in Korean subjects; in the vowels /a, e, u/, it was significantly higher in females than in males. Fifth, shimmer in the vowels /a, e, u/ was significantly higher in Chinese than in Korean subjects. Finally, NHR in the vowels /a, u, o/ was significantly higher in Chinese than in Korean subjects.
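
For readers unfamiliar with the perturbation measures named in this abstract, the sketch below computes jitter and shimmer using the commonly used "local" definitions: the mean absolute difference between consecutive cycles divided by the mean value. The abstract does not say which variants were measured, so this is an assumption, and the function name and sample values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    Applied to glottal cycle durations this gives local jitter; applied to
    cycle peak amplitudes it gives local shimmer (both as fractions, not %).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical measurements from a sustained vowel (not data from the study).
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]   # cycle durations in seconds
amplitudes = [0.80, 0.82, 0.79, 0.81]          # cycle peak amplitudes

jitter_percent = 100 * local_perturbation(periods_s)
shimmer_percent = 100 * local_perturbation(amplitudes)
print(f"jitter = {jitter_percent:.2f} %, shimmer = {shimmer_percent:.2f} %")
```

Higher values indicate less regular vocal fold vibration, which is how the group differences reported above would be read.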


Table Structure Recognition in Images for Newspaper Reader Application for the Blind (시각 장애인용 신문 구독 프로그램을 위한 이미지에서 표 구조 인식)

  • Kim, Jee Woong;Yi, Kang;Kim, Kyung-Mi
    • Journal of Korea Multimedia Society / v.19 no.11 / pp.1837-1851 / 2016
  • Newspaper reader mobile applications with a text-to-speech (TTS) function enable blind people to read newspaper content. However, tables cannot easily be read by the reader program because most tables are stored as images. Even if OCR (optical character recognition) programs are used to recognize the letters in table images, the output cannot simply be fed to a table reading function because the table structure is unknown to the reader. The exact location of each table cell that contains text must therefore be identified beforehand. In this paper, we propose an efficient image processing algorithm that recognizes all the cells in a table by identifying the columns and rows in the table image. From the cell location data provided by the column and row identification algorithm, we can generate table structure information and table reading scenarios. Our experimental results with table images commonly found in newspapers show that the cell identification approach achieves 100% accuracy for simple black-and-white table images and about 99.7% accuracy for colored and complicated tables.
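
The abstract does not describe how columns and rows are identified. As one illustration of the general idea, the sketch below uses a simple projection-profile approach, which is a stand-in technique and not necessarily the authors' algorithm: rows and columns that are almost entirely dark are treated as ruling lines, and cells are the boxes between consecutive lines. All function names, the `min_fill` threshold, and the toy image are hypothetical.

```python
def merge_runs(indices):
    """Group consecutive indices into (start, end) runs so a thick line counts once."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs


def ruling_lines(binary, min_fill=0.9):
    """Return (row_runs, col_runs) for rows/columns that are at least min_fill dark.

    binary is a list of equal-length rows of 0/1 pixels, where 1 means dark.
    """
    height, width = len(binary), len(binary[0])
    row_hits = [y for y in range(height) if sum(binary[y]) >= min_fill * width]
    col_hits = [x for x in range(width)
                if sum(binary[y][x] for y in range(height)) >= min_fill * height]
    return merge_runs(row_hits), merge_runs(col_hits)


def cell_boxes(binary, min_fill=0.9):
    """Return cells as half-open boxes (top, left, bottom, right), in reading order."""
    rows, cols = ruling_lines(binary, min_fill)
    return [(top[1] + 1, left[1] + 1, bottom[0], right[0])
            for top, bottom in zip(rows, rows[1:])
            for left, right in zip(cols, cols[1:])]


# Toy 5x7 image of a 2x2 table: dark lines on rows 0, 2, 4 and columns 0, 3, 6.
grid = [[1 if y % 2 == 0 or x % 3 == 0 else 0 for x in range(7)] for y in range(5)]
cells = cell_boxes(grid)
print(cells)
```

Because the boxes come out in row-major order, they can be handed to OCR one cell at a time and read aloud row by row. Tables without drawn ruling lines, such as those separated only by color or whitespace, would need a different cue than this sketch uses.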

The Prosodic Characteristics of Korean Read Sentences in Dicourse Context (한국어 낭독체 담화문의 운율적 특징 - 단독발화문과 연속발화문의 비교를 통하여 -)

  • Seong Cheol-Jae
    • MALSORI / no.35_36 / pp.1-12 / 1998
  • This study investigates the prosodic characteristics of Korean discourse sentences, focusing especially on the initial and final parts of a sentence. Fifty discourse sentences were read in two different styles: once sentence by sentence, and once as a continuous reading of all 50 sentences. We first derived two kinds of ratios from the acoustic results: the ratio of the final syllable to the initial syllable in the first word of a sentence, and the ratio of the final syllable to the initial syllable in the last word of a sentence. We then calculated statistics for these ratios, including the mean, standard deviation, minimum, maximum, and p-values from t-tests. With respect to duration, there was little difference between the two styles; at most, a slight durational irregularity, that is, some deviation from the norm, was observed in the initial part in continuous reading. For F0, there was a prominent statistical difference between the ratios of the last words in the two styles. This difference might serve as a prosodic feature. Energy seems to show a pattern similar to that of F0. The results showed that the final syllable of the last word was produced at about 85% of the initial syllable in the same context, and that the last words in continuous speech were strongly articulated compared with those in sentence-by-sentence reading.


Usefulness of Cepstral Peak Prominence (CPP) in Unilateral Vocal Fold Paralysis Dysphonia Evaluation (일측성 성대마비 환자 평가에서 Cepstral Peak Prominence의 유용성)

  • Lee, Chang-Yoon;Jeong, Hee Seok;Son, Hee Young
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.28 no.2 / pp.84-88 / 2017
  • Background and Objectives: The purpose of this study was to compare the usefulness of cepstral peak prominence (CPP) with the parameters of the Multi-Dimensional Voice Program (MDVP) in evaluating unilateral vocal fold paralysis patients with subjective voice impairment. Materials and Methods: From July 2014 to August 2016, 37 patients who had been diagnosed with unilateral vocal fold paralysis and had received two or more voice tests before and after the diagnosis were evaluated for maximum phonation time (MPT), MDVP, and CPP. Voice tests were performed with the short vowel /a/ and paragraph reading. Results: CPP-a (CPP with vowel /a/) and CPP-s (CPP with paragraph reading) of the cepstrum were statistically negatively correlated with G, R, B, and A before voice therapy. Jitter, Shimmer, and NHR of the MDVP were positively correlated with G, R, and B, and were significantly correlated with the cepstrum index. G, B, and A showed a statistically significant negative correlation with CPP-a and CPP-s, with somewhat higher correlation coefficients between 0.5 and 0.78. In contrast, among the MDVP indices, only Jitter showed a positive correlation with G and B, at 0.4. Conclusion: CPP can be an important tool for evaluating speech in unilateral vocal cord paralysis when speech energy changes or the cycle is not constant during speech.


A comparison of Korean vowel formants in conditions of chanting and reading utterances (챈트 및 읽기 발화조건에 따른 한국어 모음 포먼트 비교)

  • Park, Jihye;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.12 no.3 / pp.85-94 / 2020
  • Vowel articulation appears to be difficult for subjects with speech disorders. A chant method that properly reflects the characteristics of the language could be an effective way of addressing these difficulties. The purpose of this study was to determine whether the chant method is effective as a means of enhancing vowel articulation. The subjects were 60 normal adults (30 males and 30 females) in their 20s and 30s whose native language is Korean. Eight utterance conditions, including chanting and reading conditions, were recorded, and their acoustic data were analyzed. The analysis of the formant-related acoustic variables confirmed that, with the chant method, the F1 and F2 values of the vowel formants increase and the center of gravity of the vowel triangle moves significantly forward and downward, in both the word and the phrase context. The results also showed that accent is the most influential musical factor in chant. There was no significant difference among the four repeated tokens, which increases the reliability of the results. In other words, chanting is an effective way to shift the center of gravity of the vowel triangle, which suggests that it can help to improve speech intelligibility by forming a desirable place of articulation.
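
The "center of gravity of the vowel triangle" can be illustrated with a minimal sketch that assumes it is the mean (F1, F2) of the corner vowels /a, i, u/; the abstract does not state the exact definition or vowel set, so that is an assumption. The function names and formant values are hypothetical and are not data from the study.

```python
def centroid(formants):
    """Mean (F1, F2) in Hz over a dict that maps each vowel to its (F1, F2) pair."""
    n = len(formants)
    f1 = sum(pair[0] for pair in formants.values()) / n
    f2 = sum(pair[1] for pair in formants.values()) / n
    return f1, f2


def centroid_shift(before, after):
    """Change in the centroid (delta F1, delta F2) going from `before` to `after`."""
    b1, b2 = centroid(before)
    a1, a2 = centroid(after)
    return a1 - b1, a2 - b2


# Hypothetical corner-vowel formants in Hz for two speaking conditions.
reading = {"a": (800.0, 1300.0), "i": (300.0, 2300.0), "u": (350.0, 900.0)}
chanting = {"a": (860.0, 1360.0), "i": (330.0, 2390.0), "u": (370.0, 960.0)}

delta_f1, delta_f2 = centroid_shift(reading, chanting)
print(f"F1 shift: {delta_f1:+.1f} Hz, F2 shift: {delta_f2:+.1f} Hz")
```

A higher F1 corresponds to a lower (more open) tongue position and a higher F2 to a more fronted one, so positive shifts in both match the "forward and downward" movement reported above.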

Prediction of speaking fundamental frequency using the voice and speech range profiles in normal adults (정상 성인에서 음성 및 말소리 범위 프로파일을 이용한 발화 기본주파수 예측)

  • Lee, Seung Jin;Kim, Jaeock
    • Phonetics and Speech Sciences / v.11 no.3 / pp.49-55 / 2019
  • This study sought to investigate whether mean speaking fundamental frequency (SFF) can be predicted by parameters of voice and speech range profile (VRP and SRP) in Korean normal adults. Moreover, it explored whether gender differences exist in the absolute differences between the SFF and estimated SFF (ESFF) predicted by the VRP and SRP. A total of 85 native Korean speakers with normal voice participated in the study. Each participant was asked to perform the VRP task using the vowel /a/ and the SRP task using the first sentence of a Korean standard passage "Ga-eul". In addition, the SFF was measured with electroglottography during a passage reading task. Predictive factors of the SFF were explored and the absolute difference between the SFF and the ESFF (DSFF) was compared between gender groups. Results indicated that predictive factors were age, gender, minimum pitch and pitch range for the VRP (adjusted $R^2=.931$), and pitch range (in semi-tones) and maximum pitch for the SRP (adjusted $R^2=.963$), respectively. The SFF and ESFF predicted by the VRP and SRP showed a strong positive correlation. The DSFF of the VRP and SRP, as well as their sum did not differ by gender. In conclusion, the SFF during a passage reading task could be successfully predicted by the parameters of the VRP and SRP tasks. In further studies, clinical implications need to be explored in patients who may exhibit deviations in SFF.
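
The pitch range "in semi-tones" mentioned above is conventionally obtained from the minimum and maximum fundamental frequency as 12 times the base-2 logarithm of their ratio. The sketch below shows that standard conversion only; it is not the study's regression model, and the function name and sample frequencies are hypothetical.

```python
import math


def semitone_range(f_min_hz, f_max_hz):
    """Pitch range in semitones between two frequencies: 12 * log2(f_max / f_min)."""
    if f_min_hz <= 0 or f_max_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12 * math.log2(f_max_hz / f_min_hz)


# Hypothetical speech range profile extremes in Hz (not data from the study).
print(round(semitone_range(95.0, 190.0), 2))   # one octave
print(round(semitone_range(180.0, 320.0), 2))
```

Expressing range in semitones rather than Hz makes ranges comparable across speakers with different baseline pitch, since a doubling of frequency is 12 semitones regardless of where it starts.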

Evaluation of the readability of self-reported voice disorder questionnaires (자기보고식 음성장애 설문지 문항의 가독성 평가)

  • HyeRim Kwak;Seok-Chae Rhee;Seung Jin Lee;HyangHee Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.41-48 / 2024
  • The significance of self-reported voice assessments concerning patients' chief complaints and quality of life has increased. Therefore, readability assessments of questionnaire items are essential. In this study, readability analyses were performed based on text grade and complexity, vocabulary frequency and grade, and lexical diversity of the 11 Korean versions of self-reported voice disorder questionnaires (KVHI, KAVI, KVQOL, K-SVHI, K-VAPP, K-VPPC, TVSQ, K-VDCQ, K-VFI, K-VTDS, and K-VoiSS). Additionally, a comparative readability assessment was conducted on the original versions of these questionnaires to discern the differences between their Korean counterparts and the questionnaires for children. Consequently, it was determined that voice disorder questionnaires could be used without difficulty for populations with lower literacy levels. Evaluators should consider subjects' reading levels when conducting assessments, and future developments and revisions should consider their reading difficulties.

A Study on the Effect of Reading Activities Making a Podcast: Focusing on the Reading Attitude and the Communication Skills (팟캐스트 제작 독서 활동 프로그램의 효과에 관한 연구 - 독서 태도와 의사소통 능력을 중심으로 -)

  • Hwang, Jeong-Eui;Cho, Miah
    • Journal of Korean Library and Information Science Society / v.52 no.3 / pp.73-99 / 2021
  • The purpose of this study is to investigate the effects of a podcast-production reading activity program on the reading attitudes and communication skills of upper-grade elementary school students. To that end, 20 fifth and sixth graders who applied for a reading culture program conducted at N Public Library in Seoul, Korea, were assigned to experimental and control groups. The research period ran from 8 October 2018 to 11 February 2019, and the class was conducted once a week for 90 minutes, six times for each group, for a total of 12 sessions. The results can be summarized as follows. First, the podcast-production reading activity program had a positive effect in all four areas of reading attitude among upper-grade elementary school students: the value domain, the motivation·interest domain, the habit·attitude domain, and the expectation domain. Second, communication skills improved in all seven sub-areas, such as information collection and listening, overcoming stereotyped thinking, creative communication, self-disclosure, leading communication, and understanding of others' perspectives. In conclusion, the podcast-production reading activity program had a positive effect on the reading attitude, communication ability, and media ability of upper-grade elementary school students, suggesting that it can be applied effectively in various educational fields.

Prosodic pattern of the children with high-functioning autism spectrum disorder according to sentence type (문장유형에 따른 고기능 자폐스펙트럼장애 아동의 운율 특성)

  • Shin, Hee Baek;Choi, Jieun;Lee, YoonKyoung
    • Phonetics and Speech Sciences / v.8 no.2 / pp.65-71 / 2016
  • The purpose of this study is to examine the prosodic pattern of children with high-functioning autism spectrum disorder (HFASD) according to sentence type. The participants were 18 children aged 7 to 9 years: 9 children with HFASD and 9 typically developing (TD) children of the same chronological age. Sentence reading tasks were conducted. Seven interrogative sentences and 7 declarative sentences were presented to the participants, who were asked to read the sentences three times. Mean F0, F0 range, intensity, speech rate, and pitch contour were measured for each sentence. The results showed that, for F0 range, significant main and interaction effects were observed for subject group and sentence type. There were significant differences in intensity, mean F0, speech rate, and pitch contour across sentence types. The results indicated that children with HFASD showed no difference in intonation across sentence types. Speakers' intention may have a negative effect on pragmatic aspects. These results suggest that the assessment and intervention of prosody are important for HFASD.