Search | Korea Science

Text-to-speech with linear spectrogram prediction for quality and speed improvement (음질 및 속도 향상을 위한 선형 스펙트로그램 활용 Text-to-speech)

Yoon, Hyebin
Phonetics and Speech Sciences / v.13 no.3 / pp.71-78 / 2021
Most neural-network-based speech synthesis models use neural vocoders to convert mel-scaled spectrograms into high-quality, human-like voices. However, neural vocoders combined with mel-scaled spectrogram prediction models demand considerable memory and time during training and suffer from slow inference in environments without a GPU. This problem does not arise in linear spectrogram prediction models, as they do not use neural vocoders, but these models suffer from low voice quality. As a solution, this paper proposes a Tacotron 2 and Transformer-based linear spectrogram prediction model that produces high-quality speech without a neural vocoder. Experiments suggest that this model can serve as the foundation of a high-quality text-to-speech model with fast inference speed.
https://doi.org/10.13064/KSSS.2021.13.3.071

A Study on Monitoring of Liver Function Based on Voice Signal Analysis for u-Health System (u-Health 시스템을 위한 음성신호 분석 기반의 간 기능 모니터링에 관한 연구)

Kim, Bong-Hyun;Cho, Dong-Uk
The KIPS Transactions:PartB / v.18B no.6 / pp.389-396 / 2011
Various liver diseases are becoming more serious in modern society due to changes in eating habits, stress, alcohol, and other factors. We therefore propose a methodology for the early diagnosis of liver disease by studying how liver disease influences the voice. To this end, we collected voice samples from liver disease patients, both inpatients and patients under treatment, and applied voice analysis parameters to them. In particular, we applied pronunciation-related parameter values and third-formant frequency bandwidths to velar sounds, which oriental medicine associates with the liver, in order to derive objective indices of how liver disease affects the resonance cavity and vocalization. In addition, based on the experimental results, we studied the design of a system for monitoring liver function in a u-Health environment.
https://doi.org/10.3745/KIPSTB.2011.18B.6.389

The Ignition Characteristics of Tree Branches, Barks, Living Leaves and Dead Leaves in Pinus Densiflora and Quercus Dentata (소나무와 떡갈나무의 주요 부위별 착화특성에 관한 연구)

Park, Young-Ju;Lee, Si-Young;Sin, Young-Ju;Kim, Su-Young;Kim, Young-Tak;Lee, Hae-Pyeong
Proceedings of the Korea Institute of Fire Science and Engineering Conference / 2008.04a / pp.308-312 / 2008
In this study, we tested the ignition characteristics of the major parts of the above trees, which are representative species of Young Dong Province of Korea. The tests covered the relation between moisture content and combustibility, and the ignition temperature measured with a KRS-RG-9000 tester. After rainfall, the moisture content of living leaves and branches was between 52 and 70%, but it fell to between 17 and 33% after 144 hours of drying at normal temperature. For dead leaves, it was 10% lower than at first. The ignition characteristics differed significantly. The ignition hazard was highest for dead leaves. Barks and branches had a higher ignition temperature, and therefore a longer ignition delay time, than living and dead leaves at normal temperature.

Paralinguistic Behavior as a Deception Cue (거짓말의 단서로서 준언어행위)

Kim, Daejoong;Park, Jihye
The Journal of the Korea Contents Association / v.19 no.4 / pp.187-196 / 2019
This experimental study examines whether paralinguistic behavior is a deception cue in an interrogation. Ninety-two college students participated in an experiment and were randomly assigned to two conditions. Participants were then asked to take the money or not to take the money according to the condition they were assigned. Then participants had a face-to-face interrogation. During the interrogation, participants' paralinguistic behavior was recorded and used for coding and analysis. Results reveal that participants' paralinguistic behaviors differ depending on question type: the deceptive paralinguistic cues are speech speed and fillers for the closed critical question, and response latency, response length, and fillers for the open critical question. These findings imply that part of paralinguistic behavior could be a deception cue and thus these cues might be applicable to deception detection in real-world criminal investigations.
https://doi.org/10.5392/JKCA.2019.19.04.187

Handwritten Hangul Recognition by Dynamic Lattice Search with Structural Constraints (문자의 구조적 제약과 동적 격자 탐색을 이용한 필기 한글 문자 인식)

Kang, Kyung-Won;Kim, Jin-Hyung
Annual Conference on Human and Language Technology / 2001.10d / pp.359-364 / 2001
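Handwritten Hangul character recognition involves problems such as diverse writing variations and touching between graphemes (jamo). Random-graph modeling of handwritten Hangul was recently proposed to address these problems, but it suffers from high time complexity, a limitation of bottom-up information processing. Cognitive science research on English word recognition identifies filtering, which eliminates redundant computation during recognition, as one of the main roles of top-down information processing. Building on random-graph modeling of handwritten Hangul, this paper proposes a top-down information processing method that absorbs the diverse variations found in handwriting and that is grounded in the structure of Hangul characters in order to resolve the time complexity problem. The proposed method comprises DP matching for grapheme candidate extraction using model firing, character candidate search using dynamic lattice search, and candidate elimination using the structural constraints of characters. In preliminary experiments on SERI-DB, a handwritten Hangul database, the proposed method achieved a large speedup over the existing method based on bottom-up information processing without a large drop in recognition rate.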

A Study on the Relation among English Speech Rate, Pitch and Stress by Korean Speakers (한국인 화자의 영어 발화 속도와 피치, 강세 간의 관계 연구)

Kim, Ji-Eun
Phonetics and Speech Sciences / v.6 no.3 / pp.101-108 / 2014
This study investigates the relation among pitch range differences, speech rate, and the realization of stress. To identify the realization of stress, vowel formants and durational differences between stressed and unstressed vowels were measured. The Korean learners were asked to read a textbook passage of nine sentences. The major results indicate that: (1) Korean speakers' pitch range is less than 50% of the native speakers'; (2) there is a significant negative relation between high-low pitch range and speech rate; (3) the vowel qualities and durations of the stressed and unstressed vowels are related to the speech rate, but not to the high-low pitch range.
https://doi.org/10.13064/KSSS.2014.6.3.101

Effects of stuttering severity on articulation rate in fluent and dysfluent utterances of preschool children who stutter (취학 전 말더듬 아동의 말더듬 중증도에 따른 발화 형태 별 조음속도 비교)

Chon, HeeCheong;Lee, SooBok
Phonetics and Speech Sciences / v.8 no.3 / pp.79-90 / 2016
The purpose of this study was to investigate the effects of stuttering severity on articulation rate measured from different types of utterances in preschool children who stutter. Participants were 40 boys who stutter (CWS) and 10 age-matched boys who do not stutter (CWNS). CWS were sub-grouped based on the severity of their stuttering: 15 mild, 13 moderate, and 12 severe. Utterances were categorized as "overall utterance," including all utterances that the children spoke, and "fluent utterance," which did not contain any disfluencies. Utterances containing abnormal disfluencies were categorized as "SLD utterance" for CWS. The results revealed no significant difference among groups in any type of utterance. There were significant positive correlations in articulation rates between utterance types. Stuttering severity was not a factor in characterizing the articulation rate of each type of utterance. The current findings also suggest that articulation rate may not predict speech motor control ability in preschool CWS.
https://doi.org/10.13064/KSSS.2016.8.3.079

Increase in Speaking Rate by 3~8-year-old Korean Children (한국어 발화 속도의 연령별 증가에 관한 연구 －만 3~8 세 아동을 대상으로－)

Kim, Tae-Kyung;Chang, Kyung-Hee;Lee, Phil-Young
Speech Sciences / v.13 no.3 / pp.83-95 / 2006
This study attempts to suggest a criterion for Korean language development. For this purpose, we investigated the speaking rates of spontaneous utterances produced by 144 children aged 3 to 8. We analyzed each subject's speaking rate and its relation to the speaker's age, gender, and utterance length. To determine the relative contributions of these variables to the speaking rate, multiple regression was conducted. The results can be summarized as follows: (1) The mean and maximum values of the speaking rate increased with age. (2) A statistically significant increase in speaking rate appeared at two-year intervals. (3) There was no significant difference in speaking rate between the male and female groups. (4) The multiple regression analysis showed that, along with the speaker's age, utterance length (the mean number of syllables per utterance) is also important in estimating speaking rate.

Comparison of overall speaking rate and pause between children with speech sound disorders and typically developing children (말소리장애 아동과 일반 아동의 발화 속도와 쉼 비교)

Lee, HeungIm;Kim, SooJin
Phonetics and Speech Sciences / v.9 no.2 / pp.111-118 / 2017
This study compares speech rate, articulatory rate, and pause between children with mild and moderate Speech Sound Disorder (SSD) who performed Sentence Repetition Tasks and Typically Developing (TD) children of the same chronological age. The three groups were compared in terms of speaking rate and articulatory rate. There was no difference between the two SSD groups, namely the mild and moderate groups. However, both SSD groups were significantly slower than the TD group in speech rate and articulatory rate. The results also showed no significant difference in the length and frequency of pauses between the moderate group and the mild group, but a substantial difference between these two groups and the TD group. This study provides basic data for evaluating children's speech rate and implies that children with SSD have limitations in speech rate.
https://doi.org/10.13064/KSSS.2017.9.2.111

A Study of Data Augmentation and Auto Speech Recognition for the Elderly (한국어 노인 음성 데이터 증강 및 인식 연구)

Keon Hee Kim;Seoyoon Park;Hansaem Kim
Annual Conference on Human and Language Technology / 2023.10a / pp.56-60 / 2023
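Speech recognition has so far focused on young and middle-aged adults, but as population aging accelerates, the need for research on elderly speech is growing. However, elderly speech datasets are still not sufficiently available compared with speech datasets for young and middle-aged adults. To help address this shortage, this study investigates a methodology for augmenting scarce elderly speech datasets. To this end, we analyzed the features of elderly speech and augmented the data by synthesizing the 'frequency' and 'speech rate' features onto general adult speech. We then fine-tuned the Whisper small model and computed the CER (Character Error Rate) on elderly speech, and found that using the augmented dataset together with the existing elderly dataset was the most effective.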

Search Result 127, Processing Time 0.025 seconds
