Search | Korea Science

A Study on Voice Color Control Rules for Speech Synthesis System (음성합성시스템을 위한 음색제어규칙 연구)

Kim, Jin-Young;Eom, Ki-Wan
- Speech Sciences / v.2 / pp.25-44 / 1997
When listening to the various speech synthesis systems developed and used in our country, we find that although their quality has improved, they lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that produces natural and varied voice types in synthetic speech. To find such rules from natural speech, glottal source parameters and frequency characteristics of the vocal tract were studied for several voice colors. In this paper, voice colors were catalogued as deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. The LF-model was used as the voice source model, and the formant frequencies, bandwidths, and amplitudes were used for the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to establish the general relation between the parameters and voice colors.

An Aerodynamic and Acoustic Analysis of the Breathy Voice of Thyroidectomy Patients (갑상선 수술 후 성대마비 환자의 기식 음성에 대한 공기역학적 및 음향적 분석)

Kang, Young-Ae;Yoon, Kyu-Chul;Kim, Jae-Ock
- Phonetics and Speech Sciences / v.4 no.2 / pp.95-104 / 2012
Thyroidectomy patients may have vocal paralysis or paresis, resulting in a breathy voice. The aim of this study was to investigate the aerodynamic and acoustic characteristics of a breathy voice in thyroidectomy patients. Thirty-five subjects who had vocal paralysis after thyroidectomy participated in this study. According to perceptual judgements by three speech pathologists and one phonetic scholar, subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 14). Aerodynamic analysis was conducted with three tasks (Voicing Efficiency, Maximum Sustained Phonation, Vital Capacity), and acoustic analysis was performed during the Maximum Sustained Phonation task. The breathy voice group had significantly higher subglottal pressure and more pathological voice characteristics than the non-breathy voice group. Logistic regression on the aerodynamic measures showed 94.1% classification accuracy; the predictor parameters for breathiness were maximum sound pressure level, sound pressure level range, and phonation time from the Maximum Sustained Phonation task, and pitch range, peak air pressure, and mean peak air pressure from the Voicing Efficiency task. The classification accuracy of the acoustic logistic regression was 88.6%, with five frequency perturbation parameters as predictors. Vocal paralysis creates air turbulence at the glottis, which causes frequency-related parameters to fluctuate and increases aspiration in high-frequency regions. These changes determine perceptual breathiness.
https://doi.org/10.13064/KSSS.2012.4.2.095

Speech Parameters for the Robust Emotional Speech Recognition (감정에 강인한 음성 인식을 위한 음성 파라메터)

Kim, Weon-Goo
- Journal of Institute of Control, Robotics and Systems / v.16 no.12 / pp.1137-1142 / 2010
This paper studies speech parameters that are less affected by human emotion, with the aim of developing a robust speech recognition system. For this purpose, the effect of emotion on the speech recognition system and robust speech parameters for the system were studied using a speech database containing various emotions. In this study, the mel-cepstral coefficient, delta-cepstral coefficient, RASTA mel-cepstral coefficient, and frequency-warped mel-cepstral coefficient were used as feature parameters, and the CMS (Cepstral Mean Subtraction) method was used as a signal bias removal technique. Experimental results showed that the HMM-based speaker-independent word recognizer using the vocal tract length normalized mel-cepstral coefficient, its derivatives, and CMS for signal bias removal achieved the best performance, a word error rate of 0.78%. This corresponds to about a 50% word error reduction compared with the baseline system using the mel-cepstral coefficient, its derivatives, and CMS.
https://doi.org/10.5302/J.ICROS.2010.16.12.1137
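
As an illustration of the CMS step named in the abstract above, here is a minimal sketch of per-utterance cepstral mean subtraction. It is not taken from the paper: the function name `cepstral_mean_subtraction` and the toy data are assumptions made for this example. The idea is that a stationary channel bias appears as a constant offset in the cepstral domain, so subtracting each coefficient's mean over the utterance removes it.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean computed over all frames of one utterance.

    `frames` is a list of cepstral vectors (one per analysis frame), all of the
    same length. The name and interface are hypothetical, for illustration only.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    # Remove that mean from every frame.
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy utterance: two frames, two coefficients each (made-up numbers).
cepstra = [[1.0, 2.0], [3.0, 6.0]]
normalized = cepstral_mean_subtraction(cepstra)

# The same utterance with a constant bias added to every frame.
biased = [[c + b for c, b in zip(frame, [0.5, -0.25])] for frame in cepstra]
normalized_biased = cepstral_mean_subtraction(biased)
```

In this toy case the biased and unbiased utterances normalize to the same vectors, which is the property that makes CMS useful for bias removal.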

Effect of Radiation Therapy on Voice Parameters in Early Glottic Cancer and Normal Larynx (방사선 요법이 초기 성대암 및 정상 후두의 음성 지표에 미치는 영향)

김민식;박한종;선동일;박영학;조승호
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.7 no.1 / pp.32-38 / 1996
The preservation of the voice-producing mechanism is an important feature in the management of laryngeal cancer by radiotherapy. However, radiation therapy has certain side effects, such as mucositis, tissue edema, necrosis, and fibrosis, which could affect normal voice production. Several subjective studies that used questionnaires and auditory perceptual judgements of voice have been interpreted to mean that radiation results in a normal or near-normal voice. Objective evidence of the status of vocal function after radiation treatment, however, is still lacking. We analyzed the changes that occur in voice parameters in a group of patients undergoing radiation therapy in order to determine the effect of radiation on voice quality. In this study, acoustic and aerodynamic measures of vocal function were used to determine the characteristics of voice production. We found that voice parameters in early glottic cancer changed meaningfully compared with the normal larynx with or without radiation, and that radiation therapy has little effect on the normal larynx.

Post-Processing of High-Speed Video-Laryngoscopic Images to Two-Dimensional Scanning Digital Kymographic Images (초고속 후두내시경 영상을 이용한 평면 스캔 비디오카이모그래피 영상 생성)

Cha, Wonjae;Wang, Soo-Geun;Jang, Jeon Yeob;Kim, Geun-Hyo;Lee, Yeon-Woo
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.28 no.2 / pp.89-95 / 2017
Background and Objectives : High-speed videolaryngoscopy (HSV) is the only technique that captures the true intra-cycle vibratory behavior of the vocal folds by capturing full images of the vocal folds. However, it has several drawbacks: no immediate feedback during examination, considerable waiting time for digital kymography (DKG), recording duration limited to a few seconds, and extreme demands on storage space. Herein, we demonstrate a new post-processing method that converts HSV images to two-dimensional digital kymography (2D-DKG) images by adopting the algorithm of 2D videokymography (2D VKG). Materials and Methods : An HSV system was used to capture images of the vocal folds. HSV images were post-processed in Kay image-process software (KIPS), and conventional DKG images were retrieved. A custom-made post-processing system was used to convert HSV images to 2D-DKG images. The quantitative parameters of the post-processed 2D-DKG images were validated by comparing them with those of the DKG images. Results : Serial HSV images for all phases of vocal fold vibratory movement were included. The images were converted by the scanning method using U-medical image-process software. Similar to conventional DKG, the post-processed 2D-DKG image from the HSV images can provide quantitative information on vocal fold mucosal vibration, including the various vibratory phases. Differences in amplitude symmetry index, phase symmetry index, open quotient, and close quotient between 2D-DKG and DKG were analyzed. There were no statistically significant differences between the quantitative parameters of vocal fold vibratory movement in 2D-DKG and DKG. Conclusion : The post-processing method of converting HSV images to 2D-DKG images could provide clinical information and storage economy.
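
To make the difference between conventional DKG and a scanning-style 2D image concrete, the sketch below builds both from a stack of grayscale frames. It is only one plausible reading of the abstract above, not the authors' algorithm: the function names, the frame representation (lists of pixel rows), and the assumption that the scanning image takes row r from frame start + r are all hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel values


def line_kymogram(frames: List[Frame], row: int) -> List[List[int]]:
    """Conventional line kymogram: the same pixel row from every frame,
    stacked so that output row t shows that line at time t."""
    return [list(frame[row]) for frame in frames]


def scanning_kymogram(frames: List[Frame], start: int = 0) -> List[List[int]]:
    """Scanning-style 2D image (assumed scheme): output row r is taken from
    frame start + r, so each image line is sampled at a later instant."""
    n_rows = len(frames[start])
    if start + n_rows > len(frames):
        raise ValueError("not enough frames to scan every row")
    return [list(frames[start + r][r]) for r in range(n_rows)]


# Toy data: 3 frames of 3 rows x 2 columns; pixel value = 10 * frame + row.
frames = [[[10 * t + r] * 2 for r in range(3)] for t in range(3)]

dkg = line_kymogram(frames, row=1)     # one line followed over time
dkg_2d = scanning_kymogram(frames, 0)  # whole field, one line per frame
```

The first function follows a single line of the glottis through time, while the second covers the whole field of view with each line captured at a different instant; how the actual software chooses frames and lines is not stated in the abstract.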

Prediction of Post-Treatment Outcome of Pathologic Voice Using Voice Synthesis (음성합성을 이용한 병적 음성의 치료 결과에 대한 예측)

이주환;최홍식;김영호;김한수;최현승;김광문
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.14 no.1 / pp.30-39 / 2003
Background and Objectives : Patients with pathologic voice are often concerned about recovery of their voice after surgery. In this investigation, we controlled the values of three parameters of the Dr. Speech Science voice synthesis program, namely jitter, shimmer, and NNE (normalized noise energy), which distinguish one person's voice from another's, and devised a method to synthesize the voice predicted after surgery. Subjects and Method : Values of vocal jitter, vocal shimmer, and glottal noise were measured in the voices of 10 vocal cord paralysis patients and 10 vocal polyp patients 1 week prior to and 1 month after surgery. With the Dr. Speech Science voice synthesis program, we synthesized the vowel 'ae' to closely match the patients' preoperative and postoperative voices by controlling the values of jitter, shimmer, and glottal noise. We then analyzed the synthesized voices and compared them with the pre- and postoperative voices. Results : 1) After inputting the preoperative and corrected values of jitter, shimmer, and glottal noise into the voice synthesis program, voices identical to vocal polyp patients' pre- and postoperative voices within statistical significance were synthesized. 2) After eliminating synergistic effects among the three parameters, we were able to synthesize voices identical to vocal paralysis patients' preoperative voices. 3) After inputting only slightly increased jitter and shimmer into the synthesis program, we were able to synthesize voices identical to vocal cord paralysis patients' postoperative voices. Conclusion : Voices synthesized with the Dr. Speech Science program were identical to patients' actual pre- and postoperative voices; clinicians will be able to give patients more information, and increased patient cooperation can thus be expected.
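
For readers unfamiliar with the jitter and shimmer parameters mentioned above, the following sketch computes them under a commonly used "local" definition: the mean absolute difference between consecutive cycles, relative to the mean, in percent. This definition is an assumption for illustration; the Dr. Speech program may define its parameters differently, and NNE is not covered here. The function name and sample values are hypothetical.

```python
from typing import List


def local_perturbation(values: List[float]) -> float:
    """Mean absolute cycle-to-cycle difference divided by the mean value,
    expressed in percent. Applied to glottal periods it gives a local jitter;
    applied to per-cycle peak amplitudes it gives a local shimmer."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Made-up cycle measurements for illustration only.
steady_periods_ms = [10.0, 10.0, 10.0]       # perfectly regular voice
irregular_periods_ms = [9.0, 11.0, 9.0, 11.0]  # strongly alternating periods
peak_amplitudes = [1.0, 1.0, 1.0, 1.0]       # constant amplitude

jitter_steady = local_perturbation(steady_periods_ms)
jitter_irregular = local_perturbation(irregular_periods_ms)
shimmer_constant = local_perturbation(peak_amplitudes)
```

A perfectly periodic signal yields zero for both measures, and larger cycle-to-cycle variation in period or amplitude raises jitter or shimmer respectively.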

Development of Two-Dimensional Scanning Videokymography for Analysis of Vocal Fold Vibration

Wang, Soo-Geun;Lee, Byung-Joo;Lee, Jin-Choon;Lim, Yun-Sung;Park, Young Min;Park, Hee-June;Roh, Jung-Hoon;Jeon, Gye-Rok;Kwon, Soon-Bok;Shin, Bum-Joo
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.24 no.2 / pp.107-111 / 2013
Objectives : We developed two-dimensional (2D) scanning videokymography to evaluate the mucosal wave of the whole vocal cords in real time, overcoming the limitation of existing stroboscopy and line scanning videokymography, which cannot evaluate it. Methods : We implemented a continuous light source with high brightness, a high-definition CMOS camera, and a capture board for saving the data. We created a software program to analyze the image data from the system. The functionality of the 2D scanning videokymography camera was tested on one of the authors (P.H.J., a 32-year-old male). Vocal cord images were obtained during normal phonation and falsetto phonation, and also during cough and diplophonia. Results : The system made it possible to measure objective parameters related to vocal fold vibration, including fundamental frequency, amplitude, regularity, mucosal wave, phase difference, medial and lateral peak, and opening versus closing duration. Simultaneously, it enabled analysis of the mucosal wave of the entire vocal fold in real time. 2D scanning videokymography was also effective for evaluating the dynamic status of the vocal fold when the subject produced an aperiodic voice. Conclusion : 2D scanning videokymography can support analysis of the mucosal wave of the entire vocal cord with objective vocal parameters, overcoming the limitations of stroboscopy and previous line scanning videokymography techniques.

The Efficacy of the Bel canto Singing Technique as a Method of Improving Voice Quality of Vocal Bowing Sulcus Vocalis

Yoo, Jae-Yeon;Seo, Dong-Il
- Phonetics and Speech Sciences / v.3 no.4 / pp.103-108 / 2011
The purpose of this study was to investigate the effects of the Bel canto singing technique on voice quality in patients with vocal bowing and sulcus vocalis. Five patients with vocal bowing and five patients with sulcus vocalis participated in the study. Each subject was assessed acoustically (Jitter, Shimmer, NNE) in the first and last sessions. Dr. Speech (version 4.0, Tiger-DRS) was used to compare pre- and post-treatment acoustic parameters. The Bel canto singing technique consisted of breathing exercises, relaxation exercises, and phonation exercises. The results showed that the Bel canto singing technique tended to be effective in improving voice quality in patients with organic voice disorders.

Predictive Factors for the Efficacy of Voice Therapy for Pediatric Vocal Fold Nodule (소아 성대결절의 음성치료 효과에 미치는 예후 인자)

Yun, Chang Bin;Kim, Young-Mo;Choi, Jeong-Seok;Kim, Ji Won
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.32 no.3 / pp.130-134 / 2021
Background and Objectives : Voice therapy (VT) is considered the gold standard treatment for vocal fold nodules in children. This study was designed to analyze the success rate of pediatric VT and to investigate the predictive factors for a good response to pediatric VT for vocal fold nodules. Materials and Method : This was a retrospective cohort study of 23 patients under 18 years old who were diagnosed with vocal fold nodules and received pediatric VT. We divided the patients into responding and non-responding groups and analyzed clinical and voice parameters related to the voice results. Results : Twelve patients showed improved findings after VT. In univariate analysis, female patients (85.7%) and adolescents (100%) showed a good response to VT. In multivariate analysis, female sex (p<0.05) and adolescence (p<0.05) were significantly related to a successful voice response. Treatment with proton pump inhibitors, antihistamines, or mucolytics and pre-VT voice parameters did not significantly influence voice outcomes. Conclusion : Pediatric VT is more effective in female patients and adolescents.
https://doi.org/10.22469/jkslp.2021.32.3.130

Differentiation of Vocal Cyst and Polyp by High-Pitched Phonation Characteristics (성대낭종과 성대폴립 간의 고음발성 양상의 차이)

Lee, Jong-Ik;Jeong, Go-Eun;Kim, Seong-Tae;Kim, Sang-Yeon;Nam, Soon-Yuhl;Kim, Sang-Yoon;Roh, Jong-Lyel;Choi, Seung-Ho
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.23 no.1 / pp.48-51 / 2012
Background and Objectives : Vocal fold cyst is generally treated by surgical resection, unlike vocal fold polyp, which is treated first by conservative management. A decrease in mucosal waves is known as the main diagnostic criterion for vocal fold cyst. However, differential diagnosis between cyst and polyp is sometimes difficult by endoscopic examination alone. The purpose of this study is to identify objective features of vocal fold cyst and polyp on the basis of voice analysis, especially at high-pitched phonation, for proper differential diagnosis. Materials and Method : Voice analysis was performed in 15 vocal fold cyst patients and 42 vocal fold polyp patients. Parameters of perceptual assessment, acoustic and aerodynamic measures, and voice range profile were compared between the two groups. Results : Compared with vocal fold polyp patients, vocal fold cyst patients showed significantly reduced MPT on acoustic and aerodynamic analysis, and a narrowed frequency range and lower maximum frequency on voice range profile analysis. A maximum frequency of 381 Hz was established as the cut-off value for differential diagnosis between cyst and polyp (ROC analysis, sensitivity 60%, specificity 68%). Conclusion : Voice analysis is helpful for differential diagnosis between vocal fold cyst and polyp, especially when it is difficult to distinguish cyst from polyp by endoscopic examination in the clinical setting. The decreased maximum frequency in vocal fold cyst suggests that, compared with vocal fold polyp patients, vocal fold cyst patients have incomplete high-pitched phonation and falsetto register due to decreased mucosal waves.