• Title/Summary/Keyword: synthetic voice

Analysis of Voice Color Similarity for the development of HMM Based Emotional Text to Speech Synthesis (HMM 기반 감정 음성 합성기 개발을 위한 감정 음성 데이터의 음색 유사도 분석)

  • Min, So-Yeon;Na, Deok-Su
    • Journal of the Korea Academia-Industrial cooperation Society / v.15 no.9 / pp.5763-5768 / 2014
  • When a single synthesizer combines a normal voice with various emotional voices, maintaining a consistent voice color (timbre) is important. If a synthesizer is built from recordings in which emotions are expressed too strongly, the voice color cannot be maintained, and each synthetic utterance may sound as if it were spoken by a different speaker. In this paper, emotional speech data were recorded and the changes in voice color were analyzed in order to develop an HMM-based emotional speech synthesizer. Building a speech synthesizer requires recording a voice and constructing a database, and the recording process is especially important for an emotional speech synthesizer: because emotion is hard to define and a given emotional level is hard to sustain, the recording must be monitored. The synthesizer used a normal voice and three emotional voices (Happiness, Sadness, Anger), each recorded at two levels, High and Low. To analyze the voice color of the normal and emotional voices, an average spectrum was computed by accumulating the spectra of vowels, and the first formant (F1) derived from this average spectrum was compared. The Low-level emotional data showed higher voice similarity than the High-level emotional data, and the proposed method allows the recording to be monitored through changes in voice similarity.
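
The vowel-spectrum comparison described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes Hann-windowed frames, a naive DFT, an average magnitude spectrum over all frames, and a hypothetical F1 rule that simply picks the strongest average-spectrum bin between 250 and 1000 Hz. Expressing voice-color change as the relative F1 shift from the normal voice is likewise an assumption.

```python
import cmath
import math

def dft_magnitude(frame):
    """Magnitude spectrum (bins 0..N/2) of one frame, computed with a naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def average_spectrum(frames):
    """Accumulate the magnitude spectra of Hann-windowed vowel frames and average them."""
    n = len(frames[0])
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    total = [0.0] * (n // 2 + 1)
    for frame in frames:
        spectrum = dft_magnitude([x * w for x, w in zip(frame, window)])
        total = [acc + mag for acc, mag in zip(total, spectrum)]
    return [acc / len(frames) for acc in total]

def estimate_f1(avg_spectrum, fs, n, lo=250.0, hi=1000.0):
    """Hypothetical F1 rule: frequency of the strongest average-spectrum bin in [lo, hi] Hz."""
    candidates = [k for k in range(len(avg_spectrum)) if lo <= k * fs / n <= hi]
    peak = max(candidates, key=lambda k: avg_spectrum[k])
    return peak * fs / n

def f1_shift(f1_normal, f1_emotional):
    """Relative F1 shift of an emotional voice from the normal voice (0.0 = same F1)."""
    return abs(f1_emotional - f1_normal) / f1_normal
```

For example, with `fs = 8000` and 256-sample frames of a 500 Hz tone, `estimate_f1` returns 500.0. LPC-based formant estimation would be more typical in practice; peak picking keeps the sketch short.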

The Rule of Korean Pitch Variation for a Natural Synthetic Female Voice (자연스러운 여성 합성음을 위한 한국어의 피치 변화 법칙)

  • Kim, Chung-Won;Park, Dae-Duck;Kim, Boh-Hyun;Kwon, Cheol-Hong
    • The Journal of the Acoustical Society of Korea / v.15 no.6 / pp.26-32 / 1996
  • In this paper, we propose a rule of pitch variation for a natural-sounding synthetic female voice. The intonation phrase, the basic unit to which the rule is applied, typically consists of one or more syllables. The pitch values of the first, second, and final syllables make up the pitch contour of the intonation phrase: those of the first and second syllables are determined by the initial consonants of the respective syllables, and that of the final syllable by the type of function word. There are two kinds of boundaries between intonation phrases: a boundary with a pause and a boundary without a pause. The pitch contours of the intonation phrases, together with these boundary phenomena, determine the pitch pattern of a sentence.
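
The rule structure described above (three anchor syllables per intonation phrase) can be sketched as a small lookup-and-interpolate function. All pitch values, the consonant classes, and the function-word types below are hypothetical placeholders, not values from the paper. Linear interpolation for the middle syllables and the handling of one- and two-syllable phrases are also assumptions, and boundary effects (with or without pause) are not modeled.

```python
# Hypothetical pitch targets in Hz; placeholders, NOT the values from the paper.
INITIAL_PITCH = {  # keyed by the class of a syllable's initial consonant
    "tense_or_aspirated": 260.0,
    "other": 220.0,
}
FINAL_PITCH = {  # keyed by the type of the phrase-final function word
    "topic_particle": 240.0,
    "sentence_final_ending": 180.0,
}

def phrase_contour(initial_classes, final_word_type):
    """Return one pitch value per syllable of an intonation phrase.

    initial_classes: initial-consonant class of each syllable, in order.
    The final syllable takes its value from FINAL_PITCH; the first and second
    syllables (if they are not the final one) take theirs from INITIAL_PITCH;
    any syllables between the second and the final are linearly interpolated.
    """
    n = len(initial_classes)
    contour = [0.0] * n
    contour[-1] = FINAL_PITCH[final_word_type]
    for i in range(min(2, n - 1)):  # first and second syllables, never the final one
        contour[i] = INITIAL_PITCH[initial_classes[i]]
    if n > 3:
        start, end = contour[1], contour[-1]
        for i in range(2, n - 1):
            frac = (i - 1) / (n - 2)  # position between syllable 1 and syllable n-1
            contour[i] = start + frac * (end - start)
    return contour
```

With these placeholder tables, a four-syllable phrase `["other", "tense_or_aspirated", "other", "other"]` ending in a sentence-final ending yields `[220.0, 260.0, 220.0, 180.0]`.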

Control of Duration Model Parameters in HMM-based Korean Speech Synthesis (HMM 기반의 한국어 음성합성에서 지속시간 모델 파라미터 제어)

  • Kim, Il-Hwan;Bae, Keun-Sung
    • Speech Sciences / v.15 no.4 / pp.97-105 / 2008
  • HMM-based text-to-speech systems (HTS) have recently been studied widely because, compared with corpus-based unit-concatenation text-to-speech systems, they require less memory and lower computational complexity and are therefore suitable for embedded systems. They also have the advantage that the voice characteristics and speaking rate of the synthetic speech can be changed easily by modifying the HMM parameters appropriately. We implemented an HMM-based Korean text-to-speech system using a small Korean speech database and propose a method to increase the naturalness of the synthetic speech by controlling the duration model parameters of the system. A paired comparison test was performed to verify the effectiveness of this technique; a preference score of 73.8% showed that controlling the duration model parameters improves the naturalness of the synthetic speech.
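
One widely used way to control duration model parameters in HMM-based synthesis (e.g. HTS) derives each state duration from its Gaussian duration model as $d_k = \mu_k + \rho\,\sigma_k^2$, where $\rho$ is chosen so the durations sum to a target length $T$: $\rho = (T - \sum_k \mu_k) / \sum_k \sigma_k^2$. The abstract does not say whether the authors used exactly this scheme, so the Python sketch below is illustrative only.

```python
def state_durations(means, variances, total_frames=None, rho=None):
    """Gaussian state-duration control: d_k = mu_k + rho * sigma_k^2.

    If total_frames is given, rho is solved so the durations sum to it;
    otherwise the supplied rho is used (default 0.0, i.e. the mean durations).
    Durations are returned as real numbers of frames.
    """
    if total_frames is not None:
        rho = (total_frames - sum(means)) / sum(variances)
    elif rho is None:
        rho = 0.0
    return [m + rho * v for m, v in zip(means, variances)]
```

With means `[3, 5, 4]`, variances `[1, 2, 1]` and a 16-frame target, $\rho = 1$ and the durations become `[4.0, 7.0, 5.0]`: states with larger variance absorb more of the stretch, and a negative $\rho$ speeds speech up. A real system would also round to whole frames and keep every state at least one frame long, which this sketch omits.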

Limitations of Spectrogram Analysis for Smartphone Voice Recording File Forgery Detection (스마트폰 음성 녹음 파일 위변조 검출을 위한 스펙트로그램 분석의 한계점)

  • Sangmin Han;Yeongmin Son;Jae Wan Park
    • The Journal of the Convergence on Culture Technology / v.9 no.2 / pp.545-551 / 2023
  • As digital information has become readily available to everyone, digital evidence is increasingly being adopted. However, with the spread of various voice-editing tools, it is virtually impossible to determine whether a voice recording file that has undergone sophisticated editing is authentic. This study aims to show that forgeries that are difficult to distinguish from the original file can be produced by applying insertion, deletion, linking, and synthetic editing techniques to voice recording files. It demonstrates the difficulty of detecting forgery when a forged voice file is encoded with the same extension as the original. In addition, it shows that in the experiments where traces of editing did appear, forgery detection became impossible once the transition band was additionally deleted and the file was encoded a second time. The study is thus expected to contribute to establishing stricter admissibility criteria for voice recording files used as digital evidence.
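
To make concrete what spectrogram inspection looks for at an edit point, the sketch below flags frames where the short-time spectrum changes abruptly (spectral flux). This generic cue is chosen for illustration and is not the analysis procedure of the study; the frame size, hop, and the threshold of three times the mean flux are hypothetical. The study's finding is that transition-band deletion and re-encoding can remove exactly this kind of discontinuity.

```python
import cmath
import math

def frame_spectra(signal, size=128, hop=64):
    """Magnitude spectra (bins 0..size/2) of Hann-windowed frames, via a naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / size) for t in range(size)]
    spectra = []
    for start in range(0, len(signal) - size + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(size)]
        spectra.append([abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / size)
                                for t in range(size)))
                        for k in range(size // 2 + 1)])
    return spectra

def spectral_flux(spectra):
    """Euclidean distance between the spectra of consecutive frames."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(prev, cur)))
            for prev, cur in zip(spectra, spectra[1:])]

def suspicious_frames(signal, size=128, hop=64, ratio=3.0):
    """Indices i where the flux between frame i and frame i+1 exceeds ratio x mean flux."""
    flux = spectral_flux(frame_spectra(signal, size, hop))
    threshold = ratio * sum(flux) / len(flux)
    return [i for i, f in enumerate(flux) if f > threshold]
```

Splicing 1024 samples of a 500 Hz tone onto 1024 samples of a 1500 Hz tone (at 8 kHz) flags only the frame pairs that straddle the splice.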

Surgery of Benign Laryngeal Mucosal Lesions (후두 양성점막 병변의 수술적 치료)

  • Jin, Sung Min
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.24 no.2 / pp.83-87 / 2013
  • The term "phonosurgery," coined in the early 1960s, refers to surgical procedures that maintain, restore, or enhance the human voice. Phonosurgery includes phonomicrosurgery (endoscopic microsurgery of the vocal folds), laryngoplastic phonosurgery (open-neck surgery that restructures the cartilaginous framework of the larynx and the soft tissues), laryngeal injection (injection of medications as well as synthetic and organic biologic substances), and reinnervation of the larynx. Phonomicrosurgery aims to maximally preserve the layered microstructure of the vocal fold, that is, the epithelium and lamina propria, and its purpose is usually to improve the vibratory characteristics of this layered microstructure. Phonomicrosurgery developed from the convergence of microlaryngoscopic surgical techniques and the mucosal wave theory of laryngeal sound production. Improvements in technology (e.g., laryngoscopes, handheld instruments, and lasers), arising in part from developments in more frequently performed minimally invasive surgical procedures, will probably facilitate the next generation of procedural innovations. The best ways to optimize phonosurgical outcomes include making an accurate diagnosis, completing a comprehensive voice evaluation, providing sufficient preoperative therapy, carefully selecting patients for phonomicrosurgical procedures, and requiring sufficient postoperative rest and therapy. Phonomicrosurgery will continue to evolve through the interdependent collaboration of surgeons with voice scientists, speech pathologists, and other voice professionals.

Synthetic Speech Quality Improvement By Glottal parameter Interpolation - Preliminary study on open quotient interpolation in the speech corpus - (성대특성 보간에 의한 합성음의 음질향상 - 음성코퍼스 내 개구간 비 보간을 위한 기초연구 -)

  • Bae, Jae-Hyun;Oh, Yung-Hwa
    • Proceedings of the KSPS conference / 2005.11a / pp.63-66 / 2005
  • For large-corpus-based TTS, the consistency of the speech corpus is very important, because inconsistent speech quality within the corpus may cause distortion at concatenation points. Because of this inconsistency, a large corpus must be tuned repeatedly. One reason for the inconsistency is that the sentences in the corpus differ in their glottal characteristics. In this paper, we adjusted the glottal characteristics of the speech in the corpus to prevent this distortion, and we present experimental results.
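
The open quotient (OQ) is the ratio of the open-phase duration of the glottis to the fundamental period. The sketch below shows the definition and one simple way to reconcile mismatched OQ values across a concatenation point: linearly interpolating OQ targets over a few pitch periods. The interpolation scheme and the function names are hypothetical; the abstract does not describe the paper's actual adjustment procedure.

```python
def open_quotient(open_phase, period):
    """OQ = open-phase duration / fundamental period (both in the same units)."""
    if not 0 <= open_phase <= period:
        raise ValueError("open phase must lie within the period")
    return open_phase / period

def interpolate_oq(oq_left, oq_right, n_periods):
    """Hypothetical join smoothing: OQ targets for n_periods pitch periods that move
    linearly from the left unit's OQ to the right unit's OQ."""
    if n_periods == 1:
        return [(oq_left + oq_right) / 2]
    step = (oq_right - oq_left) / (n_periods - 1)
    return [oq_left + i * step for i in range(n_periods)]
```

For instance, an open phase of 3 ms in a 5 ms period gives OQ = 0.6, and smoothing a join between units with OQ 0.4 and 0.6 over three periods gives targets of about 0.4, 0.5, and 0.6.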

Real Time Implementation of a Korean Speech Synthesizer (한국어 음성합성기의 실시간 구현에 관한 연구)

  • 임광일;이규태;조철우;이우선;신인철;이태원
    • Journal of the Korean Institute of Telematics and Electronics / v.25 no.2 / pp.176-181 / 1988
  • In this paper, an LPC speech synthesizer with multipulse excitation is implemented on the general-purpose DSP μPD7720. Because the excitation of the synthesis filter is specified by the amplitudes and positions of the pulses, voiced/unvoiced decision and pitch period detection can be omitted. The synthesizer is built around the DSP, which communicates with the host computer by interrupts and with the D/A converter by DMA. A comparison of synthetic and original waveforms, along with a listening test, confirms the validity of the system.
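
The synthesis side of a multipulse-excited LPC system can be sketched as follows: the excitation is a sparse sequence of pulses with explicit positions and amplitudes, so no voiced/unvoiced flag or pitch period is needed, and it drives an all-pole filter. This is a floating-point illustration under an assumed sign convention ($s[n] = e[n] + \sum_k a_k s[n-k]$), not the original fixed-point DSP code; the analysis-by-synthesis search that chooses the pulses is omitted.

```python
def multipulse_excitation(length, positions, amplitudes):
    """Excitation of zeros with pulses of the given amplitudes at the given sample positions."""
    excitation = [0.0] * length
    for pos, amp in zip(positions, amplitudes):
        excitation[pos] += amp
    return excitation

def lpc_synthesize(excitation, a):
    """All-pole LPC synthesis filter: s[n] = e[n] + sum_{k=1..p} a[k-1] * s[n-k].

    Assumed convention: a holds predictor coefficients a_1..a_p, so the filter
    is 1 / (1 - sum_k a_k z^-k). Samples before n = 0 are taken as zero.
    """
    s = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * s[n - k]
        s.append(acc)
    return s
```

A single unit pulse through a first-order filter with $a_1 = 0.5$ produces the decaying response `[1.0, 0.5, 0.25, 0.125, 0.0625]`.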

Phonosurgery after Laser Cordectomy (레이저 성문절제술 후의 음성수술)

  • So, Yoon-Kyung;Son, Young-Ik
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.19 no.1 / pp.11-15 / 2008
  • Endoscopic laser cordectomy is known to be an oncologically sound procedure for T1 and selected T2 glottic carcinoma; compared with radiotherapy, it has a comparable local control rate and a better long-term laryngeal preservation rate. Although the results of reported voice outcome studies after surgery or radiotherapy are diverse and controversial, resection deeper than the body layer of the vocal fold (type III, IV, and V cordectomy) usually leads to aerodynamic insufficiency during phonation and results in poor voice quality. A keyhole defect or synechiae at the anterior commissure after type VI cordectomy may also result in an unsatisfactory vocal outcome. However, many advances in phonosurgical techniques have been reported to be successfully applied to the reconstruction of glottal defects after endoscopic laser cordectomy. In cases of glottal insufficiency, voice restoration can be achieved by augmenting the paraglottic space or medializing the excavated vocal fold. Injection laryngoplasty with synthetic materials or autologous fat is gaining popularity for restoring minor glottal volume defects because of its convenience. Laryngeal framework surgery, especially type I thyroplasty with premade implant systems or Gore-Tex, is most frequently used to correct larger glottic volume defects. For an anterior commissural keyhole defect, an additional procedure such as laryngofissure may be required. For anterior commissural synechiae, a laryngeal keel may be inserted for several weeks, or mitomycin-C may be applied repeatedly after division of the adhesive scar, to prevent restenosis. In this paper, current concepts and the authors' experiences with phonosurgical reconstruction of vocal function after endoscopic cordectomy are introduced.

A comparison of CPP analysis among breathiness ranks (기식 등급에 따른 CPP (Cepstral Peak Prominence) 분석 비교)

  • Kang, Youngae;Koo, Bonseok;Jo, Cheolwoo
    • Phonetics and Speech Sciences / v.7 no.1 / pp.21-26 / 2015
  • The aim of this study is to synthesize pathological breathy voices and to build a table of cepstral peak prominence (CPP) values by breathiness rank through cepstral analysis, in order to supplement the reliability of perceptual auditory judgments. The KlattGrid synthesizer included in Praat was used. The synthesis parameters consist of two groups, constants and variables. The constant parameters are pitch, amplitude, flutter, open phase, oral formants, and bandwidths. The variable parameters are breathiness (BR), aspiration amplitude (AH), and spectral tilt (TL). Five hundred sixty samples of the synthetic breathy vowel /a/ were created for a male voice, and three raters ranked their breathiness. Based on perceptual judgment and cepstral analysis, 217 samples were judged inadequate, leaving 343 samples. The CPP values and other related cepstral parameters of these samples were classified into four breathiness ranks (B0-B3). The mean and standard deviation of CPP were $16.10 \pm 1.15$ dB (B0), $13.68 \pm 1.34$ dB (B1), $10.97 \pm 1.41$ dB (B2), and $3.03 \pm 4.07$ dB (B3). CPP decreases as breathiness becomes more severe because the signal contains more noise and weaker harmonics.
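
CPP is the height of the cepstral peak above a regression line fitted to the cepstrum. Below is a minimal Python sketch that assumes a Hann window, a naive DFT, a cepstrum taken as the inverse DFT of the dB log-magnitude spectrum, and a peak search and linear regression restricted to quefrencies for F0 between 60 and 300 Hz. Praat's implementation differs in details (e.g. smoothing and the regression range), so values from this sketch are not comparable with the CPP values reported above.

```python
import cmath
import math

def _dft(x, inverse=False):
    """Naive DFT of a sequence (unnormalized; inverse=True flips the exponent sign)."""
    n = len(x)
    sign = 1 if inverse else -1
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=300.0):
    """CPP in dB: cepstral peak minus a regression line, both over the F0 quefrency range."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    spectrum = _dft([s * w for s, w in zip(frame, window)])
    log_mag_db = [20 * math.log10(abs(c) + 1e-12) for c in spectrum]
    cepstrum = _dft(log_mag_db, inverse=True)
    ceps_db = [20 * math.log10(abs(c) / n + 1e-12) for c in cepstrum[: n // 2]]
    # Quefrency index q (in samples) corresponds to a pitch period of q / fs seconds.
    qs = list(range(int(fs / f0_max), min(int(fs / f0_min), n // 2 - 1) + 1))
    ys = [ceps_db[q] for q in qs]
    q_mean, y_mean = sum(qs) / len(qs), sum(ys) / len(ys)
    slope = (sum((q - q_mean) * (y - y_mean) for q, y in zip(qs, ys))
             / sum((q - q_mean) ** 2 for q in qs))
    intercept = y_mean - slope * q_mean
    q_peak = max(qs, key=lambda q: ceps_db[q])
    return ceps_db[q_peak] - (intercept + slope * q_peak)
```

A strongly periodic (harmonic) frame yields a much larger CPP than a noise frame, mirroring the downward trend from B0 to B3.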

Spectral Characteristics and Formant Bandwidths of English Vowels by American Males with Different Speaking Styles (발화방식에 따른 미국인 남성 영어모음의 스펙트럼 특성과 포먼트 대역)

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.6 no.4 / pp.91-99 / 2014
  • Speaking styles tend to influence the spectral characteristics of speech. Few studies have examined the spectral characteristics of speech, because processing large amounts of spectral data is complicated. The purpose of this study was to examine the spectral characteristics and formant bandwidths of English vowels produced by nine American males in different speaking styles: clear or conversational speech, and high- or low-pitched voice. Praat was used to collect pitch-corrected long-term averaged spectra and the bandwidths of the first two formants (B1 and B2) of eleven vowels in each speaking style. First, the spectral characteristics of the vowels varied systematically with speaking style: clear speech showed higher spectral energy than conversational speech, and high-pitched voice showed higher spectral energy than low-pitched voice. In addition, front and back vowel groups showed different spectral characteristics. Second, there was no statistically significant difference in B1 or B2 across speaking styles. B1 was generally lower than B2 when the source spectrum and radiation effect were taken into account. However, B2 differed significantly between the front and back vowel groups. The author concluded that spectral characteristics reflect speaking styles systematically, whereas bandwidths measured at a few formant frequency points do not properly reveal style differences. Further studies are needed to examine how listeners evaluate sets of synthetic vowels whose spectral characteristics or bandwidths have been modified.
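
Formant bandwidths such as B1 and B2 are commonly obtained from LPC analysis: each complex pole $z$ of the all-pole model gives a resonance frequency $F = \arg(z)\,f_s / 2\pi$ and an approximate 3-dB bandwidth $B = -\ln|z|\,f_s / \pi$. The sketch below implements this standard conversion and its inverse; it is not necessarily the exact procedure Praat uses internally.

```python
import cmath
import math

def formant_from_pole(pole, fs):
    """Resonance frequency and approximate 3-dB bandwidth (Hz) of a complex LPC pole."""
    freq = abs(cmath.phase(pole)) * fs / (2 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth

def pole_for(freq, bandwidth, fs):
    """Inverse mapping: pole with radius exp(-pi * B / fs) at angle 2 * pi * F / fs."""
    return cmath.rect(math.exp(-math.pi * bandwidth / fs), 2 * math.pi * freq / fs)
```

Poles closer to the unit circle give narrower bandwidths; for example, a pole built for F = 700 Hz and B = 80 Hz at 10 kHz maps back to the same pair.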