• Title/Summary/Keyword: Voice synthesis

Search Result 102, Processing Time 0.03 seconds

Computerization and Application of Hangeul Standard Pronunciation Rule (음성처리를 위한 표준 발음법의 전산화)

  • 이계영
    • Proceedings of the IEEK Conference
    • /
    • 2003.07d
    • /
    • pp.1363-1366
    • /
    • 2003
  • This paper introduces computerized version of Hangout(Korean Language) Standard Pronunciation Rule that can be used in Korean processing systems such as Korean voice synthesis system and Korean voice recognition system. For this purpose, we build Petri net models for each items of the Standard Pronunciation Rule, and then integrate them into the vocal sound conversion table. The reversion of Hangul Standard Pronunciation Rule regulates the way of matching vocal sounds into grammatically correct written characters. This paper presents not only the vocal sound conversion table but also character conversion table obtained by reversely converting the vocal sound conversion table. Making use of these tables, we have implemented a Hangeul character into a vocal sound system and a Korean vocal sound into character conversion system, and tested them with various data sets reflecting all the items of the Standard Pronunciation Rule to verify the soundness and completeness of our tables. The test results shows that the tables improves the process speed in addition to the soundness and completeness.

  • PDF

A Study on the Formant Comparison of Korean Monophthongs according to Age and Gender -A Survey on Patients in Oriental Hospitals- (연령 및 성별에 따른 한국인 단모음 포먼트 비교에 관한 연구 -한방병원 내원환자를 중심으로-)

  • Kim, Young-Su;Kim, Keun Ho;Kim, Jong Yeol;Jang, Jun-Su
    • Phonetics and Speech Sciences
    • /
    • v.5 no.1
    • /
    • pp.73-80
    • /
    • 2013
  • Formant is one of the essential vocal features for research of voice production, recognition and synthesis. Numerous studies were established on foreign languages including English vowels. However, studies related to Korean were done with a limited number of voice data. In this study, we compare four formants according to age and gender using a large number of Korean monophthongs. A total of 2614 Korean speakers participated in our experiments. We summarize statistical results by mean and standard deviation for each formant of five monophthongs. The results show a notable difference in each age and gender group. A quantitative study based on a large dataset is suggested for future studies on Korean speech sounds.

A Study on Individual Pitch Pulse using FIR-STREAK Filter in Speech Coding Method (음성부호화 방식에 있어서 FIR-STREAK 필터를 사용한 개별 피치펄스에 관한 연구)

  • Lee See-Woo
    • The Journal of the Korea Contents Association
    • /
    • v.4 no.4
    • /
    • pp.65-70
    • /
    • 2004
  • In this paper, I propose a new extraction method of Individual Pitch Pulse in order to accommodate the changes in each pitch interval and reduce pitch errors in Speech Coding. The extraction rate of individual pitch pulses was $96\%$ for male voice and $85\%$ for female voice respectively. This method has the capability of being applied to many fields, such as speech coding, speech analysis, speech synthesis and speech recognition.

  • PDF

A performance evaluation study of a deep learning-based voice synthesis technique using Mel-Conceptual Distortion (MCD). (멜-셉스트럴 왜곡(MCD)를 활용한 딥러닝 기반 목소리 합성 기술의 성능 평가 연구)

  • Jaesang Han;Yunseo Kang;Sangwoo Na;Hayeon Lee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.488-489
    • /
    • 2023
  • 노래 음성 변환(Singing Voice Conversion, SVC)은 오디오 처리 분야에서 최근 활발히 연구되는 분야 중 하나로, 원래의 멜로디와 가사를 유지하면서 소스 가수의 노래 음성을 대상 가수의 음성으로 변환하는 것을 목표로 한다. 본 논문에서는 딥러닝 기반 SVC 모델을 중심으로 멜 셉스트럴 왜곡 지표를 활용해 모델 간 성능 평가를 진행한다. 이를 통해 엔터테인먼트, 교육 등 분야에서 효율적인 SVC 모델을 찾아 활용할 수 있을 것이다.

Design and Implementation of Simple Text-to-Speech System using Phoneme Units (음소단위를 이용한 소규모 문자-음성 변환 시스템의 설계 및 구현)

  • Park, Ae-Hee;Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.3
    • /
    • pp.49-60
    • /
    • 1995
  • This paper is a study on the design and implementation of the Korean Text-to-Speech system which is used for a small and simple system. In this paper, a parameter synthesis method is chosen for speech syntheiss method, we use PARCOR(PARtial autoCORrelation) coefficient which is one of the LPC analysis. And we use phoneme for synthesis unit which is the basic unit for speech synthesis. We use PARCOR, pitch, amplitude as synthesis parameter of voice, we use residual signal, PARCOR coefficients as synthesis parameter of unvoice. In this paper, we could obtain the 60% intelligibility by using the residual signal as excitation signal of unvoiced sound. The result of synthesis experiment, synthesis of a word unit is available. The controlling of phoneme duration is necessary for synthesizing of a sentence unit. For setting up the synthesis system, PC 486, a 70[Hz]-4.5[KHz] band pass filter for speech input/output, amplifier, and TMS320C30 DSP board was used.

  • PDF

An ACLMS-MPC Coding Method Integrated with ACFBD-MPC and LMS-MPC at 8kbps bit rate. (8kbps 비트율을 갖는 ACFBD-MPC와 LMS-MPC를 통합한 ACLMS-MPC 부호화 방식)

  • Lee, See-woo
    • Journal of Internet Computing and Services
    • /
    • v.19 no.6
    • /
    • pp.1-7
    • /
    • 2018
  • This paper present an 8kbps ACLMS-MPC(Amplitude Compensation and Least Mean Square - Multi Pulse Coding) coding method integrated with ACFBD-MPC(Amplitude Compensation Frequency Band Division - Multi Pulse Coding) and LMS-MPC(Least Mean Square - Multi Pulse Coding) used V/UV/S(Voiced / Unvoiced / Silence) switching, compensation in a multi-pulses each pitch interval and Unvoiced approximate-synthesis by using specific frequency in order to reduce distortion of synthesis waveform. In integrating several methods, it is important to adjust the bit rate of voiced and unvoiced sound source to 8kbps while reducing the distortion of the speech waveform. In adjusting the bit rate of voiced and unvoiced sound source to 8 kbps, the speech waveform can be synthesized efficiently by restoring the individual pitch intervals using multi pulse in the representative interval. I was implemented that the ACLMS-MPC method and evaluate the SNR of APC-LMS in coding condition in 8kbps. As a result, SNR of ACLMS-MPC was 15.0dB for female voice and 14.3dB for male voice respectively. Therefore, I found that ACLMS-MPC was improved by 0.3dB~1.8dB for male voice and 0.3dB~1.6dB for female voice compared to existing MPC, ACFBD-MPC and LMS-MPC. These methods are expected to be applied to a method of speech coding using sound source in a low bit rate such as a cellular phone or internet phone. In the future, I will study the evaluation of the sound quality of 6.9kbps speech coding method that simultaneously compensation the amplitude and position of multi-pulse source.

VoiceXML Dialog System Based on RSS for Contents Syndication (콘텐츠 배급을 위한 RSS 기반의 VoiceXML 다이얼로그 시스템)

  • Kwon, Hyeong-Joon;Kim, Jung-Hyun;Lee, Hyon-Gu;Hong, Kwang-Seok
    • The KIPS Transactions:PartB
    • /
    • v.14B no.1 s.111
    • /
    • pp.51-58
    • /
    • 2007
  • This paper suggests prototype of dialog system combining VXML(VoiceXML) that is the W3C's standard XML format for specifying interactive voice dialogues between human and computer, and RSS(RDF Site Summary or Really Simple Syndication) that is representative technology of semantic web for syndication and subscription of updated web-contents. Merits of the proposed system are as following: 1) It is a new method that recognize spoken contents using ire and wireless telephone networks and then provide contents to user via STT(Speech-to-Text) and TTS(Text-to-Speech) instead of traditional method using web only. 2) It can apply advantage of RSS that subscription of updated contents is converted to VXML without modifying traditional method to provide RSS service, 3) In terms of users, it can reduce restriction on time-spate in search of contents that is provided by RSS because it uses ire and wireless telephone networks, not internet environment. 4) In terms of information provider, it does not need special component for syndication of the newest contents using speech recognition and synthesis technology. We implemented a news service system using VXML and RSS for performance evaluation of the proposed system. In experiment results, we estimated the response time and the speech recognition rate in subscription and search of actuality contents, and confirmed that the proposed system can provide contents those are provided using RSS Feed.

Gender Analysis in Elderly Speech Signal Processing (노인음성신호처리에서의 젠더 분석)

  • Lee, JiYeoun
    • Journal of Digital Convergence
    • /
    • v.16 no.10
    • /
    • pp.351-356
    • /
    • 2018
  • Changes in vocal cords due to aging can change the frequency of speech, and the speech signals of the elderly can be automatically distinguished from normal speech signals through various analyzes. The purpose of this study is to provide a tool that can be easily accessed by the elderly and disabled people who can be excluded from the rapidly changing technological society and to improve the voice recognition performance. In the study, the gender of the subjects was reported as sex analysis, and the number of female and male voice samples was used equally. In addition, the gender analysis was applied to set the voices of the elderly without using voices of all ages. Finally, we applied a review methodology of standards and reference models to reduce gender difference. 10 Korean women and 10 men aged 70 to 80 years old are used in this study. Comparing the F0 value extracted directly with the waveform and the F0 extracted with TF32 and the Wavesufer speech analysis program, Wavesufer analyzed the F0 of the elderly voice better than TF32. However, there is a need for a voice analysis program for elderly people. In conclusions, analyzing the voice of the elderly will improve speech recognition and synthesis capabilities of existing smart medical systems.

Real Time Implementation of a Korean Speech Synthesizer (한국어 음성합성기의 실시간 구현에 관한 연구)

  • 임광일;이규태;조철우;이우선;신인철;이태원
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.2
    • /
    • pp.176-181
    • /
    • 1988
  • In this paper, the LPC speech synthesizer with Multipulsse excitation is implemented using general-purpose DSP \ulcornerD7720. As the driving function for synthesis filter is used in the amplitude and position of pulse, the Voice/Unvoice decision and pitch period detectioncan be excluded. The synthesizer is implemented with DSP device which is operated on the interrupt mehtod with main computer and on the DMA mehtod with D/A converter. The comparision of synthetic and original waveform, alogn with the listening test, proves the validity of this system.

  • PDF

Sensor Control and Aquisition Information Using Voice I/O (음성 입출력을 이용한 센서 제어 및 정보 획득)

  • Youn, Hyung Jin;Lee, Chang Woo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2018.05a
    • /
    • pp.495-496
    • /
    • 2018
  • As more and more companies introduce artificial intelligent(AI) speakers, the price of the speakers has become a burden to someone. Based on some knowledge and dexterity, it is not difficult to make an AI speaker that acquires sensor information and environmental information of the house in accordance with your own taste. In this paper, we implement an AI speaker using Raspberry Pie, Google Cloud Speech (GCS) and Naver's Clova Speech Synthesis (CSS) API.

  • PDF