• Title/Summary/Keyword: Voice Synthesis

Communication Support System for ALS Patient Based on Text Input Interface Using Eye Tracking and Deep Learning Based Sound Synthesis

  • Park Hyunjoo;Jeong Seungdo
    • Journal of Korea Society of Digital Industry and Information Management / v.20 no.2 / pp.27-36 / 2024
  • Accidents or disease can lead to acquired dysphonia. To help such patients communicate, we propose a new input interface based on eye movements. Unlike existing methods, which present the English alphabet as-is, we reorganized the alphabet layout to support the Korean alphabet and designed the interface so that patients can enter words by themselves using only eye movements, gaze, and blinking. The proposed interface reduces fatigue by minimizing eye movements and allows easy, quick input through an intuitive arrangement. For natural communication, we also implemented a system that lets patients who are unable to speak communicate in their own voice: it tracks eye movements to record what the patient is trying to say, then uses Glow-TTS and Multi-band MelGAN, trained on the patient's own voice, to output speech in that voice.
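The abstract does not say how jamo picked with the eyes become Korean text. Assuming the reorganized layout lets the patient pick an initial consonant, a vowel, and an optional final consonant, those picks could be joined into one precomposed Hangul syllable with the standard Unicode composition formula. The sketch below is hypothetical (the function `compose` and the selection order are not from the paper) and covers only this text-assembly step, not gaze tracking or Glow-TTS/Multi-band MelGAN synthesis.

```python
# Compatibility jamo, ordered as in the Unicode Hangul Syllables block.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 medial vowels
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
             "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
             "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]    # "no final" + 27 finals
assert (len(CHOSEONG), len(JUNGSEONG), len(JONGSEONG)) == (19, 21, 28)

SYLLABLE_BASE = 0xAC00  # U+AC00, the syllable '가'


def compose(initial: str, medial: str, final: str = "") -> str:
    """Join jamo chosen on a (hypothetical) gaze keyboard into one Hangul syllable."""
    l_idx = CHOSEONG.index(initial)
    v_idx = JUNGSEONG.index(medial)
    t_idx = JONGSEONG.index(final)
    code = SYLLABLE_BASE + (l_idx * len(JUNGSEONG) + v_idx) * len(JONGSEONG) + t_idx
    return chr(code)


# Jamo sequence a patient might select: ㅎ ㅏ ㄴ, then ㄱ ㅡ ㄹ
word = compose("ㅎ", "ㅏ", "ㄴ") + compose("ㄱ", "ㅡ", "ㄹ")
print(word)  # 한글
```

Because the syllable code is pure arithmetic over three indices, an interface built this way only needs to track which of the three slots the patient is filling; no dictionary lookup is required.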

Noise-Robust Speech Detection Using The Coefficient of Variation of Spectrum

  • Kim Youngmin;Hahn Minsoo
    • MALSORI / no.48 / pp.107-116 / 2003
  • This paper proposes a new parameter for voice detection, a task used in many areas of speech engineering such as speech synthesis, speech recognition, and speech coding. The coefficient of variation (CV) of the speech spectrum is used, together with other feature parameters, to detect speech. The CV is calculated only over a specific range of the speech spectrum. Average magnitude and spectral magnitude are also employed to improve detector performance. In experiments, the proposed voice detector outperformed the conventional energy-based detector in terms of error measures.
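To make the CV parameter concrete: the coefficient of variation of a set of spectral magnitudes is their standard deviation divided by their mean, so a harmonic spectrum with a few strong peaks scores high and a flat noise spectrum scores low. The sketch below computes it over an assumed 300 to 3400 Hz band and applies a hypothetical fixed threshold; the paper's actual band, threshold, and its combination with average and spectral magnitude are not given in the abstract.

```python
import numpy as np


def spectral_cv(frame, fs, f_lo=300.0, f_hi=3400.0):
    """Coefficient of variation (std / mean) of the magnitude spectrum in [f_lo, f_hi] Hz."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = mag[(freqs >= f_lo) & (freqs <= f_hi)]   # band limits are assumptions
    mean = band.mean()
    return float(band.std() / mean) if mean > 0 else 0.0


def is_speech(frame, fs, cv_threshold=1.0):
    """Hypothetical decision rule: peaky (harmonic) spectra give a high CV, flat noise a low one."""
    return spectral_cv(frame, fs) > cv_threshold


fs = 8000
t = np.arange(256) / fs
tone = np.sin(2 * np.pi * 1000 * t)                               # stand-in for a harmonic frame
noise = np.random.default_rng(0).normal(scale=0.5, size=256)      # stand-in for a noise-only frame
print(spectral_cv(tone, fs), spectral_cv(noise, fs))
```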

Face-to-face Communication in Cyberspace using Analysis and Synthesis of Facial Expression

  • Shigeo Morishima
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 1999.06a / pp.111-118 / 1999
  • Interactive virtual reality techniques now let computers create cyberspaces that users can walk through, and an avatar in cyberspace can provide a virtual face-to-face communication environment. In this paper, we realize an avatar with a real face in cyberspace and construct a multiuser communication system driven by voice transmitted over a network. Voice from a microphone is transmitted and analyzed, and the avatar's mouth shape and facial expression are then estimated and synthesized synchronously in real time. We also introduce an entertainment application of a real-time voice-driven synthetic face as an example of an interactive movie. Finally, we introduce a face motion capture system that uses a physics-based face model.
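The abstract does not describe how mouth shape is estimated from the voice. As a deliberately simple stand-in (not Morishima's method), a mouth-opening parameter can be driven by frame loudness: louder frames open the jaw wider, and light smoothing keeps the motion from jittering. The frame length, the dB range, and the smoothing factor below are assumptions.

```python
import numpy as np


def mouth_openness(signal, fs, frame_ms=20.0, floor_db=-50.0, ceil_db=-10.0, smoothing=0.5):
    """Map per-frame loudness to a mouth-opening parameter in [0, 1].

    floor_db / ceil_db: assumed RMS levels (dBFS) mapped to a closed / fully open mouth.
    smoothing: one-pole smoothing factor that damps frame-to-frame lip jitter.
    """
    signal = np.asarray(signal, dtype=float)
    hop = int(fs * frame_ms / 1000)
    openness, state = [], 0.0
    for start in range(0, len(signal) - hop + 1, hop):
        frame = signal[start:start + hop]
        level_db = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12)
        target = float(np.clip((level_db - floor_db) / (ceil_db - floor_db), 0.0, 1.0))
        state = smoothing * state + (1 - smoothing) * target
        openness.append(state)
    return np.array(openness)


fs = 16000
silence = np.zeros(fs // 5)                                        # 200 ms of silence
vowel = 0.5 * np.sin(2 * np.pi * 220 * np.arange(fs // 5) / fs)    # 200 ms loud "vowel"
o = mouth_openness(np.concatenate([silence, vowel]), fs)
print(o.round(2))
```

A real voice-driven face would map spectral features to distinct mouth shapes (for example, per vowel); loudness alone only controls how far the mouth opens.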

An Implementation of Speech DB Gathering System Using VoiceXML

  • Kim Dong-Hyun;Roh Yong-Wan;Hong Kwang-Seok
    • Journal of Internet Computing and Services / v.6 no.1 / pp.39-50 / 2005
  • A speech DB is a basic requirement for research in phonetics, speech recognition, speech synthesis, and related fields, and the quantity and quality of the speech DB determine the efficiency of the systems we develop. Speech DBs are therefore extremely important. With the recent development of telephone services such as voice portals, there is a growing need to collect telephone speech DBs. Existing IVR-based telephone speech DB collection systems were written in C/C++ or with dedicated development tools, which makes it difficult to reuse resources across application services and requires much labor and time. VoiceXML, by contrast, is a tag-based language built on XML with a simple, easy grammar, so applications can be written with little effort, reducing labor and time. VoiceXML also makes it easy to gather varied telephone speech DBs, because the DB contents can be changed readily. In this paper, we introduce a telephone speech DB gathering system, which is the most important factor in developing speech information processing techniques.
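For reference, a recording dialog of the kind such a collection system would serve can be written with the standard VoiceXML `<record>` and `<submit>` elements. The sketch below generates one page per prompt sentence from Python, which is one way to change DB contents without rewriting the dialog; the submit URL, the form id, and the timing values are hypothetical, and the paper's own pages are not shown in the abstract.

```python
import xml.etree.ElementTree as ET


def build_recording_page(sentence: str, submit_url: str, max_time: str = "5s") -> str:
    """Build a VoiceXML page that prompts the caller, records one utterance,
    and posts it to a collection server (submit_url is hypothetical)."""
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": "collect"})
    record = ET.SubElement(form, "record", {
        "name": "utterance",          # form item variable holding the recorded audio
        "beep": "true",
        "maxtime": max_time,
        "finalsilence": "1500ms",     # assumed end-of-speech silence
        "dtmfterm": "true",           # a key press also ends the recording
        "type": "audio/x-wav",
    })
    prompt = ET.SubElement(record, "prompt")
    prompt.text = f"After the beep, please read: {sentence}"
    filled = ET.SubElement(record, "filled")
    ET.SubElement(filled, "submit", {
        "next": submit_url,
        "method": "post",
        "enctype": "multipart/form-data",   # required to upload audio data
        "namelist": "utterance",
    })
    return ET.tostring(vxml, encoding="unicode")


page = build_recording_page("Please call me back tomorrow.", "http://example.com/save_utterance")
print(page)
```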

A Study on Extraction of Pitch and TSIUVC in Continuous Speech

  • Lee See-Woo
    • Journal of Internet Computing and Services / v.6 no.4 / pp.85-92 / 2005
  • In this paper, I propose a new method for extracting pitch pulses and TSIUVC in continuous speech. The TSIUVC search and extraction method is based on the zero-crossing rate, and individual pitch pulses are extracted using an FIR-STREAK filter. As a result, the extraction rate of individual pitch pulses was 96% for male voices and 85% for female voices. The TSIUVC extraction rates were 94.9% under 88% for male voices and 94.9% under 84.8% for female voices. This method can be applied to a new Voiced/Silence/TSIUVC speech coding scheme, as well as to speech analysis and speech synthesis.
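The zero-crossing rate on which the TSIUVC search is based can be computed per frame as the fraction of adjacent sample pairs whose signs differ; noise-like unvoiced consonants cross zero far more often than low-pitched voiced sounds. A minimal sketch, with assumed frame and hop sizes (16 ms and 8 ms at 10 kHz); the decision thresholds and the FIR-STREAK pitch-pulse filter are not reproduced.

```python
import numpy as np


def frame_zcr(signal, frame_len=160, hop=80):
    """Zero-crossing rate of each frame: fraction of adjacent sample pairs whose sign differs."""
    signal = np.asarray(signal, dtype=float)
    rates = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        signs = np.signbit(signal[start:start + frame_len])
        rates.append(np.count_nonzero(signs[1:] != signs[:-1]) / (frame_len - 1))
    return np.array(rates)


fs = 10000
t = np.arange(fs // 10) / fs
voiced_like = np.sin(2 * np.pi * 150 * t)                       # low-frequency, periodic
unvoiced_like = np.random.default_rng(1).normal(size=len(t))    # noise-like, as in fricatives
print(frame_zcr(voiced_like).mean(), frame_zcr(unvoiced_like).mean())
```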

A Study on Multi-Pulse Speech Coding Method by Using V/S/TSIUVC

  • Lee See-Woo
    • Journal of Korea Multimedia Society / v.7 no.9 / pp.1233-1239 / 2004
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants coexist in a frame. This paper presents a new multi-pulse coding method that uses V/S/TSIUVC switching, individual pitch pulses, and a TSIUVC approximation-synthesis method to limit this distortion of speech quality. The TSIUVC is extracted using the zero-crossing rate and individual pitch pulses. The TSIUVC extraction rate was 91% for female voices and 96.2% for male voices. Notably, frequency information below 0.347 kHz and above 2.813 kHz can be synthesized as a high-quality waveform within the TSIUVC. I evaluated MPC using V/UV and FBD-MPC using V/S/TSIUVC. The synthesized speech of FBD-MPC had better speech quality than that of MPC.
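The MPC baseline in this comparison chooses excitation pulses by analysis-by-synthesis. The sketch below is the classic greedy multi-pulse search (in the style of Atal and Remde's multi-pulse LPC), not the author's FBD-MPC: each step places one pulse where the filtered pulse best matches the remaining error, then subtracts its contribution. The filter, frame length, and pulse count are illustrative assumptions.

```python
import numpy as np


def multipulse_search(target, h, n_pulses):
    """Greedy analysis-by-synthesis search for multi-pulse excitation.

    target: speech frame to match; h: impulse response of the synthesis filter.
    """
    target = np.asarray(target, dtype=float)
    n = len(target)
    # Column p holds the impulse response delayed by p samples, truncated to the frame.
    H = np.zeros((n, n))
    for p in range(n):
        m = min(len(h), n - p)
        H[p:p + m, p] = h[:m]
    energy = np.sum(H ** 2, axis=0)      # assumes h[0] != 0, so no column is all-zero
    residual = target.copy()
    positions, amplitudes = [], []
    for _ in range(n_pulses):
        corr = H.T @ residual
        p = int(np.argmax(corr ** 2 / energy))   # largest reduction in squared error
        a = corr[p] / energy[p]                  # least-squares amplitude at that position
        residual -= a * H[:, p]
        positions.append(p)
        amplitudes.append(a)
    return positions, amplitudes, residual


h = np.array([1.0, 0.5, 0.25])
target = np.zeros(64)
target[10:13] += 2.0 * h
target[30:33] -= 1.0 * h
pos, amp, res = multipulse_search(target, h, n_pulses=2)
print(pos)  # [10, 30]
```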

A Study on ACFBD-MPC in 8kbps

  • Lee, See-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.7 / pp.49-53 / 2016
  • Recently, the use of signal compression methods to improve the efficiency of wireless networks has increased. In particular, MPC systems have used pitch extraction and voiced/unvoiced excitation sources to reduce the bit rate. In general, an MPC system that uses voiced and unvoiced excitation sources distorts the synthesized speech waveform when voiced and unvoiced consonants coexist in a frame. This distortion is caused by normalization of the synthesized waveform while restoring the multi-pulses of the representative segment. This paper presents ACFBD-MPC (Amplitude Compensation Frequency Band Division-Multi Pulse Coding), which applies amplitude compensation to the multi-pulses in each pitch interval and in specific frequency bands to reduce the distortion of the synthesized speech waveform. The experiments were performed with 16 sentences spoken by male and female speakers, digitized at 10 kHz with 12-bit resolution. The ACFBD-MPC system was implemented, and its SNR was estimated at a coding rate of 8 kbps. The SNR of ACFBD-MPC was 13.6 dB for the female voice and 14.2 dB for the male voice, an improvement of 0.9 dB and 1 dB, respectively, over the traditional MPC. This method is expected to be applicable to cellular telephones and smartphones that use low-bit-rate excitation sources.
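The reported figures are waveform SNRs, conventionally `10 * log10(sum(x^2) / sum((x - x_hat)^2))`. A minimal sketch, assuming plain (not segmental) SNR, since the abstract does not say which variant was used; the last lines show the bit-rate arithmetic implied by the stated 10 kHz, 12-bit input and the 8 kbps target.

```python
import numpy as np


def snr_db(original, decoded):
    """Waveform SNR in dB: 10 * log10(signal energy / coding-error energy)."""
    original = np.asarray(original, dtype=float)
    error = original - np.asarray(decoded, dtype=float)
    return 10 * np.log10(np.sum(original ** 2) / np.sum(error ** 2))


x = np.array([1.0, -1.0, 1.0, -1.0])
x_hat = 0.9 * x                      # decoded frame with a uniform 10% amplitude error
print(round(snr_db(x, x_hat), 2))    # 20.0

pcm_bps = 10_000 * 12                # 10 kHz x 12 bits = 120 kbps of raw PCM
print(pcm_bps / 8_000)               # 15.0, the compression ratio needed at 8 kbps
```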

Virtual Reality based Situation Immersive English Dialogue Learning System

  • Kim, Jin-Won;Park, Seung-Jin;Min, Ga-Young;Lee, Keon-Myung
    • Journal of Convergence for Information Technology / v.7 no.6 / pp.245-251 / 2017
  • This paper presents an English conversation training system in which learners practice their English conversation skills by speaking with native-speaker characters in a virtual reality environment. The proposed system lets learners talk with multiple native-speaker characters in various scenarios. It recognizes the learners' speech and generates the characters' voices by speech synthesis. Voice interaction with the characters immerses learners in the conversation situations, and a scoring system that evaluates the learner's pronunciation provides positive feedback that keeps learners engaged in the learning context.
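The abstract does not specify how pronunciation is scored. One simple proxy, assumed here and not taken from the paper, is to compare the speech recognizer's transcript with the target sentence using word-level edit distance and convert the error count to a 0 to 100 score. The function names and the scoring formula are hypothetical.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two word sequences (substitutions, insertions, deletions)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def utterance_score(target_sentence: str, recognized_text: str) -> float:
    """Hypothetical 0-100 score: share of the target sentence the recognizer reproduced."""
    n = len(target_sentence.split())
    if n == 0:
        return 0.0
    errors = word_edit_distance(target_sentence, recognized_text)
    return max(0.0, 100.0 * (1 - errors / n))


print(round(utterance_score("Could I see the menu please", "could I see a menu please"), 1))  # 83.3
```

Recognition accuracy is only an indirect measure of pronunciation; a dedicated scorer would compare phone-level acoustics against a native reference.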

Formation of A Phonetic-Value Look-up Table for Korean Voice Synthesis

  • Lee, Gye-Young;Yim, Jae-Geol
    • Journal of the Institute of Electronics Engineers of Korea CI / v.38 no.5 / pp.44-57 / 2001
  • To synthesize grammatically correct Korean speech, we must refer to the 'Standard Pronunciation Rules (SPR)' stated in the 'Standard Grammar of Korean Language.' Therefore, the rules that a Korean voice synthesis system uses to find the pronunciation corresponding to a given Korean sentence must completely reflect the SPR and must be sound. In computer science, however, the SPR has been used without proving the completeness and soundness of the derived rules. In this paper, we construct a Petri net model for each rule of the SPR, integrate all the Petri net models into one large Petri net that completely represents the SPR, and analyze this Petri net to prove its consistency. We then convert the Petri net model into a look-up table of Korean phonetic values. Using this table avoids the drawbacks of existing approaches, such as passing through several stages or applying a conversion process repeatedly.
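To illustrate what a phonetic-value look-up table buys over staged rule application, the sketch below decomposes Hangul syllables with Unicode arithmetic, looks up each (final consonant, next initial consonant) pair in a table, and recomposes the result in a single pass. The table holds only a few hand-picked nasalization and liaison entries; it is an illustrative assumption, not the complete table the authors derive from their Petri net, and it assumes the input contains only precomposed Hangul syllables.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                  # 19 initials
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"              # 21 vowels
JONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
        "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
BASE = 0xAC00

# (final of syllable i, initial of syllable i+1) -> replacement pair (hypothetical subset).
TABLE = {
    ("ㄱ", "ㄴ"): ("ㅇ", "ㄴ"), ("ㄱ", "ㅁ"): ("ㅇ", "ㅁ"),   # nasalization, e.g. 국물 -> [궁물]
    ("ㄷ", "ㄴ"): ("ㄴ", "ㄴ"), ("ㄷ", "ㅁ"): ("ㄴ", "ㅁ"),
    ("ㅂ", "ㄴ"): ("ㅁ", "ㄴ"), ("ㅂ", "ㅁ"): ("ㅁ", "ㅁ"),
    ("ㄱ", "ㅇ"): ("", "ㄱ"), ("ㄴ", "ㅇ"): ("", "ㄴ"),       # liaison, e.g. 음악 -> [으막]
    ("ㄹ", "ㅇ"): ("", "ㄹ"), ("ㅁ", "ㅇ"): ("", "ㅁ"), ("ㅂ", "ㅇ"): ("", "ㅂ"),
}


def decompose(ch: str) -> list:
    """Split a precomposed syllable into [initial, vowel, final]."""
    code = ord(ch) - BASE
    return [CHO[code // 588], JUNG[(code % 588) // 28], JONG[code % 28]]


def compose(initial: str, vowel: str, final: str) -> str:
    return chr(BASE + (CHO.index(initial) * 21 + JUNG.index(vowel)) * 28 + JONG.index(final))


def to_pronunciation(word: str) -> str:
    """One left-to-right pass of table lookups over adjacent syllable boundaries."""
    syllables = [decompose(ch) for ch in word]
    for i in range(len(syllables) - 1):
        key = (syllables[i][2], syllables[i + 1][0])
        if key in TABLE:
            syllables[i][2], syllables[i + 1][0] = TABLE[key]
    return "".join(compose(*s) for s in syllables)


print(to_pronunciation("국민"), to_pronunciation("음악"))  # 궁민 으막
```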

A study on Speech Coding Method using V/S/TSIUVC Switching

  • Lee, See-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.6 / pp.1180-1184 / 2006
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when voiced and unvoiced consonants occur in the same frame. In this paper, I propose a new multi-pulse coding method that uses V/S/TSIUVC switching and a TSIUVC approximation-synthesis method to limit this distortion. The TSIUVC is extracted using the zero-crossing rate and individual pitch pulses. The TSIUVC extraction rate was 91% for female voices and 96.2% for male voices. Notably, frequency information below 0.547 kHz and above 2.813 kHz can be synthesized as a high-quality waveform within the TSIUVC. I evaluated MPC using V/UV and FBD-MPC using V/S/TSIUVC. The synthesized speech of FBD-MPC had better speech quality than that of MPC.
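The V/S/TSIUVC switching decision can be sketched from frame energy and zero-crossing rate: quiet frames are silence, noise-like frames with many zero crossings go to the TSIUVC path, and the rest are treated as voiced. The thresholds are illustrative assumptions, and treating every high-ZCR frame as TSIUVC is a simplification; the abstract states that extraction also uses individual pitch pulses.

```python
import numpy as np


def classify_frame(frame, energy_thr=1e-3, zcr_thr=0.3):
    """Label a frame 'S' (silence), 'TSIUVC', or 'V' (voiced) from energy and zero-crossing rate.

    energy_thr and zcr_thr are assumed values, not taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    if np.mean(frame ** 2) < energy_thr:
        return "S"                                   # too quiet to code: silence
    signs = np.signbit(frame)
    zcr = np.count_nonzero(signs[1:] != signs[:-1]) / (len(frame) - 1)
    return "TSIUVC" if zcr > zcr_thr else "V"       # noise-like frames vs periodic ones


fs, n = 10000, 200
t = np.arange(n) / fs
frames = {
    "silence": np.zeros(n),
    "vowel": 0.5 * np.sin(2 * np.pi * 200 * t),
    "fricative": np.random.default_rng(2).normal(scale=0.3, size=n),
}
for name, frame in frames.items():
    print(name, classify_frame(frame))
```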
