• Title/Summary/Keyword: text-to-speech system

Search Result 246, Processing Time 0.028 seconds

Synchronizationof Synthetic Facial Image Sequences and Synthetic Speech for Virtual Reality (가상현실을 위한 합성얼굴 동영상과 합성음성의 동기구현)

  • 최장석;이기영
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.7
    • /
    • pp.95-102
    • /
    • 1998
  • This paper proposes a synchronization method of synthetic facial iamge sequences and synthetic speech. The LP-PSOLA synthesizes the speech for each demi-syllable. We provide the 3,040 demi-syllables for unlimited synthesis of the Korean speech. For synthesis of the Facial image sequences, the paper defines the total 11 fundermental patterns for the lip shapes of the Korean consonants and vowels. The fundermental lip shapes allow us to pronounce all Korean sentences. Image synthesis method assigns the fundermental lip shapes to the key frames according to the initial, the middle and the final sound of each syllable in korean input text. The method interpolates the naturally changing lip shapes in inbetween frames. The number of the inbetween frames is estimated from the duration time of each syllable of the synthetic speech. The estimation accomplishes synchronization of the facial image sequences and speech. In speech synthesis, disk memory is required to store 3,040 demi-syllable. In synthesis of the facial image sequences, however, the disk memory is required to store only one image, because all frames are synthesized from the neutral face. Above method realizes synchronization of system which can real the Korean sentences with the synthetic speech and the synthetic facial iage sequences.

  • PDF

A Study on Text Mining Analysis of Presidential Maritime Concept in KOREA (텍스트마이닝을 이용한 한국 대통령의 해양관에 관한 연구)

  • Kim, Sung-Kuk;Lee, Tae-Hwee
    • Journal of Korea Port Economic Association
    • /
    • v.36 no.3
    • /
    • pp.39-54
    • /
    • 2020
  • In the presidential political system, the word of the president has great influence on the formation of national policy and the decision-making process. Policy priorities are determined according to the president's ideology and core values, and various policies are established and executed according to the priorities. Therefore, this paper analyzes the contents of the president's speech. Since the president's speech is a semantic datum, in order to analyze unstructured text, big data analysis is conducted through the methods of machine learning and deep learning. In this study, the president's speech at the "National Sea Day" commemoration was obtained 1996 onwards and analyzed using topic modeling. As a result of the analysis, all the presidents' speeches were delivered with a view of the ocean that was consistent with the direction of their administration. It was confirmed that the ocean-industry-resource topics, which are the intrinsic values of the ocean, were not damaged and consistently emphasized by all presidents.

Speech Synthesis for the Korean large Vocabulary Through the Waveform Analysis in Time Domains and Evauation of Synthesized Speech Quality (시간영역에서의 파형분석에 의한 무제한 어휘 합성 및 음절 유형별 규칙합성음 음질평가)

  • Kang, Chan-Hee;Chin, Yong-Ohk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.1
    • /
    • pp.71-83
    • /
    • 1994
  • This paper deals with the improvement of the synthesized speech quality and naturality in the Korean TTS(Text-to-Speech) system. We had extracted the parameters(table2) such as its amplitude, duration and pitch period in a syllable through the analysis of speech waveforms(table1) in the time domain and synthesized syllables using them. To the frequencies of the Korean pronunciation large vocabulary dictionary we had synthesized speeches selected 229 syllables such as V types are 19, CV types are 80. VC types are 30 and CVC types are 100. According to the 4 Korean syllable types from the data format dictionary(table3) we had tested each 15 syllables with the objective MOS(Mean Opinion Score) evaluation method about the 4 items i.e., intelligibility, clearness, loudness, and naturality after selecting random group without the knowledge of them. As the results of experiments the qualities of them are very clear and we can control the prosodic elements such as durations, accents and pitch periods (fig9, 10, 11, 12).

  • PDF

Improvement of Shop Music Broadcasting Services Using Music Lists and User Experience (방송목록과 사용자 경험 정보를 이용한 매장 음원 방송 서비스의 개선)

  • Kang, Sun-Mee;Kim, Hyun-Deuc;Chang, Moon-Soo
    • Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.121-130
    • /
    • 2008
  • This paper proposes the way of improvement and system build-up for shop music broadcasting services provided by the Internet. Comparing the shop music broadcasting services and personal music broadcasting services, we propose the way of shop music broadcasting services customers prefer to. That is, such a function is provided that a user can control the broadcasting music lists a specialist provides according to the current circumstance of shop. This paper proposes the whole system such a service is possible and verifies the efficiency by experiments.

  • PDF

Postprocessing of A Speech Recognition using the Morphological Anlaysis Technique (형태소 분석 기법을 이용한 음성 인식 후처리)

  • 박미성;김미진;김계성;김성규;이문희;최재혁;이상조
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.4
    • /
    • pp.65-77
    • /
    • 1999
  • There are two problems which will be processed to graft a continuous speech recognition results into natural language processing technique. First, the speaking's unit isn't consistent with text's spacing unit. Second, when it is to be pronounced the phonological alternation phenomena occur inside morphemes or among morphemes. In this paper, we implement the postprocessing system of a continuous speech recognition that above all, solve two problems using the eo-jeol generator and syllable recoveror and morphologically analyze the generated results and then correct the failed results through the corrector. Our system experiments with two kinds of speech corpus, i.e., a primary school text book and editorial corpus. The successful percentage of the former is 93.72%, that of the latter is 92.26%. As results of experiment, we verified that our system is stable regardless the sorts of corpus.

  • PDF

A Drowsiness Detection System using ChatGPT and Image Processing (ChatGPT와 영상처리를 이용한 졸음 감지 시스템)

  • Hyeon-Jun Lee;Hyeon-Sang Soon;Seong-Hun Jo;Chang-Hui Seo;Ji-Yun Kang;Se-Jin Oh
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2024.01a
    • /
    • pp.259-260
    • /
    • 2024
  • 졸음운전으로 인한 교통사고는 매년 꾸준하게 일어나 이에 대한 다방면의 해결책이 요구되고 있다. 본 논문에서는 위 문제를 개선하고자 ChatGPT와 영상처리를 이용한 졸음 감지 시스템을 구현하였다. 이 시스템은 운전자의 얼굴 부분을 영상처리로 인식하여 눈동자의 종횡비를 구해 PERCLOS 공식에 따른 운전자의 졸음을 판별시키고, 경고와 동시에 ChatGPT가 운전자에게 특정 주제를 키워드로 TTS와 STT를 통해 대화한다. 운전자의 졸음을 판별하기 위해 임베디드 보드에서 연결된 캠을 통해 졸음 판별을 하고, ChatGPT도 마찬가지로 보드에서 연결한 스피커, 마이크를 통해 운전자와 대화한다. 이를 활용하여 운전자의 졸음 자각을 통한 안전운전 및 사고 발생률의 감소를 기대할 수 있다.

  • PDF

The Smoothing Method of the Concatenation Parts in Speech Waveform by using the Forward/Backward LPC Technique (전, 후방향 LPC법에 의한 음성 파형분절의 연결부분 스므딩법)

  • 이미숙
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1991.06a
    • /
    • pp.15-20
    • /
    • 1991
  • In a text-to-speech system, sound units (e. q., phonemes, words, or phrases) can be concatenated together to produce required utterance. The quality of the resulting speech is dependent on factors including the phonological/prosodic contour, the quality of basic concatenation units, and how well the units join together. Thus although the quality of each basic sound unit is high, if occur the discontinuity in the concatenation part then the quality of synthesis speech is decrease. To solve this problem, a smoothing operation should be carried out in concatenation parts. But a major problem is that, as yet, no method of parameter smoothing is availalbe for joining the segment together.

  • PDF

An Implementation of Speaker Verification System Based on Continuants and Multilayer Perceptrons

  • Lee, Tae-Seung;Park, Sung-Won;Lim, Sang-Seok;Hwang, Byong-Won
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.216-219
    • /
    • 2003
  • Among the techniques to protect private information by adopting biometrics, speaker verification is expected to be widely used due to advantages in convenient usage and inexpensive implementation cost Speaker verification should achieve a high degree of the reliability in the verification nout the flexibility in speech text usage, and the efficiency in verification system complexity. Continuants have excellent speaker-discriminant power and the modest number of phonemes in the category, and multilayer perceptrons (MLPs) have superior recognition ability and fast operation speed. In consequence, the two provide viable ways for speaker verification system to obtain the above properties. This paper implements a system to which continuants and MLPs are applied, and evaluates the system using a Korean speech database. The results of the experiment prove that continuants and MLPs enable the system to acquire the three properties.

  • PDF

Effective Text Question Analysis for Goal-oriented Dialogue (목적 지향 대화를 위한 효율적 질의 의도 분석에 관한 연구)

  • Kim, Hakdong;Go, Myunghyun;Lim, Heonyeong;Lee, Yurim;Jee, Minkyu;Kim, Wonil
    • Journal of Broadcast Engineering
    • /
    • v.24 no.1
    • /
    • pp.48-57
    • /
    • 2019
  • The purpose of this study is to understand the intention of the inquirer from the single text type question in Goal-oriented dialogue. Goal-Oriented Dialogue system means a dialogue system that satisfies the user's specific needs via text or voice. The intention analysis process is a step of analysing the user's intention of inquiry prior to the answer generation, and has a great influence on the performance of the entire Goal-Oriented Dialogue system. The proposed model was used for a daily chemical products domain and Korean text data related to the domain was used. The analysis is divided into a speech-act which means independent on a specific field concept-sequence and which means depend on a specific field. We propose a classification method using the word embedding model and the CNN as a method for analyzing speech-act and concept-sequence. The semantic information of the word is abstracted through the word embedding model, and concept-sequence and speech-act classification are performed through the CNN based on the semantic information of the abstract word.

The Acoustic Analysis of Korean Read Speech - with respect to the prosodic phrasing - (한국어 낭독체 문장의 음향분석 -바람과 햇님의 운율구 생성을 중심으로-)

  • Sung Chuljae
    • Proceedings of the KSPS conference
    • /
    • 1996.02a
    • /
    • pp.157-172
    • /
    • 1996
  • This study aims to suggest some theoretical methodology for analysis of the prosodic patterns in Korean Read Speech. The engineering effort relevant to the phonetic study has focused to the importance of prosodic phrasing which may play a major role in analyzing the phonetic DB. Before establishing the prosodic phrase as the prosodic unit, we should describe the features of the boundary signal in a target sentence. With this in mind, the general characteristics of Read Speech and the ToBI(tones and Break Indices), which has been currently in vogue with respect to the prosodic labelling system were presented as the first step. The concrete analysis was carried out with the fable 'North Wind and the Sun' Korean version, where about 25 prosodic units were discriminated by perceptual approach for 5 subjects. Establishing various informations which can be used for deciding a boundary position systematically, we can proceed to the next, viz. acoustic analysis of prosodic unit. The most important which we primarily study for improving the naturalness of synthetic speech may be, at first, detecting the boundary signals in the speech file and accordingly reestablishment it within the raw text.

  • PDF