• Title/Summary/Keyword: text-to-speech system


Development of a 3D-Graphics Based Visualization Application for Reliability-Centered Maintenance (신뢰도 중심 유지보수 기법을 이용한 3차원 기반의 변전소 유지보수 시각화 프로그램 개발)

  • Jung, Hong-Suk;Park, Chang-Hyun;Jang, Gil-Soo
    • Proceedings of the KIEE Conference / 2007.11b / pp.288-290 / 2007
  • This paper presents a 3D-graphics visualization application for effective maintenance of power equipment. The maintenance algorithm implemented in the application is based on Condition-Based Maintenance (CBM) and Reliability-Centered Maintenance (RCM). The main frame of the application was built on the Windows Application Programming Interface (API) and Microsoft Foundation Classes (MFC). To make the 3D application interactive, the OpenGL-based WorldToolKit (WTK) library was used. Text-to-Speech (TTS) technology was also used to improve operator efficiency. The application helps power system operators intuitively recognize the present state and maintenance information of the equipment.


Ubiquitous Car Maintenance Services Using Augmented Reality and Context Awareness (증강현실을 활용한 상황인지기반의 편재형 자동차 정비 서비스)

  • Rhee, Gue-Won;Seo, Dong-Woo;Lee, Jae-Yeol
    • Korean Journal of Computational Design and Engineering / v.12 no.3 / pp.171-181 / 2007
  • Ubiquitous computing is a vision of our future computing lifestyle in which computer systems integrate seamlessly into our everyday lives, providing services and information in an anywhere, anytime fashion. Augmented reality (AR) can naturally complement ubiquitous computing by providing an intuitive and collaborative visualization and simulation interface to a three-dimensional information space embedded within physical reality. This paper presents a service framework and its applications for providing context-aware u-car maintenance services using augmented reality, which can support a rich set of ubiquitous services and collaboration. The framework realizes bi-augmentation between physical and virtual spaces using augmented reality. It also offers a context processing module to acquire, interpret, and disseminate context information. In particular, the context processing module considers the user's preferences and security profile in order to provide private, customer-oriented services. A prototype system has been implemented to support 3D animation, TTS (Text-to-Speech), an augmented manual, annotation, and pre- and post-augmentation services in ubiquitous car service environments.

A study on speech disentanglement framework based on adversarial learning for speaker recognition (화자 인식을 위한 적대학습 기반 음성 분리 프레임워크에 대한 연구)

  • Kwon, Yoohwan;Chung, Soo-Whan;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.447-453 / 2020
  • In this paper, we propose a system that extracts effective speaker representations from a speech signal using deep learning. Because a speech signal contains identity-unrelated information such as text content, emotion, and background noise, we train the system so that the extracted features represent only speaker-related information and exclude speaker-unrelated information. Specifically, we propose an auto-encoder based disentanglement method that outputs both speaker-related and speaker-unrelated embeddings using effective loss functions. To further improve reconstruction performance in the decoding process, we also introduce a discriminator of the kind commonly used in the Generative Adversarial Network (GAN) structure. Since improving the decoding capability helps preserve speaker information and supports disentanglement, it improves speaker verification performance. Experimental results demonstrate the effectiveness of the proposed method through an improved Equal Error Rate (EER) on the benchmark dataset VoxCeleb1.
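
The abstract above reports results as Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how EER can be approximated from verification scores. It assumes higher scores mean "same speaker"; the function name and the threshold sweep are illustrative and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from similarity scores (higher = more likely same speaker).

    genuine:  scores of trials where both utterances share a speaker.
    impostor: scores of trials where the speakers differ.
    """
    best_gap, eer = None, None
    # Sweep every observed score as a candidate decision threshold.
    for threshold in sorted(set(genuine) | set(impostor)):
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With a finite set of trials the two error curves rarely cross exactly, so the sketch returns the mean of the two rates at the closest threshold; evaluation toolkits often interpolate instead.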

"In the Beginning was the Deed": Sigmund Freud's Auditory Imagination

  • KIM, TaeChul
    • English & American cultural studies / v.9 no.1 / pp.113-139 / 2009
  • Such is the elective affinity between literary studies and psychoanalysis that the latter sometimes serves as a form of literary pedagogy. The affinity consists mainly in their shared concern for language. Signification in psychoanalysis works much as it does in literature. Many psychoanalytic terms and theoretical tenets bear witness to its dependence, clinically, on speech phenomena and, theoretically, on language in general. This is most true of Sigmund Freud, for whom the unconscious is in effect the linguistic unconscious. The Freudian unconscious, compressing and displacing through images and ideas, works as a text for psychoanalysis, an approach that has not only paved one of the ways to poststructuralist anti-essentialism but also strikes literary studies as uncannily familiar. Freudian psychoanalysis, starting empirically from clinical observations, discovers that words exist independently of meanings, in the form of things, in the unconscious system. Of the various sensory elements of a word-thing, in psychoanalytic terms, the auditory is central. With the auditory imagination cultivated in the clinic, Freud identifies compression and displacement as the chief unconscious works; my main argument is that they are based phonetically on heteronym and homonym associations respectively. Compression and displacement work as masks, which excites Freud's sense of challenge: his is a kind of poststructuralist approach, in the sense that the closed interrelatedness of words without external referents determines signification in a given situation. But the works of compression and displacement, viewed in auditory terms rather than mapped onto metaphor and metonymy, can provide a new insight for a literary reading of Freud. Pursuing Freud's auditory imagination is not only an attempt to read his writing as a literary text rather than for theoretical discussion, but also an experiment with the possibility of a literary reading of a theoretical text in the age of after-theory.

Issues in Chinese prosody: conceptual foundations of a linguistically-motivated text-to-speech system for Mandarin

  • Lavin, Richard S.
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.259-270 / 2002
  • I examine various controversial aspects of Chinese prosody (tone structure, syllable structure, stress, and intonation) and stress the need to view all of these as interacting systems, aspects of a hierarchical prosodic structure. I examine proposals at each level of the hierarchy and suggest which are most appropriate. Specifically, I suggest the adoption of Bao's version of syllable and tone, and Chen's account of stress. As for intonation, it is still not possible to make any definitive claims regarding an optimal model, but I examine work done by Kratochvil, Shih, and Gårding et al., and suggest promising directions for future work.


Speech Animation with Multilevel Control (다중 제어 레벨을 갖는 입모양 중심의 표정 생성)

  • Moon, Bo-Hee;Lee, Son-Ou;Wohn, Kwang-yun
    • Korean Journal of Cognitive Science / v.6 no.2 / pp.47-79 / 1995
  • Since the early days of computer graphics, facial animation has been applied to various fields, and it has recently found several novel applications such as virtual reality (for representing virtual agents), teleconferencing, and man-machine interfaces. When facial animation is applied to a system with multiple participants connected via a network, it is hard to animate facial expressions as desired in real time because of the amount of information that must be exchanged to maintain efficient communication. This paper's major contribution is to adapt Level-of-Detail to facial animation in order to solve this problem. Level-of-Detail has been studied in computer graphics as a way to represent the appearance of complicated objects efficiently and adaptively, but until now no attempt has been made to apply it to facial animation. In this paper, we present a systematic scheme that enables this kind of adaptive control using Level-of-Detail. The implemented system can generate speech-synchronized facial expressions from various types of user input such as text, voice, GUI, and head motion.


A System of Audio Data Analysis and Masking Personal Information Using Audio Partitioning and Artificial Intelligence API (오디오 데이터 내 개인 신상 정보 검출과 마스킹을 위한 인공지능 API의 활용 및 음성 분할 방법의 연구)

  • Kim, TaeYoung;Hong, Ji Won;Kim, Do Hee;Kim, Hyung-Jong
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.5 / pp.895-907 / 2020
  • With the growing influence of multimedia content beyond text-based content, services that help process the information in such content bring us great convenience. Representative features of these services are searching for and masking sensitive data. Solutions that provide searching and masking functions for text and images are not hard to find. However, even though the need for technology that searches and masks part of an audio stream is recognized, such solutions are hard to find because the technology is difficult. In this study, we propose a web application that provides searching and masking functions for audio data using an audio partitioning method. In pursuing this goal, we evaluated several speech-to-text conversion APIs to choose one suited to our purpose and developed regular expressions for searching for sensitive information. Lastly, we evaluated the accuracy of the developed searching and masking feature. The contribution of this work lies in the design and implementation of searching for and masking sensitive information in audio data, validated through a range of functional experiments.
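
The regular-expression search step described above can be sketched as follows. This is only an illustration applied to a transcript string: the two patterns, the label names, and the helper functions are hypothetical, since the abstract does not give the expressions the authors developed. Mapping the matched character spans back to audio timestamps would depend on the word timings returned by the chosen speech-to-text API and is not shown.

```python
import re

# Hypothetical patterns for illustration; the paper's actual expressions
# are not given in the abstract.
PATTERNS = {
    "phone": re.compile(r"\b01\d-\d{3,4}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def find_sensitive(transcript):
    """Return (label, start, end) character spans of matches, in text order."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(transcript):
            spans.append((label, match.start(), match.end()))
    return sorted(spans, key=lambda span: span[1])

def mask_text(transcript, spans, fill="*"):
    """Overwrite each matched span with the fill character, keeping length."""
    chars = list(transcript)
    for _, start, end in spans:
        chars[start:end] = fill * (end - start)
    return "".join(chars)
```

Keeping the masked text the same length as the original means the character offsets stay valid if several masking passes are applied in sequence.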

On a Study of the Improvement of Speaker Recognition with Characteristics of High Order Reflection Coefficients (고차 반사계수 특성을 이용한 화자인식의 성능 향상에 관한 연구)

  • 이윤주;오세영;함명규;배명진
    • Proceedings of the IEEK Conference / 1999.06a / pp.667-670 / 1999
  • In text-dependent speaker recognition, recognition performance degrades as the number of reference patterns increases. Reducing the number of reference patterns can therefore yield a higher recognition rate, because the recognizer discriminates better among fewer candidates. In this paper, to reduce the number of reference patterns, we use the high-order components of the reflection coefficients of the uttered speech signal to select the candidate reference patterns that are matched against the test pattern. As a result, the total recognition rate of the proposed method is about 2% higher than that of the conventional method.


An Emotion Scanning System on Text Documents (텍스트 문서 기반의 감성 인식 시스템)

  • Kim, Myung-Kyu;Kim, Jung-Ho;Cha, Myung-Hoon;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.12 no.4 / pp.433-442 / 2009
  • People increasingly buy products through the Internet rather than in stores. After a purchase, some consumers post feedback online in the form of reviews, replies, comments, and blog posts. People also tend to look for information on the Internet. Because many consumers are interested in others' opinions when they are about to make a purchase, companies and public institutions need to collect and analyze reviews and public opinion. However, the reviews on web sites are numerous, short, and redundant. Under these circumstances, emotion scanning of text documents on the web is gaining attention. For extracting writers' opinions or subjective ideas from text, labeled word resources such as GI (General Inquirer) and LKB (Lexical Knowledge Base of near-synonym differences) exist for English, but no such resource is yet available for Korean. In this paper, we labeled words of four parts of speech (POS) in a Korean dictionary (nouns, adjectives, verbs, and adverbs) as positive, negative, or neutral. We extracted and learned the construction patterns of emotional words and the relationships among words in sentences from a large training set. Based on this knowledge, comments and reviews about products are classified into two polarity classes, positive and negative, using SO-PMI, with the optimal condition found from combinations of the four POS. Lastly, the system provides a flexible user interface for adding or editing the emotional words, the emotion-related construction patterns, and the relationships among the words.
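
SO-PMI (semantic orientation from pointwise mutual information), named in the abstract above, is commonly defined as the sum of a word's PMI with positive seed words minus the sum of its PMI with negative seed words. The sketch below follows that common definition using document-level co-occurrence counts. The smoothing constant `eps`, the seed lists, and the function name are assumptions for illustration; the abstract does not specify the variant the authors used.

```python
import math

def so_pmi(word, docs, pos_seeds, neg_seeds, eps=0.01):
    """Semantic orientation of `word` from document-level co-occurrence.

    docs: list of token sets, one per document.
    eps:  assumed additive smoothing constant that avoids log(0).
    A positive result suggests positive polarity, a negative one negative.
    """
    n = len(docs)

    def count(*terms):
        # Number of documents that contain every given term.
        return sum(all(t in d for t in terms) for d in docs)

    def pmi(a, b):
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), with smoothing.
        joint = (count(a, b) + eps) / n
        p_a = (count(a) + eps) / n
        p_b = (count(b) + eps) / n
        return math.log2(joint / (p_a * p_b))

    positive = sum(pmi(word, seed) for seed in pos_seeds)
    negative = sum(pmi(word, seed) for seed in neg_seeds)
    return positive - negative
```

A review could then be classified by summing the scores of its words and checking the sign, although the paper's actual decision rule is not stated in the abstract.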


Automatic Error Correction System for Erroneous SMS Strings (SMS 변형된 문자열의 자동 오류 교정 시스템)

  • Kang, Seung-Shik;Chang, Du-Seong
    • Journal of KIISE:Software and Applications / v.35 no.6 / pp.386-391 / 2008
  • Colloquial word errors that violate grammatical or writing rules occur frequently in communication environments such as mobile phones and messengers. These unexpected errors cause problems for language processing systems in applications such as speech recognition and text-to-speech conversion. In this paper, we propose and implement an automatic correction system for ill-formed words and word spacing errors in SMS sentences, which have been the major causes of poor accuracy. We experimented with three methods of constructing the word correction dictionary and evaluated their results: (1) manual construction of error words from the vocabulary list of ill-formed communication language, (2) automatic construction of the error dictionary from a manually constructed corpus, and (3) a context-dependent method of automatically constructing the error dictionary.