• Title/Summary/Keyword: Speech-to-Text

Speech Animation with Multilevel Control (다중 제어 레벨을 갖는 입모양 중심의 표정 생성)

  • Moon, Bo-Hee; Lee, Son-Ou; Wohn, Kwang-yun
    • Korean Journal of Cognitive Science / v.6 no.2 / pp.47-79 / 1995
  • Since the early days of computer graphics, facial animation has been applied to various fields, and it has recently found novel applications such as virtual reality (for representing virtual agents), teleconferencing, and man-machine interfaces. When facial animation is applied to a system with multiple participants connected over a network, it is hard to animate facial expressions as desired in real time because of the amount of information that must be exchanged to maintain efficient communication. This paper's major contribution is to adapt 'Level-of-Detail' to facial animation in order to solve this problem. Level-of-Detail has been studied in computer graphics as a way to represent the appearance of complicated objects efficiently and adaptively, but until now no attempt has been made to apply it to facial animation. In this paper, we present a systematic scheme that enables this kind of adaptive control using Level-of-Detail. The implemented system can generate speech-synchronized facial expressions from various types of user input such as text, voice, GUI, and head motion.

The Design for Self-care System Based on RFID (RFID를 이용한 Self-care System 설계)

  • Xiao, Huang; Zhou, Kun-Peng; Jin, Woo-Jeong; Cho, Yong-Soon; Jung, Hoe-Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.879-881 / 2010
  • With the rapid development of society, small families and one-person households are becoming more common. As the traditional family changes, more and more older people stay at home alone, and their health and safety deserve attention. With the rapid development of RFID (Radio Frequency Identification) technology, its applications have extended to all areas of our lives, and RFID has become a major topic of concern across many industries. With high-speed economic growth and the development of science and medicine, the life expectancy of older people is increasing slightly, so it is necessary to design a protective system for their safety. In this thesis, a self-care system is built using RFID technology to authenticate a user, TTS (text-to-speech) to convert character information to voice information, infrared radiation technology to protect the home effectively, and electronic blood pressure monitors to examine the health of older people.

Proposal of speaker change detection system considering speaker overlap (화자 겹침을 고려한 화자 전환 검출 시스템 제안)

  • Park, Jisu; Yun, Young-Sun; Cha, Shin; Park, Jeon Gue
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.466-472 / 2021
  • Speaker Change Detection (SCD) refers to finding the moment when the main speaker changes from one person to the next in a spoken conversation. In speaker change detection, difficulties arise from overlapping speakers, inaccurate labeling, and data imbalance. To address these problems, utterances from the TIMIT corpus, which is widely used in speech recognition, were concatenated artificially to obtain a sufficient amount of training data, and speaker change detection was performed after identifying overlapping speakers. In this paper, we propose a speaker change detection system that takes speaker overlap into account. We evaluated and verified the performance using various approaches. As a result, a detection system similar to the X-Vector structure was proposed to remove the speaker overlap regions, while the Bi-LSTM method was selected to model the speaker change system. The experimental results show relative performance improvements of 4.6% and 13.8%, respectively, compared to the baseline system. Additionally, we determined that a robust speaker change detection system can be built by conducting related studies based on the experimental results, taking text and speaker information into consideration.
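
As a rough illustration of how artificially concatenated training data might be labeled, the sketch below joins single-speaker segments and marks overlap frames and change points. The segment format `(speaker_id, n_frames)`, the fixed `overlap` length, and the function name `build_labels` are assumptions made for illustration; the abstract does not describe the authors' actual procedure.

```python
def build_labels(segments, overlap=0):
    """Concatenate single-speaker segments and derive frame-level labels.

    segments: list of (speaker_id, n_frames) tuples (hypothetical format).
    overlap:  number of frames by which a new speaker starts before the
              previous speaker has finished (assumed constant here).
    Returns (overlap_labels, change_frames).
    """
    overlap_labels = []   # 1 where two speakers talk at once, else 0
    change_frames = []    # frame index at which a new speaker starts
    prev_spk = None
    for spk, n_frames in segments:
        if prev_spk is not None and spk != prev_spk:
            # The new speaker begins k frames before the current end.
            k = min(overlap, n_frames, len(overlap_labels))
            start = len(overlap_labels) - k
            change_frames.append(start)
            for t in range(start, len(overlap_labels)):
                overlap_labels[t] = 1
            overlap_labels.extend([0] * (n_frames - k))
        else:
            # Same speaker (or first segment): append back to back.
            overlap_labels.extend([0] * n_frames)
        prev_spk = spk
    return overlap_labels, change_frames


labels, changes = build_labels([("A", 5), ("B", 4), ("B", 3), ("C", 2)], overlap=2)
print(labels)   # frame-level overlap labels
print(changes)  # frame indices of speaker changes
```

In a two-stage setup like the one the abstract outlines, frames labeled as overlap could be dropped before the change detector is trained or evaluated.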

Development of Intelligent Messenger for Affective Interaction of Content Robot (콘텐츠 로봇의 감성적 반응을 위한 지능형 메신저 개발)

  • Park, Bum-Jun; So, Su-Hwan; Park, Tae-Keun
    • The Journal of the Korea Contents Association / v.10 no.9 / pp.9-17 / 2010
  • Much recent research has been conducted on robots and interactive characters that respond appropriately to the user's affect. In this paper, we develop an intelligent messenger that provides appropriate responses to text inputs according to the user's intention and affect. To respond properly, the intelligent messenger adopts methods for recognizing the user's speech act and affect, and it uses an AIML-based interactive script to which tags are additionally attached to express affect and speech act. If the intelligent messenger finds a proper reply in the interactive scripts, it displays the reply in a dialog window, and an animated character expresses an emotion attuned to the user's affect. If the animated character is synchronized with a content robot through a wireless link, the robot, located in the same space as the user, can provide an emotional response.

Speaker Verification System Using Continuants and Multilayer Perceptrons (지속음 및 다층신경망을 이용한 화자증명 시스템)

  • Lee, Tae-Seung; Park, Sung-Won; Hwang, Byong-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.1015-1020 / 2003
  • Among the techniques for protecting private information with biometrics, speaker verification is expected to be widely used because of its advantages in convenience of use and implementation cost. Speaker verification should achieve a high degree of reliability in the verification score, flexibility in speech text usage, and efficiency in verification system complexity. Continuants have excellent speaker-discriminant power and form a category with a modest number of phonemes, and multilayer perceptrons (MLPs) have superior recognition ability and fast operation speed. Consequently, the two provide viable ways for a speaker verification system to obtain the above properties. This paper implements a system to which continuants and MLPs are applied, and evaluates the system using a Korean speech database. The experimental results show that continuants and MLPs enable the system to acquire the three properties.

A Study on the Fast Enrollment of Text-Independent Speaker Verification for Vehicle Security (차량 보안을 위한 어구독립 화자증명의 등록시간 단축에 관한 연구)

  • Lee, Tae-Seung; Choi, Ho-Jin
    • Journal of Advanced Navigation Technology / v.5 no.1 / pp.1-10 / 2001
  • Speech is well suited to car drivers, who are busy with various driving operations, because it lets them handle and manipulate devices conveniently. Building on this, this work proposes a speaker verification method for protecting cars from theft and for identifying a person trying to access critical online services. The method adopts continuant phoneme recognition, which uses the linguistic information in speech, and an MLP (multilayer perceptron), which has some advantages over earlier stochastic methods. The recognition method, however, requires a large amount of computation for learning, so it is somewhat difficult to adopt in speaker verification applications in which speakers must enroll in real time. To relieve this problem, this work presents a solution that introduces speaker cohort models from an established speaker verification score normalization technique, dividing the background speakers into small cohorts in advance. As a result, the computational burden is reduced by classifying the enrolling speaker into one of those cohorts and carrying out enrollment for only that cohort.
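
The cohort idea can be illustrated with a minimal sketch: background speakers are grouped into cohorts in advance, the enrolling speaker is assigned to the closest cohort, and only that cohort's speakers are used as the background set during enrollment. The nearest-centroid rule, the feature vectors, and the names `centroid` and `pick_cohort` are assumptions for illustration; the abstract does not say how the classification into cohorts is actually performed.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def pick_cohort(enroll_vec, cohorts):
    """Return the name of the cohort whose centroid is nearest to enroll_vec.

    cohorts: dict mapping a cohort name to a list of background-speaker
             feature vectors (hypothetical representation).
    """
    return min(
        cohorts,
        key=lambda name: math.dist(enroll_vec, centroid(cohorts[name])),
    )


# Toy background set, divided into two cohorts ahead of time.
cohorts = {
    "cohort_1": [[0.0, 0.0], [1.0, 1.0]],
    "cohort_2": [[10.0, 10.0], [11.0, 11.0]],
}
chosen = pick_cohort([0.4, 0.6], cohorts)
background = cohorts[chosen]  # enrollment would train against these speakers only
print(chosen, len(background))
```

The saving comes from the size of `background`: enrollment trains against one small cohort rather than the full background population.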

An Emotion Scanning System on Text Documents (텍스트 문서 기반의 감성 인식 시스템)

  • Kim, Myung-Kyu; Kim, Jung-Ho; Cha, Myung-Hoon; Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.12 no.4 / pp.433-442 / 2009
  • People increasingly buy products through the Internet rather than in stores. After purchasing, some consumers post feedback online in the form of reviews, replies, comments, and blog posts. People also tend to look for information on the Internet. Because many consumers are interested in others' opinions when they are about to make a purchase, companies and public institutions need to collect and analyze reviews and public opinion. However, the reviews on websites are numerous, short, and redundant. Under these circumstances, emotion scanning systems for text documents on the web are drawing attention. For extracting writers' opinions or subjective ideas from text, labeled word resources such as GI (General Inquirer) and LKB (Lexical Knowledge Base of near-synonym differences) exist for English, but no such resource is yet available for Korean. In this paper, we labeled words of four parts of speech (POS) in a Korean dictionary, namely nouns, adjectives, verbs, and adverbs, with positive, negative, and neutral attributes. We extracted construction patterns of emotional words and relationships among words in sentences from a large training set and learned them. Based on this knowledge, comments and reviews regarding products are classified into two polarity classes, positive and negative, using SO-PMI, for which the optimal condition was found from a combination of the four POS. Lastly, the system is designed with a flexible user interface for adding or editing the emotional words, the construction patterns related to emotions, and the relationships among the words.
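
For readers unfamiliar with SO-PMI, the following is a minimal sketch of the general technique: a word's semantic orientation is its pointwise mutual information (PMI) with positive seed words minus its PMI with negative seed words. The document-level co-occurrence window, the additive `smoothing` constant, the seed words, and the function names are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi_scores(docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Score words by PMI with positive seeds minus PMI with negative seeds.

    docs: list of token lists; two words co-occur if they appear in the
          same document (an assumed, simplified window).
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    def pmi(a, b):
        key = (a, b) if a < b else (b, a)
        joint = (pair_df[key] + smoothing) / n_docs  # smoothed to avoid log(0)
        return math.log2(joint / ((word_df[a] / n_docs) * (word_df[b] / n_docs)))

    scores = {}
    for word in word_df:
        if word in pos_seeds or word in neg_seeds:
            continue
        pos = sum(pmi(word, s) for s in pos_seeds if s in word_df)
        neg = sum(pmi(word, s) for s in neg_seeds if s in word_df)
        scores[word] = pos - neg
    return scores


def classify(tokens, scores):
    """Label a review by the sign of its summed word scores."""
    total = sum(scores.get(t, 0.0) for t in tokens)
    return "positive" if total > 0 else "negative"


docs = [["good", "fast"], ["good", "fast"], ["bad", "slow"], ["bad", "slow"]]
scores = so_pmi_scores(docs, pos_seeds={"good"}, neg_seeds={"bad"})
print(classify(["fast", "delivery"], scores))
print(classify(["slow", "delivery"], scores))
```

Restricting which tokens enter `docs` by part of speech would be one way to explore POS combinations like those the abstract mentions.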

AI Advisor for Response of Disaster Safety in Risk Society (위험사회 재난 안전 분야 대응을 위한 AI 조력자)

  • Lee, Yong-Hak; Kang, Yunhee; Lee, Min-Ho; Park, Seong-Ho; Kang, Myung-Ju
    • Journal of Platform Technology / v.8 no.3 / pp.22-29 / 2020
  • The 4th industrial revolution is progressing in each country as a megatrend that, starting from simple manufacturing innovation, now leads the convergence of various technologies in the social and economic fields. Epidemics of infectious diseases such as COVID-19 are shifting economic activity toward digital-centered, non-face-to-face business, and the use of AI and big data technology for personalized services is essential for expanding online services. In this paper, we analyze cases focusing on the application of artificial intelligence technology, a key technology for the effective implementation of the Digital New Deal promoted by the government, together with the major technological characteristics of the 4th industrial revolution, and describe use cases in the field of disaster response. As a disaster response use case, AI assistants suggest appropriate countermeasures according to the status of the reporter in an emergency call. To this end, the AI assistants analyze speech recognition data and classify the disaster type from the converted text for adaptive response.

Analyzing Vocabulary Characteristics of Colloquial Style Corpus and Automatic Construction of Sentiment Lexicon (구어체 말뭉치의 어휘 사용 특징 분석 및 감정 어휘 사전의 자동 구축)

  • Kang, Seung-Shik; Won, HyeJin; Lee, Minhaeng
    • Smart Media Journal / v.9 no.4 / pp.144-151 / 2020
  • In a mobile environment, communication takes place via SMS text messages. The vocabulary used in SMS texts can be expected to differ in class from that used in general Korean literary-style sentences. For example, in a typical literary style, sentences begin and end properly and are well constructed, while the SMS text corpus often omits components or replaces them with abbreviated expressions. To analyze these vocabulary usage characteristics, existing colloquial-style and literary-style corpora are used. The experiment compares and analyzes the vocabulary usage characteristics of the colloquial corpora (the SMS text corpus and the Naver Sentiment Movie Corpus) and the written Korean corpus. For the comparison and analysis of vocabulary in each corpus, the part-of-speech tag for adjectives (VA) was used as the standard, and distinctive collexeme analysis was used to measure collostructural strength. As a result, it was confirmed that adjectives related to emotional expression such as 'good-', 'sorry-', and 'joy-' were preferred in the SMS text corpus, while adjectives related to evaluative expressions were preferred in the Naver Sentiment Movie Corpus. Word embeddings were used to automatically construct a sentiment lexicon based on the extracted adjectives with high collostructural strength, and a total of 343,603 sentiment representations were automatically built.
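
One common formulation of collostructural strength in distinctive collexeme analysis is the negative base-10 logarithm of a Fisher exact test p-value computed on a 2x2 frequency table (the word versus all other adjectives, in corpus A versus corpus B). The sketch below assumes that formulation with a one-sided test; the abstract does not state which variant the authors used, and the counts and function names are made up for illustration.

```python
import math


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value P(X >= a) for the table [[a, b], [c, d]].

    Rows are the two corpora, columns are "target word" and "other words";
    the margins are treated as fixed (hypergeometric distribution).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favorable = sum(
        math.comb(row1, x) * math.comb(n - row1, col1 - x)
        for x in range(a, upper + 1)
    )
    return favorable / math.comb(n, col1)


def collostruction_strength(word_in_a, other_in_a, word_in_b, other_in_b):
    """-log10 of the p-value; larger means the word is more distinctive of corpus A."""
    p = fisher_right_tail(word_in_a, other_in_a, word_in_b, other_in_b)
    return -math.log10(p) if p > 0 else math.inf


# Hypothetical counts: an adjective occurs 8 times among 10 adjective tokens
# in corpus A, but only once among 10 adjective tokens in corpus B.
print(round(collostruction_strength(8, 2, 1, 9), 2))
# A word used equally often in both corpora gets a strength close to zero.
print(round(collostruction_strength(5, 5, 5, 5), 2))
```

Ranking adjectives by this value and keeping the top entries would give a set of corpus-distinctive words of the kind the abstract describes as seeds for the lexicon.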

The Study on Automatic Speech Recognizer Utilizing Mobile Platform on Korean EFL Learners' Pronunciation Development (자동음성인식 기술을 이용한 모바일 기반 발음 교수법과 영어 학습자의 발음 향상에 관한 연구)

  • Park, A Young
    • Journal of Digital Contents Society / v.18 no.6 / pp.1101-1107 / 2017
  • This study explored the effect of ASR-based pronunciation instruction, delivered on a mobile platform, on EFL learners' pronunciation development. In particular, this quasi-experimental study focused on whether mobile ASR, which provides voice-to-text feedback, can enhance Korean EFL learners' perception and production of target English consonant minimal pairs (V-B, R-L, and G-Z). Three intact classes totaling 117 Korean university students were assigned to three groups: a) ASR Group: ASR-based pronunciation instruction with textual feedback from the mobile ASR; b) Conventional Group: conventional face-to-face pronunciation instruction with individual oral feedback from the instructor; and c) Hybrid Group: ASR-based pronunciation instruction plus conventional pronunciation instruction. The ANCOVA results showed that the adjusted mean score on the pronunciation production post-test for the Hybrid group (M = 82.71, SD = 3.3) was significantly higher than that of the Conventional group (M = 62.6, SD = 4.05) (p < .05).