• Title/Summary/Keyword: text-to-speech system


Development of Automatic Lip-sync MAYA Plug-in for 3D Characters (3D 캐릭터에서의 자동 립싱크 MAYA 플러그인 개발)

  • Lee, Sang-Woo;Shin, Sung-Wook;Chung, Sung-Taek
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.3 / pp.127-134 / 2018
  • In this paper, we developed an Auto Lip-Sync Maya plug-in that extracts Korean phonemes from voice data and Korean text and produces high-quality 3D lip-sync animation from the separated phonemes. In the developed system, phonemes were classified into the 8 vowels and 13 consonants used in Korean, with reference to the 49 phonemes provided by the Microsoft Speech API (SAPI) engine. Although vowels and consonants are pronounced with a variety of mouth shapes, the same viseme can be applied to phonemes whose mouth shapes are identical. Based on this, we developed a Python-based Auto Lip-Sync Maya plug-in that generates lip-sync animation automatically in a single step.
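
The idea of sharing one viseme among phonemes with the same mouth shape can be sketched as a lookup table. This is a minimal illustration, not the plug-in's code: the table `VISEME_OF`, its groupings, and the function `to_viseme_keys` are all hypothetical, and no Maya or SAPI calls are made.

```python
# Hypothetical phoneme-to-viseme table; the groupings are illustrative only
# and are not the 8-vowel / 13-consonant classification used in the paper.
VISEME_OF = {
    "a": "open", "eo": "open",
    "o": "round", "u": "round",
    "i": "spread", "e": "spread",
    "m": "closed", "b": "closed", "p": "closed",
}


def to_viseme_keys(timed_phonemes, default="rest"):
    """Turn (phoneme, start_sec) pairs into (viseme, start_sec) keys.

    Consecutive phonemes that share a viseme are merged into one key,
    so the mouth shape is keyed only when it actually changes.
    """
    keys = []
    for phoneme, start in timed_phonemes:
        viseme = VISEME_OF.get(phoneme, default)
        if not keys or keys[-1][0] != viseme:
            keys.append((viseme, start))
    return keys
```

In an animation tool, each returned pair would correspond to one keyframe on the matching mouth-shape target at the given time.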

Generative Interactive Psychotherapy Expert (GIPE) Bot

  • Ayesheh Ahrari Khalaf;Aisha Hassan Abdalla Hashim;Akeem Olowolayemo;Rashidah Funke Olanrewaju
    • International Journal of Computer Science & Network Security / v.23 no.4 / pp.15-24 / 2023
  • Ever since computers were developed, one of the aspirations of scientists and engineers has been to interact naturally with machines. Hence features of artificial intelligence (AI) such as natural language processing and natural language generation were developed. Interactive conversational systems are thought to be the fastest-growing field of AI. Numerous businesses have created Virtual Personal Assistants (VPAs) using these technologies, including Apple's Siri, Amazon's Alexa, and Google Assistant. Even though many chatbots have been introduced over the years to diagnose or treat psychological disorders, a user-friendly chatbot is not yet available. A generative cognitive behavioral therapy system with spoken-dialogue support was therefore developed using a Persona Perception (P2) bot model with Generative Pre-trained Transformer-2 (GPT-2). The model was then implemented using technologies common in modern VPAs, such as voice recognition, Natural Language Understanding (NLU), and text-to-speech. The system is a useful aid for voice-based applications because it can hold therapeutic conversations with users through both text and voice interaction.

The Design for Self-care System Based on RFID (RFID를 이용한 Self-care System 설계)

  • Xiao, Huang;Zhou, Kun-Peng;Jin, Woo-Jeong;Cho, Yong-Soon;Jung, Hoe-Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.879-881 / 2010
  • With the rapid development of society, small families and single-person households are becoming more common. As the traditional family changes, more and more older people stay home alone, and their health and safety deserve attention. With the rapid development of RFID (Radio Frequency Identification) technology, its applications have extended to all areas of our lives, and RFID has become a major topic of concern across many industries. With high-speed economic growth and advances in science and medicine, the life expectancy of older people is gradually increasing. It is therefore necessary to design a protective system for their safety. In this thesis, a self-care system is built using RFID technology to authenticate the user, TTS (text-to-speech) to convert character information into voice information, infrared technology to protect the home effectively, and electronic blood pressure monitors to examine older people's health.


Prototype Design and Development of Online Recruitment System Based on Social Media and Video Interview Analysis (소셜미디어 및 면접 영상 분석 기반 온라인 채용지원시스템 프로토타입 설계 및 구현)

  • Cho, Jinhyung;Kang, Hwansoo;Yoo, Woochang;Park, Kyutae
    • Journal of Digital Convergence / v.19 no.3 / pp.203-209 / 2021
  • In this study, we propose a prototype design model for an online recruitment system that uses multi-dimensional data crawling and social media analysis to validate the text information and video interviews submitted in the job application process. The study includes a comparative analysis through text mining to verify the authenticity of job application documents and to hire and allocate workers effectively based on their potential job capability. Using the prototype system, we conducted performance tests and analyzed the results for key performance indicators such as text mining accuracy and the recognition rate of the interview STT (speech-to-text) function. If commercialized based on the design specifications and prototype development results of this study, the system is expected to serve as the intelligent online recruitment technology required in future public and private recruitment markets.

LLM-based chatbot system to improve worker efficiency and prevent safety incidents (작업자의 업무 능률 향상과 안전 사고 방지를 위한 LLM 기반 챗봇 시스템)

  • Doohwan Kim;Yohan Han;Inhyuk Jeong;Yeongseok Hwnag;Jinju Park;Nahyeon Lee;Yujin Lee
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.321-324 / 2024
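  • In this paper, we propose a chatbot system based on LLMs (Large Language Models) combined with STT. In manufacturing plants, insufficient safety training and the growing number of foreign workers are emerging as new challenges in safety-critical work environments. This study aims to address these problems with an innovative chatbot system that uses a language model and speech recognition (Speech-to-Text, STT) technology. The proposed system helps workers easily access equipment manuals and safety guidelines, and enables fast and accurate responses in emergencies. In the system, the LLM identifies the worker's intent and the STT technology processes voice commands effectively. Experimental results confirmed that the system is effective in increasing workers' efficiency and in overcoming language barriers. This study is expected to contribute to improving worker safety and work efficiency at manufacturing sites.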


A Real-time Bus Arrival Notification System for Visually Impaired Using Deep Learning (딥 러닝을 이용한 시각장애인을 위한 실시간 버스 도착 알림 시스템)

  • Seyoung Jang;In-Jae Yoo;Seok-Yoon Kim;Youngmo Kim
    • Journal of the Semiconductor & Display Technology / v.22 no.2 / pp.24-29 / 2023
  • In this paper, we propose a real-time bus arrival notification system that uses deep learning to guarantee mobility rights for the visually impaired. In modern society, location information from public transportation lets users obtain transit information quickly and use public transportation easily. However, since the existing public transportation information system is visual, the visually impaired cannot use it. In Korea, various laws have been amended since the 'Act on the Promotion of Transportation for the Vulnerable' was enacted in June 2012 as the Act on the Movement Rights of the Blind, but the visually impaired still experience inconvenience in using public transportation. In particular, with the current system a visually impaired person cannot determine whether a bus is coming soon, is arriving now, or has already arrived. In this paper, we use deep learning to learn bus numbers and identify the numbers of approaching buses. Finally, we propose a method that uses TTS technology to notify the visually impaired by voice that the bus is coming.


Web-based Text-To-Sign Language Translating System (웹기반 청각장애인용 수화 웹페이지 제작 시스템)

  • Park, Sung-Wook;Wang, Bo-Hyeun
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.3 / pp.265-270 / 2014
  • Because hearing-impaired people have difficulty hearing, it is also hard for them to learn letters, which represent sounds, and text that conveys complex and abstract concepts. It has therefore been a natural choice for hearing-impaired people to communicate in sign language, which employs facial expressions and hand and body motions. However, the major communication methods in daily life are text and speech, which are big obstacles for hearing-impaired people in accessing information, learning and engaging in intellectual activities, and getting jobs. As delivering information via the internet becomes common, hearing-impaired people are experiencing more difficulty in accessing information, since the internet presents information mostly in text form. This intensifies the imbalance in information accessibility. This paper reports a web-based text-to-sign language translating system that helps web designers use sign language in web page design. Since the system is web-based, web designers can use it with any common computing environment for internet browsing. The system takes the format of a bulletin board as its user interface. When web designers write paragraphs and post them through the bulletin board to the translating server, the server translates the incoming text into sign language, animates it with a 3D avatar, and records the animation in an MP4 file. The file addresses are fetched by the bulletin board, which enables web designers to embed the translated sign language files into their web pages using HTML5 or JavaScript. We also analyzed text used on the web pages of public services, identified words new to the translating system, and added them to improve translation. This addition is expected to encourage wide and easy adoption of web pages for hearing-impaired people by public services.

Robust Speaker Identification Using Linear Transformation Optimized for Diagonal Covariance GMM (대각공분산 GMM에 최적인 선형변환을 이용한 강인한 화자식별)

  • Kim, Min-Seok;Yang, Il-Ho;Yu, Ha-Jin
    • MALSORI / no.65 / pp.67-80 / 2008
  • We have been building a text-independent speaker recognition system that is robust to unknown channel and noise environments. In this paper, we propose a linear transformation to obtain robust features. The transformation is optimized to maximize the distances between the Gaussian mixtures. We use rotation of the axes to cope with the problem of scaling the transformation matrix. The proposed transformation is similar to PCA or LDA, but can achieve better results in some special cases where PCA and LDA cannot work properly. We use the YOHO database to evaluate the proposed method and compare the results with PCA and LDA. The results show that the proposed method outperforms the baseline, PCA, and LDA.


Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.119-138 / 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is lost and monetary damage occurs more frequently. In this study, we propose a method to analyze which sentences and documents posted to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process consisting of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps. Of these, this paper focuses on risk identification. The main process consists of data collection, preprocessing, and analysis. First, we selected two words, 'daechul (loan)' and 'sachae (private loan)', as seed words and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers, who decided whether each item was related to cybercriminality, particularly financial fraud. We then selected some of the vocabulary items as keywords if they were related to nominals and symbols. With the selected keywords, we searched and collected data from web sources such as Twitter, news, and blogs, gathering more than 820,000 articles. The collected articles were refined through preprocessing and made into learning data. The preprocessing is divided into three steps: morphological analysis, stop word removal, and valid part-of-speech selection. In the morphological analysis step, a complex sentence is broken into morpheme units to enable mechanical analysis. In the stop word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. In the valid part-of-speech selection step, only two kinds, nouns and symbols, are considered. Since nouns refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal the text is, the more frequently symbols are used. To turn the selected data into learning data, each item must be classified as legitimate or not, so each is labeled 'legal' or 'illegal'. The processed data are then converted into a corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning data set and a test data set; in this study, we used 70% as learning data and 30% as test data. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function. The cost is set higher than in general cases. To show the feasibility of the proposed idea, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence methods. Overall accuracy was used as the metric. The overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal visit sales, which is clearly superior to that of Term Frequency, MLE, and the other methods. Hence, the result suggests that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying an SVM-like discrimination algorithm to text analysis. The study will also contribute to practitioners in the fields of brand management and opinion mining.
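
The preprocessing and data-splitting steps in this abstract can be sketched in plain Python. This is a rough illustration under stated assumptions, not the authors' pipeline: whitespace splitting stands in for morphological analysis, part-of-speech selection is omitted, and the function names `preprocess`, `document_term_matrix`, and `split_70_30` are hypothetical. The SVM step itself (gamma 0.5, cost 10 in the abstract) is left out because it needs an external library.

```python
import random
import re


def preprocess(text):
    """Remove numbers and punctuation, collapse whitespace, and tokenize.

    Assumption: whitespace splitting stands in for the morphological
    analysis step; no part-of-speech filtering is done here.
    """
    text = re.sub(r"[0-9]+", " ", text)   # drop numbers
    text = re.sub(r"[^\w\s]", " ", text)  # drop punctuation marks
    return text.split()                   # split() also collapses double spaces


def document_term_matrix(token_lists):
    """Return (sorted vocabulary, one row of term counts per document)."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for toks in token_lists:
        row = [0] * len(vocab)
        for tok in toks:
            row[index[tok]] += 1
        rows.append(row)
    return vocab, rows


def split_70_30(items, seed=0):
    """Shuffle the items and split them into 70% learning and 30% test data."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]
```

Each row of the resulting matrix, paired with its 'legal' or 'illegal' label, would then be passed to a classifier of the reader's choice.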

Development of Walking Assist Smartphone Case for Blind People (시각장애인의 보행보조를 위한 스마트폰 케이스 구현)

  • Choi, Jin-Woo;Jeong, Gu-Min
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.3 / pp.239-242 / 2015
  • In this paper, we propose a walking assistance system for blind people using an Android smartphone and an Arduino board. The proposed system uses an Android smartphone case and an external ultrasonic sensor to detect obstacles ahead, so that blind people can be made aware of unexpected objects through the smartphone's speaker or vibration function. In addition, the system includes a notification function that triggers the built-in smartphone camera flash when the user walks in a dark place. Experiments with blind people demonstrated the applicability of the system: it efficiently helps blind people avoid not only obstacles ahead but also possible traffic collisions in dark conditions.