• Title/Summary/Keyword: text-to-speech system

Search Result 246, Processing Time 0.025 seconds

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to apply text-mining techniques to analyze big social media data in a more rigorous manner. Even if social media text analysis algorithms were improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to feature sets for training classification model. The other approach adopts the semantic analysis method to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm which is an extension of the neural network algorithms to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between two approaches. The results show that the distribution related word extracted by Word2Vec algorithm in that the words represent some emotion about the keyword used are three times more than extracted by co-occurrence analysis. The reason of the difference between two results comes from Word2Vec's semantic features vectorization. Therefore, it is possible to say that Word2Vec algorithm is able to catch the hidden related words which have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjective as "emotional word" in Korean. In addition, the emotion words extracted from the text are converted into word vector by the Word2Vec algorithm to find related words. Among these related words, noun words are selected because each word of them would have causal relationship with "emotional word" in the sentence. The process of extracting these trigger factor of emotional word is named "Emotion Trigger" in this study. As a case study, the datasets used in the study are collected by searching using three keywords: professor, prosecutor, and doctor in that these keywords contain rich public emotion and opinion. Advanced data collecting was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data to be used in actual analysis are followed: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate) Prosecutor (lewd behavior, sponsor). The size of the text data is about to 100,000(Professor: 25720, Doctor: 35110, Prosecutor: 43225) and the data are gathered from news, blog, and twitter to reflect various level of public emotion into text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used and every program used in text processing and analysis are java coding. The contributions of this study are as follows: First, different approaches for sentiment analysis are integrated to overcome the limitations of existing approaches. Secondly, finding Emotion Trigger can detect the hidden connections to public emotion which existing method cannot detect. Finally, the approach used in this study could be generalized regardless of types of text data. The limitation of this study is that it is hard to say the word extracted by Emotion Trigger processing has significantly causal relationship with emotional word in a sentence. The future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing with the relationships manually tagged. Furthermore, the text data used in Emotion Trigger are twitter, so the data have a number of distinct features which we did not deal with in this study. These features will be considered in further study.

Prediction of Prosodic Break Using Syntactic Relations and Prosodic Features (구문 관계와 운율 특성을 이용한 한국어 운율구 경계 예측)

  • Jung, Young-Im;Cho, Sun-Ho;Yoon, Ae-Sun;Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science
    • /
    • v.19 no.1
    • /
    • pp.89-105
    • /
    • 2008
  • In this paper, we suggest a rule-based system for the prediction of natural prosodic phrase breaks from Korean texts. For the implementation of the rule-based system, (1) sentence constituents are sub-categorized according to their syntactic functions, (2) syntactic phrases are recognized using the dependency relations among sub-categorized constituents, (3) rules for predicting prosodic phrase breaks are created. In addition, (4) the length of syntactic phrases and sentences, the position of syntactic phrases in a sentence, sense information of contextual words have been considered as to determine the variable prosodic phrase breaks. Based on these rules and features, we obtained the accuracy over 90% in predicting the position of major break and no break which have high correlation with the syntactic structure of the sentence. As for the overall accuracy in predicting the whole prosodic phrase breaks, the suggested system shows Break_Correct of 87.18% and Juncture Correct of 89.27% which is higher than that of other models.

  • PDF

Artificial Intelligence Art : A Case study on the Artwork An Evolving GAIA (대화형 인공지능 아트 작품의 제작 연구 :진화하는 신, 가이아(An Evolving GAIA)사례를 중심으로)

  • Roh, Jinah
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.5
    • /
    • pp.311-318
    • /
    • 2018
  • This paper presents the artistic background and implementation structure of a conversational artificial intelligence interactive artwork, "An Evolving GAIA". Recent artworks based on artificial intelligence technology are introduced. Development of biomimetics and artificial life technology has burred differentiation of machine and human. In this paper, artworks presenting machine-life metaphor are shown, and the distinct implementation of conversation system is emphasized in detail. The artwork recognizes and follows the movement of audience using its eyes for natural interaction. It listens questions of the audience and replies appropriate answers by text-to-speech voice, using the conversation system implemented with an Android client in the artwork and a webserver based on the question-answering dictionary. The interaction gives to the audience discussion of meaning of life in large scale and draws sympathy for the artwork itself. The paper shows the mechanical structure, the implementation of conversational system of the artwork, and reaction of the audience which can be helpful to direct and make future artificial intelligence interactive artworks.

A Study of Pre-trained Language Models for Korean Language Generation (한국어 자연어생성에 적합한 사전훈련 언어모델 특성 연구)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.309-328
    • /
    • 2022
  • This study empirically analyzed a Korean pre-trained language models (PLMs) designed for natural language generation. The performance of two PLMs - BART and GPT - at the task of abstractive text summarization was compared. To investigate how performance depends on the characteristics of the inference data, ten different document types, containing six types of informational content and creation content, were considered. It was found that BART (which can both generate and understand natural language) performed better than GPT (which can only generate). Upon more detailed examination of the effect of inference data characteristics, the performance of GPT was found to be proportional to the length of the input text. However, even for the longest documents (with optimal GPT performance), BART still out-performed GPT, suggesting that the greatest influence on downstream performance is not the size of the training data or PLMs parameters but the structural suitability of the PLMs for the applied downstream task. The performance of different PLMs was also compared through analyzing parts of speech (POS) shares. BART's performance was inversely related to the proportion of prefixes, adjectives, adverbs and verbs but positively related to that of nouns. This result emphasizes the importance of taking the inference data's characteristics into account when fine-tuning a PLMs for its intended downstream task.

Fashion attribute-based mixed reality visualization service (패션 속성기반 혼합현실 시각화 서비스)

  • Yoo, Yongmin;Lee, Kyounguk;Kim, Kyungsun
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.2-5
    • /
    • 2022
  • With the advent of deep learning and the rapid development of ICT (Information and Communication Technology), research using artificial intelligence is being actively conducted in various fields of society such as politics, economy, and culture and so on. Deep learning-based artificial intelligence technology is subdivided into various domains such as natural language processing, image processing, speech processing, and recommendation system. In particular, as the industry is advanced, the need for a recommendation system that analyzes market trends and individual characteristics and recommends them to consumers is increasingly required. In line with these technological developments, this paper extracts and classifies attribute information from structured or unstructured text and image big data through deep learning-based technology development of 'language processing intelligence' and 'image processing intelligence', and We propose an artificial intelligence-based 'customized fashion advisor' service integration system that analyzes trends and new materials, discovers 'market-consumer' insights through consumer taste analysis, and can recommend style, virtual fitting, and design support.

  • PDF

Performance Improvement Methods of a Spoken Chatting System Using SVM (SVM을 이용한 음성채팅시스템의 성능 향상 방법)

  • Ahn, HyeokJu;Lee, SungHee;Song, YeongKil;Kim, HarkSoo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.4 no.6
    • /
    • pp.261-268
    • /
    • 2015
  • In spoken chatting systems, users'spoken queries are converted to text queries using automatic speech recognition (ASR) engines. If the top-1 results of the ASR engines are incorrect, these errors are propagated to the spoken chatting systems. To improve the top-1 accuracies of ASR engines, we propose a post-processing model to rearrange the top-n outputs of ASR engines using a ranking support vector machine (RankSVM). On the other hand, a number of chatting sentences are needed to train chatting systems. If new chatting sentences are not frequently added to training data, responses of the chatting systems will be old-fashioned soon. To resolve this problem, we propose a data collection model to automatically select chatting sentences from TV and movie scenarios using a support vector machine (SVM). In the experiments, the post-processing model showed a higher precision of 4.4% and a higher recall rate of 6.4% compared to the baseline model (without post-processing). Then, the data collection model showed the high precision of 98.95% and the recall rate of 57.14%.

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies
    • /
    • v.14 no.1
    • /
    • pp.13-26
    • /
    • 2024
  • Multi-modal generation is the process of generating results based on a variety of information, such as text, images, and audio. With the rapid development of AI technology, there is a growing number of multi-modal based systems that synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. While the existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper learns a model based on Korean facial features. Therefore, it is possible to create more accurate and effective Korean montage images based on multi-modal voice and text specific to Korean. Since the developed montage generation app can be utilized as a draft montage, it can dramatically reduce the manual labor of existing montage production personnel. For this purpose, we utilized persona-based virtual person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform aimed at providing a one-stop service by building artificial intelligence learning data necessary for the development of AI technology and services. The image generation system was implemented using VQGAN, a deep learning model used to generate high-resolution images, and the KoDALLE model, a Korean-based image generation model. It can be confirmed that the learned AI model creates a montage image of a face that is very similar to what was described using voice and text. To verify the practicality of the developed montage generation app, 10 testers used it and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal detection, to describe and image facial features.

Syntactic and Semantic Disambiguation for Interpretation of Numerals in the Information Retrieval (정보 검색을 위한 숫자의 해석에 관한 구문적.의미적 판별 기법)

  • Moon, Yoo-Jin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.8
    • /
    • pp.65-71
    • /
    • 2009
  • Natural language processing is necessary in order to efficiently perform filtering tremendous information produced in information retrieval of world wide web. This paper suggested an algorithm for meaning of numerals in the text. The algorithm for meaning of numerals utilized context-free grammars with the chart parsing technique, interpreted affixes connected with the numerals and was designed to disambiguate their meanings systematically supported by the n-gram based words. And the algorithm was designed to use POS (part-of-speech) taggers, to automatically recognize restriction conditions of trigram words, and to gradually disambiguate the meaning of the numerals. This research performed experiment for the suggested system of the numeral interpretation. The result showed that the frequency-proportional method recognized the numerals with 86.3% accuracy and the condition-proportional method with 82.8% accuracy.

A Study On Intelligent Robot Control Based On Voice Recognition For Smart FA (스마트 FA를 위한 음성인식 지능로봇제어에 관한 연구)

  • Sim, H.S.;Kim, M.S.;Choi, M.H.;Bae, H.Y.;Kim, H.J.;Kim, D.B.;Han, S.H.
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.21 no.2
    • /
    • pp.87-93
    • /
    • 2018
  • This Study Propose A New Approach To Impliment A Intelligent Robot Control Based on Voice Recognition For Smart Factory Automation Since human usually communicate each other by voices, it is very convenient if voice is used to command humanoid robots or the other type robot system. A lot of researches has been performed about voice recognition systems for this purpose. Hidden Markov Model is a robust statistical methodology for efficient voice recognition in noise environments. It has being tested in a wide range of applications. A prediction approach traditionally applied for the text compression and coding, Prediction by Partial Matching which is a finite-context statistical modeling technique and can predict the next characters based on the context, has shown a great potential in developing novel solutions to several language modeling problems in speech recognition. It was illustrated the reliability of voice recognition by experiments for humanoid robot with 26 joints as the purpose of application to the manufacturing process.

Case Study : Cinematography using Digital Human in Tiny Virtual Production (초소형 버추얼 프로덕션 환경에서 디지털 휴먼을 이용한 촬영 사례)

  • Jaeho Im;Minjung Jang;Sang Wook Chun;Subin Lee;Minsoo Park;Yujin Kim
    • Journal of the Korea Computer Graphics Society
    • /
    • v.29 no.3
    • /
    • pp.21-31
    • /
    • 2023
  • In this paper, we introduce a case study of cinematography using digital human in virtual production. This case study deals with the system overview of virtual production using LEDs and an efficient filming pipeline using digital human. Unlike virtual production using LEDs, which mainly project the background on LEDs, in this case, we use digital human as a virtual actor to film scenes communicating with a real actor. In addition, to film the dialogue scene between the real actor and the digital human using a real-time engine, we automatically generated speech animation of the digital human in advance by applying our Korean lip-sync technology based on audio and text. We verified this filming case by using a real-time engine to produce short drama content using real actor and digital human in an LED-based virtual production environment.