• Title/Summary/Keyword: Intelligent Data Analysis

Search Result 1,456, Processing Time 0.03 seconds

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, which is one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, the sentiment analysis methods have been widely used in many fields. As good examples, data-driven surveys are based on analyzing the subjectivity of text data posted by users and market researches are conducted by analyzing users' review posts to quantify users' reputation on a target product. The basic method of sentiment analysis is to use sentiment dictionary (or lexicon), a list of sentiment vocabularies with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to be different across domains. For example, a sentiment word, 'sad' indicates negative meaning in many fields but a movie. In order to perform accurate sentiment analysis, we need to build the sentiment dictionary for a given domain. However, such a method of building the sentiment lexicon is time-consuming and various sentiment vocabularies are not included without the use of general-purpose sentiment lexicon. In order to address this problem, several studies have been carried out to construct the sentiment lexicon suitable for a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer being serviced and SentiWordNet does not work well because of language difference in the process of converting Korean word into English word. There are restrictions on the use of such general-purpose sentiment lexicons as seed data for building the sentiment lexicon for a specific domain. In this article, we construct 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. The proposed dictionary, which is a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built to quickly construct the sentiment dictionary for a target domain. Especially, it constructs sentiment vocabularies by analyzing the glosses contained in Standard Korean Language Dictionary (SKLD) by the following procedures: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each of glosses to either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive meaning, while negative words and phrases are extracted from the glosses classified as negative meaning. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is more extended using various external sources including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabularies, each of which is one of 1-grams, 2-grams, phrases, and sentence patterns. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend on sentiment analysis is to use deep learning technique without sentiment dictionaries. The importance of developing sentiment dictionaries is declined gradually. However, one of recent studies shows that the words in the sentiment dictionary can be used as features of deep learning models, resulting in the sentiment analysis performed with higher accuracy (Teng, Z., 2016). This result indicates that the sentiment dictionary is used not only for sentiment analysis but also as features of deep learning models for improving accuracy. The proposed dictionary can be used as a basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful to automatically and quickly build large training sets for deep learning models.

A Technique on the 3-D Terrain Analysis Modeling for Optimum Site Selection and development of Stereo Tourism in the Future (미래입체관광의 최적지선정 및 개발을 위한 3차원지형분석모델링 기법)

  • Yeon, Sang-Ho;Choi, Seung-Kuk
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.11
    • /
    • pp.415-422
    • /
    • 2013
  • The contents development for the Internet and cyber tour has been attempted in a number of areas. 3D topography of the spatial environment, land planning and land information contents as a 3D tour of the future ubiquitous city safe for tourism due to the implementation of information made available major area. Domestic service, and in urban areas of the country where land and precise spatial information in order to shoot satellites and aircraft in the area you want to mount the camera on a variety of photo images taken by conducting 3D spatial that is required is able to obtain the information. Geo spatial information in a variety of direct or indirect acquisition of the initial spatial data into a database for accurate collection, storage, editing, manipulation and application technology changes in the future by establishing a database of 3D spatial by securing content organization ubiquitous tourist to take advantage of new tourism industry was greatly. As a result of this study for future tourism using geo spatial information and analysis of 3D modeling by intelligent land information indirectly, with quite a few stereo site experience and a variety of tourist spatial acquisition and utilization of information could prove.

Developing A Multi-dimensional Spatio-visual Information System (다차원기반 고정밀 공간영상정보 시스템 구축에 관한 연구)

  • Kim, Mi-Yun;Yeo, Wook-Hyun;Choi, Jin-Won
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.27 no.6
    • /
    • pp.649-658
    • /
    • 2009
  • The recent emergence of the paradigm of new urban planning for building intelligent urban spaces, such as U-City and U-Eco City, of which the concept of ubiquitous technology is applied, requires high quality three-dimensional spatial information of the urban area. The aim of this study is to build a multi-dimensional spatio-visual information system that includes the solution for visualization, spatial information search, analysis, and evaluation by integrating various types of 3D-modeled spatial information concerning the large urban-size area based on the latest GIS application technology. The range of this study is the integration, visualization, and utilization of spatial information with the goal of building 3D virtual urban environment of high-quality and high-resolution by increasing the utilization of the systematic urban facilities in order to fully reflect the actual user's needs, using the aerial LiDAR data as the plan to overcome the limitations of the existing 3D urban modeling. By reproducing the virtual urban environment the most similar to the actual world through the mash-up of satellite images and aerial photos on the standard format of spatial information constituted of properties and signs, the system will be built with many analysis and utilization functions that support the view and sunlight analysis, various administrative tasks, as well as the decision making process of the city.

Trend Analysis of Sports for All-Related Issues in Early Stage of COVID-19 Using Topic Modeling (토픽 모델링을 활용한 코로나19 초기 생활체육 이슈 분석)

  • Chung, Yunkil;Seo, Sumin;Kang, Hyunmin
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.3
    • /
    • pp.57-79
    • /
    • 2022
  • COVID-19, which started in December 2019, has had a great impact on our lives in general, including politics, economy, society, and culture, and activities in sports and arts have also been significantly reduced. In the case of sports, sports for all fields in which ordinary citizens participate were particularly affected, and cases of infection in places closely related to people's lives, such as gyms, table tennis, and badminton clubs, also amplified the social fear of the spread of COVID-19. Therefore, in this study, we analyzed news articles related to sports for all at the time when COVID-19 was first spread, and investigated what issues were emerging and being discussed in the sports for all field under the COVID-19 situation. Specifically, we collected news articles dealt with sports for all issues under the COVID-19 situation from Korea's leading portal news sites and identified key sports for all issues by performing topic modeling on these articles. Through the analysis, we found meaningful issues such as COVID-19 outbreak in sports facilities and support for sports activities. In addition, through wordcloud analysis of these major issues, we visually understood the issues and identified the changes in these issues over time.

RPA Log Mining-based Process Automation Status Analysis - An Empirical Study on SMEs (RPA 로그 마이닝 기반 프로세스 자동화 현황 분석 - 중소기업대상 실증 연구)

  • Young Sik Kang;Jinwoo Jung;Seonyoung Shim
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.265-288
    • /
    • 2023
  • Process mining has generally analyzed the default logs of Information Systems such as SAP ERP, but as the use of automation software called RPA expands, the logs by RPA bots can be utilized. In this study, the actual status of RPA automation in the field was identified by applying RPA bots to the work of three domestic manufacturing companies (cosmetic field) and analyzing them after leaving logs. Using Uipath and Python, we implemented RPA bots and wrote logs. We used Disco, a software dedicated to process mining to analyze the bot logs. As a result of log analysis in two aspects of bot utilization and performance through process mining, improvement requirements were found. In particular, we found that there was a point of improvement in all cases in that the utilization of the bot and errors or exceptions were found in many cases of process. Our approach is very scientific and empirical in that it analyzes the automation status and performance of bots using data rather than existing qualitative methods such as surveys or interviews. Furthermore, our study will be a meaningful basic step for bot behavior optimization, and can be seen as the foundation for ultimately performing process management.

YouTube Video Content Analysis: Focusing on Korean Dance Videos (유튜브(YouTube) 영상 콘텐츠 분석: 국내 무용 영상을 중심으로)

  • Suejung Chae;Jihae Suh
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.1-13
    • /
    • 2023
  • The widespread adoption of smartphones and advancements in internet technology have notably shifted content consumption habits toward video. This research aims to dissect the nature of videos posted on YouTube, the global video-sharing platform, to understand the characteristics of both produced and preferred content. For this study, dance was chosen as a specific subject from a variety of video categories. Data on YouTube videos associated with the term "dance" was compiled over three years, from 2019 to 2021. The investigation revealed a clear distinction between the types of dance videos frequently uploaded to YouTube and those that receive a high number of views. The empirical analysis of this study indicates a viewer preference for vlogs that provide insights into the daily lives of dance students, as well as for purpose-driven videos, such as those highlighting dance exam preparations or school dance events. Notably, the vlogs that attract the most attention are typically created by dance students at the college or secondary school level, rather than by professionals. Although the study was focused on dance, its methodologies can be applied to different subjects. These insights are expected to contribute to the development of a recommendation system that aids content creators in effectively targeting their productions.

Comparative analysis of informationattributes inchemical accident response systems through Unstructured Data: Spotlighting on the OECD Guidelines for Chemical Accident Prevention, Preparedness, and Response (비정형 데이터를 이용한 화학물질 사고 대응 체계 정보속성 비교 분석 : 화학사고 예방, 대비 및 대응을 위한 OECD 지침서를 중심으로)

  • YongJin Kim;Chunghyun Do
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.91-110
    • /
    • 2023
  • The importance of manuals is emphasized because chemical accidents require swift response and recovery, and often result in environmental pollution and casualties. In this regard, the OECD revised OECD Guidelines for the Prevention, Preparedness, and Response to Chemical Accidents (referred to as the OECD Guidelines), in June 2023. Moreover, while existing research primarily raises awareness about chemical accidents, highlighting the need for a system-wide response including laws, regulations, and manuals, it was difficult to find comparative research on the attributes of manuals. So, this paper aims to compare and analyze the second and third editions of the OECD Guidelines, in order to uncover the information attributes and implications of the revised version. Specifically, TF-IDF (Term Frequency-Inverse Document Frequency) was applied to understand which keywords have become more important, and Word2Vec was applied to identify keywords that were used similarly and those that were differentiated. Lastly, a 2×2 matrix was proposed, identifying the topics within each quadrant to provide a deeper comparison of the information attributes of the OECD Guidelines. This study offers a framework to help researchers understand information attributes. From a practical perspective, it appears valuable for the revision of standard manuals by domestic government agencies and corporations related to chemistry.

Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.95-108
    • /
    • 2017
  • Recently, AlphaGo which is Bakuk (Go) artificial intelligence program by Google DeepMind, had a huge victory against Lee Sedol. Many people thought that machines would not be able to win a man in Go games because the number of paths to make a one move is more than the number of atoms in the universe unlike chess, but the result was the opposite to what people predicted. After the match, artificial intelligence technology was focused as a core technology of the fourth industrial revolution and attracted attentions from various application domains. Especially, deep learning technique have been attracted as a core artificial intelligence technology used in the AlphaGo algorithm. The deep learning technique is already being applied to many problems. Especially, it shows good performance in image recognition field. In addition, it shows good performance in high dimensional data area such as voice, image and natural language, which was difficult to get good performance using existing machine learning techniques. However, in contrast, it is difficult to find deep leaning researches on traditional business data and structured data analysis. In this study, we tried to find out whether the deep learning techniques have been studied so far can be used not only for the recognition of high dimensional data but also for the binary classification problem of traditional business data analysis such as customer churn analysis, marketing response prediction, and default prediction. And we compare the performance of the deep learning techniques with that of traditional artificial neural network models. The experimental data in the paper is the telemarketing response data of a bank in Portugal. It has input variables such as age, occupation, loan status, and the number of previous telemarketing and has a binary target variable that records whether the customer intends to open an account or not. In this study, to evaluate the possibility of utilization of deep learning algorithms and techniques in binary classification problem, we compared the performance of various models using CNN, LSTM algorithm and dropout, which are widely used algorithms and techniques in deep learning, with that of MLP models which is a traditional artificial neural network model. However, since all the network design alternatives can not be tested due to the nature of the artificial neural network, the experiment was conducted based on restricted settings on the number of hidden layers, the number of neurons in the hidden layer, the number of output data (filters), and the application conditions of the dropout technique. The F1 Score was used to evaluate the performance of models to show how well the models work to classify the interesting class instead of the overall accuracy. The detail methods for applying each deep learning technique in the experiment is as follows. The CNN algorithm is a method that reads adjacent values from a specific value and recognizes the features, but it does not matter how close the distance of each business data field is because each field is usually independent. In this experiment, we set the filter size of the CNN algorithm as the number of fields to learn the whole characteristics of the data at once, and added a hidden layer to make decision based on the additional features. For the model having two LSTM layers, the input direction of the second layer is put in reversed position with first layer in order to reduce the influence from the position of each field. In the case of the dropout technique, we set the neurons to disappear with a probability of 0.5 for each hidden layer. The experimental results show that the predicted model with the highest F1 score was the CNN model using the dropout technique, and the next best model was the MLP model with two hidden layers using the dropout technique. In this study, we were able to get some findings as the experiment had proceeded. First, models using dropout techniques have a slightly more conservative prediction than those without dropout techniques, and it generally shows better performance in classification. Second, CNN models show better classification performance than MLP models. This is interesting because it has shown good performance in binary classification problems which it rarely have been applied to, as well as in the fields where it's effectiveness has been proven. Third, the LSTM algorithm seems to be unsuitable for binary classification problems because the training time is too long compared to the performance improvement. From these results, we can confirm that some of the deep learning algorithms can be applied to solve business binary classification problems.

The Effect of Expert Reviews on Consumer Product Evaluations: A Text Mining Approach (전문가 제품 후기가 소비자 제품 평가에 미치는 영향: 텍스트마이닝 분석을 중심으로)

  • Kang, Taeyoung;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.63-82
    • /
    • 2016
  • Individuals gather information online to resolve problems in their daily lives and make various decisions about the purchase of products or services. With the revolutionary development of information technology, Web 2.0 has allowed more people to easily generate and use online reviews such that the volume of information is rapidly increasing, and the usefulness and significance of analyzing the unstructured data have also increased. This paper presents an analysis on the lexical features of expert product reviews to determine their influence on consumers' purchasing decisions. The focus was on how unstructured data can be organized and used in diverse contexts through text mining. In addition, diverse lexical features of expert reviews of contents provided by a third-party review site were extracted and defined. Expert reviews are defined as evaluations by people who have expert knowledge about specific products or services in newspapers or magazines; this type of review is also called a critic review. Consumers who purchased products before the widespread use of the Internet were able to access expert reviews through newspapers or magazines; thus, they were not able to access many of them. Recently, however, major media also now provide online services so that people can more easily and affordably access expert reviews compared to the past. The reason why diverse reviews from experts in several fields are important is that there is an information asymmetry where some information is not shared among consumers and sellers. The information asymmetry can be resolved with information provided by third parties with expertise to consumers. Then, consumers can read expert reviews and make purchasing decisions by considering the abundant information on products or services. Therefore, expert reviews play an important role in consumers' purchasing decisions and the performance of companies across diverse industries. If the influence of qualitative data such as reviews or assessment after the purchase of products can be separately identified from the quantitative data resources, such as the actual quality of products or price, it is possible to identify which aspects of product reviews hamper or promote product sales. Previous studies have focused on the characteristics of the experts themselves, such as the expertise and credibility of sources regarding expert reviews; however, these studies did not suggest the influence of the linguistic features of experts' product reviews on consumers' overall evaluation. However, this study focused on experts' recommendations and evaluations to reveal the lexical features of expert reviews and whether such features influence consumers' overall evaluations and purchasing decisions. Real expert product reviews were analyzed based on the suggested methodology, and five lexical features of expert reviews were ultimately determined. Specifically, the "review depth" (i.e., degree of detail of the expert's product analysis), and "lack of assurance" (i.e., degree of confidence that the expert has in the evaluation) have statistically significant effects on consumers' product evaluations. In contrast, the "positive polarity" (i.e., the degree of positivity of an expert's evaluations) has an insignificant effect, while the "negative polarity" (i.e., the degree of negativity of an expert's evaluations) has a significant negative effect on consumers' product evaluations. Finally, the "social orientation" (i.e., the degree of how many social expressions experts include in their reviews) does not have a significant effect on consumers' product evaluations. In summary, the lexical properties of the product reviews were defined according to each relevant factor. Then, the influence of each linguistic factor of expert reviews on the consumers' final evaluations was tested. In addition, a test was performed on whether each linguistic factor influencing consumers' product evaluations differs depending on the lexical features. The results of these analyses should provide guidelines on how individuals process massive volumes of unstructured data depending on lexical features in various contexts and how companies can use this mechanism from their perspective. This paper provides several theoretical and practical contributions, such as the proposal of a new methodology and its application to real data.

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.57-78
    • /
    • 2020
  • With the development of the Internet, consumers have had an opportunity to check product information easily through E-Commerce. Product reviews used in the process of purchasing goods are based on user experience, allowing consumers to engage as producers of information as well as refer to information. This can be a way to increase the efficiency of purchasing decisions from the perspective of consumers, and from the seller's point of view, it can help develop products and strengthen their competitiveness. However, it takes a lot of time and effort to understand the overall assessment and assessment dimensions of the products that I think are important in reading the vast amount of product reviews offered by E-Commerce for the products consumers want to compare. This is because product reviews are unstructured information and it is difficult to read sentiment of reviews and assessment dimension immediately. For example, consumers who want to purchase a laptop would like to check the assessment of comparative products at each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we would like to propose a method to automatically generate multi-dimensional product assessment scores in product reviews that we would like to compare. The methods presented in this study consist largely of two phases. One is the pre-preparation phase and the second is the individual product scoring phase. In the pre-preparation phase, a dimensioned classification model and a sentiment analysis model are created based on a review of the large category product group review. By combining word embedding and association analysis, the dimensioned classification model complements the limitation that word embedding methods for finding relevance between dimensions and words in existing studies see only the distance of words in sentences. Sentiment analysis models generate CNN models by organizing learning data tagged with positives and negatives on a phrase unit for accurate polarity detection. Through this, the individual product scoring phase applies the models pre-prepared for the phrase unit review. Multi-dimensional assessment scores can be obtained by aggregating them by assessment dimension according to the proportion of reviews organized like this, which are grouped among those that are judged to describe a specific dimension for each phrase. In the experiment of this paper, approximately 260,000 reviews of the large category product group are collected to form a dimensioned classification model and a sentiment analysis model. In addition, reviews of the laptops of S and L companies selling at E-Commerce are collected and used as experimental data, respectively. The dimensioned classification model classified individual product reviews broken down into phrases into six assessment dimensions and combined the existing word embedding method with an association analysis indicating frequency between words and dimensions. As a result of combining word embedding and association analysis, the accuracy of the model increased by 13.7%. The sentiment analysis models could be seen to closely analyze the assessment when they were taught in a phrase unit rather than in sentences. As a result, it was confirmed that the accuracy was 29.4% higher than the sentence-based model. Through this study, both sellers and consumers can expect efficient decision making in purchasing and product development, given that they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme, and they were analysed together using word embedding and association analysis to improve the objectivity aspects of more precise multi-dimensional analysis and research. This will be an attractive analysis model in terms of not only enabling more effective service deployment during the evolving E-Commerce market and fierce competition, but also satisfying both customers.