• Title/Summary/Keyword: intelligent system

Search Result 9,785, Processing Time 0.033 seconds

The Relationships among Perceived Value, Use-Diffusion, Loyalty of Mobile Instant Messaging Service (모바일 메신저 서비스의 지각된 가치, 사용-확산 그리고 충성도 간의 관계에 대한 연구)

  • Jo, Dong-Hyuk;Park, Jong-Woo;Chun, Hyun-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.193-212
    • /
    • 2011
  • Mobile instant messaging service is surfacing to an important keyword in the mobile market together with popularization of Smart phones. Mobile instant messaging service in Korea has become popular to the degree of 87.9% usages from total Smartphone holders, and it is expected that using populations will be more enlarged afterwards if considering a fact that its populations of Smartphone is continuously being increased after exceeding 10 million persons (Trend Monitor, June 2011). In the instant messaging market where competitions have been deepened day by day, raising customer's royalties will be the key for company's business survivals and goals of corporate marketing strategies. It could be said that understanding on which factors affect to customer retentions and royalties is very important. Specially, as changing status is being progressed very quickly in case of innovative mobile services like the instant messaging service, research necessities on how many do consumers use the services after accepting them, how much do consumers use them variously, and whether does it connect to long-term relations have been increased, but studies on such matters are in insufficient situations actually. Therefore, this study examined on which effects were affected to use-diffusion and loyalty factors from perceived customer vales' factors having been occurred after accepting the mobile instant messaging service, namely 'functional value', 'monetary value', 'emotional value', and 'social value'. Also, the study looked into what kind of roles do the service usage and using variety play to service's continued using intents as a loyalty index, recommending intents to others, and brand switching intents. And then the study laid the main purpose in trying to provide implications for enhancing customer securities and royalties on the mobile instant messaging service through research's results. The research hypotheses are as follows; H1: Perceived values will affect influences to royalties. H2: Use-Diffusion will affect influences to loyalty. H3: Perceived value will affect influences to loyalty. H4: The use-diffusion will play intermediating roles between perceived values and loyalty. Total 276 cases among collected 284 ones were used for the statistical analysis by SPSS ver. 15 package. Reliability, Factor analysis, regression were done. As the result of research, 'monetary value' and 'emotional value' affected to 'usage' among perceived value factors, and 'emotional value' was appeared as affecting the largest influence. Besides, the usage affected to constant-using intents and recommending intents to others, and using varieties were displayed as affecting to recommending intents to others. On the other hand, 'Using' and 'Using diversity' were appeared as not affecting to 'brand switching intentions'. Meanwhile, as the result of recognizing about effects of perceived values on the loyalty, it was appeared such like 'continued using intents' affected to'functional value', 'monetary value', and 'social value' first, and also 'monetary value', 'emotional value', and 'social value' affected to 'recommending intents to others'. On the other hand, it was shown such like only 'social value' affected influences to 'brand switching intents', and thus contrary results with the factor 'constant-using intents' were displayed. So, it seems that there are many applications to service provides who are worrying about marketing strategies for making consumer retains (constant-using) and new consumer's inductions (brand-switching intents). Finally, as a result of looking into intermediating roles of the use-diffusion factor in relations between conceived values and royalties at hypothesis 4, 'using' and 'using diversity' were displayed as affecting significant influences all together. Regarding to research result's implications, for expanding and promoting continued uses of the mobile instant messaging service by service providers: First, encouraging recognitions on the perceived value connected to users' service usage are necessary. Second, setting up user's use-diffusion strategies are required so as to enhance the loyalty after understanding a fact that use-diffusion patterns affecting to the service's loyalty are different. Finally, methods of raising customer loyalties and making constant relationships have to be grouped by analyzing on what are the customer value's factors that can satisfy users in competitive alterations.

Corporate Credit Rating based on Bankruptcy Probability Using AdaBoost Algorithm-based Support Vector Machine (AdaBoost 알고리즘기반 SVM을 이용한 부실 확률분포 기반의 기업신용평가)

  • Shin, Taek-Soo;Hong, Tae-Ho
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.3
    • /
    • pp.25-41
    • /
    • 2011
  • Recently, support vector machines (SVMs) are being recognized as competitive tools as compared with other data mining techniques for solving pattern recognition or classification decision problems. Furthermore, many researches, in particular, have proved them more powerful than traditional artificial neural networks (ANNs) (Amendolia et al., 2003; Huang et al., 2004, Huang et al., 2005; Tay and Cao, 2001; Min and Lee, 2005; Shin et al., 2005; Kim, 2003).The classification decision, such as a binary or multi-class decision problem, used by any classifier, i.e. data mining techniques is so cost-sensitive particularly in financial classification problems such as the credit ratings that if the credit ratings are misclassified, a terrible economic loss for investors or financial decision makers may happen. Therefore, it is necessary to convert the outputs of the classifier into wellcalibrated posterior probabilities-based multiclass credit ratings according to the bankruptcy probabilities. However, SVMs basically do not provide such probabilities. So it required to use any method to create the probabilities (Platt, 1999; Drish, 2001). This paper applied AdaBoost algorithm-based support vector machines (SVMs) into a bankruptcy prediction as a binary classification problem for the IT companies in Korea and then performed the multi-class credit ratings of the companies by making a normal distribution shape of posterior bankruptcy probabilities from the loss functions extracted from the SVMs. Our proposed approach also showed that their methods can minimize the misclassification problems by adjusting the credit grade interval ranges on condition that each credit grade for credit loan borrowers has its own credit risk, i.e. bankruptcy probability.

A Real-Time Stock Market Prediction Using Knowledge Accumulation (지식 누적을 이용한 실시간 주식시장 예측)

  • Kim, Jin-Hwa;Hong, Kwang-Hun;Min, Jin-Young
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.109-130
    • /
    • 2011
  • One of the major problems in the area of data mining is the size of the data, as most data set has huge volume these days. Streams of data are normally accumulated into data storages or databases. Transactions in internet, mobile devices and ubiquitous environment produce streams of data continuously. Some data set are just buried un-used inside huge data storage due to its huge size. Some data set is quickly lost as soon as it is created as it is not saved due to many reasons. How to use this large size data and to use data on stream efficiently are challenging questions in the study of data mining. Stream data is a data set that is accumulated to the data storage from a data source continuously. The size of this data set, in many cases, becomes increasingly large over time. To mine information from this massive data, it takes too many resources such as storage, money and time. These unique characteristics of the stream data make it difficult and expensive to store all the stream data sets accumulated over time. Otherwise, if one uses only recent or partial of data to mine information or pattern, there can be losses of valuable information, which can be useful. To avoid these problems, this study suggests a method efficiently accumulates information or patterns in the form of rule set over time. A rule set is mined from a data set in stream and this rule set is accumulated into a master rule set storage, which is also a model for real-time decision making. One of the main advantages of this method is that it takes much smaller storage space compared to the traditional method, which saves the whole data set. Another advantage of using this method is that the accumulated rule set is used as a prediction model. Prompt response to the request from users is possible anytime as the rule set is ready anytime to be used to make decisions. This makes real-time decision making possible, which is the greatest advantage of this method. Based on theories of ensemble approaches, combination of many different models can produce better prediction model in performance. The consolidated rule set actually covers all the data set while the traditional sampling approach only covers part of the whole data set. This study uses a stock market data that has a heterogeneous data set as the characteristic of data varies over time. The indexes in stock market data can fluctuate in different situations whenever there is an event influencing the stock market index. Therefore the variance of the values in each variable is large compared to that of the homogeneous data set. Prediction with heterogeneous data set is naturally much more difficult, compared to that of homogeneous data set as it is more difficult to predict in unpredictable situation. This study tests two general mining approaches and compare prediction performances of these two suggested methods with the method we suggest in this study. The first approach is inducing a rule set from the recent data set to predict new data set. The seocnd one is inducing a rule set from all the data which have been accumulated from the beginning every time one has to predict new data set. We found neither of these two is as good as the method of accumulated rule set in its performance. Furthermore, the study shows experiments with different prediction models. The first approach is building a prediction model only with more important rule sets and the second approach is the method using all the rule sets by assigning weights on the rules based on their performance. The second approach shows better performance compared to the first one. The experiments also show that the suggested method in this study can be an efficient approach for mining information and pattern with stream data. This method has a limitation of bounding its application to stock market data. More dynamic real-time steam data set is desirable for the application of this method. There is also another problem in this study. When the number of rules is increasing over time, it has to manage special rules such as redundant rules or conflicting rules efficiently.

Extracting Beginning Boundaries for Efficient Management of Movie Storytelling Contents (스토리텔링 콘텐츠의 효과적인 관리를 위한 영화 스토리 발단부의 자동 경계 추출)

  • Park, Seung-Bo;You, Eun-Soon;Jung, Jason J.
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.279-292
    • /
    • 2011
  • Movie is a representative media that can transmit stories to audiences. Basically, a story is described by characters in the movie. Different from other simple videos, movies deploy narrative structures for explaining various conflicts or collaborations between characters. These narrative structures consist of 3 main acts, which are beginning, middle, and ending. The beginning act includes 1) introduction to main characters and backgrounds, and 2) conflicts implication and clues for incidents. The middle act describes the events developed by both inside and outside factors and the story dramatic tension heighten. Finally, in the end act, the events are developed are resolved, and the topic of story and message of writer are transmitted. When story information is extracted from movie, it is needed to consider that it has different weights by narrative structure. Namely, when some information is extracted, it has a different influence to story deployment depending on where it locates at the beginning, middle and end acts. The beginning act is the part that exposes to audiences for story set-up various information such as setting of characters and depiction of backgrounds. And thus, it is necessary to extract much kind information from the beginning act in order to abstract a movie or retrieve character information. Thereby, this paper proposes a novel method for extracting the beginning boundaries. It is the method that detects a boundary scene between the beginning act and middle using the accumulation graph of characters. The beginning act consists of the scenes that introduce important characters, imply the conflict relationship between them, and suggest clues to resolve troubles. First, a scene that the new important characters don't appear any more should be detected in order to extract a scene completed the introduction of them. The important characters mean the major and minor characters, which can be dealt as important characters since they lead story progression. Extra should be excluded in order to extract a scene completed the introduction of important characters in the accumulation graph of characters. Extra means the characters that appear only several scenes. Second, the inflection point is detected in the accumulation graph of characters. It is the point that the increasing line changes to horizontal line. Namely, when the slope of line keeps zero during long scenes, starting point of this line with zero slope becomes the inflection point. Inflection point will be detected in the accumulation graph of characters without extra. Third, several scenes are considered as additional story progression such as conflicts implication and clues suggestion. Actually, movie story can arrive at a scene located between beginning act and middle when additional several scenes are elapsed after the introduction of important characters. We will decide the ratio of additional scenes for total scenes by experiment in order to detect this scene. The ratio of additional scenes is gained as 7.67% by experiment. It is the story inflection point to change from beginning to middle act when this ratio is added to the inflection point of graph. Our proposed method consists of these three steps. We selected 10 movies for experiment and evaluation. These movies consisted of various genres. By measuring the accuracy of boundary detection experiment, we have shown that the proposed method is more efficient.

A Hybrid Forecasting Framework based on Case-based Reasoning and Artificial Neural Network (사례기반 추론기법과 인공신경망을 이용한 서비스 수요예측 프레임워크)

  • Hwang, Yousub
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.43-57
    • /
    • 2012
  • To enhance the competitive advantage in a constantly changing business environment, an enterprise management must make the right decision in many business activities based on both internal and external information. Thus, providing accurate information plays a prominent role in management's decision making. Intuitively, historical data can provide a feasible estimate through the forecasting models. Therefore, if the service department can estimate the service quantity for the next period, the service department can then effectively control the inventory of service related resources such as human, parts, and other facilities. In addition, the production department can make load map for improving its product quality. Therefore, obtaining an accurate service forecast most likely appears to be critical to manufacturing companies. Numerous investigations addressing this problem have generally employed statistical methods, such as regression or autoregressive and moving average simulation. However, these methods are only efficient for data with are seasonal or cyclical. If the data are influenced by the special characteristics of product, they are not feasible. In our research, we propose a forecasting framework that predicts service demand of manufacturing organization by combining Case-based reasoning (CBR) and leveraging an unsupervised artificial neural network based clustering analysis (i.e., Self-Organizing Maps; SOM). We believe that this is one of the first attempts at applying unsupervised artificial neural network-based machine-learning techniques in the service forecasting domain. Our proposed approach has several appealing features : (1) We applied CBR and SOM in a new forecasting domain such as service demand forecasting. (2) We proposed our combined approach between CBR and SOM in order to overcome limitations of traditional statistical forecasting methods and We have developed a service forecasting tool based on the proposed approach using an unsupervised artificial neural network and Case-based reasoning. In this research, we conducted an empirical study on a real digital TV manufacturer (i.e., Company A). In addition, we have empirically evaluated the proposed approach and tool using real sales and service related data from digital TV manufacturer. In our empirical experiments, we intend to explore the performance of our proposed service forecasting framework when compared to the performances predicted by other two service forecasting methods; one is traditional CBR based forecasting model and the other is the existing service forecasting model used by Company A. We ran each service forecasting 144 times; each time, input data were randomly sampled for each service forecasting framework. To evaluate accuracy of forecasting results, we used Mean Absolute Percentage Error (MAPE) as primary performance measure in our experiments. We conducted one-way ANOVA test with the 144 measurements of MAPE for three different service forecasting approaches. For example, the F-ratio of MAPE for three different service forecasting approaches is 67.25 and the p-value is 0.000. This means that the difference between the MAPE of the three different service forecasting approaches is significant at the level of 0.000. Since there is a significant difference among the different service forecasting approaches, we conducted Tukey's HSD post hoc test to determine exactly which means of MAPE are significantly different from which other ones. In terms of MAPE, Tukey's HSD post hoc test grouped the three different service forecasting approaches into three different subsets in the following order: our proposed approach > traditional CBR-based service forecasting approach > the existing forecasting approach used by Company A. Consequently, our empirical experiments show that our proposed approach outperformed the traditional CBR based forecasting model and the existing service forecasting model used by Company A. The rest of this paper is organized as follows. Section 2 provides some research background information such as summary of CBR and SOM. Section 3 presents a hybrid service forecasting framework based on Case-based Reasoning and Self-Organizing Maps, while the empirical evaluation results are summarized in Section 4. Conclusion and future research directions are finally discussed in Section 5.

Measuring the Economic Impact of Item Descriptions on Sales Performance (온라인 상품 판매 성과에 영향을 미치는 상품 소개글 효과 측정 기법)

  • Lee, Dongwon;Park, Sung-Hyuk;Moon, Songchun
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.1-17
    • /
    • 2012
  • Personalized smart devices such as smartphones and smart pads are widely used. Unlike traditional feature phones, theses smart devices allow users to choose a variety of functions, which support not only daily experiences but also business operations. Actually, there exist a huge number of applications accessible by smart device users in online and mobile application markets. Users can choose apps that fit their own tastes and needs, which is impossible for conventional phone users. With the increase in app demand, the tastes and needs of app users are becoming more diverse. To meet these requirements, numerous apps with diverse functions are being released on the market, which leads to fierce competition. Unlike offline markets, online markets have a limitation in that purchasing decisions should be made without experiencing the items. Therefore, online customers rely more on item-related information that can be seen on the item page in which online markets commonly provide details about each item. Customers can feel confident about the quality of an item through the online information and decide whether to purchase it. The same is true of online app markets. To win the sales competition against other apps that perform similar functions, app developers need to focus on writing app descriptions to attract the attention of customers. If we can measure the effect of app descriptions on sales without regard to the app's price and quality, app descriptions that facilitate the sale of apps can be identified. This study intends to provide such a quantitative result for app developers who want to promote the sales of their apps. For this purpose, we collected app details including the descriptions written in Korean from one of the largest app markets in Korea, and then extracted keywords from the descriptions. Next, the impact of the keywords on sales performance was measured through our econometric model. Through this analysis, we were able to analyze the impact of each keyword itself, apart from that of the design or quality. The keywords, comprised of the attribute and evaluation of each app, are extracted by a morpheme analyzer. Our model with the keywords as its input variables was established to analyze their impact on sales performance. A regression analysis was conducted for each category in which apps are included. This analysis was required because we found the keywords, which are emphasized in app descriptions, different category-by-category. The analysis conducted not only for free apps but also for paid apps showed which keywords have more impact on sales performance for each type of app. In the analysis of paid apps in the education category, keywords such as 'search+easy' and 'words+abundant' showed higher effectiveness. In the same category, free apps whose keywords emphasize the quality of apps showed higher sales performance. One interesting fact is that keywords describing not only the app but also the need for the app have asignificant impact. Language learning apps, regardless of whether they are sold free or paid, showed higher sales performance by including the keywords 'foreign language study+important'. This result shows that motivation for the purchase affected sales. While item reviews are widely researched in online markets, item descriptions are not very actively studied. In the case of the mobile app markets, newly introduced apps may not have many item reviews because of the low quantity sold. In such cases, item descriptions can be regarded more important when customers make a decision about purchasing items. This study is the first trial to quantitatively analyze the relationship between an item description and its impact on sales performance. The results show that our research framework successfully provides a list of the most effective sales key terms with the estimates of their effectiveness. Although this study is performed for a specified type of item (i.e., mobile apps), our model can be applied to almost all of the items traded in online markets.

Analyzing the Effect of Online media on Overseas Travels: A Case study of Asian 5 countries (해외 출국에 영향을 미치는 온라인 미디어 효과 분석: 아시아 5개국을 중심으로)

  • Lee, Hea In;Moon, Hyun Sil;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.53-74
    • /
    • 2018
  • Since South Korea has an economic structure that has a characteristic which market-dependent on overseas, the tourism industry is considered as a very important industry for the national economy, such as improving the country's balance of payments or providing income and employment increases. Accordingly, the necessity of more accurate forecasting on the demand in the tourism industry has been raised to promote its industry. In the related research, economic variables such as exchange rate and income have been used as variables influencing tourism demand. As information technology has been widely used, some researchers have also analyzed the effect of media on tourism demand. It has shown that the media has a considerable influence on traveler's decision making, such as choosing an outbound destination. Furthermore, with the recent availability of online information searches to obtain the latest information and two-way communication in social media, it is possible to obtain up-to-date information on travel more quickly than before. The information in online media such as blogs can naturally create the Word-of-Mouth effect by sharing useful information, which is called eWOM. Like all other service industries, the tourism industry is characterized by difficulty in evaluating its values before it is experienced directly. And furthermore, most of the travelers tend to search for more information in advance from various sources to reduce the perceived risk to the destination, so they can also be influenced by online media such as online news. In this study, we suggested that the number of online media posting, which causes the effects of Word-of-Mouth, may have an effect on the number of outbound travelers. We divided online media into public media and private media according to their characteristics and selected online news as public media and blog as private media, one of the most popular social media in tourist information. Based on the previous studies about the eWOM effects on online news and blog, we analyzed a relationship between the volume of eWOM and the outbound tourism demand through the panel model. To this end, we collected data on the number of national outbound travelers from 2007 to 2015 provided by the Korea Tourism Organization. According to statistics, the highest number of outbound tourism demand in Korea are China, Japan, Thailand, Hong Kong and the Philippines, which are selected as a dependent variable in this study. In order to measure the volume of eWOM, we collected online news and blog postings for the same period as the number of outbound travelers in Naver, which is the largest portal site in South Korea. In this study, a panel model was established to analyze the effect of online media on the demand of Korean outbound travelers and to identify that there was a significant difference in the influence of online media by each time and countries. The results of this study can be summarized as follows. First, the impact of the online news and blog eWOM on the number of outbound travelers was significant. We found that the number of online news and blog posting have an influence on the number of outbound travelers, especially the experimental result suggests that both the month that includes the departure date and the three months before the departure were found to have an effect. It is shown that online news and blog are online media that have a significant influence on outbound tourism demand. Next, we found that the increased volume of eWOM in online news has a negative effect on departure, while the increase in a blog has a positive effect. The result with the country-specific models would be the same. This paper shows that online media can be used as a new variable in tourism demand by examining the influence of the eWOM effect of the online media. Also, we found that both social media and news media have an important role in predicting and managing the Korean tourism demand and that the influence of those two media appears different depending on the country.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. Those text data were produced and distributed through various media platforms such as World Wide Web, Internet news feeds, microblog, and social media. However, this enormous amount of easily obtained information is lack of organization. Therefore, this problem has raised the interest of many researchers in order to manage this huge amount of information. Further, this problem also required professionals that are capable of classifying relevant information and hence text classification is introduced. Text classification is a challenging task in modern data analysis, which it needs to assign a text document into one or more predefined categories or classes. In text classification field, there are different kinds of techniques available such as K-Nearest Neighbor, Naïve Bayes Algorithm, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, while dealing with huge amount of text data, model performance and accuracy becomes a challenge. According to the type of words used in the corpus and type of features created for classification, the performance of a text classification model can be varied. Most of the attempts are been made based on proposing a new algorithm or modifying an existing algorithm. This kind of research can be said already reached their certain limitations for further improvements. In this study, aside from proposing a new algorithm or modifying the algorithm, we focus on searching a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of training data upon which this classifier is built. The real world datasets in most of the time contain noise, or in other words noisy data, these can actually affect the decision made by the classifiers built from these data. In this study, we consider that the data from different domains, which is heterogeneous data might have the characteristics of noise which can be utilized in the classification process. In order to build the classifier, machine learning algorithm is performed based on the assumption that the characteristics of training data and target data are the same or very similar to each other. However, in the case of unstructured data such as text, the features are determined according to the vocabularies included in the document. If the viewpoints of the learning data and target data are different, the features may be appearing different between these two data. In this study, we attempt to improve the classification accuracy by strengthening the robustness of the document classifier through artificially injecting the noise into the process of constructing the document classifier. With data coming from various kind of sources, these data are likely formatted differently. These cause difficulties for traditional machine learning algorithms because they are not developed to recognize different type of data representation at one time and to put them together in same generalization. Therefore, in order to utilize heterogeneous data in the learning process of document classifier, we apply semi-supervised learning in our study. However, unlabeled data might have the possibility to degrade the performance of the document classifier. Therefore, we further proposed a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contributing to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules will be selected and applied for the final decision making. In this paper, three different types of real-world data sources were used, which are news, twitter and blogs.

A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.221-241
    • /
    • 2018
  • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized. The importance of information classification is also increasing for efficient management of digital information produced exponentially. In this study, we tried to automatically classify and provide tailored information that can help companies decide to make technology commercialization. Therefore, we propose a method to classify information based on Korea Standard Industry Classification (KSIC), which indicates the business characteristics of enterprises. The classification of information or documents has been largely based on machine learning, but there is not enough training data categorized on the basis of KSIC. Therefore, this study applied the method of calculating similarity between documents. Specifically, a method and a model for presenting the most appropriate KSIC code are proposed by collecting explanatory texts of each code of KSIC and calculating the similarity with the classification object document using the vector space model. The IPC data were collected and classified by KSIC. And then verified the methodology by comparing it with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. As a result of the verification, the highest agreement was obtained when the LT method, which is a kind of TF-IDF calculation formula, was applied. At this time, the degree of match of the first rank matching KSIC was 53% and the cumulative match of the fifth ranking was 76%. Through this, it can be confirmed that KSIC classification of technology, industry, and market information that SMEs need more quantitatively and objectively is possible. In addition, it is considered that the methods and results provided in this study can be used as a basic data to help the qualitative judgment of experts in creating a linkage table between heterogeneous classification systems.

Issue tracking and voting rate prediction for 19th Korean president election candidates (댓글 분석을 통한 19대 한국 대선 후보 이슈 파악 및 득표율 예측)

  • Seo, Dae-Ho;Kim, Ji-Ho;Kim, Chang-Ki
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.199-219
    • /
    • 2018
  • With the everyday use of the Internet and the spread of various smart devices, users have been able to communicate in real time and the existing communication style has changed. Due to the change of the information subject by the Internet, data became more massive and caused the very large information called big data. These Big Data are seen as a new opportunity to understand social issues. In particular, text mining explores patterns using unstructured text data to find meaningful information. Since text data exists in various places such as newspaper, book, and web, the amount of data is very diverse and large, so it is suitable for understanding social reality. In recent years, there has been an increasing number of attempts to analyze texts from web such as SNS and blogs where the public can communicate freely. It is recognized as a useful method to grasp public opinion immediately so it can be used for political, social and cultural issue research. Text mining has received much attention in order to investigate the public's reputation for candidates, and to predict the voting rate instead of the polling. This is because many people question the credibility of the survey. Also, People tend to refuse or reveal their real intention when they are asked to respond to the poll. This study collected comments from the largest Internet portal site in Korea and conducted research on the 19th Korean presidential election in 2017. We collected 226,447 comments from April 29, 2017 to May 7, 2017, which includes the prohibition period of public opinion polls just prior to the presidential election day. We analyzed frequencies, associative emotional words, topic emotions, and candidate voting rates. By frequency analysis, we identified the words that are the most important issues per day. Particularly, according to the result of the presidential debate, it was seen that the candidate who became an issue was located at the top of the frequency analysis. By the analysis of associative emotional words, we were able to identify issues most relevant to each candidate. The topic emotion analysis was used to identify each candidate's topic and to express the emotions of the public on the topics. Finally, we estimated the voting rate by combining the volume of comments and sentiment score. By doing above, we explored the issues for each candidate and predicted the voting rate. The analysis showed that news comments is an effective tool for tracking the issue of presidential candidates and for predicting the voting rate. Particularly, this study showed issues per day and quantitative index for sentiment. Also it predicted voting rate for each candidate and precisely matched the ranking of the top five candidates. Each candidate will be able to objectively grasp public opinion and reflect it to the election strategy. Candidates can use positive issues more actively on election strategies, and try to correct negative issues. Particularly, candidates should be aware that they can get severe damage to their reputation if they face a moral problem. Voters can objectively look at issues and public opinion about each candidate and make more informed decisions when voting. If they refer to the results of this study before voting, they will be able to see the opinions of the public from the Big Data, and vote for a candidate with a more objective perspective. If the candidates have a campaign with reference to Big Data Analysis, the public will be more active on the web, recognizing that their wants are being reflected. The way of expressing their political views can be done in various web places. This can contribute to the act of political participation by the people.