• Title/Summary/Keyword: Research Information Systems

Search Result 12,210, Processing Time 0.053 seconds

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis to grasp the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained through splitting a sentence by space characters. There are several ways to derive word vectors, one of which is Word2Vec used for producing the 300 dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in the studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, cameras, etc. Unlike English, morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for a word '예쁘고', the morphemes are '예쁘(= adjective)' and '고(=connective ending)'. Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morphemes as a basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vector' as an input to a deep learning model rather than 'word vector' which is mainly used in English text. The morpheme vector refers to a vector representation for the morpheme and can be derived by applying an existent word vector derivation mechanism to the sentences divided into constituent morphemes. By the way, here come some questions as follows. What is the desirable range of POS(Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model which primarily relies on the form of words to Korean with a high homonym ratio? Will the text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with a lot of grammatical mistakes and variations? We seek to find empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions as follows. First, which is better effective, to use morpheme vectors from grammatically correct texts of other domain than the analysis target, or to use morpheme vectors from considerably ungrammatical texts of the same domain, as the initial input of a deep learning model? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonym, text preprocessing, minimum frequency? Third, can we get a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN(Convolutional Neural Network) model taking in the morpheme vectors. As for training and test datasets, Naver Shopping's 17,260 cosmetics product reviews are used. To derive morpheme vectors, we use data from the same domain as the target one and data from other domain; Naver shopping's about 2 million cosmetics product reviews and 520,000 Naver News data arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. First, they come from two types of data source; Naver news of high grammatical correctness and Naver shopping's cosmetics product reviews of low grammatical correctness. Second, they are distinguished in the degree of data preprocessing, namely, only splitting sentences or up to additional spelling and spacing corrections after sentence separation. Third, they vary concerning the form of input fed into a word vector model; whether the morphemes themselves are entered into a word vector model or with their POS tags attached. The morpheme vectors further vary depending on the consideration range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through CBOW(Continuous Bag-Of-Words) model with the context window 5 and the vector dimension 300. It seems that utilizing the same domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tags including incomprehensible category lead to the better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.

Product Community Analysis Using Opinion Mining and Network Analysis: Movie Performance Prediction Case (오피니언 마이닝과 네트워크 분석을 활용한 상품 커뮤니티 분석: 영화 흥행성과 예측 사례)

  • Jin, Yu;Kim, Jungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.49-65
    • /
    • 2014
  • Word of Mouth (WOM) is a behavior used by consumers to transfer or communicate their product or service experience to other consumers. Due to the popularity of social media such as Facebook, Twitter, blogs, and online communities, electronic WOM (e-WOM) has become important to the success of products or services. As a result, most enterprises pay close attention to e-WOM for their products or services. This is especially important for movies, as these are experiential products. This paper aims to identify the network factors of an online movie community that impact box office revenue using social network analysis. In addition to traditional WOM factors (volume and valence of WOM), network centrality measures of the online community are included as influential factors in box office revenue. Based on previous research results, we develop five hypotheses on the relationships between potential influential factors (WOM volume, WOM valence, degree centrality, betweenness centrality, closeness centrality) and box office revenue. The first hypothesis is that the accumulated volume of WOM in online product communities is positively related to the total revenue of movies. The second hypothesis is that the accumulated valence of WOM in online product communities is positively related to the total revenue of movies. The third hypothesis is that the average of degree centralities of reviewers in online product communities is positively related to the total revenue of movies. The fourth hypothesis is that the average of betweenness centralities of reviewers in online product communities is positively related to the total revenue of movies. The fifth hypothesis is that the average of betweenness centralities of reviewers in online product communities is positively related to the total revenue of movies. To verify our research model, we collect movie review data from the Internet Movie Database (IMDb), which is a representative online movie community, and movie revenue data from the Box-Office-Mojo website. The movies in this analysis include weekly top-10 movies from September 1, 2012, to September 1, 2013, with in total. We collect movie metadata such as screening periods and user ratings; and community data in IMDb including reviewer identification, review content, review times, responder identification, reply content, reply times, and reply relationships. For the same period, the revenue data from Box-Office-Mojo is collected on a weekly basis. Movie community networks are constructed based on reply relationships between reviewers. Using a social network analysis tool, NodeXL, we calculate the averages of three centralities including degree, betweenness, and closeness centrality for each movie. Correlation analysis of focal variables and the dependent variable (final revenue) shows that three centrality measures are highly correlated, prompting us to perform multiple regressions separately with each centrality measure. Consistent with previous research results, our regression analysis results show that the volume and valence of WOM are positively related to the final box office revenue of movies. Moreover, the averages of betweenness centralities from initial community networks impact the final movie revenues. However, both of the averages of degree centralities and closeness centralities do not influence final movie performance. Based on the regression results, three hypotheses, 1, 2, and 4, are accepted, and two hypotheses, 3 and 5, are rejected. This study tries to link the network structure of e-WOM on online product communities with the product's performance. Based on the analysis of a real online movie community, the results show that online community network structures can work as a predictor of movie performance. The results show that the betweenness centralities of the reviewer community are critical for the prediction of movie performance. However, degree centralities and closeness centralities do not influence movie performance. As future research topics, similar analyses are required for other product categories such as electronic goods and online content to generalize the study results.

A Methodology of Customer Churn Prediction based on Two-Dimensional Loyalty Segmentation (이차원 고객충성도 세그먼트 기반의 고객이탈예측 방법론)

  • Kim, Hyung Su;Hong, Seung Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.111-126
    • /
    • 2020
  • Most industries have recently become aware of the importance of customer lifetime value as they are exposed to a competitive environment. As a result, preventing customers from churn is becoming a more important business issue than securing new customers. This is because maintaining churn customers is far more economical than securing new customers, and in fact, the acquisition cost of new customers is known to be five to six times higher than the maintenance cost of churn customers. Also, Companies that effectively prevent customer churn and improve customer retention rates are known to have a positive effect on not only increasing the company's profitability but also improving its brand image by improving customer satisfaction. Predicting customer churn, which had been conducted as a sub-research area for CRM, has recently become more important as a big data-based performance marketing theme due to the development of business machine learning technology. Until now, research on customer churn prediction has been carried out actively in such sectors as the mobile telecommunication industry, the financial industry, the distribution industry, and the game industry, which are highly competitive and urgent to manage churn. In addition, These churn prediction studies were focused on improving the performance of the churn prediction model itself, such as simply comparing the performance of various models, exploring features that are effective in forecasting departures, or developing new ensemble techniques, and were limited in terms of practical utilization because most studies considered the entire customer group as a group and developed a predictive model. As such, the main purpose of the existing related research was to improve the performance of the predictive model itself, and there was a relatively lack of research to improve the overall customer churn prediction process. In fact, customers in the business have different behavior characteristics due to heterogeneous transaction patterns, and the resulting churn rate is different, so it is unreasonable to assume the entire customer as a single customer group. Therefore, it is desirable to segment customers according to customer classification criteria, such as loyalty, and to operate an appropriate churn prediction model individually, in order to carry out effective customer churn predictions in heterogeneous industries. Of course, in some studies, there are studies in which customers are subdivided using clustering techniques and applied a churn prediction model for individual customer groups. Although this process of predicting churn can produce better predictions than a single predict model for the entire customer population, there is still room for improvement in that clustering is a mechanical, exploratory grouping technique that calculates distances based on inputs and does not reflect the strategic intent of an entity such as loyalties. This study proposes a segment-based customer departure prediction process (CCP/2DL: Customer Churn Prediction based on Two-Dimensional Loyalty segmentation) based on two-dimensional customer loyalty, assuming that successful customer churn management can be better done through improvements in the overall process than through the performance of the model itself. CCP/2DL is a series of churn prediction processes that segment two-way, quantitative and qualitative loyalty-based customer, conduct secondary grouping of customer segments according to churn patterns, and then independently apply heterogeneous churn prediction models for each churn pattern group. Performance comparisons were performed with the most commonly applied the General churn prediction process and the Clustering-based churn prediction process to assess the relative excellence of the proposed churn prediction process. The General churn prediction process used in this study refers to the process of predicting a single group of customers simply intended to be predicted as a machine learning model, using the most commonly used churn predicting method. And the Clustering-based churn prediction process is a method of first using clustering techniques to segment customers and implement a churn prediction model for each individual group. In cooperation with a global NGO, the proposed CCP/2DL performance showed better performance than other methodologies for predicting churn. This churn prediction process is not only effective in predicting churn, but can also be a strategic basis for obtaining a variety of customer observations and carrying out other related performance marketing activities.

Target-Aspect-Sentiment Joint Detection with CNN Auxiliary Loss for Aspect-Based Sentiment Analysis (CNN 보조 손실을 이용한 차원 기반 감성 분석)

  • Jeon, Min Jin;Hwang, Ji Won;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.1-22
    • /
    • 2021
  • Aspect Based Sentiment Analysis (ABSA), which analyzes sentiment based on aspects that appear in the text, is drawing attention because it can be used in various business industries. ABSA is a study that analyzes sentiment by aspects for multiple aspects that a text has. It is being studied in various forms depending on the purpose, such as analyzing all targets or just aspects and sentiments. Here, the aspect refers to the property of a target, and the target refers to the text that causes the sentiment. For example, for restaurant reviews, you could set the aspect into food taste, food price, quality of service, mood of the restaurant, etc. Also, if there is a review that says, "The pasta was delicious, but the salad was not," the words "steak" and "salad," which are directly mentioned in the sentence, become the "target." So far, in ABSA, most studies have analyzed sentiment only based on aspects or targets. However, even with the same aspects or targets, sentiment analysis may be inaccurate. Instances would be when aspects or sentiment are divided or when sentiment exists without a target. For example, sentences like, "Pizza and the salad were good, but the steak was disappointing." Although the aspect of this sentence is limited to "food," conflicting sentiments coexist. In addition, in the case of sentences such as "Shrimp was delicious, but the price was extravagant," although the target here is "shrimp," there are opposite sentiments coexisting that are dependent on the aspect. Finally, in sentences like "The food arrived too late and is cold now." there is no target (NULL), but it transmits a negative sentiment toward the aspect "service." Like this, failure to consider both aspects and targets - when sentiment or aspect is divided or when sentiment exists without a target - creates a dual dependency problem. To address this problem, this research analyzes sentiment by considering both aspects and targets (Target-Aspect-Sentiment Detection, hereby TASD). This study detected the limitations of existing research in the field of TASD: local contexts are not fully captured, and the number of epochs and batch size dramatically lowers the F1-score. The current model excels in spotting overall context and relations between each word. However, it struggles with phrases in the local context and is relatively slow when learning. Therefore, this study tries to improve the model's performance. To achieve the objective of this research, we additionally used auxiliary loss in aspect-sentiment classification by constructing CNN(Convolutional Neural Network) layers parallel to existing models. If existing models have analyzed aspect-sentiment through BERT encoding, Pooler, and Linear layers, this research added CNN layer-adaptive average pooling to existing models, and learning was progressed by adding additional loss values for aspect-sentiment to existing loss. In other words, when learning, the auxiliary loss, computed through CNN layers, allowed the local context to be captured more fitted. After learning, the model is designed to do aspect-sentiment analysis through the existing method. To evaluate the performance of this model, two datasets, SemEval-2015 task 12 and SemEval-2016 task 5, were used and the f1-score increased compared to the existing models. When the batch was 8 and epoch was 5, the difference was largest between the F1-score of existing models and this study with 29 and 45, respectively. Even when batch and epoch were adjusted, the F1-scores were higher than the existing models. It can be said that even when the batch and epoch numbers were small, they can be learned effectively compared to the existing models. Therefore, it can be useful in situations where resources are limited. Through this study, aspect-based sentiments can be more accurately analyzed. Through various uses in business, such as development or establishing marketing strategies, both consumers and sellers will be able to make efficient decisions. In addition, it is believed that the model can be fully learned and utilized by small businesses, those that do not have much data, given that they use a pre-training model and recorded a relatively high F1-score even with limited resources.

A Study on Forecasting Accuracy Improvement of Case Based Reasoning Approach Using Fuzzy Relation (퍼지 관계를 활용한 사례기반추론 예측 정확성 향상에 관한 연구)

  • Lee, In-Ho;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.67-84
    • /
    • 2010
  • In terms of business, forecasting is a work of what is expected to happen in the future to make managerial decisions and plans. Therefore, the accurate forecasting is very important for major managerial decision making and is the basis for making various strategies of business. But it is very difficult to make an unbiased and consistent estimate because of uncertainty and complexity in the future business environment. That is why we should use scientific forecasting model to support business decision making, and make an effort to minimize the model's forecasting error which is difference between observation and estimator. Nevertheless, minimizing the error is not an easy task. Case-based reasoning is a problem solving method that utilizes the past similar case to solve the current problem. To build the successful case-based reasoning models, retrieving the case not only the most similar case but also the most relevant case is very important. To retrieve the similar and relevant case from past cases, the measurement of similarities between cases is an important key factor. Especially, if the cases contain symbolic data, it is more difficult to measure the distances. The purpose of this study is to improve the forecasting accuracy of case-based reasoning approach using fuzzy relation and composition. Especially, two methods are adopted to measure the similarity between cases containing symbolic data. One is to deduct the similarity matrix following binary logic(the judgment of sameness between two symbolic data), the other is to deduct the similarity matrix following fuzzy relation and composition. This study is conducted in the following order; data gathering and preprocessing, model building and analysis, validation analysis, conclusion. First, in the progress of data gathering and preprocessing we collect data set including categorical dependent variables. Also, the data set gathered is cross-section data and independent variables of the data set include several qualitative variables expressed symbolic data. The research data consists of many financial ratios and the corresponding bond ratings of Korean companies. The ratings we employ in this study cover all bonds rated by one of the bond rating agencies in Korea. Our total sample includes 1,816 companies whose commercial papers have been rated in the period 1997~2000. Credit grades are defined as outputs and classified into 5 rating categories(A1, A2, A3, B, C) according to credit levels. Second, in the progress of model building and analysis we deduct the similarity matrix following binary logic and fuzzy composition to measure the similarity between cases containing symbolic data. In this process, the used types of fuzzy composition are max-min, max-product, max-average. And then, the analysis is carried out by case-based reasoning approach with the deducted similarity matrix. Third, in the progress of validation analysis we verify the validation of model through McNemar test based on hit ratio. Finally, we draw a conclusion from the study. As a result, the similarity measuring method using fuzzy relation and composition shows good forecasting performance compared to the similarity measuring method using binary logic for similarity measurement between two symbolic data. But the results of the analysis are not statistically significant in forecasting performance among the types of fuzzy composition. The contributions of this study are as follows. We propose another methodology that fuzzy relation and fuzzy composition could be applied for the similarity measurement between two symbolic data. That is the most important factor to build case-based reasoning model.

Modern Paper Quality Control

  • Olavi Komppa
    • Proceedings of the Korea Technical Association of the Pulp and Paper Industry Conference
    • /
    • 2000.06a
    • /
    • pp.16-23
    • /
    • 2000
  • The increasing functional needs of top-quality printing papers and packaging paperboards, and especially the rapid developments in electronic printing processes and various computer printers during past few years, set new targets and requirements for modern paper quality. Most of these paper grades of today have relatively high filler content, are moderately or heavily calendered , and have many coating layers for the best appearance and performance. In practice, this means that many of the traditional quality assurance methods, mostly designed to measure papers made of pure. native pulp only, can not reliably (or at all) be used to analyze or rank the quality of modern papers. Hence, introduction of new measurement techniques is necessary to assure and further develop the paper quality today and in the future. Paper formation , i.e. small scale (millimeter scale) variation of basis weight, is the most important quality parameter of paper-making due to its influence on practically all the other quality properties of paper. The ideal paper would be completely uniform so that the basis weight of each small point (area) measured would be the same. In practice, of course, this is not possible because there always exists relatively large local variations in paper. However, these small scale basis weight variations are the major reason for many other quality problems, including calender blacking uneven coating result, uneven printing result, etc. The traditionally used visual inspection or optical measurement of the paper does not give us a reliable understanding of the material variations in the paper because in modern paper making process the optical behavior of paper is strongly affected by using e.g. fillers, dye or coating colors. Futhermore, the opacity (optical density) of the paper is changed at different process stages like wet pressing and calendering. The greatest advantage of using beta transmission method to measure paper formation is that it can be very reliably calibrated to measure true basis weight variation of all kinds of paper and board, independently on sample basis weight or paper grade. This gives us the possibility to measure, compare and judge papers made of different raw materials, different color, or even to measure heavily calendered, coated or printed papers. Scientific research of paper physics has shown that the orientation of the top layer (paper surface) fibers of the sheet paly the key role in paper curling and cockling , causing the typical practical problems (paper jam) with modern fax and copy machines, electronic printing , etc. On the other hand, the fiber orientation at the surface and middle layer of the sheet controls the bending stiffness of paperboard . Therefore, a reliable measurement of paper surface fiber orientation gives us a magnificent tool to investigate and predict paper curling and coclking tendency, and provides the necessary information to finetune, the manufacturing process for optimum quality. many papers, especially heavily calendered and coated grades, do resist liquid and gas penetration very much, bing beyond the measurement range of the traditional instruments or resulting invonveniently long measuring time per sample . The increased surface hardness and use of filler minerals and mechanical pulp make a reliable, nonleaking sample contact to the measurement head a challenge of its own. Paper surface coating causes, as expected, a layer which has completely different permeability characteristics compared to the other layer of the sheet. The latest developments in sensor technologies have made it possible to reliably measure gas flow in well controlled conditions, allowing us to investigate the gas penetration of open structures, such as cigarette paper, tissue or sack paper, and in the low permeability range analyze even fully greaseproof papers, silicon papers, heavily coated papers and boards or even detect defects in barrier coatings ! Even nitrogen or helium may be used as the gas, giving us completely new possibilities to rank the products or to find correlation to critical process or converting parameters. All the modern paper machines include many on-line measuring instruments which are used to give the necessary information for automatic process control systems. hence, the reliability of this information obtained from different sensors is vital for good optimizing and process stability. If any of these on-line sensors do not operate perfectly ass planned (having even small measurement error or malfunction ), the process control will set the machine to operate away from the optimum , resulting loss of profit or eventual problems in quality or runnability. To assure optimum operation of the paper machines, a novel quality assurance policy for the on-line measurements has been developed, including control procedures utilizing traceable, accredited standards for the best reliability and performance.

Comparison of Models for Stock Price Prediction Based on Keyword Search Volume According to the Social Acceptance of Artificial Intelligence (인공지능의 사회적 수용도에 따른 키워드 검색량 기반 주가예측모형 비교연구)

  • Cho, Yujung;Sohn, Kwonsang;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.103-128
    • /
    • 2021
  • Recently, investors' interest and the influence of stock-related information dissemination are being considered as significant factors that explain stock returns and volume. Besides, companies that develop, distribute, or utilize innovative new technologies such as artificial intelligence have a problem that it is difficult to accurately predict a company's future stock returns and volatility due to macro-environment and market uncertainty. Market uncertainty is recognized as an obstacle to the activation and spread of artificial intelligence technology, so research is needed to mitigate this. Hence, the purpose of this study is to propose a machine learning model that predicts the volatility of a company's stock price by using the internet search volume of artificial intelligence-related technology keywords as a measure of the interest of investors. To this end, for predicting the stock market, we using the VAR(Vector Auto Regression) and deep neural network LSTM (Long Short-Term Memory). And the stock price prediction performance using keyword search volume is compared according to the technology's social acceptance stage. In addition, we also conduct the analysis of sub-technology of artificial intelligence technology to examine the change in the search volume of detailed technology keywords according to the technology acceptance stage and the effect of interest in specific technology on the stock market forecast. To this end, in this study, the words artificial intelligence, deep learning, machine learning were selected as keywords. Next, we investigated how many keywords each week appeared in online documents for five years from January 1, 2015, to December 31, 2019. The stock price and transaction volume data of KOSDAQ listed companies were also collected and used for analysis. As a result, we found that the keyword search volume for artificial intelligence technology increased as the social acceptance of artificial intelligence technology increased. In particular, starting from AlphaGo Shock, the keyword search volume for artificial intelligence itself and detailed technologies such as machine learning and deep learning appeared to increase. Also, the keyword search volume for artificial intelligence technology increases as the social acceptance stage progresses. It showed high accuracy, and it was confirmed that the acceptance stages showing the best prediction performance were different for each keyword. As a result of stock price prediction based on keyword search volume for each social acceptance stage of artificial intelligence technologies classified in this study, the awareness stage's prediction accuracy was found to be the highest. The prediction accuracy was different according to the keywords used in the stock price prediction model for each social acceptance stage. Therefore, when constructing a stock price prediction model using technology keywords, it is necessary to consider social acceptance of the technology and sub-technology classification. The results of this study provide the following implications. First, to predict the return on investment for companies based on innovative technology, it is most important to capture the recognition stage in which public interest rapidly increases in social acceptance of the technology. Second, the change in keyword search volume and the accuracy of the prediction model varies according to the social acceptance of technology should be considered in developing a Decision Support System for investment such as the big data-based Robo-advisor recently introduced by the financial sector.

A Checklist to Improve the Fairness in AI Financial Service: Focused on the AI-based Credit Scoring Service (인공지능 기반 금융서비스의 공정성 확보를 위한 체크리스트 제안: 인공지능 기반 개인신용평가를 중심으로)

  • Kim, HaYeong;Heo, JeongYun;Kwon, Hochang
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.3
    • /
    • pp.259-278
    • /
    • 2022
  • With the spread of Artificial Intelligence (AI), various AI-based services are expanding in the financial sector such as service recommendation, automated customer response, fraud detection system(FDS), credit scoring services, etc. At the same time, problems related to reliability and unexpected social controversy are also occurring due to the nature of data-based machine learning. The need Based on this background, this study aimed to contribute to improving trust in AI-based financial services by proposing a checklist to secure fairness in AI-based credit scoring services which directly affects consumers' financial life. Among the key elements of trustworthy AI like transparency, safety, accountability, and fairness, fairness was selected as the subject of the study so that everyone could enjoy the benefits of automated algorithms from the perspective of inclusive finance without social discrimination. We divided the entire fairness related operation process into three areas like data, algorithms, and user areas through literature research. For each area, we constructed four detailed considerations for evaluation resulting in 12 checklists. The relative importance and priority of the categories were evaluated through the analytic hierarchy process (AHP). We use three different groups: financial field workers, artificial intelligence field workers, and general users which represent entire financial stakeholders. According to the importance of each stakeholder, three groups were classified and analyzed, and from a practical perspective, specific checks such as feasibility verification for using learning data and non-financial information and monitoring new inflow data were identified. Moreover, financial consumers in general were found to be highly considerate of the accuracy of result analysis and bias checks. We expect this result could contribute to the design and operation of fair AI-based financial services.

A Study on the Effect of Booth Recommendation System on Exhibition Visitors Unplanned Visit Behavior (전시장 참관객의 계획되지 않은 방문행동에 있어서 부스추천시스템의 영향에 대한 연구)

  • Chung, Nam-Ho;Kim, Jae-Kyung
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.175-191
    • /
    • 2011
  • With the MICE(Meeting, Incentive travel, Convention, Exhibition) industry coming into the spotlight, there has been a growing interest in the domestic exhibition industry. Accordingly, in Korea, various studies of the industry are being conducted to enhance exhibition performance as in the United States or Europe. Some studies are focusing particularly on analyzing visiting patterns of exhibition visitors using intelligent information technology in consideration of the variations in effects of watching exhibitions according to the exhibitory environment or technique, thereby understanding visitors and, furthermore, drawing the correlations between exhibiting businesses and improving exhibition performance. However, previous studies related to booth recommendation systems only discussed the accuracy of recommendation in the aspect of a system rather than determining changes in visitors' behavior or perception by recommendation. A booth recommendation system enables visitors to visit unplanned exhibition booths by recommending visitors suitable ones based on information about visitors' visits. Meanwhile, some visitors may be satisfied with their unplanned visits, while others may consider the recommending process to be cumbersome or obstructive to their free observation. In the latter case, the exhibition is likely to produce worse results compared to when visitors are allowed to freely observe the exhibition. Thus, in order to apply a booth recommendation system to exhibition halls, the factors affecting the performance of the system should be generally examined, and the effects of the system on visitors' unplanned visiting behavior should be carefully studied. As such, this study aims to determine the factors that affect the performance of a booth recommendation system by reviewing theories and literature and to examine the effects of visitors' perceived performance of the system on their satisfaction of unplanned behavior and intention to reuse the system. Toward this end, the unplanned behavior theory was adopted as the theoretical framework. Unplanned behavior can be defined as "behavior that is done by consumers without any prearranged plan". Thus far, consumers' unplanned behavior has been studied in various fields. The field of marketing, in particular, has focused on unplanned purchasing among various types of unplanned behavior, which has been often confused with impulsive purchasing. Nevertheless, the two are different from each other; while impulsive purchasing means strong, continuous urges to purchase things, unplanned purchasing is behavior with purchasing decisions that are made inside a store, not before going into one. In other words, all impulsive purchases are unplanned, but not all unplanned purchases are impulsive. Then why do consumers engage in unplanned behavior? Regarding this question, many scholars have made many suggestions, but there has been a consensus that it is because consumers have enough flexibility to change their plans in the middle instead of developing plans thoroughly. In other words, if unplanned behavior costs much, it will be difficult for consumers to change their prearranged plans. In the case of the exhibition hall examined in this study, visitors learn the programs of the hall and plan which booth to visit in advance. This is because it is practically impossible for visitors to visit all of the various booths that an exhibition operates due to their limited time. Therefore, if the booth recommendation system proposed in this study recommends visitors booths that they may like, they can change their plans and visit the recommended booths. Such visiting behavior can be regarded similarly to consumers' visit to a store or tourists' unplanned behavior in a tourist spot and can be understand in the same context as the recent increase in tourism consumers' unplanned behavior influenced by information devices. Thus, the following research model was established. This research model uses visitors' perceived performance of a booth recommendation system as the parameter, and the factors affecting the performance include trust in the system, exhibition visitors' knowledge levels, expected personalization of the system, and the system's threat to freedom. In addition, the causal relation between visitors' satisfaction of their perceived performance of the system and unplanned behavior and their intention to reuse the system was determined. While doing so, trust in the booth recommendation system consisted of 2nd order factors such as competence, benevolence, and integrity, while the other factors consisted of 1st order factors. In order to verify this model, a booth recommendation system was developed to be tested in 2011 DMC Culture Open, and 101 visitors were empirically studied and analyzed. The results are as follows. First, visitors' trust was the most important factor in the booth recommendation system, and the visitors who used the system perceived its performance as a success based on their trust. Second, visitors' knowledge levels also had significant effects on the performance of the system, which indicates that the performance of a recommendation system requires an advance understanding. In other words, visitors with higher levels of understanding of the exhibition hall learned better the usefulness of the booth recommendation system. Third, expected personalization did not have significant effects, which is a different result from previous studies' results. This is presumably because the booth recommendation system used in this study did not provide enough personalized services. Fourth, the recommendation information provided by the booth recommendation system was not considered to threaten or restrict one's freedom, which means it is valuable in terms of usefulness. Lastly, high performance of the booth recommendation system led to visitors' high satisfaction levels of unplanned behavior and intention to reuse the system. To sum up, in order to analyze the effects of a booth recommendation system on visitors' unplanned visits to a booth, empirical data were examined based on the unplanned behavior theory and, accordingly, useful suggestions for the establishment and design of future booth recommendation systems were made. In the future, further examination should be conducted through elaborate survey questions and survey objects.

Pareto Ratio and Inequality Level of Knowledge Sharing in Virtual Knowledge Collaboration: Analysis of Behaviors on Wikipedia (지식 공유의 파레토 비율 및 불평등 정도와 가상 지식 협업: 위키피디아 행위 데이터 분석)

  • Park, Hyun-Jung;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.3
    • /
    • pp.19-43
    • /
    • 2014
  • The Pareto principle, also known as the 80-20 rule, states that roughly 80% of the effects come from 20% of the causes for many events including natural phenomena. It has been recognized as a golden rule in business with a wide application of such discovery like 20 percent of customers resulting in 80 percent of total sales. On the other hand, the Long Tail theory, pointing out that "the trivial many" produces more value than "the vital few," has gained popularity in recent times with a tremendous reduction of distribution and inventory costs through the development of ICT(Information and Communication Technology). This study started with a view to illuminating how these two primary business paradigms-Pareto principle and Long Tail theory-relates to the success of virtual knowledge collaboration. The importance of virtual knowledge collaboration is soaring in this era of globalization and virtualization transcending geographical and temporal constraints. Many previous studies on knowledge sharing have focused on the factors to affect knowledge sharing, seeking to boost individual knowledge sharing and resolve the social dilemma caused from the fact that rational individuals are likely to rather consume than contribute knowledge. Knowledge collaboration can be defined as the creation of knowledge by not only sharing knowledge, but also by transforming and integrating such knowledge. In this perspective of knowledge collaboration, the relative distribution of knowledge sharing among participants can count as much as the absolute amounts of individual knowledge sharing. In particular, whether the more contribution of the upper 20 percent of participants in knowledge sharing will enhance the efficiency of overall knowledge collaboration is an issue of interest. This study deals with the effect of this sort of knowledge sharing distribution on the efficiency of knowledge collaboration and is extended to reflect the work characteristics. All analyses were conducted based on actual data instead of self-reported questionnaire surveys. More specifically, we analyzed the collaborative behaviors of editors of 2,978 English Wikipedia featured articles, which are the best quality grade of articles in English Wikipedia. We adopted Pareto ratio, the ratio of the number of knowledge contribution of the upper 20 percent of participants to the total number of knowledge contribution made by the total participants of an article group, to examine the effect of Pareto principle. In addition, Gini coefficient, which represents the inequality of income among a group of people, was applied to reveal the effect of inequality of knowledge contribution. Hypotheses were set up based on the assumption that the higher ratio of knowledge contribution by more highly motivated participants will lead to the higher collaboration efficiency, but if the ratio gets too high, the collaboration efficiency will be exacerbated because overall informational diversity is threatened and knowledge contribution of less motivated participants is intimidated. Cox regression models were formulated for each of the focal variables-Pareto ratio and Gini coefficient-with seven control variables such as the number of editors involved in an article, the average time length between successive edits of an article, the number of sections a featured article has, etc. The dependent variable of the Cox models is the time spent from article initiation to promotion to the featured article level, indicating the efficiency of knowledge collaboration. To examine whether the effects of the focal variables vary depending on the characteristics of a group task, we classified 2,978 featured articles into two categories: Academic and Non-academic. Academic articles refer to at least one paper published at an SCI, SSCI, A&HCI, or SCIE journal. We assumed that academic articles are more complex, entail more information processing and problem solving, and thus require more skill variety and expertise. The analysis results indicate the followings; First, Pareto ratio and inequality of knowledge sharing relates in a curvilinear fashion to the collaboration efficiency in an online community, promoting it to an optimal point and undermining it thereafter. Second, the curvilinear effect of Pareto ratio and inequality of knowledge sharing on the collaboration efficiency is more sensitive with a more academic task in an online community.