Search | Korea Science

Real-time CRM Strategy of Big Data and Smart Offering System: KB Kookmin Card Case (KB국민카드의 빅데이터를 활용한 실시간 CRM 전략: 스마트 오퍼링 시스템)

Choi, Jaewon;Sohn, Bongjin;Lim, Hyuna
- Journal of Intelligence and Information Systems / v.25 no.2 / pp.1-23 / 2019
Big data refers to data that is difficult to store, manage, and analyze with existing software. As changing lifestyles increase the range and variety of consumer needs, companies are investing considerable time and money to understand those needs. Companies in various industries use big data to improve their products and services, analyze unstructured data, and respond in real time to reactions to their products and services. The financial industry operates decision support systems that use financial data to develop financial products and manage customer risk. By using big data, financial institutions can create added value across the value chain and develop more advanced customer relationship management (CRM) strategies. They can draw on the purchase data and unstructured data generated by credit card use to identify and satisfy customers' wants. As CRM has grown alongside information and knowledge systems, it has acquired granular processes that can be measured in real time. With the development of information services and CRM, platforms have changed, and it has become possible to meet consumer needs in a variety of environments. Recently, as consumer needs have diversified, more companies are providing systematic marketing services using data mining and advanced CRM techniques. KB Kookmin Card, which started its credit card business in 1980, stabilized its processes and computer systems early and has actively adopted new technologies and systems. In 2011, the credit card company was separated from the bank, and it went on to lead the market with the 'Hye-dam Card' and 'One Card', which departed from existing card concepts. In 2017, total domestic credit card and check card use grew by 5.6% year-on-year to 886 trillion won. In 2018, KB Kookmin Card received a long-term credit rating of AA+.
The study confirms that this top-tier credit rating was achieved through effective marketing strategies and services. At present, KB Kookmin Card emphasizes strategies that meet customers' individual needs and maximize customer lifetime value by using customers' payment data. The company combines internal and external big data to conduct marketing in real time and to build monitoring systems. Using customer card information, it has built a marketing system that detects real-time behavior from big data such as homepage visits and purchase history. The system is designed to capture customer action events in real time and to execute marketing based on the stores, locations, amounts, and usage patterns of card transactions. The company has created more than 280 scenarios based on the customer life cycle and runs marketing plans that address various customer groups in real time. It operates the smart offering system, a highly efficient marketing management system that detects customers' card usage, behavior, and location information in real time and provides more refined services in combination with various apps. This study traces the change in CRM strategy from traditional CRM to the current CRM strategy. Finally, it examines the current CRM strategy through KB Kookmin Card's big data utilization strategy and marketing activities and proposes a marketing plan for the company's future CRM strategy. For the success and continued growth of the smart offering system, KB Kookmin Card should invest in securing increasingly sophisticated ICT technology and human resources. It also needs to establish a strategy for securing profit from a long-term perspective and pursue it systematically.
In particular, now that privacy violations and personal information leaks are matters of public concern, the company should work to gain customers' acceptance of marketing that uses customer information and to build a corporate image that emphasizes security.
https://doi.org/10.13088/jiis.2019.25.2.001

A Study on the Simcho of Wooden Pagodas in Baekjae (백제의 심초 및 사리봉안)

Jung, Ja Young
- Korean Journal of Heritage: History & Science / v.41 no.1 / pp.109-125 / 2008
Recently, excavations of wooden pagodas from the Three Kingdoms and Unified Shilla periods have increased, and new data on the erection of wooden pagodas are being found, advancing research in this field. Studies of wooden pagodas in Korea used to deal mainly with plan layout, construction techniques, and sarijangeomgu (sarira reliquaries), but the new data now make it possible to study the stylobate construction procedure and its transition, as well as the restoration of wooden pagodas. Furthermore, similar pagoda sites have been found in China and Japan, making comparative studies of ancient wooden pagodas possible. This paper focuses on Baekjae wooden pagodas, which have been the most frequently studied, and in particular on the location of the simcho (central base stone) and of the sarira enshrinement. The results show that the position of the simcho changed from underground $\rightarrow$ halfway underground $\rightarrow$ above ground. In Baekjae wooden pagodas up to the mid sixth century, such as those at Neungsan-ri saji (AD 567) and Wangheungsaji (AD 577), the simcho was located underground; later it was placed halfway underground and then above ground. By the 7th century it had become customary to place the simcho above ground, as seen at the Jaeseoksaji (AD 639) and Hwangnyongsaji (AD 645) wooden pagoda sites. The sarira was usually located on the south side of the simcho but gradually moved to the center. In particular, the sarira was combined with the simcho in the mid sixth century at Wangheungsaji. This is approximately 11 years earlier than the Bijosa (AD 588) simcho found in Japan, and the arrangement is not found even in the simcho of the wooden pagodas at Yeongnyeongsa (AD 516) and Jopaengseong temple (AD 535~561) in China, showing that the Wangheungsaji simcho was the earliest of its kind.
https://doi.org/10.22755/kjchs.2008.41.1.109

On Using Near-surface Remote Sensing Observation for Evaluation Gross Primary Productivity and Net Ecosystem CO₂ Partitioning (근거리 원격탐사 기법을 이용한 총일차생산량 추정 및 순생태계 CO₂ 교환량 배분의 정확도 평가에 관하여)

Park, Juhan;Kang, Minseok;Cho, Sungsik;Sohn, Seungwon;Kim, Jongho;Kim, Su-Jin;Lim, Jong-Hwan;Kang, Mingu;Shim, Kyo-Moon
- Korean Journal of Agricultural and Forest Meteorology / v.23 no.4 / pp.251-267 / 2021
Remotely sensed vegetation indices (VIs) are empirically related to gross primary productivity (GPP) at various spatio-temporal scales. The uncertainties in the GPP-VI relationship increase with temporal resolution. Uncertainty also exists in the eddy covariance (EC)-based estimation of GPP, arising from the partitioning of the measured net ecosystem CO₂ exchange (NEE) into GPP and ecosystem respiration (RE). For two forest sites and two agricultural sites, we correlated the EC-derived GPP at various time scales with three near-surface remotely sensed VIs: (1) normalized difference vegetation index (NDVI), (2) enhanced vegetation index (EVI), and (3) near infrared reflectance from vegetation (NIRv), along with NIRvP (i.e., NIRv multiplied by photosynthetically active radiation, PAR). Among the compared VIs, NIRvP showed the highest correlation with half-hourly and monthly GPP at all sites. NIRvP was then used to test the reliability of GPP derived by two different NEE partitioning methods: (1) the original KoFlux method (GPP_Ori) and (2) a machine-learning based method (GPP_ANN). GPP_ANN showed a higher correlation with NIRvP at the half-hourly time scale, but there was no difference at the daily time scale. The NIRvP-GPP correlation was lower under clear sky conditions because GPP is co-limited by other environmental conditions such as air temperature, vapor pressure deficit, and soil moisture. Under cloudy conditions, however, when photosynthesis is mainly limited by radiation, NIRvP was more promising for testing the credibility of NEE partitioning methods. Although further analyses are needed, the results suggest that NIRvP can be used as a proxy for GPP at high temporal resolution. For VI-based GPP estimation at high temporal resolution to be meaningful, however, complex systems-based analysis methods (related to systems thinking and self-organization, going beyond the empirical VI-GPP relationship) should be developed.
https://doi.org/10.5532/KJAFM.2021.23.4.251
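
The indices named in the abstract above can be illustrated with a minimal sketch. The abstract itself only states that NIRvP is NIRv multiplied by PAR; the NDVI and EVI formulas and the definition of NIRv as NDVI times NIR reflectance are the commonly used ones and are assumed here, not taken from the paper. The sample reflectance and PAR values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients (assumed)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def nirv(nir, red):
    """NIRv, taken here as NDVI multiplied by NIR reflectance (assumed definition)."""
    return ndvi(nir, red) * nir


def nirvp(nir, red, par):
    """NIRvP: NIRv multiplied by photosynthetically active radiation (PAR)."""
    return nirv(nir, red) * par


# Hypothetical half-hourly sample: unitless reflectances and PAR in arbitrary units.
sample = {"nir": 0.40, "red": 0.05, "blue": 0.03, "par": 1200.0}

print("NDVI :", round(ndvi(sample["nir"], sample["red"]), 4))
print("EVI  :", round(evi(sample["nir"], sample["red"], sample["blue"]), 4))
print("NIRv :", round(nirv(sample["nir"], sample["red"]), 4))
print("NIRvP:", round(nirvp(sample["nir"], sample["red"], sample["par"]), 2))
```

Because NIRvP scales NIRv by incoming radiation, it varies within the day even when canopy reflectance is nearly constant, which is consistent with its use at the half-hourly scale described in the abstract.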

A Study on Risk Parity Asset Allocation Model with XGBoost (XGBoost를 활용한 리스크패리티 자산배분 모형에 관한 연구)

Kim, Younghoon;Choi, HeungSik;Kim, SunWoong
- Journal of Intelligence and Information Systems / v.26 no.1 / pp.135-149 / 2020
Artificial intelligence is changing the world, and the financial market is no exception. Robo-advisors are being actively developed, compensating for the weaknesses of traditional asset allocation methods and taking over the parts that are difficult for those methods. A robo-advisor makes automated investment decisions with artificial intelligence algorithms and is used with various asset allocation models such as the mean-variance model, the Black-Litterman model, and the risk parity model. The risk parity model is a typical risk-based asset allocation model that focuses on the volatility of assets. Because it avoids investment risk structurally, it is stable in the management of large funds and has been widely used in finance. XGBoost is a parallel tree-boosting method, an optimized gradient boosting model designed to be highly efficient and flexible. It can handle billions of examples in memory-limited environments and trains much faster than traditional boosting methods. It is frequently used in various fields of data analysis. In this study, we propose a new asset allocation model that combines the risk parity model with the XGBoost machine learning model. The model uses XGBoost to predict the risk of assets and applies the predicted risk to the covariance estimation process. Because an optimized asset allocation model estimates investment proportions from historical data, estimation errors arise between the estimation period and the actual investment period. These estimation errors adversely affect the performance of the optimized portfolio. This study aims to improve the stability and portfolio performance of the model by predicting the volatility of the next investment period and thereby reducing the estimation errors of the optimized asset allocation model. As a result, it narrows the gap between theory and practice and proposes a more advanced asset allocation model.
For the empirical test of the proposed model, we used Korean stock market price data covering 17 years, from 2003 to 2019. The data sets consist of the energy, finance, IT, industrial, material, telecommunication, utility, consumer, health care, and staple sectors. We accumulated predictions using a moving-window method with 1,000 in-sample and 20 out-of-sample observations, producing a total of 154 rebalancing back-testing results. We analyzed portfolio performance in terms of cumulative rate of return, and the long test period provided a large number of samples. Compared with the traditional risk parity model, the experiment showed improvements in both cumulative return and estimation error. The total cumulative return is 45.748%, about 5% higher than that of the risk parity model, and the estimation errors are reduced in 9 out of 10 industry sectors. The reduction of estimation errors increases the stability of the model and makes it easier to apply in practical investment. The results of the experiment showed an improvement in portfolio performance from reducing the estimation errors of the optimized asset allocation model. Many financial models and asset allocation models are limited in practical investment by the fundamental question of whether the past characteristics of assets will continue into the future in a changing financial market. This study, however, not only takes advantage of traditional asset allocation models but also compensates for their limitations and increases stability by predicting the risks of assets with a recent algorithm. There are various studies on parametric estimation methods for reducing estimation errors in portfolio optimization. We suggest a new method that reduces estimation errors in an optimized asset allocation model using machine learning.
This study is therefore meaningful in that it proposes an advanced artificial intelligence asset allocation model for fast-developing financial markets.
https://doi.org/10.13088/jiis.2020.26.1.135
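
Two mechanics mentioned in the abstract above, risk-based weighting and the moving window of 1,000 in-sample and 20 out-of-sample observations, can be sketched as follows. This is not the authors' model: inverse-volatility weighting is a simplified form of risk parity that equalizes risk contributions only when correlations are zero, and the volatilities below are hypothetical stand-ins for what a predictive model such as XGBoost would supply. All function names are illustrative.

```python
def inverse_volatility_weights(vols):
    """Weights proportional to 1/volatility, normalized to sum to 1."""
    inv = [1.0 / v for v in vols]
    total = sum(inv)
    return [x / total for x in inv]


def risk_contributions(weights, cov):
    """Share of portfolio variance attributed to each asset."""
    n = len(weights)
    marginal = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    port_var = sum(w * m for w, m in zip(weights, marginal))
    return [w * m / port_var for w, m in zip(weights, marginal)]


def rolling_windows(n_obs, in_sample=1000, out_sample=20):
    """Yield (train_start, train_end, test_end) index triples, stepping by out_sample."""
    start = 0
    while start + in_sample + out_sample <= n_obs:
        yield start, start + in_sample, start + in_sample + out_sample
        start += out_sample


# Hypothetical predicted volatilities for three assets (stand-ins for model output).
predicted_vols = [0.10, 0.20, 0.40]
weights = inverse_volatility_weights(predicted_vols)

# Diagonal covariance built from the same volatilities (zero correlation assumed).
cov = [[v * v if i == j else 0.0 for j in range(3)]
       for i, v in enumerate(predicted_vols)]

print("weights:", [round(w, 4) for w in weights])
print("risk contributions:", [round(rc, 4) for rc in risk_contributions(weights, cov)])
print("windows for 4,080 observations:", len(list(rolling_windows(4080))))
```

With zero correlations, each asset contributes one third of portfolio variance. The series length of 4,080 is a hypothetical value chosen so that the window arithmetic yields 154 rebalancing points; the actual number of observations is not stated in the abstract.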

The Measurement of Sensitivity and Comparative Analysis of Simplified Quantitation Methods to Measure Dopamine Transporters Using [I-123]IPT Pharmacokinetic Computer Simulations ([I-123]IPT 약역학 컴퓨터시뮬레이션을 이용한 민감도 측정 및 간편화된 운반체 정량분석 방법들의 비교분석 연구)

Son, Hye-Kyung;Nha, Sang-Kyun;Lee, Hee-Kyung;Kim, Hee-Joung
- The Korean Journal of Nuclear Medicine / v.31 no.1 / pp.19-29 / 1997
Recently, [I-123]IPT SPECT has been used for early diagnosis of Parkinson's patients (PP) by imaging dopamine transporters. Dynamic time activity curves in the basal ganglia (BG) and occipital cortex (OCC) were obtained for 2 hours without blood samples. These data were then used to measure dopamine transporters by operationally defined ratio methods: (BG-OCC)/OCC at 2 hrs, the binding potential $R_v = k_3/k_4$ obtained with a graphic method, or $R_A = (\mathrm{ABBG} - \mathrm{ABOCC})/\mathrm{ABOCC}$ over 2 hrs, where ABBG represents the accumulated binding activity in the basal ganglia ($\int_0^{120\,\mathrm{min}} \mathrm{BG}(t)\,dt$) and ABOCC represents the accumulated binding activity in the occipital cortex ($\int_0^{120\,\mathrm{min}} \mathrm{OCC}(t)\,dt$). The purpose of this study was to examine IPT pharmacokinetics and to investigate the usefulness of the simplified measures (BG-OCC)/OCC, $R_A$, and $R_v$, which are often assumed to reflect the true value of $k_3/k_4$. The rate constants $K_1$, $k_2$, $k_3$, and $k_4$ used for the simulations were derived from [I-123]IPT SPECT and arterialized blood data with a standard three-compartment model. The sensitivities and time activity curves in BG and OCC were computed by varying $K_1$ and $k_3$ (BG only) at 5-min intervals over 2 hours. The values of (BG-OCC)/OCC, $R_A$, and $R_v$ were then computed from the time activity curves, and linear regression analysis was used to measure the accuracy of these methods. The rate constants $K_1$, $k_2$, $k_3$, $k_4$ at BG and OCC were $1.26 \pm 5.41\%$, $0.044 \pm 19.58\%$, $0.031 \pm 24.36\%$, $0.008 \pm 22.78\%$ and $1.36 \pm 4.76\%$, $0.170 \pm 6.89\%$, $0.007 \pm 23.89\%$, $0.007 \pm 45.09\%$, respectively. The sensitivities $(\Delta S/S)/(\Delta k_3/k_3)$ and $(\Delta S/S)/(\Delta K_1/K_1)$ at 30 min and 120 min were (0.19, 0.50) and (0.61, 0.23), respectively. The correlation coefficients and slopes of ((BG-OCC)/OCC, $R_A$, $R_v$) against $k_3/k_4$ were (0.98, 1.00, 0.99) and (1.76, 0.47, 1.25), respectively.
These simulation results indicate that a late [I-123]IPT SPECT image may represent the distribution of the dopamine transporters. Good correlations were shown between (BG-OCC)/OCC, $R_A$, or $R_v$ and the true $k_3/k_4$, although the slopes between them were not unity. Pharmacokinetic computer simulations may be a very useful technique in studying dopamine transporter systems.
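
The two simplified ratio measures defined in the abstract above, (BG-OCC)/OCC at the end of the scan and $R_A$ from the accumulated activities, can be computed from sampled time activity curves as in the sketch below. The curves are hypothetical (a flat OCC curve and a linearly rising BG curve, in arbitrary units), the integrals are approximated with the trapezoidal rule, and the function names are illustrative. The compartmental simulation itself is not reproduced.

```python
def trapezoid(times, values):
    """Integrate values over times with the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return area


def ratio_at_end(bg, occ):
    """(BG - OCC) / OCC evaluated at the last time point."""
    return (bg[-1] - occ[-1]) / occ[-1]


def accumulated_ratio(times, bg, occ):
    """R_A = (ABBG - ABOCC) / ABOCC, with AB the integral of the curve over the scan."""
    ab_bg = trapezoid(times, bg)
    ab_occ = trapezoid(times, occ)
    return (ab_bg - ab_occ) / ab_occ


# Hypothetical time activity curves sampled every 5 min for 120 min (arbitrary units).
times = list(range(0, 125, 5))
occ = [1.0 for _ in times]
bg = [1.0 + t / 60.0 for t in times]

print("(BG-OCC)/OCC at 120 min:", ratio_at_end(bg, occ))
print("R_A over 120 min:", round(accumulated_ratio(times, bg, occ), 6))
```

For these made-up curves the end-point ratio is larger than the accumulated ratio, since $R_A$ averages over the early part of the scan where BG and OCC are still close. This illustrates why the two measures can correlate with $k_3/k_4$ while having different slopes.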

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
- Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, greatly influencing society. This phenomenon is unmatched in history, and we now live in the Age of Big Data. SNS data satisfies the defining conditions of Big Data: the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety). Because SNS Big Data covers the whole of society, discovering the trend of an issue in it provides an important new source for creating value. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time series graph of a topic over a month; (3) show the importance of a topic through a treemap based on the score system and frequency; (4) visualize the daily time-series graph of a keyword found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. In addition, such analysis requires recent big data technology that can rapidly process large amounts of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale from a single node to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schema or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; it makes interaction with data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, which consists of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
https://doi.org/10.13088/jiis.2014.20.2.109
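
Two of the functions listed in the abstract above, the daily keyword ranking and the daily time series of a keyword, can be illustrated with a small frequency-count sketch. This is a deliberate simplification: it counts words after stop-word removal and does not perform topic modeling, it uses whitespace tokenization for English text rather than Korean noun extraction, and it runs in memory rather than on Hadoop or MongoDB. The tweets, the stop-word list, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Tiny illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "is", "in", "of", "and", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?#@:;\"'()") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def daily_keyword_counts(tweets):
    """Map each date string to a Counter of keyword frequencies."""
    counts = defaultdict(Counter)
    for date, text in tweets:
        counts[date].update(tokenize(text))
    return counts


def top_keywords(counts, date, k=3):
    """Return the k most frequent keywords for one day."""
    return [word for word, _ in counts[date].most_common(k)]


def keyword_series(counts, keyword):
    """Return (date, frequency) pairs for one keyword, ordered by date."""
    return [(date, counts[date][keyword]) for date in sorted(counts)]


# Hypothetical (date, text) pairs standing in for collected tweets.
tweets = [
    ("2013-03-01", "The election is trending in Seoul"),
    ("2013-03-01", "Election debate tonight"),
    ("2013-03-02", "Seoul weather and the election"),
]

counts = daily_keyword_counts(tweets)
print(top_keywords(counts, "2013-03-01", k=1))
print(keyword_series(counts, "election"))
```

The per-day counters correspond to the data a daily ranking view would read, and the series function corresponds to the data behind a keyword time-series graph.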

Dispersion of Standing Stones at Noseongsan(Mt.Noseong) and Aspect of the Stone Decorated Garden(Soo-suk Jeongwon) at Chongsuk-Sa(Chongsuk Buddhist Temple) in Nonsan City (논산 노성산(魯城山)의 입석(立石) 분포와 총석사(叢石寺) 수석(樹石)의 정원적 면모)

Rho, Jae Hyun;Huh, Joon;Jang, Il Young
- Korean Journal of Heritage: History & Science / v.43 no.1 / pp.160-189 / 2010
This study was designed to identify the current state, shapes, and meaning of the standing stones and rock pillars across the area of Noseong Mountain Fortress in Nonsan City, which have never been reported academically. Accordingly, the research examined the spatial identity of Noseong Mt. and Noseong Mountain Fortress and the distribution of standing stones scattered inside and outside the fortress, and investigated and analyzed the shapes and structural characteristics of the stones, focusing on Chongsuk Temple, which was judged to have the highest density of standing stones and the greatest value for preservation as a cultural property. Considering the reference to 'Top Sa' (tower temple) in the 'Bul Woo Jo' (article on Buddhist buildings) of 'Shinjoong Dongguk Yeoji Seungram', the likely existence of the temple indicated by the field survey, and the excavation records of roof tile pieces bearing the name 'Gwan Eum Temple', it is presumed that there was a Buddhist sanctum inside the fortress and that it may be connected to the carved letters 'Chongsuk Temple'. According to the observation survey, Chongsuk Temple, the sixth of the many standing-stone sites inside the fortress, has strong characteristics of an artificially constructed space, judging from the size of the trees and stones, the combined composition of trees and stones, the traces of the adjacent well and strand, and the construction of a stairway leading to the stone gate.
Along with the constellation of the Big Dipper carved on a rock in the same space, the stones carved with the letters 'Shinseonam', 'Chilseongam', and 'Daejangam', as well as 'Chongsuksa', and the carved statue of Buddha, presumed to be Avalokitesvara (Guan Yin), offer clues suggesting that the space was devoted to Chilseong and the mountain god (folk belief), originating from a combination of Buddhism, Taoism, and folk religion. Actual measurement of the standing stones at Chongsuk Temple showed large differences in height among the 24 stones in total, ranging from 29 cm to 402 cm, and the average distance between stones was 23.6 cm. The stones were either standing or flat, and various stones, such as mountain-like stones and Buddha-like stones, were placed in special or assorted arrangements, but the stones consistently faced west. Compared with the traces of ZEN landscape gardens well known in the country, the three flat stones, apart from the standing and shaped stones, appeared to have the shape of a meditation statue, a typical formative element of a ZEN landscape garden, on the basis of the stone arrangement technique. Among them, the flat stone facing the Buddhist saint statue was formed as a symbolic three-mountain stone and is presumed to have been an offering stone for sacrificial food rather than a place for ZEN meditation.
Considering that the standing stones at Chongsuk Temple were arranged by a composite stone-setting method based on a scalene triangle with a ratio of 3:5:7, seeking in-depth beauty on the basis of the stone statues of three Buddhas in which the three elements of heaven, earth, and humans are embodied in elevated or flat form, the stones and the space at Chongsuk Temple appear to be the trace of a condensed garden built with stones for a temple so that it could be used for ZEN meditation.
https://doi.org/10.22755/kjchs.2010.43.1.160

How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
- Journal of Intelligence and Information Systems / v.25 no.1 / pp.219-239 / 2019
As providing customized services to individuals becomes more important, research on personalized recommendation systems is being carried out constantly. Collaborative filtering is one of the most popular approaches in academia and industry. However, it has a limitation in that recommendations have mostly been based on quantitative information such as users' ratings, which lowers accuracy. To solve this problem, many studies have attempted to improve the performance of recommendation systems by using information other than quantitative ratings. A good example is sentiment analysis of customer review text. Nevertheless, existing research has not directly combined the results of sentiment analysis with quantitative rating scores in the recommendation system. Therefore, this study aims to reflect the sentiments shown in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into quantitative information and reflects it directly in the recommendation system. To do this, we needed to quantify users' reviews, which are originally qualitative information. In this study, the sentiment score was calculated through the sentiment analysis technique of text mining. The data consisted of movie reviews. Based on the data, a domain-specific sentiment dictionary was constructed for movie reviews. Regression analysis was used to construct the sentiment dictionary. Positive and negative dictionaries were each constructed using the Lasso regression, Ridge regression, and ElasticNet methods. The accuracy of the constructed sentiment dictionaries was verified through a confusion matrix. The accuracy of the Lasso-based dictionary was 70%, that of the Ridge-based dictionary was 79%, and that of the ElasticNet (${\alpha}=0.3$) dictionary was 83%.
Therefore, in this study, the sentiment score of a review is calculated based on the ElasticNet dictionary. It was combined with the rating to create a new rating. In this paper, we show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that considers only the existing rating. To show this, the proposed algorithm was applied to memory-based user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), and model-based matrix factorization with SVD and SVD++. For these algorithms, the mean absolute error (MAE) and the root mean square error (RMSE) were calculated to compare a recommendation system that combines sentiment scores with ratings against a system that considers ratings only. When the evaluation index was MAE, it improved by 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++. When the evaluation index was RMSE, the improvements were 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. As a result, the prediction performance of the rating that reflects the sentiment score proposed in this paper is superior to that of the conventional rating. In other words, this paper confirms that collaborative filtering that reflects the sentiment score of the user review shows superior accuracy compared with conventional collaborative filtering that considers only the quantitative score. We then performed a paired t-test to check that the proposed model was a better approach and concluded that it is. In this study, to overcome the limitations of previous research that judged users' sentiment only by the quantitative rating score, the review was quantified so that the user's opinion could be considered in a more refined way in the recommendation system and improve its accuracy.
The findings of this study have managerial implications for recommendation system developers, who need to consider both quantitative and qualitative information. The way the combined system is constructed in this paper might be used directly by developers.
https://doi.org/10.13088/jiis.2019.25.1.219
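
The evaluation metrics named in the abstract above, MAE and RMSE, follow their standard definitions and can be sketched as below. The abstract does not say how the sentiment score and the rating are combined, so `blend_rating` is a purely hypothetical illustration: it assumes a sentiment score in [0, 1], maps it onto a 1 to 5 rating scale, and takes a weighted average. The weight, the scale, and the sample numbers are assumptions.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def blend_rating(rating, sentiment, weight=0.5, scale=(1.0, 5.0)):
    """Hypothetical blend: map sentiment in [0, 1] onto the rating scale,
    then average it with the original rating using the given weight."""
    low, high = scale
    sentiment_as_rating = low + sentiment * (high - low)
    return (1.0 - weight) * rating + weight * sentiment_as_rating


# Hypothetical ratings and predictions for three user-item pairs.
actual = [4.0, 3.0, 5.0]
predicted = [3.5, 3.0, 4.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", round(rmse(actual, predicted), 4))
print("blended rating:", blend_rating(5.0, 0.5))
```

A lukewarm review (sentiment 0.5) attached to a 5-star rating pulls the blended value down to 4.0 under these assumptions, which is the kind of adjustment the abstract describes in general terms.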

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
- Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
To discover significant social issues that urgently need to be solved in modern society, such as unemployment, economic crisis, and social welfare, researchers in the existing approach usually collect opinions from experts and scholars through online or offline surveys. However, this method is not always effective. Because of the expense, a large number of survey replies is seldom gathered. In some cases, it is also hard to find experts who deal with specific social issues. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about a social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles published by about 10 major domestic news outlets in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their texts, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document into paragraphs. In the meantime, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic it best matches. Finally, each topic has several best-matched paragraphs. Furthermore, suppose there is a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly.
Through this prototype system, we have detected various social issues appearing in our society and have shown the effectiveness of our proposed methods in our experimental results. Note that our proof-of-concept system is also available at http://dslab.snu.ac.kr/demo.html.
https://doi.org/10.13088/jiis.2013.19.3.001
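
The idea of assigning each paragraph to its best-matching topic, using topic terms and their probability values, can be sketched with the Topic1 example from the abstract above. The scoring rule here, a sum of the probabilities of topic terms that appear in the paragraph, is a simple heuristic assumed for illustration; it is not the generative-model matching algorithm proposed in the paper. The second topic, the sample paragraph, and the function names are hypothetical.

```python
def paragraph_score(paragraph, topic_terms):
    """Sum the probabilities of the topic terms that occur in the paragraph."""
    cleaned = paragraph.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    return sum(prob for term, prob in topic_terms.items() if term in words)


def best_topic(paragraph, topics):
    """Return the label of the topic with the highest score for the paragraph."""
    return max(topics, key=lambda label: paragraph_score(paragraph, topics[label]))


topics = {
    # Topic1 from the abstract, with its human-assigned label.
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    # Hypothetical second topic, added only so there is something to compare against.
    "Housing": {"rent": 0.5, "housing": 0.3, "loan": 0.2},
}

paragraph = "A layoff at the business pushed local unemployment higher."

for label, terms in topics.items():
    print(label, round(paragraph_score(paragraph, terms), 2))
print("best match:", best_topic(paragraph, topics))
```

Collecting, for each topic, the paragraphs whose best match is that topic gives the kind of topic-to-paragraph listing the abstract describes, from which details such as places and numbers can be read.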
