Search | Korea Science

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
- Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
People now create a tremendous amount of data on social network services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, thereby greatly influencing society. This phenomenon is unmatched in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the defining conditions of volume (the amount of data), velocity (data input and output speeds), and variety (the variety of data types). Discovering the trend of an issue in SNS Big Data can serve as an important new source of value creation because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) providing the topic keyword set that corresponds to the daily ranking; (2) visualizing the daily time-series graph of a topic over a month; (3) showing the importance of a topic through a treemap based on the score system and frequency; and (4) visualizing the daily time-series graph of a keyword found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process a large amount of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from a single node to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no fixed schema or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed to create Data-Driven Documents that bind the document object model (DOM) to arbitrary data; interaction with the data is easy, and the library is useful for managing a real-time data stream with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
https://doi.org/10.13088/jiis.2014.20.2.109
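
The daily keyword ranking and the per-keyword daily time series described in the abstract above can be illustrated with a minimal sketch. It is not the TITS implementation: the abstract names Hadoop, MongoDB, and noun extraction, whereas this sketch assumes an in-memory list of `(date, text)` pairs, a whitespace tokenizer, and a tiny hypothetical English stop-word list, and it counts raw keyword frequency rather than running a topic model.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical stop-word list; a real system would use a language-specific one.
STOP_WORDS = {"the", "a", "is", "are", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (tok.strip(".,!?#@\"'").lower() for tok in text.split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def daily_keyword_counts(tweets):
    """Map each day to a Counter of keyword frequencies for that day."""
    counts = defaultdict(Counter)
    for day, text in tweets:
        counts[day].update(tokenize(text))
    return counts


def top_keywords(counts, day, k=3):
    """Return the k most frequent keywords of one day (the daily ranking)."""
    return counts[day].most_common(k)


def keyword_series(counts, keyword):
    """Return the daily time series of one keyword, sorted by date."""
    return [(day, counts[day][keyword]) for day in sorted(counts)]


if __name__ == "__main__":
    sample = [
        (date(2013, 3, 1), "Election results are in"),
        (date(2013, 3, 1), "election debate tonight"),
        (date(2013, 3, 2), "The weather is nice"),
    ]
    counts = daily_keyword_counts(sample)
    print(top_keywords(counts, date(2013, 3, 1), k=1))
    print(keyword_series(counts, "election"))
```

The series returned by `keyword_series` is the kind of data a charting library such as d3.js could plot as a daily time-series graph.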

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
- Journal of Intelligence and Information Systems / v.20 no.2 / pp.73-92 / 2014
An Adaptive Clustering-based Collaborative Filtering Technique is proposed to solve the fundamental problems of collaborative filtering, such as the cold-start, scalability, and data sparsity problems. Previous collaborative filtering techniques make recommendations based on the predicted preference of a user for a particular item, using a similar-item subset and a similar-user subset built from users' preferences for items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system decreases rapidly, and creating a similar-item subset and a similar-user subset becomes more difficult. In addition, as the scale of the service increases, the time needed to create these subsets increases geometrically, and the response time of the recommendation system increases with it. To solve these problems, this paper suggests a collaborative filtering technique that actively adapts the model to conditions and adopts the concepts of a context-based filtering technique. This technique consists of four major methodologies. First, items and users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then estimated. With this method, the run time for creating a similar-item subset or similar-user subset can be reduced, the reliability of the recommendation system can be made higher than when only the user preference information is used to create these subsets, and the cold-start problem can be partially solved. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preference between each item cluster and user cluster.
In this phase, a list of items is made for a user by examining the item clusters in descending order of the inter-cluster preference of the user cluster to which the user belongs, and by selecting and ranking the items according to the predicted or recorded user preference information. With this method, the model-creation phase bears the highest load of the recommendation system, which minimizes the load at run time. Therefore, the scalability problem can be addressed, and a large-scale recommendation system can be run with highly reliable collaborative filtering. Third, missing user preference information is predicted using the item and user clusters. With this method, the problem caused by the low density of the user preference matrix can be mitigated. Existing studies used either item-based prediction or user-based prediction. This paper improves on Hao Ji's idea, which uses both item-based and user-based prediction. The reliability of the recommendation service can be improved by combining the predicted values of both techniques according to the condition of the recommendation model. By predicting user preference based on the item or user clusters, the time required for prediction can be reduced, and missing user preferences can be predicted at run time. Fourth, the item and user feature vectors are updated as user feedback is received. This phase applies normalized user feedback to the item and user feature vectors. This method can mitigate the problems caused by adopting the concepts of context-based filtering, such as item and user feature vectors based on the user profile and item properties. These problems stem from the limits of quantifying the qualitative features of items and users.
Therefore, the elements of the user and item feature vectors are made to match one to one, and when user feedback on a particular item is obtained, it is applied to each feature vector by using the opposite one. This method was verified by comparing its performance with that of existing hybrid filtering techniques. Two measures were used for verification: MAE (Mean Absolute Error) and response time. In terms of MAE, this technique was confirmed to improve the reliability of the recommendation system. In terms of response time, this technique was found to be suitable for a large-scale recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it has some limitations. This technique focused on reducing the time complexity. Hence, an improvement in reliability was not expected. Future work will improve this technique with rule-based filtering.
https://doi.org/10.13088/jiis.2014.20.2.073
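
To make the ideas of inter-cluster preference and MAE-based verification in the abstract above concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes the cluster assignments (`user_cluster`, `item_cluster`) are already given, it takes the inter-cluster preference to be a plain mean of recorded ratings, and all function names are hypothetical.

```python
from collections import defaultdict


def inter_cluster_preference(ratings, user_cluster, item_cluster):
    """Mean recorded rating for every (user cluster, item cluster) pair.

    ratings maps (user, item) -> rating; the cluster maps are assumed given.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (user, item), rating in ratings.items():
        key = (user_cluster[user], item_cluster[item])
        sums[key] += rating
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(user, item, ratings, prefs, user_cluster, item_cluster, default=0.0):
    """Use the recorded rating if present, else the inter-cluster preference."""
    if (user, item) in ratings:
        return ratings[(user, item)]
    return prefs.get((user_cluster[user], item_cluster[item]), default)


def mean_absolute_error(actual, predicted):
    """MAE = average of |actual - predicted| over all test ratings."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    ratings = {("u1", "i1"): 4.0, ("u2", "i1"): 2.0, ("u1", "i2"): 5.0}
    user_cluster = {"u1": 0, "u2": 0, "u3": 0}
    item_cluster = {"i1": 0, "i2": 1}
    prefs = inter_cluster_preference(ratings, user_cluster, item_cluster)
    # u3 has no ratings (a cold-start user) but still gets a prediction.
    print(predict("u3", "i1", ratings, prefs, user_cluster, item_cluster))
```

Because the cluster-level table is built once, a run-time prediction is a dictionary lookup, which is one way to read the abstract's claim that the model-creation phase carries most of the load.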

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
- Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
Recently, channels such as social media and SNS have been creating enormous amounts of data. Within this data, the portion of unstructured data represented as text has increased geometrically. Because it is difficult to check all text data, it is important to access the data rapidly and grasp the key points of the text. To meet this need for efficient understanding, many studies on text summarization have been proposed for handling and using tremendous amounts of text data. In particular, many summarization methods using machine learning and artificial intelligence algorithms, known as "automatic summarization", have recently been proposed to generate summaries objectively and effectively. However, almost all text summarization methods proposed to date construct summaries based on the frequency of contents in the original documents. Such summaries are limited in that they tend to omit low-weight subjects that are mentioned less often in the original text. If a summary includes only the contents of the major subject, bias occurs and information is lost, so it is hard to ascertain every subject the documents contain. To avoid this bias, one can summarize with a view to balance among the topics a document contains so that every subject can be ascertained, but an imbalance in the distribution of those subjects still remains. To retain the balance of subjects in a summary, it is necessary to consider the proportion of every subject in the original documents and also to allocate portions to subjects equally, so that even sentences on minor subjects are sufficiently included in the summary. In this study, we propose a "subject-balanced" text summarization method that secures balance among all subjects and minimizes the omission of low-frequency subjects. For a subject-balanced summary, we use two summary evaluation metrics: "completeness" and "succinctness".
Completeness means that a summary should fully include the contents of the original documents, and succinctness means that a summary should contain minimal duplication within itself. The proposed method consists of three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, the terms highly related to every topic can be identified, and the subjects of the documents can be found from the topics, each composed of terms with similar meanings. Then, a few terms that represent each subject well are selected. In this method, they are called "seed terms". However, these terms are too few to explain each subject sufficiently, so enough terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for word expansion, that is, for finding terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and from those vectors the similarity between all terms can be derived using cosine similarity. The higher the cosine similarity between two terms, the stronger the relationship between them is defined to be. Terms that have high similarity values with the seed terms of each subject are therefore selected, and the subject dictionary is finally constructed by filtering those expanded terms. The next phase allocates subjects to every sentence in the original documents. To grasp the contents of all sentences, frequency analysis is first conducted with the terms that compose the subject dictionaries. The TF-IDF weight of each subject is calculated after the frequency analysis, which shows how much each sentence explains each subject. However, a TF-IDF weight has no upper bound, so the TF-IDF weights of every subject in the sentences are normalized so that all values fall between 0 and 1.
Then, each sentence is allocated to the subject with the maximum TF-IDF weight among all subjects, and a sentence group is finally constructed for each subject. The last phase generates the summary. Sen2Vec is used to compute the similarity between subject sentences, from which a similarity matrix is formed. By repeatedly selecting sentences, it is possible to generate a summary that fully includes the contents of the original documents and minimizes duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews are used for constructing the subject dictionaries and 23,087 reviews are used for generating summaries. A comparison between summaries from the proposed method and frequency-based summaries verifies that summaries from the proposed method better retain the balance of all subjects that the documents originally have.
https://doi.org/10.13088/jiis.2019.25.2.141
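
Two steps of the abstract above lend themselves to a short sketch: expanding seed terms by cosine similarity, and allocating each sentence to the subject with the largest normalized weight. The sketch rests on several assumptions that are not in the abstract: word vectors are supplied as plain lists rather than trained with Word2Vec, the similarity `threshold` is a hypothetical cut-off, the per-subject weights are given as input, and the normalization is taken to be min-max scaling per subject (the abstract only says the values are mapped to the range 0 to 1).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seed_terms(seeds, vectors, threshold=0.8):
    """Add every term whose similarity to any seed term reaches the threshold."""
    expanded = set(seeds)
    for term, vec in vectors.items():
        for seed in seeds:
            if seed in vectors and cosine_similarity(vec, vectors[seed]) >= threshold:
                expanded.add(term)
                break
    return expanded


def normalize_by_subject(weights):
    """Min-max scale weights[sentence][subject] to [0, 1] within each subject."""
    subjects = sorted({s for row in weights.values() for s in row})
    normalized = {sid: {} for sid in weights}
    for subject in subjects:
        column = [row.get(subject, 0.0) for row in weights.values()]
        low, span = min(column), max(column) - min(column)
        for sid, row in weights.items():
            value = row.get(subject, 0.0)
            normalized[sid][subject] = (value - low) / span if span else 0.0
    return normalized


def allocate_subjects(normalized):
    """Assign each sentence to the subject with its maximum normalized weight."""
    return {sid: max(row, key=row.get) for sid, row in normalized.items()}


if __name__ == "__main__":
    vectors = {"room": [1.0, 0.0], "suite": [0.9, 0.1], "pasta": [0.0, 1.0]}
    print(sorted(expand_seed_terms({"room"}, vectors)))
    weights = {0: {"room": 4.0, "food": 1.0}, 1: {"room": 2.0, "food": 3.0}}
    print(allocate_subjects(normalize_by_subject(weights)))
```

Normalizing within each subject rather than within each sentence matters here: scaling inside a single sentence would leave its maximum subject unchanged, so the normalization would have no effect on the allocation.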

Musical Analysis of Jindo Dasiraegi music for the Scene of Performing Arts Contents (연희현장에서의 올바른 활용을 위한 진도다시래기 음악분석)

Han, Seung Seok;Nam, Cho Long
- (The) Research of the performance art and culture / no.25 / pp.253-289 / 2012
Dasiraegi is a traditional funeral rite performance of Jindo, located in the South Jeolla Province of South Korea. With its unique stylistic structure, which includes various dances, songs, and witty dialogues, and a storyline that depicts the birth of a new life in the wake of death, embodying the Buddhist belief that life and death are interconnected, it has attracted great interest from performance organizers and performers who were desperately seeking new content that could be put on stage. Previous research on Dasiraegi has been most valuable to its recreation, as it analyzed the performance from a wide range of perspectives. Despite these contributions, previous research was mainly academic, focusing on the symbolic meanings of the performance and on basic introductions to its components, such as script, lyrics, witty dialogue, appearance (costume and make-up), stage properties, rhythm, and dance, while lacking an accurate representation of the most crucial element of the performance, sori (song). For this reason, this study analyzes the music of Dasiraegi and presents its musical characteristics along with its scores to provide practical support for performers who are active in the field. Out of all the numbers in Dasiraegi, this study analyzed all of Geosa-nori and Sadang-nori, the funeral dirge (mourning chant) sung as the performers come on stage, and Gasangjae-nori, because among the five proceedings of the funeral rite they were the most commonly performed. Although there is a plethora of performance recordings to choose from, this study chose Jindo Dasiraegi, an album released by E&E Media. The album offers high-quality recordings of performances, but more importantly, it is easy to obtain and use for performers who want to learn Dasiraegi based on the script provided in this study. The musical analysis produced a number of interesting findings.
Firstly, most of the songs in Dasiraegi use a typical Yukjabaegi-tori, which applies the Mi scale and frequently contains cut-off (breaking) sounds. Although Southern Kyoung-tori, which applies the Sol scale, was also used, it appeared only in limited parts and was musically incomplete. Secondly, there was no musical affinity between Ssitgim-gut and Dasiraegi, although both are for funeral rites. The fundamental difference in the character and function of Ssitgim-gut and Dasiraegi may be the reason behind this lack of affinity: Ssitgim-gut is sung to guide the deceased to heaven by comforting him or her, whereas Dasiraegi is sung to reinvigorate the lives of the living. Lastly, traces of the musical grammar found in Pansori are present in the earlier part of Dasiraegi. This may be attributed to the master artist (Designee of Important Intangible Cultural Heritage) who was instrumental in the restoration and transmission of Dasiraegi, and to his experience in a Changgeuk company. The performer's experience with Changgeuk may have induced the alterations in Dasiraegi, causing it to deviate from its original form. On the other hand, it expanded the performative basis by enhancing the performance aspect of Dasiraegi, allowing it to be used as content for the performing arts. It would be meaningful to see this study benefit future performance artists who take as their inspiration Dasiraegi, which overcomes the loss of death and invigorates the vibrancy of life.

An Exploratory Study on the Components of Visual Merchandising of Internet Shopping Mall (인터넷쇼핑몰의 VMD 구성요인에 대한 탐색적 연구)

Kim, Kwang-Seok;Shin, Jong-Kuk;Koo, Dong-Mo
- Journal of Global Scholars of Marketing Science / v.18 no.2 / pp.19-45 / 2008
This study empirically examines the primary dimensions of visual merchandising (VMD) of an internet shopping mall, namely store design, merchandise, and merchandising cues, that make a virtual store attractive to shoppers. The authors reviewed the literature on the major components of VMD from the perspective of the AIDA model, which has mainly been applied to offline store settings. The study has two major purposes. First, it derives the variables related to the components of visual merchandising by reviewing the existing literature, establishes hypotheses, and tests them empirically. Second, it examines the relationships between the components of VMD and the attitude toward the VMD, while putting more emphasis on identifying the component structure of the VMD. VMD needs to be examined from the perspective that an online shopping mall is a virtual self-service or clerkless store, which can reduce the number of employees and help shoppers search, evaluate, and purchase for themselves, and it needs to be explored in terms of customers' in-store persuasion processes. This study reviewed the literature on store design, merchandise, and merchandising cues, which may be relevant to the store, product, and promotion, respectively. VMD is a total communication tool, and the AIDA model can explain the in-store consumer behavior of online shopping. Store design has to do with drawing consumer attention to the online mall, merchandise with product-related interest, and merchandising cues with promotions, such as recommendations and links, that induce the desire to purchase. These three steps can be seen as the processes leading to purchase actions.
The theoretical rationale for the relationship between VMD and AIDA can be found in Tyagi (2005), who states that the three steps of consumer-oriented merchandising are a store, a product assortment, and placement; in Omar (1999), who identifies three types of interior display, namely architectural design display, commodity display, and point-of-sales (POS) display; and in Davies and Ward (2005), who relate the retail store interior image to atmosphere, merchandise, and in-store promotion. Lee et al. (2000) suggested the following web merchandising components: merchandising cues, a shopping metaphor (an assistant tool for search), store design, layout (web design), and product assortment. First, store design, which includes differentiation, simplicity, and navigation, is supposed to be related to attention to the virtual store. Second, the merchandise dimensions, comprising product assortment, visual information, and product reputation, have to do with interest in the product offerings. Finally, the merchandising cues, which refer to the merchandiser's (MD's) product recommendations and the hyperlinks to relevant goods provided for the shopper, are concerned with inducing the desire to purchase. A questionnaire survey was carried out to collect data on consumers who shop at internet shopping malls frequently. To select the subject malls, the mall ranking data announced by a mall rating agency was used to identify the five most popular and the five least popular malls. The subjects were instructed to answer the questions after navigating the designated mall for five minutes. Of the 300 questionnaires distributed to consumers, 166 were used in the final analysis. The empirical testing focused on identifying and confirming the dimensionality of VMD and its subdimensions using structural equation modeling. The confirmatory factor analysis for the endogenous and exogenous variables was carried out in four parts.
A second-order factor analysis was done for store design, merchandise, and merchandising cues, and a first-order confirmatory factor analysis for the attitude toward the VMD. The model test results show that the chi-square value of the structural equation is 144.39 (d.f. 49), significant at the 0.01 level, which means the proposed model was rejected. However, the ratio of the chi-square value to the degrees of freedom was 2.94, which is smaller than the acceptable level of 3.0, while RMR is 0.087, which is higher than the generally acceptable level of 0.08. GFI and AGFI turned out to be 0.90 and 0.84, respectively. Both NFI and NNFI are 0.94, and CFI is 0.95. The major test results are as follows. First, the second-order factor analysis and structural equation modeling reveal that differentiation, simplicity, and ease of identifying the current status of the transaction are confirmed to be subdimensions of store design and to be significant predictors of the dependent variable. This result implies that, when designing an online shopping mall, it is necessary to differentiate it visually from other malls to improve the communication effectiveness of the store design. That is, a differentiated store design raises the contrast stimulus to the sensory organs, which promotes memory of the store and a favorable attitude toward its VMD. The result that navigation, meaning the ease of identifying the current status of shopping, affects the attitude toward VMD can be interpreted as follows: navigating via hyperlinks, which is characteristic of internet shopping, is a complex cognitive process, and shoppers are likely to lack a sense of the overall structure of the store. Consequently, shoppers are likely to get lost while shopping, not knowing where to go.
An orientation tool enhances the accessibility of information and raises shoppers' perception of the store environment (Titus & Everett 1995). Second, the primary dimension of merchandise and its subdimensions were each confirmed to be unidimensional and to have construct validity and nomological validity, in that the VMD dimensions are supposed to correlate positively with the dependent variable. The subdimensions of product assortment, brand fame, and information provision proved to have a positive effect on the attitude toward the VMD. This can be interpreted to mean that the more plentiful the product and brand assortment of the mall is, the more likely shoppers are to favor it. Brand fame and information provision also affect the VMD attitude, which means that the more famous the brand, the more likely shoppers are to trust and feel familiar with the mall, and that plentiful, visually presented information can lead shoppers to a favorable attitude toward the store VMD. Third, the merchandising cues of product recommendation and hyperlinks turned out to affect the VMD attitude. This can be interpreted to mean that recommended products reduce the uncertainty related to the purchase decision, and that hyperlinks to relevant products help shoppers save the cognitive effort spent on information search and gathering, which can lead to a favorable attitude toward the VMD. This study tried to shed new light on the VMD of online stores by reviewing the variables mentioned in the existing literature as relevant to offline VMD, and tried to link the VMD components from the perspective of the AIDA model.
The effect size of the VMD dimensions on the attitude was, in descending order, merchandise, store design, and merchandising cues. It is said that the internet offers unlimited space for display; however, the virtual store is not unlimited, since the consumer has a limited cognitive ability to process external information and internal memory. In particular, shoppers are likely to face difficulties in decision making on account of too many alternatives and information overload. Therefore, the internet shopping mall manager should take into consideration the consumer's cost of information search in order to establish the optimal product placements and search routes. An efficient store composition is possible by reducing the psychological burden and cognitive effort spent on information search and the evaluation of alternatives. The store image is for the most part determined by the product categories and brands the store deals in. The results of this study support the proposition that merchandise is more important to the VMD attitude than the other components, so the manager is required to take a strategic approach to VMD. Internet users become more accustomed to and more knowledgeable about the internet as a medium, and more likely to accept it as a shopping channel, the longer they use the internet to shop. The web merchandiser should be aware that product introductions using moving pictures and bulletin boards are becoming more important for presenting interactive product information visually and communicating with customers more actively, thereby enriching the quantity and quality of product information.

Study of the Actual Condition and Satisfaction of Volunteer Activity in Australian Hospital (호주 일 지역의 병원 자원봉사활동 실태와 만족도)

Park, Geum-Ja;Choi, Hae-Young
- Journal of Hospice and Palliative Care / v.9 no.1 / pp.17-29 / 2006
Purpose: This research aimed to investigate the actual condition of and satisfaction with volunteer activity in an Australian hospital. Methods: Data were collected by self-reported questionnaire from 101 volunteers and analyzed by frequency and percentage, t-test, ANOVA with Scheffé test, and Pearson's correlation coefficients using SPSS 12.0. Results: 1. Years involved in volunteer work were 5 to 10 years (32.7%), above 10 years (30.7%), 2 to 3 years (11.9%), and 3 to 5 years (10.9%). Types of volunteer work were physical care (32.7%), physical and emotional care (14.9%), and others (18.8%). Tasks were allocated by volunteer coordination (55.7%), by volunteer preference (20.5%), and by consent between volunteer and coordinator (20.5%). The main reasons for volunteer work were to help sick people (61.4%) and to make good use of leisure time (22.8%). Volunteers started volunteer work through their own inquiries (43.4%), through hearing from other volunteers (30.7%), and through mass media (13.1%). 80.2% of volunteers had received some kind of training or preparation for volunteer work. The suitability of volunteers' skills and abilities for their volunteer work was rated 'very well' (74.0%) and 'mostly well' (18.0%). Reimbursements or benefits received for volunteer work were token or lunch or group outing (31.7%), and token and lunch or group outing (19.8%). Volunteer work was evaluated occasionally (37.2%), frequently (30.9%), always (17.0%), and never (14.9%). The relationship with the volunteer work coordinator was very good (85.0%). The relationship with other volunteers was very good (81.2%). The relationship with hospital staff was very good (69.7%) and mostly good (21.2%). Family and friends' support for volunteer work was very good (83.2%). 2. The mean score of satisfaction with hospital volunteer activity was 3.09 ± 0.49 (range: 1 to 4).
The highest-scoring domain was 'social contact' (3.48 ± 0.61), and the lowest was 'social exchange' (1.65 ± 0.63). The highest-scoring item was 'I have an opportunity to help other people' (3.83 ± 0.40), and the lowest-scoring item was 'I will receive compensation for volunteer work I have done' (1.10 ± 0.78). 3. Satisfaction with hospital volunteer activity differed significantly according to sex (t=2.038, P=0.044), marital status (F=3.806, P=0.013), years involved in volunteer work (F=3.326), main reason for doing volunteer work (F=2.707, P=0.035), receipt of any training or preparation for volunteer work (t=-1.982, P=0.050), frequency of evaluation of volunteer work (F=7.877, P=0.000), suitability of volunteers' skills and abilities for volunteer work (t=2.712, P=0.049), relationship with volunteer work coordinators (F=-2.517, P=0.013), relationship with hospital staff (F=5.202, P=0.007), and support of their volunteer work by family and friends (t=-3.394, P=0.001). Conclusion: Satisfaction with hospice volunteer activity was moderate. Satisfaction with hospice volunteer activity differed significantly according to sex (t=2.038, P=0.044), marital status (F=3.806, P=0.013), years involved in volunteer work (F=3.326), main reason for doing volunteer work (F=2.707, P=0.035), receipt of any training or preparation for volunteer work (t=-1.982, P=0.050), frequency of evaluation of volunteer work (F=7.877, P=0.000), suitability of volunteers' skills and abilities for volunteer work (t=2.712, P=0.049), relationship with the volunteer work coordinator (F=-2.517, P=0.013), relationship with hospital staff (F=5.202, P=0.007), and family and friends' support for volunteer work (t=-3.394, P=0.001). Therefore, it is necessary to consider various factors to improve satisfaction with volunteer work.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
- Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
To discover significant social issues that urgently need to be solved in modern society, such as unemployment, economic crisis, and social welfare, researchers in the existing approach usually collect opinions from professional experts and scholars through online or offline surveys. However, such a method is not always effective. Because of the expense, a large number of survey replies is seldom gathered. In some cases, it is also hard to find professionals who deal with specific social issues. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about a social issue because each expert has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their texts, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what actually happened regarding the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) the topic terms and (ii) their probability values. Specifically, given a set of text documents, we segment each text document into paragraphs. In the meantime, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic it best matches. As a result, each topic has several best-matched paragraphs. For instance, suppose there is a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., "Up to 300 workers lost their jobs in XXX company at Seoul"). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. 
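The abstract does not state the scoring formula used by the matching algorithm, so the following is only a minimal sketch of the idea of assigning each paragraph to its best-matching topic from (i) topic terms and (ii) their probability values. It assumes a simple unigram score: the average log-probability of the paragraph's tokens under the topic, with a small hypothetical floor probability (`floor`) for tokens that are not topic terms. The function names, the second topic ("Welfare Problem"), and the pre-tokenized input are all illustrative assumptions; the real system performs Korean lexical analysis and obtains topics from LDA.

```python
import math

# Topic term probabilities. "Unemployment Problem" reuses the Topic1 example
# from the abstract; "Welfare Problem" is a hypothetical second topic.
TOPICS = {
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    "Welfare Problem": {"welfare": 0.5, "pension": 0.3, "elderly": 0.2},
}


def paragraph_score(tokens, topic_terms, floor=1e-6):
    """Average log-probability of the tokens under one topic.

    Tokens that are not topic terms receive the assumed `floor` probability,
    so the score stays finite. Dividing by the token count keeps long
    paragraphs from being penalized merely for their length.
    """
    if not tokens:
        return float("-inf")
    total = sum(math.log(topic_terms.get(tok, floor)) for tok in tokens)
    return total / len(tokens)


def best_topic(tokens, topics):
    """Return the label of the topic with the highest paragraph score."""
    return max(topics, key=lambda label: paragraph_score(tokens, topics[label]))


def match_paragraphs(paragraphs, topics):
    """Group tokenized paragraphs under their best-matching topic label."""
    matched = {label: [] for label in topics}
    for tokens in paragraphs:
        matched[best_topic(tokens, topics)].append(tokens)
    return matched


if __name__ == "__main__":
    paragraphs = [
        ["workers", "layoff", "unemployment", "company"],
        ["pension", "reform", "elderly", "welfare"],
    ]
    for label, items in match_paragraphs(paragraphs, TOPICS).items():
        print(label, items)
```

Averaging in log space is one design choice among several; a system following the paper more closely could instead use the per-document topic distributions that LDA itself produces.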
Using this prototype system, we detected various social issues appearing in our society, and our experimental results show the effectiveness of the proposed methods. Our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
https://doi.org/10.13088/jiis.2013.19.3.001


Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

Musical Analysis of Jindo Dasiraegi music for the Scene of Performing Arts Contents (연희현장에서의 올바른 활용을 위한 진도다시래기 음악분석)

An Exploratory Study on the Components of Visual Merchandising of Internet Shopping Mall (인터넷쇼핑몰의 VMD 구성요인에 대한 탐색적 연구)

Study of the Actual Condition and Satisfaction of Volunteer Activity in Australian Hospital (호주 일 지역의 병원 자원봉사활동 실태와 만족도)

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)
