• Title/Abstract/Keywords: Text Mining Analysis


Individual Interests Tracking: Beyond Macro-level Issue Tracking

  • 류신;김남규
    • 한국IT서비스학회지 / Vol. 13, No. 4 / pp. 275-287 / 2014
  • Recently, the volume of unstructured text data generated by various social media has been increasing rapidly; consequently, the use of text mining to support decision-making has also been growing. In particular, academia and industry are paying significant attention to topic analysis in order to discover the main issues in large volumes of text documents. Topic analysis can be regarded as static analysis because it analyzes a snapshot of the distribution of various issues. In contrast, some recent studies have attempted dynamic issue tracking, which analyzes and traces issue trends over a predefined period. However, most traditional issue tracking methods share a common limitation: when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. Additionally, although traditional issue tracking methods can illustrate macro-level issue trends, they do not address how individuals' interests move from certain issues to others. In this paper, we propose an individual interests tracking methodology to overcome these two limitations. Our main goal is not to track macro-level issue trends but to analyze trends in the flow of individual interests. Further, our methodology is extensible because it analyzes only newly added documents when the period of analysis is extended. We also analyze the results of applying our methodology to news articles and their access logs.

An Exploratory Analysis of Online Discussion of Library and Information Science Professionals in India using Text Mining

  • Garg, Mohit;Kanjilal, Uma
    • Journal of Information Science Theory and Practice / Vol. 10, No. 3 / pp. 40-56 / 2022
  • This paper aims to implement a topic modeling technique for extracting the topics of online discussions among library professionals in India. Topic modeling is an established text mining technique popularly used for modeling text data from Twitter, Facebook, Yelp, and other social media platforms. The present study modeled the online discussions of Library and Information Science (LIS) professionals posted on Lis Links. The text data of these posts was extracted using a program written in R with the package "rvest." The data was pre-processed to remove blank posts, posts having text in non-English fonts, punctuation, URLs, emails, etc. Topic modeling with the Latent Dirichlet Allocation algorithm was applied to the pre-processed corpus to identify each topic associated with the posts. The frequency of word occurrences in the text corpus was also calculated. The most frequent words included: library, information, university, librarian, book, professional, science, research, paper, question, answer, and management. This shows that LIS professionals actively discussed exams, research, and library operations on the Lis Links forum. The study categorized the online discussions on Lis Links into ten topics, i.e. "LIS Recruitment," "LIS Issues," "Other Discussion," "LIS Education," "LIS Research," "LIS Exams," "General Information related to Library," "LIS Admission," "Library and Professional Activities," and "Information Communication Technology (ICT)." It was found that the majority of the posts belonged to "LIS Exam," followed by "Other Discussions" and "General Information related to the Library."
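The pre-processing and word-frequency steps described in this abstract can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' R/rvest program: the function names `clean_post` and `word_frequencies`, the regular expressions, and the ASCII check used as a stand-in for "non-English fonts" are all assumptions.

```python
import re
import string
from collections import Counter

# Assumed patterns; the abstract does not say how URLs and emails were detected.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_post(text):
    """Strip URLs, emails, and punctuation, then lowercase and collapse spaces."""
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.lower().split())


def word_frequencies(posts):
    """Count word occurrences over all posts that survive cleaning."""
    counts = Counter()
    for post in posts:
        cleaned = clean_post(post)
        if not cleaned:
            continue  # drop blank posts
        if not cleaned.isascii():
            continue  # crude, assumed proxy for posts in non-English scripts
        counts.update(cleaned.split())
    return counts


posts = [
    "Library exam dates? See http://example.com",
    "",
    "Contact me at a@b.org about the library",
]
freq = word_frequencies(posts)
print(freq.most_common(3))
```

The resulting counts could then feed a topic model such as LDA; that step is omitted here because the abstract gives no parameter details.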

Analysis of research trends on mobile health intervention for Korean patients with chronic disease using text mining

  • 손연정;이수경
    • 디지털융복합연구 / Vol. 17, No. 4 / pp. 211-217 / 2019
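  • With growing reports that applying mobile health to chronic disease management in Korea is clinically useful, this secondary analysis study applied text mining techniques to identify the characteristics of, and changes in the central keywords of, mobile health intervention studies for Korean patients with chronic disease published in domestic and international journals. The analysis covered a final set of 20 papers published in journals from 2005 to 2018; the extracted text was analyzed paper by paper using Microsoft Excel, and keywords were extracted using Text Analyzer. The results showed that mobile health interventions were mainly applied to patients with hypertension, diabetes, stroke, and coronary artery disease. The most frequently used intervention type was application development, and in recent studies words such as 'usability', 'mobile health', and 'outcome measurement' appeared most frequently. We suggest that future work include both domestic and international studies on mobile health interventions for patients with chronic disease and apply social network analysis, which can identify associations among keywords, to confirm its usefulness.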

Application Development for Text Mining: KoALA

  • 전병진;최윤진;김희웅
    • 경영정보학연구 / Vol. 21, No. 2 / pp. 117-137 / 2019
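  • In the big data era, countless data are produced across diverse domains; data science has become popular, and the power of data has become a source of competitiveness. In particular, interest is growing in unstructured data, which accounts for more than 80% of the world's data. With the development of social media, most unstructured data is generated as text, and it plays an important role in fields such as marketing, finance, and distribution. However, text mining of social media is harder to approach and more complex than data mining of numerical data, so it is used less than expected. This study therefore develops the Korean Natural Language Application (KoALA), an integrated application for easy and convenient social media text mining that does not depend on programming languages, high-specification hardware, or solutions. KoALA is an application specialized for social media text mining that can analyze both Korean and English text. It handles the entire process from data collection through preprocessing and analysis to visualization. This paper describes the process of designing, implementing, and applying KoALA using the design science methodology. Finally, it discusses practical uses of KoALA through a case related to blockchain business. We expect this paper to contribute to the popularization of social media text mining and to its practical and academic use in various domains.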

Research of Patent Technology Trends in Textile Materials: Text Mining Methodology Using DETM & STM

  • 이현상;조보근;오세환;하성호
    • 한국정보시스템학회지:정보시스템연구 / Vol. 30, No. 3 / pp. 201-216 / 2021
  • Purpose: The purpose of this study is to analyze trends in patent technology for textile materials using a text mining methodology based on the Dynamic Embedded Topic Model (DETM) and the Structural Topic Model (STM). By identifying technology trends, the study is expected to have a positive impact on revitalizing and developing the textile materials industry. Design/methodology/approach: The data used in this study are text data from 866 domestic patents in textile materials from 1974 to 2020. To analyze technology trends from various aspects, the Dynamic Embedded Topic Model and the Structural Topic Model were used. The word embedding technique used in DETM is GloVe. For stable learning of the topic model, amortized variational inference was performed based on a Recurrent Neural Network. Findings: The analysis found that 'manufacture' topics had the largest share among the six topics. Keyword trend analysis showed that natural and nanotechnology have recently been attracting attention. The metadata analysis showed that manufacture technologies had a high probability of patent registration over the entire time series, but results for recent years showed that elasticity and safety technologies are on the rise.

The Impact of Transforming Unstructured Data into Structured Data on a Churn Prediction Model for Loan Customers

  • Jung, Hoon;Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 12 / pp. 4706-4724 / 2020
  • Along with various structured data, such as company size, loan balance, and savings accounts, this study analyzed the voice of customer (VOC), which is text data containing contact history and counseling details. To analyze the unstructured data, term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN provided the best predictive power, followed by the model using TF-IDF, and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN exhibited better performance in an analysis of Korean-language text. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the results of previous studies, indicating that individual customers leave when their loyalty and switching cost are low, are also applicable to corporate customers, and suggests that VOC data indicating customers' needs are very effective for predicting their behavior.

Analysis of Dental Hygienist Job Recognition Using Text Mining

  • Kim, Bo-Ra;Ahn, Eunsuk;Hwang, Soo-Jeong;Jeong, Soon-Jeong;Kim, Sun-Mi;Han, Ji-Hyoung
    • 치위생과학회지 / Vol. 21, No. 1 / pp. 70-78 / 2021
  • Background: The aim of this study was to analyze the public demand for information about the job of dental hygienists by mining text data collected from the online Q & A section of an Internet portal site. Methods: Text data were collected from inquiries posted on the Naver Q & A section from January 2003 to July 2020 using "dental hygienist job recognition," "role recognition," "medical assistance," and "scaling" as search keywords. Text mining techniques were used to identify significant Korean words and their frequency of occurrence. In addition, the association between words was analyzed. Results: A total of 10,753 Korean words related to the job of dental hygienists were extracted from the text data. "Chi-lyo (treatment)," "chigwa (dental clinic)," "ske-illing (scaling)," "itmom (gum)," and "chia (tooth)" were the five most frequently used words. The words were classified into the following job areas of the dental hygienist: periodontal disease treatment and prevention, medical assistance, patient care and consultation, and others. Among these areas, the number of words related to medical assistance was the largest, with sixty-six association rules found between the words, and "chi-lyo," "chigwa," and "ske-illing" as core words. Conclusion: The public demand for information about the job of dental hygienists was mainly related to "chi-lyo," "chigwa," and "ske-illing" as core words, demonstrating that scaling is recognized by the public as the job of a dental hygienist. However, the high demand for information related to treatment and medical assistance in the context of dental hygienists indicates that the public sees the job of dental hygienists as more focused on medical assistance than on preventive dental care that is provided with job autonomy.
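To illustrate what an association rule between words means here, the following Python sketch computes support and confidence for word pairs by brute force. The abstract does not state which algorithm or thresholds the authors used, so the function `pair_rules`, its threshold values, and the toy inquiries are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for word pairs.

    support(x, y)   = share of inquiries containing both x and y
    confidence(x→y) = count(x and y) / count(x)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for words in transactions:
        items = sorted(set(words))  # each word counted once per inquiry
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # test the rule in both directions
            confidence = both / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Toy inquiries (invented), each reduced to its extracted words.
inquiries = [
    ["chigwa", "ske-illing"],
    ["chigwa", "ske-illing", "itmom"],
    ["chigwa", "chi-lyo"],
    ["chi-lyo"],
]
for rule in pair_rules(inquiries):
    print(rule)
```

On real data an algorithm such as Apriori would normally be used to prune the candidate set; the exhaustive pairwise count above is only practical for small vocabularies.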

Analysis on the Trend of The Journal of Information Systems Using TLS Mining

  • 윤지혜;오창규;이종화
    • 한국정보시스템학회지:정보시스템연구 / Vol. 31, No. 1 / pp. 289-304 / 2022
  • Purpose: The development of the network and mobile industries has induced companies to invest in information systems, leading a new industrial revolution. The Journal of Information Systems (JIS), which developed the information system field into a theoretical and practical study in the 1990s, retains a 30-year history of information systems. This study aims to identify the academic value and research trends of JIS. Design/methodology/approach: This study analyzes the trend of JIS by combining several methods, collectively named TLS mining analysis. TLS mining analysis consists of a series of analyses: the Term Frequency-Inverse Document Frequency (TF-IDF) weight model, Latent Dirichlet Allocation (LDA) topic modeling, and text mining with Semantic Network Analysis. First, keywords are extracted from the research data using the TF-IDF weight model; then topic modeling is performed using the LDA algorithm to identify issue keywords. Findings: The study used the summary service for published research papers provided by the Korea Citation Index to analyze JIS. The 714 papers published from 2002 to 2021 were divided into two periods: 2002-2011 and 2012-2021. In the first period (2002-2011), the research trend in the information system field focused on E-business strategies, as most companies adopted online business models. In the second period (2012-2021), data-based information technology and new industrial revolution technologies such as artificial intelligence, SNS, and mobile were the main research issues in the information system field. In addition, keywords for improving the JIS citation index were presented.
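The first step of the pipeline described above, keyword weighting with TF-IDF, can be sketched in Python as follows. This uses one common textbook variant (relative term frequency times log(N/df)); the abstract does not specify the exact weighting formula, so the formula, the function name `tf_idf`, and the toy token lists are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy abstracts (invented), already tokenized.
docs = [
    ["information", "system", "ebusiness"],
    ["information", "system", "mobile"],
    ["information", "data", "mobile"],
]
weights = tf_idf(docs)
# Terms with the highest weight in a document are its keyword candidates.
print(max(weights[0], key=weights[0].get))
```

A term that occurs in every document gets weight zero under this variant, which is why library implementations often add smoothing; the top-weighted terms would then be passed on to the topic modeling step.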

Methodology Using Text Analysis for Packaging R&D Information Services on Pending National Issues

  • 현윤진;한희준;최희석;박준형;이규하;곽기영;김남규
    • Journal of Information Technology Applications and Management / Vol. 20, No. 3_spc / pp. 231-257 / 2013
  • The recent rise in the unstructured data generated by social media has resulted in an increasing need to collect, store, search, analyze, and visualize it. These data cannot be managed effectively by using traditional data analysis methodologies because of their vast volume and unstructured nature. Therefore, many attempts are being made to analyze these unstructured data (e.g., text files and log files) by using commercial and noncommercial analytical tools. In particular, attempts to discover meaningful knowledge by using text mining are being made in business and other areas such as politics, economics, and cultural studies. For instance, several studies have examined pending national issues by analyzing large volumes of texts on various social issues. However, it is difficult to create satisfactory information services that can identify R&D documents on specific national issues from among the various R&D resources. In other words, although users specify some words related to pending national issues as search keywords, they usually fail to retrieve the R&D information they are looking for. This is usually because of the discrepancy between the terms defining pending national issues and the corresponding terms used in R&D documents. We need a mediating logic to overcome this discrepancy so that we can identify and package appropriate R&D information on specific pending national issues. In this paper, we use association analysis and social network analysis to devise a mediator for bridging the gap between the keywords defining pending national issues and those used in R&D documents. Further, we propose a methodology for packaging R&D information services for pending national issues by using the devised mediator. Finally, in order to evaluate the practical applicability of the proposed methodology, we apply it to the NTIS (National Science & Technology Information Service) system, and summarize the results in the case study section.

Research Trends on Emotional Labor in Korea using text mining

  • 조경원;한나영
    • 한국산업정보학회논문지 / Vol. 26, No. 6 / pp. 119-133 / 2021
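  • Although studies that use text mining to identify research trends are being conducted in many fields, no study has used text mining to identify research trends in the field of emotional labor. This study uses text mining to analyze in depth 1,465 papers containing the keyword 'emotional labor' retrieved from the Korea Citation Index (KCI) of the National Research Foundation of Korea for 2004 through 2019, in order to identify trends in emotional labor research. Topics were extracted with LDA analysis, and IDM analysis was conducted to check the weight and similarity of the topics. Based on this, topics with high similarity were merged in consideration of their semantic usefulness. The research topics fall into 11 categories, in descending order of share: stress of emotional labor (12.2%), emotional labor and social support (12.0%), emotional labor of customer service workers (10.9%), emotional labor and resilience (10.2%), emotional labor strategies (9.2%), emotional labor of call center counselors (9.1%), consequences of emotional labor (9.0%), emotional labor and job burnout (7.9%), emotional intelligence (7.1%), emotional labor of prospective care service workers (6.6%), and emotional labor and organizational culture (5.9%). By analyzing research trends and academic developments in emotional labor through topic modeling and trend analysis, this study aims to suggest directions for future emotional labor research and hopes to support the establishment of practical strategies for emotional labor.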