• Title/Summary/Keyword: Text mining analysis


Application Development for Text Mining: KoALA (텍스트 마이닝 통합 애플리케이션 개발: KoALA)

  • Byeong-Jin Jeon; Yoon-Jin Choi; Hee-Woong Kim
    • Information Systems Review / v.21 no.2 / pp.117-137 / 2019
  • In the Big Data era, data science has become popular as vast amounts of data are produced in various domains, and the ability to exploit data has become a source of competitive advantage. Interest is growing in unstructured data, which accounts for more than 80% of the world's data. With the everyday use of social media, most unstructured data takes the form of text and plays an important role in areas such as marketing, finance, and distribution. However, text mining of social media is harder to access and harder to use than data mining of numerical data. This study therefore develops the Korean Natural Language Application (KoALA), an integrated application for easy and convenient social media text mining that does not rely on programming languages, high-end hardware, or solutions. KoALA is specialized for social media text mining and can analyze both Korean and English. It handles the entire process from data collection through preprocessing and analysis to visualization. This paper describes the design, implementation, and application of KoALA using the design science methodology. Lastly, we discuss the practical use of KoALA through a blockchain business case. Through this paper, we hope to popularize social media text mining and to encourage its practical and academic use in various domains.

Research of Patent Technology Trends in Textile Materials: Text Mining Methodology Using DETM & STM (섬유소재 분야 특허 기술 동향 분석: DETM & STM 텍스트마이닝 방법론 활용)

  • Lee, Hyun Sang; Jo, Bo Geun; Oh, Se Hwan; Ha, Sung Ho
    • The Journal of Information Systems / v.30 no.3 / pp.201-216 / 2021
  • Purpose: The purpose of this study is to analyze patent technology trends in textile materials using a text mining methodology based on the Dynamic Embedded Topic Model (DETM) and the Structural Topic Model (STM). By identifying technology trends, the study is expected to contribute to revitalizing and developing the textile materials industry. Design/methodology/approach: The data consist of the text of 866 domestic patents in textile materials from 1974 to 2020. To analyze technology trends from various aspects, the DETM and the STM were used. The word embedding technique used in the DETM is GloVe. For stable learning of the topic model, amortized variational inference was performed based on a recurrent neural network. Findings: The analysis found that 'manufacture' topics had the largest share among the six topics. Keyword trend analysis found that the keywords 'natural' and 'nanotechnology' have recently been attracting attention. The metadata analysis showed that manufacturing technologies had a high probability of patent registration across the entire time series, whereas the results for recent years showed a growing trend in elasticity and safety technologies.

The Impact of Transforming Unstructured Data into Structured Data on a Churn Prediction Model for Loan Customers

  • Jung, Hoon; Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.12 / pp.4706-4724 / 2020
  • Along with various structured data, such as company size, loan balance, and savings accounts, this study analyzed the voice of customer (VOC), which is text data containing contact history and counseling details. To analyze the unstructured data, term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN provided the best predictive power, followed by the model using TF-IDF and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN performed better in an analysis of Korean-language text. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the results of previous studies, which indicate that individual customers leave when their loyalty and switching costs are low, also apply to corporate customers, and suggests that VOC data indicating customers' needs are very effective for predicting their behavior.

Analysis of Dental Hygienist Job Recognition Using Text Mining

  • Kim, Bo-Ra; Ahn, Eunsuk; Hwang, Soo-Jeong; Jeong, Soon-Jeong; Kim, Sun-Mi; Han, Ji-Hyoung
    • Journal of dental hygiene science / v.21 no.1 / pp.70-78 / 2021
  • Background: The aim of this study was to analyze the public demand for information about the job of dental hygienists by mining text data collected from the online Q&A section of an Internet portal site. Methods: Text data were collected from inquiries posted on the Naver Q&A section from January 2003 to July 2020 using "dental hygienist job recognition," "role recognition," "medical assistance," and "scaling" as search keywords. Text mining techniques were used to identify significant Korean words and their frequency of occurrence. In addition, the associations between words were analyzed. Results: A total of 10,753 Korean words related to the job of dental hygienists were extracted from the text data. "Chi-lyo (treatment)," "chigwa (dental clinic)," "ske-illing (scaling)," "itmom (gum)," and "chia (tooth)" were the five most frequently used words. The words were classified into the following areas of the dental hygienist's job: periodontal disease treatment and prevention, medical assistance, patient care and consultation, and others. Among these areas, the number of words related to medical assistance was the largest, with sixty-six association rules found between the words and "chi-lyo," "chigwa," and "ske-illing" as core words. Conclusion: The public demand for information about the job of dental hygienists was mainly related to "chi-lyo," "chigwa," and "ske-illing" as core words, demonstrating that the public recognizes scaling as the job of a dental hygienist. However, the high demand for information related to treatment and medical assistance in the context of dental hygienists indicates that the public sees the job as more focused on medical assistance than on preventive dental care that is provided with job autonomy.
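The association analysis mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pair_rules`, the `min_support` threshold, and the toy posts are all hypothetical, and only rules between pairs of words are considered. Support is the share of posts containing both words, and confidence is the share of posts containing the first word that also contain the second.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for word pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for words in transactions:
        unique = sorted(set(words))          # count each word once per post
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules

# Hypothetical toy posts (already tokenized), not data from the study.
posts = [
    ["chigwa", "ske-illing", "chi-lyo"],
    ["chigwa", "ske-illing"],
    ["chigwa", "itmom"],
    ["chi-lyo", "itmom"],
]
rules = pair_rules(posts, min_support=0.5)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data only the pair "chigwa" and "ske-illing" meets the support threshold, so two directed rules are reported. A real analysis would typically use a full Apriori-style search over larger itemsets.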

Analysis on the Trend of The Journal of Information Systems Using TLS Mining (TLS 마이닝을 이용한 '정보시스템연구' 동향 분석)

  • Yun, Ji Hye; Oh, Chang Gyu; Lee, Jong Hwa
    • The Journal of Information Systems / v.31 no.1 / pp.289-304 / 2022
  • Purpose: The development of the network and mobile industries has induced companies to invest in information systems, leading to a new industrial revolution. The Journal of Information Systems (JIS), which developed the information systems field into a theoretical and practical discipline in the 1990s, holds a 30-year history of information systems research. This study aims to identify the academic value and research trends of JIS. Design/methodology/approach: This study analyzes the trends of JIS by combining several methods, collectively named TLS mining analysis. TLS mining analysis consists of a series of analyses: the Term Frequency-Inverse Document Frequency (TF-IDF) weight model, Latent Dirichlet Allocation (LDA) topic modeling, and text mining with Semantic Network Analysis. First, keywords are extracted from the research data using the TF-IDF weight model; then topic modeling is performed using the LDA algorithm to identify issue keywords. Findings: The study used the summary service for published research papers provided by the Korea Citation Index to analyze JIS. The 714 papers published from 2002 to 2021 were divided into two periods: 2002-2011 and 2012-2021. In the first period (2002-2011), research in the information systems field focused on e-business strategies, as most companies adopted online business models. In the second period (2012-2021), data-based information technology and new industrial revolution technologies such as artificial intelligence, SNS, and mobile were the main research issues. In addition, keywords for improving the JIS citation index were presented.
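The TF-IDF keyword extraction step named in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses the plain weighting tf × log(N / df) (libraries often apply smoothed variants), the helper names `tf_idf` and `top_keywords` are hypothetical, the toy documents are invented, and the later LDA and semantic network steps are not covered.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def top_keywords(doc_weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical toy documents (already tokenized), not data from the study.
docs = [
    "e-business strategy online business model".split(),
    "mobile sns artificial intelligence data".split(),
    "online data business".split(),
]
weights = tf_idf(docs)
for i, doc_weights in enumerate(weights):
    print(i, top_keywords(doc_weights))
```

Terms that occur in only one document receive the highest weights, while terms shared across documents are down-weighted, which is why TF-IDF is commonly used to pick candidate keywords before topic modeling.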

Methodology Using Text Analysis for Packaging R&D Information Services on Pending National Issues (텍스트 분석을 활용한 국가 현안 대응 R&D 정보 패키징 방법론)

  • Hyun, Yoonjin; Han, Heejun; Choi, Heeseok; Park, Junhyung; Lee, Kyuha; Kwahk, Kee-Young; Kim, Namgyu
    • Journal of Information Technology Applications and Management / v.20 no.3_spc / pp.231-257 / 2013
  • The recent rise in unstructured data generated by social media has resulted in an increasing need to collect, store, search, analyze, and visualize it. These data cannot be managed effectively with traditional data analysis methodologies because of their vast volume and unstructured nature. Therefore, many attempts are being made to analyze unstructured data (e.g., text files and log files) using commercial and noncommercial analytical tools. In particular, attempts to discover meaningful knowledge through text mining are being made in business and in other areas such as politics, economics, and cultural studies. For instance, several studies have examined pending national issues by analyzing large volumes of text on various social issues. However, it is difficult to create satisfactory information services that can identify R&D documents on specific national issues from among the various R&D resources. In other words, although users specify words related to pending national issues as search keywords, they usually fail to retrieve the R&D information they are looking for. This is usually because of the discrepancy between the terms defining pending national issues and the corresponding terms used in R&D documents. We need a mediating logic to overcome this discrepancy so that we can identify and package appropriate R&D information on specific pending national issues. In this paper, we use association analysis and social network analysis to devise a mediator that bridges the gap between the keywords defining pending national issues and those used in R&D documents. Further, we propose a methodology for packaging R&D information services for pending national issues using the devised mediator. Finally, to evaluate the practical applicability of the proposed methodology, we apply it to the NTIS (National Science & Technology Information Service) system and summarize the results in the case study section.

Research Trends on Emotional Labor in Korea using text mining (텍스트마이닝을 활용한 감정노동 연구 동향 분석)

  • Cho, Kyoung-Won; Han, Na-Young
    • Journal of Korea Society of Industrial Information Systems / v.26 no.6 / pp.119-133 / 2021
  • Text mining has been used to identify research trends in many fields, but not yet in the field of emotional labor. This study uses text mining to analyze in depth 1,465 papers indexed in the Korea Citation Index (KCI) from 2004 to 2019 that contain the subject word 'emotional labor', in order to understand trends in emotional labor research. Topics were extracted by LDA analysis, and IDM analysis was performed to confirm the proportion and similarity of the topics. Through these methods, an integrated analysis of topics was conducted, considering the usefulness of topics with high similarity. The research topics are divided into 11 categories, listed in descending order of share: stress of emotional labor (12.2%), emotional labor and social support (12.0%), customer service workers' emotional labor (10.9%), emotional labor and resilience (10.2%), emotional labor strategy (9.2%), call center counselor's emotional labor (9.1%), results of emotional labor (9.0%), emotional labor and job exhaustion (7.9%), emotional intelligence (7.1%), preliminary care service workers' emotional labor (6.6%), and emotional labor and organizational culture (5.9%). Through topic modeling and trend analysis, the research trends and academic progress of emotional labor are analyzed to suggest directions for future research, and it is expected that a practical strategy for emotional labor can be established.

Using Text Mining for the Analysis of Research Trends Related to Laws Under the Ministry of Oceans and Fisheries (텍스트 마이닝을 활용한 해양수산부 법률 관련 연구동향 분석연구)

  • Hwang, Kyu Won; Lee, Moon Suk; Yun, So Ra
    • Journal of the Korean Society of Marine Environment & Safety / v.28 no.4 / pp.549-566 / 2022
  • Recently, artificial intelligence (AI) technology has progressed rapidly, and the number of industries using this technology is increasing significantly. Further, analytical research using text mining, which is an application of artificial intelligence, is being actively developed in social science research. About 125 laws, including joint laws, have been enacted under the Ministry of Oceans and Fisheries in various sectors, including the marine environment, fisheries, ships, fishing villages, and ports. Research on the laws under the Ministry of Oceans and Fisheries has been conducted progressively and is steadily increasing in quantity. In this study, domestic research trends were analyzed through text mining, targeting research papers related to the laws of the Ministry of Oceans and Fisheries. First, topic modeling, which is a type of text mining, was performed to identify potential topics. Second, co-occurrence network analysis was performed on the keywords in research papers dealing with specific laws to derive the key themes covered. Finally, author network analysis was performed to explore social networks among authors. The results showed that key topics changed by period, and subjects were explored for the Ship Safety Law, the Marine Environment Management Law, the Fisheries Law, and others. Furthermore, core researchers were selected based on the author network analysis, and the tendency of authors to conduct joint research was identified. Through this study, changes in the topics of research related to the laws of the Ministry of Oceans and Fisheries were identified up to the present, and it is expected that future research topics will be further diversified and that quantitative and qualitative research in the field of oceans and fisheries will grow.

Financial Footnote Analysis for Financial Ratio Predictions based on Text-Mining Techniques (재무제표 주석의 텍스트 분석 통한 재무 비율 예측 향상 연구)

  • Choe, Hyoung-Gyu; Lee, Sang-Yong Tom
    • Knowledge Management Research / v.21 no.2 / pp.177-196 / 2020
  • Since the adoption of K-IFRS (Korean International Financial Reporting Standards), the volume of financial footnotes has increased. However, because of stereotypical phrasing and a lack of conciseness, deriving the core information from footnotes is still not easy. To address this problem, this study analyzed financial footnotes with text-mining techniques to predict financial ratios. Using financial statement data from 2013 to 2018, we tried to predict the earnings per share (EPS) of the following quarter. We found that measured prediction errors were significantly reduced when text-mined footnote data were used jointly. We believe this result arose because discretionary financial figures, which were hard to predict with quantitative financial data, were more correlated with the footnote texts.

Analysis of 'Better Class' Characteristics and Patterns from College Lecture Evaluation by Longitudinal Big Data

  • Nam, Min-Woo; Cho, Eun-Soon
    • International Journal of Contents / v.15 no.3 / pp.7-12 / 2019
  • The purpose of this study was to analyze the characteristics and patterns of a 'better class' by applying a longitudinal text mining big data analysis technique to subjective lecture evaluation comments. First, this study classified the upper 30% of classes to deduce characteristics and patterns from subjective text data in two five-year periods spanning 10 years. A total of 47,177 courses (100%) from the spring semester of 2005 to the fall semester of 2014 were analyzed from a university in a metropolitan city in the middle area of South Korea. For 2005-2009, this study extracted meaningful words, in order of frequency, such as good, course, professor, appreciation, lecture, interesting, useful, know, easy, improvement, progress, teaching material, passion, and concern. For 2010-2014, the words were class, appreciation, professor, good, course, interesting, understanding, useful, help, student, effort, thinking, not difficult, explanation, lecture, hard, pleasant, easy, study, examination, like, various, fun, and knowledge. This study suggests that the characteristics and patterns of a 'better class' at college should be analyzed according to academic area, such as liberal arts, fine arts, social science, engineering, and math and science.