• Title/Summary/Keyword: Text mining analysis

Applying Text Mining to Identify Factors Which Affect Likes and Dislikes of Online News Comments (텍스트마이닝을 통한 댓글의 공감도 및 비공감도에 영향을 미치는 댓글의 특성 연구)

  • Kim, Jeonghun;Song, Yeongeun;Jin, Yunseon;Kwon, Ohbyung
    • Journal of Information Technology Services / v.14 no.2 / pp.159-176 / 2015
  • As a public medium and one of the big data sources that accumulate informally and in real time, online news comments (replies) are considered a significant resource for understanding the mindset of article readers. Comments are also regarded as an important medium of WOM (word of mouth) about products, services, and enterprises. If, from a WOM perspective, the diffusion effect of a comment is measured by the numbers of agreements (likes) and disagreements (dislikes) it receives, then identifying at a very early stage which characteristics of comments influence those agreements and disagreements would be very worthwhile for establishing a comment-based eWOM (electronic WOM) strategy. However, the effects of comment characteristics on eWOM performance have rarely been studied. From this perspective, this study conducts an empirical analysis of the comment characteristics that affect the numbers of agreements and disagreements, taken as eWOM performance, on news articles that address a specific product, service, or enterprise. While extant literature has focused on quantitative attributes of comments that were collected manually, this paper uses text mining techniques to acquire qualitative attributes of comments in an automatic and cost-effective manner.

A Study on Using Social Big Data for Expanding Analytical Knowledge - Domestic Big Data Supply-Demand Expectation - (분석지의 확장을 위한 소셜 빅데이터 활용연구 - 국내 '빅데이터' 수요공급 예측 -)

  • Kim, Jung-Sun;Kwon, Eun-Ju;Song, Tae-Min
    • Knowledge Management Research / v.15 no.3 / pp.169-188 / 2014
  • Big data seems likely to change the knowledge management systems and methods of enterprises to a large extent. Further, how unstructured data, including images, video, sensor data, and text, is utilized may determine whether an enterprise or government expands its knowledge management. In this light, this paper attempts to build a prediction model of demand and supply for Korea's big data market with a data-mining decision tree, using text big data generated on the web and SNS over 3 years, in order to expand the forms of knowledge management. The results indicate that Korea's big data market is government-led and focused on hardware (H/W) and storage. Further, big data demanders were found to place importance on attribute factors including interest, quickness, and economics, whereas suppliers place importance on innovation and growth. These results show that the factors affecting acceptance of big data technology differ between suppliers and demanders. This article may provide a basic method for studying the expansion of enterprises' forms of analysis and their connection with management activities.

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics / v.17 no.2 / pp.17.1-17.3 / 2019
  • Text mining has become an important research method in biology, its original purpose being to extract biological entities, such as genes, proteins, and phenotypic traits, from scientific papers in order to extend knowledge. However, few thorough studies on text mining and application development have been performed for plant molecular biology data, especially for rice, resulting in a lack of datasets for named-entity recognition tasks in this species. Because few benchmarks are available for rice, we faced various difficulties in applying advanced machine learning methods to accurately analyze the rice literature. To evaluate several approaches for automatically extracting gene/protein entity information, we built a new benchmark dataset for rice. The dataset consists of titles and abstracts of scientific papers focusing on the rice species, downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task on rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using this dataset, to facilitate open comparison and evaluation of different approaches to the task.
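
Named-entity recognition corpora like the one described above are often stored as character-offset annotations and converted to token-level tags before a tagger is trained. The sketch below is illustrative only: it assumes a PubAnnotation-style record (a `text` string plus `denotations`, each with `span.begin`, `span.end`, and an `obj` label), a simple regex tokenizer, and the BIO tagging scheme. The helper names `tokenize` and `to_bio`, the example sentence, and the `Protein` label are hypothetical and are not taken from the OryzaGP dataset.

```python
import re
from typing import Dict, List, Tuple

# Words (optionally hyphenated) or single punctuation marks, with offsets kept.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Return (token, begin, end) triples using character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def to_bio(record: Dict) -> List[Tuple[str, str]]:
    """Convert a PubAnnotation-style record into (token, BIO tag) pairs.

    Assumes non-overlapping denotations; a token is tagged only if it lies
    entirely inside a denotation span.
    """
    tokens = tokenize(record["text"])
    tags = ["O"] * len(tokens)
    for den in record.get("denotations", []):
        begin, end = den["span"]["begin"], den["span"]["end"]
        inside = [i for i, (_, b, e) in enumerate(tokens) if b >= begin and e <= end]
        for j, i in enumerate(inside):
            # First token of the entity gets B-, the rest get I-.
            tags[i] = ("B-" if j == 0 else "I-") + den["obj"]
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Hypothetical record, not from the dataset.
example = {
    "text": "Hd3a protein promotes rice flowering.",
    "denotations": [{"id": "T1", "span": {"begin": 0, "end": 12}, "obj": "Protein"}],
}
for token, tag in to_bio(example):
    print(token, tag)
```

Overlapping or nested annotations would need extra handling, which this sketch deliberately leaves out.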

Topic Modeling of Suicide Papers using Text Mining (텍스트마이닝을 활용한 자살 관련 논문 토픽 모델링)

  • Cho, Kyoung Won;Kim, Ha-young;Kim, Mi-ri;Woo, Young Woon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.275-277 / 2019
  • The purpose of this study is to classify the topics of suicide-related papers published to date and to identify the proportions of the main topics and their trends over the past 20 years. For this purpose, a text mining technique used in big data analysis was applied to the Korea Citation Index (KCI) database, where information about papers is most actively shared. By grasping how suicide-related research has changed over time, this study will provide baseline data for setting the future academic direction of suicide-related research.
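
The abstract does not name the topic model it used. Latent Dirichlet allocation (LDA) is a common choice for this kind of study, so the following is a minimal collapsed Gibbs sampler for LDA written with the standard library only. It is a sketch, not the authors' pipeline: the toy corpus, the number of topics, the priors `alpha` and `beta`, and the iteration count are all assumptions chosen for illustration.

```python
import random
from collections import Counter
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, n_iter: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[Counter]]:
    """Collapsed Gibbs sampling for LDA.

    Returns (theta, topic_word): per-document topic proportions and
    per-topic word counts from the final sample.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})      # vocabulary size
    ndk = [[0] * n_topics for _ in docs]           # topic counts per document
    nkw = [Counter() for _ in range(n_topics)]     # word counts per topic
    nk = [0] * n_topics                            # total tokens per topic
    z: List[List[int]] = []                        # topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
             for d, doc in enumerate(docs)]
    return theta, nkw


# Toy corpus (invented), already tokenized.
docs = [
    "suicide risk depression adolescent".split(),
    "depression adolescent suicide risk".split(),
    "prevention policy program community".split(),
    "policy prevention program policy".split(),
]
theta, topic_word = lda_gibbs(docs, n_topics=2)
for t, counts in enumerate(topic_word):
    print(f"topic {t}:", [w for w, _ in (+counts).most_common(3)])
for d, row in enumerate(theta):
    print(f"doc {d}:", [round(p, 2) for p in row])
```

Each row of `theta` holds one document's topic proportions; averaging those rows by publication year is one possible way to trace topic trends over a period such as the 20 years covered here, although the study's exact procedure is not stated.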

A Study on Keyword Information Characteristics of Product Names for Online Sales of Women's Jeans Using Text Mining (텍스트마이닝을 활용한 온라인 판매 여성 청바지 상품명에 나타난 키워드의 정보 특성 분석)

  • Yeo Sun Kang
    • Journal of the Korean Society of Clothing and Textiles / v.47 no.1 / pp.35-51 / 2023
  • This study used text mining to extract 2,842 keywords from 7,397 product names and organized them into categories in order to analyze the characteristics of keywords appearing in the product names of jeans sold online after 2020. The item category included denim, Chungbaji [청바지, 'jeans'], and Ilja [일자, 'straight-leg'], while the silhouette category included wide and bootcut. In addition, high-waist and banding made up the making category, and the materials category consisted of napping, spandex, and soft blue. Denim surpassed the other keywords in frequency, co-occurrence frequency, and centrality, and co-appeared with various other keywords. The co-appearance of item and silhouette keywords was also prominent, and many keyword combinations reflected characteristics related to (a) high waist, (b) hemline detail, (c) rubber band, and (d) partial tearing. Furthermore, idiomatic expressions such as 'slim fit' and 'back tearing', which did not stand out in co-occurrence frequency, were additionally confirmed through correlation analysis. Therefore, the product name analysis effectively identified the detailed silhouette and making characteristics of jeans preferred by consumers.
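
The frequency, co-occurrence frequency, and centrality measures mentioned above can be illustrated with a short standard-library sketch. It assumes that each product name has already been reduced to a list of keywords and that two keywords co-occur when they appear in the same product name. Degree centrality is used as one possible centrality measure; the abstract does not say which measure the study used. The function name `keyword_stats` and the product names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def keyword_stats(names: List[List[str]]) -> Tuple[Counter, Counter, Dict[str, float]]:
    """Keyword frequency, pairwise co-occurrence counts, and degree centrality."""
    # Frequency: number of product names that contain each keyword.
    freq = Counter(kw for name in names for kw in set(name))

    # Co-occurrence: each unordered keyword pair counted once per product name.
    cooc: Counter = Counter()
    for name in names:
        for a, b in combinations(sorted(set(name)), 2):
            cooc[(a, b)] += 1

    # Degree centrality: fraction of the other keywords a keyword co-occurs with.
    neighbors: Dict[str, Set[str]] = {kw: set() for kw in freq}
    for a, b in cooc:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(freq)
    centrality = {kw: (len(nb) / (n - 1) if n > 1 else 0.0) for kw, nb in neighbors.items()}
    return freq, cooc, centrality


# Hypothetical product names, already reduced to keyword lists.
names = [
    ["denim", "wide", "high-waist"],
    ["denim", "bootcut", "spandex"],
    ["denim", "wide", "banding"],
    ["slim-fit", "spandex"],
]
freq, cooc, centrality = keyword_stats(names)
print("most frequent:", freq.most_common(1))
print("top pair:", cooc.most_common(1))
print("most central:", max(centrality, key=centrality.get))
```

In this toy data "denim" leads on all three measures, mirroring the kind of pattern the abstract reports; a real analysis would build the keyword lists with a Korean tokenizer and could add other centrality measures.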

A Study on the Consumer Boycott Participation Experience: Using Text Mining Analysis and In-depth Interview (소비자불매운동 참여 경험에 관한 연구: 텍스트마이닝 분석과 심층면접기법의 활용)

  • Han, Juno;Li, Xu;Hwang, Hyesun
    • The Journal of the Korea Contents Association / v.22 no.2 / pp.88-106 / 2022
  • This study examined the social discourse on consumer boycotts and explored consumers' boycott experiences using text mining of mass media and social media data together with in-depth interviews. The results showed that the topics of online news related to the boycott included its causes, the responses of each actor during the boycott, and its effects. The in-depth interviews revealed that the boycott has become decentralized and that participants explored and verified information on their own. During the boycott, participants had mixed experiences owing to the absence of substitutes and the influence of marketing, as well as positive experiences of expressing their thoughts and strengthening their beliefs.

Understanding of the Overview of Quality 4.0 Using Text Mining (텍스트마이닝을 활용한 품질 4.0 연구동향 분석)

  • Kim, Minjun
    • Journal of Korean Society for Quality Management / v.51 no.3 / pp.403-418 / 2023
  • Purpose: The acceleration of technological innovation, specifically Industry 4.0, has triggered the emergence of a quality management paradigm known as Quality 4.0. This study aims to provide a systematic overview of dispersed studies on Quality 4.0 across various disciplines and to stimulate further academic discussion and industrial transformation. Methods: Text mining and machine learning approaches are applied to identify key research topics, and the suggested key references are manually reviewed to develop a state-of-the-art overview of Quality 4.0. Results: 1) A total of 27 key research topics were identified from the analysis of 1,234 research papers related to Quality 4.0. 2) The relationships among the 27 key research topics were identified. 3) A multilevel framework consisting of technological enablers, business methods and strategies, goals, and application industries of Quality 4.0 was developed. 4) The trends of the key research topics were analyzed. Conclusion: The identification of 27 key research topics and the development of the Quality 4.0 framework contribute to a better understanding of Quality 4.0. This research lays the groundwork for future academic and industrial advances in the field and encourages further discussion and transformation within the industry.

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of each collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining studies focused on applications of the second step, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when text data are represented as vectors. Unlike structured data, which can be directly subjected to a variety of operations and traditional analysis techniques, unstructured text must first be structured into a form that the computer can understand. Mapping arbitrary objects into a space of a specific dimension while preserving algebraic properties, in order to structure text data, is called "embedding". Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods such as doc2Vec generate a vector for each document using all the words contained in the document.
As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, so it is difficult to accurately represent a complex document covering multiple subjects. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods; however, since keyword extraction is not the core subject of the proposed method, we describe its application to documents whose keywords are predefined in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. All text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to avoid the influence of miscellaneous words that affects traditional document embedding, the vectors corresponding to each document's keywords are extracted to form a keyword vector set for each document. Next, clustering is conducted on each document's keyword set to identify the multiple subjects it contains. Finally, a multi-vector is generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector-based traditional approach cannot properly map complex documents because of interference among subjects within each vector.
With the proposed multi-vector-based method, we confirmed that complex documents can be vectorized more accurately by eliminating the interference among subjects.
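
Steps (3) to (5) of the method described above can be sketched in a few lines of standard-library Python. This is not the authors' implementation: it assumes the word vectors from step (2) are already available as a dictionary, uses plain k-means with a fixed number of clusters `k` for step (4) (the abstract names neither the clustering algorithm nor how the number of subjects is chosen), and takes each cluster's mean as its vector in step (5). For contrast, a simple average of all keyword vectors stands in for a single-vector embedding; it is not doc2Vec. The function names and the 2-D vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]


def kmeans(points: List[Vector], k: int, n_iter: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization; returns labels."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


def multi_vector_embedding(keywords: List[str], word_vec: Dict[str, Vector], k: int) -> List[Vector]:
    """Steps (3)-(5): keyword vectors -> keyword clusters -> one vector per cluster."""
    vecs = [word_vec[w] for w in keywords if w in word_vec]   # (3) keyword vector extraction
    labels = kmeans(vecs, k)                                  # (4) keyword clustering
    return [mean([v for v, lab in zip(vecs, labels) if lab == j])   # (5) one vector per cluster
            for j in range(k) if j in labels]


# Toy 2-D "word embeddings" for a paper whose keywords span two subjects.
word_vec = {
    "embedding": [0.9, 0.1], "word2vec": [1.0, 0.0], "clustering": [0.8, 0.2],
    "logistics": [0.1, 0.9], "inventory": [0.0, 1.0],
}
keywords = list(word_vec)
single = mean([word_vec[w] for w in keywords])   # one averaged vector for the whole document
multi = multi_vector_embedding(keywords, word_vec, k=2)
print("single vector:", [round(x, 2) for x in single])
print("multi-vector:", [[round(x, 2) for x in v] for v in multi])
```

With these toy values the averaged vector, about (0.56, 0.44), sits between the two subjects, while the two cluster vectors, about (0.9, 0.1) and (0.05, 0.95), each stay close to one subject: a small-scale version of the interference the abstract describes.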

Analysis on Research Trend of Productivity Using Text Mining - Focusing on KSCE Journal - (텍스트 마이닝을 통한 건설 생산성 분야의 연구동향 분석 - KSCE 저널을 중심으로 -)

  • Gu, Bongil;Huh, Youngki
    • Korean Journal of Construction Engineering and Management / v.21 no.2 / pp.15-21 / 2020
  • The relationships between keywords found in all productivity-related papers published in the KSCE Journal over the last 15 years were analyzed with text mining and the Apriori algorithm in order to reveal research trends in the area. The results show that the word 'productivity' is most closely related to the words 'work' and 'labor'. Furthermore, it is somewhat related to 'factor', 'model', 'simulation', and 'work time'. On the other hand, the words 'machine' and 'equipment' have little relationship with it. This research will help academia understand research trends in construction productivity.
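
The Apriori algorithm mentioned above finds keyword sets that co-occur often and turns them into association rules. Below is a compact, standard-library sketch of the classic algorithm (level-wise candidate generation with subset pruning, then confidence-based rule extraction). Treating each paper's keyword set as one transaction, the thresholds `min_support=0.4` and `min_conf=0.7`, and the keyword sets themselves are assumptions for illustration, not the study's data or settings.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support (share of transactions containing it) >= min_support."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        return sum(itemset <= t for t in transactions) / n

    level = {s for s in {frozenset([i]) for t in transactions for i in t}
             if support(s) >= min_support}
    frequent: Dict[Itemset, float] = {}
    size = 1
    while level:
        frequent.update({s: support(s) for s in level})
        size += 1
        # Join step: combine frequent itemsets of the previous size into larger candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every subset one item smaller must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent: Dict[Itemset, float],
                      min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Rules X -> Y with confidence = support(X | Y) / support(X)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical keyword sets, one per paper.
papers = [
    {"productivity", "work", "labor"},
    {"productivity", "work", "simulation"},
    {"productivity", "labor", "model"},
    {"productivity", "work", "labor"},
    {"machine", "equipment"},
]
frequent = apriori(papers, min_support=0.4)
for lhs, rhs, conf in association_rules(frequent, min_conf=0.7):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

On this invented data, rules such as `['work'] -> ['productivity']` reach confidence 1.0, while `machine` and `equipment` fall below the support threshold and produce no rules.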

The Research Trends and Keywords Modeling of Shoulder Rehabilitation using the Text-mining Technique (텍스트 마이닝 기법을 활용한 어깨 재활 연구분야 동향과 키워드 모델링)

  • Kim, Jun-hee;Jung, Sung-hoon;Hwang, Ui-jae
    • Journal of the Korean Society of Physical Medicine / v.16 no.2 / pp.91-100 / 2021
  • PURPOSE: This study analyzed the trends and characteristics of shoulder rehabilitation research through keyword analysis and modeled the relationships among keywords using text mining techniques. METHODS: Abstracts of 10,121 articles registered in MEDLINE (PubMed) under the keywords 'shoulder' and 'rehabilitation' were collected using Python. Based on word frequencies, the 10 most frequent keywords were selected. Word embedding was performed with word2vec to analyze word similarity. In addition, the words were grouped and analyzed based on distance (cosine similarity) using t-SNE. RESULTS: The number of studies related to shoulder rehabilitation is increasing year after year, and the keywords most frequently used in these studies are 'patient', 'pain', and 'treatment'. The word2vec results showed that the words were highly correlated with 12 keywords from studies related to shoulder rehabilitation. Furthermore, t-SNE divided the keywords of the studies into 5 groups. CONCLUSION: This is the first study to use text mining techniques to model the keywords, and their relationships, that make up the abstracts of MEDLINE (PubMed) research related to 'shoulder' and 'rehabilitation'. The results will help diversify the research topics of future shoulder rehabilitation studies.
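
Two of the steps in METHODS, ranking words by frequency and comparing word2vec vectors by cosine similarity, can be sketched with the standard library alone. This is an illustration under stated assumptions: training word2vec and running t-SNE are left out, the stop-word list and the regex tokenizer are simplifications, and the abstracts and 3-dimensional vectors are invented stand-ins rather than MEDLINE data or trained embeddings. The function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Small illustrative stop-word list (an assumption, not the study's list).
STOP_WORDS = {"a", "and", "for", "in", "of", "the", "to", "was", "with"}


def top_keywords(abstracts: List[str], n: int = 10) -> List[Tuple[str, int]]:
    """Lower-case, tokenize, drop stop words, and return the n most frequent words."""
    counts = Counter(
        tok
        for text in abstracts
        for tok in re.findall(r"[a-z]+", text.lower())
        if tok not in STOP_WORDS
    )
    return counts.most_common(n)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def most_similar(word: str, vectors: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank all other words by cosine similarity to `word`."""
    scores = [(w, cosine(vectors[word], v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Invented abstracts and made-up 3-D vectors standing in for trained word2vec output.
abstracts = [
    "Pain and function in patients after shoulder surgery.",
    "Exercise treatment reduced pain in patients with shoulder pain.",
    "Rehabilitation treatment for the patient with rotator cuff tear.",
]
vectors = {
    "pain": [0.9, 0.1, 0.0],
    "injury": [0.8, 0.2, 0.1],
    "treatment": [0.1, 0.9, 0.2],
    "exercise": [0.2, 0.8, 0.3],
}
print(top_keywords(abstracts, n=3))
print(most_similar("pain", vectors, k=2))
```

A fuller pipeline would also lemmatize (so that "patients" and "patient" are merged) and would take its vectors from a trained word2vec model rather than typing them in.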