• Title/Summary/Keyword: TF-IDF analysis

Search Result 196, Processing Time 0.025 seconds

An Analysis of Indications of Meridians in DongUiBoGam Using Data Mining (데이터마이닝을 이용한 동의보감에서 경락의 주치특성 분석)

  • Chae, Younbyoung;Ryu, Yeonhee;Jung, Won-Mo
    • Korean Journal of Acupuncture
    • /
    • v.36 no.4
    • /
    • pp.292-299
    • /
    • 2019
  • Objectives : DongUiBoGam is one of the representative medical literatures in Korea. We used text mining methods and analyzed the characteristics of the indications of each meridian in the second chapter of DongUiBoGam, WaeHyeong, which addresses external body elements. We also visualized the relationships between the meridians and the disease sites. Methods : Using the term frequency-inverse document frequency (TF-IDF) method, we quantified values regarding the indications of each meridian according to the frequency of the occurrences of 14 meridians and 14 disease sites. The spatial patterns of the indications of each meridian were visualized on a human body template according to the TF-IDF values. Using hierarchical clustering methods, twelve meridians were clustered into four groups based on the TF-IDF distributions of each meridian. Results : TF-IDF values of each meridian showed different constellation patterns at different disease sites. The spatial patterns of the indications of each meridian were similar to the route of the corresponding meridian. Conclusions : The present study identified spatial patterns between meridians and disease sites. These findings suggest that the constellations of the indications of meridians are primarily associated with the lines of the meridian system. We strongly believe that these findings will further the current understanding of indications of acupoints and meridians.

Construction of Onion Sentiment Dictionary using Cluster Analysis (군집분석을 이용한 양파 감성사전 구축)

  • Oh, Seungwon;Kim, Min Soo
    • Journal of the Korean Data Analysis Society
    • /
    • v.20 no.6
    • /
    • pp.2917-2932
    • /
    • 2018
  • Many researches are accomplished as a result of the efforts of developing the production predicting model to solve the supply imbalance of onions which are vegetables very closely related to Korean food. But considering the possibility of storing onions, it is very difficult to solve the supply imbalance of onions only with predicting the production. So, this paper's purpose is trying to build a sentiment dictionary to predict the price of onions by using the internet articles which include the informations about the production of onions and various factors of the price, and these articles are very easy to access on our daily lives. Articles about onions are from 2012 to 2016, using TF-IDF for comparing with four kinds of TF-IDFs through the documents classification of wholesale prices of onions. As a result of classifying the positive/negative words for price by k-means clustering, DBSCAN (density based spatial cluster application with noise) clustering, GMM (Gaussian mixture model) clustering which are partitional clustering, GMM clustering is composed with three meaningful dictionaries. To compare the reasonability of these built dictionary, applying classified articles about the rise and drop of the price on logistic regression, and it shows 85.7% accuracy.

Analysis of User Reviews for Webtoon Applications Using Text Mining (텍스트 마이닝을 활용한 웹툰 애플리케이션 사용자 리뷰 분석)

  • Shin, Hyorim;Choi, Junho
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.4
    • /
    • pp.457-468
    • /
    • 2022
  • With the rapid growth of the webtoon industry, a new model for webtoon applications has emerged. We have entered the era of webtoon application version 3.0 after ver 1.0 and ver 2.0. Despite these changes, research on user review analysis for webtoon applications is still insufficient. Therefore, this study aims to analyze user reviews for 'Kakao Webtoon (Daum Webtoon)' that presented the webtoon application 3.0 model. For analysis, 20,382 application reviews were collected and pre-processed, and TF-IDF, network analysis, topic modeling, and emotional analysis were conducted for each version. As a result, the user experience of the webtoon application for each version was analyzed and usability testing conducted.

Contents Recommendation Method Based on Social Network (소셜네트워크 기반의 콘텐츠 추천 방법)

  • Pei, Yun-Feng;Sohn, Jong-Soo;Chung, In-Jeong
    • The KIPS Transactions:PartB
    • /
    • v.18B no.5
    • /
    • pp.279-290
    • /
    • 2011
  • As the volume of internet and web contents have shown an explosive growth in recent years, lately contents recommendation system (CRS) has emerged as an important issue. Consequently, researches on contents recommendation method (CRM) for CRS have been conducted consistently. However, traditional CRMs have the limitations in that they are incapable of utilizing in web 2.0 environments where positions of content creators are important. In this paper, we suggest a novel way to recommend web contents of high quality using both degree of centrality and TF-IDF. For this purpose, we analyze TF-IDF and degree of centrality after collecting RSS and FOAF. Then we recommend contents using these two analyzed values. For the verification of the suggested method, we have developed the CRS and showed the results of contents recommendation. With the suggested idea we can analyze relations between users and contents on the entered query, and can consequently provide the appropriate contents to the user. Moreover, the implemented system we suggested in this paper can provide more reliable contents than traditional CRS because the importance of the role of content creators is reflected in the new system.

Rating and Comments Mining Using TF-IDF and SO-PMI for Improved Priority Ratings

  • Kim, Jinah;Moon, Nammee
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.11
    • /
    • pp.5321-5334
    • /
    • 2019
  • Data mining technology is frequently used in identifying the intention of users over a variety of information contexts. Since relevant terms are mainly hidden in text data, it is necessary to extract them. Quantification is required in order to interpret user preference in association with other structured data. This paper proposes rating and comments mining to identify user priority and obtain improved ratings. Structured data (location and rating) and unstructured data (comments) are collected and priority is derived by analyzing statistics and employing TF-IDF. In addition, the improved ratings are generated by applying priority categories based on materialized ratings through Sentiment-Oriented Point-wise Mutual Information (SO-PMI)-based emotion analysis. In this paper, an experiment was carried out by collecting ratings and comments on "place" and by applying them. We confirmed that the proposed mining method is 1.2 times better than the conventional methods that do not reflect priorities and that the performance is improved to almost 2 times when the number to be predicted is small.

Comparison of Term-Weighting Schemes for Environmental Big Data Analysis (환경 빅데이터 이슈 분석을 위한 용어 가중치 기법 비교)

  • Kim, JungJin;Jeong, Hanseok
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.236-236
    • /
    • 2021
  • 최근 텍스트와 같은 비정형 데이터의 생성 속도가 급격하게 증가함에 따라, 이를 분석하기 위한 기술들의 필요성이 커지고 있다. 텍스트 마이닝은 자연어 처리기술을 사용하여 비정형 텍스트를 정형화하고, 문서에서 가치있는 정보를 획득할 수 있는 기법 중 하나이다. 텍스트 마이닝 기법은 일반적으로 각각의 분서별로 특정 용어의 사용 빈도를 나타내는 문서-용어 빈도행렬을 사용하여 용어의 중요도를 나타내고, 다양한 연구 분야에서 이를 활용하고 있다. 하지만, 문서-용어 빈도 행렬에서 나타내는 용어들의 빈도들은 문서들의 차별성과 그에 따른 용어들의 중요도를 나타내기 어렵기때문에, 용어 가중치를 적용하여 문서가 가지고 있는 특징을 분류하는 방법이 필수적이다. 다양한 용어 가중치를 적용하는 방법들이 개발되어 적용되고 있지만, 환경 분야에서는 용어 가중치 기법 적용에 따른 효율성 평가 연구가 미비한 상황이다. 또한, 환경 이슈 분석의 경우 단순히 문서들에 특징을 파악하고 주어진 문서들을 분류하기보다, 시간적 분포도에 따른 각 문서의 특징을 반영하는 것도 상대적으로 중요하다. 따라서, 본 연구에서는 텍스트 마이닝을 이용하여 2015-2020년의 서울지역 환경뉴스 데이터를 사용하여 환경 이슈 분석에 적합한 용어 가중치 기법들을 비교분석하였다. 용어 가중치 기법으로는 TF-IDF (Term frequency-inverse document frquency), BM25, TF-IGM (TF-inverse gravity moment), TF-IDF-ICSDF (TF-IDF-inverse classs space density frequency)를 적용하였다. 본 연구를 통해 환경문서 및 개체 분류에 대한 최적화된 용어 가중치 기법을 제시하고, 서울지역의 환경 이슈와 관련된 핵심어 추출정보를 제공하고자 한다.

  • PDF

An Exploratory Study of Happiness and Unhappiness Among Koreans based on Text Mining Techniques (텍스트마이닝 기법을 활용한 한국인의 행복과 불행 탐색연구)

  • Park, Sanghyeon;Do, Kanghyuk;Kim, Hakyeong;Park, Gaeun;Yun, Jinhyeok;Kim, Kyungil
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.7
    • /
    • pp.10-27
    • /
    • 2018
  • The purpose of this study is to explore the meaning of happiness and unhappiness in Korean society through text mining analysis. Similar words with keywords(happiness/unhappiness) from online news portal are extracted using Word2Vec and TF-IDF method. We also use the K-LIWC dictionary to perform the sentiment analysis of words associated with happiness and unhappiness. In TF-IDF analysis, happiness and unhappiness are highly related to social factors and social issues of the year. In Word2Vec analysis, 'Hope' has been similar with happiness for six years. In K-LIWC analysis, 'money/financial issues', 'school', 'communication' is highly related with happiness and unhappiness. In addition, 'physical condition and symptom' is highly related to unhappiness. Implications, limitations, and suggestions for future research are also discussed.

TF-IDF Based Association Rule Analysis System for Medical Data (의료 정보 추출을 위한 TF-IDF 기반의 연관규칙 분석 시스템)

  • Park, Hosik;Lee, Minsu;Hwang, Sungjin;Oh, Sangyoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.3
    • /
    • pp.145-154
    • /
    • 2016
  • Because of the recent interest in the u-Health and development of IT technology, a need of utilizing a medical information data has been increased. Among previous studies that utilize various data mining algorithms for processing medical information data, there are studies of association rule analysis. In the studies, an association between the symptoms with specified diseases is the target to discover, however, infrequent terms which can be important information for a disease diagnosis are not considered in most cases. In this paper, we proposed a new association rule mining system considering the importance of each term using TF-IDF weight to consider infrequent but important items. In addition, the proposed system can predict candidate diagnoses from medical text records using term similarity analysis based on medical ontology.

Comparison and Analysis of Domestic and Foreign Sports Brands Using Text Mining and Opinion Mining Analysis (텍스트 마이닝과 오피니언 마이닝 분석을 활용한 국내외 스포츠용품 브랜드 비교·분석 연구)

  • Kim, Jae-Hwan;Lee, Jae-Moon
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.6
    • /
    • pp.217-234
    • /
    • 2018
  • In this study, big data analysis was conducted for domestic and international sports goods brands. Text Mining, TF-IDF, Opinion Mining, interestity graph were conducted through the social matrix program Textom and the fashion data analysis platform MISP. In order to examine the recent recognition of sports brands, the period of study is limited to 1 year from January 1, 2017 to December 31, 2017. As a result of analysis, first, we could confirm the products representing each brand. Second, I could confirm the marketing that represents each brand. Third, the common words extracted from each brand were identified. Fourth, the emotions of positive and negative of each brand were confirmed.

An Analysis Scheme Design of Customer Spending Pattern using Text Mining (텍스트 마이닝을 이용한 소비자 소비패턴 분석 기법 설계)

  • Jeong, Eun-Hee;Lee, Byung-Kwan
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.11 no.2
    • /
    • pp.181-188
    • /
    • 2018
  • In this paper, we propose an analysis scheme of customer spending pattern using text mining. In proposed consumption pattern analysis scheme, first we analyze user's rating similarity using Pearson correlation, second we analyze user's review similarity using TF-IDF cosine similarity, third we analyze the consistency of the rating and review using Sendiwordnet. And we select the nearest neighbors using rating similarity and review similarity, and provide the recommended list that is proper with consumption pattern. The precision of recommended list are 0.79 for the Pearson correlation, 0.73 for the TF-IDF, and 0.82 for the proposed consumption pattern. That is, the proposed consumption pattern analysis scheme can more accurately analyze consumption pattern because it uses both quantitative rating and qualitative reviews of consumers.