• Title/Summary/Keyword: text mining analysis

Search Result 1,187, Processing Time 0.029 seconds

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics
    • /
    • v.2 no.2
    • /
    • pp.99-106
    • /
    • 2004
  • In this paper we introduce PubMiner, an intelligent machine learning based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning based data mining techniques for mining useful biological information such as protein­protein interaction from the massive literature. The system recognizes biological terms such as gene, protein, and enzymes and extracts their interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity that were collected from the related public databases to infer more interactions from the original interactions. An inferred interaction from the interaction analysis and native interaction are provided to the user with the link of literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The evaluation of inference proceeded using the protein interaction data of S. cerevisiae (bakers yeast) from MIPS and SGD.

Investigation of the Possibility of Research on Medical Classics Applying Text Mining - Focusing on the Huangdi's Internal Classic - (텍스트마이닝(Text mining)을 활용한 한의학 원전 연구의 가능성 모색 -『황제내경(黃帝內經)』에 대한 적용례를 중심으로 -)

  • Bae, Hyo-jin;Kim, Chang-eop;Lee, Choong-yeol;Shin, Sang-won;Kim, Jong-hyun
    • Journal of Korean Medical classics
    • /
    • v.31 no.4
    • /
    • pp.27-46
    • /
    • 2018
  • Objectives : In this paper, we investigated the applicability of text mining to Korean Medical Classics and suggest that researchers of Medical Classics utilize this methodology. Methods : We applied text mining to the Huangdi's internal classic, a seminal text of Korean Medicine, and visualized networks which represent connectivity of terms and documents based on vector similarity. Then we compared this outcome to the prior knowledge generated through conventional qualitative analysis and examined whether our methodology could accurately reflect the keyword of documents, clusters of terms, and relationships between documents. Results : In the term network, we confirmed that Qi played a key role in the term network and that the theory development based on relativity between Yin and Yang was reflected. In the document network, Suwen and Lingshu are quite distinct from each other due to their differences in description form and topic. Also, Suwen showed high similarity between adjacent chapters. Conclusions : This study revealed that text mining method could yield a significant discovery which corresponds to prior knowledge about Huangdi's internal classic. Text mining can be used in a variety of research fields covering medical classics, literatures, and medical records. In addition, visualization tools can also be utilized for educational purposes.

The Adaptive SPAM Mail Detection System using Clustering based on Text Mining

  • Hong, Sung-Sam;Kong, Jong-Hwan;Han, Myung-Mook
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.6
    • /
    • pp.2186-2196
    • /
    • 2014
  • Spam mail is one of the most general mail dysfunctions, which may cause psychological damage to internet users. As internet usage increases, the amount of spam mail has also gradually increased. Indiscriminate sending, in particular, occurs when spam mail is sent using smart phones or tablets connected to wireless networks. Spam mail consists of approximately 68% of mail traffic; however, it is believed that the true percentage of spam mail is at a much more severe level. In order to analyze and detect spam mail, we introduce a technique based on spam mail characteristics and text mining; in particular, spam mail is detected by extracting the linguistic analysis and language processing. Existing spam mail is analyzed, and hidden spam signatures are extracted using text clustering. Our proposed method utilizes a text mining system to improve the detection and error detection rates for existing spam mail and to respond to new spam mail types.

Research on Methods for Processing Nonstandard Korean Words on Social Network Services (소셜네트워크서비스에 활용할 비표준어 한글 처리 방법 연구)

  • Lee, Jong-Hwa;Le, Hoanh Su;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.21 no.3
    • /
    • pp.35-46
    • /
    • 2016
  • Social network services (SNS) that help to build relationship network and share a particular interest or activity freely according to their interests by posting comments, photos, videos,${\ldots}$ on online communities such as blogs have adopted and developed widely as a social phenomenon. Several researches have been done to explore the pattern and valuable information in social networks data via text mining such as opinion mining and semantic analysis. For improving the efficiency of text mining, keyword-based approach have been applied but most of researchers argued the limitations of the rules of Korean orthography. This research aims to construct a database of non-standard Korean words which are difficulty in data mining such abbreviations, slangs, strange expressions, emoticons in order to improve the limitations in keyword-based text mining techniques. Based on the study of subjective opinions about specific topics on blogs, this research extracted non-standard words that were found useful in text mining process.

Violation Pattern Analysis for Good Manufacturing Practice for Medicine using t-SNE Based on Association Rule and Text Mining (우수 의약품 제조 기준 위반 패턴 인식을 위한 연관규칙과 텍스트 마이닝 기반 t-SNE분석)

  • Jun-O, Lee;So Young, Sohn
    • Journal of Korean Society for Quality Management
    • /
    • v.50 no.4
    • /
    • pp.717-734
    • /
    • 2022
  • Purpose: The purpose of this study is to effectively detect violations that occur simultaneously against Good Manufacturing Practice, which were concealed by drug manufacturers. Methods: In this study, we present an analysis framework for analyzing regulatory violation patterns using Association Rule Mining (ARM), Text Mining, and t-distributed Stochastic Neighbor Embedding (t-SNE) to increase the effectiveness of on-site inspection. Results: A number of simultaneous violation patterns was discovered by applying Association Rule Mining to FDA's inspection data collected from October 2008 to February 2022. Among them there were 'concurrent violation patterns' derived from similar regulatory ranges of two or more regulations. These patterns do not help to predict violations that simultaneously appear but belong to different regulations. Those unnecessary patterns were excluded by applying t-SNE based on text-mining. Conclusion: Our proposed approach enables the recognition of simultaneous violation patterns during the on-site inspection. It is expected to decrease the detection time by increasing the likelihood of finding intentionally concealed violations.

An Enhanced Text Mining Approach using Ensemble Algorithm for Detecting Cyber Bullying

  • Z.Sunitha Bai;Sreelatha Malempati
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.5
    • /
    • pp.1-6
    • /
    • 2023
  • Text mining (TM) is most widely used to process the various unstructured text documents and process the data present in the various domains. The other name for text mining is text classification. This domain is most popular in many domains such as movie reviews, product reviews on various E-commerce websites, sentiment analysis, topic modeling and cyber bullying on social media messages. Cyber-bullying is the type of abusing someone with the insulting language. Personal abusing, sexual harassment, other types of abusing come under cyber-bullying. Several existing systems are developed to detect the bullying words based on their situation in the social networking sites (SNS). SNS becomes platform for bully someone. In this paper, An Enhanced text mining approach is developed by using Ensemble Algorithm (ETMA) to solve several problems in traditional algorithms and improve the accuracy, processing time and quality of the result. ETMA is the algorithm used to analyze the bullying text within the social networking sites (SNS) such as facebook, twitter etc. The ETMA is applied on synthetic dataset collected from various data a source which consists of 5k messages belongs to bullying and non-bullying. The performance is analyzed by showing Precision, Recall, F1-Score and Accuracy.

A Technical Approach for Suggesting Research Directions in Telecommunications Policy

  • Oh, Junseok;Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.12
    • /
    • pp.4467-4488
    • /
    • 2014
  • The bibliometric analysis is widely used for understanding research domains, trends, and knowledge structures in a particular field. The analysis has majorly been used in the field of information science, and it is currently applied to other academic fields. This paper describes the analysis of academic literatures for classifying research domains and for suggesting empty research areas in the telecommunications policy. The application software is developed for retrieving Thomson Reuters' Web of Knowledge (WoK) data via web services. It also used for conducting text mining analysis from contents and citations of publications. We used three text mining techniques: the Keyword Extraction Algorithm (KEA) analysis, the co-occurrence analysis, and the citation analysis. Also, R software is used for visualizing the term frequencies and the co-occurrence network among publications. We found that policies related to social communication services, the distribution of telecommunications infrastructures, and more practical and data-driven analysis researches are conducted in a recent decade. The citation analysis results presented that the publications are generally received citations, but most of them did not receive high citations in the telecommunications policy. However, although recent publications did not receive high citations, the productivity of papers in terms of citations was increased in recent ten years compared to the researches before 2004. Also, the distribution methods of infrastructures, and the inequity and gap appeared as topics in important references. We proposed the necessity of new research domains since the analysis results implies that the decrease of political approaches for technical problems is an issue in past researches. Also, insufficient researches on policies for new technologies exist in the field of telecommunications. This research is significant in regard to the first bibliometric analysis with abstracts and citation data in telecommunications as well as the development of software which has functions of web services and text mining techniques. Further research will be conducted with Big Data techniques and more text mining techniques.

Comparison and Analysis of Domestic and Foreign Sports Brands Using Text Mining and Opinion Mining Analysis (텍스트 마이닝과 오피니언 마이닝 분석을 활용한 국내외 스포츠용품 브랜드 비교·분석 연구)

  • Kim, Jae-Hwan;Lee, Jae-Moon
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.6
    • /
    • pp.217-234
    • /
    • 2018
  • In this study, big data analysis was conducted for domestic and international sports goods brands. Text Mining, TF-IDF, Opinion Mining, interestity graph were conducted through the social matrix program Textom and the fashion data analysis platform MISP. In order to examine the recent recognition of sports brands, the period of study is limited to 1 year from January 1, 2017 to December 31, 2017. As a result of analysis, first, we could confirm the products representing each brand. Second, I could confirm the marketing that represents each brand. Third, the common words extracted from each brand were identified. Fourth, the emotions of positive and negative of each brand were confirmed.

Quantitative Text Mining for Social Science: Analysis of Immigrant in the Articles (사회과학을 위한 양적 텍스트 마이닝: 이주, 이민 키워드 논문 및 언론기사 분석)

  • Yi, Soo-Jeong;Choi, Doo-Young
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.5
    • /
    • pp.118-127
    • /
    • 2020
  • The paper introduces trends and methodological challenges of quantitative Korean text analysis by using the case studies of academic and news media articles on "migration" and "immigration" within the periods of 2017-2019. The quantitative text analysis based on natural language processing technology (NLP) and this became an essential tool for social science. It is a part of data science that converts documents into structured data and performs hypothesis discovery and verification as the data and visualize data. Furthermore, we examed the commonly applied social scientific statistical models of quantitative text analysis by using Natural Language Processing (NLP) with R programming and Quanteda.

Competitive intelligence in Korean Ramen Market using Text Mining and Sentiment Analysis

  • Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Internet Computing and Services
    • /
    • v.19 no.1
    • /
    • pp.155-166
    • /
    • 2018
  • These days, online media, such as blogospheres, online communities, and social networking sites, provides the uncountable user-generated content (UGC) to discover market intelligence and business insight with. The business has been interested in consumers, and constantly requires the approach to identify consumers' opinions and competitive advantage in the competing market. Analyzing consumers' opinion about oneself and rivals can help decision makers to gain in-depth and fine-grained understanding on the human and social behavioral dynamics underlying the competition. In order to accomplish the comparison study for rival products and companies, we attempted to do competitive analysis using text mining with online UGC for two popular and competing ramens, a market leader and a market follower, in the Korean instant noodle market. Furthermore, to overcome the lack of the Korean sentiment lexicon, we developed the domain specific sentiment dictionary of Korean texts. We gathered 19,386 pieces of blogs and forum messages, developed the Korean sentiment dictionary, and defined the taxonomy for categorization. In the context of our study, we employed sentiment analysis to present consumers' opinion and statistical analysis to demonstrate the differences between the competitors. Our results show that the sentiment portrayed by the text mining clearly differentiate the two rival noodles and convincingly confirm that one is a market leader and the other is a follower. In this regard, we expect this comparison can help business decision makers to understand rich in-depth competitive intelligence hidden in the social media.