• Title/Summary/Keyword: Classification mining

Search Result 735

A Implementation of Optimal Multiple Classification System using Data Mining for Genome Analysis

  • Jeong, Yu-Jeong;Choi, Gwang-Mi
    • Journal of the Korea Society of Computer and Information / v.23 no.12 / pp.43-48 / 2018
  • This paper obtains more efficient classification results by applying the HMSV algorithm, which combines a Hidden Markov Model with an SVM model, to gene expression data, simulating the stochastic flow of the gene data and clustering it. We verified the HMSV algorithm, which combines independently learned algorithms. To compare it with other approaches, we tested sensitivity and specificity, the most commonly used classification criteria. K-means achieved 71% and SOM achieved 68%, while the proposed HMSV algorithm achieved 85%. These results are stable and high, indicating better classification than with a general classification algorithm. The proposed algorithm stochastically models the generation process of the characteristics contained in the signal and obtains a good recognition rate with a small amount of computation. Because it clusters nodes by simulating the stochastic flow of gene data, it offers fast and effective performance improvement and should be useful for studying relationships with diseases through data mining of big data.
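
The abstract above evaluates classifiers by sensitivity and specificity. As a minimal sketch of how these two criteria are computed from a binary confusion matrix, the following uses a hypothetical helper `sensitivity_specificity` and made-up counts; neither comes from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from binary confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of actual positives that were found.
    specificity = TN / (TN + FP): share of actual negatives that were rejected.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only (not results from the paper).
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```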

Classification of ratings in online reviews (온라인 리뷰에서 평점의 분류)

  • Choi, Dongjun;Choi, Hosik;Park, Changyi
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.845-854 / 2016
  • Sentiment analysis, or opinion mining, is a text mining technique used to identify subjective information or individual opinions in documents such as blogs, reviews, articles, or social networks. The literature has considered only the problem of binary classification of ratings based on online review texts. However, because reviews can be neutral as well as positive or negative, multi-class classification is more appropriate than binary classification. To this end, we consider the multi-class classification of ratings based on review texts. In the preprocessing stage, we extract words related to ratings using the chi-square statistic. The extracted words are then used as input variables to multi-class classifiers such as support vector machines and the proportional odds model, whose predictive performances we compare.
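
The preprocessing step described above scores words by their association with ratings using the chi-square statistic. A minimal sketch of that idea follows, assuming each review is a set of words and each rating is one class label; the function names and the toy reviews are hypothetical and are not taken from the paper.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip empty rows/columns to avoid dividing by zero
                stat += (observed - expected) ** 2 / expected
    return stat


def word_scores(docs, labels):
    """Score each word by chi-square on a (present/absent) x (class) table."""
    classes = sorted(set(labels))
    vocab = sorted({w for d in docs for w in d})
    scores = {}
    for w in vocab:
        present = [sum(1 for d, y in zip(docs, labels) if y == c and w in d)
                   for c in classes]
        absent = [sum(1 for d, y in zip(docs, labels) if y == c and w not in d)
                  for c in classes]
        scores[w] = chi_square([present, absent])
    return scores


# Hypothetical toy reviews with three rating classes.
docs = [{"great", "product"}, {"great", "service"},
        {"okay", "product"}, {"okay", "service"},
        {"bad", "product"}, {"bad", "service"}]
labels = ["pos", "pos", "neu", "neu", "neg", "neg"]
for word, score in sorted(word_scores(docs, labels).items(), key=lambda p: -p[1]):
    print(word, round(score, 3))
```

Words with the highest scores would be kept as input variables; how many to keep is a tuning choice the abstract does not specify.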

Text Mining in Online Social Networks: A Systematic Review

  • Alhazmi, Huda N
    • International Journal of Computer Science & Network Security / v.22 no.3 / pp.396-404 / 2022
  • Online social networks contain a large amount of data that can be converted into valuable and insightful information. Text mining approaches allow large-scale data to be explored efficiently. This study therefore reviews the recent literature on text mining in online social networks in a way that produces valid and valuable knowledge for further research. The review identifies the text mining techniques used in social networking, the data used, the tools, and the challenges. Research questions were formulated, then a search strategy and selection criteria were defined, followed by an analysis of each paper to extract the data relevant to the research questions. The results show that the social media platforms most often used as data sources are Twitter and Facebook. The most common text mining techniques were sentiment analysis and topic modeling. Classification and clustering were the most common approaches applied by the studies. The challenges include the need to process huge volumes of data, the noise, and the dynamic nature of the data. The study explores recent developments in text mining approaches in social networking by providing the state of and a general view of the work done in this research area.

A study on data mining techniques for soil classification methods using cone penetration test results

  • Junghee Park;So-Hyun Cho;Jong-Sub Lee;Hyun-Ki Kim
    • Geomechanics and Engineering / v.35 no.1 / pp.67-80 / 2023
  • Due to the nature of the conjunctive Cone Penetration Test (CPT), which does not verify the actual sample directly, geotechnical engineers commonly classify underground geomaterials from CPT results using classification diagrams proposed by various researchers. However, such classification diagrams may fail to reflect local geotechnical characteristics, potentially resulting in misclassification that does not align with the actual stratification in regions with strong local features. To address this, this paper presents an objective method for deriving more accurate local CPT soil classification criteria, which uses C4.5 decision tree models trained with CPT results from the clay-dominant southern coast of Korea and the sand-dominant region in South Carolina, USA. The results and analyses demonstrate that the C4.5 algorithm, in conjunction with oversampling, outlier removal, and pruning methods, can enhance and optimize the decision tree-based CPT soil classification model.
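
C4.5 chooses splits by gain ratio, that is, information gain divided by split information. The sketch below illustrates that criterion for a single numeric feature with a binary threshold split. It is a simplified illustration under stated assumptions: the feature values and soil labels are hypothetical, the helper names are made up, and the oversampling, outlier removal, and pruning steps mentioned in the abstract are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold vs. value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * log2(w) for w in weights)
    return gain / split_info


def best_threshold(values, labels):
    """Observed value with the highest gain ratio, or None if no split exists."""
    candidates = sorted(set(values))[:-1]  # the largest value cannot split the data
    return max(candidates, key=lambda t: gain_ratio(values, labels, t), default=None)


# Hypothetical readings of one CPT-derived feature with known soil labels.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["clay", "clay", "clay", "sand", "sand", "sand"]
t = best_threshold(values, labels)
print(t, gain_ratio(values, labels, t))
```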

A Neuro-Fuzzy Model Approach for the Land Cover Classification

  • Han, Jong-Gyu;Chi, Kwang-Hoon;Suh, Jae-Young
    • Proceedings of the KSRS Conference / 1998.09a / pp.122-127 / 1998
  • This paper presents a neuro-fuzzy classifier derived from the generic model of a 3-layer fuzzy perceptron, together with classification software developed on the basis of this neuro-fuzzy model. It also compares the neuro-fuzzy and maximum-likelihood classifiers, using Airborne Multispectral Scanner (AMS) imagery of Tae-Duk Science Complex Town. The neuro-fuzzy classifier was considerably more accurate in mixed-composition areas such as "bare soil", "dried grass" and "coniferous tree"; however, "cement road" and "asphalt road" were classified more correctly by the maximum-likelihood classifier. Thus, the neuro-fuzzy model can be used to classify mixed-composition areas such as the natural environment of the Korean peninsula. From this research we conclude that the neuro-fuzzy classifier was superior in suppressing mixed-pixel classification errors and more robust to training site heterogeneity and to the use of land-use class labels that are mixtures of land cover signatures.

Receiver Operating Characteristic Analysis by Data Mining

  • Rhee Seong-Won;Lee Jea-Young
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.195-197 / 2001
  • Data mining is used to discover patterns and relationships in huge amounts of data. Researchers in many different fields have shown great interest in data mining analysis. Using the classification technique of data mining analysis, we present an available model for the Receiver Operating Characteristic (ROC) method. We show that this may help analyze the results of data mining techniques.
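
For orientation, an ROC curve for a classifier that outputs scores can be traced by sweeping the decision threshold and recording the false positive rate and true positive rate at each step. The sketch below shows this generic construction and the trapezoidal area under the curve; it is not the specific model of the paper, the scores and labels are hypothetical, and it assumes both classes are present.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points of the ROC curve.

    labels use 1 for positive and 0 for negative. Both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items sharing the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
print(curve, auc(curve))
```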

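Classification Knowledge Discovery in Data Mining Using Rough Sets and a Hierarchical Classification Structure (러프집합과 계층적 분류구조를 이용한 데이터마이닝에서 분류지식발견)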

  • Lee, Chul-Heui;Seo, Seon-Hak
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.3 / pp.202-209 / 2002
  • This paper deals with the simplification of classification rules for data mining and of rule bases for control systems. Data mining, which extracts useful information from large amounts of data, is an important issue. There are various classification methodologies for data mining, such as decision trees and neural networks, but the result should be explicit and understandable, and the classification rules should be short and clear. Rough set theory is an effective technique for extracting knowledge from incomplete and inconsistent data and provides a good solution for classification and approximation by using various attributes effectively. This paper investigates the granularity of knowledge for reasoning about uncertain concepts by using rough set approximations, and it uses a hierarchical classification structure, a more effective technique for classification, by applying the core to the upper level. The proposed classification methodology makes the analysis of an information system easy and generates minimal classification rules.
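
The rough set approximations mentioned above can be illustrated briefly. Objects that agree on the chosen attributes are indiscernible and form equivalence classes; the lower approximation of a concept is the union of classes entirely inside it, and the upper approximation is the union of classes that overlap it. The sketch below shows only this standard definition, with a hypothetical toy information table; it does not reproduce the paper's hierarchical structure or rule generation.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return (lower, upper) rough set approximations of the set `target`.

    objects: dict mapping object name -> dict of attribute -> value.
    attributes: attribute names used to decide indiscernibility.
    """
    classes = defaultdict(set)
    for name, row in objects.items():
        classes[tuple(row[a] for a in attributes)].add(name)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the concept
            lower |= block
        if block & target:    # class overlaps the concept
            upper |= block
    return lower, upper


# Hypothetical information table; o1 and o2 are indiscernible.
objects = {
    "o1": {"temperature": "high", "headache": "yes"},
    "o2": {"temperature": "high", "headache": "yes"},
    "o3": {"temperature": "normal", "headache": "no"},
    "o4": {"temperature": "high", "headache": "no"},
}
concept = {"o1", "o4"}
lower, upper = approximations(objects, ["temperature", "headache"], concept)
print(sorted(lower), sorted(upper))
```

Here `o1` belongs to the concept but cannot be told apart from `o2`, so it falls only in the upper approximation; the difference between the two sets is the boundary region where classification is uncertain.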

MOTIF BASED PROTEIN FUNCTION ANALYSIS USING DATA MINING

  • Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.812-815 / 2006
  • Proteins are essential agents for controlling, effecting and modulating cellular functions. Proteins with similar sequences have diverged from a common ancestral gene and have similar structures and functions. Function prediction of unknown proteins remains one of the most challenging problems in bioinformatics. Recently, various computational approaches have been developed for identifying short sequences that are conserved within a family of closely related protein sequences. Protein function is often correlated with highly conserved motifs. A motif is the smallest unit of protein structure and function and forms the core part of a protein's structural and functional components. Therefore, prediction methods using data mining or machine learning have been developed. In this paper, we describe an approach to protein function prediction with motif-based models using data mining. Our work consists of three phases. We build training and test data sets and construct a classifier using the training set. Through experiments, we also evaluate our classifier against other classifiers in terms of the accuracy of the resulting classification.

Design of a Forecasting Model for Customer Classification in the Telecommunication Industries (통신 산업의 고객 분류를 위한 예측 모델 설계)

  • Lee Byoung-Yup;Joh Kyu-Ha;Song Seok-Il;Yoo Jae-Soo
    • The Journal of the Korea Contents Association / v.6 no.1 / pp.179-189 / 2006
  • Recently, with the development of computer technology, large amounts of customer data have been stored in databases. Using such data, decision makers extract useful information with data mining in order to make valuable plans. In this paper, we design a forecasting model that classifies the exiting customers in the telecommunication industries using the classification rule, one of the data mining technologies. In other words, this paper builds a model of customer loyalty detection and analyzes customer patterns in the mobile communication service market with data mining using neural network and regression methods. This model improves the relationship between customers and enterprises. As a result, the enterprise earns profits from many customers and the customer receives more benefits from the enterprise.

A Study on the Improvement of the Defense-related International Patent Classification using Patent Mining (특허 마이닝을 이용한 국방관련 국제특허분류 개선 방안 연구)

  • Kim, Kyung-Soo;Cho, Nam-Wook
    • Journal of Korean Society for Quality Management / v.50 no.1 / pp.21-33 / 2022
  • Purpose: As most defense technologies are classified as confidential, the corresponding International Patent Classifications (IPCs) require special attention. Consequently, the list of defense-related IPCs has been managed by the government. This paper aims to evaluate the defense-related IPCs and propose a methodology to revalidate and improve the IPC classification scheme. Methods: Patents in military technology and their corresponding IPCs from 2009 to 2020 were used in this paper. Prior to the analysis, the patents were divided into private and public sectors. Social network analysis was used to analyze the convergence structure and central defense technology, and association rule mining was used to analyze the convergence pattern. Results: While the public sector was highly cohesive, the private sector was characterized by easy convergence between technologies. In addition, narrow convergence was observed in the public sector, and wide convergence was observed in the private sector. By analyzing the core technologies of defense technology, defense-related IPC candidates were identified. Conclusion: This paper presents a comprehensive perspective on the structure and pattern of convergence of defense technology. It is also significant in that it proposes a method for revising defense-related IPCs. The results of this study are expected to be used as guidelines for preparing amendments to the government's defense-related IPC.
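
Association rule mining, used above to analyze convergence patterns, rates a rule A → B by its support, confidence, and lift. The sketch below computes these three standard measures, treating each patent as the set of IPC codes assigned to it. The patent sets are hypothetical illustration data and `rule_metrics` is a made-up helper name; the abstract does not state which thresholds or algorithm the authors used.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    transactions: list of sets, here the IPC codes co-assigned to one patent.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical patents, each described by the IPC subclasses assigned to it.
patents = [
    {"F41A", "F41G"},
    {"F41A", "F41G", "G01S"},
    {"F41A", "H04L"},
    {"G01S", "H04L"},
]
support, confidence, lift = rule_metrics(patents, {"F41A"}, {"F41G"})
print(round(support, 3), round(confidence, 3), round(lift, 3))
```

A lift above 1 indicates that the two codes appear together more often than they would if assigned independently, which is one way to read a convergence pattern between technologies.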