• Title/Summary/Keyword: Data Mining Algorithm

A Study on the Hybrid Data Mining Mechanism Based on Association Rules and Fuzzy Neural Networks (연관규칙과 퍼지 인공신경망에 기반한 하이브리드 데이터마이닝 메커니즘에 관한 연구)

  • Kim Jin Sung
    • Proceedings of the Korean Operations and Management Science Society Conference / 2003.05a / pp.884-888 / 2003
  • In this paper, we introduce a hybrid data mining mechanism based on association rules and fuzzy neural networks (FNN). Most data mining mechanisms depend on association rule extraction algorithms. However, basic association rule-based data mining lacks learning ability. In addition, sequential patterns of association rules cannot represent complex fuzzy logic. To resolve these problems, we propose a hybrid mechanism that combines association rule-based data mining with fuzzy neural networks. Our hybrid data mining mechanism consists of four phases. First, we used a general association rule mining mechanism to develop the initial rule base. Second, we used fuzzy neural networks to learn the historical patterns embedded in the database. Third, a fuzzy rule extraction algorithm was used to extract the implicit knowledge from the FNN. Fourth, we combined the association knowledge base and the fuzzy rules. The proposed hybrid data mining mechanism can reflect both association rule-based logical inference and complex fuzzy logic.
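To illustrate what "combining the association knowledge base and fuzzy rules" can look like at inference time, here is a minimal sketch of evaluating a small fuzzy rule base. It is not the authors' FNN or their rule extraction algorithm: the fuzzy partitions in `FUZZY_SETS`, the rules in `RULES`, the use of the min t-norm, and max aggregation are all hypothetical choices made for illustration. A crisp association rule is assumed to enter the same rule base with its confidence used as the rule weight.

```python
from typing import Dict, List, Tuple

# Hypothetical fuzzy partitions for inputs scaled to 0..100: (left foot, peak, right foot).
FUZZY_SETS: Dict[str, Tuple[float, float, float]] = {
    "low": (0.0, 25.0, 50.0),
    "medium": (25.0, 50.0, 75.0),
    "high": (50.0, 75.0, 100.0),
}

# Hypothetical rule base: (antecedent {variable: fuzzy term}, consequent, rule weight).
# Assumption: a crisp association rule could be added here with its confidence as the weight.
Rule = Tuple[Dict[str, str], str, float]
RULES: List[Rule] = [
    ({"age": "medium", "income": "high"}, "buy", 0.9),
    ({"income": "low"}, "no_buy", 0.8),
]


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at or outside a and c, rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def firing_strength(antecedent: Dict[str, str], inputs: Dict[str, float]) -> float:
    # Min t-norm: a rule fires only as strongly as its weakest condition.
    return min(triangular(inputs[var], *FUZZY_SETS[term]) for var, term in antecedent.items())


def infer(rules: List[Rule], inputs: Dict[str, float]) -> Dict[str, float]:
    scores: Dict[str, float] = {}
    for antecedent, consequent, weight in rules:
        strength = weight * firing_strength(antecedent, inputs)
        # Max aggregation across rules that share a consequent.
        scores[consequent] = max(scores.get(consequent, 0.0), strength)
    return scores
```

For inputs `{"age": 40, "income": 70}`, the first rule fires with membership 0.6 (age is "medium") and 0.8 (income is "high"), so its weighted strength is 0.9 × 0.6 = 0.54, while the "low income" rule does not fire at all.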

Genetic Algorithm Application to Machine Learning

  • Han, Myung-mook;Lee, Yill-byung
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.7 / pp.633-640 / 2001
  • In this paper, we examine the machine learning issues raised by the domain of intrusion detection systems (IDS), which have difficulty classifying intruders successfully. These systems also require a significant amount of computational overhead, making it difficult to build robust real-time IDS. Machine learning techniques can reduce the human effort required to build these systems and can improve their performance. Genetic algorithms are used to improve performance on search problems, while data mining has been used for data analysis. Data mining is the exploration and analysis of large quantities of data to discover meaningful patterns and rules. Among the data mining tasks, we concentrate on classification. Since classification is a basic element of human thinking, it is a well-studied problem in a wide variety of applications. In this paper, we propose a classifier system based on a genetic algorithm and evaluate it by applying it to the IDS problem, treated as a data mining classification task. We report our experiments using this method on the KDD audit data.
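The abstract does not describe the encoding of the proposed classifier system, so the following is only a minimal sketch of the general idea: a genetic algorithm searches for the parameters of a simple classification rule, using accuracy as fitness. The two-feature records in `DATA`, the threshold rule in `predict`, and the GA settings (tournament size 3, arithmetic crossover, Gaussian mutation, one elite) are hypothetical and are not taken from the paper or the KDD data.

```python
import random
from typing import List, Tuple

# Hypothetical toy audit records: (feature_a, feature_b, label), label 1 = intrusion.
# Features are assumed to be pre-scaled to the range 0..1.
DATA: List[Tuple[float, float, int]] = [
    (0.1, 0.2, 0), (0.3, 0.1, 0), (0.2, 0.4, 0), (0.4, 0.3, 0),
    (0.7, 0.8, 1), (0.9, 0.6, 1), (0.8, 0.9, 1), (0.6, 0.7, 1),
]

Genome = List[float]  # [threshold_a, threshold_b]


def predict(genome: Genome, a: float, b: float) -> int:
    # Rule encoded by the genome: flag an intrusion when both features exceed their thresholds.
    return int(a > genome[0] and b > genome[1])


def fitness(genome: Genome, data) -> float:
    # Classification accuracy on the training records.
    return sum(predict(genome, a, b) == y for a, b, y in data) / len(data)


def evolve(data, pop_size: int = 20, generations: int = 30, seed: int = 0) -> Tuple[Genome, float]:
    rng = random.Random(seed)
    pop = [[rng.random(), rng.random()] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda g: fitness(g, data), reverse=True)
        next_pop = [ranked[0][:]]  # elitism: the best genome survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(ranked, 3), key=lambda g: fitness(g, data))
            p2 = max(rng.sample(ranked, 3), key=lambda g: fitness(g, data))
            # Arithmetic crossover followed by Gaussian mutation, clipped to [0, 1].
            alpha = rng.random()
            child = [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]
            child = [min(1.0, max(0.0, v + rng.gauss(0.0, 0.05))) for v in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda g: fitness(g, data))
    return best, fitness(best, data)
```

Because the best genome is always carried over, the best accuracy can never decrease from one generation to the next; calling `evolve(DATA)` returns the evolved thresholds and their training accuracy.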

Industrial Waste Database Analysis Using Data Mining

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집) / 2006.04a / pp.241-251 / 2006
  • Data mining is a method for finding useful information in large amounts of data stored in databases. It is used to discover hidden knowledge, unexpected patterns, and new rules in massive data. Data mining methods include decision trees, association rules, clustering, and neural networks. We analyze an industrial waste database using data mining techniques: the k-means algorithm for clustering, the C5.0 algorithm for the decision tree, and the Apriori algorithm for association rules. These analysis outputs can be used for environmental preservation and environmental improvement.
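The Apriori step of this study can be illustrated with a minimal sketch of the standard algorithm: level-wise candidate generation, pruning by the rule that every subset of a frequent itemset must itself be frequent, and rule generation by confidence. The transactions, the absolute support threshold `min_support` (a transaction count), and `min_confidence` are assumptions for illustration; the paper's actual thresholds and attributes are not given in the abstract, and the k-means and C5.0 analyses are not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    """Return every itemset contained in at least `min_support` transactions."""
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of the surviving candidates with one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(
    frequent: Dict[Itemset, int], min_confidence: float
) -> List[Tuple[Itemset, Itemset, float]]:
    """Generate rules X -> Y with confidence = support(X | Y) / support(X)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs_set = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so this lookup succeeds.
                confidence = support / frequent[lhs_set]
                if confidence >= min_confidence:
                    rules.append((lhs_set, itemset - lhs_set, confidence))
    return rules
```

On the five transactions `{a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c}` with `min_support=3`, all single items and all pairs are frequent, but `{a,b,c}` (support 2) is not; each pair then yields two rules with confidence 3/4.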

Short-term Water Demand Forecasting Algorithm Based on Kalman Filtering with Data Mining (데이터 마이닝과 칼만필터링에 기반한 단기 물 수요예측 알고리즘)

  • Choi, Gee-Seon;Shin, Gang-Wook;Lim, Sang-Heui;Chun, Myung-Geun
    • Journal of Institute of Control, Robotics and Systems / v.15 no.10 / pp.1056-1061 / 2009
  • This paper proposes a short-term water demand forecasting algorithm based on Kalman filtering with data mining, for sustainable water supply and effective energy saving. The proposed algorithm applies a mining method to water supply data and uses a decision tree to account for special days such as Chuseok. The parameters of the MLAR (Multi Linear Auto Regression) model are estimated by a Kalman filtering algorithm. Good results on actual operation data demonstrate the practicality of the proposed forecasting algorithm.
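The idea of estimating autoregression parameters with a Kalman filter can be sketched as follows. This is not the paper's MLAR model: it assumes a single series, treats the AR coefficients as a random-walk state (process noise `q`), and treats the latest observation as a noisy linear measurement of that state (measurement noise `r`). The values of `q`, `r`, the initial covariance, and the synthetic series from `simulate_ar` are all hypothetical; the decision-tree handling of special days is not shown.

```python
import random
from typing import List, Sequence


def kalman_ar(series: Sequence[float], order: int, q: float = 1e-6, r: float = 1.0) -> List[float]:
    """Estimate AR(order) coefficients theta, where y_t = h_t . theta + noise."""
    theta = [0.0] * order
    # Large initial covariance: little prior confidence in theta = 0.
    P = [[1000.0 if i == j else 0.0 for j in range(order)] for i in range(order)]
    for t in range(order, len(series)):
        h = [series[t - k - 1] for k in range(order)]  # regressors y_{t-1}, ..., y_{t-order}
        # Predict: parameters follow a random walk, so only the covariance grows.
        for i in range(order):
            P[i][i] += q
        # Update: innovation variance s, Kalman gain K = P h / s.
        Ph = [sum(P[i][j] * h[j] for j in range(order)) for i in range(order)]
        s = sum(h[i] * Ph[i] for i in range(order)) + r
        gain = [v / s for v in Ph]
        err = series[t] - sum(h[i] * theta[i] for i in range(order))
        theta = [theta[i] + gain[i] * err for i in range(order)]
        # P <- P - K h^T P (P is symmetric, so h^T P equals (P h)^T).
        P = [[P[i][j] - gain[i] * Ph[j] for j in range(order)] for i in range(order)]
    return theta


def simulate_ar(coeffs: Sequence[float], n: int, seed: int = 0) -> List[float]:
    """Hypothetical synthetic demand series from a stable AR model with unit Gaussian noise."""
    rng = random.Random(seed)
    y = [0.0] * len(coeffs)
    for _ in range(n):
        y.append(sum(c * y[-k - 1] for k, c in enumerate(coeffs)) + rng.gauss(0.0, 1.0))
    return y
```

Running `kalman_ar(simulate_ar([0.5, 0.3], 3000, seed=42), order=2)` should recover coefficients close to 0.5 and 0.3. A small `q` lets the estimates drift slowly if demand behavior changes, at the cost of slightly noisier estimates.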

A Study of Data Mining Optimization Model for the Credit Evaluation

  • Kim, Kap-Sik;Lee, Chang-Soon
    • Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.825-836 / 2003
  • Based on customer information and financing processes in the capital market, we derived individual models by applying multi-layered perceptrons, MDA, and decision trees. The results from these existing single models were then compared with those of an integrated model developed using a genetic algorithm. This study contributes not only to verifying the existing individual models but also to overcoming the limitations of existing approaches. Previous work has relied on comparing individual models and searching for the best-fit one; this study instead presents a methodology for building an integrated data mining model using a genetic algorithm.

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents / v.3 no.2 / pp.18-24 / 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items, and their contiguous sequences often consist of hundreds or more frequent items. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed efficient sequential pattern mining, and most existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since it expands sequential patterns starting from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of prefixSpan to significantly reduce its recursive process. However, it is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we performed experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
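To make "maximal frequent contiguous sequence" concrete, here is a naive level-wise baseline, not the paper's fixed-length spanning tree method. It assumes support means the number of input sequences that contain a substring at least once, and it stops as soon as no substring of the current length is frequent (if no length-L substring is frequent, no longer one can be, since every occurrence of a longer substring contains length-L substrings). A frequent substring is maximal when no other frequent substring contains it. The DNA strings are hypothetical.

```python
from typing import Dict, List


def frequent_contiguous(seqs: List[str], min_support: int) -> Dict[str, int]:
    """All substrings occurring in at least `min_support` of the sequences."""
    result: Dict[str, int] = {}
    length = 1
    while True:
        counts: Dict[str, int] = {}
        for s in seqs:
            # Count each distinct substring once per sequence (sequence-level support).
            seen = {s[i:i + length] for i in range(len(s) - length + 1)}
            for sub in seen:
                counts[sub] = counts.get(sub, 0) + 1
        frequent = {sub: c for sub, c in counts.items() if c >= min_support}
        if not frequent:
            return result  # anti-monotone: no longer substring can be frequent either
        result.update(frequent)
        length += 1


def maximal(frequent: Dict[str, int]) -> List[str]:
    """Frequent substrings not contained in any other frequent substring."""
    return sorted(p for p in frequent if not any(p != q and p in q for q in frequent))
```

For `["ACGTAC", "TACGTT", "GACGTA"]` with `min_support=3`, the frequent substrings are the four bases, `AC`, `CG`, `GT`, `TA`, `ACG`, `CGT`, and `ACGT`, and the maximal ones are `ACGT` and `TA`. Because this baseline rescans every sequence at every length, it illustrates why long biological sequences call for a more specialized structure.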

Performance Comparison of Decision Trees of J48 and Reduced-Error Pruning

  • Jin, Hoon;Jung, Yong Gyu
    • International journal of advanced smart convergence / v.5 no.1 / pp.30-33 / 2016
  • With the advent of big data, data mining is increasingly used in various decision-making fields to extract hidden and meaningful information from large amounts of data. As the demand for uncovering the hidden meaning behind data grows exponentially, deciding which data mining algorithm to select and how to use it becomes more and more important. Data mining algorithms commonly highlighted in biology and clinical settings include logistic regression, neural networks, support vector machines, and a variety of statistical techniques. In this paper, we compare the classification performance of two machine learning algorithms, J48 and REPTree. The performance comparison identifies the more accurate classification algorithm, which enables more accurate prediction for the goal of the experiment. Based on this, the algorithm is expected to support detailed classification and distinction that would be relatively difficult to achieve visually.
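Reduced-error pruning, the idea behind REPTree, can be shown with a minimal sketch: working bottom-up, replace a subtree with a leaf predicting its majority training class whenever that does not increase the error on a held-out validation set. The `Node` structure and the hand-built tree and validation examples below are hypothetical; tree growing (for example, J48's gain-ratio splits) is not shown, and Weka's REPTree implementation differs in its details.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Example = Tuple[List[float], int]  # (feature vector, class label)


@dataclass
class Node:
    majority: int                   # majority training class at this node
    feature: Optional[int] = None   # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when x[feature] <= threshold
    right: Optional["Node"] = None  # taken when x[feature] > threshold


def predict(node: Node, x: List[float]) -> int:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.majority


def errors(node: Node, data: List[Example]) -> int:
    return sum(predict(node, x) != y for x, y in data)


def reduced_error_prune(node: Node, validation: List[Example]) -> Node:
    """Prune bottom-up; each subtree is judged only on the validation examples reaching it."""
    if node.feature is None:
        return node
    left_data = [(x, y) for x, y in validation if x[node.feature] <= node.threshold]
    right_data = [(x, y) for x, y in validation if x[node.feature] > node.threshold]
    node.left = reduced_error_prune(node.left, left_data)
    node.right = reduced_error_prune(node.right, right_data)
    # Replace the subtree with a leaf if that does not increase validation error.
    leaf_errors = sum(y != node.majority for _, y in validation)
    if leaf_errors <= errors(node, validation):
        return Node(majority=node.majority)
    return node


# Hypothetical hand-built tree whose right branch overfits on feature 1.
TREE = Node(majority=0, feature=0, threshold=0.5,
            left=Node(majority=0),
            right=Node(majority=1, feature=1, threshold=0.5,
                       left=Node(majority=1), right=Node(majority=0)))
VALIDATION: List[Example] = [([0.2, 0.1], 0), ([0.3, 0.9], 0), ([0.8, 0.2], 1), ([0.9, 0.8], 1)]
```

Here the unpruned tree misclassifies one validation example; pruning collapses the overfitted right subtree into a leaf predicting class 1 and keeps the root split, leaving no validation errors.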

An Efficient Algorithm for Mining Frequent Closed Itemsets Using Transaction Link Structure (트랜잭션 연결 구조를 이용한 빈발 Closed 항목집합 마이닝 알고리즘)

  • Han, Kyong Rok;Kim, Jae Yearn
    • Journal of Korean Institute of Industrial Engineers / v.32 no.3 / pp.242-252 / 2006
  • Data mining is the exploration and analysis of huge amounts of data to discover meaningful patterns. One of the most important data mining problems is association rule mining. Recent studies of mining association rules have proposed a closure mechanism. It is no longer necessary to mine the set of all of the frequent itemsets and their association rules. Rather, it is sufficient to mine the frequent closed itemsets and their corresponding rules. In the past, a number of algorithms for mining frequent closed itemsets have been based on items. In this paper, we use the transaction itself for mining frequent closed itemsets. An efficient algorithm is proposed that is based on a link structure between transactions. Our experimental results show that our algorithm is faster than previously proposed methods. Furthermore, our approach is significantly more efficient for dense databases.
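The transaction-based view of closed itemsets can be illustrated with a naive sketch, not the paper's link-structure algorithm. It relies on a standard property: an itemset is closed exactly when it equals the intersection of all transactions that contain it, so the closed itemsets are the non-empty intersections of groups of transactions. The sketch builds those intersections incrementally, one transaction at a time, and keeps the ones meeting an absolute support threshold `min_support` (an assumption for illustration). It is exponential in the worst case and only meant to show the idea.

```python
from typing import Dict, FrozenSet, List, Set


def closed_itemsets(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Frequent closed itemsets, found as intersections of transactions."""
    closed: Set[FrozenSet[str]] = set()
    for t in transactions:
        ft = frozenset(t)
        # Every intersection that includes the new transaction: ft itself,
        # plus ft intersected with each intersection found so far.
        closed |= {c & ft for c in closed}
        closed.add(ft)
    result: Dict[FrozenSet[str], int] = {}
    for c in closed:
        if not c:
            continue
        support = sum(1 for t in transactions if c <= t)
        if support >= min_support:
            result[c] = support
    return result
```

With the transactions `{a,b,c}, {a,b,c}, {a,b}, {c,d}` and `min_support=2`, there are seven frequent itemsets but only three closed ones: `{a,b,c}` (support 2), `{a,b}` (3), and `{c}` (3). For example, `{a}` is not closed because `{a,b}` has the same support, which is why mining closed itemsets is sufficient to recover all frequent itemsets and their supports.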

Distributed Incremental Approximate Frequent Itemset Mining Using MapReduce

  • Mohsin Shaikh;Irfan Ali Tunio;Syed Muhammad Shehram Shah;Fareesa Khan Sohu;Abdul Aziz;Ahmad Ali
    • International Journal of Computer Science & Network Security / v.23 no.5 / pp.207-211 / 2023
  • Traditional data mining methods typically assume that the data is small, centralized, memory-resident, and static. This assumption is no longer acceptable, because datasets are growing very fast and becoming huge. There is a fast-growing need to manage data with efficient mining algorithms. In such a scenario, it is inevitable to carry out data mining in a distributed environment, and Frequent Itemset Mining (FIM) is no exception. Thus, the need for an efficient incremental mining algorithm arises. We propose Distributed Incremental Approximate Frequent Itemset Mining (DIAFIM), an incremental FIM algorithm that works in the distributed parallel MapReduce environment. The key contribution of this research is devising an incremental mining algorithm for the distributed parallel MapReduce environment.
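The incremental MapReduce pattern for itemset counting can be sketched in a single process. This is not DIAFIM: the abstract does not describe its approximation scheme, so the sketch counts exactly. The map phase emits `(itemset, 1)` pairs for one input split, a per-split reduce sums them (like a combiner), and the partial counts of each new batch are merged into the running global counts instead of rescanning old data. The function names, `max_size`, and the support ratio are hypothetical, and enumerating every subset up to `max_size` is exponential, so real FIM algorithms prune candidates.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def map_phase(split: List[Set[str]], max_size: int) -> Iterable[Tuple[FrozenSet[str], int]]:
    # Emit (itemset, 1) for every itemset of size 1..max_size in each transaction.
    for t in split:
        items = sorted(t)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                yield frozenset(combo), 1


def reduce_phase(pairs: Iterable[Tuple[FrozenSet[str], int]]) -> Counter:
    # Sum the emitted values per key, as a reducer would after the shuffle.
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def incremental_update(global_counts: Counter, new_splits: List[List[Set[str]]], max_size: int) -> Counter:
    # Only the new batch is processed; each split would run on its own mapper in a cluster.
    for split in new_splits:
        global_counts.update(reduce_phase(map_phase(split, max_size)))
    return global_counts


def frequent_itemsets(global_counts: Counter, total_transactions: int, min_support_ratio: float) -> Dict[FrozenSet[str], int]:
    return {s: c for s, c in global_counts.items() if c / total_transactions >= min_support_ratio}
```

For example, starting from an empty `Counter()`, one batch of two splits (`[{a,b}, {a}]` and `[{b,c}]`) followed by a second batch `[{a,b}]` gives four transactions in total; with a support ratio of 0.5, the frequent itemsets are `{a}` (3), `{b}` (3), and `{a,b}` (2).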

A Process Mining using Association Rule and Sequence Pattern (연관규칙과 순차패턴을 이용한 프로세스 마이닝)

  • Chung, So-Young;Kwon, Soo-Tae
    • Journal of Korean Society of Industrial and Systems Engineering / v.31 no.2 / pp.104-111 / 2008
  • Process mining is considered to support the discovery of business processes from unstructured process models. A process mining algorithm using the association rules and sequence patterns of data mining is developed to extract information about processes from event logs and to discover processes containing alternative, concurrent, and hidden activities. Some numerical examples are presented to show the effectiveness and efficiency of the algorithm.
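How sequence patterns in an event log can reveal alternative and concurrent activities is sketched below using a different, well-known technique: the directly-follows "footprint" of the alpha algorithm, not the authors' association-rule method. For every pair of activities, `a -> b` means `a` is directly followed by `b` but never the reverse (causality), `||` means both orders occur (concurrency), and `#` means neither occurs (for example, alternative branches). The event log and activity names are hypothetical, and hidden activities are not handled.

```python
from collections import Counter
from typing import Dict, List, Tuple


def directly_follows(log: List[List[str]]) -> Counter:
    # Count how often activity a is immediately followed by activity b within a case.
    df: Counter = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df


def footprint(log: List[List[str]]) -> Dict[Tuple[str, str], str]:
    df = directly_follows(log)
    activities = sorted({a for trace in log for a in trace})
    relations: Dict[Tuple[str, str], str] = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in df, (b, a) in df
            if ab and not ba:
                rel = "->"  # causal: a precedes b
            elif ba and not ab:
                rel = "<-"  # causal: b precedes a
            elif ab and ba:
                rel = "||"  # both orders observed: concurrent
            else:
                rel = "#"   # never adjacent: e.g. alternatives
            relations[(a, b)] = rel
    return relations
```

For the log `[register, check, approve]`, `[register, approve, check]`, `[register, reject]`, `register -> check` is causal, `check || approve` is concurrent because both orders occur, and `approve # reject` marks activities that never follow each other directly.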