• Title/Summary/Keyword: Data Mining Algorithm

A Study on Data Association-Rules Mining of Content-Based Multimedia (내용 기반의 멀티미디어 데이터 연관규칙 마이닝에 대한 연구)

  • Kim, Jin-Ok;Hwang, Dae-Jun
    • The KIPS Transactions:PartD / v.9D no.1 / pp.57-64 / 2002
  • Despite the overwhelming amount of multimedia data produced by advances in computing capacity, storage technology, and the Internet, few studies have systematically pursued multimedia data mining. Building on preliminary image processing and content-based image retrieval technology, this paper presents methods for discovering association rules from recurrent items with spatial relationships in huge data repositories. Furthermore, a multimedia mining algorithm is proposed to find implicit association rules among objects whose content-based descriptors, such as color, texture, and shape, are recurrent and have spatial relationships. The algorithm with recurrent items in images finds sets of frequent items more efficiently than the Apriori algorithm. The multimedia association-rules algorithm is especially effective when the collection of images is homogeneous, and it can be applied to many multimedia-related application fields.
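
The abstract above uses the Apriori algorithm as its baseline. As a rough illustration only, here is a minimal sketch of plain Apriori frequent-itemset mining; it is not the paper's recurrent-item algorithm. The function name `apriori`, the absolute `min_support` count, and the toy image descriptors are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical content-based descriptors extracted from four images.
images = [
    {"red", "circle", "smooth"},
    {"red", "circle"},
    {"red", "square", "smooth"},
    {"blue", "circle", "smooth"},
]
result = apriori(images, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With these toy images, three single descriptors and three pairs reach the support threshold, while the triple appears in only one image and is dropped.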

Efficient Mining of Frequent Subgraph with Connectivity Constraint

  • Moon, Hyun-S.;Lee, Kwang-H.;Lee, Do-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.267-271 / 2005
  • The goal of data mining is to extract new and useful knowledge from large-scale datasets. As the amount of available data grows explosively, it has become vitally important to develop faster data mining algorithms for various types of data. Recently, interest in developing data mining algorithms that operate on graphs has increased. In particular, mining frequent patterns from structured data such as graphs has attracted many research groups. A graph is a highly adaptable representation scheme that is used in many domains, including chemistry, bioinformatics, and physics. For example, the chemical structure of a given substance can be modelled by an undirected labelled graph in which each node corresponds to an atom and each edge corresponds to a chemical bond between atoms. The Internet can also be modelled as a directed graph in which each node corresponds to a web site and each edge corresponds to a hypertext link between web sites. Notably, in bioinformatics, various kinds of newly discovered data such as gene regulation networks or protein interaction networks can be modelled as graphs. There have been a number of attempts to find useful knowledge in such graph-structured data. One of the most powerful analysis tools for graph-structured data is frequent subgraph analysis. Recurring patterns in graph data can provide incomparable insights into that data. However, finding recurring subgraphs is computationally very expensive. At the core of the problem are two computationally challenging subproblems: 1) subgraph isomorphism and 2) enumeration of subgraphs. Problems related to the former are the subgraph isomorphism problem (does graph A contain graph B?) and the graph isomorphism problem (are two graphs A and B the same or not?). Even these simplified versions of the subgraph mining problem are known to be NP-complete or isomorphism-complete, and no polynomial-time algorithm is known so far. The latter is also a difficult problem. Without constraints, all $2^n$ subgraphs must be generated, where $n$ is the number of vertices of the input graph. To find frequent subgraphs in larger graph databases, it is essential to place appropriate constraints on the subgraphs to be found. Most current approaches focus on the frequency of a subgraph: the higher the frequency of a graph, the more attention should be given to it. Recently, several algorithms that use level-by-level approaches to find frequent subgraphs have been developed. Some recently emerging applications suggest that other constraints, such as connectivity, could also be useful in mining subgraphs: more strongly connected parts of a graph are more informative. If the set of subgraphs to mine is restricted to more strongly connected parts, the computational complexity can be decreased significantly. In this paper, we present an efficient algorithm to mine frequent subgraphs that are more strongly connected. An experimental study shows that the algorithm scales to larger graphs with more than ten thousand vertices.
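
To make the enumeration cost and the effect of a connectivity constraint concrete, the following is a brute-force sketch, not the paper's algorithm. It assumes vertex-induced subgraphs and uses plain connectivity as a stand-in for the stronger connectivity measure the abstract refers to; the names `is_connected` and `vertex_subsets` and the toy path graph are hypothetical.

```python
from itertools import combinations


def is_connected(vertices, adjacency):
    """Depth-first search restricted to `vertices`; True if they form one connected piece."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adjacency[v]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def vertex_subsets(adjacency):
    """Yield every non-empty vertex subset: 2**n - 1 of them for n vertices."""
    nodes = sorted(adjacency)
    for size in range(1, len(nodes) + 1):
        yield from combinations(nodes, size)


# Toy undirected path graph 0 - 1 - 2 - 3 (an assumption for illustration).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

all_subsets = list(vertex_subsets(path))
connected = [s for s in all_subsets if is_connected(s, path)]
print(len(all_subsets), "non-empty vertex subsets")
print(len(connected), "of them induce a connected subgraph")
```

Even on this four-vertex path, the constraint removes a third of the candidates; a practical miner would grow connected candidates directly instead of filtering an exhaustive enumeration as done here.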

The Transfer Technique among Decision Tree Models for Distributed Data Mining (분산형 데이터마이닝 구현을 위한 의사결정나무 모델 전송 기술)

  • Kim, Choong-Gon;Woo, Jung-Geun;Baik, Sung-Wook
    • Journal of Digital Contents Society / v.8 no.3 / pp.309-314 / 2007
  • A decision tree algorithm must be modified to suit distributed and collaborative environments for distributed data mining. The distributed data mining system proposed in this paper consists of several agents and a mediator. Each agent performs local data mining on the data at its local site and communicates with the other agents to build the global decision tree model. The mediator helps the agents communicate efficiently. One advantage of distributed data mining is that analyzing huge data with several agents saves much time. The paper focuses on a technique for transferring local decision tree models among agents so as to reduce the huge communication overhead among them.

Intelligent Service Reasoning Model Using Data Mining In Smart Home Environments (스마트 홈 환경에서 데이터 마이닝 기법을 이용한 지능형 서비스 추론 모델)

  • Kang, Myung-Seok;Kim, Hag-Bae
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.12B / pp.767-778 / 2007
  • In this paper, we propose an Intelligent Service Reasoning (ISR) model using data mining in smart home environments. Our model builds a service tree for service reasoning based on the C4.5 algorithm, one of the decision tree algorithms, and infers the service to be offered to users through a quantitative weight estimation algorithm that uses quantitative characteristic rules and quantitative discriminant rules. The effectiveness of the developed model is validated through a smart home-network simulation.
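
C4.5, on which the service tree above is based, selects splits by gain ratio (information gain divided by split information). The sketch below shows only that criterion on categorical attributes; it is not the ISR model. The attribute names, the toy smart-home records, and the function names are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, normalised by the split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weighted entropy of the target after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical smart-home context records; "service" is the class to infer.
rows = [
    {"time": "evening", "occupied": "yes", "service": "lights_on"},
    {"time": "evening", "occupied": "no", "service": "lights_on"},
    {"time": "morning", "occupied": "yes", "service": "lights_off"},
    {"time": "morning", "occupied": "no", "service": "lights_off"},
]
best = max(["time", "occupied"], key=lambda a: gain_ratio(rows, a, "service"))
print(best)
```

In this toy data, `time` fully determines the service and `occupied` carries no information, so `time` would become the root split.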

Estimation of Smart Election System data

  • Park, Hyun-Sook;Hong, You-Sik
    • International journal of advanced smart convergence / v.7 no.2 / pp.67-72 / 2018
  • Big data inference based on Internet search failed to predict the 2016 presidential election in the United States of America because the prediction method relied on the search volume for each presidential candidate. Google also opened its Flu Trends service in 2008, but it was embarrassed by the well-known failures of its flu prediction system in 2011 and of the presidential election prediction in 2016. In this paper, an election prediction algorithm is proposed and explained, using a virtual vote algorithm for a virtual election together with a data mining method. The WEKA DB is also explained. In particular, the prediction of election results using the K-means algorithm and XEDOS tools is explained. Based on the analysis of the WEKA DB, a smart election prediction system is also proposed in this paper.

CHAID Algorithm by Cube-based Proportional Sampling

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.803-816 / 2004
  • The decision tree approach is most useful in classification problems; it divides the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, and category merging. CHAID uses the chi-squared statistic to determine splits and is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper, we propose a CHAID algorithm based on cube-based proportional sampling and examine its accuracy and speed as the number of variables changes.
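
The chi-squared statistic that CHAID relies on can be computed from a predictor-by-outcome contingency table. The sketch below shows only that computation on hypothetical counts. Full CHAID also merges categories and compares adjusted p-values, which this sketch omits, and it does not implement the cube-based proportional sampling proposed in the paper. The function name and the two example tables are assumptions.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the predictor and the outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 tables: rows are predictor categories, columns are outcome classes.
tables = {
    "predictor_a": [[30, 10], [10, 30]],  # strongly associated with the outcome
    "predictor_b": [[20, 20], [20, 20]],  # independent of the outcome
}
scores = {name: chi_squared(t) for name, t in tables.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Comparing raw statistics is only meaningful here because both tables have the same shape; tables with different degrees of freedom would need to be compared through p-values.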

A Design of FHIDS(Fuzzy logic based Hybrid Intrusion Detection System) using Naive Bayesian and Data Mining (나이브 베이지안과 데이터 마이닝을 이용한 FHIDS(Fuzzy Logic based Hybrid Intrusion Detection System) 설계)

  • Lee, Byung-Kwan;Jeong, Eun-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.5 no.3 / pp.158-163 / 2012
  • This paper proposes an FHIDS (Fuzzy logic based Hybrid Intrusion Detection System) design that detects anomaly and misuse attacks by using a Naive Bayesian algorithm, data mining, and fuzzy logic. Within the FHIDS, the NB-AAD (Naive Bayesian based Anomaly Attack Detection) technique uses a Naive Bayesian algorithm to detect anomaly attacks. The DM-MAD (Data Mining based Misuse Attack Detection) technique uses data mining to analyze the correlation rules among packets and detects new or transformed attacks by generating new rule-based patterns or by extracting transformed rule-based patterns. The FLD (Fuzzy Logic based Decision) technique judges attacks by using the results of the NB-AAD and DM-MAD. The FHIDS is therefore a hybrid attack detection system that improves the detection ratio for transformed attacks and reduces the false positive ratio by detecting both anomaly and misuse attacks.

Data Mining for Uncertain Data Based on Difference Degree of Concept Lattice

  • Qian Wang;Shi Dong;Hamad Naeem
    • Journal of Information Processing Systems / v.20 no.3 / pp.317-327 / 2024
  • With the rapid development of database technology and the widespread application of database management systems, databases are growing larger and larger. Data mining technology has already been widely applied in scientific research, financial investment, marketing, insurance, health care, and other fields. We discuss data mining technology and analyze its open problems, which shows that research into new data mining methods is of real significance. Some prior studies did not consider the differences between attributes, leading to redundancy when constructing concept lattices. This paper proposes a new method of uncertain data mining based on the concept lattice of connotation difference degree (c_diff). The method defines two rules. The construction of a concept lattice can be accelerated by excluding attributes with poor discriminative power from the process. There is also a new technique for calculating c_diff that does not scan the full database on each layer, thereby reducing the number of database scans. Experimental results show that the proposed method saves considerable time and improves the accuracy of data mining compared with the U-Apriori algorithm.

Design and Implementation of Mobile CRM Utilizing Big Data Analysis Techniques (빅데이터 분석 기법을 활용한 모바일 CRM 설계 및 구현)

  • Kim, Young-Il;Yang, Seung-Su;Lee, Sang-Soon;Park, Seok-Cheon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.289-294 / 2014
  • Recently, enterprises have been using CRM with data mining techniques to develop new marketing plans. However, data mining techniques require expertise, are difficult for the general public to access, and are subject to constraints of time and space. To solve this problem, this paper proposes a mobile CRM that applies data mining methods. To this end, we analyze the structure of an existing CRM system and define the data flow and formats. We also define the processes of the system and design a sales trend analysis algorithm and a customer sales recommendation algorithm using data mining techniques. The proposed system was evaluated through test scenarios to confirm correct operation and was compared with and verified against the existing system. In the tests, the results matched the values of the existing program and data, which verified reliability, and querying the proposed statistical tables reduced data analysis time, which verified speed.

Design and Implementation of Intelligent Society Member Management System (지능형 학회관리 시스템 설계 및 구현)

  • Jo Yung-Ki;Baik Sung-Wook;Bang Kee-Chun
    • Journal of Digital Contents Society / v.5 no.3 / pp.205-212 / 2004
  • This paper presents the design and implementation of an intelligent society member management system built to encourage various research activities. The system performs data mining on member data and society activity records. In the data mining process, useful society activity rules were produced, and as a result members could interact with the system effectively. A decision tree algorithm, one of the data mining methods, was used in the process. We present a plan for a personalized website that provides user-oriented administration policy and a dynamic interface by using the analyzed information from the society activity rules produced.
