• Title/Summary/Keyword: Distributed Data Mining

Search Result 110, Processing Time 0.028 seconds

Contribution to Improve Database Classification Algorithms for Multi-Database Mining

  • Miloudi, Salim;Rahal, Sid Ahmed;Khiat, Salim
    • Journal of Information Processing Systems
    • /
    • v.14 no.3
    • /
    • pp.709-726
    • /
    • 2018
  • Database classification is an important preprocessing step for the multi-database mining (MDM). In fact, when a multi-branch company needs to explore its distributed data for decision making, it is imperative to classify these multiple databases into similar clusters before analyzing the data. To search for the best classification of a set of n databases, existing algorithms generate from 1 to ($n^2-n$)/2 candidate classifications. Although each candidate classification is included in the next one (i.e., clusters in the current classification are subsets of clusters in the next classification), existing algorithms generate each classification independently, that is, without taking into account the use of clusters from the previous classification. Consequently, existing algorithms are time consuming, especially when the number of candidate classifications increases. To overcome the latter problem, we propose in this paper an efficient approach that represents the problem of classifying the multiple databases as a problem of identifying the connected components of an undirected weighted graph. Theoretical analysis and experiments on public databases confirm the efficiency of our algorithm against existing works and that it overcomes the problem of increase in the execution time.

Probabilistic Models for Local Patterns Analysis

  • Salim, Khiat;Hafida, Belbachir;Ahmed, Rahal Sid
    • Journal of Information Processing Systems
    • /
    • v.10 no.1
    • /
    • pp.145-161
    • /
    • 2014
  • Recently, many large organizations have multiple data sources (MDS') distributed over different branches of an interstate company. Local patterns analysis has become an effective strategy for MDS mining in national and international organizations. It consists of mining different datasets in order to obtain frequent patterns, which are forwarded to a centralized place for global pattern analysis. Various synthesizing models [2,3,4,5,6,7,8,26] have been proposed to build global patterns from the forwarded patterns. It is desired that the synthesized rules from such forwarded patterns must closely match with the mono-mining results (i.e., the results that would be obtained if all of the databases are put together and mining has been done). When the pattern is present in the site, but fails to satisfy the minimum support threshold value, it is not allowed to take part in the pattern synthesizing process. Therefore, this process can lose some interesting patterns, which can help the decider to make the right decision. In such situations we propose the application of a probabilistic model in the synthesizing process. An adequate choice for a probabilistic model can improve the quality of patterns that have been discovered. In this paper, we perform a comprehensive study on various probabilistic models that can be applied in the synthesizing process and we choose and improve one of them that works to ameliorate the synthesizing results. Finally, some experiments are presented in public database in order to improve the efficiency of our proposed synthesizing method.

EDF: An Interactive Tool for Event Log Generation for Enabling Process Mining in Small and Medium-sized Enterprises

  • Frans Prathama;Seokrae Won;Iq Reviessay Pulshashi;Riska Asriana Sutrisnowati
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.6
    • /
    • pp.101-112
    • /
    • 2024
  • In this paper, we present EDF (Event Data Factory), an interactive tool designed to assist event log generation for process mining. EDF integrates various data connectors to improve its capability to assist users in connecting to diverse data sources. Our tool employs low-code/no-code technology, along with graph-based visualization, to help non-expert users understand process flow and enhance the user experience. By utilizing metadata information, EDF allows users to efficiently generate an event log containing case, activity, and timestamp attributes. Through log quality metrics, our tool enables users to assess the generated event log quality. We implement EDF under a cloud-based architecture and run a performance evaluation. Our case study and results demonstrate the usability and applicability of EDF. Finally, an observational study confirms that EDF is easy to use and beneficial, expanding small and medium-sized enterprises' (SMEs) access to process mining applications.

Design and Implementation of a USN Middleware for Context-Aware and Sensor Stream Mining

  • Jin, Cheng-Hao;Lee, Yang-Koo;Lee, Seong-Ho;Yun, Un-il;Ryu, Keun-Ho
    • Spatial Information Research
    • /
    • v.19 no.1
    • /
    • pp.127-133
    • /
    • 2011
  • Recently, with the advances in sensor techniques and net work computing, Ubiquitous Sensor Network (USN) has been received a lot of attentions from various communities. The sensor nodes distributed in the sensor network tend to continuously generate a large amount of data, which is called stream data. Sensor stream data arrives in an online manner so that it is characterized as high-speed, real-time and unbounded and it requires fast data processing to get the up-to-date results. The data stream has many application domains such as traffic analysis, physical distribution, U-healthcare and so on. Therefore, there is an overwhelming need of a USN middleware for processing such online stream data to provide corresponding services to diverse applications. In this paper, we propose a novel USN middleware which can provide users both context-aware service and meaningful sequential patterns. Our proposed USN middleware is mainly focused on location based applications which use stream location data. We also show the implementation of our proposed USN middleware. By using the proposed USN middleware, we can save the developing cost of providing context aware services and stream sequential patterns mainly in location based applications.

(Design of data mining IDS for new intrusion pattern) (새로운 침입 패턴을 위한 데이터 마이닝 침입 탐지 시스템 설계)

  • 편석범;정종근;이윤배
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.39 no.1
    • /
    • pp.77-82
    • /
    • 2002
  • IDS has been studied mainly in the field of the detection decision and collecting of audit data. The detection decision should decide whether successive behaviors are intrusions or not , the collecting of audit data needs ability that collects precisely data for intrusion decision. Artificial methods such as rule based system and neural network are recently introduced in order to solve this problem. However, these methods have simple host structures and defects that can't detect changed new intrusion patterns. So, we propose the method using data mining that can retrieve and estimate the patterns and retrieval of user's behavior in the distributed different hosts.

Learning Method for minimize false positive in IDS (침입탐지시스템에서 긍정적 결함을 최소화하기 위한 학습 방법)

  • 정종근;김철원
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.7 no.5
    • /
    • pp.978-985
    • /
    • 2003
  • The implementation of abnormal behavior detection IDS is more difficult than the implementation of misuse behavior detection IDS because usage patterns are various. Therefore, most of commercial IDS is misuse behavior detection IDS. However, misuse behavior detection IDS cannot detect system intrusion in case of modified intrusion patterns occurs. In this paper, we apply data mining so as to detect intrusion with only audit data related in intrusion among many audit data. The agent in the distributed IDS can collect log data as well as monitoring target system. False positive should be minimized in order to make detection accuracy high, that is, core of intrusion detection system. So We apply data mining algorithm for prediction of modified intrusion pattern in the level of audit data learning.

Design of data mining IDS for transformed intrusion pattern (변형 침입 패턴을 위한 데이터 마이닝 침입 탐지 시스템 설계)

  • 김용호;정종근;이윤배;김판구;염순자
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2001.10a
    • /
    • pp.479-482
    • /
    • 2001
  • IDS has been studied mainly in the field of the detection decision and collecting of audit data. The detection decision should decide whether successive behaviors are intrusions or not, the collecting of audit data needs ability that collects precisely data for intrusion decision. Artificial methods such as rule based system and neural network are recently introduced in order to solve this problem. However, these methods have simple host structures and defects that can't detect transformed intrusion patterns. So, we propose the method using data mining that can retrieve and estimate the patterns and retrieval of user's behavior in the distributed different hosts.

  • PDF

Design of Intrusion Detection System applying for data mining agent (데이터 마이닝 에이전트를 적용한 침입 탐지 시스템 설계)

  • 정종근;구제영;김용호;오근탁;이윤배
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2002.05a
    • /
    • pp.619-622
    • /
    • 2002
  • IDS has been studied mainly in the field of the detection derision and collecting of audit data. The detection decision should decide whether successive behaviors are intrusions or not , the collecting of audit data needs ability that collects precisely data for intrusion decision. Artificial methods such as rule based system and neural network are recently introduced in order to solve this problem. However, these methods have simple host structures and defects that can't detect transformed intrusion patterns. So, we propose the method using data mining agent that can retrieve and estimate the patterns and retrieval of user's behavior in the distributed different hosts.

  • PDF

SoFA: A Distributed File System for Search-Oriented Systems (SoFA: 검색 지향 시스템을 위한 분산 파일 시스템)

  • Choi, Eun-Mi;Tran, Doan Thanh;Upadhyaya, Bipin;Azimov, Fahriddin;Luu, Hoang Long;Truong, Phuong;Kim, Sang-Bum;Kim, Pil-Sung
    • Journal of the Korea Society for Simulation
    • /
    • v.17 no.4
    • /
    • pp.229-239
    • /
    • 2008
  • A Distributed File System (DFS) provides a mechanism in which a file can be stored across several physical computer nodes ensuring replication transparency and failure transparency. Applications that process large volumes of data (such as, search engines, grid computing applications, data mining applications, etc.) require a backend infrastructure for storing data. And the distributed file system is the central component for such storing data infrastructure. There have been many projects focused on network computing that have designed and implemented distributed file systems with a variety of architectures and functionalities. In this paper, we describe a complete distributed file system which can be used in large-scale search-oriented systems.

  • PDF

Design and Implementation of an Open Object Management System for Spatial Data Mining (공간 데이타 마이닝을 위한 개방형 객체 관리 시스템의 설계 및 구현)

  • Yun, Jae-Kwan;Oh, Byoung-Woo;Han, Ki-Joon
    • Journal of Korea Spatial Information System Society
    • /
    • v.1 no.1 s.1
    • /
    • pp.5-18
    • /
    • 1999
  • Recently, the necessity of automatic knowledge extraction from spatial data stored in spatial databases has been increased. Spatial data mining can be defined as the extraction of implicit knowledge, spatial relationships, or other knowledge not explicitly stored in spatial databases. In order to extract useful knowledge from spatial data, an object management system that can store spatial data efficiently, provide very fast indexing & searching mechanisms, and support a distributed computing environment is needed. In this paper, we designed and implemented an open object management system for spatial data mining, that supports efficient management of spatial, aspatial, and knowledge data. In order to develop this system, we used Open OODB that is a widely used object management system. However, the lark of facilities for spatial data mining in Open OODB, we extended it to support spatial data type, dynamic class generation, object-oriented inheritance, spatial index, spatial operations, etc. In addition, for further increasement of interoperability with other spatial database management systems or data mining systems, we adopted international standards such as ODMG 2.0 for data modeling, SDTS(Spatial Data Transfer Standard) for modeling and exchanging spatial data, and OpenGIS Simple Features Specification for CORBA for connecting clients and servers efficiently.

  • PDF