• Title/Summary/Keyword: Distributed data mining

Students' Performance Prediction in Higher Education Using Multi-Agent Framework Based Distributed Data Mining Approach: A Review

  • M.Nazir;A.Noraziah;M.Rahmah
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.135-146 / 2023
  • An effective educational program needs an innovative structure that improves the efficacy of higher education, accelerating the achievement of desired results and reducing the risk of failure. Educational Decision Support Systems (EDSS) are currently a prominent topic in education because they allow student results to be monitored and evaluated during the students' development. Inadequate information systems struggle to take full advantage of EDSS because of poor accuracy, incorrect analysis of student characteristics, and inadequate databases. Data mining techniques (DMTs) provide useful tools for finding models or patterns in data and are extremely useful in decision-making. Several researchers have studied distributed data mining with multi-agent technology, and the rapid growth of network technology and IT use has led to the widespread use of distributed databases. This article explains the available data mining technology and the distributed data mining system framework. A distributed data mining approach is used so that a classifier capable of predicting the success of students in the economic domain can be constructed. The article also discusses the Intelligent Knowledge Base Distributed Data Mining framework, which assesses student performance on mid-term and final-term exams using multi-agent-system-based educational mining techniques. Using single and ensemble-based classifiers, the study aims to investigate the factors that influence student performance in higher education and to construct a classification model that can predict academic achievement. It also discusses the importance of multi-agent systems and comparative machine learning approaches in EDSS development.

Distributed Incremental Approximate Frequent Itemset Mining Using MapReduce

  • Mohsin Shaikh;Irfan Ali Tunio;Syed Muhammad Shehram Shah;Fareesa Khan Sohu;Abdul Aziz;Ahmad Ali
    • International Journal of Computer Science & Network Security / v.23 no.5 / pp.207-211 / 2023
  • Traditional data mining methods typically assume that the data is small, centralized, memory-resident, and static. This assumption no longer holds, because datasets are growing rapidly and becoming huge. There is a fast-growing need to manage data with efficient mining algorithms. In this scenario, data mining must be carried out in a distributed environment, and Frequent Itemset Mining (FIM) is no exception; hence the need for an efficient incremental mining algorithm. We propose Distributed Incremental Approximate Frequent Itemset Mining (DIAFIM), an incremental FIM algorithm that works in the distributed parallel MapReduce environment. The key contribution of this research is an incremental mining algorithm designed for that environment.
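
The abstract above does not describe how DIAFIM works internally, so the following is only a minimal single-process sketch of the general idea of incremental itemset counting in a map/reduce style. All names (`map_transaction`, `reduce_counts`, `IncrementalCounts`) and the `max_size` cap are assumptions made for illustration; the approximation aspect of DIAFIM is not modeled.

```python
from collections import Counter
from itertools import combinations


def map_transaction(transaction, max_size=2):
    """Map step: emit (itemset, 1) for every itemset of up to max_size items."""
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per itemset."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def count_batch(transactions, max_size=2):
    """Run the map and reduce steps over one batch of transactions."""
    return reduce_counts(
        pair for t in transactions for pair in map_transaction(t, max_size)
    )


class IncrementalCounts:
    """Hypothetical running state: counts are merged batch by batch,
    so earlier batches never need to be rescanned."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def add_batch(self, transactions, max_size=2):
        transactions = list(transactions)
        self.counts.update(count_batch(transactions, max_size))
        self.n_transactions += len(transactions)

    def frequent(self, min_support):
        """Return itemsets whose relative support is at least min_support."""
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}
```

Because the per-itemset counts are additive, a new batch only requires counting that batch and merging it into the stored totals, which is what makes this kind of update incremental.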

The Transfer Technique among Decision Tree Models for Distributed Data Mining (분산형 데이터마이닝 구현을 위한 의사결정나무 모델 전송 기술)

  • Kim, Choong-Gon;Woo, Jung-Geun;Baik, Sung-Wook
    • Journal of Digital Contents Society / v.8 no.3 / pp.309-314 / 2007
  • A decision tree algorithm must be modified to suit the distributed and collaborative environments used for distributed data mining. The distributed data mining system proposed in this paper consists of several agents and a mediator. Each agent performs local data mining on the data at its local site and communicates with the other agents to build the global decision tree model. The mediator helps the agents communicate efficiently. One advantage of distributed data mining is that several agents can analyze huge data in much less time. The paper focuses on a technique for transferring local decision tree models among agents that reduces the large communication overhead among them.

Distributed FTP Server for Log Mining System on ACE (분산 FTP 서버의 ACE 기반 로그 마이닝 시스템)

  • Min, Su-Hong;Cho, Dong-Sub
    • Proceedings of the KIEE Conference / 2002.11c / pp.465-468 / 2002
  • Today, large corporations are building distributed server environments. Many corporations operate Web, FTP, mail, and DB servers separately in a heterogeneous environment. However, a manager must then manage each server individually. In this paper, we present a log mining system on ACE for distributed FTP servers. The proposed log mining system is based on the ACE (Adaptive Communication Environment) framework and data mining techniques. The system provides unified operation of the distributed FTP servers.

Parallel Data Mining with Distributed Frequent Pattern Trees (분산형 FP트리를 활용한 병렬 데이터 마이닝)

  • 조두산;김동승
    • Proceedings of the IEEK Conference / 2003.07c / pp.2561-2564 / 2003
  • Data mining is an effective method for discovering useful information, such as rules and previously unknown patterns, in large databases. The discovery of association rules is an important data mining problem. We have developed a new parallel mining algorithm, called Distributed Frequent Pattern Tree (DFPT), that detects association rules on a distributed shared-nothing parallel system. The DFPT algorithm is devised for parallel execution of the FP-growth algorithm. It needs only two full disk scans of the database because it eliminates the need to generate candidate items. We achieved good workload balancing throughout the mining process by distributing the work equally to all processors. We implemented the algorithm on a PC cluster system and observed that it outperformed the Improved Count Distribution scheme.
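
The two database scans mentioned above are a property of FP-growth itself: the first scan counts items, and the second inserts each transaction into a prefix tree. The sketch below shows only that single-node building block, not the paper's DFPT parallelization, which the abstract does not detail. The names `FPNode` and `build_fp_tree` and the tie-breaking rule are assumptions for illustration.

```python
from collections import Counter


class FPNode:
    """One node of a frequent-pattern tree."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-tree with exactly two passes over the transactions."""
    # Scan 1: count in how many transactions each item occurs.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in item_counts.items() if c >= min_count}

    # Scan 2: insert each transaction, keeping only frequent items,
    # ordered by descending count (ties broken by item name).
    root = FPNode(None)
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent
```

Ordering items by descending frequency makes transactions share prefixes, which keeps the tree compact; no candidate itemsets are generated at any point.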

Design and Implementation of a Distributed Data Mining Framework (분산된 데이터마이닝을 위한 프레임워크의 설계 및 구현)

  • Kadel, Prakash;Choi, Ho-Jin
    • Proceedings of the Korean Information Science Society Conference / 2007.06c / pp.336-340 / 2007
  • We envisage that grid computing environments allow us to implement distributed data mining services, that is, those applications which analyze large sets of geographically distributed databases and information using the computational power and resources of a grid environment. This paper describes an experimental framework towards such a distributed data mining approach, including design considerations and a prototype implementation. Based on the "Knowledge Grid" architecture suggested by Cannataro et al., we identify four major components - user node, broker node, data node, and computation node - and define their individual roles. For implementing the prototype, we have investigated methods for utilizing distributed resources within a grid computing environment, e.g., communication and coordination among the various resources available.

A New Model to Enhance Efficiency in Distributed Data Mining Using Mobile Agent

  • Bardab, Saeed Ngmaldin;Ahmed, Tarig Mohamed
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.275-286 / 2021
  • Because vast amounts of data are stored in geographically different locations, distributed data mining (DDM) has taken center stage in data mining. The use of mobile agents to enhance efficiency in DDM has gained the attention of industry, commerce, and academia because it offers serious suggestions for solving the inherent problems of DDM. In this paper, a novel DDM model is proposed that uses a mobile agent to enhance efficiency. The main idea behind the model is to use the Naive Bayes algorithm to give the mobile agent the ability to learn, compare, retrieve, and store results from each server, where each server holds a different dataset. We found that accuracy increased by roughly 0.9%, which was our main target.
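
One reason Naive Bayes suits an agent that travels between servers is that its sufficient statistics are plain counts, which can be accumulated site by site without moving the raw data. The sketch below illustrates that property only; it is not the model proposed in the paper. The class `CountingAgent`, its methods, and the use of Laplace smoothing are assumptions for illustration.

```python
from collections import Counter, defaultdict
import math


class CountingAgent:
    """Hypothetical agent that carries Naive Bayes count tables between servers."""

    def __init__(self):
        self.class_counts = Counter()
        # (feature index, class label) -> counts of observed feature values
        self.feature_counts = defaultdict(Counter)
        # feature index -> set of distinct values seen at any server
        self.feature_values = defaultdict(set)

    def visit(self, rows):
        """Add the counts from one server's local (features, label) rows."""
        for features, label in rows:
            self.class_counts[label] += 1
            for idx, value in enumerate(features):
                self.feature_counts[(idx, label)][value] += 1
                self.feature_values[idx].add(value)

    def predict(self, features):
        """Return the most probable class under the accumulated counts."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for idx, value in enumerate(features):
                seen = self.feature_counts[(idx, label)][value]
                k = len(self.feature_values[idx])
                # Laplace smoothing avoids zero probabilities for unseen values.
                score += math.log((seen + 1) / (n + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Visiting two servers in sequence yields the same count tables as training once on the combined data, so only the counts need to travel over the network.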

Implementation of Data Preparation System for Data Mining on Heterogenious Distributed Environment (이기종 분산환경에서 데이터마이닝을 위한 데이터준비 시스템 구현)

  • Lee sang hee;Lee won sup
    • Journal of the Korea Society of Computer and Information / v.9 no.3 / pp.109-113 / 2004
  • This paper investigates the efficiency of the data preparation process in existing data mining tools and presents a design principle for a new, efficient data preparation system. We compare commonly used data mining tools based on their access methods for local and remote databases and on the exchange of information resources between different computers. The compared tools are Answer Tree, Clementine, Enterprise Miner, and Weka. We propose a design principle for an efficient data preparation system for data mining on distributed networks.

Data Server Mining applied Neural Networks in Distributed Environment (분산 환경에서 신경망을 응용한 데이터 서버 마이닝)

  • 박민기;김귀태;이재완
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.05a / pp.473-476 / 2003
  • Nowadays, the Internet serves as a large distributed information service center, and the various information and database servers that manage it reside in a distributed network environment. However, it is difficult to decide which server should process input data depending on the properties of the data. In this paper, we design a server mining mechanism and an intelligent data mining system architecture that use a neural network to handle input data patterns most efficiently among the various data in a distributed environment. As a result, a new input data pattern can be processed after its destination server is decided by a dynamic binding method implemented with the neural network. This mechanism can be applied to data warehouses, telecommunication and load pattern analysis, population census analysis, and medical data analysis.

Pattern mining for large distributed dataset: A parallel approach (PMLDD)

  • Pal, Amrit;Kumar, Manish
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5287-5303 / 2018
  • Handling the vast amount of data found in large transactional datasets is an obvious challenge for conventional data mining algorithms. To address this challenge, our paper proposes a parallel approach that decomposes the mining problem into sub-problems in order to find frequent patterns in these datasets. The proposed approach, Pattern Mining for Large Distributed Dataset (PMLDD), ensures minimum dependencies and minimum communication among sub-problems. It establishes a linear aggregation of the intermediate results so that it can be adapted to large-scale programming models like MapReduce. In this context, an algorithmic structure for the MapReduce programming model is presented. PMLDD guarantees efficient load balancing among the sub-problems through a specific selection criterion. Further, it reduces the number of iterations over the dataset required for mining frequent patterns compared to existing approaches. Finally, in terms of performance evaluation, we believe our approach is scalable enough to handle larger datasets, and the result analysis supports the claims above.