• Title/Summary/Keyword: Data Mining Technique


A Study on the Development of Internet Purchase Support Systems Based on Data Mining and Case-Based Reasoning (데이터마이닝과 사례기반추론 기법에 기반한 인터넷 구매지원 시스템 구축에 관한 연구)

  • 김진성
    • Journal of the Korean Operations Research and Management Science Society / v.28 no.3 / pp.135-148 / 2003
  • In this paper we introduce an Internet-based purchase support system that uses data mining and case-based reasoning (CBR). Internet business activity involving the end user is undergoing a significant revolution. The ability to track users' browsing behavior has brought vendors and end customers closer than ever before, and a vendor can now personalize its product message for individual customers at massive scale. Most earlier researchers in this area used data mining techniques to predict customers' future behavior and to increase repurchase frequency. Data mining here can be defined as efficiently discovering association rules from large collections of data. However, the basic association rule-based technique is inflexible: if no inference rule matches a customer's behavior, an association rule-based system may be unable to present further information. To resolve this problem, we combine association rule-based data mining with a CBR mechanism. CBR is used to search for customer preferences and to learn from past cases. The resulting hybrid purchase support mechanism reflects both association rule-based logical inference and case-based information reuse. Web-log data gathered from a real-world Internet shopping mall is used to illustrate the quality of the proposed system.
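
The hybrid idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's system: the session data, the thresholds, the restriction to single-item rules, and the Jaccard-based case retrieval are all hypothetical choices. The sketch mines rules by support and confidence and falls back to the most similar past session when no rule fires.

```python
from itertools import combinations

# Hypothetical shopping-mall sessions: the items one visitor viewed or bought.
sessions = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"mouse", "pad"},
    {"laptop", "mouse", "pad"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Single-item rules lhs -> rhs that clear both (assumed) thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s >= min_support:
                c = confidence({lhs}, {rhs}, transactions)
                if c >= min_confidence:
                    rules.append((lhs, rhs, s, c))
    return rules

def jaccard(a, b):
    """Set overlap in [0, 1]; used here as a stand-in case-similarity measure."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(basket, transactions, rules):
    """Use rules when one fires; otherwise reuse the most similar past case."""
    fired = sorted({rhs for lhs, rhs, _, _ in rules
                    if lhs in basket and rhs not in basket})
    if fired:
        return "rule", fired
    # Case-based fallback: only cases that can contribute a new item.
    candidates = [t for t in transactions if t - basket]
    nearest = max(candidates, key=lambda t: jaccard(basket, t), default=set())
    return "case", sorted(nearest - basket)

rules = pair_rules(sessions)
print(recommend({"bag"}, sessions, rules))
print(recommend({"laptop", "mouse"}, sessions, rules))
```

With this toy data, a basket containing only a bag triggers the rule bag -> laptop, while a basket of laptop and mouse matches no rule and is served from the nearest stored case instead.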

A Fusion of Data Mining Techniques for Predicting Movement of Mobile Users

  • Duong, Thuy Van T.;Tran, Dinh Que
    • Journal of Communications and Networks / v.17 no.6 / pp.568-581 / 2015
  • Predicting the locations of users with portable devices such as IP phones, smartphones, iPads and iPods in public wireless local area networks (WLANs) plays a crucial role in location management and network resource allocation. Many techniques from machine learning and data mining, such as sequential pattern mining and clustering, have been widely used. However, these approaches have two deficiencies. First, because it is based on profiles of individual mobility behavior, a sequential pattern technique may fail to predict the movement of new users or of users moving along novel paths. Second, using similar mobility behaviors in a cluster to predict the movement of users may significantly degrade accuracy, because regular movement cannot be distinguished from random movement. In this paper, we propose a novel fusion technique that utilizes mobility rules discovered from multiple similar users by combining clustering and sequential pattern mining. The proposed technique, with two algorithms named clustering-based-sequential-pattern-mining (CSPM) and sequential-pattern-mining-based-clustering (SPMC), can deal with the lack of information in a personal profile and avoid some of the noise caused by users' random movements. Experimental results show that our approach outperforms existing approaches in terms of efficiency and prediction accuracy.
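
As a rough illustration of pooling mobility information across similar users, the sketch below groups users by the overlap of their moves and predicts the next location from the pooled move counts. It is a deliberately simplified stand-in, not CSPM or SPMC: it uses first-order transition counts instead of full sequential pattern mining, and the traces, the Jaccard similarity, and the 0.2 threshold are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical access-point (AP) traces; user IDs and AP names are made up.
traces = {
    "u1": ["lobby", "lab", "cafe", "lab", "library"],
    "u2": ["lobby", "lab", "cafe", "library"],
    "u3": ["gym", "cafe", "gym", "dorm"],
}

def transitions(path):
    """Set of consecutive (from, to) moves in one trace."""
    return set(zip(path, path[1:]))

def similarity(p, q):
    """Jaccard overlap between the move sets of two traces."""
    a, b = transitions(p), transitions(q)
    return len(a & b) / len(a | b) if a | b else 0.0

def similar_users(partial, traces, threshold=0.2):
    """Users whose traces overlap the partial trace by at least threshold."""
    return [u for u, p in traces.items() if similarity(partial, p) >= threshold]

def predict_next(partial, traces, threshold=0.2):
    """Most frequent next AP after the current one, pooled over similar users."""
    counts = defaultdict(Counter)
    for user in similar_users(partial, traces, threshold):
        path = traces[user]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    following = counts.get(partial[-1])
    return following.most_common(1)[0][0] if following else None

# A new user with only two observed APs borrows the moves of similar users.
print(predict_next(["lobby", "lab"], traces))
```

A trace with no recorded moves, such as a single observation, matches no group in this sketch and yields `None`; a fuller version would need a fallback for that case.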

Twostep Clustering of Environmental Indicator Survey Data

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.17 no.1 / pp.1-11 / 2006
  • Data mining techniques are used to discover hidden knowledge, unexpected patterns, relations, and new rules in massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. It has been widely used in many applications, such as pattern analysis or recognition, data analysis, image processing, and offline or online market research. We analyze Gyeongnam social indicator survey data from 2001 using the twostep clustering technique to obtain environmental information. Twostep clustering is classified as a partitional clustering method. The twostep clustering outputs can be applied to environmental preservation and improvement.


Dummy Data Insert Scheme for Privacy Preserving Frequent Itemset Mining in Data Stream (데이터 스트림 빈발항목 마이닝의 프라이버시 보호를 위한 더미 데이터 삽입 기법)

  • Jung, Jay Yeol;Kim, Kee Sung;Jeong, Ik Rae
    • Journal of the Korea Institute of Information Security & Cryptology / v.23 no.3 / pp.383-393 / 2013
  • Data stream mining is a technique for obtaining useful information by analyzing data generated in real time. Within data stream mining, frequent itemset mining finds frequent itemsets while data is being transmitted, and these itemsets are used for pattern analysis and marketing in various fields. Existing frequent itemset mining techniques have a problem: when a malicious attacker sniffs the transmitted data, the data provider's real-time information is revealed. This problem can be solved by inserting dummy data, so that an attacker cannot distinguish the original data within the transmitted data. In this paper, we propose a method for privacy-preserving frequent itemset mining that uses dummy data insertion. In addition, the proposed method is computationally efficient because it does not require encryption or other mathematical operations.

Twostep Clustering of Environmental Indicator Survey Data

  • Park, Hee-Chang
    • Korean Data and Information Science Society: Conference Proceedings / 2005.10a / pp.59-69 / 2005
  • Data mining techniques are used to discover hidden knowledge, unexpected patterns, relations, and new rules in massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. It has been widely used in many applications, such as pattern analysis or recognition, data analysis, image processing, and offline or online market research. We analyze Gyeongnam social indicator survey data from 2001 using the twostep clustering technique to obtain environmental information. Twostep clustering is classified as a partitional clustering method. The twostep clustering outputs can be applied to environmental preservation and improvement.


A Design of FHIDS(Fuzzy logic based Hybrid Intrusion Detection System) using Naive Bayesian and Data Mining (나이브 베이지안과 데이터 마이닝을 이용한 FHIDS(Fuzzy Logic based Hybrid Intrusion Detection System) 설계)

  • Lee, Byung-Kwan;Jeong, Eun-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.5 no.3 / pp.158-163 / 2012
  • This paper proposes the design of an FHIDS (Fuzzy logic based Hybrid Intrusion Detection System) that detects anomaly and misuse attacks by using a Naive Bayesian algorithm, data mining, and fuzzy logic. Within the FHIDS, the NB-AAD (Naive Bayesian based Anomaly Attack Detection) technique uses a Naive Bayesian algorithm to detect anomaly attacks. The DM-MAD (Data Mining based Misuse Attack Detection) technique uses data mining to analyze the correlation rules among packets, and detects new or transformed attacks by generating new rule-based patterns or by extracting transformed rule-based patterns. The FLD (Fuzzy Logic based Decision) technique judges attacks by using the results of the NB-AAD and DM-MAD. The FHIDS is therefore a hybrid attack detection system that improves the detection rate for transformed attacks and reduces the false positive rate by making it possible to detect both anomaly and misuse attacks.

A Study on the Data Fusion Method using Decision Rule for Data Enrichment (의사결정 규칙을 이용한 데이터 통합에 관한 연구)

  • Kim S.Y.;Chung S.S.
    • The Korean Journal of Applied Statistics / v.19 no.2 / pp.291-303 / 2006
  • Data mining is the work of extracting information from existing data files, so one of the most important factors in the data mining process is the quality of the data used. In this paper, we propose a data fusion technique that uses decision rules for data enrichment, a phase of the KDD process that improves data quality. Simulations were performed to compare the proposed data fusion technique with existing techniques. The results show that our decision rule-based data fusion technique yields a low MSE or misclassification rate for the fusion variables.

Performance Analysis of Perturbation-based Privacy Preserving Techniques: An Experimental Perspective

  • Ritu Ratra;Preeti Gulia;Nasib Singh Gill
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.81-88 / 2023
  • At present, enormous amounts of data are produced every second. These data also contain private information from sources including media platforms, the banking sector, finance, healthcare, and criminal histories. Data mining is a method for searching and analyzing massive volumes of data to find usable information. Preserving personal data during data mining has become difficult, so privacy-preserving data mining (PPDM) is used to do so. Data perturbation is one of several tactics used in PPDM to protect data privacy. In perturbation, datasets are perturbed in order to preserve personal information, addressing both data accuracy and data privacy. This paper explores and compares several perturbation strategies that may be used to protect data privacy. For this experiment, two perturbation techniques were used, one based on random projection and one on principal component analysis: Improved Random Projection Perturbation (IRPP) and the Enhanced Principal Component Analysis based Technique (EPCAT). The Naive Bayes classification algorithm is used as the data mining approach. The techniques are assessed in terms of precision, run time, and accuracy. The random projection-based technique (IRPP) is found to be the best perturbation method under Naive Bayes classification for both the cardiovascular and hypothyroid datasets.
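
For readers unfamiliar with projection-based perturbation, the following minimal sketch shows plain Gaussian random projection: each record is multiplied by a random matrix, which hides the original attribute values while roughly preserving distances between records. It is not IRPP or EPCAT, whose details the abstract does not give; the record data, dimensions, and seeds are assumptions made for the example.

```python
import math
import random

def random_projection_matrix(d, k, seed=0):
    """k x d matrix with independent Gaussian entries of variance 1/k."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def perturb(rows, matrix):
    """Project each d-dimensional record x to k dimensions: y = R x."""
    return [[sum(r * x for r, x in zip(r_row, record)) for r_row in matrix]
            for record in rows]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical data: 5 records with 100 numeric attributes each.
data_rng = random.Random(1)
records = [[data_rng.uniform(0, 1) for _ in range(100)] for _ in range(5)]

R = random_projection_matrix(d=100, k=64)
masked = perturb(records, R)

# Distances are only approximately preserved; the ratio should be near 1.
ratio = distance(masked[0], masked[1]) / distance(records[0], records[1])
print(len(masked[0]), round(ratio, 2))
```

Because distances survive approximately, distance-based or probabilistic classifiers can still be trained on the masked records, which is the trade-off between accuracy and privacy that such comparisons measure.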

Recent Technique Analysis, Infant Commodity Pattern Analysis Scenario and Performance Analysis of Incremental Weighted Maximal Representative Pattern Mining (점진적 가중화 맥시멀 대표 패턴 마이닝의 최신 기법 분석, 유아들의 물품 패턴 분석 시나리오 및 성능 분석)

  • Yun, Unil;Yun, Eunmi
    • Journal of Internet Computing and Services / v.21 no.2 / pp.39-48 / 2020
  • Data mining techniques have been proposed to find meaningful and useful information efficiently. In big data environments in particular, as data accumulates across many applications, related pattern mining methods have been proposed. Recently, rather than analyzing only static data already stored in files or databases, mining dynamic data that is generated incrementally in real time has attracted more research interest, because such data can be read only once. For this reason, ways of mining dynamic data efficiently have been studied. Moreover, approaches that mine representative patterns, such as maximal pattern mining, have been proposed because mining otherwise generates a huge number of result patterns. As another issue, to discover more meaningful patterns in the real world, weighted pattern mining assigns weights to items; in practice, the profits, costs, and so on of items can serve as weights. In this paper, we analyze weighted maximal pattern mining approaches for incrementally generated data, maximal representative pattern mining techniques, and incremental pattern mining methods. We then present application scenarios for analyzing patterns of commodities required by infants by applying weighted representative pattern mining. Furthermore, the performance of state-of-the-art algorithms is evaluated. The results show that the incremental weighted maximal pattern mining technique performs better than incremental weighted pattern mining and weighted maximal pattern mining.
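
The two notions combined in this abstract, item weights and maximal patterns, can be illustrated with a brute-force sketch. It assumes one common definition of weighted support (plain support multiplied by the mean weight of the pattern's items); the baskets, weights, and threshold are hypothetical, and the incremental tree-based updating studied in the paper is not shown.

```python
from itertools import combinations

# Hypothetical baby-goods baskets and item weights (e.g. normalised profit).
baskets = [
    {"diapers", "wipes", "formula"},
    {"diapers", "wipes"},
    {"diapers", "formula"},
    {"wipes", "bottle"},
    {"diapers", "wipes", "formula"},
]
weights = {"diapers": 0.9, "wipes": 0.5, "formula": 0.8, "bottle": 0.3}

def weighted_support(pattern, transactions, weights):
    """Assumed definition: support(pattern) times the mean item weight."""
    count = sum(pattern <= t for t in transactions)
    mean_weight = sum(weights[i] for i in pattern) / len(pattern)
    return count / len(transactions) * mean_weight

def weighted_frequent(transactions, weights, min_wsup):
    """Enumerate every itemset whose weighted support reaches min_wsup."""
    items = sorted(weights)
    found = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            if weighted_support(pattern, transactions, weights) >= min_wsup:
                found.append(pattern)
    return found

def maximal(patterns):
    """Keep only patterns that have no proper superset in the list."""
    return [p for p in patterns if not any(p < q for q in patterns)]

frequent = weighted_frequent(baskets, weights, min_wsup=0.3)
representatives = maximal(frequent)
print(len(frequent), [sorted(p) for p in representatives])
```

On this toy data five patterns pass the threshold, but only two maximal ones remain, {diapers, formula} and {diapers, wipes}, which shows how maximal patterns compress the result set.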

Integrated Corporate Bankruptcy Prediction Model Using Genetic Algorithms (유전자 알고리즘 기반의 기업부실예측 통합모형)

  • Ok, Joong-Kyung;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.15 no.4 / pp.99-121 / 2009
  • Recently, many studies have predicted corporate bankruptcy using data mining techniques. While various data mining techniques have been investigated individually, some researchers have tried to combine the results of several techniques in order to improve classification performance. In this study, we classify data mining techniques into four types by their characteristics, select a representative technique of each type, and then combine them using a genetic algorithm. Because it is a global optimization technique, the genetic algorithm may find an optimal or near-optimal solution. This study compares the results of single models, typical combination models, and the proposed integration model using the genetic algorithm.

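
One way a genetic algorithm can combine classifiers is to search for the weights of a weighted-average vote. The sketch below is a toy version under stated assumptions, not the paper's integration model: the three model outputs and labels are invented, the fitness is plain accuracy on that same data, and the population size, crossover, and mutation settings are arbitrary.

```python
import random

# Hypothetical bankruptcy probabilities from three models for eight firms.
probs = [
    [0.9, 0.6, 0.4], [0.4, 0.8, 0.7], [0.7, 0.3, 0.8], [0.6, 0.7, 0.3],
    [0.2, 0.4, 0.6], [0.6, 0.1, 0.3], [0.3, 0.6, 0.2], [0.1, 0.3, 0.4],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = bankrupt, 0 = healthy (made up)

def ensemble_accuracy(model_weights, probs, labels):
    """Accuracy of the weighted-average score with a 0.5 cut-off."""
    total = sum(model_weights)
    hits = 0
    for row, y in zip(probs, labels):
        score = sum(w * p for w, p in zip(model_weights, row)) / total
        hits += int((score >= 0.5) == bool(y))
    return hits / len(labels)

def evolve_weights(probs, labels, pop_size=20, generations=30, seed=0):
    """Tiny GA: elitist selection, uniform crossover, Gaussian mutation."""
    rng = random.Random(seed)
    n = len(probs[0])

    def fitness(w):
        return ensemble_accuracy(w, probs, labels)

    # Start from each single model (one-hot weights) plus random candidates.
    population = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    population += [[rng.random() + 1e-9 for _ in range(n)]
                   for _ in range(pop_size - n)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]
            i = rng.randrange(n)
            child[i] = max(0.0, child[i] + rng.gauss(0.0, 0.1))
            if sum(child) <= 0.0:  # avoid an all-zero weight vector
                child[i] = 1.0
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)

best_weights, best_acc = evolve_weights(probs, labels)
print([round(w, 2) for w in best_weights], best_acc)
```

Because the single models are part of the initial population and the best candidates always survive, the combined model in this sketch can never score below the best single model on the data used for fitness. A real study would measure fitness on held-out data.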