• Title/Summary/Keyword: Data Mining Technique

Automated Generation Algorithm of the Penetration Scenarios using Association Mining Technique (연관 마이닝 기법을 이용한 침입 시나리오 자동생성 알고리즘)

  • 정경훈;주정은;황현숙;김창수
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 1999.05a
    • /
    • pp.203-207
    • /
    • 1999
  • In this paper, we propose an algorithm that automatically generates penetration scenarios using an association mining technique. Known intrusion detection approaches are classified into anomaly detection and misuse detection. The former uses statistical methods, feature selection, and neural networks to decide whether an intrusion has occurred; the latter uses conditional probability, expert systems, state transition analysis, and pattern matching. In many proposed intrusion detection algorithms, scenarios for unknown penetrations are created and updated by security experts. Our algorithm automatically generates penetration scenarios by applying an association mining technique to the state transition technique. Association mining discovers useful, previously unknown information in existing data. The proposed algorithm can automatically generate the penetration scenarios that security experts have produced until now, and it copes with intrusions more easily than existing intrusion detection algorithms. It also has the advantage of low maintenance cost.
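
As a rough illustration of the association mining idea mentioned in this abstract, the sketch below derives simple one-to-one rules from event sets using support and confidence. It is not the authors' algorithm, which combines association mining with state transition analysis; the function `pair_rules`, the event names, and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) rules between pairs of items."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # ignore duplicates inside one transaction
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical audit sessions, each a set of observed events.
sessions = [
    ["login", "read_passwd", "su_root"],
    ["login", "read_passwd", "su_root"],
    ["login", "list_dir"],
    ["login", "read_passwd"],
]
rules = pair_rules(sessions, min_support=0.5, min_confidence=0.9)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

A rule such as `su_root -> read_passwd` with high confidence is the kind of co-occurrence that could seed a candidate scenario, though turning rules into ordered state transitions is beyond this sketch.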

A Data Mining System for Supporting of Business Intelligence in e-Business (e-Business에서의 BI지원 데이타마이닝 시스템)

  • Lee, Jun-Wook;Baek, Ok-Hyun;Ryu, Keun-Ho
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.5
    • /
    • pp.489-500
    • /
    • 2002
  • As interest in business intelligence (BI) grows, data mining is increasingly used as its core technique. To support BI in an e-business environment, an integrated data mining system that includes various mining operations should integrate flexibly with a database system and provide an easy, efficient interface for implementing the marketing process in various business applications. In this paper, we implement the EC-DaMiner system to support business intelligence in the e-business area. The system can be integrated with a conventional database system through a standard interface. Business applications can use the MQL mining query language to discover rules, the mining results are modeled in a marketing database, and the EC-DaMiner system makes the business marketing process easier to implement.

Discovering Relationships between Skin Type and Life Style Using Data Mining Techniques: A Case Study of Korea

  • Kim, Taeheung;Ha, Jihyun;Lee, Jong-Seok;Oh, Younhak;Cho, Yong Ju
    • Industrial Engineering and Management Systems
    • /
    • v.15 no.1
    • /
    • pp.110-121
    • /
    • 2016
  • With the growing interest in skincare and maintenance, there are increasing numbers of studies on the classification of skin type and the factors influencing each type. This study presents a novel methodology by using data mining, for the determination of the relationships between skin type, lifestyle, and patterns of cosmetic utilization. Eight skin-specific factors, which are moisture, sebum in U-zone (both cheeks), sebum in T-zone (forehead, nose, and chin), pore, melanin, wrinkle, acne, hemoglobin, were measured in 1,246 subjects living in South Korea, in conjunction with a questionnaire survey analyzing their lifestyles and pattern of cosmetic utilization. Using various multivariate statistical methods and data mining techniques, we classified the skin types based on the skin-specific values, determined the relationship between skin type and lifestyle, and accordingly sorted the subjects into clusters. Logistic regression analysis revealed gender-related differences in the skin; therefore, separate analyses were performed for males and females. Using the Gaussian Mixture Modeling (GMM) technique, we classified the subjects based on skin type (two male and four female). Using the ANOVA and decision tree techniques, we attempted to characterize the relationship between each skin type and the lifestyles of the subjects. Menstruation, eating habits, stress, and smoking were identified as the major factors affecting the skin.

Adaptive Data Mining Model using Fuzzy Performance Measures (퍼지 성능 측정자를 이용한 적응 데이터 마이닝 모델)

  • Rhee, Hyun-Sook
    • The KIPS Transactions:PartB
    • /
    • v.13B no.5 s.108
    • /
    • pp.541-546
    • /
    • 2006
  • Data mining is the process of finding hidden patterns inside a large data set. Cluster analysis is a popular data mining technique. It is a fundamental process of data analysis and has played an important role in solving many problems in pattern recognition and image processing. If fuzzy cluster analysis is to make a significant contribution to engineering applications, much more attention must be paid to the fundamental decision on the number of clusters in the data. This decision is related to the cluster validity problem, that is, how well the clustering has identified the structure that is present in the data. In this paper, we design an adaptive data mining model using fuzzy performance measures. It discovers clusters through an unsupervised neural network model based on a fuzzy objective function and evaluates the clustering results with a fuzzy performance measure. We also present experimental results on newsgroup data, which show that the proposed model can be used as a document classifier.

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games (데이터마이닝을 활용한 한국프로야구 승패예측모형 수립에 관한 연구)

  • Oh, Younhak;Kim, Han;Yun, Jaesub;Lee, Jong-Seok
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.40 no.1
    • /
    • pp.8-17
    • /
    • 2014
  • In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. The historical data on players and teams was obtained from the official materials provided on the KBO website. From the collected raw data, we prepared two additional datasets, in ratio and binary format respectively. The ratio dataset was generated by dividing the away team's records by the corresponding home team's records, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets × 7 techniques) prediction scenarios, the most accurate model was obtained with the random forest technique on the binary dataset, with a prediction accuracy of 84.14%. We also observed that the ratio and binary datasets helped to build better prediction models than the raw data. From the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and bases on balls (four balls) are important winning factors of a game. This research is distinct from existing studies in that we used three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
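
The ratio and binary datasets described in this abstract can be sketched as follows. The statistic names and values are made up, the helper names are hypothetical, and the direction of the binary comparison (1 when the away value exceeds the home value) is an assumption, since the abstract only says the values are compared.

```python
from itertools import product


def ratio_features(away, home):
    """Divide each away-team record by the matching home-team record."""
    # Assumes every home value is nonzero.
    return {name: away[name] / home[name] for name in away}


def binary_features(away, home):
    """Encode each record as 1 if the away value is larger, else 0 (assumed direction)."""
    return {name: int(away[name] > home[name]) for name in away}


# Hypothetical raw records for one game.
away = {"batting_avg": 0.280, "era": 4.50}
home = {"batting_avg": 0.250, "era": 3.00}

ratio = ratio_features(away, home)
binary = binary_features(away, home)
print(ratio)
print(binary)

# The 21 prediction scenarios are every dataset paired with every classifier.
datasets = ["raw", "ratio", "binary"]
techniques = [
    "decision tree", "random forest", "logistic regression", "neural network",
    "support vector machine", "linear discriminant analysis",
    "quadratic discriminant analysis",
]
scenarios = list(product(datasets, techniques))
print(len(scenarios))
```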

A Study on Utilization of Korea Science Citation Database(KSCD) Based on Data Mining Techniques (데이터마이닝 기술을 이용한 한국과학기술인용색인DB 활용 방안 연구)

  • Park, Jong-Hyun;Choi, Seon-Heui;Kim, Byung-Kyu
    • Journal of Information Management
    • /
    • v.43 no.4
    • /
    • pp.191-210
    • /
    • 2012
  • Scholarly science citation data is typically large in volume and varied in kind, and its volume keeps growing. The data therefore needs to be stored and managed efficiently, and the Korea Institute of Science and Technology Information (KISTI) has developed the Korea Science Citation Database (KSCD), which manages and serves a very large volume of Korean science and technology information, including citation data. However, the current KSCD-based services are not sufficient for its various users, so offering a wider variety of services using KSCD is an important issue. For example, a user who searches for articles written by a specific author may want to find not only the articles cited by that author but also articles that study similar topics. It is not always easy to provide such services with citation data. This paper therefore surveys studies on services that use citation data in order to find approaches for better utilizing KSCD. In particular, it considers data mining techniques, because data mining is one of the main techniques for extracting semantic information from big data, and it discusses methods for utilizing the large volume of KSCD data based on data mining.

Anomaly Detection Scheme Using Data Mining Methods (데이터마이닝 기법을 이용한 비정상행위 탐지 방법 연구)

  • 박광진;유황빈
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.13 no.2
    • /
    • pp.99-106
    • /
    • 2003
  • Intrusions pose a serious security risk in a network environment. To detect intrusions effectively, many researchers have developed data mining frameworks for constructing intrusion detection modules. Traditional anomaly detection techniques focus on detecting anomalies in new data after training on normal data. Detecting anomalous behavior requires a precise normal pattern, and such training data is typically expensive to produce. Understanding the characteristics of network data is therefore essential. In this paper, we propose using clustering and association rules as the basis for guiding anomaly detection. By applying entropy to filter noisy data, we present a technique for detecting anomalies without training on normal data. We also present dynamic transactions for generating detection patterns more effectively.

Spatiotemporal Pattern Mining Technique for Location-Based Service System

  • Vu, Nhan Thi Hong;Lee, Jun-Wook;Ryu, Keun-Ho
    • ETRI Journal
    • /
    • v.30 no.3
    • /
    • pp.421-431
    • /
    • 2008
  • In this paper, we offer a new technique to discover frequent spatiotemporal patterns from a moving object database. Though the search space for spatiotemporal knowledge is extremely challenging, imposing spatial and timing constraints on moving sequences makes the computation feasible. The proposed technique includes two algorithms, AllMOP and MaxMOP, to find all frequent patterns and maximal patterns, respectively. In addition, to support the service provider in sending information to a user in a push-driven manner, we propose a rule-based location prediction technique to predict the future location of the user. The idea is to employ the algorithm AllMOP to discover the frequent movement patterns in the user's historical movements, from which frequent movement rules are generated. These rules are then used to estimate the future location of the user. The performance is assessed with respect to precision and recall. The proposed techniques could be quite efficiently applied in a location-based service (LBS) system in which diverse types of data are integrated to support a variety of LBSs.
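
The rule-based location prediction step in this abstract can be illustrated with a much simpler stand-in: count consecutive location transitions in past trajectories, keep the frequent ones as rules, and predict the most frequent successor. This is not AllMOP or MaxMOP and ignores the spatial and timing constraints; the function names, locations, and threshold are hypothetical.

```python
from collections import Counter, defaultdict


def movement_rules(trajectories, min_support):
    """Keep transitions (src -> dst) seen at least min_support times."""
    transitions = Counter()
    for path in trajectories:
        transitions.update(zip(path, path[1:]))  # consecutive location pairs
    rules = defaultdict(dict)
    for (src, dst), count in transitions.items():
        if count >= min_support:
            rules[src][dst] = count
    return rules


def predict_next(rules, current):
    """Return the most frequent successor of `current`, or None if no rule applies."""
    candidates = rules.get(current)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical movement history of one user.
history = [
    ["home", "cafe", "office"],
    ["home", "cafe", "office"],
    ["home", "gym"],
    ["cafe", "office"],
]
rules = movement_rules(history, min_support=2)
print(predict_next(rules, "home"))
print(predict_next(rules, "gym"))
```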

Mining Frequent Itemsets using Time Unit Grouping (시간 단위 그룹핑을 이용한 빈발 아이템셋 마이닝)

  • Hwang, Jeong Hee
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.6
    • /
    • pp.647-653
    • /
    • 2022
  • Data mining is a technique that discovers knowledge, such as relationships and patterns among data, by exploring and analyzing the data. Data that occur in the real world include a temporal attribute. Temporal data mining, which seeks useful knowledge in data with temporal properties, can be used effectively for predictive judgments about the future. In this paper, we propose an algorithm that uses time-unit grouping to partition the database into regular time period units and discover frequent pattern itemsets within each time unit. The proposed algorithm organizes the transactions and items included in a time unit into a matrix and discovers the frequent items in that time unit through grouping. In the performance evaluation experiments, the execution time was 1.2 times that of the existing algorithm, but more than twice as many frequent pattern itemsets were discovered.
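
A minimal sketch of the time-unit idea in this abstract: transactions are first grouped by time unit, and itemsets are then counted separately inside each unit. The paper organizes each unit as a matrix; this sketch uses plain counters instead, and the function name, the `unit_of` mapping, and the sample data are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_itemsets_by_unit(transactions, unit_of, min_count, max_size=2):
    """Group (timestamp, items) pairs by time unit and count itemsets per unit."""
    groups = defaultdict(list)
    for timestamp, items in transactions:
        groups[unit_of(timestamp)].append(frozenset(items))

    result = {}
    for unit, baskets in groups.items():
        counts = Counter()
        for basket in baskets:
            # Enumerate itemsets of size 1..max_size in a canonical order.
            for size in range(1, max_size + 1):
                counts.update(combinations(sorted(basket), size))
        result[unit] = {s: c for s, c in counts.items() if c >= min_count}
    return result


# Hypothetical transactions: (timestamp, purchased items).
transactions = [
    (0, ["milk", "bread"]),
    (5, ["milk", "bread", "eggs"]),
    (12, ["tea"]),
    (15, ["tea", "milk"]),
]
# Assumed time unit: blocks of 10 timestamp ticks.
frequent = frequent_itemsets_by_unit(transactions, lambda t: t // 10, min_count=2)
for unit, itemsets in sorted(frequent.items()):
    print(unit, itemsets)
```

Here `milk` is frequent in the first unit but not in the second, a pattern that a single count over the whole database would blur.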

Research of Patent Technology Trends in Textile Materials: Text Mining Methodology Using DETM & STM (섬유소재 분야 특허 기술 동향 분석: DETM & STM 텍스트마이닝 방법론 활용)

  • Lee, Hyun Sang;Jo, Bo Geun;Oh, Se Hwan;Ha, Sung Ho
    • The Journal of Information Systems
    • /
    • v.30 no.3
    • /
    • pp.201-216
    • /
    • 2021
  • Purpose: The purpose of this study is to analyze patent technology trends in textile materials using a text mining methodology based on the Dynamic Embedded Topic Model (DETM) and the Structural Topic Model (STM). By identifying technology trends, the study is expected to have a positive impact on revitalizing and developing the textile materials industry. Design/methodology/approach: The data used in this study comprise 866 domestic patent texts on textile materials from 1974 to 2020. To analyze technology trends from various aspects, the Dynamic Embedded Topic Model and the Structural Topic Model were used. The word embedding technique used in DETM is GloVe. For stable learning of the topic model, amortized variational inference was performed based on a recurrent neural network. Findings: The analysis found that 'manufacture' topics had the largest share among the six topics. Keyword trend analysis found that natural and nanotechnology keywords have recently been attracting attention. The metadata analysis showed that manufacture technologies have a high probability of patent registration over the entire time series, but the results for recent years showed an increasing trend in elasticity and safety technologies.