• Title/Abstract/Keywords: database mining

Search results: 572 items (processing time 0.025 seconds)

An Efficient Tree Structure Method for Mining Association Rules

  • 김창오;안광일;김성집;김재련
    • 대한산업공학회지 / Vol. 27, No. 1 / pp.30-36 / 2001
  • We present a new algorithm for mining association rules in large databases. Association rules describe relationships among items in the same transaction and provide useful information for marketing. Since the Apriori algorithm was introduced in 1994, many researchers have worked to improve it. However, the drawback of Apriori-based algorithms is that they scan the transaction database repeatedly. The proposed algorithm scans the database only twice. The first scan collects the frequent 1-itemsets. The second scan constructs a data structure, the Common-Item Tree, which stores information about frequent itemsets. To find all frequent itemsets, the algorithm scans the Common-Item Tree instead of the database. Because scanning the Common-Item Tree takes less time than scanning the database, the proposed algorithm is more efficient than Apriori-based algorithms.

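
The two-scan idea in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the Common-Item Tree itself, so this example substitutes a plain prefix tree; the names `build_prefix_tree` and `mine_tree` are hypothetical, and the subset enumeration is exponential in path length, so it is only suitable for toy data.

```python
from collections import Counter
from itertools import combinations


class Node:
    """Prefix-tree node: counts the transactions passing through it."""
    def __init__(self):
        self.count = 0
        self.children = {}


def build_prefix_tree(transactions, min_support):
    # Scan 1: count single items and keep the frequent 1-itemsets.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in item_counts.items() if c >= min_support}
    # Scan 2: insert each transaction's frequent items in a fixed order.
    root = Node()
    for t in transactions:
        node = root
        for item in sorted(set(t) & frequent):
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def mine_tree(root, min_support):
    # Walk the tree (not the database). A node's count minus its children's
    # counts is the number of transactions that end exactly at that node.
    support = Counter()
    stack = [(root, ())]
    while stack:
        node, path = stack.pop()
        ending_here = node.count - sum(c.count for c in node.children.values())
        if path and ending_here > 0:
            for size in range(1, len(path) + 1):
                for subset in combinations(path, size):
                    support[subset] += ending_here
        for item, child in node.children.items():
            stack.append((child, path + (item,)))
    return {s: c for s, c in support.items() if c >= min_support}


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    tree = build_prefix_tree(transactions, min_support=2)
    print(mine_tree(tree, min_support=2))
```

The database is read exactly twice in `build_prefix_tree`; all later work in `mine_tree` touches only the in-memory tree.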

An Algorithm for Sequential Sampling Method in Data Mining

  • 홍지명;김낙현;김성집
    • 산업경영시스템학회지 / Vol. 21, No. 45 / pp.101-112 / 1998
  • Data mining, also referred to as knowledge discovery in databases, is the nontrivial extraction of implicit, previously unknown, and potentially useful information (such as knowledge rules, constraints, and regularities) from data in databases. The discovered knowledge can be applied to information management, decision making, and many other applications. In this paper, a new data mining problem, discovering sequential patterns, is addressed: finding all sequential patterns using a sampling method. Because databases are growing exponentially and transaction databases are frequently updated, sampling offers a fast way to reduce time and cost while still extracting trends in customer behavior. The method analyzes only a fraction of the database but can in general produce highly accurate results. The relaxation factor and the sample size can be adjusted to improve accuracy while minimizing execution time. The superiority of the proposed algorithm is shown by comparing its accuracy and efficiency with those of the AprioriAll algorithm.

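
As a rough illustration of sampling with a relaxation factor, the sketch below mines only length-2 sequential patterns (item `a` occurring before item `b`) from a random sample. It assumes that the relaxation factor multiplies the minimum support to lower the threshold on the sample; the abstract does not define the factor precisely, and the function name `sample_sequential_pairs` is hypothetical.

```python
import random
from collections import Counter


def sample_sequential_pairs(sequences, min_support, sample_size,
                            relaxation=0.9, seed=0):
    """Return {(a, b): support} for ordered pairs frequent in a sample.

    min_support is a fraction of sequences; relaxation (assumed to lie in
    (0, 1]) lowers the threshold to compensate for sampling error.
    """
    rng = random.Random(seed)
    sample = rng.sample(sequences, min(sample_size, len(sequences)))
    threshold = min_support * relaxation
    counts = Counter()
    for seq in sample:
        # Collect each ordered pair once per sequence.
        pairs = set()
        for i, first in enumerate(seq):
            for second in seq[i + 1:]:
                pairs.add((first, second))
        counts.update(pairs)
    n = len(sample)
    return {p: c / n for p, c in counts.items() if c / n >= threshold}


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "c"], ["b", "a"], ["a", "b"]]
    print(sample_sequential_pairs(data, min_support=0.5, sample_size=4))
```

A smaller `relaxation` keeps more borderline patterns at the cost of more false candidates, which mirrors the accuracy and execution-time trade-off the abstract describes.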

I-Tree: A Frequent Patterns Mining Approach without Candidate Generation or Support Constraint

  • Tanbeer, Syed Khairuzzaman;Sarkar, Jehad;Jeong, Byeong-Soo;Lee, Young-Koo;Lee, Sung-Young
    • 한국정보처리학회: Conference Proceedings / 한국정보처리학회 2007 Spring Conference / pp.31-33 / 2007
  • Devising an efficient one-pass frequent pattern mining algorithm has been an active issue in data mining research in the recent past. Pattern growth algorithms such as FP-Growth, which have been found more efficient than candidate generation-and-test algorithms, still require two database scans. Moreover, the FP-Growth approach requires rebuilding the base tree when mining with different support counts. In this paper we propose an item-based tree, called I-Tree, that not only mines frequent patterns efficiently with a single database scan but also supports multiple mining passes with different support thresholds. The 'build-once-mine-many' property of I-Tree allows the tree to be constructed only once and mined several times with varying support count values.


An Algorithm for Cube-based Mining Association Rules and Application to Database Marketing

  • 한경록;김재련
    • 산업경영시스템학회지 / Vol. 23, No. 54 / pp.27-36 / 2000
  • Discovering association rules is an emerging research area whose goal is to extract significant patterns or interesting rules from large databases. Several algorithms for mining association rules have been applied to item-oriented sales transaction databases. Data warehouses and OLAP engines are expected to become widely available. OLAP and data mining are complementary; both are important parts of exploiting data. Our study shows that the data cube is an efficient structure for mining association rules, and OLAP databases are expected to be a major platform for data mining in the future. In this paper, we present an efficient and effective algorithm for mining association rules using a data cube. The algorithm can enhance the competitiveness of business organizations by providing rapid decision support and efficient database marketing through customer segmentation.


EMI database analysis focusing on relationship between density and mechanical properties of sedimentary rocks

  • Burkhardt, Michael;Kim, Eunhye;Nelson, Priscilla P.
    • Geomechanics and Engineering / Vol. 14, No. 5 / pp.491-498 / 2018
  • The Earth Mechanics Institute (EMI) was established at the Colorado School of Mines (CSM) in 1974 to develop innovations in rock mechanics research and education. During the last four decades, extensive rock mechanics research has been conducted at the EMI. Results from uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), point load index (PLI), punch penetration (PP), and many other types of tests have been recorded in a database that has been unexamined for research purposes. The EMI database includes over 20,000 tests from over 1,000 different projects, including mining and underground construction. Analysis of this database to identify relationships has been started, and preliminary results are reported here. Overall, statistically significant correlations are identified between bulk density and mechanical strength properties through UCS, BTS, PLI, and PP testing of sedimentary, igneous, and metamorphic rocks. In this paper, bulk density is considered as a surrogate metric that reflects both mineralogy and porosity. From this analysis, sedimentary rocks show the strongest correlation between UCS and bulk density, whereas metamorphic rocks exhibit the strongest correlation between UCS and PP. Data trends in the EMI database also reveal a linear relationship between UCS and BTS tests. For the singular case of rock coral, the database permits correlations between bulk density of the core versus the deposition depth and porosity. Analysis of the EMI database will continue and will provide additional insight into, and a more comprehensive understanding of, the variation and predictability of rock mechanical strength properties and density. This knowledge will contribute significantly toward increasingly safe and cost-effective geostructures and construction.

From The Discovery Challenge on Thrombosis Data

  • Takabayashi, Katsuhiko;Tsumoto, Shusaku
    • 한국지능정보시스템학회: Conference Proceedings / 한국지능정보시스템학회 2001, The Pacific Asian Conference on Intelligent Systems 2001 / pp.361-363 / 2001
  • Although data mining promises a new paradigm for discovering medical knowledge from a database, many problems must be solved before real application is feasible. We had the chance to provide a data set to be analyzed as a discovery challenge using various data mining techniques at the PKDD conference. As data providers, we evaluated and discussed the results and clarified the problems.


Self-Evolving Expert Systems based on Fuzzy Neural Network and RDB Inference Engine

  • Kim, Jin-Sung
    • 지능정보연구 / Vol. 9, No. 2 / pp.19-38 / 2003
  • In this research, we propose a mechanism for developing self-evolving expert systems (SEES) based on data mining (DM), fuzzy neural networks (FNN), and a relational database (RDB)-driven forward/backward inference engine. Most researchers have tried to develop a text-oriented knowledge base (KB) and inference engine (IE). However, this approach has limitations in 1) automatic rule extraction, 2) handling of ambiguity in knowledge, 3) expandability of the knowledge base, and 4) speed of inference. To overcome these limitations, knowledge engineers have tried to develop automatic knowledge extraction mechanisms, and as a result the adaptability of expert systems has improved. Nonetheless, they did not suggest a hybrid, generalized solution for developing self-evolving expert systems. For this purpose, we propose an automatic knowledge acquisition and composite inference mechanism based on DM, FNN, and an RDB-driven inference engine. The proposed mechanism has five advantages. First, it can extract and reduce domain-specific knowledge from an incomplete database by using data mining technology. Second, it can handle ambiguity in knowledge by using fuzzy membership functions. Third, it can construct a relational knowledge base and expand it without limit using an RDBMS (relational database management system) module. Fourth, the proposed hybrid data mining mechanism can reflect both association rule-based logical inference and complicated fuzzy relationships. Fifth, RDB-driven forward and backward inference is faster than traditional text-oriented inference.


Framework for False Alarm Pattern Analysis of Intrusion Detection System using Incremental Association Rule Mining

  • Chon Won Yang;Kim Eun Hee;Shin Moon Sun;Ryu Keun Ho
    • 대한원격탐사학회: Conference Proceedings / 대한원격탐사학회 2004, Proceedings of ISRS 2004 / pp.716-718 / 2004
  • False alarms in intrusion detection systems are divided into false positives and false negatives. False positives degrade the performance of an intrusion detection system, and false negatives degrade its efficiency. Recently, most studies have applied data mining techniques to the analysis of alert data. However, false alarm data not only increase the data volume but also change the patterns of alert data over time. Therefore, we need a tool that can analyze patterns whose characteristics change as we look for new patterns. In this paper, we focus on false positives and present a framework for analyzing false alarm patterns in alert data. We also apply incremental data mining techniques to analyze patterns of false alarms in alert data that grow over time. Finally, we achieve flexibility by using a dynamic support threshold, because the volume of alert data, as well as the number of false alarms it contains, increases irregularly.

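
One plausible reading of a dynamic support threshold over incrementally growing alert data is sketched below. It assumes the threshold is a fixed ratio of the alerts seen so far, so the absolute cutoff rises as batches arrive; the abstract does not give the actual rule, and the class name `IncrementalPatternCounter` and the `attribute=value` alert encoding are hypothetical.

```python
from collections import Counter
from itertools import combinations


class IncrementalPatternCounter:
    """Counts 1- and 2-item patterns over alert batches without rescanning."""

    def __init__(self, min_support_ratio):
        self.min_support_ratio = min_support_ratio
        self.total_alerts = 0
        self.counts = Counter()

    def add_batch(self, alerts):
        # Each alert is an iterable of "attribute=value" strings.
        for alert in alerts:
            items = sorted(set(alert))
            self.total_alerts += 1
            for size in (1, 2):
                for pattern in combinations(items, size):
                    self.counts[pattern] += 1

    def frequent_patterns(self):
        # Dynamic threshold: scales with the number of alerts seen so far.
        threshold = self.min_support_ratio * self.total_alerts
        return {p: c for p, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    counter = IncrementalPatternCounter(min_support_ratio=0.5)
    counter.add_batch([{"sig=A", "src=X"}, {"sig=A", "src=Y"}])
    print(counter.frequent_patterns())
    counter.add_batch([{"sig=B", "src=X"}, {"sig=B", "src=X"}])
    print(counter.frequent_patterns())
```

Because only counters are kept, each new batch is processed once, and patterns that were frequent early can drop out as the threshold grows with the data volume.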

Contribution to Improve Database Classification Algorithms for Multi-Database Mining

  • Miloudi, Salim;Rahal, Sid Ahmed;Khiat, Salim
    • Journal of Information Processing Systems / Vol. 14, No. 3 / pp.709-726 / 2018
  • Database classification is an important preprocessing step for multi-database mining (MDM). When a multi-branch company needs to explore its distributed data for decision making, it is imperative to classify these multiple databases into similar clusters before analyzing the data. To search for the best classification of a set of n databases, existing algorithms generate from 1 to $(n^2-n)/2$ candidate classifications. Although each candidate classification is included in the next one (i.e., clusters in the current classification are subsets of clusters in the next classification), existing algorithms generate each classification independently, that is, without reusing clusters from the previous classification. Consequently, existing algorithms are time consuming, especially when the number of candidate classifications increases. To overcome this problem, we propose an efficient approach that represents the problem of classifying multiple databases as a problem of identifying the connected components of an undirected weighted graph. Theoretical analysis and experiments on public databases confirm the efficiency of our algorithm relative to existing work and show that it overcomes the increase in execution time.
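
The connected-components formulation can be sketched as follows. This is an illustrative reading and not the authors' algorithm: it assumes pairwise similarity weights between databases are given, treats each distinct weight as a threshold, and uses a union-find structure so that each candidate classification reuses the clusters of the previous one. The function name `candidate_classifications` is hypothetical.

```python
def candidate_classifications(n, similarities):
    """Return [(threshold, clusters)] for each distinct weight, descending.

    similarities maps a pair (i, j) of database indices to a weight.
    Clusters at a threshold are the connected components of the graph
    that keeps only edges with weight >= threshold.
    """
    parent = list(range(n))

    def find(x):
        # Find the component representative, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(similarities.items(), key=lambda e: -e[1])
    results = []
    i = 0
    while i < len(edges):
        weight = edges[i][1]
        # Merge along all edges of the current weight; earlier merges persist.
        while i < len(edges) and edges[i][1] == weight:
            (a, b), _ = edges[i]
            parent[find(a)] = find(b)
            i += 1
        groups = {}
        for node in range(n):
            groups.setdefault(find(node), []).append(node)
        results.append((weight, sorted(groups.values())))
    return results


if __name__ == "__main__":
    sims = {(0, 1): 0.9, (2, 3): 0.8, (1, 2): 0.4,
            (0, 2): 0.4, (0, 3): 0.1, (1, 3): 0.1}
    for threshold, clusters in candidate_classifications(4, sims):
        print(threshold, clusters)
```

With n databases there are at most $(n^2-n)/2$ distinct edge weights, which matches the upper bound on candidate classifications stated in the abstract.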

Data Mining with Constructing Database and Researching Trend Investigation Related with the Field of Nonlinear Problem

  • Niimi, Ayahiko
    • 한국지능시스템학회: Conference Proceedings / 한국퍼지및지능시스템학회 2003, ISIS 2003 / pp.292-295 / 2003
  • In this paper, we propose an approach that uses data mining to construct a bibliographic information database, extract fields of research, and investigate their research trends. To apply our approach to the IEICE Technical Report (nonlinear problem society), we constructed the database from its reports, analyzed keywords using frequency analysis and association analysis, and discussed the results. We were able to extract some fields of research from the results.
