• Title/Summary/Keyword: Knowledge-Based Data Mining

Data Mining Algorithm Based on Fuzzy Decision Tree for Pattern Classification (퍼지 결정트리를 이용한 패턴분류를 위한 데이터 마이닝 알고리즘)

  • Lee, Jung-Geun;Kim, Myeong-Won
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.11
    • /
    • pp.1314-1323
    • /
    • 1999
  • With the widespread use of computers, data have become easy to generate and collect, which creates a need to acquire useful knowledge from data automatically. In data mining, the acquired knowledge needs to be both accurate and comprehensible. In this paper, we propose an efficient algorithm for data mining that generates fuzzy rules based on a fuzzy decision tree. The fuzzy decision tree combines the comprehensibility of decision-tree methods such as ID3 and C4.5 with the reasoning and expressive power of fuzzy sets. In particular, fuzzy rules effectively classify patterns with non-axis-parallel decision boundaries, which are difficult to handle with methods that place decision boundaries parallel to the attribute axes. The algorithm first determines an appropriate set of membership functions for each attribute using histogram analysis. Given these membership functions, it then constructs a fuzzy decision tree in a manner similar to ID3 and C4.5. A genetic algorithm is also applied to tune the initial set of membership functions. Experiments on several benchmark data sets, including the IRIS data, the Wisconsin breast cancer data, and the credit screening data, show that the proposed method is more effective than other methods, including C4.5, in both performance and comprehensibility of rules.
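The tree-building step described in the abstract above can be illustrated with a minimal sketch. It assumes triangular membership functions and a membership-weighted entropy in the style of ID3; the abstract does not state the paper's exact formulas, so the function names and the entropy definition here are illustrative assumptions, and the histogram analysis and genetic tuning steps are not shown.

```python
import math

def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set (left, peak, right).
    Setting left == peak or peak == right gives a shoulder-shaped set."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_entropy(memberships, labels):
    """Class entropy in which each sample counts by its membership degree."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    entropy = 0.0
    for cls in set(labels):
        weight = sum(m for m, y in zip(memberships, labels) if y == cls)
        p = weight / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

def fuzzy_split_entropy(values, labels, fuzzy_sets):
    """Membership-weighted average entropy over one attribute's fuzzy sets.
    A lower value suggests the attribute is a better split candidate."""
    per_set = []
    for left, peak, right in fuzzy_sets:
        mu = [triangular(v, left, peak, right) for v in values]
        per_set.append((sum(mu), fuzzy_entropy(mu, labels)))
    grand_total = sum(weight for weight, _ in per_set)
    if grand_total == 0:
        return 0.0
    return sum(weight * h for weight, h in per_set) / grand_total
```

In an ID3-like loop, one would compute `fuzzy_split_entropy` for every attribute and branch on the attribute with the lowest value, creating one child per fuzzy set.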

Spatial Statistic Data Release Based on Differential Privacy

  • Cai, Sujin;Lyu, Xin;Ban, Duohan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.10
    • /
    • pp.5244-5259
    • /
    • 2019
  • With the continuous development of LBS (Location Based Service) applications, privacy protection has become an urgent problem. Differential privacy rests on rigorous mathematical theory and provides strong privacy guarantees under the assumption that the attacker has worst-case background knowledge; it has been applied to research directions such as data query, release, and mining. The difficulty lies in ensuring data availability while protecting privacy. Spatial multidimensional data are usually released by partitioning the domain into disjoint subsets and then generating a hierarchical index. Traditional data-dependent partition methods must allocate part of the privacy budget to the partitioning process and split that budget among all the steps, which is inefficient. To address these issues, a novel two-step partition algorithm is proposed. First, we partition the original dataset into fixed grids, inject noise, and synthesize a dataset according to the noisy counts. Second, we perform IH-Tree (Improved H-Tree) partitioning on the synthetic dataset and use the resulting partition keys to split the original dataset. The algorithm saves the privacy budget otherwise allocated to the partitioning process and obtains a more accurate release. The algorithm has been tested on three real-world datasets, and its accuracy has been compared with that of state-of-the-art algorithms. The experimental results show that the relative errors of range queries are considerably reduced, especially on the large-scale dataset.
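The first step of the abstract above (fixed grid, noise injection, synthetic dataset) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard Laplace mechanism with sensitivity 1 for cell counts, points with non-negative coordinates inside the grid, and hypothetical function names. The IH-Tree partition of the second step is not shown.

```python
import random

def grid_counts(points, cell_size, grid_width, grid_height):
    """Count points per cell of a fixed grid (rows indexed by y, columns by x)."""
    counts = [[0] * grid_width for _ in range(grid_height)]
    for x, y in points:
        col = min(int(x // cell_size), grid_width - 1)
        row = min(int(y // cell_size), grid_height - 1)
        counts[row][col] += 1
    return counts

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_grid_counts(counts, epsilon, rng):
    """Add Laplace noise with scale 1/epsilon (assumes count sensitivity 1)."""
    scale = 1.0 / epsilon
    return [[c + laplace_noise(scale, rng) for c in row] for row in counts]

def synthesize_points(noisy_counts, cell_size, rng):
    """Draw points uniformly inside each cell according to its rounded noisy count."""
    points = []
    for r, row in enumerate(noisy_counts):
        for c, value in enumerate(row):
            for _ in range(max(0, round(value))):
                points.append(((c + rng.random()) * cell_size,
                               (r + rng.random()) * cell_size))
    return points
```

Because the synthetic points depend on the original data only through the noisy counts, any partition computed on them consumes no additional privacy budget, which is the saving the abstract describes.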

Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model (텍스트 마이닝 기반의 그래프 모델을 이용한 미발견 공공 지식 추론)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.31 no.1
    • /
    • pp.231-250
    • /
    • 2014
  • Due to the recent development of Information and Communication Technologies (ICT), the number of research publications has increased exponentially. In response to this rapid growth, demand has risen for automated text processing methods that can deal with massive amounts of text data. Biomedical text mining, which discovers hidden biological meanings and treatments in the biomedical literature, has become a pivotal methodology that helps medical disciplines reduce time and cost. Many researchers have conducted literature-based discovery studies to generate new hypotheses. However, existing approaches require either an intensive manual process or a semi-automatic procedure to find and select biomedical entities. In addition, they are limited to showing one dimension, that is, the cause-and-effect relationship between two concepts. Thus, this study proposes a novel approach that discovers various relationships among source and target concepts and their intermediate concepts by expanding the intermediate concepts to multiple levels. The study provides distinct perspectives for literature-based discovery: it discovers meaningful relationships among concepts in the biomedical literature through graph-based path inference, and it generates feasible new hypotheses.
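The idea of expanding intermediate concepts to multiple levels can be sketched as a bounded path search over a concept graph. This is an illustrative sketch only: the adjacency-dictionary graph, the function name `find_paths`, and the depth-first enumeration are assumptions, not the procedure used in the paper.

```python
def find_paths(graph, source, target, max_intermediates):
    """List simple paths from source to target that pass through between one
    and max_intermediates intermediate concepts. Direct source-target links
    are skipped, since they represent already-known relationships."""
    paths = []

    def extend(path):
        intermediates = len(path) - 1  # path[0] is the source concept
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor in path:
                continue  # keep paths simple (no repeated concepts)
            if neighbor == target:
                if intermediates >= 1:
                    paths.append(path + [neighbor])
            elif intermediates < max_intermediates:
                extend(path + [neighbor])

    extend([source])
    return paths

# Hypothetical concept graph: A is the source, C the target.
example_graph = {"A": {"B1", "B2", "C"}, "B1": {"B2", "C"}, "B2": {"C"}}
one_level = find_paths(example_graph, "A", "C", 1)
two_level = find_paths(example_graph, "A", "C", 2)
```

With one level of intermediates the search corresponds to the classic source-intermediate-target pattern; raising `max_intermediates` admits longer chains such as A, B1, B2, C.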

Anomaly Detection Model Using THRE-KBANN (THRE-KBANN을 이용한 이상현상탐지모델)

  • Shim, Dong-Hee
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.38 no.5
    • /
    • pp.37-43
    • /
    • 2001
  • Since the Internet came into use everywhere, illegal intrusion into a host or network has become a critical factor in security. Although many anomaly detection models have been proposed that use statistical analysis, data mining, or genetic algorithms/programming to detect illegal intrusions, these models are deficient in detecting new types of intrusions. In this paper, THRE-KBANN (theory-refinement knowledge-based artificial neural network), which can learn continuously based on KBANN, is proposed as an anomaly detection model. The performance of this model is compared with that of a model based on data mining using experimental data. The ability of continual learning to detect new types of intrusions is also evaluated.

A Study on Designing Intelligent Military Decision Aiding System in a Network Computing Environment (네트웍 컴퓨팅 환경하에서의 지능형 군사적 의사결정시스템 구축에 관한 연구)

  • 김용효;박상찬
    • Journal of the military operations research society of Korea
    • /
    • v.24 no.1
    • /
    • pp.18-40
    • /
    • 1998
  • This paper aims to design an intelligent military decision aiding system in a network computing environment, focusing in particular on an intelligent analytic system that has data mining tools and an inference engine. Through this study, we concluded that the intelligent analytic system can aid military decision-making processes. Highlights of the proposed system are as follows: 1) decision-making time can be reduced by online, real-time analysis; 2) intelligent analysis of military decision problems in network computing environments is enabled; 3) WWW-based implementation models provide a standard user interface with seamless information sharing and integration capability, together with a knowledge repository.

Customer Relationship Management Techniques Based on Dynamic Customer Analysis Utilizing Data Mining (데이터마이닝을 활용한 동적인 고객분석에 따른 고객관계관리 기법)

  • 하성호;이재신
    • Journal of Intelligence and Information Systems
    • /
    • v.9 no.3
    • /
    • pp.23-47
    • /
    • 2003
  • Traditional studies of customer relationship management (CRM) generally focus on static CRM in a specific time frame. Static CRM and the customer behavior knowledge derived from it can help marketers redirect marketing resources for profit gain at that given point in time. However, as time passes, the static knowledge becomes obsolete. Therefore, CRM for an online retailer should be applied dynamically over time. Customer-based analysis should observe the past purchase behavior of customers to understand their current and likely future purchase patterns in consumer markets, and to divide a market into distinct subsets of customers, any of which may conceivably be selected as a market target to be reached with a distinct marketing mix. Although the concept of buying-behavior-based CRM was advanced several decades ago, very little application of dynamic CRM has been reported to date. In this paper, we propose a dynamic CRM model that uses data mining and a Monitoring Agent System (MAS) to extract longitudinal knowledge from customer data and to analyze customer behavior patterns over time for an Internet retailer. The proposed model includes an extensive analysis of the customer career path, which observes the segment shifts of each customer: prediction of customer careers, identification of the dominant career paths that most customers follow and their managerial implications, and the evolution of customer segments over time. Furthermore, we show that dynamic CRM can be useful for solving several managerial problems that any retailer may face.

Investigation of Trend in Virtual Reality-based Workplace Convergence Research: Using Pathfinder Network and Parallel Neighbor Clustering Methodology (가상현실 기반 업무공간 융복합 분야 연구 동향 분석 : 패스파인더 네트워크와 병렬 최근접 이웃 클러스터링 방법론 활용)

  • Ha, Jae Been;Kang, Ju Young
    • The Journal of Information Systems
    • /
    • v.31 no.2
    • /
    • pp.19-43
    • /
    • 2022
  • Purpose: Due to the COVID-19 pandemic, many companies are building virtual workplaces based on virtual reality technology. This study aims to identify trends in convergence research between virtual reality technology and the workplace and, on that basis, to suggest promising fields for the future. Design/methodology/approach: Bibliographic data for 12,250 research papers related to Virtual Reality (VR) and the workplace, published from 1982 to 2021, were collected from Scopus. The bibliographic data were analyzed using Text Mining, Pathfinder Network, Parallel Neighbor Clustering, Nearest Neighbor Centrality, and Triangle Betweenness Centrality. Through this analysis, the relationships between keywords in each period were identified, and network analysis and visualization were performed for virtual reality-based workplace research. Findings: This study is expected to reveal how the knowledge structure of the main keywords in virtual reality-based workplace convergence research has evolved, and to clarify the relationships between keywords, providing a basis for setting the direction of subsequent studies.

A Study of Knowledge Creating Organizational Memory (지식 창조적 조직메모리에 관한 연구)

  • 장재경
    • Journal of the Korean Society for information Management
    • /
    • v.15 no.3
    • /
    • pp.133-150
    • /
    • 1998
  • To support a new 'organizational knowledge centric knowledge management', this paper proposes a knowledge-creating organizational memory that represents knowledge creation in an organization as a dialectical circulation between domain knowledge and task knowledge, based on Yin-Yang theory. The paper defines two kinds of organizational knowledge, domain knowledge and task knowledge, and designs them in accordance with their lifecycle. The knowledge-creating organizational memory is designed as three knowledge components that circulate through the domain knowledge and the task knowledge, following an object-oriented methodology. Organizational knowledge is designed as a graph structure of (i) knowledge, (ii) relations between knowledge objects, and (iii) degrees of relation, which inherits legacy organizational knowledge such as data schemas, process models, and knowledge bases. This design of organizational knowledge can be applied to CBR (Case-Based Reasoning), one of the knowledge mining tools, to create new organizational knowledge.

An Investigation on Expanding Traditional Sequential Analysis Method by Considering the Reversion of Purchase Realization Order (구매의도 생성 순서와 구매실현 순서의 역전 현상을 감안한 확장된 순차분석 방법론)

  • Kim, Minseok;Kim, Namgyu
    • The Journal of Information Systems
    • /
    • v.22 no.3
    • /
    • pp.25-42
    • /
    • 2013
  • Recently, many kinds of information technology services have been created, and the volume of data flowing through them is increasing rapidly. The patterns in the data we deal with are also gradually becoming more diverse. As a result, demand for discovering meaningful knowledge and information through mining analyses such as linkage analysis, sequencing analysis, classification, and prediction has been increasing steadily. However, data mining analysis does not always solve business problems satisfactorily; one major cause of this limitation is that some analyzed data do not accurately reflect real-world phenomena. For example, even when the time gap between the purchases of two products is very short, traditional sequencing analysis records a clear precedence relationship between them. In the real world, however, the precedence relationship between two purchases made within a very short interval may not be well defined. Worse, the order in which the purchase intentions arose and the order in which the purchases were realized may be reversed. This study therefore proposes an expanded sequencing analysis methodology that reflects this situation. The methodology identifies purchases made within a very short time interval, for which the purchase order may not be important, and extends the analysis by including both the original and the reversed sequence. Because what counts as a very short time interval must itself be defined, an experiment on actual data was carried out to examine how the results vary with the interval.
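One way to read the extension described in the abstract above is sketched below: consecutive purchases are counted as ordered pairs, and when two purchases fall within a short interval the reversed pair is counted as well. The function name, the `short_gap` parameter, and the restriction to consecutive purchases are assumptions made for illustration; the abstract does not specify the paper's exact counting rule.

```python
from collections import Counter

def count_ordered_pairs(purchases, short_gap):
    """Count (earlier_item, later_item) pairs over consecutive purchases of one
    customer. purchases is a list of (timestamp, item) tuples. If the gap
    between two purchases is at most short_gap, the reversed pair is also
    counted, since the realized order may not match the intended order."""
    ordered = sorted(purchases)
    pairs = Counter()
    for (t1, first), (t2, second) in zip(ordered, ordered[1:]):
        if first == second:
            continue  # a repeat purchase carries no ordering information
        pairs[(first, second)] += 1
        if t2 - t1 <= short_gap:
            pairs[(second, first)] += 1
    return pairs

# Hypothetical basket: timestamps in minutes.
history = [(0, "milk"), (2, "bread"), (60, "eggs")]
extended = count_ordered_pairs(history, short_gap=5)
```

Varying `short_gap` and comparing the resulting pair counts mirrors the experiment the abstract mentions on how results change with the chosen interval.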

The Multi-Agent Simulation of Archaic State Formation (다중 에이전트 기반의 고대 국가 형성 시뮬레이션)

  • S. Kim;A. Lazar;R.G. Reynolds
    • Proceedings of the Korea Society for Simulation Conference
    • /
    • 2003.06a
    • /
    • pp.91-100
    • /
    • 2003
  • In this paper we investigate the role that warfare played in the formation of the network of alliances between sites that is associated with the formation of the state in the Valley of Oaxaca, Mexico. A model of state formation proposed by Marcus and Flannery (1996) is used as the basis for an agent-based simulation model. Agents reside in sites, and their actions are constrained by knowledge extracted from the Oaxaca Surface Archaeological Survey (Kowalewski 1989). The simulation is run with two different sets of constraint rules for the agents. The first set is based upon the raw data collected in the surface survey. It covers a total of 79 sites and represents a minimal level of warfare (raiding) in the Valley. The other set represents the generalization of these constraints to sites with similar locational characteristics. This set corresponds to 987 sites and represents a much more active role for warfare in the Valley. The rules were produced by a data mining technique, Decision Trees, guided by Genetic Algorithms. Simulations were run using the two different rule sets and compared with each other and with the archaeological data for the Valley. The results strongly suggest that warfare was a necessary process in the aggregation of resources needed to support the emergence of the state in the Valley.
