• Title/Summary/Keyword: Knowledge-Based Data Mining

Second-Order Learning for Complex Forecasting Tasks: Case Study of Video-On-Demand (복잡한 예측문제에 대한 이차학습방법 : Video-On-Demand에 대한 사례연구)

  • 김형관;주종형
    • Journal of Intelligence and Information Systems / v.3 no.1 / pp.31-45 / 1997
  • To date, research on data mining has focused primarily on individual techniques to support knowledge discovery. However, the integration of elementary learning techniques offers a promising strategy for challenging applications such as forecasting nonlinear processes. This paper explores the utility of an integrated approach which utilizes a second-order learning process. The approach is compared against individual techniques relating to a neural network, case based reasoning, and induction. In the interest of concreteness, the concepts are presented through a case study involving the prediction of network traffic for video-on-demand.

A Web-Based Domain Ontology Construction Modelling and Application in the Wetland Domain

  • Xing, Jun;Han, Min
    • Journal of Korea Multimedia Society / v.10 no.6 / pp.754-759 / 2007
  • Building ontologies from Web resources can not only significantly shorten the ontology construction period but also enhance the quality of the ontology. Remarkable progress has been achieved in this regard, but existing methods encounter similar difficulties, such as Web data extraction and knowledge acquisition. This paper studies the characteristics of ontology construction data, including dynamics, largeness, variation, and openness, as well as the fundamental issue of ontology construction: a formalized representation method. The key technologies used in ontology construction, and the difficulties with it, are then summarized. A software model, OntoMaker (Ontology Maker), is designed. The model is innovative in two regards: (1) improved generality: a meta learning machine dynamically picks appropriate ontology learning methodologies for data of different domains, thus optimizing the results; (2) the merged processing of (semi-)structured and unstructured data. In addition, as wetland researchers know, information sharing is vital to wetland exploitation and protection, and wetland ontology construction is the basic task for information sharing. OntoMaker constructs the wetland ontologies, and the model in this work can also be applied to other environmental domains.

Operational Big Data Analytics platform for Smart Factory (스마트팩토리를 위한 운영빅데이터 분석 플랫폼)

  • Bae, Hyerim;Park, Sanghyuck;Choi, Yulim;Joo, Byeongjun;Sutrisnowati, Riska Asriana;Pulshashi, Iq Reviessay;Putra, Ahmad Dzulfikar Adi;Adi, Taufik Nur;Lee, Sanghwa;Won, Seokrae
    • The Journal of Bigdata / v.1 no.2 / pp.9-19 / 2016
  • Since ICT convergence became a major issue, the German government has pursued the 'Industry 4.0' policy, which triggered the convergence of ICT with manufacturing. This trend is now gaining momentum. Given these facts, we can expect a great leap toward high quality at low cost. Recently, the Korean government has also been implementing the 'Manufacturing 3.0' policy to upgrade the Korean manufacturing industry, accelerated by many related technologies. In this paper, we developed a custom-made operational big data analysis platform for implementing operational intelligence to improve industrial capability. Our platform is designed on the Spring framework and the web. In addition, the HDFS and Spark architectures help our system analyze massive field data, with streamed data processed by a process mining algorithm. Knowledge extracted from the data will support the enhancement of manufacturing performance.

Analysis of Consulting Research Trends Using Topic Modeling (토픽 모델링을 활용한 컨설팅 연구동향 분석)

  • Kim, Min Kwan;Lee, Yong;Han, Chang Hee
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.46-54 / 2017
  • 'Consulting', which is the main research topic of the knowledge service industry, is a field of study that is essential for the growth and development of companies and proliferation to specialized fields. However, it is difficult to grasp the current status of international research related to consulting, in particular which topics are being studied and what the latest research topics are. The purpose of this study is to analyze trends in academic research related to 'consulting' by applying quantitative methods such as topic modeling and statistical analysis. We collected bibliographic data related to consulting from Elsevier's Scopus DB, a representative academic database, and conducted a quantitative analysis of 15,888 documents. Specifically, trends in the number of articles published in major countries including Korea, author keyword trends, and research topic trends were compared by country and year. This study is significant in that it presents the results of a quantitative analysis based on bibliographic data from an academic DB in order to analyze research trends related to consulting. It is especially meaningful that the traditional frequency-based quantitative bibliographic analysis method and a text mining (topic modeling) technique are used together. The results of this study can be used as a tool to guide the direction of research in the consulting field. The trend analysis is expected to help predict promising fields, changes, and trends in research related to the consulting industry.

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, driver training programs, and so on. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationship between traffic accidents and related factors, including vehicle design, road design, weather, and driver behavior. Insights derived from these analyses can be used for accident prevention. Traffic accident data mining is the activity of finding useful knowledge about such relationships that is not well known and that users may be interested in. Many studies on mining accident data have been reported over the past two decades. Most of them focused mainly on predicting the risk of accidents using accident-related factors. Supervised learning methods such as decision trees, logistic regression, k-nearest neighbor, and neural networks are used for this prediction. However, the prediction models derived from these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups themselves are still not easy for humans to understand, so additional analytic work is necessary. Rule-based learning methods are adequate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and other features.
Rules are fairly easy for humans to understand, so they can help provide insight and comprehensible results. Association rule learning methods and subgroup discovery methods are representative rule-based learning methods for descriptive tasks. These two kinds of algorithms have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns in a traffic accident dataset consisting of many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. The association rule learning method, which is an unsupervised learning method, searches for frequent item sets in the data and translates them into rules. In contrast, the subgroup discovery method is a kind of supervised learning method that discovers rules about user-specified concepts that satisfy a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features.
Under a supervised learning setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires a lot of effort to tune its parameters.
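
The contrast drawn in this abstract between frequency-driven association rules and target-driven subgroup discovery can be illustrated with a minimal sketch. The records, feature names, and values below are hypothetical and are not taken from the paper's dataset. Weighted relative accuracy (WRAcc) is used as the subgroup quality measure because it combines generality (coverage) with unusualness (deviation from the base rate); the abstract does not say which measure the authors used, so that choice is an assumption.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical toy accident records; names and values are illustrative only.
RECORDS: List[Record] = [
    {"weather": "rain", "road": "curve", "severity": "severe"},
    {"weather": "rain", "road": "curve", "severity": "severe"},
    {"weather": "rain", "road": "straight", "severity": "minor"},
    {"weather": "clear", "road": "curve", "severity": "minor"},
    {"weather": "clear", "road": "straight", "severity": "minor"},
    {"weather": "clear", "road": "straight", "severity": "severe"},
    {"weather": "rain", "road": "curve", "severity": "severe"},
    {"weather": "clear", "road": "straight", "severity": "minor"},
]


def matches(record: Record, condition: Record) -> bool:
    """True if the record has every feature=value pair in the condition."""
    return all(record.get(key) == value for key, value in condition.items())


def support(records: List[Record], condition: Record) -> float:
    """Fraction of records that satisfy the condition."""
    return sum(matches(r, condition) for r in records) / len(records)


def confidence(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_support = support(records, antecedent)
    if antecedent_support == 0:
        return 0.0
    return support(records, {**antecedent, **consequent}) / antecedent_support


def wracc(records: List[Record], condition: Record, target: Record) -> float:
    """Weighted relative accuracy: coverage * (P(target | condition) - P(target))."""
    coverage = support(records, condition)
    return coverage * (confidence(records, condition, target) - support(records, target))


rule_if = {"weather": "rain", "road": "curve"}
rule_then = {"severity": "severe"}
print(support(RECORDS, rule_if))                # coverage of the rule body
print(confidence(RECORDS, rule_if, rule_then))  # association-rule confidence
print(wracc(RECORDS, rule_if, rule_then))       # subgroup quality for the target
```

An association rule miner would enumerate all frequent item sets and filter them by support and confidence thresholds, whereas a subgroup discovery search would fix `rule_then` as the target and rank candidate conditions by a quality measure such as `wracc`.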

Detection of Protein Subcellular Localization based on Syntactic Dependency Paths (구문 의존 경로에 기반한 단백질의 세포 내 위치 인식)

  • Kim, Mi-Young
    • The KIPS Transactions:PartB / v.15B no.4 / pp.375-382 / 2008
  • A protein's subcellular localization is considered an essential part of the description of its associated biomolecular phenomena. As the volume of biomolecular reports has increased, there has been a great deal of research on text mining to detect protein subcellular localization information in documents. It has been argued that linguistic information, especially syntactic information, is useful for identifying the subcellular localizations of proteins of interest. However, previous systems for detecting protein subcellular localization information used only shallow syntactic parsers, and showed poor performance. Thus, there remains a need to use a full syntactic parser and to apply deep linguistic knowledge to the analysis of text for protein subcellular localization information. In addition, we have attempted to use semantic information from the WordNet thesaurus. To improve performance in detecting protein subcellular localization information, this paper proposes a three-step method based on a full syntactic dependency parser and WordNet thesaurus. In the first step, we constructed syntactic dependency paths from each protein to its location candidate, and then converted the syntactic dependency paths into dependency trees. In the second step, we retrieved root information of the syntactic dependency trees. In the final step, we extracted syn-semantic patterns of protein subtrees and location subtrees. From the root and subtree nodes, we extracted syntactic category and syntactic direction as syntactic information, and synset offset of the WordNet thesaurus as semantic information. According to the root information and syn-semantic patterns of subtrees from the training data, we extracted (protein, localization) pairs from the test sentences. Even with no biomolecular knowledge, our method showed reasonable performance in experimental results using Medline abstract data. 
Our proposed method gave an F-measure of 74.53% for training data and 58.90% for test data, significantly outperforming previous methods, by 12-25%.
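
The first step of the method, building a dependency path from a protein mention to a location candidate, can be sketched as a shortest-path search over a dependency tree. This is only an illustration under stated assumptions: the sentence, its hand-written parse, and the function name `dependency_path` are hypothetical, and the paper's own parser, tree conversion, and WordNet-based pattern extraction are not reproduced here.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical hand-written parse of "ProteinX localizes to the nucleus",
# given as (head, dependent) pairs. A real system would obtain these from
# a full syntactic dependency parser.
EDGES: List[Tuple[str, str]] = [
    ("localizes", "ProteinX"),
    ("localizes", "nucleus"),
    ("nucleus", "to"),
    ("nucleus", "the"),
]


def dependency_path(edges: List[Tuple[str, str]], source: str, target: str) -> List[str]:
    """Return the shortest token path between two words, ignoring edge direction."""
    graph: Dict[str, List[str]] = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return []  # no path: the two tokens are not connected in this parse


print(dependency_path(EDGES, "ProteinX", "nucleus"))
```

In this toy parse the path passes through the verb `localizes`, which would play the role of the root whose information the second step retrieves.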

Comparative analysis of Lecture Evaluation using Decision Tree: Ways to Improve University Classes after COVID-19

  • Bok-Ju Jung;Sang-Chul Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.4 / pp.197-208 / 2023
  • In this study, we examined how thinking about lecture evaluation changed before and after COVID-19. To this end, decision tree analysis, a data mining technique, was applied to lecture evaluation data for liberal arts and major classes conducted at A university before and after COVID-19. According to the results, the important factor for liberal arts classes changed from 'method' to 'content', while 'knowledge improvement' was an important factor for major classes both before and after COVID-19. In particular, 'Assignment' was found to be a common important factor after COVID-19 in the evaluation of liberal arts lectures, which indicates that professors will need competence in appropriate teaching methods during class, interaction with students, and feedback on assignments or test results. Based on these results, a plan to improve communication with students and to activate blended learning was suggested.

Data Mining based Forest Fires Prediction Models using Meteorological Data (기상 데이터를 이용한 데이터 마이닝 기반의 산불 예측 모델)

  • Kim, Sam-Keun;Ahn, Jae-Geun
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.8 / pp.521-529 / 2020
  • Forest fires are one of the most important environmental risks that have adverse effects on many aspects of life, such as the economy, environment, and health. Early detection, quick prediction, and rapid response to forest fires can play an essential role in saving property and life from forest fire risks. For the rapid discovery of forest fires, there is a method using meteorological data obtained from local sensors installed in each area by the Meteorological Agency. Meteorological conditions (e.g., temperature, wind) influence forest fires. This study evaluated a Data Mining (DM) approach to predict the burned area of forest fires. Five DM models, namely Stochastic Gradient Descent (SGD), Support Vector Machines (SVM), Decision Tree (DT), Random Forests (RF), and Deep Neural Network (DNN), and four feature selection setups (using spatial, temporal, and weather attributes) were tested on recent real-world data collected from the Gyeonggi-do area over the last five years. In the experiments, a DNN model using only meteorological data showed the best performance. The proposed model was more effective in predicting the burned area of small forest fires, which are more frequent. The knowledge derived from the proposed prediction model is particularly useful for improving firefighting resource management.

Construction of Management Performance Data-Mining System for CEO's Efficient/Effective Decision Making (CEO의 효율적/유효적 의사결정을 위한 경영성과 데이터마이닝 시스템의 구축)

  • 조성훈;안동규;김제홍
    • Journal of the Korea Society of Computer and Information / v.5 no.4 / pp.41-47 / 2000
  • In the modern dynamic management environment, there is growing recognition that information and knowledge management systems are essential for a CEO's efficient and effective decision making. As a key component for coping with this trend, we propose an IT (Information Technology)-based management performance data-mining system. The system measures management performance using both VA (Value-Added), which represents the stakeholders' point of view, and EVA (Economic Value-Added), which represents the shareholders' point of view. The relationship between management performance and 85 financial ratios is analyzed, and the important financial ratios are then extracted. In analyzing the relationship, we applied explanation-based GAs (Genetic Algorithms) that consider predictability, understandability (lucidity), and reasonability simultaneously. To demonstrate the performance of the system, we conducted a case study using financial data for the Korean automobile industry over the 16 years from 1981 to 1996, taken from the KISFAS (Korea Investors Services Financial Analysis System) database.

Probabilistic Neural Network for Prediction of Leakage in Water Distribution Network (급배수관망 누수예측을 위한 확률신경망)

  • Ha, Sung-Ryong;Ryu, Youn-Hee;Park, Sang-Young
    • Journal of Korean Society of Water and Wastewater / v.20 no.6 / pp.799-811 / 2006
  • As a way of replacing a reactive stance with a proactive one, risk-based management schemes have commonly been applied to enhance public satisfaction with water service by providing a more credible solution to the rehabilitation of pipes with a high potential risk of leaks. This study examined the feasibility of a simulation model for predicting the recurrence probability of pipe leaks. A probabilistic neural network (PNN) algorithm, a data mining technique, was applied to infer the leak recurrence probability of a water network. The PNN model could classify the leak level of each unit segment of the pipe network. The model was built with pipe material, diameter, C value, road width, pressure, and installation age as input variables and five classes of pipe leak probability as the output variable. The results indicated that it is important to pay closer attention to pipe segments with a leak record. When the hydraulic pipe pressure was increased to meet the required water demand at each node, the simulation indicated that about 6.9% of all pipes would additionally be classified into a higher class of recurrence risk than in the reference year. Consequently, applying the PNN model together with a database management system of the pipe network to manage a municipal water distribution network promises to enhance management efficiency by providing the essential knowledge for decisions on network rehabilitation.
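
A probabilistic neural network can be read as a Parzen-window classifier: each training pattern contributes a Gaussian kernel, the kernels are averaged per class, and the class with the highest density wins. The sketch below illustrates that idea only. The two features, the two class labels, the smoothing parameter `sigma`, and the equal class priors are assumptions made for the example; the study itself used six input variables and five leak-probability classes.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical training patterns with features scaled to [0, 1]
# (for example, normalized pressure and installation age).
TRAINING: Dict[str, List[Sequence[float]]] = {
    "low_risk": [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25)],
    "high_risk": [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7)],
}


def pnn_probabilities(
    x: Sequence[float],
    training: Dict[str, List[Sequence[float]]],
    sigma: float = 0.3,
) -> Dict[str, float]:
    """Return normalized class scores from per-class Gaussian kernel averages."""
    densities: Dict[str, float] = {}
    for label, patterns in training.items():
        # Pattern layer: one Gaussian kernel per stored training pattern.
        kernel_sum = 0.0
        for pattern in patterns:
            squared_distance = sum((a - b) ** 2 for a, b in zip(x, pattern))
            kernel_sum += math.exp(-squared_distance / (2.0 * sigma ** 2))
        # Summation layer: average kernel response for this class.
        densities[label] = kernel_sum / len(patterns)

    total = sum(densities.values())
    if total == 0.0:
        # All kernels underflowed; fall back to a uniform answer.
        return {label: 1.0 / len(densities) for label in densities}
    return {label: value / total for label, value in densities.items()}


def pnn_classify(x: Sequence[float], training: Dict[str, List[Sequence[float]]]) -> str:
    """Decision layer: pick the class with the largest normalized score."""
    probabilities = pnn_probabilities(x, training)
    return max(probabilities, key=probabilities.get)


print(pnn_classify((0.82, 0.85), TRAINING))
print(pnn_probabilities((0.5, 0.5), TRAINING))
```

The normalized scores are only comparable to class probabilities under the equal-prior assumption, and the choice of `sigma` strongly affects how sharply the classes are separated.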