• Title/Summary/Keyword: Tree mining

A Study on the Stereotype of ICT SMEs' R&D: Empirical Evidence from Korea (ICT 중소기업 R&D의 스테레오타입에 대한 연구 : 한국의 사례를 중심으로)

  • Jun, Seung-pyo;Choi, San;Jung, JaeOong
    • Journal of Korea Technology Innovation Society / v.20 no.2 / pp.334-367 / 2017
  • The ICT industry has been the main driver of Korea's economy thanks to its international competitiveness, and it is expected to be the growth engine that revitalizes the currently depressed economy. A broad range of perspectives and opinions on the industry exist in Korea and overseas. Some of these are stereotypes, not all of which are based on objective evidence. Stereotypes are widely held, fixed opinions about a specific group and do not necessarily carry negative connotations. However, they should not be taken lightly because they can substantially affect the decision-making process. This study therefore reviews the stereotypes of the ICT industry and identifies objective, relative stereotypes. A decision-tree analysis was conducted on survey results from 3,300 small and medium-sized enterprises (SMEs) to identify the characteristics that distinguish Korean ICT companies from other technology companies. The decision-tree analysis, a machine-learning-based data mining method, considered a total of 291 variables across 10 subject areas, such as corporate business in general, technology development activities, and the organization and people involved in technology development. By identifying the variables that distinguish ICT companies from other technology companies, the study derived a list of objective stereotypes of ICT companies. The findings are as follows. First, the companies need technology policies that help with R&D planning and market penetration. Second, policies must better support companies working to sell new products or explore new business. Third, the companies need policies that support secure protection of development outcomes and proper management of IP rights. Fourth, the administrative procedures related to government support for ICT companies' R&D projects must be simplified.
It is hoped that the outcome of this study will provide meaningful guidance in the establishment, implementation, and evaluation of technology policies for ICT SMEs, particularly for policymakers and researchers in the government agencies that determine R&D policies for ICT SMEs.
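
The abstract does not say which decision-tree algorithm was used. As a minimal sketch, assuming a CART-style tree that scores splits by Gini impurity over binary (yes/no) survey variables, the core step of choosing the single variable that best separates ICT firms from other technology firms could look like this. The feature names and answers below are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(rows, labels):
    """Find the binary feature whose split gives the lowest weighted Gini impurity.

    rows: list of 0/1 feature vectors (one per firm); labels: class of each firm.
    Returns (feature_index, weighted_impurity); feature_index is None if no
    split improves on the unsplit node.
    """
    n = len(labels)
    best_index, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[j] == 0]
        right = [y for x, y in zip(rows, labels) if x[j] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_index, best_score = j, score
    return best_index, best_score


# Hypothetical survey answers: [holds_software_ip, exports_abroad]
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = ["ICT", "ICT", "other", "other"]
print(best_binary_split(rows, labels))  # (0, 0.0): feature 0 separates the classes
```

A full tree repeats this search recursively on each branch; with 291 variables the same loop simply runs over more columns.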

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.29-45 / 2015
  • Response modeling is a well-known research issue for those seeking better performance in predicting customers' responses to marketing promotions. A customer response model can reduce marketing costs by identifying prospective customers in a very large customer database and predicting the purchasing intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary costs. In addition, the big data environment has accelerated the development of response models with data mining techniques such as CBR, neural networks, and support vector machines. CBR is one of the most widely used tools in business because it is known to be simple and robust when applied to response modeling. Although CBR has not shown high performance compared with other machine learning techniques, it remains an attractive technique for business data mining applications. Thus, many studies have tried to improve CBR and apply it to business data mining, using enhanced algorithms or the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by the marketing department, and used a genetic algorithm to optimize k for the k-nearest neighbor method in order to improve the performance of the integrated model. Hong and Park (2009) noted that an approach integrating CBR with logit, neural networks, and Support Vector Machine (SVM) predicted customers' responses to marketing promotions better than each individual data mining model (logit, neural networks, and SVM). This paper presents an approach to predicting customers' responses to marketing promotions with Case Based Reasoning. The proposed model applies a different weight to each feature.
We fitted a logit model to a database containing promotion and purchasing data for bath soap, and then used its coefficients to assign the feature weights in CBR. We empirically compared the performance of the proposed weighted CBR model with neural networks and a pure CBR model, and found that the proposed weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building data mining models to classify real data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably smaller or larger than the number of instances in the other classes. Classification models such as response models have difficulty learning patterns from such data because they tend to ignore the minority class while classifying the majority class correctly. Sampling is one of the most representative approaches to resolving the problems caused by imbalanced data distributions, and it can be categorized into undersampling and oversampling. However, CBR is not sensitive to the data distribution because, unlike machine learning algorithms, it does not learn from the data. In this study, we investigated the robustness of the proposed model while changing the ratio of customers who responded to the promotion program to those who did not, because in the real world responders are always a small fraction of nonresponders. We simulated the proposed model 100 times with different ratios of responders to nonresponders to validate its robustness under imbalanced data distributions. Finally, we found that the proposed CBR model outperformed the compared models on the imbalanced data sets.
Our study is expected to improve the performance of CBR-based response models for promotion programs under the imbalanced data distributions found in the real world.
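
As a minimal sketch of weighted CBR, assuming retrieval by weighted Euclidean distance with k-nearest-neighbor voting, and assuming the feature weights are the absolute values of the logit coefficients (the abstract says only that the coefficients were used as weights, so this transformation is an assumption), case retrieval could look like this. The coefficient values and cases are hypothetical.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: features with larger weights count more."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_cbr_predict(case_base, query, weights, k=3):
    """Retrieve the k most similar past cases and return their majority label.

    case_base: list of (feature_vector, label) pairs, e.g. label 1 = responded.
    """
    nearest = sorted(case_base, key=lambda case: weighted_distance(case[0], query, weights))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical logit coefficients; their absolute values become feature weights.
logit_coefs = [1.8, -0.2]
weights = [abs(c) for c in logit_coefs]

cases = [([1.0, 3.0], 1), ([0.0, 0.5], 0)]
query = [1.0, 0.0]
print(weighted_cbr_predict(cases, query, weights, k=1))     # 1 (weighted CBR)
print(weighted_cbr_predict(cases, query, [1.0, 1.0], k=1))  # 0 (pure CBR, equal weights)
```

With equal weights the nonresponder case is closest, while the logit-derived weights let the first feature dominate and retrieve the responder. Ahn and Kim (2008) tuned k with a genetic algorithm; here k is simply a parameter.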

Evaluation for Rehabilitation Countermeasures of Coal-mined Spoils and Denuded Lands (폐탄광지(廢炭鑛地)의 산림훼손지복구(山林毁損地復舊) 및 폐석유실방지대책(廢石流失防止對策)에 관한 연구(硏究))

  • Woo, Bo-Myeong
    • Journal of the Korean Society of Environmental Restoration Technology / v.3 no.2 / pp.24-34 / 2000
  • The rehabilitation and revegetation of abandoned coal-mine lands is a very important national environmental restoration project, both for rehabilitating and revegetating forest lands denuded by coal mining and for restoring the disturbed natural environment and controlling various pollutants. In Korea, because a large number of coal mines were developed to meet the heavy demand for coal as a major energy source during the development period, forest lands denuded by coal mining were distributed throughout the country. Because no effective rehabilitation and revegetation work was carried out on these lands, most of them remained damaged. In 1990, the area of abandoned coal-mine lands requiring rehabilitation and revegetation was about 1,437.1 ha. Over the ten years from 1990 to 1999, about 1,081.8 ha of this area was rehabilitated and revegetated, and about 33.0 ha was planned for rehabilitation in 2000. The remaining area of abandoned coal-mine lands will therefore be about 322.3 ha after 2000. In principle, mine owners must carry out rehabilitation and revegetation work on their abandoned mine lands themselves after closing a mine. However, most mine owners were in financial difficulty after closing their mines, so this principle did not achieve the desired effect. To solve this problem, since 1995 the Coal Industry Promotion Board (CIPB) has carried out rehabilitation and revegetation work on abandoned coal-mine lands with government funds, obtaining good results in terms of construction area. However, because "conventional erosion control measures and techniques" were applied to the rehabilitation and revegetation of abandoned coal-mine lands, the results and effects of the work executed have not been successful.
Therefore, unique measures and techniques for the rehabilitation and revegetation of abandoned coal-mine lands will have to be developed, especially new techniques for soil dressing and soil covering, mechanized seed spraying and hydro-seeding with seed-fertilizer-soil materials, and the use of new materials for tree planting and seedling measures.
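
The remaining area quoted above follows from subtracting the reported figures (all in hectares):

```latex
\begin{aligned}
A_{\text{remaining}} &= A_{1990} - A_{\text{rehab},\,1990\text{-}1999} - A_{\text{planned},\,2000} \\
&= 1437.1 - 1081.8 - 33.0 = 322.3\ \text{ha}
\end{aligned}
```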


Data-driven Co-Design Process for New Product Development: A Case Study on Smart Heating Jacket (신제품 개발을 위한 데이터 기반 공동 디자인 프로세스: 스마트 난방복 사례 연구)

  • Leem, Sooyeon;Lee, Sang Won
    • Journal of the Korea Convergence Society / v.12 no.1 / pp.133-141 / 2021
  • This research proposes a design process that effectively complements human-centered design with an objective, data-driven approach. The subjective human-centered design process can lack objectivity, and data-driven approaches can supplement it to discover hidden user needs effectively. This research combines data mining analysis with a co-design process and verifies its applicability through a case study on a smart heating jacket. In the data mining process, clustering groups the users, which forms the basis for selecting the target groups, and decision tree analysis primarily identifies the important user perception attributes and values. The broad view derived from the data analysis is then refined through the co-design process, a deeper human-centered design process that uses the developed workbook. In the co-design process, the journey maps, needs and pain points, ideas, and values for the target user groups are identified and finalized. These can serve as the basis for starting new product development.
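
The abstract does not name the clustering algorithm. As a minimal sketch, assuming k-means on numeric user-perception scores (the two-dimensional scores below are hypothetical), grouping respondents into candidate target groups could look like this. Deterministic farthest-first seeding is used so the example is reproducible; k-means++ is a common alternative.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means. Returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical (warmth_importance, weight_importance) scores for six respondents
users = [(1, 1), (1, 2), (2, 1), (8, 9), (9, 8), (9, 9)]
centroids, labels = kmeans(users, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In the described process, each resulting cluster would then be profiled (for example with a decision tree) before the co-design sessions choose target groups.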

Examining and Analyzing Influential Factors of Ego-resilience: By Applying Data Mining Analysis (자아탄력성의 영향요인 탐색: 데이터 마이닝 분석의 적용)

  • Ju-Yeon LEE;Ji-Hyeon KANG;Sung-Yae JANG;Soo-Jin YOO
    • The Journal of Counseling Psychology Education Welfare / v.6 no.1 / pp.125-136 / 2019
  • This study examined the significant factors affecting ego-resilience by applying data mining techniques to large-scale data from the Korean Children & Youth Panel Survey (KCYPS). The data analyzed were from the 5th wave of the KCYPS elementary school panel (2,070 students, then in the 8th grade). The purpose of the study was to analyze the factors influencing the ego-resilience of the elementary, middle, and high school panel subjects and to analyze trends by year. The results are as follows. First, a correlation analysis of the factors affecting ego-resilience in middle school students showed that ego-resilience was correlated with individual development factors, such as emotional problems, self-esteem, self-identity, life goals, and satisfaction, and with developmental environment factors, such as parenting style, peer attachment, and school life adaptation. Second, a decision tree analysis of the factors influencing ego-resilience in middle school students showed that both individual development factors and environmental factors were influential. By examining the factors that affect adolescents' ego-resilience from middle school onward, this study suggests directions for future research on adolescent ego-resilience.
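
The abstract does not specify the correlation coefficient. As a minimal sketch, assuming Pearson's r between a candidate factor score and an ego-resilience score, the first analysis step amounts to computing the following for each factor. The five students' scores are hypothetical, not KCYPS data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical scores for five students
self_esteem = [2.0, 3.0, 3.5, 4.0, 4.5]
ego_resilience = [2.2, 2.9, 3.6, 3.8, 4.6]
print(round(pearson_r(self_esteem, ego_resilience), 3))
```

Correlation only flags pairwise association; the decision tree step is what lets several factors be considered jointly.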

A Rule Generation Technique Utilizing a Parallel Expansion Method (병렬확장을 활용한 규칙생성 기법)

  • Lee, Kee-Cheol;Kim, Jin-Bong
    • The Transactions of the Korea Information Processing Society / v.5 no.4 / pp.942-950 / 1998
  • Extracting knowledge, especially in the form of rules, from raw data is very important in data mining, whose aim is to help users who lack knowledge despite an abundance of data. Logic minimization tools derive optimized knowledge, in the form of minimized logic expressions, from a given ON set and DC (don't-care) set. In this work, the parallel expansion scheme of logic minimization is extracted and used to obtain initial knowledge from which the final rules are derived, and these rules are successfully applicable to real-world data. A prototype system based on this new approach was tested on real-world data, showing that it is as practical as long-studied conventional decision tree methods such as the C4.5 system.
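
The abstract does not detail the parallel expansion scheme. As a minimal sketch of the underlying idea only, this shows the ordinary sequential, greedy "expand" step used in two-level logic minimization, not the authors' parallel method. A cube over binary attributes is grown by raising literals to don't-care ('-') as long as it stays inside ON ∪ DC, i.e. covers no OFF-set minterm. The cube encoding and function names are illustrative assumptions.

```python
from itertools import product


def covers(cube, minterm):
    """True if the cube (e.g. '1-0') contains the minterm (e.g. '110')."""
    return all(c == '-' or c == m for c, m in zip(cube, minterm))


def expand(cube, on_set, dc_set):
    """Greedily raise literals of `cube` to '-' while the cube covers no
    OFF-set minterm (OFF = every minterm outside ON ∪ DC)."""
    n = len(cube)
    allowed = set(on_set) | set(dc_set)
    off_set = [m for m in (''.join(bits) for bits in product('01', repeat=n))
               if m not in allowed]
    cube = list(cube)
    for i in range(n):
        if cube[i] == '-':
            continue
        saved = cube[i]
        cube[i] = '-'
        if any(covers(cube, m) for m in off_set):
            cube[i] = saved  # this expansion would hit the OFF set; undo it
    return ''.join(cube)


on_set = ["110", "111"]
dc_set = ["100"]
print(expand("110", on_set, dc_set))  # 1-0
print(expand("111", on_set, dc_set))  # 11-
```

Each expanded cube reads directly as a rule: '1-0' means "IF x1 = 1 AND x3 = 0 THEN the ON class". Trying literals in a different order can yield a different, equally valid cube.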


An Event-Driven Failure Analysis System for Real-Time Prognosis (실시간 고장 예방을 위한 이벤트 기반 결함원인분석 시스템)

  • Lee, Yang Ji;Kim, Duck Young;Hwang, Min Soon;Cheong, Young Soo
    • Korean Journal of Computational Design and Engineering / v.18 no.4 / pp.250-257 / 2013
  • This paper introduces a failure analysis procedure that underpins real-time fault prognosis. In a previous study, we developed a systematic eventization procedure that reduces the original data to a manageable size in the form of event logs, making it possible to extract failure patterns efficiently from the reduced data. Failure patterns are then extracted as event sequences by sequence-mining algorithms (e.g., the FP-Tree algorithm). The extracted patterns are stored in a failure pattern library, and the stored failure pattern information is then used to predict potential failures. Two practical case studies (a marine diesel engine and a SIRIUS-II car engine) provide empirical support for the performance of the proposed failure analysis procedure. The procedure can easily be extended to a wide range of failure analysis applications, such as vehicle and machine diagnostics. Furthermore, it can be applied to human health monitoring and prognosis, so that human body signals can be analyzed efficiently.
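
As a minimal sketch of the FP-Tree structure the abstract mentions, assuming each event-log window is treated as an unordered transaction of event names (which is how an FP-tree stores data), tree construction could look like this. Only construction is shown, not the FP-growth mining step, and the event names are hypothetical.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, its count, and its children by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Drop infrequent items, sort each transaction by descending global
    frequency (ties broken by name), and insert it as a path from the root."""
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in freq.items() if c >= min_support}
    root = FPNode(None, None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # shared prefixes accumulate counts
            node = child
    return root, frequent


# Hypothetical event-log windows from an engine
logs = [
    ["overheat", "vibration", "shutdown"],
    ["overheat", "vibration"],
    ["overheat", "pressure_drop"],
    ["vibration", "shutdown"],
]
root, frequent = build_fp_tree(logs, min_support=2)
print(root.children["overheat"].count)  # 3
```

Because frequent items share prefixes, the tree compresses the reduced event data; patterns are then mined from it rather than from the raw logs.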

Combinatorial Use of Data Imbalance Resolution Techniques Using a Genetic Algorithm (유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용)

  • Jang, Yeong-Sik;Kim, Jong-U;Heo, Jun
    • Proceedings of the Korea Intelligent Information System Society Conference / 2007.05a / pp.309-320 / 2007
  • The data imbalance problem, which can be encountered in data mining classification problems, typically means that one class has many more or many fewer instances than the other classes. It causes low prediction accuracy for the minority class, because classifiers tend to assign instances to the majority class and ignore the minority class in order to reduce the overall misclassification rate. A number of techniques have been proposed to solve the data imbalance problem, based on resampling with replacement, adjusting decision thresholds, and adjusting the costs of the different classes. In this paper, we study the feasibility of combining the previously proposed techniques for the data imbalance problem, and suggest a combination method that uses a genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy for the minority class, the genetic algorithm uses the F-value of the minority class as its fitness function when determining the combination ratio. To compare the proposed method with single techniques and with a matrix-style combination using random percentages, we performed experiments on four public datasets commonly used to compare methods for the data imbalance problem. The experimental results demonstrate the usefulness of the proposed method.
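
The F-value of the minority class, used above as the genetic algorithm's fitness, is the harmonic mean of that class's precision and recall. As a minimal sketch of the fitness function only (the genetic algorithm and the encoding of combination ratios are not shown), this example shows why it suits imbalanced data: two classifiers with identical accuracy receive very different scores. The labels are hypothetical.

```python
def minority_f_value(y_true, y_pred, minority=1):
    """F-value (F1) of the minority class: harmonic mean of its precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != minority and p == minority)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p != minority)
    if tp == 0:
        return 0.0  # no minority instance found correctly
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 1, 0, 0, 0, 0]
always_majority = [0, 0, 0, 0, 0, 0]  # 4/6 accuracy, but misses every minority case
some_minority = [1, 0, 1, 0, 0, 0]    # also 4/6 accuracy
print(minority_f_value(y_true, always_majority))  # 0.0
print(minority_f_value(y_true, some_minority))    # 0.5
```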


Agriculture Big Data Analysis System Based on Korean Market Information

  • Chuluunsaikhan, Tserenpurev;Song, Jin-Hyun;Yoo, Kwan-Hee;Rah, Hyung-Chul;Nasridinov, Aziz
    • Journal of Multimedia Information System / v.6 no.4 / pp.217-224 / 2019
  • As the world's population grows, maintaining the food supply is becoming a bigger problem. Now and in the future, big data will play a major role in decision-making in the agriculture industry. The challenge is how to obtain valuable information that helps us make future decisions. Big data helps us see history more clearly, uncover hidden value, and make the right decisions for the government and farmers. To help address this challenge, we developed the Agriculture Big Data Analysis System, which consists of agricultural big data collection, big data analysis, and big data visualization. First, we collected structured data, such as price, climate, and yield, and unstructured data, such as news, blogs, and TV programs. Using the collected data, we implemented algorithms such as ARIMA, Decision Tree, LDA, and LSTM, and presented the results through data visualizations.

Machine Learning Model of Gyro Sensor Data for Drone Flight Control (드론 비행 조종을 위한 자이로센서 데이터 기계학습 모델)

  • Ha, Hyunsoo;Hwang, Byung-Yeon
    • Journal of Korea Multimedia Society / v.20 no.6 / pp.927-934 / 2017
  • As drone technology develops, the use of drones is increasing. In addition, the sensors inside smartphones are becoming more varied and more accurate day by day, and a variety of related research is under way. This motivates controlling drones with smartphone sensors. In this paper, we propose the most suitable machine learning model for matching gyro sensor data to drone movements. First, we classified the drone's movements by gyro sensor values for 4 and 8 degrees of freedom. We then trained machine learning models on these data, applying the One-Rule, Neural Network, Decision Tree, and Naive Bayesian methods. In experiments using gyro sensor values as attributes, the highest accuracy with 4 degrees of freedom, 97.3 percent, was obtained by the Naive Bayesian method using 2 attributes. Likewise, with 8 degrees of freedom, the Naive Bayesian method using 2 attributes showed the highest accuracy, 93.1 percent.
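
The abstract does not say which Naive Bayes variant was used. As a minimal sketch, assuming Gaussian Naive Bayes on two continuous gyro attributes, classifying a reading into a movement class could look like this. The attribute names, readings, and movement labels are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-attribute mean and variance."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A tiny epsilon keeps a constant attribute from giving zero variance.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9 for col, m in zip(columns, means)]
        model[label] = (n / len(y), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior (attributes assumed independent)."""
    def log_posterior(params):
        prior, means, variances = params
        lp = math.log(prior)
        for v, m, var in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return lp
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical (pitch_rate, roll_rate) readings labeled with drone movements
X = [(10, 0), (12, 1), (11, -1), (0, 10), (1, 12), (-1, 11)]
y = ["forward"] * 3 + ["left"] * 3
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (11, 0)))  # forward
print(predict_gaussian_nb(model, (0, 11)))  # left
```

Working in log space avoids underflow when many attribute likelihoods are multiplied; with only two attributes, as in the best-performing setting reported above, the model stays very small.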