• Title/Summary/Keyword: Tree mining


Accumulation of Heavy Metals (Cd, Cu, Zn, and Pb) in Five Tree Species in Relation to Contamination of Soil near Two Closed Zinc-Mining Sites (아연폐광산(亞鉛廢鑛山) 주변(周邊) 토양(土壤)의 중금속(重金屬) (Cd, Cu, Zn, Pb) 오염(汚染)에 따른 5개(個) 수종(樹種)의 부위별(部位別) 중금속(重金屬) 축적(蓄積))

  • Han, Sim Hee;Hyun, Jung Oh;Lee, Kyung Joon;Cho, Duck Hyun
    • Journal of Korean Society of Forest Science / v.87 no.3 / pp.466-474 / 1998
  • This study was conducted to evaluate heavy metal concentrations (Cd, Cu, Zn, and Pb) in the soil of two zinc mines and to correlate heavy metal contents between the soils and the trees growing near the mines. Soils and the leaves, stems, and roots of five tree species (Corylus heterophylla, Pinus rigida, Populus alba × glandulosa, Rhododendron mucronulatum, and Robinia pseudoacacia) were collected from the Sambo Zinc Mine in Hwasung and the Gahak Zinc Mine in Kwangmyung City, Kyonggido. Soils near the two zinc mines were not seriously contaminated by heavy metals overall, but Zn and Pb concentrations were at toxic levels. Heavy metal concentrations in the soils decreased in the order Zn, Pb, Cu, Cd, and decreased with increasing distance from the mining sites. Among the five tree species, Populus alba × glandulosa showed the highest tissue concentrations of every heavy metal except Pb. In particular, its leaves contained high concentrations of heavy metals, reaching a maximum of 91 ppm Zn. The roots of Corylus heterophylla contained high concentrations of Cu and Pb. Heavy metal concentrations in the tree tissues decreased in the order Zn, Cu, Pb, Cd. The concentration of heavy metals in the tree tissues was positively correlated with that in the soil in which the trees were growing. The ratio of the heavy metal concentration in trees to that in soils (concentration factor, CF) was highest for Zn and lowest for Pb. Populus alba × glandulosa had the highest CF among the five tree species. We conclude that, given its high metal uptake ability, Populus alba × glandulosa could be used to remove heavy metals from contaminated soils, and Pinus rigida could be used as an indicator of the level of soil contamination.

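
The concentration factor described above is a simple ratio, CF = (metal concentration in tree tissue) / (metal concentration in the soil where the tree grows), and the tissue-soil relationship is reported as a positive correlation. The sketch below computes both. The function names are illustrative, Pearson's r is an assumption (the abstract does not say which coefficient was used), and the numbers in the usage lines are made-up placeholders, not the study's measurements. Tissue and soil values must be in the same units (e.g., ppm) for the ratio to be meaningful.

```python
import math

def concentration_factor(tissue_ppm, soil_ppm):
    """CF = concentration in plant tissue / concentration in the soil."""
    if soil_ppm <= 0:
        raise ValueError("soil concentration must be positive")
    return tissue_ppm / soil_ppm

def rank_by_cf(tissue, soil):
    """tissue, soil: dicts mapping metal symbol -> concentration (same units).
    Returns (metal, CF) pairs sorted from highest to lowest CF."""
    cfs = {metal: concentration_factor(tissue[metal], soil[metal]) for metal in tissue}
    return sorted(cfs.items(), key=lambda kv: kv[1], reverse=True)

def pearson_r(xs, ys):
    """Pearson correlation between paired soil and tissue concentrations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder values in ppm, NOT measurements from the study
tissue = {"Zn": 60.0, "Cu": 8.0, "Pb": 2.0, "Cd": 0.3}
soil = {"Zn": 150.0, "Cu": 30.0, "Pb": 40.0, "Cd": 1.0}
print(rank_by_cf(tissue, soil))
```

With real data, the correlation would be computed per metal across sampling sites (soil value at each site paired with the tissue value of trees growing there).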

Advanced Improvement for Frequent Pattern Mining using Bit-Clustering (비트 클러스터링을 이용한 빈발 패턴 탐사의 성능 개선 방안)

  • Kim, Eui-Chan;Kim, Kye-Hyun;Lee, Chul-Yong;Park, Eun-Ji
    • Journal of Korea Spatial Information System Society / v.9 no.1 / pp.105-115 / 2007
  • Data mining extracts interesting knowledge from large databases. Among the numerous data mining techniques, research has concentrated primarily on clustering and association rules. Clustering, one of the active research topics, mainly deals with analyzing spatial and attribute data, while association rule mining deals with identifying frequent patterns. An improved Apriori algorithm based on an existing bit-clustering algorithm had been proposed previously. In search of an alternative that improves on Apriori, we investigated FP-Growth and discussed the possibility of adopting bit-clustering to solve the problems of FP-Growth. FP-Growth using bit-clustering demonstrated better performance than the existing method. The experiments used chess data, a dataset used for evaluating pattern mining, and FP-Trees were constructed with different minimum support values. For high minimum support values, the results were similar to those of the existing techniques; in the other cases, however, the proposed technique performed better than the existing technique. Overall, the proposed technique was found to deliver higher performance. In addition, a method for applying bit-clustering to GML data was proposed.

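
The abstract does not describe the bit-clustering variant of FP-Growth in enough detail to reproduce it. As a loosely related illustration of why bit-level representations help frequent pattern mining, the sketch below mines frequent itemsets with vertical bitmaps: each item maps to an integer whose bit i marks transaction i, so the support of an itemset is the number of set bits in the AND of its items' bitmaps. This is a generic Eclat-style technique swapped in for illustration, not the authors' algorithm and not FP-Growth itself; `min_support` is an absolute transaction count, and the toy transactions are placeholders rather than the chess data.

```python
def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set iff transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def popcount(bits):
    return bin(bits).count("1")

def frequent_itemsets(transactions, min_support):
    """Depth-first enumeration; support counting is a bitwise AND plus popcount."""
    bitmaps = build_bitmaps(transactions)
    # Frequent single items, sorted so each itemset is generated exactly once
    items = sorted(i for i, b in bitmaps.items() if popcount(b) >= min_support)
    result = {}

    def extend(prefix, prefix_bits, candidates):
        for idx, item in enumerate(candidates):
            bits = prefix_bits & bitmaps[item]
            support = popcount(bits)
            if support >= min_support:
                itemset = prefix + (item,)
                result[itemset] = support
                # Only items after this one are tried, avoiding duplicates
                extend(itemset, bits, candidates[idx + 1:])

    all_transactions = (1 << len(transactions)) - 1
    extend((), all_transactions, items)
    return result

# Toy transactions (placeholders, not the chess dataset)
transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
for itemset, support in sorted(frequent_itemsets(transactions, min_support=2).items()):
    print(itemset, support)
```

Rerunning with different `min_support` values, as in the experiments described above, shows how quickly the number of frequent itemsets grows as the threshold drops.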

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies, and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. This study approaches model building from the perspective of two different analyses. The first is the analysis period: we divide it into the periods before and after the IMF financial crisis and examine whether the two periods differ. The second is the prediction time: to predict when firms will increase capital by issuing new stocks, the prediction horizon is set to one, two, or three years later. In total, six prediction models are developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is one of the most widely used prediction methods; it builds trees that label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression, and SVM, decision tree techniques are well suited to high-dimensional applications and have strong explanation capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST, and C5.0. Among them, we use C5.0, the most recently developed of these algorithms, which yields better performance than the others. We obtained the rights issue and financial analysis data from TS2000 of the Korea Listed Companies Association. Each financial analysis record consists of 89 variables, including 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices, and 8 productivity indices. 
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data are selected as the input variables of each model, and the rights issue status (issued or not issued) is defined as the output variable. To develop the prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions to conduct rights issues have become more evident. The results also show that stability-related indices have a major impact on rights issues in short-term prediction, whereas long-term prediction of rights issues is affected by indices of profitability, stability, activity, and productivity. All the prediction models include the industry code as one of the significant variables, which means that companies in different industries show different patterns of rights issues. We conclude that stakeholders should take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, the differences in accuracy should be compared across other data mining techniques, such as neural networks, logistic regression, and SVM. 
Second, new prediction models should be developed and evaluated that include the variables that capital structure theory identifies as relevant to rights issues.
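
C5.0 is run inside PASW Modeler here, so its exact internals are not reproduced. As a rough illustration of how a tree in this family chooses a split, the sketch below computes the gain-ratio criterion introduced with C4.5, the predecessor of C5.0, for one candidate attribute. The attribute (a stability index discretized into "low"/"high") and the labels are hypothetical, and C5.0 adds refinements, such as threshold search on continuous attributes and pruning, that are not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information.

    values: the candidate attribute's value for each record
    labels: the class of each record (e.g., rights issue: yes/no)
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy remaining after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values
    split_info = -sum((len(g) / n) * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical records: a discretized stability index and the rights-issue label
stability = ["low", "low", "high", "high", "low", "high"]
issued = ["yes", "yes", "no", "no", "no", "no"]
print(round(gain_ratio(stability, issued), 3))
```

The tree grower would evaluate this score for every candidate attribute at a node, split on the best one, and recurse on each branch.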

Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.1-17 / 2015
  • In this paper, we report what we have observed with regard to a prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work supports the military's efforts to supervise enlisted men: in spite of their efforts, many commanders have failed to prevent accidents by their subordinates. One of the important duties of officers is to take care of their subordinates and prevent unexpected accidents. However, accidents are hard to prevent, so a proper method is needed. Our motivation for this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings, which requires considerable time and effort, and the results vary significantly depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data that can easily be obtained. Recently, records of enlisted men, as well as SNS communication between commanders and soldiers, have made it possible to predict and prevent accidents. This paper applies data mining to internal and external (SNS) data to identify enlisted men's interests and predict accidents. We propose a combination of topic analysis and the decision tree method. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the presence of any accident. 
To analyze their SNS data, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach for finding their interests is composed of three main phases: collecting, topic analysis, and converting the topic analysis results into points for use as independent variables. In the first phase, we collect enlisted men's SNS data by commander's ID. After the unstructured SNS data are gathered, the topic analysis phase extracts issues from them. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, we quantify these 5 topics as personal points. We then add these results to the independent variables, which consist of 15 internal data sets. Next, we build two decision trees. The first tree uses only internal data; the second uses external (SNS) data as well as internal data. We then compare the misclassification rates of the two trees in SAS E-miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%, so this method predicts accidents with an accuracy of approximately 92%. The gap between the two models is 4.3 percentage points. Finally, we use the McNemar test to check whether the difference between them is meaningful; the difference is statistically significant (p-value: 0.0003). This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of enlisted men's data. Second, various independent variables in the decision tree model are used as categorical rather than continuous variables, so some information is lost. In spite of extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates. 
Our proposed methodology can support decision-making in the military. This study is expected to contribute to the prevention of accidents in the military through scientific analysis of enlisted men and their proper management.
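
The McNemar test compares two classifiers evaluated on the same cases using only the discordant pairs: b, the cases the first tree misclassifies but the second classifies correctly, and c, the reverse. A minimal sketch, assuming the common continuity-corrected chi-square form with 1 degree of freedom; the counts in the usage line are hypothetical, not the study's data.

```python
import math

def mcnemar(b, c, correction=True):
    """Return (chi-square statistic, p-value) for McNemar's test.

    b: cases misclassified by model 1 but correctly classified by model 2
    c: cases correctly classified by model 1 but misclassified by model 2
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)  # continuity correction
    stat = diff ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical discordant counts, for illustration only
stat, p = mcnemar(b=40, c=12)
print(f"chi2 = {stat:.3f}, p = {p:.5f}")
```

Concordant cases (both trees right or both wrong) do not enter the statistic, which is why the test suits paired comparisons of two models on one test set.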

EEG Classification for depression patients using decision tree and possibilistic support vector machines (뇌파의 의사 결정 트리 분석과 가능성 기반 서포트 벡터 머신 분석을 통한 우울증 환자의 분류)

  • Sim, Woo-Hyeon;Lee, Gi-Yeong;Chae, Jeong-Ho;Jeong, Jae-Seung;Lee, Do-Heon
    • Bioinformatics and Biosystems / v.1 no.2 / pp.134-138 / 2006
  • Depression is the most common and widespread mood disorder. About 20% of the population may suffer a major, incapacitating episode of depression during their lifetime. The disorder can be classified into two types: major depressive disorder and bipolar disorder. Since pharmaceutical treatments differ according to the type of depressive disorder, correct and fast classification is critical for depression patients. Yet classical assessment methods, such as the Minnesota Multiphasic Personality Inventory (MMPI), are difficult to apply to depression patients because the patients have trouble concentrating. We therefore used electroencephalogram (EEG) analysis to classify depression. We extracted the nonlinearity of information flows between channels and estimated the approximate entropy (ApEn) of the EEG at each channel. Using these attributes, we applied two data mining classification methods: decision trees and possibilistic support vector machines (PSVM). For classifying 54 depression patients (30 with major depressive disorder and 24 with bipolar disorder), the decision tree achieved 85.19% accuracy and PSVM 77.78%.

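
Approximate entropy quantifies how predictable a time series is: lower values mean that patterns of length m are more often followed by similar patterns of length m + 1. The sketch below follows the standard definition (Chebyshev distance between templates, self-matches counted, natural logarithm). The tolerance `r` is an absolute value here; in practice it is often set as a fraction of the signal's standard deviation, and the m and r values used in the study are not stated in the abstract. The synthetic signals are stand-ins, not EEG data.

```python
import math

def approximate_entropy(signal, m=2, r=0.2):
    """ApEn(m, r) of a 1-D sequence, using an absolute tolerance r."""
    n = len(signal)
    if n <= m + 1:
        raise ValueError("signal too short for the chosen m")

    def phi(k):
        count = n - k + 1
        templates = [signal[i:i + k] for i in range(count)]
        total = 0.0
        for a in templates:
            # Templates within Chebyshev distance r of a, including a itself
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)

# Illustrative synthetic signals: a perfectly regular one and a chaotic one
regular = [0.0, 1.0] * 50
chaotic = [0.3]
for _ in range(299):
    chaotic.append(4.0 * chaotic[-1] * (1.0 - chaotic[-1]))  # logistic map
print(approximate_entropy(regular, m=2, r=0.5), approximate_entropy(chaotic, m=2, r=0.1))
```

For multichannel EEG, the function would be applied to each channel separately, giving one ApEn attribute per channel for the classifiers.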

A Study on Improvement Plans for Technology Protection of SMEs in Korea (중소기업 기술보호 개선방안에 대한 연구)

  • Lee, Jang Hoon;Shin, Wan Seon;Park, Hyun Ju
    • Journal of Korean Society of Industrial and Systems Engineering / v.37 no.2 / pp.77-84 / 2014
  • The purpose of this research is to identify and develop technology protection plans for small and medium-sized enterprises (SMEs) by analyzing past patterns of technology leakage experienced by SMEs. We identified factors that affect technology leakage and analyzed the patterns of their influence using data mining algorithms. A decision tree analysis revealed several significant factors that lead to technology leakage, from which we conclude that preemptive preventive actions must be put in place. We expect that this research will help Korean SMEs determine the priority of the preemptive actions needed to prevent technology leakage accidents within their companies.

Analysis of Feature Variables for Breast Cancer Diagnosis

  • Jung, Yong Gyu;Kim, Jang Il;Sihn, Sung Chul;Heo, Jun
    • International journal of advanced smart convergence / v.2 no.2 / pp.36-39 / 2013
  • Accurate diagnosis is becoming more important as health information grows and the number of cancer diagnoses gradually increases. Among the various types of cancer, we focus on breast cancer diagnosis. The accuracy of breast cancer diagnosis increases when the diagnosis is based on evidence and statistics. To this end, we use the Weka data mining tool and decision tree algorithms to derive rules from the variables significantly associated with the diagnosis. In addition, data pre-processing and cross-validation are used to increase the reliability of the results. As evidence-based medicine spreads among doctors, the number and causes of cases become increasingly important. In evidence-based medicine, data obtained from past patients are used to calculate probabilities that help diagnose and predict disease in future patients and plan their treatment. This approach can play an important role in improving survival rates.
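
Cross-validation, mentioned above as a way to increase the reliability of the results, rotates which part of the data is held out for testing so that every record is tested exactly once. A minimal, unshuffled k-fold sketch follows; `fit_predict` is a hypothetical callback standing in for any classifier (such as a decision tree built in Weka), and the majority-class baseline is only a placeholder. A stratified variant would additionally keep class proportions similar across folds.

```python
from collections import Counter

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds covering range(n)."""
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    start = 0
    for i in range(k):
        # The first n % k folds get one extra record
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_val_accuracy(X, y, k, fit_predict):
    """Mean test accuracy over k folds; fit_predict(X_tr, y_tr, X_te) -> predictions."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

def majority_baseline(X_train, y_train, X_test):
    """Placeholder classifier: always predicts the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)
```

Comparing a real model's cross-validated accuracy against such a baseline is a quick check that the learned rules add information beyond the class distribution.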

A Study on Improving the predict accuracy rate of Hybrid Model Technique Using Error Pattern Modeling : Using Logistic Regression and Discriminant Analysis

  • Cho, Yong-Jun;Hur, Joon
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.269-278 / 2006
  • This paper presents a new hybrid data mining technique that uses error pattern modeling to improve classification accuracy. The proposed method improves classification accuracy by combining two different supervised learning methods. The main algorithm models the error patterns between the two supervised learning methods (e.g., neural networks, decision trees, logistic regression). The proposed modeling method was applied in a simulation of 10,000 data sets generated from normal and exponential random distributions. The simulation results show that the proposed method outperforms existing methods such as logistic regression and discriminant analysis.

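
The abstract does not spell out how the error patterns are modeled. One plausible, simplified reading, shown here purely as an assumption, is a lookup-table combiner (a simple form of stacking): on training data, record which true label is most common for each pair of base-model predictions, then use that table to resolve the two models' outputs on new cases. The base models are left abstract, their prediction lists are the inputs, and the function names and values are hypothetical.

```python
from collections import Counter, defaultdict

def fit_pattern_table(pred_a, pred_b, y_true):
    """For each observed (model A, model B) prediction pair, store the most
    frequent true label seen in training."""
    counts = defaultdict(Counter)
    for a, b, y in zip(pred_a, pred_b, y_true):
        counts[(a, b)][y] += 1
    return {pattern: c.most_common(1)[0][0] for pattern, c in counts.items()}

def combine(table, pred_a, pred_b):
    """Resolve each case via the learned table; unseen patterns fall back to model A."""
    return [table.get((a, b), a) for a, b in zip(pred_a, pred_b)]

# Hypothetical training-set predictions from two base models and the true labels
train_a = [1, 1, 0, 0, 1, 0]
train_b = [1, 0, 1, 0, 0, 1]
train_y = [1, 0, 1, 0, 0, 1]
table = fit_pattern_table(train_a, train_b, train_y)
print(combine(table, [1, 0], [0, 1]))
```

With binary labels there are only four prediction patterns, so the table is easy to inspect; falling back to model A for unseen patterns is an arbitrary choice in this sketch.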

A Study on Measuring Steel Pipe Production Yield Using Data Mining (데이터마이닝을 이용한 스틸 파이프 생산 수율 측정에 관한 연구)

  • Kim, Woong-Kyung;Kim, Jong-Wan;Nam, In-Gil
    • Proceedings of the Korea Society for Industrial Systems Conference / 2009.05a / pp.144-149 / 2009
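  • The purpose of this paper is to classify and model products with low defect rates and high yields in steel pipe production. To this end, past steel pipe production histories are compared and analyzed to determine how key characteristics affect defect rates and yields, so that the results can serve as key indicators for producing low-defect, high-yield products in future steel pipe production processes. Research on measuring yield by key characteristic in the steel pipe industry has so far been insufficient; this paper addresses that gap by classifying the patterns that defect rates and yields show for each key characteristic and by distinguishing the degree of their influence.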

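
As a small illustration of comparing defect rates and yields across production characteristics, the sketch below groups production-history records by one characteristic and reports per-group rates. The record fields (`input_tons`, `good_tons`, `units`, `defective_units`) and the definitions used (yield = good output weight / input weight; defect rate = defective units / units produced) are assumptions, since the abstract does not give the paper's exact definitions, and the sample records are hypothetical.

```python
from collections import defaultdict

def rates_by_characteristic(records, key):
    """Aggregate yield and defect rate for each value of record[key]."""
    totals = defaultdict(lambda: {"input": 0.0, "good": 0.0, "units": 0, "defects": 0})
    for r in records:
        t = totals[r[key]]
        t["input"] += r["input_tons"]
        t["good"] += r["good_tons"]
        t["units"] += r["units"]
        t["defects"] += r["defective_units"]
    # Ratios are computed from pooled totals, not averaged per record
    return {
        value: {
            "yield": t["good"] / t["input"],
            "defect_rate": t["defects"] / t["units"],
        }
        for value, t in totals.items()
    }

# Hypothetical production-history records grouped by steel grade
records = [
    {"grade": "A", "input_tons": 10, "good_tons": 9, "units": 100, "defective_units": 5},
    {"grade": "A", "input_tons": 10, "good_tons": 8, "units": 100, "defective_units": 15},
    {"grade": "B", "input_tons": 5, "good_tons": 5, "units": 50, "defective_units": 0},
]
print(rates_by_characteristic(records, "grade"))
```

Pooling totals before dividing weights large production runs appropriately; averaging per-record ratios instead would give small runs the same influence as large ones.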

Reservoir Classification using Data Mining Technology for Survivor Function

  • Park, Mee-Jeong;Lee, Joon-Gu;Lee, Jeong-Jae
    • Journal of The Korean Society of Agricultural Engineers / v.47 no.7 / pp.13-22 / 2005
  • The main purpose of this article is to classify reservoirs according to their physical characteristics, for example, dam height, dam width, age, and repair-work history. First, a data set of 13,976 reservoirs was analyzed using k-means and self-organizing maps. As a result of these analyses, the reservoirs were classified into four clusters. The factors and their critical values for classifying the reservoirs into the four groups were found by generating a decision tree. The path rules leading to each group appear reasonable, since each group's survivor function showed a distinct pattern.
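
k-means, one of the two clustering methods named above, alternates between assigning each reservoir to its nearest centroid and moving each centroid to the mean of its members. The sketch below is plain Lloyd's algorithm with a naive deterministic initialization (the first k records) and a z-score step, since attributes such as dam height and age are on different scales. The feature choice, the initialization, and the sample values are assumptions rather than the study's settings; the self-organizing map and the decision-tree step that derives the path rules are not shown.

```python
import math

def standardize(rows):
    """Z-score each column so attributes on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column gets sd 1.0 to avoid division by zero
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids). Naive init: first k points."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical reservoirs: [dam height (m), age (years)], placeholder values
rows = [[5, 60], [6, 55], [20, 10], [22, 12]]
labels, _ = kmeans(standardize(rows), k=2)
print(labels)
```

Lloyd's algorithm only reaches a local optimum, so practical use would typically try several initializations (for example, k-means++) and keep the solution with the lowest within-cluster sum of squares.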