• Title/Summary/Keyword: Tree mining

Industrial Safety Risk Analysis Using Spatial Analytics and Data Mining (공간분석·데이터마이닝 융합방법론을 통한 산업안전 취약지 등급화 방안)

  • Ko, Kyeongseok;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering, v.40 no.4, pp.147-153, 2017
  • The mortality rate from industrial accidents in South Korea was 11 per 100,000 workers in 2015, five times higher than the OECD average. Economic losses due to industrial accidents continue to grow, reaching 19 trillion won, far more than the 1.1 trillion won lost to natural disasters. This calls for fundamental changes in industrial safety management. In this study, we classified the risk of accidents in the industrial complex of Ulju-gun using spatial analytics and data mining. We collected 119 data on accidents, factory characteristics data, company information such as sales amount, capital stock, building information, weather information, official land price, etc. The analysis dataset was constructed through pre-processing and data convergence. We then conducted geographically weighted regression with spatial factors affecting fire incidents and calculated the risk of fire accidents with an analytical model combining Boosting and CART (Classification and Regression Tree). We derived the main factors that affect fire accidents: deterioration of buildings, capital stock, number of employees, officially assessed land price, and building height. Finally, the predicted accident rates were divided into four classes (risk categories: alert, hazard, caution, and attention) with Jenks Natural Breaks Classification, which seeks to minimize each class's average deviation from the class mean while maximizing each class's deviation from the means of the other classes. Because the analysis results were also visualized on maps, the danger zones can be checked intuitively. The results are judged to be applicable to policy decisions differentiated by type, for example by risk rating.
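The class-splitting idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common formulation of natural breaks that minimizes the total within-class sum of squared deviations, solved exactly by dynamic programming. The function names and the sample rates are hypothetical.

```python
from typing import List, Sequence


def within_class_ssd(values: Sequence[float]) -> float:
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks(data: Sequence[float], n_classes: int) -> List[List[float]]:
    """Split the sorted data into n_classes contiguous groups so that the
    total within-class sum of squared deviations is as small as possible."""
    xs = sorted(data)
    n = len(xs)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(data)")

    inf = float("inf")
    # cost[k][i]: lowest total SSD when the first i values form k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    # split[k][i]: start index of the last class in that best partition.
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0

    for k in range(1, n_classes + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                candidate = cost[k - 1][j] + within_class_ssd(xs[j:i])
                if candidate < cost[k][i]:
                    cost[k][i] = candidate
                    split[k][i] = j

    # Walk back through the stored split points to recover the classes.
    classes: List[List[float]] = []
    end = n
    for k in range(n_classes, 0, -1):
        start = split[k][end]
        classes.append(xs[start:end])
        end = start
    classes.reverse()
    return classes


# Hypothetical predicted accident rates, for illustration only.
rates = [0.01, 0.02, 0.03, 0.20, 0.22, 0.50, 0.52, 0.90, 0.95]
for group in natural_breaks(rates, 4):
    print(group)
```

The exact search is cubic in the number of values, so it suits small examples; it is shown here only to make the optimization criterion concrete.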

Major gene identification for FASN gene in Korean cattles by data mining (데이터마이닝을 이용한 한우의 우수 지방산합성효소 유전자 조합 선별)

  • Kim, Byung-Doo;Kim, Hyun-Ji;Lee, Seong-Won;Lee, Jea-Young
    • Journal of the Korean Data and Information Science Society, v.25 no.6, pp.1385-1395, 2014
  • Economic traits of livestock are affected by both environmental and genetic factors. Moreover, they are affected not by a single gene but by interactions among genes. We used a linear regression model to adjust for environmental factors. To identify gene-gene interaction effects, we applied data mining techniques such as neural networks, logistic regression, CART, and C5.0 to five SNPs (single nucleotide polymorphisms) of FASN (fatty acid synthase). We divided the total data into training (60%) and testing (40%) sets and applied the model built on the training data to the testing data. Comparison of prediction accuracy identified C5.0 as the best model. Superior genotypes were then selected using the decision tree.
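The 60/40 evaluation protocol mentioned in this abstract can be sketched as below. This is an illustrative outline only: the abstract does not say how the split was drawn, so a seeded random shuffle is assumed, and the helper names `split_train_test` and `accuracy` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    records: Sequence[T], train_fraction: float = 0.6, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the records reproducibly and cut them into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted: Sequence[object], actual: Sequence[object]) -> float:
    """Fraction of test cases where the predicted label equals the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Hypothetical record identifiers standing in for the genotype data.
train, test = split_train_test(list(range(100)))
print(len(train), len(test))
```

Each candidate model would be fitted on `train` only and then compared by `accuracy` on `test`, which is the comparison the abstract reports.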

A Data Mining Approach for Selecting Bitmap Join Indices

  • Bellatreche, Ladjel;Missaoui, Rokia;Necir, Hamid;Drias, Habiba
    • Journal of Computing Science and Engineering, v.1 no.2, pp.177-194, 2007
  • Index selection is one of the most important decisions in the physical design of relational data warehouses. Indices significantly reduce the cost of processing complex OLAP queries, but they incur storage cost and maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash, etc.) and multi-attribute indices (join indices, bitmap join indices). Bitmap join indices are well suited to optimizing star join queries, which are characterized by joins between a large fact table and multiple dimension tables and by selections on the dimension tables. They require less storage owing to their binary representation. However, selecting these indices is a difficult task because of the exponential number of candidate attributes to be indexed. Most approaches to index selection follow two main steps: (1) pruning the search space (i.e., reducing the number of candidate attributes) and (2) selecting indices using the pruned search space. In this paper, we first propose a data mining driven approach to prune the search space of the bitmap join index selection problem. As opposed to an existing technique that uses only the frequency of attributes in queries as a pruning metric, our technique uses not only frequencies but also other parameters such as the size of the dimension tables involved in the indexing process, the size of each dimension tuple, and the page size on disk. We then define a greedy algorithm to select bitmap join indices that minimize processing cost and satisfy the storage constraint. Finally, to evaluate the efficiency of our approach, we compare it with some existing techniques.
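The second step described in this abstract, selecting indices greedily under a storage constraint, can be sketched as follows. The abstract does not give the authors' selection criterion, so this example assumes a simple benefit-per-storage ranking; the `Candidate` fields, the attribute names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    attribute: str         # dimension attribute that could be indexed
    cost_reduction: float  # assumed estimate of query cost saved by the index
    storage: float         # assumed storage the index would occupy


def greedy_select(candidates: Sequence[Candidate], storage_budget: float) -> List[Candidate]:
    """Pick candidates in order of cost reduction per unit of storage,
    skipping any candidate that no longer fits in the remaining budget."""
    if any(c.storage <= 0 for c in candidates):
        raise ValueError("every candidate needs a positive storage estimate")
    ranked = sorted(candidates, key=lambda c: c.cost_reduction / c.storage, reverse=True)
    chosen: List[Candidate] = []
    remaining = storage_budget
    for candidate in ranked:
        if candidate.cost_reduction > 0 and candidate.storage <= remaining:
            chosen.append(candidate)
            remaining -= candidate.storage
    return chosen


# Hypothetical candidates left after the pruning step.
pruned = [
    Candidate("customer.city", cost_reduction=100.0, storage=50.0),
    Candidate("time.month", cost_reduction=90.0, storage=30.0),
    Candidate("product.category", cost_reduction=40.0, storage=40.0),
    Candidate("store.region", cost_reduction=10.0, storage=20.0),
]
for index in greedy_select(pruned, storage_budget=100.0):
    print(index.attribute)
```

A ratio-based greedy pass is fast but not guaranteed to be optimal, which is one reason the pruning step matters: fewer candidates make better selection strategies affordable.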

Keyword Analysis of Arboretums and Botanical Gardens Using Social Big Data

  • Shin, Hyun-Tak;Kim, Sang-Jun;Sung, Jung-Won
    • Journal of People, Plants, and Environment, v.23 no.2, pp.233-243, 2020
  • This study collects social big data, which is used in various fields, from the past 9 years and explains the patterns of major keywords related to arboretums and botanical gardens, to serve as basic data for establishing operational strategies for future arboretums and botanical gardens. A total of 6,245,278 cases of data were collected: 4,250,583 from blogs (68.1%), 1,843,677 from online cafes (29.5%), and 151,018 from knowledge search engines (2.4%). After refining for valid data, 1,223,162 cases were selected for analysis. We used the big data program Textom to derive keywords of arboretums and botanical gardens through text mining analysis. As a result, we identified keywords such as 'travel', 'picnic', 'children', 'festival', 'experience', 'Garden of Morning Calm', 'program', 'recreation forest', 'healing', and 'museum'. The keyword analysis showed that keywords such as 'healing', 'tree', 'experience', 'garden', and 'Garden of Morning Calm' received high public interest. We conducted word cloud analysis by extracting high-frequency keywords from the total of 6,245,278 titles on social media. The results showed that arboretums and botanical gardens were perceived as spaces for relaxation and leisure, with keywords such as 'travel', 'picnic', and 'recreation', and that people had high interest in educational aspects, with keywords such as 'experience' and 'field trip'. The demand for rest and leisure space, education, and things to see and enjoy in arboretums and botanical gardens has increased compared with the past. Therefore, future operation strategies should include differentiation and specialization strategies such as plant collection strategies, exhibition planning, and programs.

Model Development for Specific Degradation Using Data Mining and Geospatial Analysis of Erosion and Sedimentation Features

  • Kang, Woochul;Kang, Joongu;Jang, Eunkyung;Julien, Pierre Y.
    • Proceedings of the Korea Water Resources Association Conference, 2020.06a, pp.85-85, 2020
  • South Korea experiences few large-scale erosion and sedimentation problems; however, there are numerous local sedimentation problems. A reliable and consistent approach to modelling and managing sediment processes is desirable in the country. In this study, field measurements of sediment concentration from 34 alluvial river basins in South Korea were used with the Modified Einstein Procedure (MEP) to determine the total sediment load at the sampling locations. The Flow Duration-Sediment Rating Curve (FD-SRC) method was then used to estimate the specific degradation for all gauging stations. The specific degradation of most rivers was found to be typically 50-300 tons/㎢·yr. A model tree data mining technique was applied to develop a model for the specific degradation based on various characteristics of each watershed obtained from GIS analysis. The meaningful parameters are: 1) elevation at the middle relative area of the hypsometric curve [m], 2) percentage of wetland and water [%], 3) percentage of urbanized area [%], and 4) main stream length [km]. The Root Mean Square Error (RMSE) of existing models is in excess of 1,250 tons/㎢·yr, whereas the RMSE of the proposed model with 6 additional validations decreased to 65 tons/㎢·yr. Erosion loss maps from the Revised Universal Soil Loss Equation (RUSLE), satellite images, and aerial photographs were used to delineate the geospatial features affecting erosion and sedimentation. The results of the geospatial analysis clearly show the high-risk erosion areas (hill slopes and construction sites in urbanized areas) and sedimentation features (wetlands and agricultural reservoirs). The results of the physiographical analysis also indicate that the watershed morphometric characteristics explain sediment transport well. Sustainable management using data mining methodologies and geospatial analysis could help solve various erosion and sedimentation problems under different conditions.
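For readers unfamiliar with the error measure quoted in this abstract, RMSE is the square root of the mean squared difference between observed and predicted values, and it carries the same unit as the quantity itself (here tons/㎢·yr). The short sketch below computes it; the two sample values are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical specific degradation values in tons/km2/yr, for illustration only.
observed = [120.0, 250.0]
predicted = [125.0, 245.0]
print(rmse(observed, predicted))
```

Because the errors are squared before averaging, a few stations with very large misses can dominate the score, which is worth keeping in mind when comparing models across watersheds of very different size.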

Copper Tolerance of Novel Rhodotorula sp. Yeast Isolated from Gold Mining Ore in Gia Lai, Vietnam

  • Kim Cuc Thi Nguyen;Phuc Hung Truong;Cuong Tu Ho;Cong Tuan Le;Khoa Dang Tran;Tien Long Nguyen;Manh Tuan Nguyen;Phu Van Nguyen
    • Mycobiology, v.51 no.6, pp.379-387, 2023
  • In this study, twenty-five yeast strains were isolated from soil samples collected from gold mining ore in Gia Lai, Vietnam. Among them, one isolate named GL1T could tolerate Cu2+ at concentrations up to 10 mM, and the isolate could also grow over a wide range of pH (3-7) and temperature (10-40 ℃). Dried biomass of GL1 removed Cu2+ effectively, by up to 90.49%, with a maximal biosorption capacity of 18.1 mg/g at pH 6, a temperature of 30 ℃, and an incubation time of 60 min. Sequence analysis of rDNA indicated that this strain was closely related to Rhodotorula mucilaginosa but with 1.53 and 3.46% nucleotide differences in the D1/D2 domain of the 28S rRNA gene and the ITS1-5.8S rRNA gene-ITS2 region sequence, respectively. Based on phylogenetic tree analysis and the biochemical characteristics, the strain appears to be a novel Rhodotorula species, and the name Rhodotorula aurum sp. nov. is proposed. This study provides more information about heavy metal-tolerant yeasts and may yield a new tool for environmental control and metal recovery operations.

A Recommending System for Care Plan(Res-CP) in Long-Term Care Insurance System (데이터마이닝 기법을 활용한 노인장기요양급여 권고모형 개발)

  • Han, Eun-Jeong;Lee, Jung-Suk;Kim, Dong-Geon;Ka, Im-Ok
    • The Korean Journal of Applied Statistics, v.22 no.6, pp.1229-1237, 2009
  • In the long-term care insurance (LTCI) system, the question of how to provide the most appropriate care has become a major issue for the elderly, their families, and policy makers. To help beneficiaries use LTC services in line with their care needs, the National Health Insurance Corporation (NHIC) provides them with an individualized care plan, named the Long-term Care User Guide. It includes recommendations on the most appropriate type of care for each beneficiary. The purpose of this study is to develop a recommending system for care plans (Res-CP) in the LTCI system. We used the data set for the Long-term Care User Guide from the 3rd long-term care insurance pilot programs. To develop the model, we tested four models, including a decision-tree model in data mining, a logistic regression model, and boosting techniques in an ensemble model. A decision-tree model was selected to describe the Res-CP because its algorithm may be easy to explain to the working groups. Res-CP might be useful for evidence-based care planning in the LTCI system and may help support the efficient use of LTC services.

Convergence analysis of determinants affecting on geographic variations in the prevalence of arthritis in Korean women using data mining (데이터마이닝을 이용한 여성 관절염 유병률 소지역 간 변이의 융복합 요인분석)

  • Kim, Yoo-Mi;Kang, Sung-Hong
    • Journal of Digital Convergence, v.13 no.5, pp.277-288, 2015
  • This study aims to identify determinants of geographic variation in the prevalence of arthritis in Korean women using data mining. Data from the Korean Community Health Survey 2012, covering 249 small districts, were analyzed. Socio-demographic, health behavior and status, and morbidity status measures were analyzed using a conventional regression model and a convergence analysis method such as a decision tree. The determinants of variation in arthritis prevalence across the 249 small districts were the rates of workers in agriculture, forestry, and fishing; salaried workers; persons with more than a high school education; persons who needed care but did not receive treatment; persons who did not receive treatment for economic reasons; obesity; heavy drinkers; persons complaining of chewing difficulty; persons who had experienced depression; persons perceiving stress; and persons diagnosed with hypertension and angina pectoris. These districts were classified into 10 area groups by the decision tree model. Our findings suggest that an approach based on the characteristics of small-area groups, rather than one at the nationwide or individual level, would be effective in reducing variation in the prevalence of arthritis.

Characterizing Patterns of Experience of Harmful Shops among Adolescents Using Decision Tree Models (데이터마이닝을 이용한 청소년 유해업소 출입경험에 영향을 주는 요인)

  • Sohn, Aeree
    • Korean Journal of Health Education and Promotion, v.31 no.3, pp.15-26, 2014
  • Objective: This study was conducted to explore a predictive model of the experience of harmful shops among middle and high school students. Methods: The survey was conducted online using a self-administered questionnaire via the homepage of the education ministry's student health information center. Participants were 1,888 middle school students and 1,563 high school students from 107 schools in Korea. The collected data were processed using the SPSS classification trees 18.0 program and examined using a data mining decision tree model. Results: In this study, 6.9% of all subjects were found to have been to harmful sex-industry places and 81.8% to game places. The results revealed that smoking, living with parents, and school grade were significant predictors of experience of harmful sex-industry places. Perceived disruption of study, drinking, living with parents, stress, and satisfaction with school life were significant predictors of experience of harmful game places. Conclusions: These results suggest that an educational approach tailored to these conditions should be developed to prevent access to harmful shops.

Analysis on the Usage of Internet Games for Children with Decision Tree Rules (의사결정규칙을 이용한 아동의 교육용 인터넷 게임 활용실태 분석)

  • Kim, Yong-Dae;Jung, Hui-Suk;Choi, Eun-Jeong;Park, Byung-Sun;Han, Jeong-Hye
    • Journal of The Korean Association of Information Education, v.5 no.3, pp.389-400, 2001
  • Internet games have spread quickly on the web, and there are many kinds of fun games that users can play easily, so they can be applied to ICT (Information Communication Technology) education. In this paper, we analyze the usage of Internet games by children and teachers using the decision tree algorithm, which is one of the popular data mining techniques. The results show the patterns of children's and teachers' usage of Internet games.
