• Title/Summary/Keyword: Tree mining

A performance improvement methodology of web document clustering using FDC-TCT (FDC-TCT를 이용한 웹 문서 클러스터링 성능 개선 기법)

  • Ko, Suc-Bum;Youn, Sung-Dae
    • The KIPS Transactions:PartD / v.12D no.4 s.100 / pp.637-646 / 2005
  • Applying classification or clustering algorithms to the documents returned by a keyword web search, which need post-processing or classification, raises several problems. Two of them are severe. The first is that categorizing the documents requires the help of an expert. The second is the long processing time that document classification takes. We therefore propose a new web document clustering method that uses the Transitive Closure Tree (TCT) to dramatically reduce the number of document similarity calculations, speeding up processing without losing precision. We also compare the effectiveness of the proposed method with that of existing algorithms and present the experimental results.
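
The idea of using transitive closure to avoid similarity calculations can be illustrated with a minimal sketch. This is not the authors' FDC-TCT algorithm, whose details the abstract does not give. It assumes a bag-of-words cosine similarity, a hypothetical threshold of 0.5, and a union-find structure standing in for the transitive closure: if two documents are already linked through a third, their pairwise similarity is never computed.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.5):
    """Group documents; return (cluster label per document, similarity calls made).

    The threshold and the whitespace tokenizer are illustrative assumptions.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    parent = list(range(len(docs)))  # union-find forest, one node per document

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    calls = 0
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if find(i) == find(j):
                continue  # already linked transitively, so skip the comparison
            calls += 1
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two clusters
    return [find(i) for i in range(len(docs))], calls


docs = [
    "tree mining algorithm",
    "tree mining method",
    "mining tree algorithm method",
    "soil carbon stock",
    "soil carbon map",
]
labels, calls = cluster(docs)
print(labels, calls)
```

With five documents there are ten possible pairs; the sketch skips any pair whose members already share a cluster, and the saving grows as clusters get larger.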

Fault Detection, Diagnosis, and Optimization of Wafer Manufacturing Processes utilizing Knowledge Creation

  • Bae Hyeon;Kim Sung-Shin;Woo Kwang-Bang;May Gary S.;Lee Duk-Kwon
    • International Journal of Control, Automation, and Systems / v.4 no.3 / pp.372-381 / 2006
  • The purpose of this study was to develop a process management system to manage ingot fabrication and improve ingot quality. The ingot is the first manufactured material of wafers. Trace parameters were collected on-line but measurement parameters were measured by sampling inspection. The quality parameters were applied to evaluate the quality. Therefore, preprocessing was necessary to extract useful information from the quality data. First, statistical methods were used for data generation. Then, modeling was performed, using the generated data, to improve the performance of the models. The function of the models is to predict the quality corresponding to control parameters. Secondly, rule extraction was performed to find the relation between the production quality and control conditions. The extracted rules can give important information concerning how to handle the process correctly. The dynamic polynomial neural network (DPNN) and decision tree were applied for data modeling and rule extraction, respectively, from the ingot fabrication data.

A Study on Approximation Model for Optimal Predicting Model of Industrial Accidents (산업재해의 최적 예측모형을 위한 근사모형에 관한 연구)

  • Leem, Young-Moon;Ryu, Chang-Hyun
    • Journal of the Korea Safety Management & Science / v.8 no.3 / pp.1-9 / 2006
  • Recently, data mining techniques have been used for the analysis and classification of data related to industrial accidents. The main objective of this study is to compare algorithms for analyzing industrial accident data. The paper identifies an optimal predictive model among five algorithms, CHAID, CART, C4.5, LR (logistic regression) and NN (neural network), using ROC charts, lift charts and response thresholds. It also provides an approximation model for the optimal predictive model based on NN, which can be used for easier interpretation of NN-based data analysis. The study uses ten selected independent variables to group injured people according to a dependent variable in a way that reduces variation. To find the optimal predictive model among the five algorithms, a retrospective analysis was performed on 67,278 subjects. The sample was drawn from data on industrial accidents in Korea over three years (2002-2004). According to the analysis, NN has excellent performance for the analysis and classification of industrial accident data.
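
A lift chart compares the response rate among the highest-scored cases with the overall response rate. The sketch below shows that calculation for a single cut-off, using the common definition of lift; the function name `lift_at`, the scores, and the labels are hypothetical and are not taken from the paper.

```python
def lift_at(scores, labels, fraction=0.1):
    """Lift of the top `fraction` of cases ranked by predicted score.

    lift = (positive rate in the top slice) / (positive rate overall)
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    # Rank cases from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / base_rate


# Hypothetical model scores and true outcomes (1 = event occurred).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, labels, fraction=0.2))
```

A lift above 1 at a given cut-off means the model concentrates positive cases in its top-ranked slice better than random selection would.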

An Application of gCRM Using Customer Information (고객정보를 이용한 gCRM의 활용)

  • Lee Sun-Soon;Lee Hong-Seok;Lee Joong-Hwan;Kim Sung-Soo
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.567-581 / 2005
  • Geographical Customer Relationship Management (gCRM) is an integrated solution of Geographic Information System (GIS) and Customer Relationship Management (CRM). In gCRM, GIS is used to show multi-dimensional analytical results of customer information geographically. When customer information is presented geographically, more valuable information emerges. In this research we briefly introduce gCRM and show real examples of customer segmentation applied to a company.

Sentiment Analysis using Latent Structural SVM (잠재 구조적 SVM을 활용한 감성 분석기)

  • Yang, Seung-Won;Lee, Changki
    • KIISE Transactions on Computing Practices / v.22 no.5 / pp.240-245 / 2016
  • In this study, comments on restaurants, movies, and mobile devices, as well as tweet messages regardless of specific domains were analyzed for sentimental information content. We proposed a system for extraction of objects (or aspects) and opinion words from each sentence and the subsequent evaluation. For the sentiment analysis, we conducted a comparative evaluation between the Structural SVM algorithm and the Latent Structural SVM. As a result, the latter showed better performance and was able to extract objects/aspects and opinion words using VP/NP analyzed by the dependency parser tree. Lastly, we also developed and evaluated the sentiment detector model for use in practical services.

Effect of Mycorrhizal Treatment on Growth of Acacia spp. on Sandy BRIS Soils in Peninsular Malaysia

  • Lee, Su See;Mansor, Patahayah;Koter, Rosdi;Lee, Don Koo
    • Journal of Korean Society of Forest Science / v.95 no.5 / pp.516-523 / 2006
  • Marginal soils such as BRIS (Beach Ridges Interspersed with Swales) soils and ex-tin mining land make up approximately 0.5 million ha or about 2% of Malaysia's land area. In the coastal areas of the east coast of Peninsular Malaysia impoverished sandy BRIS dominates the landscape with most lying idle as there is no national management plan for their utilization. A field study was carried out to see whether mycorrhizal application had any effect on the growth of three exotic Acacia spp., i.e. Acacia auriculiformis, A. mangium and Acacia hybrid (A. auriculiformis × A. mangium) on BRIS soils. Two types of mycorrhizal inoculum, namely, a commercially available arbuscular mycorrhizal inoculum marketed as MycoGold™ and an indigenous ectomycorrhizal Tomentella sp. inoculum were tested. In the initial six months, height growth of all three tree species inoculated with the arbuscular mycorrhizal inoculum was significantly improved compared to the ectomycorrhizal inoculated and uninoculated control plants. The mycorrhizal effect was not evident thereafter and repeated application of the arbuscular mycorrhizal inoculum may be necessary for continued growth enhancement. Of the three species, A. mangium had the highest relative height growth rate over the 24 months on BRIS soils.

Machine Learning Approach to Classifying Fatal and Non-Fatal Accidents in Industries (사망사고와 부상사고의 산업재해분류를 위한 기계학습 접근법)

  • Kang, Sungsik;Chang, Seong Rok;Suh, Yongyoon
    • Journal of the Korean Society of Safety / v.36 no.5 / pp.52-60 / 2021
  • As the prevention of fatal accidents is considered an essential part of social responsibility, both governments and individuals have devoted efforts to mitigating the unsafe conditions and behaviors that facilitate accidents. Several studies have analyzed the factors that cause fatal accidents and compared them to those of non-fatal accidents. However, studies on mathematical and systematic analysis techniques for identifying the features of fatal accidents are rare. Recently, various industrial fields have employed machine learning algorithms. This study aimed to apply machine learning algorithms to the classification of fatal and non-fatal accidents based on the features of each accident. These features were obtained by text mining literature on accidents. The classification was performed using four machine learning algorithms that are widely used in industrial fields: logistic regression, decision tree, neural network, and support vector machine. The results revealed that the machine learning algorithms achieved high accuracy in classifying accidents into the two categories. In addition, the importance of comparing similar cases between fatal and non-fatal accidents was discussed. This study presented a method for classifying accidents using machine learning algorithms based on reports from previous studies on accidents.

Digital mapping of soil carbon stock in Jeolla province using cubist model

  • Park, Seong-Jin;Lee, Chul-Woo;Kim, Seong-Heon;Oh, Taek-Keun
    • Korean Journal of Agricultural Science / v.47 no.4 / pp.1097-1107 / 2020
  • Assessment of soil carbon stock is essential for climate change mitigation and soil fertility. Digital soil mapping (DSM) is well known as a general technique for estimating soil carbon stocks and upgrading previous soil maps. The aim of this study is to calculate the soil carbon stock in the top soil layer (0 to 30 cm) in Jeolla Province of South Korea using the DSM technique. To predict spatial carbon stock, we used Cubist, which is a data-mining algorithm based on tree regression. Soil samples (130 in total) were collected from three depths (0 to 10 cm, 10 to 20 cm, 20 to 30 cm) considering spatial distribution in Jeolla Province. These data were randomly divided into two sets for model calibration (70%) and validation (30%). The results showed that clay content, topographic wetness index (TWI), and digital elevation model (DEM) were the most important environmental covariate predictors of soil carbon stock. The predicted average soil carbon density was 3.88 kg·m⁻². The R² value representing the model's performance was 0.6, which was relatively high compared to a previous study. The total soil carbon stocks at a depth of 0 to 30 cm in Jeolla Province were estimated to be about 81 megatons.
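
The 70/30 calibration and validation split and the R² score mentioned in this abstract can be sketched as follows. The abstract does not say how the random split was drawn or which R² definition was used, so the seed, the helper names `split_indices` and `r_squared`, and the coefficient-of-determination formula 1 − SS_res/SS_tot are assumptions.

```python
import random


def split_indices(n, calibration_fraction=0.7, seed=0):
    """Randomly split sample indices 0..n-1 into calibration and validation sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    cut = round(n * calibration_fraction)
    return indices[:cut], indices[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# 130 samples, as in the abstract, give 91 for calibration and 39 for validation.
calibration, validation = split_indices(130)
print(len(calibration), len(validation))
```

A model would be fitted on the calibration indices only, and `r_squared` would then be evaluated on predictions for the validation indices.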

Risk Factors for Sarcopenia, Sarcopenic Obesity, and Sarcopenia Without Obesity in Older Adults

  • Kim, Seo-hyun;Yi, Chung-hwi;Lim, Jin-seok
    • Physical Therapy Korea / v.28 no.3 / pp.177-185 / 2021
  • Background: Muscle changes continuously with aging. Sarcopenia, in which muscle mass decreases with aging, is associated with various diseases, the risk of falling, and the deterioration of quality of life. Obesity and sarcopenia also have a synergistic effect on disease in older adults. Objectives: This study examined the risk factors for sarcopenia, sarcopenic obesity, and sarcopenia without obesity and developed prediction models. Methods: This machine-learning study used the 2008-2011 Korea National Health and Nutrition Examination Surveys in the analysis. After data curation, 5,563 older participants were selected, of whom 1,169 had sarcopenia (538 with sarcopenic obesity and 631 with sarcopenia without obesity); the remaining 4,394 were normal. Decision tree and random forest models were used to identify risk factors. Results: The risk factors for sarcopenia chosen by both methods were body mass index (BMI) and duration of moderate physical activity; those for sarcopenic obesity were sex, BMI, and duration of moderate physical activity; and those for sarcopenia without obesity were BMI and sex. The areas under the receiver operating characteristic curves of all prediction models exceeded 0.75. BMI could predict sarcopenia-related disease. Conclusion: Risk factors for sarcopenia-related diseases should be identified and programs for sarcopenia-related disease prevention should be developed. Data-mining research using population data should be conducted to enhance the effectiveness of early treatment for people with sarcopenia-related diseases through predictive models.
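
The area under the receiver operating characteristic curve reported here can be computed directly from predicted scores. The sketch below uses the rank-based definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example data are hypothetical and unrelated to the study's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks and true outcomes (1 = condition present).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; a sort-based variant would be preferable for survey-sized data.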

Modelling the deflection of reinforced concrete beams using the improved artificial neural network by imperialist competitive optimization

  • Li, Ning;Asteris, Panagiotis G.;Tran, Trung-Tin;Pradhan, Biswajeet;Nguyen, Hoang
    • Steel and Composite Structures / v.42 no.6 / pp.733-745 / 2022
  • This study proposed a robust artificial intelligence (AI) model based on the social behaviour of the imperialist competitive algorithm (ICA) and an artificial neural network (ANN) for modelling the deflection of reinforced concrete beams, abbreviated as the ICA-ANN model. Accordingly, the ICA was used to adjust and optimize the parameters of an ANN model (i.e., weights and biases), aiming to improve the accuracy of the ANN model in modelling the deflection of reinforced concrete beams. A total of 120 experimental datasets of reinforced concrete beams were employed for this aim. Therein, applied load, tensile reinforcement strength and the reinforcement percentage were used to simulate the deflection of reinforced concrete beams. In addition, five other AI models, namely ANN, SVM (support vector machine), GLMNET (lasso and elastic-net regularized generalized linear models), CART (classification and regression tree) and KNN (k-nearest neighbours), were used for a comprehensive assessment of the proposed model (i.e., ICA-ANN). The comparison of the derived results with the experimental findings demonstrates that, among the developed models, the ICA-ANN model approximates the deflection of reinforced concrete beams in the most reliable and robust manner.