• Title/Summary/Keyword: tree-based models

Wage Determinants Analysis by Quantile Regression Tree

  • Chang, Young-Jae
    • Communications for Statistical Applications and Methods / v.19 no.2 / pp.293-301 / 2012
  • Quantile regression, proposed by Koenker and Bassett (1978), is a statistical technique that estimates conditional quantiles. Its advantage over ordinary least squares (OLS) regression is robustness to large outliers. Regression tree approaches have been applied to OLS problems to fit flexible models. Loh (2002) proposed the GUIDE algorithm, which has negligible selection bias and relatively low computational cost. Because quantile regression can be regarded as an analogue of OLS, it can also be incorporated into the GUIDE regression tree method. Chaudhuri and Loh (2002) proposed a nonparametric quantile regression method that blends key features of piecewise polynomial quantile regression and tree-structured regression based on adaptive recursive partitioning. Lee and Lee (2006) investigated wage determinants in the Korean labor market using the Korean Labor and Income Panel Study (KLIPS). Following Lee and Lee, we fit three kinds of quantile regression tree models to KLIPS data at the quantiles 0.05, 0.2, 0.5, 0.8, and 0.95. Among the three models, the piecewise multiple linear quantile regression model forms the shortest tree structure, while the piecewise constant quantile regression model generally has a deeper tree structure with more terminal nodes. Age, gender, marital status, and education appear to be determinants of the wage level across all quantiles; in addition, education experience emerges as an important determinant of the wage level in the highly paid group.
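
In a piecewise constant quantile regression tree, each node predicts its sample quantile, and a split is judged by the summed check (pinball) loss of the two children. The sketch below illustrates one split on one numeric predictor, assuming an exhaustive threshold search; it is not GUIDE's unbiased variable-selection step, and the function names and `min_leaf` parameter are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def check_loss(residual: float, tau: float) -> float:
    """Koenker-Bassett check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def sample_quantile(values: List[float], tau: float) -> float:
    """Return the ceil(n * tau)-th order statistic, a minimizer of the summed check loss."""
    ordered = sorted(values)
    k = min(len(ordered), max(1, math.ceil(len(ordered) * tau)))
    return ordered[k - 1]


def node_loss(values: List[float], tau: float) -> float:
    """Summed check loss when a node predicts its own sample tau-quantile."""
    q = sample_quantile(values, tau)
    return sum(check_loss(v - q, tau) for v in values)


def best_split(x: List[float], y: List[float], tau: float,
               min_leaf: int = 2) -> Optional[Tuple[float, float]]:
    """Search thresholds on one predictor for a piecewise constant quantile split.

    Returns (threshold, total child loss), or None if no admissible split exists.
    """
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best: Optional[Tuple[float, float]] = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # tied predictor values cannot be separated
        loss = node_loss(ys[:i], tau) + node_loss(ys[i:], tau)
        if best is None or loss < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2.0, loss)
    return best
```

Calling `best_split(x, wage, 0.95)` would target the highly paid group. The piecewise linear variants would instead fit a linear quantile regression inside each node.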

Safety Analysis on the Tritium Release Accidents

  • Yang, Hee joong
    • Journal of Korean Society for Quality Management / v.19 no.2 / pp.96-107 / 1991
  • At the design stage of a plant, the plausible causes and pathways of release of hazardous materials are not clearly known. Thus, there is a large amount of uncertainty about the consequences of operating a fusion plant. To better handle such uncertain circumstances, we apply Probabilistic Risk Assessment (PRA) to the safety analysis of a fusion power plant. In this paper, we concentrate on the tritium release accident. We develop a simple model that describes the process and flow of tritium, from which we identify the locations of the tritium inventory and their vulnerability. We construct event tree models that lead from abnormal initiating events to various levels of tritium release. Branch parameters on the event tree are assessed by fault tree analysis. Based on the event tree models, we construct influence diagram models, which are more useful for parameter updating and analysis. We briefly discuss the parameter updating scheme and finally develop a methodology to obtain the predictive distribution of the consequences of operating a fusion power plant. We also discuss how to use the results of sub-system tests to reduce the uncertainties in the overall system.
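
The link between the fault trees and the event tree can be illustrated as follows: fault-tree gates give each branch's failure probability, and each event-tree path frequency is the initiating-event frequency times the branch outcomes along that path. This is a minimal sketch assuming independent basic events and independent branches; the paper's influence-diagram updating is not shown, and all names and numbers are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Tuple


def or_gate(probs: List[float]) -> float:
    """Top-event probability of an OR gate with independent basic events."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def and_gate(probs: List[float]) -> float:
    """Top-event probability of an AND gate with independent basic events."""
    return math.prod(probs)


def sequence_frequencies(initiator_freq: float,
                         branch_fail_probs: List[float]) -> Dict[Tuple[bool, ...], float]:
    """Frequency of every event-tree path.

    True at position k means safety function k failed on that path.
    Branches are treated as independent given the initiating event.
    """
    paths: Dict[Tuple[bool, ...], float] = {}
    for path in product((False, True), repeat=len(branch_fail_probs)):
        freq = initiator_freq
        for failed, q in zip(path, branch_fail_probs):
            freq *= q if failed else 1.0 - q
        paths[path] = freq
    return paths


if __name__ == "__main__":
    # Hypothetical: barrier 1 fails if either component fails (OR);
    # barrier 2 fails only if both redundant trains fail (AND).
    q1 = or_gate([1e-3, 2e-3])
    q2 = and_gate([1e-2, 1e-2])
    for path, freq in sequence_frequencies(0.1, [q1, q2]).items():
        print(path, freq)
```

Each path would then be mapped to a tritium release level; the path frequencies always sum to the initiating-event frequency.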

Selection of Important Variables in the Classification Model for Successful Flight Training (조종사 비행훈련 성패예측모형 구축을 위한 중요변수 선정)

  • Lee, Sang-Heon;Lee, Sun-Doo
    • IE interfaces / v.20 no.1 / pp.41-48 / 2007
  • The main purpose of this paper is to reduce wasted pilot training costs and to prevent human accidents caused by errors in the pilot selection process. We use classification models, namely logistic regression, decision trees, and neural networks, based on the aptitude test results of 505 ROK Air Force applicants from 2001 to 2004. First, we assess the reliability and validity of the improved aptitude test system. Based on this, we compare models built with and without the simulated flight test item added to the new aptitude test items, evaluating them in terms of classification accuracy, ROC, and response threshold. The decision tree was selected as the most efficient model for each stage of flight training, and its prediction of the final flight training results was excellent. Therefore, we propose adopting the decision tree as the standard for pilot selection and adding the simulated flight test to the new aptitude test items.
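
Comparing models by ROC and by response threshold can be made concrete with two small functions: the ROC area computed as the Mann-Whitney probability that a passing trainee outscores a failing one, and accuracy at a chosen cutoff. This is a sketch assuming scores from any of the three models and labels coded 1 for completing training (a hypothetical encoding).

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a random positive scores higher than a
    random negative, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at_threshold(scores: Sequence[float], labels: Sequence[int],
                          threshold: float) -> float:
    """Accuracy when predicting 'pass' (1) for every score >= threshold."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)
```

The AUC is threshold-free, while `accuracy_at_threshold` shows how a selection cutoff trades off rejected good candidates against accepted failures.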

Forecasting Corporate Bankruptcy with Artificial Intelligence (인공지능기법을 이용한 기업부도 예측)

  • Oh, Woo-Seok;Kim, Jin-Hwa
    • Journal of Industrial Convergence / v.15 no.1 / pp.17-32 / 2017
  • The purpose of this study is to evaluate financial models that can predict corporate bankruptcy, drawing on diverse prior studies of evaluation models. The study uses discriminant analysis, a logistic model, decision trees, and neural networks as analysis tools, with 18 major financial factors as input variables. The study found meaningful variables such as the current ratio, return on investment, ordinary income to total assets, total debt turnover, interest expenses to sales, and net working capital to total assets. It also found that the prediction performance of the suggested method is somewhat lower than that reported in the literature. This is because past studies used data on listed companies or externally audited companies, whereas this study uses data on companies whose credibility has not been sufficiently verified. Another finding is that the models based on decision tree analysis and discriminant analysis showed the highest performance among the bankruptcy forecasting models compared.
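
Several of the significant inputs are simple ratios of statement items. The sketch below computes five of them using common textbook definitions; the paper's exact formulas are not given, the field names are hypothetical, and return on investment is left out because its definition varies.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FinancialStatement:
    """Hypothetical field names for one firm-year of statement data."""
    current_assets: float
    current_liabilities: float
    total_assets: float
    total_debt: float
    sales: float
    interest_expenses: float
    ordinary_income: float


def ratio_features(fs: FinancialStatement) -> Dict[str, float]:
    """Ratio inputs for a bankruptcy classifier (textbook definitions, assumed)."""
    return {
        "current_ratio": fs.current_assets / fs.current_liabilities,
        "ordinary_income_to_total_assets": fs.ordinary_income / fs.total_assets,
        "total_debt_turnover": fs.sales / fs.total_debt,
        "interest_expenses_to_sales": fs.interest_expenses / fs.sales,
        "net_working_capital_to_total_assets":
            (fs.current_assets - fs.current_liabilities) / fs.total_assets,
    }
```

Each firm's dictionary would form one row of the input matrix fed to the discriminant, logistic, tree, or neural network models.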

Decision Tree-Based Feature-Selective Neural Network Model: Case of House Price Estimation (의사결정나무를 활용한 신경망 모형의 입력특성 선택: 주택가격 추정 사례)

  • Yoon Han-Seong
    • Journal of Korea Society of Digital Industry and Information Management / v.19 no.1 / pp.109-118 / 2023
  • Data-based analysis methods are increasingly used to estimate or predict housing prices, and neural network models and decision trees from the field of big data are also increasingly widely used. Neural network models are often judged superior to existing statistical models in terms of estimation or prediction accuracy. However, determining the input features of a neural network's input layer, that is, the type and number of input features, is ambiguous, and decision trees are sometimes used to overcome this drawback. In this paper, we evaluate existing methods of using decision trees and propose a method that uses decision trees to prioritize input features for selection in neural network models. This can serve as a complementary or combined analysis method of the neural network model and the decision tree, and its validity was confirmed by applying the proposed method to house price estimation. Several comparisons show that selecting appropriate input features according to priority can increase the estimation power of the model.
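
One simple way a tree can prioritize inputs is to rank each feature by how much a single regression-tree split on it reduces the squared error of the price, then keep the top `k` as network inputs. This is a hypothetical stand-in for a full tree's importance scores, not the author's procedure; feature names and data are illustrative only.

```python
from typing import Dict, List, Sequence


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (regression-tree node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_gain(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest SSE reduction achievable by one binary split on feature x."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    total = sse(ys)
    best = 0.0
    for i in range(1, len(pairs)):
        if xs[i - 1] == xs[i]:
            continue  # no threshold separates equal feature values
        best = max(best, total - sse(ys[:i]) - sse(ys[i:]))
    return best


def select_inputs(features: Dict[str, List[float]], price: List[float],
                  k: int) -> List[str]:
    """Rank features by split gain and keep the top k as neural-network inputs."""
    gains = {name: best_split_gain(col, price) for name, col in features.items()}
    ranked = sorted(gains, key=lambda name: gains[name], reverse=True)
    return ranked[:k]
```

Varying `k` and retraining the network on each prefix of the ranking is one way to check whether priority-ordered inputs improve estimation power.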

CAD Scheme To Detect Brain Tumour In MR Images using Active Contour Models and Tree Classifiers

  • Helen, R.;Kamaraj, N.
    • Journal of Electrical Engineering and Technology / v.10 no.2 / pp.670-675 / 2015
  • Medical imaging is one of the most powerful tools for gaining information about internal organs and tissues. Developing sophisticated image analysis methods that improve diagnostic accuracy is a challenging task. The objective of this paper is to develop a Computer Aided Diagnostics (CAD) scheme for brain tumour detection from Magnetic Resonance Images (MRI) using active contour models, and to investigate several approaches for improving CAD performance. The clinical problem is to detect brain tumours automatically, with maximum accuracy and in less time. This work involves the following steps: i) segmentation by Fuzzy Clustering with Level Set Method (FCMLSM), whose performance is compared with snake models based on Balloon force and Gradient Vector Flow (GVF), and with Distance Regularized Level Set Evolution (DRLSE); ii) feature extraction using shape- and texture-based features; and iii) brain tumour detection by various tree classifiers. The investigation shows that FCMLSM is a well-suited segmentation method and that Random Forest is the best classifier for this problem. The method achieves an accuracy of 97% with minimal classification error. Detecting a tumour takes approximately 2 minutes per examination (30 slices).
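
The fuzzy clustering half of FCMLSM assigns each pixel a soft membership in every intensity class. Below is a minimal sketch of standard fuzzy c-means on scalar intensities, assuming fuzzifier `m = 2` and an evenly spaced initialization; the level-set evolution that the paper couples to it is not shown, and the parameter names are hypothetical.

```python
from typing import List, Tuple


def fuzzy_c_means_1d(data: List[float], c: int = 2, m: float = 2.0,
                     max_iter: int = 100, tol: float = 1e-6
                     ) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy c-means on scalar intensities; returns (centers, memberships)."""
    lo, hi = min(data), max(data)
    # Deterministic initialization: centers spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    exponent = 2.0 / (m - 1.0)
    memberships: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships = []
        for x in data:
            dist = [abs(x - v) for v in centers]
            zeros = [j for j, d in enumerate(dist) if d == 0.0]
            if zeros:
                # A pixel exactly on a center belongs to it (shared if centers tie).
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((dist[j] / dist[k]) ** exponent for k in range(c))
                       for j in range(c)]
            memberships.append(row)
        # Center update: mean of intensities weighted by u_ij ** m.
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(sum(w * x for w, x in zip(weights, data)) / total
                               if total > 0 else centers[j])
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

Taking the class with the largest membership per pixel gives a rough tumour mask that a level set could then refine into a smooth contour.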

Improved Decision Tree-Based State Tying In Continuous Speech Recognition System (연속 음성 인식 시스템을 위한 향상된 결정 트리 기반 상태 공유)

  • ;Xintian Wu;Chaojun Liu
    • The Journal of the Acoustical Society of Korea / v.18 no.6 / pp.49-56 / 1999
  • In many continuous speech recognition systems based on HMMs, decision tree-based state tying has been used not only to improve the robustness and accuracy of context-dependent acoustic modeling but also to synthesize unseen models. To construct the phonetic decision tree, the standard method performs one-level pruning using only single-Gaussian triphone models. In this paper, two novel approaches, the two-level decision tree and the multi-mixture decision tree, are proposed to obtain better performance through more accurate acoustic modeling. The two-level decision tree performs two levels of pruning, one for state tying and one for mixture weight tying. Using the second level, the tied states can have different mixture weights based on the similarities of their phonetic contexts. In the second approach, the phonetic decision tree continues to be updated throughout the training sequence of mixture splitting and re-estimation. Multi-mixture Gaussian as well as single-Gaussian models are used to construct the multi-mixture decision tree. Continuous speech recognition experiments using these approaches on BN-96 and WSJ5k data showed a reduction in word error rate compared to the standard decision tree-based system with a similar number of tied states.
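
The single-Gaussian baseline picks, at each node, the phonetic question whose yes/no split most increases the data log-likelihood. Under a maximum-likelihood diagonal Gaussian, the log-likelihood of $n$ frames is $-\tfrac{n}{2}\sum_k(\log 2\pi\sigma_k^2 + 1)$. The sketch below assumes raw frames per context (real systems use accumulated occupancy statistics) and hypothetical context and question names; the paper's two-level and multi-mixture extensions are not shown.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Frames = List[List[float]]


def gaussian_loglik(frames: Frames, var_floor: float = 1e-6) -> float:
    """Log-likelihood of frames under one ML-estimated diagonal Gaussian."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for k in range(dim):
        vals = [f[k] for f in frames]
        mean = sum(vals) / n
        var = max(sum((v - mean) ** 2 for v in vals) / n, var_floor)
        total += -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return total


def best_question(states: Dict[str, Frames],
                  questions: Dict[str, Set[str]]) -> Optional[Tuple[str, float]]:
    """Return (question, log-likelihood gain) for the best yes/no split of a node."""
    pooled = [f for frames in states.values() for f in frames]
    parent = gaussian_loglik(pooled)
    best: Optional[Tuple[str, float]] = None
    for name, contexts in questions.items():
        yes = [f for ctx, fr in states.items() if ctx in contexts for f in fr]
        no = [f for ctx, fr in states.items() if ctx not in contexts for f in fr]
        if not yes or not no:
            continue  # this question does not split the node
        gain = gaussian_loglik(yes) + gaussian_loglik(no) - parent
        if best is None or gain > best[1]:
            best = (name, gain)
    return best
```

Growing stops when the best gain falls below a threshold, which is the usual way the number of tied states is controlled.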

Effective Acoustic Model Clustering via Decision Tree with Supervised Decision Tree Learning

  • Park, Jun-Ho;Ko, Han-Seok
    • Speech Sciences / v.10 no.1 / pp.71-84 / 2003
  • In acoustic modeling for large-vocabulary speech recognition, the sparse data problem caused by the huge number of context-dependent (CD) models usually makes the estimated models unreliable. In this paper, we develop a new clustering method based on the C4.5 decision-tree learning algorithm that effectively encapsulates CD modeling. The proposed scheme essentially constructs a supervised decision rule and applies it to pre-clustered triphones using the C4.5 algorithm, which is known to search effectively through the attributes of the training instances and extract the attribute that best separates the given examples. In particular, a data-driven method is used as the clustering algorithm, and its result is used as the learning target of the C4.5 algorithm. This scheme has been shown to be effective in terms of recognition performance, particularly on databases with a low ratio of unknown contexts. For a speaker-independent, task-independent continuous speech recognition task, the proposed method reduced the word error rate (WER) by 3.93% compared to existing rule-based methods.
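
C4.5 chooses the attribute with the highest gain ratio, information gain divided by split information. In this setting the targets would be the data-driven cluster IDs of the triphones and the attributes their phonetic contexts. A sketch for discrete attributes only, assuming hypothetical attribute names; C4.5's average-gain filter, continuous attributes, and pruning are omitted.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label multiset."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 gain ratio of a discrete attribute: information gain / split information."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return 0.0 if split_info == 0.0 else gain / split_info


def best_attribute(attributes: Dict[str, Sequence[Hashable]],
                   labels: Sequence[Hashable]) -> str:
    """Attribute (e.g. a phonetic context class) with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(attributes[a], labels))
```

Because the tree learns a rule from contexts to clusters, an unseen triphone can be routed to a cluster by answering the same context questions.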

Market Timing and Seasoned Equity Offering (마켓 타이밍과 유상증자)

  • Sung Won Seo
    • Asia-Pacific Journal of Business / v.15 no.1 / pp.145-157 / 2024
  • Purpose - In this study, we propose an empirical model for predicting seasoned equity offerings (SEOs hereafter) using machine learning methods. Design/methodology/approach - The models use the decision tree-based random forest method, which accounts for non-linear relationships, as well as the gradient boosting tree model. SEOs incur significant direct and indirect costs. Therefore, CEOs decide to issue seasoned equity only when the benefits outweigh the costs, which leads to a non-linear relationship between SEOs and their determinants. In particular, a variable related to market timing clearly exhibits such non-linear relations. Findings - To account for these non-linear relationships, we hypothesize that decision tree-based random forest and gradient boosting tree models are more suitable than linear methodologies. The results of this study support this hypothesis. Research implications or Originality - We expect that our findings can provide meaningful information to investors and policy makers by identifying companies likely to undergo SEOs.
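
Why boosted trees capture a threshold effect that a linear model misses can be seen in a minimal gradient boosting sketch with depth-1 trees (stumps) on one predictor. This assumes squared loss on 0/1 SEO labels for simplicity (a real boosted classifier would use log loss), treats the single predictor as a hypothetical market-timing variable, and does not show the random forest, which instead averages deep trees grown on bootstrap samples.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)
Model = Tuple[float, float, List[Stump]]  # (base score, learning rate, stumps)


def _mean(v: List[float]) -> float:
    return sum(v) / len(v)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares depth-1 regression tree on one feature."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    m = _mean(r)
    best: Stump = (math.inf, m, m)  # no split: predict the mean everywhere
    best_sse = sum((v - m) ** 2 for v in r)
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue
        left = [r[i] for i in order[:k]]
        right = [r[i] for i in order[k:]]
        lm, rm = _mean(left), _mean(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best_sse:
            best_sse, best = sse, ((lo + hi) / 2.0, lm, rm)
    return best


def predict_stump(stump: Stump, xi: float) -> float:
    threshold, left, right = stump
    return left if xi <= threshold else right


def gradient_boost(x: List[float], y: List[float], n_rounds: int = 100,
                   learning_rate: float = 0.1) -> Model:
    """Each stump fits the residuals, the negative gradient of 0.5 * (y - f)^2."""
    base = _mean(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for pi, xi in zip(preds, x)]
    return base, learning_rate, stumps


def predict(model: Model, xi: float) -> float:
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, xi) for s in stumps)
```

On a step-shaped relationship (no SEO below some cutoff, SEO above it), the stumps recover the jump directly, whereas a single linear fit would spread it over the whole range.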

Crown Ratio Models for Tectona grandis (Linn. f) Stands in Osho Forest Reserve, Oyo State, Nigeria

  • Popoola, F.S.;Adesoye, P.O.
    • Journal of Forest and Environmental Science / v.28 no.2 / pp.63-67 / 2012
  • Crown ratio is the ratio of live crown length to tree height. It is often used as an important predictor variable in tree growth equations. It indicates tree vigor and is a useful parameter in forest health assessment. The objective of the study was to develop crown ratio prediction models for Tectona grandis. Based on data from temporary sample plots, several nonlinear equations, including the logistic, Chapman-Richards, and exponential functions, were tested. These functions were evaluated in terms of the coefficient of determination ($R^2$) and the standard error of the estimate (SEE). The significance of the estimated parameters was also verified, and residuals were plotted against estimated crown ratios. Although the logistic model had the highest $R^2$ and the lowest SEE, the Chapman-Richards and exponential functions were more consistent in their predictive ability and were therefore recommended for predicting crown ratio in the stand.
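
The response variable and both fit statistics are short computations. The sketch assumes the common definitions $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ and $\mathrm{SEE} = \sqrt{\mathrm{SSE}/(n-p)}$ with $p$ fitted parameters; the paper does not state its exact degrees of freedom, and fitting the logistic, Chapman-Richards, or exponential curves themselves is not shown.

```python
import math
from typing import Sequence


def crown_ratio(live_crown_length: float, tree_height: float) -> float:
    """Crown ratio: live crown length divided by total tree height."""
    return live_crown_length / tree_height


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SSE / SST."""
    mean = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def standard_error_of_estimate(observed: Sequence[float],
                               predicted: Sequence[float], n_params: int) -> float:
    """SEE = sqrt(SSE / (n - p)), with p the number of fitted model parameters."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / (len(observed) - n_params))
```

Since SEE penalizes extra parameters, a three-parameter curve must reduce SSE enough to beat a two-parameter one on this criterion.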