• Title/Summary/Keyword: data-based model

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.121-139
    • /
    • 2014
  • Predicting corporate failure has long been an important topic in accounting and finance. Because the costs associated with bankruptcy are high, the accuracy of bankruptcy prediction matters greatly to financial institutions, and many researchers have studied the problem over the past three decades. The current research uses ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain more accurate predictions than any individual model, and ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is one of the most commonly used methods for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the resulting bootstrap samples. Instance selection keeps critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining, but few studies have dealt with their integration. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) for improving the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. It uses the idea of survival of the fittest by progressively accepting better solutions to the problem. GA searches by maintaining a population of solutions from which better solutions are created, rather than by making incremental changes to a single solution. The initial population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover and mutation. The solutions, coded as strings, are evaluated by the fitness function.
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select an optimal instance subset that serves as the input data of the bagging model. The chromosome is encoded as a binary string over the instances. In this phase, the population size was set to 100 and the maximum number of generations to 150. The crossover rate and mutation rate were set to 0.7 and 0.1, respectively. The prediction accuracy of the model was used as the fitness function of GA: an SVM model is trained on the selected instance subset of the training data set, and its prediction accuracy over the test data set is used as the fitness value in order to avoid overfitting. In the second phase, the optimal instance subset selected in the first phase is used as the input data of the bagging model. An SVM model serves as the base classifier of the bagging ensemble, and majority voting is used as the combining method. The proposed model is applied to the bankruptcy prediction problem using a real data set from Korean companies. The research data contain 1,832 externally non-audited firms, of which 916 filed for bankruptcy and 916 did not. Financial ratios categorized as stability, profitability, growth, activity and cash flow were investigated through a literature review and basic statistical methods, and 8 financial ratios were selected as the final input variables. The whole data set was separated into three subsets: training, test and validation. The proposed model was compared with several other models, including a simple individual SVM model, a simple bagging model and an instance-selection-based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
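
The two phases described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: a one-feature threshold rule (`train_stump`) stands in for the SVM, the toy data and all function names are hypothetical, and the population size and generation count are shrunk for speed. Binary tournament selection, one-point crossover, per-bit mutation and elitism are assumptions, since the abstract does not specify the operators. Only the binary-string encoding, the crossover rate 0.7, the mutation rate 0.1, accuracy on held-out data as fitness, and majority voting come from the abstract.

```python
import random
from collections import Counter

def train_stump(sample):
    """Stand-in base learner (NOT an SVM): threshold halfway between class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:                 # degenerate sample: constant prediction
        only = 1 if xs1 else 0
        return lambda x: only
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    cut = (m0 + m1) / 2
    above = 1 if m1 >= m0 else 0           # label predicted above the threshold
    return lambda x: above if x >= cut else 1 - above

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def ga_select(train, holdout, rng, pop_size=20, generations=30, p_cross=0.7, p_mut=0.1):
    """Phase 1: evolve a binary mask over training instances (1 = keep)."""
    n = len(train)

    def fitness(mask):
        subset = [inst for inst, bit in zip(train, mask) if bit]
        return accuracy(train_stump(subset), holdout)

    def pick(pop):                         # binary tournament (an assumption)
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        nxt = [best[:]]                    # elitism keeps the best mask so far
        while len(nxt) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            child = p1[:]
            if rng.random() < p_cross:     # one-point crossover
                cut = rng.randrange(1, n)
                child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

def train_bagging(data, rng, n_models=11):
    """Phase 2: train base learners on bootstrap samples drawn with replacement."""
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def vote(models, x):
    """Combine base learners by majority voting."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Hypothetical toy data: (feature, label), with two deliberately mislabeled points.
train = [(i * 0.1, 0) for i in range(20)] + [(3 + i * 0.1, 1) for i in range(20)]
train += [(0.5, 1), (4.0, 0)]
holdout = [(0.2, 0), (1.1, 0), (1.7, 0), (3.2, 1), (3.9, 1), (4.6, 1)]

rng = random.Random(0)
mask, history = ga_select(train, holdout, rng)
selected = [inst for inst, bit in zip(train, mask) if bit] or train
models = train_bagging(selected, rng)
print("best fitness per generation:", history[0], "->", history[-1])
print("ensemble accuracy on holdout:", accuracy(lambda x: vote(models, x), holdout))
```

Because the elite mask is carried into every generation, the recorded best fitness never decreases. In a setting closer to the abstract, the stump would be replaced by an SVM and the final comparison would use a separate validation set.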

Development of Data-Driven Science Inquiry Model and Strategy for Cultivating Knowledge-Information-Processing Competency (지식정보처리역량 함양을 위한 데이터 기반 과학탐구 모형 개발)

  • Son, Mihyun;Jeong, Daehong
    • Journal of The Korean Association For Science Education
    • /
    • v.40 no.6
    • /
    • pp.657-670
    • /
    • 2020
  • Knowledge-information-processing competency is the most essential competency in a knowledge-information-based society and the most fundamental component of the new problem-solving ability. Data-driven science inquiry, which emphasizes finding and solving problems using vast amounts of data and information, is a way to cultivate problem-solving ability in such a society. This study therefore aims to develop a teaching-learning model and strategy for data-driven science inquiry and to verify the validity of the model in terms of knowledge-information-processing competency. This is developmental research: the initial model and strategy were developed from the literature, and the final model and teaching strategy were completed by securing external validity through on-site application and internal validity through expert advice. The inquiry model was developed from a literature study on science inquiry, data science, and a statistical problem-solving model based on resource-based learning theory, which is known to be effective for knowledge-information-processing competency and critical thinking. The model is titled "Exploratory Scientific Data Analysis" (ESDA). It consists of selecting tools, collecting and analyzing data, finding problems, and exploring problems. The teaching strategy is composed of seven principles necessary for each stage of the model and is divided into instructional strategies and guidelines for environment composition. The ESDA inquiry model and teaching strategy are not easy to generalize to the whole school level because the sample was small and the research was qualitative. Although a quantitative study over a large number of students could not be carried out, the study is significant in that a practical model and strategy were developed by approaching knowledge-information-processing competency from the perspective of science inquiry.

An ANP-Based Performance Model for ERP System's Implementation

  • Ko, Je-Suk;Park, Soon-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.2
    • /
    • pp.401-409
    • /
    • 2007
  • This paper presents a performance evaluation model for ERP system implementation using the Analytic Network Process (ANP) technique. The performance variables are identified from the perspectives of cost, business process, systems operation, and change management. The empirical study also investigates the factors that affect these performance variables and uses the ANP approach to identify the causal relationships between them. The data for the empirical analysis were collected from manufacturing companies that have implemented ERP systems. The findings indicate that the proposed model is effective in showing that the indirect relationship between influencing factors and managerial effectiveness, mediated by employee satisfaction, is an important one.

Bayesian Model Selection in Weibull Populations

  • Kang, Sang-Gil
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.4
    • /
    • pp.1123-1134
    • /
    • 2007
  • This article addresses the problem of testing whether the shape parameters in k independent Weibull populations are equal. We propose a Bayesian model selection procedure for the equality of the shape parameters. The noninformative prior is usually improper, which creates a calibration problem: the Bayes factor is defined only up to a multiplicative constant. We therefore propose an objective Bayesian model selection procedure based on the fractional Bayes factor and the intrinsic Bayes factor under the reference prior. A simulation study and a real example are provided.
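
To make the calibration problem concrete, here is the standard textbook formulation (not taken from this paper; the symbols $c_i$, $h_i$, $f_i$ and $b$ are notation introduced here). With improper priors $\pi_i^N(\theta_i) = c_i\, h_i(\theta_i)$, the arbitrary constants $c_i$ survive in the ordinary Bayes factor $B_{12}$, whereas O'Hagan's fractional Bayes factor $B_{12}^{F}$, which uses a fraction $b$ of the likelihood to train the prior, cancels them:

```latex
B_{12}(x) = \frac{c_1 \int h_1(\theta_1)\, f_1(x \mid \theta_1)\, d\theta_1}
                 {c_2 \int h_2(\theta_2)\, f_2(x \mid \theta_2)\, d\theta_2},
\qquad
B_{12}^{F}(x) = \frac{q_1(b, x)}{q_2(b, x)},
\qquad
q_i(b, x) = \frac{\int \pi_i^N(\theta_i)\, f_i(x \mid \theta_i)\, d\theta_i}
                 {\int \pi_i^N(\theta_i)\, f_i(x \mid \theta_i)^{b}\, d\theta_i}.
```

Because $c_i$ appears in both the numerator and the denominator of $q_i(b, x)$, it cancels, so $B_{12}^{F}$ does not depend on the arbitrary constants.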

Bayesian Estimation Using Noninformative Priors in Hierarchical Model

  • Kim, Dal-Ho;Choi, Jin-Kap;Choi, Hee-Jo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.4
    • /
    • pp.1033-1043
    • /
    • 2004
  • We consider simultaneous Bayesian estimation of normal means based on different noninformative hyperpriors in a hierarchical model. For illustration, we provide a numerical example using the well-known baseball data of Efron and Morris (1975).

Power System Stabilizer using the Free Model

  • Kim, Ho-Chan;Oh, Seong-Bo;Lee, Kwang-Yeon
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집)
    • /
    • 2001.10a
    • /
    • pp.139.3-139
    • /
    • 2001
  • The free-model concept is introduced as an alternative intelligent system technique to design a controller with input and output data only. The idea of free model comes from the Taylor series approximation, where an output can be estimated when such data as position, velocity, and acceleration are known. The parameters in the free model can be estimated using the input-output data and a controller can be designed based on the free model. The free model thus developed is shown to be controllable, observable, and robust. The accuracy of the free-model approximation can be improved by increasing the observation window and the order of the free model. The LQR method is applied to the free model to design power system stabilizers ...
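
The Taylor-series idea behind the free model can be illustrated with a small sketch. This is not the paper's free-model formulation or its parameter estimation; it only shows how the next output could be estimated from the last three output samples by approximating velocity and acceleration with backward differences. The function name `predict_next` and the sampling step `h` are hypothetical.

```python
def predict_next(y, h=1.0):
    """Estimate y(t + h) from the last three samples via a second-order Taylor expansion."""
    if len(y) < 3:
        raise ValueError("need at least three output samples")
    y0, y1, y2 = y[-3], y[-2], y[-1]
    velocity = (y2 - y1) / h                    # backward-difference first derivative
    acceleration = (y2 - 2 * y1 + y0) / h ** 2  # backward-difference second derivative
    return y2 + h * velocity + 0.5 * h ** 2 * acceleration

# A ramp output is extrapolated exactly because its estimated acceleration is zero.
print(predict_next([0.0, 1.0, 2.0]))
```

Using more past samples or higher-order differences would loosely correspond to the abstract's remark that a larger observation window and a higher model order improve the approximation.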

Conjunctive Use of SWAT and WASP Models for the Water Quality Prediction in a Rural Watershed (농촌유역 하천의 수질예측을 위한 SWAT모형과 WASP모형의 연계운영)

  • 권명준;권순국;홍성구
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.45 no.2
    • /
    • pp.116-125
    • /
    • 2003
  • Predicting stream water quality requires both estimating pollutant loading from different sources and simulating water quality processes in the stream. Nonpoint source pollution models are often employed for estimating pollutant loading in rural watersheds. In this study, the SWAT model and the WASP model were applied conjunctively, and the applicability of this approach was evaluated based on the simulation results. Runoff and nutrient loading obtained from the SWAT model were used to generate input data for the WASP model. The simulated runoff was in good agreement with the observed data, indicating reasonable applicability. Loading for the water quality parameters predicted by the WASP model also showed reasonable agreement with the observed data. The coupled application of the two models, SWAT and WASP, is expected to allow stream water quality to be predicted in rural watersheds.

Multivariate Gamma-Poisson Model and Parameter Estimation for Polytomous Data : Application to Defective Pixels of LCD (다가자료에 적합한 다변수 감마-포아송 모델과 파라미터 추정방법 : LCD 화소불량 응용)

  • Ha, Jung-Hoon
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.34 no.1
    • /
    • pp.42-51
    • /
    • 2011
  • The Poisson model and the Gamma-Poisson model are widely used to analyze the statistical behavior of defect data. These methods are based on a binary criterion, that is, good or defective. However, manufacturing industries prefer polytomous criteria for classifying manufactured products because they allow flexibility in marketing. In this paper, I introduce two multivariate Gamma-Poisson (MGP) models, together with methods for estimating their parameters, that can handle polytomous data. The models and estimators are verified on defective pixels in LCD manufacturing. Experimental results show that both the independent MGP model and the multinomial MGP model perform very well in terms of mean absolute deviation, and that the choice of method depends on the purpose of use.
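
For reference, the univariate Gamma-Poisson mixture named in this abstract can be sketched as below. This covers only the standard single-variable case, not the multivariate models the paper introduces; the function name and the shape/rate parameterization (`alpha`, `beta`) are assumptions made here. If the defect rate follows Gamma(shape alpha, rate beta) and the defect count given that rate is Poisson, the marginal count distribution is negative binomial with mean alpha / beta.

```python
import math

def gamma_poisson_pmf(k, alpha, beta):
    """P(X = k) when X | lam ~ Poisson(lam) and lam ~ Gamma(shape=alpha, rate=beta)."""
    log_p = (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
             + alpha * math.log(beta / (1 + beta))
             - k * math.log(1 + beta))
    return math.exp(log_p)

# Hypothetical parameters: the probabilities sum to ~1 and the mean is alpha / beta.
alpha, beta = 2.0, 0.5
probs = [gamma_poisson_pmf(k, alpha, beta) for k in range(400)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
print(total, mean)
```

Working in log space with `math.lgamma` avoids overflow in the gamma-function ratio for large counts.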

Performance analysis of the data link layer of IEC/ISA fieldbus system by simulation model (시뮬레이션 모델을 이용한 IEC/ISA 필드버스 시스템의 데이터 링크 계층 성능 분석)

  • Lee, Seong-Geun;Hong, Seung-Ho
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.2 no.3
    • /
    • pp.209-219
    • /
    • 1996
  • Fieldbus provides real-time data communication among field devices in process control and manufacturing automation systems. In this paper, a Petri net model is developed for the 1993 draft of the IEC/ISA fieldbus, which has been proposed as an international standard for fieldbus networks. Based on the Petri net model, a discrete-event simulation model of the IEC/ISA fieldbus network is developed. The simulation model is used to evaluate the network-induced delay in the data link layer of the IEC/ISA fieldbus. In addition, an integrated discrete-event/continuous-time simulation model of the fieldbus system and a distributed control system is developed. The paper investigates the real-time data processing capability of the IEC/ISA fieldbus and the effect of network-induced delay on the performance of the control system.

Neutron Cross Section Evaluation on Mo-95, Tc-99, Ru-101 and Rh-103 in the Fast Energy Region

  • Lee, Y. D.;J. H. Chang
    • Nuclear Engineering and Technology
    • /
    • v.34 no.6
    • /
    • pp.533-544
    • /
    • 2002
  • The neutron-induced nuclear data for Mo-95, Tc-99, Ru-101 and Rh-103 were calculated and evaluated in the fast energy region. The energy-dependent optical model potential parameters were extracted based on recent experimental data and applied up to 20 MeV. The s-wave strength function was calculated from the parameters. The calculation used a spherical optical model, a statistical model for the equilibrium energy region, multistep direct and multistep compound models for the pre-equilibrium energy region, and a direct capture model. The theoretically calculated cross sections were compared with the experimental data and the evaluated files. The model-calculated total and capture cross sections were in good agreement with the reference experimental data. The direct capture contribution improved the capture cross sections in the pre-equilibrium region. The evaluated cross section results were compiled in ENDF-6 format and will improve ENDF/B-VI.