• Title/Summary/Keyword: robust test


Korea Pathfinder Lunar Orbiter (KPLO) Operation: From Design to Initial Results

  • Moon-Jin Jeon;Young-Ho Cho;Eunhyeuk Kim;Dong-Gyu Kim;Young-Joo Song;SeungBum Hong;Jonghee Bae;Jun Bang;Jo Ryeong Yim;Dae-Kwan Kim
    • Journal of Astronomy and Space Sciences / v.41 no.1 / pp.43-60 / 2024
  • Korea Pathfinder Lunar Orbiter (KPLO) is South Korea's first space exploration mission, developed by the Korea Aerospace Research Institute. It aims to develop technologies for lunar exploration, explore lunar science, and test new technologies. KPLO was launched on August 5, 2022, by a Falcon 9 launch vehicle from Cape Canaveral Space Force Station (CCSFS) in the United States and placed on a ballistic lunar transfer (BLT) trajectory. A total of four trajectory correction maneuvers were performed during the approximately 4.5-month trans-lunar cruise phase to reach the Moon. Starting with the first lunar orbit insertion (LOI) maneuver on December 16, the spacecraft performed a total of three maneuvers before arriving at the lunar mission orbit, at an altitude of 100 kilometers, on December 27, 2022. After entering lunar orbit, the commissioning phase validated the operation of the mission mode, in which the payload is oriented toward the center of the Moon. After completing about one month of commissioning, normal mission operations began, and each payload successfully performed its planned mission. All of the spacecraft operations that KPLO performs from launch to normal operations were designed through the system operations design process. This includes operations that are automatically initiated post-separation from the launch vehicle, as well as those in lunar transfer orbit and lunar mission orbit. Key operational procedures such as the spacecraft's initial checkout, trajectory correction maneuvers, LOI, and commissioning were developed during the early operation preparation phase. These procedures were executed effectively during both the early and normal operation phases. The successful execution of these operations confirms the robust verification of the system operation.

Deep learning-based automatic segmentation of the mandibular canal on panoramic radiographs: A multi-device study

  • Moe Thu Zar Aung;Sang-Heon Lim;Jiyong Han;Su Yang;Ju-Hee Kang;Jo-Eun Kim;Kyung-Hoe Huh;Won-Jin Yi;Min-Suk Heo;Sam-Sun Lee
    • Imaging Science in Dentistry / v.54 no.1 / pp.81-91 / 2024
  • Purpose: The objective of this study was to propose a deep-learning model for the detection of the mandibular canal on dental panoramic radiographs. Materials and Methods: A total of 2,100 panoramic radiographs (PANs) were collected from 3 different machines: RAYSCAN Alpha (n=700, PAN A), OP-100 (n=700, PAN B), and CS8100 (n=700, PAN C). Initially, an oral and maxillofacial radiologist coarsely annotated the mandibular canals. For deep learning analysis, convolutional neural networks (CNNs) utilizing U-Net architecture were employed for automated canal segmentation. Seven independent networks were trained using training sets representing all possible combinations of the 3 groups. These networks were then assessed using a hold-out test dataset. Results: Among the 7 networks evaluated, the network trained with all 3 available groups achieved an average precision of 90.6%, a recall of 87.4%, and a Dice similarity coefficient (DSC) of 88.9%. The 3 networks trained using each of the 3 possible 2-group combinations also demonstrated reliable performance for mandibular canal segmentation, as follows: 1) PAN A and B exhibited a mean DSC of 87.9%, 2) PAN A and C displayed a mean DSC of 87.8%, and 3) PAN B and C demonstrated a mean DSC of 88.4%. Conclusion: This multi-device study indicated that the examined CNN-based deep learning approach can achieve excellent canal segmentation performance, with a DSC exceeding 88%. Furthermore, the study highlighted the importance of considering the characteristics of panoramic radiographs when developing a robust deep-learning network, rather than depending solely on the size of the dataset.
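The Dice similarity coefficient (DSC) reported in the abstract above can be computed from binary masks as in the minimal sketch below. The array shapes, threshold, and toy masks are illustrative assumptions, not the study's actual pipeline.

```python
# Minimal sketch of the Dice similarity coefficient (DSC) between a predicted
# and a reference mandibular-canal mask; shapes and threshold are illustrative.
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2|P ∩ T| / (|P| + |T|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps))

# Toy example: a U-Net-style probability map thresholded at 0.5 vs. a hypothetical annotation
prob_map = np.random.default_rng(0).random((512, 1024))
pred_mask = prob_map > 0.5
truth_mask = np.zeros((512, 1024), dtype=bool)
truth_mask[200:260, 300:900] = True   # hypothetical canal region
print(f"DSC = {dice_coefficient(pred_mask, truth_mask):.3f}")
```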

An Improved Method to Determine Corn (Zea mays L.) Plant Response to Glyphosate (Glyphosate에 대한 옥수수 반응의 개선된 검정방법)

  • Kim, Jin-Seog;Lee, Byung-Hoi;Kim, So-Hee;Min, Suk-Ki;Choi, Jung-Sup
    • Journal of Plant Biotechnology / v.33 no.1 / pp.57-62 / 2006
  • Several methods for determining the response of corn to glyphosate were investigated to provide a fast and reliable means of identifying glyphosate-resistant corn in vivo. Two bioassays were developed. The first, a 'whole plant / leaf growth assay', applies glyphosate to the upper part of the 3rd leaf and measures the growth of the herbicide-untreated 4th leaf 3 days after treatment. In this assay, leaf growth of conventional corn was inhibited in a dose-dependent manner from 50 to 1,600 µg/mL of glyphosate, and growth inhibition at 1,600 µg/mL was 55% of the untreated control. This assay is potentially useful when the primary cause of glyphosate resistance is reduced herbicide translocation. The second, a 'leaf segment / shikimate accumulation assay', places four excised leaf segments (4 × 4 mm) in each well of a 48-well microtiter plate containing 200 µL of test solution, and the amount of shikimate is determined after incubation for 24 h in continuous light at 25°C. In this assay, adding 0.33% sucrose to the basic test solution enhanced shikimate accumulation 3- to 4-fold, and shikimate accumulation increased linearly from 2 to 8 µg/mL of glyphosate, an improved response compared with the method described by Shaner et al. (2005). The leaf segment / shikimate accumulation assay is simple and robust and has the potential to be used as a high-throughput assay when the primary cause of glyphosate resistance is related to EPSPS, the target site of the herbicide. Taken together, these two assays should be highly useful for the initial selection of lines obtained after transformation, for investigating the migration of the glyphosate-resistance gene into other weeds, and for detecting weedy glyphosate-resistant corn in the field.

Evaluation of the Measurement Uncertainty from the Standard Operating Procedures (SOP) of the National Environmental Specimen Bank (국가환경시료은행 생태계 대표시료의 채취 및 분석 표준운영절차에 대한 단계별 측정불확도 평가 연구)

  • Lee, Jongchun;Lee, Jangho;Park, Jong-Hyouk;Lee, Eugene;Shim, Kyuyoung;Kim, Taekyu;Han, Areum;Kim, Myungjin
    • Journal of Environmental Impact Assessment / v.24 no.6 / pp.607-618 / 2015
  • Five years have passed since the first set of environmental samples was taken in 2011 to represent various ecosystems and to allow future generations to look back at the past environment. Those samples have been preserved cryogenically in the National Environmental Specimen Bank (NESB) at the National Institute of Environmental Research. Although a strict standard operating procedure (SOP) governs the whole sampling process to ensure that each sample represents its sampling area, the procedure had not been validated. This question needs to be answered to clear any doubts about the representativeness and quality of the samples. To address it and to verify the sampling practice set out in the SOP, the steps leading to a measurement, from field sampling to chemical analysis in the laboratory, were broken down so that the uncertainty at each level could be evaluated. Of the 8 species currently collected for cryogenic preservation in the NESB, pine tree samples from two different sites were selected for this study. Duplicate samples were taken from each site according to the sampling protocol, followed by duplicate analyses of each discrete sample. The uncertainties were evaluated by robust ANOVA; the two levels of uncertainty, one from the sampling practice and the other from the analytical process, were then combined to give the measurement uncertainty of a measured concentration of the measurand. As a result, it was confirmed that the sampling practice, not the analytical process, accounts for most of the measurement uncertainty. Based on this top-down approach to measurement uncertainty, the efficient way to ensure the representativeness of the sample is to increase the quantity of each discrete sample used to make a composite sample, rather than to increase the number of discrete samples across the site. Furthermore, a cost-effective improvement in the confidence of the measurement can be expected from efforts to lower the sampling uncertainty, not the analytical uncertainty. To test the representativeness of a composite sample of a sampling area, the variance within the site should be less than the difference from duplicate sampling; for this, the criterion $s^2_{geochem}$ (across-site variance) < $s^2_{samp}$ (variance at the sampling location) was proposed. In light of this criterion, the two representative samples for the two study areas passed the requirement. In contrast, whenever the variance among the sampling locations (i.e., across the site) is larger than the sampling variance, more sampling increments need to be added within the sampling area until the requirement for representativeness is achieved. (A minimal variance-decomposition sketch is given after this entry.)
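The duplicate design described above (duplicate samples per location, duplicate analyses per sample) can be illustrated with the small numerical sketch below. The concentrations are synthetic, and classical ANOVA formulas are used as a simplified stand-in for the robust ANOVA applied in the study.

```python
# Hedged sketch: splitting measurement variance into sampling and analytical
# components for a balanced duplicate design. Data values are hypothetical.
import numpy as np

# data[location][duplicate sample][duplicate analysis] -- hypothetical concentrations
data = np.array([
    [[10.2, 10.4], [11.0, 10.8]],   # location 1: sample A, sample B
    [[ 9.8,  9.9], [ 9.1,  9.3]],   # location 2
    [[12.1, 12.4], [11.5, 11.7]],   # location 3
])
n_loc, n_samp, n_anal = data.shape

# Analytical variance from duplicate analyses of the same sample
anal_diff = data[:, :, 0] - data[:, :, 1]
s2_anal = np.sum(anal_diff ** 2) / (2 * n_loc * n_samp)

# Sampling variance from duplicate samples (sample means carry s2_anal / n_anal)
samp_means = data.mean(axis=2)
samp_diff = samp_means[:, 0] - samp_means[:, 1]
s2_samp = np.sum(samp_diff ** 2) / (2 * n_loc) - s2_anal / n_anal

# Combined measurement variance and the sampling share
s2_meas = max(s2_samp, 0.0) + s2_anal
print(f"sampling variance   : {s2_samp:.4f}")
print(f"analytical variance : {s2_anal:.4f}")
print(f"sampling share of measurement variance: {max(s2_samp, 0.0) / s2_meas:.1%}")
```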

Optimum Synthesis Conditions of Coating Slurry for Metallic Structured De-NOx Catalyst by Coating Process on Ship Exhaust Gas (선박 배연탈질용 금속 구조체 기반 촉매 제조를 위한 코팅슬러리 최적화)

  • Jeong, Haeyoung;Kim, Taeyong;Im, Eunmi;Lim, Dong-Ha
    • Clean Technology / v.24 no.2 / pp.127-134 / 2018
  • To reduce environmental pollution caused by $NO_x$ from ship engines, the International Maritime Organization (IMO) announced the Tier III regulation, which limits emissions in ships' exhaust gas within Emission Control Areas (ECAs). The selective catalytic reduction (SCR) process is the most widely commercialized $De-NO_x$ system for meeting the Tier III requirement. In general, commercial ceramic honeycomb SCR catalysts have been installed in SCR reactors for marine vessel engines. However, the ceramic honeycomb SCR catalyst has serious drawbacks, such as low strength and easy destruction at the high exhaust-gas velocities of marine engines. For these reasons, we designed a metallic structured catalyst to compensate for the defects of the ceramic honeycomb catalyst in marine SCR systems. In particular, a metallic structured catalyst offers many advantages, such as robustness, compactness, light weight, and high thermal conductivity. In this study, to support the catalyst on a metal substrate, coating slurries were prepared with different binders, and we successfully fabricated a metallic structured catalyst with strong adhesion through coating, drying, and calcination. We then evaluated SCR performance and durability, including sonication and drop tests, for the prepared samples. The MFC01 sample showed above 95% $NO_x$ conversion and was much more robust and stable than the commercial honeycomb catalyst. Based on the characterization and performance tests, we confirm that the proposed metallic structured catalyst is highly efficient and durable. Therefore, we suggest that the metallic structured catalyst may be a good alternative as a new type of SCR catalyst for marine SCR systems.

A Study on the Born Global Venture Corporation's Characteristics and Performance ('본글로벌(born global)전략'을 추구하는 벤처기업의 특성과 성과에 관한 연구)

  • Kim, Hyung-Jun;Jung, Duk-Hwa
    • Journal of Global Scholars of Marketing Science / v.17 no.3 / pp.39-59 / 2007
  • In many studies, the international involvement of a firm has been described as a gradual development process, "a process in which the enterprise gradually increases its international involvement. This process evolves in the interplay between the development of knowledge about foreign markets and operations on one hand and increasing commitment of resources to foreign markets on the other." On the basis of the Uppsala internationalization model, many studies provide strong theoretical and empirical support. According to the predictions of the classic stages theory, the internationalization process of firms has been characterized as a gradual evolution toward foreign markets: indirect and direct export, strategic alliance, and foreign direct investment. However, the terms "international new ventures" (McDougall, Shane, and Oviatt 1994), "born globals" (Knight 1997; Knight and Cavusgil 1996; Madsen and Servais 1997), "instant internationals" (Preece, Miles, and Baetz 1999), and "global startups" (Oviatt and McDougall 1994) have come into the spotlight in the internationalization study of technology-intensive venture companies. Recent research focused on venture companies has presented the phenomenon of 'born global' firms as a contradiction to the stages theory. In particular, the article by Oviatt and McDougall threw the spotlight on international entrepreneurs, on international new ventures, and on their importance in the globalising world economy. Since venture companies, by definition, lack economies of scale and resources (financial and knowledge) and are averse to risk taking, they have difficulty expanding abroad and tend to pursue internationalization gradually, step by step. Nevertheless, many venture companies have pursued a 'born global strategy', which differs from the process strategy, because the corporate environment has been rapidly globalizing. Existing studies investigate (1) why ventures enter overseas markets at such an early stage, even in infancy, and (2) what differentiates international strategies among ventures and whether the born global strategy is better for infant ventures. However, regarding venture performance (growth and profitability), the existing results do not agree with each other. They also do not include marketing strategy (differentiation, low price, market breadth, and market pioneering), which is an important factor in studying BGV performance. In this paper I aim to delineate the emergence of international new ventures and the internationalization strategies of venture companies. In order to address the research problems, I develop a resource-based model and marketing strategies for analyzing the effects of born global venture firms. I pose three research problems. First, do Korean venture companies gain advantages in corporate performance (growth, profitability, and overall market performance) when they pursue internationalization from inception? Second, do Korean BGVs have firm-specific assets (foreign experience, foreign orientation, organizational absorptive capacity)? Third, what are the marketing strategies of Korean BGVs, and are they different from those of other firms?
Under these problems, I test (1) whether the BGV, a firm that started its internationalization activity almost from inception, has more intangible resources (foreign experience of corporate members, foreign orientation, technological competences, and absorptive capacity) than other venture firms (Non_BGV) and (2) whether the BGV's marketing strategies of differentiation, low price, market diversification, and preemption differ from those of Non_BGV. Above all, the main purpose of this research is to examine whether the results achieved by BGV are indeed better than those obtained by Non_BGV firms with respect to growth rate and efficiency. To do this research, I surveyed venture companies located in Seoul and Daejeon in Korea during November and December 2005. I gathered data from 200 venture companies and then selected 84 samples that had been founded during 1999~2000. To compare BGV characteristics with those of Non_BGV, I classified as BGV the five- to six-year-old venture firms with export intensity over 50%. Many other studies have tried to classify BGV and Non_BGV, but there are as many criteria as there are researchers on this topic. Some use the time gap, that is, the time between establishment and the first internationalization experience, and others use export intensity, the ratio of export sales to total sales. Although I use a mixed criterion drawn from prior research, this kind of criterion is subjective and arbitrary rather than objective, so this research has a critical limitation in the classification of BGV and Non_BGV. The first purpose of the research is to test the difference in performance between BGV and Non_BGV. The t-test results show statistically significant differences not only in growth rate (sales growth rate compared to competitors and 3-year averaged sales growth rate) but also in the general market performance of BGV. However, the hypothesis that BGV is more profitable (return on investment (ROI) compared to competitors and 3-year averaged ROI) than Non_BGV was not supported. From these results, this paper concludes that BGV grows rapidly and achieves high market performance (in terms of market share and customer loyalty), but there is no profitability difference between BGV and Non_BGV. The second result is that BGV has more absorptive capacity, especially knowledge competence, and more entrepreneurial international experience than Non_BGV. This paper also found that BGV pursues product differentiation, preemption, and market diversification strategies, while Non_BGV pursues a low price strategy. These results have not been dealt with in other existing studies. This research has some limitations. The first concerns the definition of BGV, as mentioned above. Conceptually, BGV is defined as a company pursuing internationalization from inception, but in empirical studies it is very difficult to distinguish BGV from Non_BGV. I tried to classify on the basis of time difference and export intensity, but these criteria are so subjective and arbitrary that the results may not be robust if the criteria were changed. The second limitation concerns the sample used in this research. I surveyed venture companies located only in Seoul and Daejeon and used only 84 samples, which may introduce sample bias and limit the generalization of the results.
Follow-up studies focusing on ventures located in other regions would help verify the results of this paper. (A minimal sketch of the group-comparison t-test is given after this entry.)
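The kind of group comparison described above (an independent-samples t-test on a performance measure for BGV vs. Non_BGV) can be sketched as follows; the numbers are synthetic assumptions and do not reproduce the paper's data.

```python
# Hedged sketch: independent-samples t-test comparing a performance measure
# (e.g., 3-year averaged sales growth rate) between BGV and Non_BGV groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
bgv_growth     = rng.normal(loc=0.25, scale=0.10, size=30)   # hypothetical BGV growth rates
non_bgv_growth = rng.normal(loc=0.15, scale=0.10, size=54)   # hypothetical Non_BGV growth rates

# Welch's t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(bgv_growth, non_bgv_growth, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```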


Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rely on strict assumptions, including linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques also used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound on the generalization error. In addition, the solution of SVM may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many data samples for training, since it builds prediction models using only representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can potentially degrade SVM's performance. First, SVM was originally proposed for binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification as much as SVM does in binary-class classification. Second, approximation algorithms (e.g., decomposition methods, the sequential minimal optimization algorithm) could be used for effective multi-class computation to reduce computation time, but they can deteriorate classification performance. Third, a difficulty in multi-class prediction problems is the data imbalance problem that can occur when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, reducing the classification accuracy of the classifier. SVM ensemble learning is one machine learning method for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than examples that are correctly predicted.
Thus Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can account for the geometric mean-based accuracy and errors across classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine sets. That is, the cross-validated folds are tested independently for each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of the classifiers over the 30 folds differs significantly. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating. (A brief sketch of geometric mean-based accuracy is given after this entry.)
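The geometric mean-based accuracy contrasted with arithmetic accuracy above can be illustrated as the geometric mean of per-class recalls, as in the sketch below. This is an illustrative metric computation only, not the paper's MGM-Boost implementation; the labels are synthetic.

```python
# Hedged sketch: geometric mean of per-class recalls for a multiclass problem,
# the kind of class-balanced accuracy that MGM-Boost reportedly optimizes.
import numpy as np
from sklearn.metrics import confusion_matrix

def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls (sensitivity of each class)."""
    cm = confusion_matrix(y_true, y_pred)
    per_class_recall = np.diag(cm) / cm.sum(axis=1)
    return float(np.prod(per_class_recall) ** (1.0 / len(per_class_recall)))

# Toy example with three bond-rating classes (0, 1, 2)
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 2, 1, 2])
print("arithmetic accuracy    :", (y_true == y_pred).mean())
print("geometric-mean accuracy:", geometric_mean_accuracy(y_true, y_pred))
```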

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae;Kang, Jungseok
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.1-32 / 2018
  • In addition to stakeholders such as managers, employees, creditors, and investors of bankrupt companies, corporate defaults have a ripple effect on the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a single default prediction model rather than developing various corporate default models. As a result, even large corporations, the so-called chaebol enterprises, went bankrupt. Even after that, the analysis of past corporate defaults focused on specific variables, and when the government restructured companies immediately after the global financial crisis, it focused only on certain main variables such as the debt ratio. A multifaceted study of corporate default prediction models is essential to accommodate diverse interests and to avoid situations like the Lehman Brothers case of the global financial crisis, in which total collapse occurs in a single moment. The key variables used in predicting corporate defaults vary over time; comparing the analyses of Beaver (1967, 1968) and Altman (1968) with Deakin's (1972) study confirms that the major factors affecting corporate failure have changed. In Grice's (2001) study, the changing importance of predictive variables was also observed through Zmijewski's (1984) and Ohlson's (1980) models. However, past studies use static models, and most do not consider the changes that occur over time. Therefore, in order to construct consistent prediction models, it is necessary to compensate for the time-dependent bias by means of a time series analysis algorithm reflecting dynamic change. Motivated by the global financial crisis, which had a significant impact on Korea, this study uses 10 years of annual corporate data from 2000 to 2009. The data are divided into training, validation, and test sets covering 7, 2, and 1 years, respectively. In order to construct a consistent bankruptcy model across time, we first train a time series deep learning model using the data before the financial crisis (2000~2006). Parameter tuning of the existing models and the deep learning time series algorithm is conducted with validation data including the financial crisis period (2007~2008). As a result, we construct a model that shows a pattern similar to the training results and excellent predictive power. After that, each bankruptcy prediction model is retrained on the combined training and validation data (2000~2008), applying the optimal parameters found during validation. Finally, each corporate default prediction model is evaluated and compared using the test data (2009), based on the models trained over the nine years, and the usefulness of the corporate default prediction model based on the deep learning time series algorithm is demonstrated. In addition, by adding Lasso regression to the existing variable-selection methods (multiple discriminant analysis, logit model), it is shown that the deep learning time series model based on the three bundles of variables is useful for robust corporate default prediction. The definition of bankruptcy used is the same as that of Lee (2015). Independent variables include financial information such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable groups.
The multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and deep learning time series algorithms are compared. Corporate data suffer from the limitations of nonlinear variables, multi-collinearity among variables, and lack of data. While the logit model is nonlinear, the Lasso regression model solves the multi-collinearity problem, and the deep learning time series algorithm, using a variable data generation method, compensates for the lack of data. Big data technology, a leading technology of the future, is moving from simple human analysis to automated AI analysis and, finally, toward intertwined AI applications. Although the study of corporate default prediction models using time series algorithms is still in its early stages, the deep learning algorithm is much faster than regression analysis at corporate default prediction modeling and is more effective in terms of predictive power. Through the Fourth Industrial Revolution, the current government and overseas governments are working hard to integrate such systems into the everyday life of their nations and societies, yet deep learning time series research for the financial industry is still insufficient. This is an initial study on deep learning time series analysis of corporate defaults; it is therefore hoped that it will serve as comparative material for non-specialists beginning studies that combine financial data with deep learning time series algorithms. (A minimal LSTM sketch with a year-based split is given after this entry.)
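The year-based split and LSTM classifier described above can be sketched on synthetic data as below. The feature set, network size, window construction, and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of a year-based split (train 2000-2006, validate 2007-2008,
# test 2009) and an LSTM default classifier on synthetic panel data.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
n_firms, n_years, n_features = 500, 10, 20          # panel 2000-2009, 20 financial ratios
X = rng.normal(size=(n_firms, n_years, n_features)).astype("float32")
y = rng.integers(0, 2, size=n_firms).astype("float32")   # 1 = default, 0 = non-default

# Growing windows by year; labels are reused only because the data are synthetic
X_train, X_val, X_test = X[:, :7, :], X[:, :9, :], X

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, n_features)),   # variable-length yearly sequences
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(X_train, y, validation_data=(X_val, y), epochs=5, batch_size=64, verbose=0)
print("test AUC:", model.evaluate(X_test, y, verbose=0)[1])
```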

Simultaneous Optimization of a KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and the random subspace method. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained on a randomly chosen feature subspace from the original feature set, and predictions from the ensemble members are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective in improving an individual KNN model. The k parameter of the KNN base classifiers and the feature subsets selected for the base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent-samples t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for avoiding overfitting. The prediction accuracy on this dataset was used to determine the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model.
A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models. To evaluate the effectiveness of the proposed model, its classification accuracy was compared with that of other models, and the Q-statistic values and average classification accuracies of the base classifiers were investigated. The experimental results showed that the proposed model outperformed the other models, such as the single model and the random subspace ensemble model. (A minimal random-subspace KNN ensemble sketch is given after this entry.)
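The KNN random subspace ensemble that serves as this paper's baseline can be sketched with scikit-learn as below. The GA-based joint optimization of k and feature subsets is not shown, and the dataset and parameters are illustrative assumptions.

```python
# Hedged sketch: a random-subspace ensemble of KNN base classifiers, using
# feature subsampling so each base KNN sees a random half of the features.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the 24 selected financial ratios and bankrupt/non-bankrupt labels
X, y = make_classification(n_samples=1800, n_features=24, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

ensemble = BaggingClassifier(
    estimator=KNeighborsClassifier(n_neighbors=5),  # `base_estimator` in older scikit-learn
    n_estimators=30,
    max_features=0.5,        # random feature subset per base classifier
    bootstrap=False,         # random subspace: subsample features, not rows
    bootstrap_features=False,
    random_state=0,
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```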

A PLS Path Modeling Approach on the Cause-and-Effect Relationships among BSC Critical Success Factors for IT Organizations (PLS 경로모형을 이용한 IT 조직의 BSC 성공요인간의 인과관계 분석)

  • Lee, Jung-Hoon;Shin, Taek-Soo;Lim, Jong-Ho
    • Asia Pacific Journal of Information Systems / v.17 no.4 / pp.207-228 / 2007
  • For a long time, measurement of Information Technology (IT) organizations' activities was limited mainly to financial indicators. However, as the functions of information systems have diversified, a number of studies have examined new measurement approaches that combine financial measures with new, non-financial measures. In particular, research on the IT Balanced Scorecard (BSC), a concept derived from the BSC for measuring IT activities, has been carried out in recent years. The BSC provides more than the integration of non-financial measures into a performance measurement system. Its core rests on the cause-and-effect relationships between measures, which allow prediction of value chain performance, communication, and realization of the corporate strategy through incentive-controlled actions. More recently, BSC proponents have focused on the need to tie measures together into a causal chain of performance and to test the validity of these hypothesized effects to guide the development of strategy. Kaplan and Norton [2001] argue that one of the primary benefits of the balanced scorecard is its use in gauging the success of strategy. Norreklit [2000] insists that the cause-and-effect chain is central to the balanced scorecard, and it is likewise central to the IT BSC. However, prior research on the relationship between information systems and enterprise strategies, as well as on the connections among various IT performance measurement indicators, is scarce. Ittner et al. [2003] report that 77% of all surveyed companies with an implemented BSC place little or no emphasis on soundly modeled cause-and-effect relationships, despite the importance of cause-and-effect chains as an integral part of the BSC. This shortcoming can be explained by one theoretical and one practical reason [Blumenberg and Hinz, 2006]. From a theoretical point of view, causalities within the BSC method and their application are only vaguely described by Kaplan and Norton. From a practical point of view, modeling corporate causalities is a complex task due to tedious data acquisition and subsequent reliability maintenance. However, cause-and-effect relationships are an essential part of BSCs because they differentiate performance measurement systems like BSCs from simple key performance indicator (KPI) lists. KPI lists present an ad-hoc collection of measures to managers but do not allow a comprehensive view of corporate performance. Instead, a performance measurement system like the BSC models the relationships of the underlying value chain as cause-and-effect relationships. Therefore, to overcome the deficiencies of causal modeling in the IT BSC, sound and robust causal modeling approaches are required in theory as well as in practice. The purpose of this study is to suggest critical success factors (CSFs) and KPIs for measuring the performance of IT organizations and to empirically validate the causal relationships between those CSFs. For this purpose, we define four BSC perspectives for IT organizations following Van Grembergen's study [2000]: the Future Orientation perspective represents the human and technology resources needed by IT to deliver its services; the Operational Excellence perspective represents the IT processes employed to develop and deliver applications; and the User Orientation perspective represents the user evaluation of IT.
The Business Contribution perspective captures the business value of IT investments. Each of these perspectives has to be translated into corresponding metrics and measures that assess the current situation. This study suggests 12 CSFs for the IT BSC based on previous IT BSC studies and COBIT 4.1; these CSFs comprise 51 KPIs. We define the cause-and-effect relationships among the BSC CSFs for IT organizations as follows: the Future Orientation perspective will have positive effects on the Operational Excellence perspective; the Operational Excellence perspective will have positive effects on the User Orientation perspective; and the User Orientation perspective will have positive effects on the Business Contribution perspective. This research tests the validity of these hypothesized causal effects and the sub-hypothesized causal relationships. For this purpose, we used the Partial Least Squares approach to Structural Equation Modeling (PLS path modeling) to analyze the multiple IT BSC CSFs. PLS path modeling has special properties that make it more appropriate than techniques such as multiple regression and LISREL when analyzing small sample sizes. The use of PLS path modeling has been gaining interest among IS researchers in recent years because of its ability to model latent constructs under conditions of non-normality and with small to medium sample sizes (Chin et al., 2003). The empirical results of our study using PLS path modeling show that the hypothesized causal effects in the IT BSC are partially supported. (A simplified path-estimation sketch is given after this entry.)
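The hypothesized causal chain (Future Orientation → Operational Excellence → User Orientation → Business Contribution) can be illustrated with the drastically simplified sketch below: latent scores are approximated by averaging standardized indicators and path coefficients by single-predictor OLS. A real PLS path analysis would iterate outer weights, typically with a dedicated PLS-SEM package; all data and indicator structure here are synthetic assumptions.

```python
# Hedged sketch: composite-score approximation of the hypothesized IT BSC chain.
import numpy as np

rng = np.random.default_rng(1)
n = 120   # respondents (PLS is often used with small samples like this)

def composite(driver=None, noise=1.0):
    """Build a construct from 3 synthetic indicators, return standardized composite score."""
    base = rng.normal(size=n) if driver is None else driver + noise * rng.normal(size=n)
    indicators = np.column_stack([base + 0.5 * rng.normal(size=n) for _ in range(3)])
    z = (indicators - indicators.mean(0)) / indicators.std(0)
    return z.mean(axis=1)

future_orientation    = composite()
operational_excellence = composite(driver=future_orientation)
user_orientation      = composite(driver=operational_excellence)
business_contribution = composite(driver=user_orientation)

def path_coef(x, y):
    """Standardized OLS slope of y on x (single-predictor structural path)."""
    x_z, y_z = (x - x.mean()) / x.std(), (y - y.mean()) / y.std()
    return float(np.polyfit(x_z, y_z, 1)[0])

print("Future Orientation -> Operational Excellence :", round(path_coef(future_orientation, operational_excellence), 3))
print("Operational Excellence -> User Orientation   :", round(path_coef(operational_excellence, user_orientation), 3))
print("User Orientation -> Business Contribution    :", round(path_coef(user_orientation, business_contribution), 3))
```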