• Title/Summary/Keyword: tree classification method


An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and demonstrate the effects of data analysis. In the tourism and hospitality industry, many firms and studies have paid attention to online reviews on social media because of their large influence over customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more remarkable than that of any other type of media. However, there are some limitations to the improvements in service quality that can be made based on opinions on social media platforms. Users on social media platforms express their opinions as text, images, and so on, so raw data sets from these reviews are unstructured. Moreover, these data sets are too big for people to extract new information and hidden knowledge from them unaided. To use them for business intelligence and analytics applications, proper big data techniques such as natural language processing and data mining are needed. This study suggests an analytical approach to directly yield insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining to extract topics contained in the reviews and decision tree modeling to explain the relationship between topics and ratings. Topic mining refers to a method for finding groups of words that represent the documents in a collection. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most universal algorithm. However, LDA alone is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, which is a kind of decision tree technique.
Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. This study therefore aims to present an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we find the strengths and weaknesses of each hotel's services and suggest improvements to raise customer satisfaction. From positive reviews in particular, we find what these hotels should maintain in their service quality. For example, positive reviews show that one hotel has a good location and room condition compared with the other hotels. In contrast, negative reviews show what the hotels should change in their services. For example, one hotel should improve the soundproofing of its rooms. These results indicate that our approach is useful for finding insights into the service quality of hotels. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies on improving service quality relied on surveys or interviews of customers. However, these methods are often costly and time consuming, and the results may be distorted by biased sampling or untrustworthy answers. The proposed approach directly obtains honest feedback from customers' online reviews and draws insights through a type of big data analysis, so it can be a more useful tool that overcomes the limitations of surveys or interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach is expected to improve if other structured and unstructured data sources are added.
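
The step in which a tree relates topics to ratings can be illustrated with a minimal sketch. It assumes that each review has already been reduced to 0/1 flags indicating which topics it contains (the output of a topic model such as LDA) and picks the single topic whose presence best separates positive from negative ratings by Gini impurity, which is the split criterion commonly used in CART. The topic names, the toy data, and the function names are hypothetical and are not taken from the study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_topic_split(rows, labels):
    """Return (topic_index, weighted_gini) of the best binary split.

    rows: list of 0/1 lists, one flag per topic (1 = review mentions the topic).
    """
    n = len(rows)
    best = (None, gini(labels))  # start from the impurity of the unsplit node
    for t in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[t] == 1]
        right = [y for x, y in zip(rows, labels) if x[t] == 0]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: columns are topics [location, noise, breakfast].
reviews = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
]
ratings = ["pos", "pos", "neg", "neg", "neg", "pos"]

split_topic, impurity = best_topic_split(reviews, ratings)
print(split_topic, impurity)
```

In this toy data the "noise" topic (index 1) separates the ratings completely, which mirrors the kind of finding reported above about soundproofing. A full CART model would apply the same search recursively to each child node.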

Relationships between Community Unit and Environment Factor in Forest Vegetation of Mt. Dutasan, Pyeongchang-gun (평창 두타산 산림식생의 군집유형과 입지환경요인의 상관관계)

  • Lee, Jeong Eun;Shin, Jae Kwon;Kim, Dong Gap;Yun, Chung Weon
    • Journal of Korean Society of Forest Science / v.106 no.3 / pp.275-287 / 2017
  • The purpose of this study was to classify forest vegetation types and to analyze the relationships between those types and environmental factors on Mt. Dutasan. Data were collected from a total of forty-six plots using the Z-M phytosociological method from June to October 2016, and were analyzed in terms of vegetation classification, canopy layer structure, and the relationships between vegetation units and environmental factors using coincidence methods. In the vegetation type classification, the Quercus mongolica community group was placed at the top level of the vegetation hierarchy and was divided into the Rhododendron schlippenbachii community and the Betula costata community. The R. schlippenbachii community was divided into the Lychnis cognata group and the R. schlippenbachii typical group. The L. cognata group was subdivided into the Veratrum oxysepalum subgroup and the L. cognata typical subgroup. The B. costata community was divided into the Fraxinus mandshurica group and the Betula schmidtii group. The F. mandshurica group was subdivided into the Weigela subsessilis subgroup and the Cimicifuga heracleifolia subgroup. The forest vegetation was therefore composed of six vegetation units, with two kinds of bisected species groups and fourteen species groups. The analysis of canopy layer structure showed two kinds of structures: monotonous structures in the V. oxysepalum subgroup (vegetation unit 1), the L. cognata typical subgroup (vegetation unit 2), and the W. subsessilis subgroup (vegetation unit 4), and complicated structures in the R. schlippenbachii typical group (vegetation unit 3), the C. heracleifolia subgroup (vegetation unit 5), and the Betula schmidtii group (vegetation unit 6). The vertical layer structure of vegetation unit 5 was the most developed, and vegetation unit 6 had the lowest coverage in the herb layer. According to the correlation between vegetation units and environmental factors, the R. schlippenbachii community (vegetation units 1~3) and the B. costata community (vegetation units 4~6) were separated at an altitude of 1,100 m, the middle slope, a slope of twenty degrees, twenty percent bare rock, and a DBH of thirty centimeters in the tree layer. The R. schlippenbachii community (vegetation units 1~3) showed a positive correlation with altitude and topography, whereas the B. costata community (vegetation units 4~6) tended to show a negative correlation with them.

Forest Vegetation Classification and Quantitative Analysis of Picea jezoensis and Abies hollophylla stand in Mt. Gyebang (계방산 가문비나무 및 전나무 임분의 산림식생유형분류와 정량적 분석)

  • Ko, Seung-Yeon;Han, Sang-Hak;Lee, Won-Hee;Han, Sim-Hee;Shin, Hak-Sub;Yun, Chung-Weon
    • Korean Journal of Environment and Ecology / v.28 no.2 / pp.182-196 / 2014
  • In this study, for the forest vegetation classification and quantitative analysis of the Picea jezoensis and Abies holophylla stands, the vegetation structure was classified with the Z-M phytosociological method. At the community level, it was classified into the Picea jezoensis community and the Abies holophylla community. At the group level, the Picea jezoensis community was subdivided into the Rosa koreana group and the Acer ukurunduense group, and the Abies holophylla community was subdivided into the Acer mandshuricum group and the Lindera obtusiloba group. When the importance value was estimated for the classified vegetation units, the importance values of Picea jezoensis in the tree layers of vegetation units 1 and 2 were relatively high, at 30.73% and 20.25%, so the dominance of Picea jezoensis is expected to continue for a while. In addition, in the species diversity analysis used to estimate the maturity of the communities, the species diversity index was lowest in vegetation unit 4, at 0.6976, and highest in vegetation unit 2, at 1.1256. As for the similarity between the communities, vegetation units 1 and 4 and vegetation units 2 and 4 showed low values of 0.2880 and 0.3626, respectively, while the similarity between vegetation units 1 and 2 and between 2 and 4 was 0.5411 and 0.5041, respectively, so these were judged to be communities whose species composition did not differ greatly.
In the analysis of the Chi-square matrix and the constellation of interspecific association, the species were divided mainly into two types. Type I plant species were mostly the differential species and characteristic species that appeared in the phytosociologically classified Picea jezoensis community, and type II plant species were mostly species appearing in the Abies holophylla community, which grows in relatively damp places. These results suggest that a positive (+) correlation exists among species whose growing environments are similar, and a negative (-) correlation exists among species whose preferred environments differ.
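
How one cell of a Chi-square matrix of interspecific association is obtained can be sketched as follows. This is only an illustration under assumptions: it uses the standard 2x2 contingency-table statistic on presence/absence data per plot, without a continuity correction, and the abstract does not state which variant the authors applied. The species presence lists are hypothetical.

```python
def association(plots_a, plots_b):
    """Chi-square statistic and sign of association for two species.

    plots_a, plots_b: equal-length lists of 0/1 presence flags, one per plot.
    """
    pairs = list(zip(plots_a, plots_b))
    a = sum(1 for x, y in pairs if x and y)          # both species present
    b = sum(1 for x, y in pairs if x and not y)      # only species A present
    c = sum(1 for x, y in pairs if not x and y)      # only species B present
    d = sum(1 for x, y in pairs if not x and not y)  # both absent
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, "none"  # a species present in all plots or in none
    chi2 = n * (a * d - b * c) ** 2 / denom
    sign = "+" if a * d > b * c else "-" if a * d < b * c else "none"
    return chi2, sign

# Hypothetical presence/absence records over eight plots.
species_x = [1, 1, 1, 0, 0, 0, 1, 0]
species_y = [1, 1, 0, 0, 0, 0, 1, 0]

chi2, sign = association(species_x, species_y)
print(round(chi2, 2), sign)
```

Species that tend to share plots yield a positive sign, which corresponds to the positive (+) correlation among species with similar growing environments described above.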

Interpretation Method of Eco-Cultural Resources from the Perspective of Landscape Ecology in Jeju Olle Trail (제주 올레길 생태문화자원 경관생태학적 해석기법 연구)

  • Hur, Myung-Jin;Han, Bong-Ho;Park, Seok-Cheol
    • Journal of the Korean Institute of Landscape Architecture / v.49 no.2 / pp.128-140 / 2021
  • This study applied the theory of landscape ecology to representative resources of the Jeju Olle Trail, a leading destination for walking tourism, to identify their ecological characteristics and to establish a technique for the landscape ecological analysis of Olle Trail resources. Jeju Olle Trail sections were divided into 12 types based on biotope type, major land use, the vegetation status around the trail, and roads. Based on the classification of ecological tourism resources, the walking tourism resources of the Jeju Olle Trail were divided into seven types of natural resources and seven types of humanities resources, and each resource was characterized as Geotope, Biotope, or Anthropotope, following the landscape ecology system. Geotope resources have strong landscape characteristics and include coasts and beaches, rocks, bedrock, waterfalls, geology and the Jusangjeolli Cliff, Oreum and craters, water resources, and landscape viewpoints. Biotope resources have strong ecological characteristics and include large trees and protected trees, Gotjawal, forest roads and vegetation communities, biological habitats, and vegetation landscape viewpoints. Anthropotope resources include the culture of Jeju Haenyeo and traditional culture, potting and lighthouses, experience facilities, temples and churches, military and beacon facilities, other historical and cultural facilities, and cultural landscape views. The representative resources for each type of Jeju Olle Trail are the coast, Oreum, Gotjawal, fields and stone-walled farmland, and Jeju villages with their stone walls. To understand the components and various functions of the resources representing the ecological culture of the Olle Trail, these resources were interpreted using the landscape ecological technique. In terms of ecological and cultural characteristics, the coast includes black basalt rocks, coastal vegetation, coastal grasslands, coastal rock vegetation, winter migratory birds, and Jeju Haenyeo.
Oreum is a unique volcanic landform, which includes circular and oval mountain bodies, Oreum vegetation, crater wetlands, the origin and legend of the name of each Oreum, the culture of grazing horses, military use, its role as an object of folk belief, and the view from the summit. Gotjawal features rocky bumps, a unique microclimate, Gotjawal vegetation, geographical names, the past culture of charcoal making, and bizarre shapes of trees and vines. Fields and field walls include the structure and shape of the field walls, field crops, field wall habitats, and Jeju agricultural culture. The village includes stone walls and roof structures built from basalt, a pavilion at the entrance of the village, the yard and garden inside each house, views of the lives of local people, and alleyway views. These resources have changed slowly over the long history of human life and are now unique to Jeju Island. With content specialized for each type of Olle Trail, tourists who walk the trail can experience it in depth as they learn the stories of its resources, which can increase both sustainable use and the satisfaction of Jeju Olle Trail users.

Application of Hyperspectral Imagery to Decision Tree Classifier for Assessment of Spring Potato (Solanum tuberosum) Damage by Salinity and Drought (초분광 영상을 이용한 의사결정 트리 기반 봄감자(Solanum tuberosum)의 염해 판별)

  • Kang, Kyeong-Suk;Ryu, Chan-Seok;Jang, Si-Hyeong;Kang, Ye-Seong;Jun, Sae-Rom;Park, Jun-Woo;Song, Hye-Young;Lee, Su Hwan
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.4 / pp.317-326 / 2019
  • Salinity, which is often detected on reclaimed land, is a major detrimental factor for crop growth. It would be advantageous to develop an approach for assessing salinity and drought damage over a large landfill area using a non-destructive method. The objective of this study was to examine the applicability of a decision tree classifier using imagery to classify spring potatoes (Solanum tuberosum) damaged by salinity or drought at the vegetative growth stages. We focused on comparing the OA (overall accuracy) and KC (Kappa coefficient) obtained with simple reflectance and with band ratios, which minimize the effect of uneven lighting. Spectral merging based on commercial band widths, with a full width at half maximum (FWHM) of 10 nm, 25 nm, and 50 nm, was also considered with a view to designing a multispectral image sensor. For classification based on the original simple reflectance with an FWHM of 5 nm, 3 to 13 bands were selected, with an accuracy of less than 66.7% OA and 40.8% KC in all FWHMs. The maximum values of OA and KC were 78.7% and 57.7%, respectively, with an FWHM of 10 nm for classifying salinity and drought damage in spring potato. When the classifier was built on band ratios, the accuracy was more than 95% for both OA and KC regardless of growth stage and FWHM. If a multispectral image sensor is made with six bands (the ratios of three band pairs) with an FWHM of 10 nm, it is possible to classify spring potatoes damaged by salinity or drought from image reflectance with 91.3% OA and 85.0% KC.
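
The two accuracy measures compared above, OA and KC, can both be computed from a confusion matrix. The following is a minimal sketch using the standard definitions of overall accuracy and Cohen's kappa; the three-class matrix (control, salinity, drought) is hypothetical and does not reproduce the study's results.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (OA, kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / (n * n)
    # Note: kappa is undefined when expected == 1 (all samples in one class).
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical confusion matrix: rows/columns are control, salinity, drought.
confusion = [
    [18, 1, 1],
    [2, 16, 2],
    [0, 3, 17],
]

oa, kc = overall_accuracy_and_kappa(confusion)
print(f"OA = {oa:.1%}, KC = {kc:.1%}")
```

Because kappa discounts chance agreement, it is always lower than OA for an imperfect classifier, which is consistent with the OA/KC pairs reported in the abstract.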

Analysis of Utilization Characteristics, Health Behaviors and Health Management Level of Participants in Private Health Examination in a General Hospital (일개 종합병원의 민간 건강검진 수검자의 검진이용 특성, 건강행태 및 건강관리 수준 분석)

  • Kim, Yoo-Mi;Park, Jong-Ho;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.1 / pp.301-311 / 2013
  • This study aims to analyze the characteristics, health behaviors, and health management level of private health examination recipients in one general hospital. To this end, we analyzed 150,501 cases of private health examination data covering the 11 years from 2001 to 2011 for 20,696 participants in 2011 at the health examination center of a general hospital in Daejeon. To classify the private health examination groups, cluster analysis was performed using the K-means clustering method with z-score standardization. Logistic regression, decision tree, and neural network analyses were used to build a classification model for periodic and non-periodic private health examination. From the new private health examination participants, 1,000 people with a high probability of becoming non-periodic examinees were selected as a target group for customer management. According to the results of this study, the private health examination participants were categorized into new, periodic, and non-periodic groups. New participants included a higher proportion of people aged 30~39 than of other age groups and more patients suspected of having renal disease. Periodic participants included more men and more patients suspected of having hyperlipidemia. Non-periodic participants included more smokers and sedentary people and more patients suspected of having anemia and diabetes mellitus. In the decision tree results, the variables related to non-periodic participation were sex, age, residence, exercise, anemia, hyperlipidemia, diabetes mellitus, obesity, and liver disease. In particular, 71.4% of non-periodic participants were female, non-anemic, non-exercising, and suspected of obesity. Operating a customized customer management program for private health examination will contribute to the efficiency of the health examination center.
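
The clustering step (z-score standardization followed by K-means) can be sketched in plain Python. This is an illustration under assumptions: the two features (number of past examinations, years since the last examination), the toy records, and the choice of the first record of each presumed group as the initial centroid are all hypothetical; the abstract does not list the variables the authors clustered on.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column to mean 0 and standard deviation 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]

def kmeans(points, centroids, iterations=20):
    """Plain K-means: assign each point to its nearest centroid, then update."""
    labels = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            k = dists.index(min(dists))
            labels.append(k)
            clusters[k].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            [mean(dim) for dim in zip(*cl)] if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return labels, centroids

# Hypothetical records: [past examinations, years since last examination].
records = [
    [10, 1], [9, 1], [11, 1],   # look like periodic examinees
    [1, 0], [1, 0], [2, 0],     # look like new examinees
    [3, 5], [2, 6], [3, 7],     # look like non-periodic examinees
]

scaled = zscore_columns(records)
labels, centers = kmeans(scaled, [scaled[0], scaled[3], scaled[6]])
print(labels)
```

Standardizing first keeps a feature with a large numeric range from dominating the Euclidean distances that K-means relies on.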

Community Structure and Floristic Composition of Cymbidium goeringii Group in Korean Islets (한반도 도서지역 춘란집단의 종조성과 군락구조)

  • Song, Hong-Seon;Park, Yong-Jin
    • FLOWER RESEARCH JOURNAL / v.18 no.2 / pp.110-116 / 2010
  • This study investigated and analyzed the vegetation and floristic composition by ordination and classification using the phytosociological method, in order to evaluate the species composition and community structure of the Cymbidium goeringii group in Korean islets. Across the 33 plots, the mean altitude was 65.9 m, the aspect was southeast, and the mean slope was 7.9%. The coverage of Cymbidium goeringii was 4.5%. A total of 102 taxa occurred together with Cymbidium goeringii, comprising 68 taxa of trees (66.7%) and 34 taxa of herbs (33.3%), or 36 taxa of evergreen plants (35.3%) and 66 taxa of deciduous plants (64.7%). The frequency of occurrence was highest for Eurya japonica (48.5%), followed by Pinus thunbergii (45.5%), Smilax china (36.4%), Carex lanceolata (33.3%), Hedera rhombea (33.3%), Machilus thunbergii (30.0%), Styrax japonicus (30.3%), and Pinus densiflora (27.3%). The tree-layer vegetation of the Cymbidium goeringii group was classified into the Pinus thunbergii community, Pinus densiflora community, Castanopsis sieboldii community, and Quercus variabilis community. The Pinus densiflora community showed a strong association with the Cymbidium goeringii group in Korean islets. Among the communities, the Pinus thunbergii community was grouped with the Castanopsis sieboldii community, and the Pinus densiflora community was grouped with the Quercus variabilis community.

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk;Kim, Ji-Hun;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.125-141 / 2012
  • These days, malicious attacks and hacks on networked systems are increasing dramatically, and their patterns are changing rapidly. Consequently, it has become more important to handle these attacks appropriately, and there is considerable interest in and demand for effective network security systems such as intrusion detection systems. Intrusion detection systems are network security systems for detecting, identifying, and responding appropriately to unauthorized or abnormal activities. Conventional intrusion detection systems have generally been designed using experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. However, although they perform very well under normal situations, they cannot handle new or unknown patterns of network attacks. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can respond proactively to unknown threats. For a long time, researchers have adopted and tested various artificial intelligence techniques such as artificial neural networks, decision trees, and support vector machines to detect intrusions on the network. However, most of them have applied these techniques individually, even though combining the techniques may lead to better detection. For this reason, we propose a new integrated model for intrusion detection. Our model is designed to combine the prediction results of four different binary classification models, namely logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM), which may be complementary to each other. Genetic algorithms (GA) are used as a tool for finding the optimal combining weights. Our proposed model is built in two steps. In the first step, the optimal integration model, whose prediction error (i.e., erroneous classification rate) is the lowest, is generated.
After that, in the second step, the model explores the optimal classification threshold for determining intrusions, which minimizes the total misclassification cost. To calculate the total misclassification cost of an intrusion detection system, we need to understand its asymmetric error cost scheme. Generally, there are two common forms of errors in intrusion detection. The first error type is the False-Positive Error (FPE), in which the wrong judgment may result in unnecessary fixes. The second error type is the False-Negative Error (FNE), which mainly occurs when a malicious program is misjudged as normal. Compared to FPE, FNE is more fatal. Thus, the total misclassification cost is affected more by FNE than by FPE. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 10,000 samples from them by random sampling. We also compared the results from our model with the results from the single techniques to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell R4.0. For SVM, LIBSVM v2.90, a freeware tool for training SVM classifiers, was used. Empirical results showed that our proposed GA-based model outperformed all the other comparative models in detecting network intrusions in terms of accuracy. They also showed that the proposed model outperformed all the other comparative models in terms of total misclassification cost. Consequently, we expect that our study may contribute to building cost-effective intelligent intrusion detection systems.
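
The second step, choosing a threshold that minimizes an asymmetric misclassification cost, can be sketched as follows. This is an illustration under assumptions: the combining weights are fixed by hand here, whereas the paper searches for them with a genetic algorithm; the per-model scores, the labels, and the cost values (a false negative costing five times a false positive) are hypothetical.

```python
def combine(scores_per_model, weights):
    """Weighted average of per-model intrusion scores for each sample."""
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, sample)) / total
        for sample in scores_per_model
    ]

def total_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Total misclassification cost when scores >= threshold mean 'intrusion'."""
    cost = 0.0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 0:
            cost += fp_cost   # false positive: normal traffic flagged
        elif predicted == 0 and y == 1:
            cost += fn_cost   # false negative: intrusion missed
    return cost

def best_threshold(scores, labels, fp_cost, fn_cost, steps=100):
    """Grid search over thresholds in [0, 1]; returns the first minimizer."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, fp_cost, fn_cost),
    )

# Hypothetical scores from (LOGIT, DT, ANN, SVM) for six log records.
model_scores = [
    [0.9, 0.8, 0.9, 0.9],
    [0.6, 0.5, 0.4, 0.5],
    [0.4, 0.6, 0.5, 0.5],
    [0.2, 0.1, 0.2, 0.1],
    [0.3, 0.3, 0.4, 0.4],
    [0.1, 0.2, 0.1, 0.2],
]
labels = [1, 1, 0, 0, 1, 0]          # 1 = intrusion, 0 = normal
weights = [1.0, 1.0, 1.0, 1.0]       # placeholder for GA-optimized weights

combined = combine(model_scores, weights)
threshold = best_threshold(combined, labels, fp_cost=1.0, fn_cost=5.0)
print(threshold, total_cost(combined, labels, threshold, 1.0, 5.0))
```

Because a missed intrusion is costed more heavily than a false alarm, the selected threshold moves below the midpoint and accepts one false positive in order to avoid any false negative.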

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important source for target marketing or personalized advertisements on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are very expensive, take a long time, and are likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data allows us to see which pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find a site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics. The variables include search keywords; frequency and intensity by time, day, and month; the variety of websites visited; text information from the web pages visited; and so on. The demographic attributes to be predicted also vary from paper to paper and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, have been used to build prediction models. However, this body of research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model.
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics, based on the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. From the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is composed of four steps. In the first step, we create user profiles that include the 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction on the clickstream variables to address the curse of dimensionality and the overfitting problem. We use three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable, using SVM, neural network, and logistic regression. The last step evaluates the alternative models in terms of model accuracy and selects the best model. For the experiments, we used clickstream data representing 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for our prediction process, and 5-fold cross validation was conducted to enhance the reliability of our experiments. The experimental results verify that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best with decision tree based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate when SVM is applied without dimension reduction.
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can thereby be utilized in digital marketing.
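
The 5-fold cross validation used to evaluate the alternative models can be illustrated with a short sketch of how the folds are formed. This is a generic implementation, not the procedure built into IBM SPSS Modeler; it assumes contiguous, unshuffled folds for readability, and in practice the sample order would normally be shuffled first. The sample count of 12 is arbitrary.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(12, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each candidate model is trained on the training indices and scored on the held-out indices, and the five scores are averaged, which gives a more reliable accuracy estimate than a single train/test split.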

Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.99-112 / 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention from the machine learning and artificial intelligence fields because of its remarkable performance improvement and its flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and SVM. In this research, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown improvements as remarkable as those of DT ensembles. Recently, several works have reported that the performance of an ensemble can be degraded when its classifiers are highly correlated with one another, which results in a multicollinearity problem. They have also proposed differentiated learning strategies to cope with this performance degradation problem. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that the ensemble contain diverse classifiers. Breiman (1996) showed that ensemble learning can increase the performance of unstable learning algorithms, but does not bring remarkable performance improvement to stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, so small changes in the training data can yield large changes in the generated classifiers. Therefore, an ensemble of unstable learning algorithms can guarantee some diversity among the classifiers.
On the contrary, stable learning algorithms such as NN and SVM generate similar classifiers despite small changes in the training data, so the correlation among the resulting classifiers is very high. This high correlation results in a multicollinearity problem, which leads to performance degradation of the ensemble. Kim's work (2009) compared the performance of traditional prediction algorithms such as NN, DT, and SVM in bankruptcy prediction for Korean firms. It reports that stable learning algorithms such as NN and SVM have higher predictability than the unstable DT. Meanwhile, with respect to ensemble learning, the DT ensemble shows a greater performance improvement than the NN and SVM ensembles. Further analysis using the variance inflation factor (VIF) empirically shows that the performance degradation of an ensemble is due to the multicollinearity problem. It also proposes that optimization of the ensemble is needed to cope with this problem. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) in order to improve the performance of NN ensembles. Coverage optimization is a technique for choosing a sub-ensemble from an original ensemble so as to guarantee the diversity of the classifiers. CO-NN uses GA, which has been widely used for various optimization problems, to deal with the coverage optimization problem. The GA chromosomes for coverage optimization are encoded as binary strings, each bit of which indicates an individual classifier. The fitness function is defined as the maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the most commonly used measures of multicollinearity, is added to ensure the diversity of the classifiers by removing high correlation among them. We use Microsoft Excel and a GA software package called Evolver.
Experiments on company failure prediction have shown that CO-NN is effective in stably enhancing the performance of NN ensembles by choosing classifiers with regard to the correlations within the ensemble. The classifiers with a potential multicollinearity problem are removed by the coverage optimization process of CO-NN, and CO-NN has thereby shown higher performance than a single NN classifier and the NN ensemble at the 1% significance level, and than the DT ensemble at the 5% significance level. However, further research issues remain. First, a decision optimization process for finding the optimal combination function should be considered. Second, various learning strategies for dealing with data noise should be introduced in future research.
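
The binary-string encoding of a sub-ensemble can be sketched as below. This is a simplified illustration and not the CO-NN procedure itself: it evaluates every bit string exhaustively instead of evolving them with a genetic algorithm, combines the selected classifiers by majority vote (an assumed combination rule), and omits the VIF constraint. The classifier outputs and labels are hypothetical.

```python
from itertools import product

def majority_vote(predictions, mask):
    """Combine the classifiers selected by mask (1 = keep) by majority vote.

    Ties are resolved as class 0.
    """
    chosen = [p for p, bit in zip(predictions, mask) if bit]
    n_samples = len(chosen[0])
    votes = [sum(p[i] for p in chosen) for i in range(n_samples)]
    return [1 if 2 * v > len(chosen) else 0 for v in votes]

def error_rate(predicted, actual):
    """Share of samples on which the prediction differs from the label."""
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)

def best_sub_ensemble(predictions, actual):
    """Exhaustively score every non-empty bit string; return (mask, error)."""
    best_mask, best_err = None, float("inf")
    for mask in product([0, 1], repeat=len(predictions)):
        if not any(mask):
            continue  # an empty ensemble cannot vote
        err = error_rate(majority_vote(predictions, mask), actual)
        if err < best_err:
            best_mask, best_err = mask, err
    return best_mask, best_err

# Hypothetical 0/1 outputs of four classifiers on six firms (1 = failure).
actual = [1, 0, 1, 1, 0, 0]
classifier_outputs = [
    [1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 1],   # makes the same mistake as the first classifier
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
]

full_error = error_rate(
    majority_vote(classifier_outputs, (1, 1, 1, 1)), actual
)
mask, sub_error = best_sub_ensemble(classifier_outputs, actual)
print(full_error, mask, sub_error)
```

In this toy case the full ensemble misclassifies one firm because two correlated classifiers repeat the same mistake, while a three-member sub-ensemble classifies every firm correctly. With many classifiers the number of bit strings grows exponentially, which is why a search heuristic such as a GA is used instead of enumeration.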