• Title/Summary/Keyword: Multiple Decision Method


A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.12
    • /
    • pp.101-106
    • /
    • 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is an exciting strategy. The issue with existing feature selection approaches is that each method yields a distinct set of features, which affects model accuracy, and present methods do not perform well on huge multidimensional datasets. We introduce a novel model containing a feature selection approach that selects optimal features from big multidimensional datasets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To ensure the success of our proposed model, we balanced the classes by applying hybrid balanced class sampling methods to the original dataset, together with data pre-processing and data transformation methods, to provide credible data for training the model. We ran and assessed our model on datasets with binary and multivalued classifications, using multiple datasets (Parkinson, arrhythmia, breast cancer, kidney, diabetes). Suitable features are selected using a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, followed by voting on the attributes that are common outputs of these methods. The accuracy on the original dataset before applying the framework is recorded and compared with the accuracy on the reduced set of attributes. The results are shown separately to provide comparisons. Based on the result analysis, we conclude that our proposed model achieved higher accuracy on multivalued class datasets than on binary class datasets.[1]
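
The voting step described in this abstract can be illustrated with a minimal sketch. The abstract does not state the voting threshold or the feature names, so `min_votes`, the selector outputs, and the feature names below are all hypothetical; the individual selectors are assumed to have been run already.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Return the features chosen by at least `min_votes` selectors, sorted by name."""
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each selector votes at most once per feature
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Hypothetical outputs of four selectors; the feature names are made up.
selections = {
    "lasso_cv": ["age", "bmi", "glucose"],
    "decision_tree": ["glucose", "bmi", "insulin"],
    "random_forest": ["glucose", "age", "bmi"],
    "gradient_boosting": ["glucose", "insulin"],
}

print(vote_features(selections, min_votes=3))
```

Setting `min_votes` to the number of selectors keeps only the features common to every method; a lower threshold trades agreement for a larger feature set.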

Creation of regression analysis for estimation of carbon fiber reinforced polymer-steel bond strength

  • Xiaomei Sun;Xiaolei Dong;Weiling Teng;Lili Wang;Ebrahim Hassankhani
    • Steel and Composite Structures
    • /
    • v.51 no.5
    • /
    • pp.509-527
    • /
    • 2024
  • Bonded carbon fiber-reinforced polymer (CFRP) laminates have been extensively employed in the restoration of steel structures. In addition to the mechanical properties of the CFRP, the bond strength (PU) between the CFRP and steel is often important to the eventual strengthened performance. Nonetheless, the bond behavior of the CFRP-steel (CS) interface is exceedingly complicated, with multiple failure causes, making the PU challenging to forecast, and the CFRP-enhanced steel structure is unsteady. In this case, suitable models were established by hybridizing Random Forest (RF) and support vector regression (SVR) approaches on assembled CS single-shear experiment data to predict the PU of CS, with a recently developed optimization algorithm, the Aquila optimizer (AO), used to tune the RF and SVR hyperparameters. In summary, the practical novelty of the article lies in its development of a reliable and efficient method for predicting bond strength at the CS interface, which has significant implications for structural rehabilitation, design optimization, risk mitigation, cost savings, and decision support in engineering practice. Moreover, the Fourier Amplitude Sensitivity Test was performed to depict each parameter's impact on the target. The order of parameter importance, from largest to smallest, was tc > Lc > EA > tA > Ec > bc > fc > fA, with values 0.9345 > 0.8562 > 0.79354 > 0.7289 > 0.6531 > 0.5718 > 0.4307 > 0.3657. In all three phases (training, testing, and all data), the superiority of AO-RF over AO-SVR and MARS was obvious. In the training stage, the values of R2 and VAF were similar for the two hybrid models, with a slight superiority of AO-RF over AO-SVR (R2 equal to 0.9977 and VAF equal to 99.772), but differed greatly from the results of MARS.
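
The abstract reports R2 and VAF without giving their formulas. The sketch below uses the commonly cited definitions (coefficient of determination, and variance accounted for as a percentage); whether the paper uses exactly these forms is an assumption, and the sample values are invented.

```python
from statistics import fmean, pvariance


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def vaf(y_true, y_pred):
    """Variance accounted for, in percent: (1 - var(residuals) / var(y_true)) * 100."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return (1.0 - pvariance(residuals) / pvariance(y_true)) * 100.0


# Invented bond-strength values; the prediction is off by a constant +1.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(r_squared(measured, predicted), vaf(measured, predicted))
```

Under these definitions a constant offset lowers R2 but leaves VAF at 100, which is one reason the two metrics are usually reported together.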

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data. These text data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization. This problem has therefore attracted the interest of many researchers seeking to manage this huge amount of information. It also requires professionals capable of classifying relevant information, and hence text classification was introduced. Text classification is a challenging task in modern data analysis, in which a text document must be assigned to one or more predefined categories or classes. In the text classification field, various techniques are available, such as K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary with the type of words used in the corpus and the type of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, rather than proposing a new algorithm or modifying an existing one, we focus on finding a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them. 
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be utilized in the classification process. To build a classifier, a machine learning algorithm is usually applied under the assumption that the characteristics of the training data and target data are the same or very similar. However, in the case of unstructured data such as text, the features are determined by the vocabulary included in the documents. If the viewpoints of the training data and target data differ, the features may also differ between the two. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, which were not developed to recognize different types of data representation at one time and combine them in the same generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making. 
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.

Development of Intelligent ATP System Using Genetic Algorithm (유전 알고리듬을 적용한 지능형 ATP 시스템 개발)

  • Kim, Tai-Young
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.131-145
    • /
    • 2010
  • The framework for making coordinated decisions for large-scale facilities has become an important issue in supply chain (SC) management research. The competitive business environment requires companies to continuously search for ways to achieve high efficiency and lower operational costs. In the areas of production/distribution planning, many researchers and practitioners have developed and evaluated deterministic models to coordinate important and interrelated logistic decisions such as capacity management, inventory allocation, and vehicle routing. They initially investigated the various processes of the SC separately and later became more interested in problems encompassing the whole SC system. The accurate quotation of ATP (Available-To-Promise) plays a very important role in enhancing customer satisfaction and maximizing fill rate. The complexity of an intelligent manufacturing system, which includes all the linkages among procurement, production, and distribution, makes the accurate quotation of ATP quite difficult. In addition, many researchers have assumed ATP models with integer time. Various alternative models for an ATP system with time lags have been developed and evaluated. In most cases, these models have assumed that the time lags are integer multiples of a unit time grid. However, integer time lags are very rare in practice, and therefore models developed using integer time lags only approximate real systems. The differences caused by this approximation frequently result in significant accuracy degradation. To introduce the ATP model with time lags, we first introduce the dynamic production function. Hackman and Leachman's dynamic production function initiated research directly related to the topic of this paper. 
They propose a modeling framework for a system with non-integer time lags and show how to apply the framework to a variety of systems, including continuous time series, manufacturing resource planning, and the critical path method. Their formulation requires no additional variables or constraints and is capable of representing real-world systems more accurately. Previously, to cope with non-integer time lags, modelers usually either rounded lags to the nearest integers or subdivided the time grid so that the lags became integer multiples of the grid. But each approach has a critical weakness: the first either underestimates lead times, potentially leading to infeasibilities, or overestimates them, potentially resulting in excessive work-in-process. The second approach drastically inflates the problem size. We consider an optimized ATP system with non-integer time lags in supply chain management. We focus on a setting in which a worldwide headquarters, distribution centers, and manufacturing facilities are globally networked. We develop a mixed integer programming (MIP) model for the ATP process, which includes the definition of the required data flow. The illustrative ATP module shows that the proposed system is largely affected in SCM. The system we are concerned with is composed of multiple production facilities with multiple products, multiple distribution centers, and multiple customers. For this system, we consider an ATP scheduling and capacity allocation problem. In this study, we propose a model for the ATP system in SCM using the dynamic production function and considering non-integer time lags. The model is developed under a framework suitable for non-integer lags and is therefore more accurate than the models usually encountered. We developed an intelligent ATP system for this model using a genetic algorithm. 
We focus on a capacitated production planning and capacity allocation problem, develop a mixed integer programming model, and propose a heuristic procedure using an evolutionary system to solve it efficiently. This method makes it possible for the population to reach an approximate solution easily. Moreover, we designed and utilized a representation scheme that allows the proposed models to represent real variables. The proposed regeneration procedures, which evaluate each infeasible chromosome, make the solutions converge to the optimum quickly.
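
The two conventional workarounds for non-integer time lags that this abstract criticizes can be made concrete with a small sketch. It is only an illustration of the trade-off, not the paper's model; the lag values and the 10-period horizon are invented.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def round_lags(lags):
    """Workaround 1: round every lag to the nearest whole period."""
    return [round(lag) for lag in lags]


def common_grid(lags):
    """Workaround 2: the largest grid step of which every lag is an integer multiple.

    For fractions in lowest terms this is gcd(numerators) / lcm(denominators).
    """
    num = reduce(gcd, (lag.numerator for lag in lags))
    den = reduce(lambda a, b: a * b // gcd(a, b), (lag.denominator for lag in lags))
    return Fraction(num, den)


# Invented lags, measured in planning periods.
lags = [Fraction(5, 4), Fraction(8, 3), Fraction(2)]
horizon = 10  # invented planning horizon, in periods

step = common_grid(lags)
print(round_lags(lags))           # lags after rounding, with the rounding error hidden
print(step, int(horizon / step))  # grid step and number of grid points on the horizon
```

With these values, rounding distorts two of the three lags, while the exact grid multiplies the number of time points by twelve, which mirrors the "inflated problem size" weakness described above.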

Estimation of LOADEST coefficients according to watershed characteristics (유역특성에 따른 LOADEST 회귀모형 매개변수 추정)

  • Kim, Kyeung;Kang, Moon Seong;Song, Jung Hun;Park, Jihoon
    • Journal of Korea Water Resources Association
    • /
    • v.51 no.2
    • /
    • pp.151-163
    • /
    • 2018
  • The objective of this study was to estimate LOADEST (LOAD Estimator) coefficients for simulating pollutant loads in ungauged watersheds. Regression models of LOADEST were used to simulate pollutant loads, and multiple linear regression (MLR) was used to estimate the coefficients from watershed characteristics. The fifth and third models of LOADEST were selected to simulate T-N (Total Nitrogen) and T-P (Total Phosphorus) loads, respectively. The results and statistics indicated that the LOADEST-based regression models simulated pollutant loads reasonably and that the model coefficients were reliable. However, the results also indicated that LOADEST underestimated pollutant loads and had a bias. For this reason, the bias in the simulated loads was corrected using a quantile mapping method. The corrected loads indicated that the bias correction was effective. Using multiple regression analysis, a method for estimating the coefficients according to watershed characteristics was developed. The coefficients calculated by MLR were then used in the models. The simulated results and statistics indicated that MLR estimated the model coefficients reasonably. The regression models developed in this study would help simulate pollutant loads for ungauged watersheds and serve as a screening model for policy decisions.
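
The abstract does not say which variant of quantile mapping was used. As a rough illustration, the sketch below implements a simple empirical version: a simulated value is located within the sorted calibration simulations and mapped to the observed value at the same rank, with linear interpolation between ranks. The equal-length samples, the clamping at the extremes, and the load values are all assumptions.

```python
from bisect import bisect_right


def quantile_map(value, simulated, observed):
    """Map `value` from the simulated distribution onto the observed one."""
    sim, obs = sorted(simulated), sorted(observed)
    if len(sim) != len(obs):
        raise ValueError("this sketch assumes equally sized calibration samples")
    if value <= sim[0]:
        return obs[0]   # clamp below the calibration range
    if value >= sim[-1]:
        return obs[-1]  # clamp above the calibration range
    i = bisect_right(sim, value)  # sim[i - 1] <= value < sim[i]
    weight = (value - sim[i - 1]) / (sim[i] - sim[i - 1])
    return obs[i - 1] + weight * (obs[i] - obs[i - 1])


# Invented calibration data in which the model underestimates loads by half.
simulated_loads = [1.0, 2.0, 3.0, 4.0]
observed_loads = [2.0, 4.0, 6.0, 8.0]
print(quantile_map(2.5, simulated_loads, observed_loads))
```

Clamping is the simplest choice for values outside the calibration range; extrapolating the tails would be an alternative when extreme loads matter.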

A Study on the Calculation of Productive Rate of Return (생산투자수익률 계산방법에 대한 연구)

  • Kim, Jin Wook;Kim, Kun-Woo;Kim, Seok Gon
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.38 no.3
    • /
    • pp.95-99
    • /
    • 2015
  • The IRR (internal rate of return) is often used by investors for the evaluation of engineering projects. Unfortunately, it has serious flaws: (1) multiple real-valued IRRs may arise; (2) complex-valued IRRs may arise; (3) the IRR is, in special cases, incompatible with the net present value (NPV) in accept/reject decisions. The efforts of management scientists and economists to provide a reliable project rate of return have generated, over the decades, an immense number of contributions aiming to solve these shortcomings. In particular, multiple IRRs are a fatal flaw when deciding whether to accept a project. To solve this, some researchers came up with external rates of return (ERRs) such as the ARR (Average Rate of Return) or MIRR (Modified Internal Rate of Return). The ARR or MIRR will always yield a decision for an engineering project that is consistent with the NPV criterion. The ERRs modify the procedure for computing the rate of return by making explicit and consistent assumptions about the interest rate at which intermediate receipts from projects may be invested. This reinvestment could be either in other projects or in the outside market. However, when traditional ERRs are used, the volume of capital investment is still unclear. Alternatively, the productive rate of return (PRR) can settle these problems. Generally, a rate of return is the profit on an investment over a period of time, expressed as a proportion of the original investment. The time period is typically the life of a project. The PRR is based on the full life of the engineering project but is annualised to one year. In addition, the PRR uses the effective investment instead of the original investment. This method requires that the cash flow of an engineering project be separated into 'investment' and 'loss' to calculate the PRR value. 
In this paper, we propose a tabulated form for easy calculation of the PRR by modifying the profit and loss statement and the cash flow statement.
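
The multiple-IRR problem and the MIRR remedy mentioned in this abstract can be shown with a short sketch. The cash flow is a textbook-style invented example, the MIRR formula is the commonly used one (future value of inflows over present value of outflows), and the PRR itself is not implemented because the abstract does not define it fully.

```python
def npv(rate, cash_flows):
    """Net present value of end-of-period cash flows, with period 0 first."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def mirr(cash_flows, finance_rate, reinvest_rate):
    """Modified IRR: inflows compounded to the end, outflows discounted to the start."""
    n = len(cash_flows) - 1
    fv_inflows = sum(cf * (1.0 + reinvest_rate) ** (n - t)
                     for t, cf in enumerate(cash_flows) if cf > 0)
    pv_outflows = sum(cf / (1.0 + finance_rate) ** t
                      for t, cf in enumerate(cash_flows) if cf < 0)
    return (fv_inflows / -pv_outflows) ** (1.0 / n) - 1.0


# Invented project: invest, receive, then pay a closing cost.
flows = [-1000.0, 2300.0, -1320.0]

# NPV is zero at both 10% and 20%, so this project has two real-valued IRRs.
print(npv(0.10, flows), npv(0.20, flows))
# The MIRR is a single number once the two external rates are fixed.
print(mirr(flows, finance_rate=0.10, reinvest_rate=0.10))
```

Because the sign of the cash flow changes twice, the NPV curve crosses zero twice; fixing explicit financing and reinvestment rates removes that ambiguity.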

A Study on Smart City Project Evaluation System: Focusing on Case Analysis of IFEZ Smart City (스마트시티 프로젝트 평가체계에 대한 연구: IFEZ 스마트시티 사례분석을 중심으로)

  • Sang-Ho Lee;Hee-Yeon Jo;Yun-Hong Min
    • The Journal of Bigdata
    • /
    • v.8 no.1
    • /
    • pp.83-97
    • /
    • 2023
  • Project evaluation is the process of evaluating the progress and results of a project. Smart city projects can be divided into system components (infrastructure, services, platforms), or projects can run simultaneously for multiple services. In addition, services are developed and expanded through additional projects. To ensure that a smart city, which is composed of various projects, proceeds in accordance with its goals and strategies, periodic project evaluation is required during the project implementation process. The smart city project evaluation system proposed in this paper is designed to provide comprehensive and objective indicators by reflecting the various factors that must be considered for projects occurring in all stages of planning, design, construction, and operation of smart cities. The indicators derived from the evaluation system can be used by decision makers to determine the direction of smart city project development. In addition, it is designed so that an interim evaluation of project performance can be made before the end of the project and the resulting feedback can be reflected. To introduce the application method of the proposed evaluation system, it was applied to the smart city project case of the Incheon Free Economic Zone (IFEZ). Based on the evaluation results, items that can maximize the improvement effect of each smart city project item were presented, and the direction of smart city project implementation was suggested. By utilizing a smart city project evaluation system that reflects the characteristics of smart city projects composed of multiple projects, comprehensive planning and management of smart city projects will be possible, and this study will serve as a reference for identifying priority improvement factors for projects.

A Method for Selecting AI Innovation Projects in the Enterprise: Case Study of HR part (기업의 혁신 프로젝트 선정을 위한 모폴로지-AHP-TOPSIS 모형: HR 분야 사례 연구)

  • Chung Doohee;Lee Jaeyun;Kim Taehee
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.18 no.5
    • /
    • pp.159-174
    • /
    • 2023
  • In this paper, we propose a methodology to effectively determine the selection and prioritization of new business and innovation projects using AI technology. AI technology can upgrade the business of companies in various industries and increase the added value of the entire industry. However, there are various constraints and difficulties in the decision-making process of selecting and implementing AI projects in the enterprise. We therefore propose a new methodology for prioritizing AI projects using Morphology, AHP, and TOPSIS. The proposed methodology helps prioritize AI projects by simultaneously considering the technical feasibility of AI technology and real-world user requirements. In this study, we applied the proposed methodology to a real enterprise that wanted to prioritize multiple AI projects in the HR field and evaluated the results. The results confirm the practical applicability of the methodology and suggest ways to use it to help companies make decisions about AI projects. The significance of the proposed methodology is that it provides a framework for prioritizing, in the most reasonable way, the multiple AI projects a company is considering by taking both business and technical factors into account at the same time.
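
The TOPSIS ranking stage named in this abstract can be sketched as follows. This is the standard textbook procedure (vector normalization, weighting, distances to the ideal and anti-ideal points), not necessarily the exact variant used in the paper. The criteria, scores, and weights are hypothetical; in the paper's setting the weights would come from the AHP stage.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per alternative (higher is better).

    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*weighted))
    # Ideal and anti-ideal points depend on whether a criterion is a benefit or a cost.
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = sqrt(sum((v - b) ** 2 for v, b in zip(row, best)))
        d_worst = sqrt(sum((v - w) ** 2 for v, w in zip(row, worst)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical projects scored on technical feasibility, user need, and cost.
projects = [[9, 8, 2], [5, 5, 4], [1, 2, 6]]
weights = [0.5, 0.3, 0.2]       # assumed to come from AHP pairwise comparisons
benefit = [True, True, False]   # cost is a "lower is better" criterion

scores = topsis(projects, weights, benefit)
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(scores, ranking)
```

A project that is best on every criterion coincides with the ideal point and scores 1, while one that is worst on every criterion scores 0, so the scale is easy to interpret.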

Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.99-112
    • /
    • 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention from the machine learning and artificial intelligence fields because of its remarkable performance improvement and flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and SVM. In that research, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown performance as remarkable as that of DT ensembles. Recently, several works have reported that the performance of an ensemble can be degraded when its multiple classifiers are highly correlated with one another, resulting in a multicollinearity problem that leads to performance degradation of the ensemble. They have also proposed differentiated learning strategies to cope with this performance degradation problem. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that the ensemble contain diverse classifiers. Breiman (1996) found that ensemble learning can increase the performance of unstable learning algorithms but does not show remarkable performance improvement on stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, and thus small changes in the training data can yield large changes in the generated classifiers. Therefore, an ensemble of unstable learning algorithms can guarantee some diversity among the classifiers. 
In contrast, stable learning algorithms such as NN and SVM generate similar classifiers in spite of small changes in the training data, and thus the correlation among the resulting classifiers is very high. This high correlation results in a multicollinearity problem, which leads to performance degradation of the ensemble. Kim's work (2009) compared the performance of traditional prediction algorithms such as NN, DT, and SVM in bankruptcy prediction for Korean firms. It reports that stable learning algorithms such as NN and SVM have higher predictability than the unstable DT. Meanwhile, with respect to ensemble learning, the DT ensemble shows greater performance improvement than the NN and SVM ensembles. Further analysis with the variance inflation factor (VIF) empirically shows that the performance degradation of the ensemble is due to the multicollinearity problem. It also proposes that optimization of the ensemble is needed to cope with this problem. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) in order to improve the performance of NN ensembles. Coverage optimization is a technique for choosing a sub-ensemble from an original ensemble so as to guarantee the diversity of the classifiers. CO-NN uses a GA, which has been widely used for various optimization problems, to deal with the coverage optimization problem. The GA chromosomes for coverage optimization are encoded as binary strings, each bit of which indicates an individual classifier. The fitness function is defined as the maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the generally used measures of multicollinearity, is added to ensure the diversity of classifiers by removing high correlation among them. We use Microsoft Excel and the GA software package Evolver. 
Experiments on company failure prediction have shown that CO-NN is effective for the stable performance enhancement of NN ensembles through a choice of classifiers that takes the correlations within the ensemble into account. The classifiers that have a potential multicollinearity problem are removed by the coverage optimization process of CO-NN, and thereby CO-NN has shown higher performance than a single NN classifier and an NN ensemble at the 1% significance level, and than a DT ensemble at the 5% significance level. However, further research issues remain. First, a decision optimization process to find the optimal combination function should be considered in further research. Second, various learning strategies to deal with data noise should be introduced in future research.
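
The binary chromosome encoding described in this abstract can be illustrated with a minimal sketch. Several simplifications are assumptions rather than the paper's method: the VIF constraint is omitted, majority voting stands in for the combination function, an exhaustive search over all bit strings stands in for the GA, and the predictions are invented.

```python
from itertools import product


def majority_vote(predictions):
    """Combine 0/1 predictions per sample; a tie is resolved as class 0."""
    n = len(predictions)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*predictions)]


def ensemble_error(chromosome, predictions, labels):
    """Error rate of the sub-ensemble whose classifiers have bit '1' in `chromosome`."""
    selected = [p for bit, p in zip(chromosome, predictions) if bit == "1"]
    if not selected:
        return 1.0  # an empty ensemble is treated as always wrong
    voted = majority_vote(selected)
    return sum(v != y for v, y in zip(voted, labels)) / len(labels)


labels = [1, 0, 1, 1, 0, 0]
predictions = [            # invented outputs of four classifiers
    [1, 0, 1, 0, 0, 0],    # classifier 1
    [1, 0, 0, 1, 0, 0],    # classifier 2
    [0, 0, 1, 1, 0, 0],    # classifier 3
    [1, 0, 1, 0, 0, 0],    # classifier 4, identical to classifier 1
]

# Exhaustive search over all 2^4 chromosomes, used here in place of a GA.
candidates = ("".join(bits) for bits in product("01", repeat=len(predictions)))
best = min(candidates, key=lambda c: ensemble_error(c, predictions, labels))
print(best, ensemble_error(best, predictions, labels))
print(ensemble_error("1111", predictions, labels))
```

In this toy data, dropping the duplicated classifier lets the remaining three correct one another's errors, whereas the full ensemble is pulled toward the mistake shared by the two identical members.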

Performance Evaluation and Offset Time Decision for Supporting Differential Multiple Services in Optical Burst Switched Networks (광 버스트 교환 망에서 차등적 다중 서비스 제공을 위한 offset 시간 결정 및 성능 평가)

  • So W.H.;Kim Y.C.
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.41 no.1
    • /
    • pp.1-12
    • /
    • 2004
  • In this paper, we take advantage of the characteristics of optical burst switching (OBS) to support service differentiation in optical networks. Using the offset time between the control packet and the burst data, the proposed scheme assigns a different offset time to each service class. In contrast to the previous method, in which the high-priority service simply uses a long offset time, it derives the burst loss rate as a QoS parameter in consideration of the conservation law and the given service-differential ratios, and finally decides a reasonable offset time for this QoS. The first proposed method classifies services into a high or low class and is an algorithm that decides the offset time for supporting the required QoS of the high class. To consider the multi-class environment, we extend the analysis method of the first algorithm and propose a second algorithm. It divides services into a high or low group according to their burst loss rate, decides the offset time for the high group, and lastly accumulates the offset time of each class. The proposed algorithms are evaluated through simulation, and the simulation results are compared with the analytical results to verify the proposed scheme.