• Title/Summary/Keyword: 유전자집합분석 (gene set analysis)


Optimized Bankruptcy Prediction through Combining SVM with Fuzzy Theory (퍼지이론과 SVM 결합을 통한 기업부도예측 최적화)

  • Choi, So-Yun;Ahn, Hyun-Chul
    • Journal of Digital Convergence / v.13 no.3 / pp.155-165 / 2015
  • Bankruptcy prediction has been an important research topic in finance since the 1960s. In Korea, it has drawn attention from researchers since the IMF crisis in 1998. This study proposes a novel model for better bankruptcy prediction by combining three techniques: support vector machine (SVM), fuzzy theory, and genetic algorithm (GA). Our convergence model is based on SVM, a classification algorithm that predicts accurately and avoids overfitting. It also incorporates fuzzy theory to extend the dimensions of the input variables, and GA to optimize the control parameters and the feature subset selection. To validate the usefulness of the proposed model, we applied it to H Bank's data on companies not subject to external audit. We also tested six comparative models to validate the superiority of the proposed model. The proposed model showed the best prediction accuracy among the models. Our study is expected to contribute to the literature and to practitioners working on bankruptcy prediction.

Combined Application of Data Imbalance Reduction Techniques Using Genetic Algorithm (유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용)

  • Jang, Young-Sik;Kim, Jong-Woo;Hur, Joon
    • Journal of Intelligence and Information Systems / v.14 no.3 / pp.133-154 / 2008
  • The data imbalance problem, which can be encountered in data mining classification problems, typically means that one class has far more or far fewer instances than the other classes. To solve it, a number of techniques have been proposed based on re-sampling with replacement, adjusting decision thresholds, and adjusting the costs of the different classes. In this paper, we study the feasibility of combining the previously proposed techniques, and suggest a combination method that uses a genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy of the minority class, we use the F-value of the minority class as the fitness function of the genetic algorithm when determining the combination ratio. To compare the performance with that of single techniques and of the matrix-style combination of random percentages, we performed experiments using four public datasets that have been widely used to compare methods for the data imbalance problem. The experimental results show the usefulness of the proposed method.
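
The fitness idea in the abstract above, scoring a candidate by the F-value of the minority class, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `minority_f_value`, the `minority_label` argument, and the default `beta=1.0` are assumptions introduced here.

```python
def minority_f_value(y_true, y_pred, minority_label=1, beta=1.0):
    """F-measure computed for the minority class only.

    Hypothetical helper: a GA could call this on validation predictions
    to score one candidate combination ratio.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority_label and p == minority_label)
    fp = sum(1 for t, p in pairs if t != minority_label and p == minority_label)
    fn = sum(1 for t, p in pairs if t == minority_label and p != minority_label)
    if tp == 0:
        # No minority instance was recovered, so the F-value is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Unlike plain accuracy, this score is 0 for a classifier that always predicts the majority class, which is presumably why a minority-class F-value suits imbalanced data.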


Development of Bridge Management System for Next Generation based on Life-Cycle Cost and Performance (생애주기 비용 및 성능을 고려한 차세대 교량 유지관리기법 개발)

  • Park, Kyung-Hoon
    • Proceedings of the Korean Institute Of Construction Engineering and Management / 2007.11a / pp.167-174 / 2007
  • This study proposes a practical and realistic method to establish an optimal lifetime maintenance strategy for deteriorating bridges by considering life-cycle performance as well as life-cycle cost. The proposed method offers a set of optimal tradeoff maintenance scenarios among conflicting objectives, such as minimizing cost and maximizing performance. A genetic algorithm is used to generate the set of maintenance scenarios, which is a multi-objective combinatorial optimization problem with life-cycle cost and life-cycle performance as separate objective functions. A computer program that generates optimal maintenance scenarios was developed based on the proposed method. The subordinate relation between bridge members is considered when deciding the optimal maintenance sequence. The developed program has been used to present a procedure for finding an optimal maintenance scenario for steel-girder bridges on the Korean National Road. Through this bridge maintenance scenario analysis, it is expected that the developed method and program can help bridge managers establish an optimal maintenance strategy that satisfies various constraints and requirements.


An Optimized Combination of π-fuzzy Logic and Support Vector Machine for Stock Market Prediction (주식 시장 예측을 위한 π-퍼지 논리와 SVM의 최적 결합)

  • Dao, Tuanhung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.43-58 / 2014
  • As the use of trading systems has increased rapidly, many researchers have become interested in developing effective stock market prediction models using artificial intelligence techniques. Stock market prediction involves multifaceted interactions between market-controlling factors and unknown random processes. A successful stock prediction model achieves the most accurate result from minimum input data with the least complex model. In this research, we develop a combination model of π-fuzzy logic and support vector machine (SVM) models, using a genetic algorithm to optimize the parameters of the SVM and π-fuzzy functions, as well as feature subset selection to improve the performance of stock market prediction. To evaluate the performance of our proposed model, we compare the performance of our model to other comparative models, including the logistic regression, multiple discriminant analysis, classification and regression tree, artificial neural network, SVM, and fuzzy SVM models, with the same data. The results show that our model outperforms all other comparative models in prediction accuracy as well as return on investment.

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive since domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM), have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance the accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results.
In particular, the mutation operator prevents GA from falling into local optima, so it can find a globally optimal or near-optimal solution. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques including MSVM. For these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, and their credit ratings. Using various statistical methods including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e. credit rating, was labeled as four classes: 1 (A1); 2 (A2); 3 (A3); 4 (B and C). Eighty percent of the data for each class was used for training, and the remaining 20 percent was used for validation. To overcome the small sample size, we applied five-fold cross validation to our dataset. In order to examine the competitiveness of the proposed model, we also tested several comparative models including MDA, MLOGIT, CBR, ANN and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial package that supports GA. The other comparative models were tested using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models.
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, namely X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
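
The McNemar test mentioned above compares two classifiers on the same validation cases by looking only at the cases where exactly one of them is correct. The sketch below shows the standard continuity-corrected chi-square form using only the Python standard library. The function name and argument layout are assumptions, and the abstract does not say which variant of the test the authors used.

```python
import math

def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers on the same cases.

    Returns (statistic, p_value). b counts cases only model A gets right,
    c counts cases only model B gets right.
    """
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok, other_ok = (a == truth), (other == truth)
        if a_ok and not other_ok:
            b += 1
        elif other_ok and not a_ok:
            c += 1
    if b + c == 0:
        # The models never disagree in correctness: no evidence of a difference.
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A p-value below 0.01 or 0.05 corresponds to the 1% and 5% significance levels reported in the abstract.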

Exploring Cancer-Specific microRNA-mRNA Interactions by Evolutionary Layered Hypernetwork Models (진화연산 기반 계층적 하이퍼네트워크 모델에 의한 암 특이적 microRNA-mRNA 상호작용 탐색)

  • Kim, Soo-Jin;Ha, Jung-Woo;Zhang, Byoung-Tak
    • Journal of KIISE:Computing Practices and Letters / v.16 no.10 / pp.980-984 / 2010
  • Exploring microRNA (miRNA) and mRNA regulatory interactions may give new insights into diverse biological phenomena. Recently, miRNAs have been discovered as important regulators that play a major role in various cellular processes. Therefore, it is essential to identify functional interactions between miRNAs and mRNAs for understanding the context-dependent activities of miRNAs in complex biological systems. While elucidating complex miRNA-mRNA interactions has been studied with experimental and computational approaches, it is still difficult to infer miRNA-mRNA regulatory modules. Here we present a novel method, termed layered hypernetworks (LHNs), for identifying functional miRNA-mRNA interactions from heterogeneous expression data. In experiments, we apply the LHN model to miRNA and mRNA expression profiles on multiple cancers. The proposed method identifies cancer-specific miRNA-mRNA interactions. We show the biological significance of the discovered miRNA-mRNA interactions.

The Model to Generate Optimum Maintenance Scenario for Steel Bridges considering Life-Cycle Cost and Performance (강교량의 최적 유지관리 시나리오 선정 모델)

  • Park, Kyung Hoon;Lee, Sang Yoon;Kim, Jung Ho;Cho, Hyo Nam;Kong, Jung Sik
    • Journal of Korean Society of Steel Construction / v.18 no.6 / pp.677-686 / 2006
  • In this paper, a more practical and realistic method is proposed to establish lifetime optimum maintenance strategies for deteriorating bridges, considering life-cycle performance as well as life-cycle cost. A genetic algorithm is applied to generate the set of maintenance scenarios, which is a multi-objective combinatorial optimization problem with lifetime performance and cost as separate objective functions, and a technique for selecting the optimum tradeoff maintenance scenario is presented. Optimum maintenance scenarios can be generated not only at the individual member level but also at the system level of the bridge. Based on the results of applying the proposed methodology to an existing bridge, it is expected that the methodology can be used effectively to determine the optimum maintenance strategy, to introduce a real preventive maintenance system, and to overcome the limits of existing maintenance methods.
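
Treating cost and performance as separate objectives means there is usually no single best scenario, only a set of non-dominated tradeoffs. The sketch below filters such a set from candidate scenarios. It is a generic Pareto filter under stated assumptions, not the paper's genetic algorithm: the `(name, cost, performance)` tuple layout is hypothetical, lower cost and higher performance are assumed to be better, and the numbers in the usage note are invented for illustration.

```python
def pareto_front(scenarios):
    """Keep the scenarios that no other scenario dominates.

    scenarios: list of (name, cost, performance) tuples.
    A scenario is dominated if another one is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for name, cost, perf in scenarios:
        dominated = any(
            (c2 <= cost and p2 >= perf) and (c2 < cost or p2 > perf)
            for _, c2, p2 in scenarios
        )
        if not dominated:
            front.append((name, cost, perf))
    return front
```

For example, with the made-up candidates `[("A", 100, 0.9), ("B", 80, 0.7), ("C", 120, 0.8), ("D", 80, 0.6)]`, scenarios C and D are dominated and only A and B remain as tradeoff options.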

Re-understanding of Technoscience and Nature through Actor-Network Theory (행위자-연결망 이론을 통한 과학과 자연의 재해석)

  • Kim, Sook-Jin
    • Journal of the Korean Geographical Society / v.45 no.4 / pp.461-477 / 2010
  • Recent environmental issues such as genetically modified organisms, the loss of biodiversity, climate change, and nuclear waste cannot be reduced to a matter of science or society, or explained through nature-society dualist approaches, because of their complexity and heterogeneity. This paper examines how nature-society dualism has been embedded in science studies and geography and how this dualism can be overcome. Actor-Network Theory, as an attempt to overcome this nature-society dualism, is appropriate for analysing the "strange imbroglio" of biology, politics, technoscience, market, value, ethics and facts that constitute our society by focusing on heterogeneous association, and can contribute a useful framework for solving environmental problems.

A Method for the Classification of Water Pollutants using Machine Learning Model with Swimming Activities Videos of Caenorhabditis elegans (예쁜꼬마선충의 수영 행동 영상과 기계학습 모델을 이용한 수질 오염 물질 구분 방법)

  • Kang, Seung-Ho;Jeong, In-Seon;Lim, Hyeong-Seok
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.7 / pp.903-909 / 2021
  • Caenorhabditis elegans, whose DNA sequence has been completely identified, is a representative species used in various research fields such as gene functional analysis and animal behavioral research. Meanwhile, many studies have examined bio-monitoring systems that use the swimming activities of nematodes to determine whether water is contaminated. In this paper, we show the possibility of using the swimming activities of C. elegans in the development of a machine learning based bio-monitoring system that identifies chemicals that cause water pollution. To characterize the swimming activities of the nematode, BLS entropy is computed for the nematode in each frame. The BLS entropy profile, an assembly of entropies, is then classified into several patterns using clustering algorithms. Finally, these patterns are used to construct data sets. We recorded images of the swimming behavior of nematodes in arenas to which formaldehyde, benzene and toluene were each added at a concentration of 0.1 ppm, and evaluated the performance of the developed HMM.

Identification of Salted Opossum Shrimp Using COI-based Restriction Fragment Length Polymorphism (COI 기반 제한효소 절편 길이 다형성(RFLP)을 이용한 새우젓 분석)

  • Park, Ju Hyeon;Moon, Soo Young;Kang, Ji Hye;Jung, Myoung Hwa;Kim, Sang Jo;Choi, Hee Jung
    • Journal of Life Science / v.31 no.1 / pp.66-72 / 2021
  • This study developed a species identification method for the salted opossum shrimp of Acetes japonicus, A. chinensis (Korea, China), A. indicus (I, II), and Palaemon gravieri based on PCR-RFLP markers. Genomic DNA was extracted from the salted opossum shrimp. A 519 base pair (bp) region of the COI gene was amplified using specific primers. The amplified products were digested by Acc I and Hinf I, and the DNA fragments were separated by automated electrophoresis for RFLP analysis. When the amplified DNA product (519 bp) was digested with Acc I, A. japonicus, A. chinensis (Korea), and A. indicus (II) showed two fragments, whereas a single band of 519 bp was detected in A. chinensis (China) and A. indicus (I). In the RFLP patterns digested by Hinf I, A. chinensis (Korea) and A. chinensis (China) showed a single band of 519 bp, while two fragments were observed in A. japonicus and A. indicus (I) and four fragments in A. indicus (II). The PCR amplicon of P. gravieri was digested by Acc I into 3 bands of 271, 202, and 46 bp and by Hinf I into a single band of 519 bp. Therefore, the salted opossum shrimp-specific RFLP markers showed distinct differences between four species and two sub-species in the PCR-RFLP analysis. Thus, the PCR-RFLP markers developed in this study are a good method for identifying the six types of salted opossum shrimp.
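
The band counts reported in the abstract above give each of the six types a distinct (Acc I, Hinf I) pattern, so identification reduces to a table lookup. The sketch below encodes only the counts stated in the abstract. The names `RFLP_PATTERNS` and `identify` are hypothetical, and a real assay would presumably also compare fragment sizes, most of which the abstract does not list.

```python
# (number of Acc I bands, number of Hinf I bands) -> type, as stated in the abstract.
RFLP_PATTERNS = {
    (2, 2): "Acetes japonicus",
    (2, 1): "Acetes chinensis (Korea)",
    (1, 1): "Acetes chinensis (China)",
    (1, 2): "Acetes indicus (I)",
    (2, 4): "Acetes indicus (II)",
    (3, 1): "Palaemon gravieri",
}

def identify(acc_i_bands, hinf_i_bands):
    """Return the type matching the observed band counts, or 'unknown'."""
    return RFLP_PATTERNS.get((acc_i_bands, hinf_i_bands), "unknown")
```

Neither enzyme alone separates all six types: Acc I yields only three distinct band counts and Hinf I likewise, but the pair of counts is unique for every type.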