• Title/Summary/Keyword: Artificial neural Networks (ANN)

Machine Learning-based Classification of Hyperspectral Imagery

  • Haq, Mohd Anul;Rehman, Ziaur;Ahmed, Ahsan;Khan, Mohd Abdul Rahim
    • International Journal of Computer Science & Network Security / v.22 no.4 / pp.193-202 / 2022
  • The classification of hyperspectral imagery (HSI) is essential in Earth surface observation. Because HSI data comprise a large number of contiguous bands, they provide rich information about the object of study; however, they suffer from the curse of dimensionality. Dimensionality reduction is therefore an essential aspect of machine learning classification. Algorithms based on feature extraction can overcome the data dimensionality issue, allowing classifiers to use comprehensive models while reducing computational costs. This paper assesses and compares three HSI classification techniques. The first is based on the Joint Spatial-Spectral Stacked Autoencoder (JSSSA) method, the second on a shallow Artificial Neural Network (SNN), and the third on a support vector machine (SVM) model. The JSSSA technique outperforms the SNN technique in both overall accuracy and Kappa coefficient: JSSSA achieved an overall accuracy of 96.13% and a Kappa coefficient of 0.95, SNN achieved an accuracy of 92.40% and a Kappa coefficient of 0.90, and SVM achieved an accuracy of 82.87%. The study suggests that both JSSSA- and SNN-based techniques are efficient methods for hyperspectral classification of snow features. This work classified labeled (ground-truth) snow datasets into multiple classes. Such labeled data can be valuable for applying deep neural networks such as CNNs, hybrid CNNs, and RNNs to glaciology and snow-related hazard applications.
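The two scores quoted above, overall accuracy and the Kappa coefficient, can both be derived from a confusion matrix. The following is a minimal sketch under the standard definitions (Cohen's kappa); the function name and the example matrix are illustrative assumptions and do not come from the paper.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        return observed, 1.0  # degenerate case: only one class present
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Made-up two-class example (not data from the paper).
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement that would be expected by chance, which is why it is usually reported alongside overall accuracy when class sizes differ.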

Integrating UAV Remote Sensing with GIS for Predicting Rice Grain Protein

  • Sarkar, Tapash Kumar;Ryu, Chan-Seok;Kang, Ye-Seong;Kim, Seong-Heon;Jeon, Sae-Rom;Jang, Si-Hyeong;Park, Jun-Woo;Kim, Suk-Gu;Kim, Hyun-Jin
    • Journal of Biosystems Engineering / v.43 no.2 / pp.148-159 / 2018
  • Purpose: Unmanned aerial vehicle (UAV) remote sensing was applied to test various vegetation indices and build prediction models of rice protein content for monitoring grain quality and guiding management practice. Methods: Images were acquired using NIR (Green, Red, NIR), RGB, and RE (Blue, Green, Red-edge) cameras mounted on a UAV. Sampling was done synchronously at geo-referenced points, and GPS locations were recorded. Paddy samples were air-dried to 15% moisture content, dehulled, and milled to 92% milling yield, and their protein content was measured by near-infrared spectroscopy. Results: Considering all 54 samples, the artificial neural network performed better, with $R^2$ (coefficient of determination) of 0.740, NSE (Nash-Sutcliffe model efficiency coefficient) of 0.733, and RMSE (root mean square error) of 0.187%, than the models developed by PR (polynomial regression), SLR (simple linear regression), and PLSR (partial least squares regression). PLSR calibration models gave results similar to PR: $R^2$ of 0.663 and RMSE of 0.169% for cloud-free samples, and $R^2$ of 0.491 and RMSE of 0.217% for cloud-shadowed samples. However, the validation models performed poorly. This study revealed a highly significant correlation between NDVI (normalized difference vegetation index) and protein content in rice. The SLR models showed $R^2=0.553$ and RMSE = 0.210% for cloud-free samples, and $R^2=0.479$ and RMSE = 0.225% for cloud-shadowed samples. Conclusion: There is a significant correlation between spectral bands and grain protein content. Artificial neural networks have a strong advantage in fitting nonlinear problems when a sigmoid activation function is used in the hidden layer. Quantitatively, the neural network model obtained a higher-precision result, with a mean absolute relative error (MARE) of 2.18% and an RMSE of 0.187%.
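For readers unfamiliar with the quantities named in this abstract, the sketch below computes NDVI and three of the reported error measures (RMSE, NSE, MARE) using their textbook definitions. The paper's exact formulas are not given in the abstract, so these definitions, the function names, and the sample numbers are assumptions.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean predictor."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def mare(observed, predicted):
    """Mean absolute relative error, in percent (observed values must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


# Hypothetical protein contents (%) and model predictions, for illustration only.
measured = [6.1, 6.4, 6.9, 7.2]
estimated = [6.0, 6.5, 6.8, 7.4]
print(ndvi(0.5, 0.1), rmse(measured, estimated), nse(measured, estimated), mare(measured, estimated))
```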

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.147-168 / 2017
  • Stock market forecasting has long been studied in academia, and a wide range of forecasting models using various techniques now exists. Recently, many attempts have been made to predict stock indices using machine learning methods, including deep learning. Although both fundamental analysis and technical analysis are used in traditional stock investment, technical analysis is more useful for short-term transaction prediction and for applying statistical and mathematical techniques. Most studies using technical indicators have modeled stock price prediction as a binary classification (rising or falling) of future market movement (usually the next trading day). However, binary classification has many drawbacks for predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we predict the stock index trend by extending the existing binary scheme to a multi-class system (upward trend, boxed, downward trend). To solve this multi-class problem, instead of techniques such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA), or Artificial Neural Networks (ANN), we propose an optimization model that uses a Genetic Algorithm (GA) as a wrapper to improve the performance of Multi-class Support Vector Machines (MSVM), which have proved superior in prediction performance. In particular, the proposed model, named GA-MSVM, is designed to maximize model performance by optimizing not only the kernel function parameters of MSVM but also the selection of input variables (feature selection) and of training instances (instance selection).
To verify the performance of the proposed model, we applied it to real data. The results show that the proposed method is more effective than the conventional multi-class SVM, which has been known to show the best prediction performance to date, as well as existing artificial intelligence and data mining techniques such as MDA, MLOGIT, and CBR. In particular, instance selection was found to play a very important role in predicting the stock index trend, contributing more to model improvement than the other factors. To verify the usefulness of GA-MSVM, we applied it to forecasting the trend of Korea's KOSPI200 stock index. Our research is primarily aimed at predicting trend segments in order to capture trading signals or short-term trend transition points. The experimental data set includes technical indicators, such as price and volatility indices (2004-2017), and macroeconomic data (interest rate, exchange rate, S&P 500, etc.) for the KOSPI200 stock index in Korea. Using a variety of statistical methods, including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, took one of three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). For each class, 70% of the data was used for training and the remaining 30% for validation. To verify the performance of the proposed model, comparative experiments were conducted with MDA, MLOGIT, CBR, ANN, and MSVM. MSVM adopted the One-Against-One (OAO) approach, which is known as the most accurate of the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.
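One way to picture the simultaneous optimization described above is a single GA chromosome that carries the kernel parameters, a feature mask, and an instance mask side by side. The sketch below decodes such a chromosome. The bit layout, the 8-bit parameter resolution, and the log-scale ranges for `C` and `gamma` are all assumptions made for illustration; the abstract does not specify the paper's encoding.

```python
def decode_chromosome(bits, n_features, n_instances, param_bits=8):
    """Split a binary chromosome into (C, gamma, feature indices, instance indices).

    Assumed layout: [C bits | gamma bits | feature mask | instance mask].
    """
    expected = 2 * param_bits + n_features + n_instances
    if len(bits) != expected:
        raise ValueError(f"expected {expected} bits, got {len(bits)}")

    def to_unit(chunk):
        # Interpret the chunk as an unsigned integer scaled to [0, 1].
        value = int("".join(str(b) for b in chunk), 2)
        return value / (2 ** len(chunk) - 1)

    c_bits = bits[:param_bits]
    g_bits = bits[param_bits:2 * param_bits]
    # Assumed search ranges on a log2 scale: C in [2^-5, 2^15], gamma in [2^-15, 2^3].
    c_value = 2 ** (-5 + 20 * to_unit(c_bits))
    gamma = 2 ** (-15 + 18 * to_unit(g_bits))

    feature_mask = bits[2 * param_bits:2 * param_bits + n_features]
    instance_mask = bits[2 * param_bits + n_features:]
    features = [i for i, b in enumerate(feature_mask) if b == 1]
    instances = [i for i, b in enumerate(instance_mask) if b == 1]
    return c_value, gamma, features, instances


# Toy chromosome: 3 candidate features and 4 training instances.
chromosome = [0] * 8 + [1] * 8 + [1, 0, 1] + [0, 1, 1, 0]
print(decode_chromosome(chromosome, n_features=3, n_instances=4))
```

In a wrapper setup, the fitness of each chromosome would be the validation accuracy of an MSVM trained with the decoded parameters on the selected features and instances.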

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive because domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural networks (ANN), and multiclass support vector machines (MSVM), have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results.
In particular, the mutation operator helps prevent GA from falling into local optima, so a globally optimal or near-optimal solution can be found. GA has been widely applied to search for optimal parameters or feature subsets for AI techniques, including MSVM. For these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods, including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e., credit rating, was labeled as four classes: 1 (A1), 2 (A2), 3 (A3), and 4 (B and C). For each class, 80 percent of the data was used for training and the remaining 20 percent for validation. To overcome the small sample size, we applied five-fold cross validation to our dataset. To examine the competitiveness of the proposed model, we also experimented with several comparative models, including MDA, MLOGIT, CBR, ANN, and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate of the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial software package that provides GA. The other comparative models were run using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models.
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, namely X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
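The McNemar test mentioned above compares two classifiers on the same test cases by looking only at the cases where exactly one of them is correct. Below is a minimal sketch of the chi-square form with continuity correction, a common variant; the abstract does not say which variant the authors used, and the labels in the example are invented.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar chi-square statistic (continuity-corrected) and its p-value.

    b counts cases model A gets right and model B gets wrong;
    c counts the opposite. Cases where both agree are ignored.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    if b + c == 0:
        return 0.0, 1.0  # the two models never disagree
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Invented labels: model A is right on the first 15 cases, model B on the last 8.
truth = [1] * 20
model_a = [1] * 15 + [0] * 5
model_b = [0] * 12 + [1] * 8
print(mcnemar_test(truth, model_a, model_b))
```

Because the test uses paired predictions on identical cases, it suits comparisons like GAMSVM versus each baseline on the same validation set.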

Development of new artificial neural network optimizer to improve water quality index prediction performance (수질 지수 예측성능 향상을 위한 새로운 인공신경망 옵티마이저의 개발)

  • Ryu, Yong Min;Kim, Young Nam;Lee, Dae Won;Lee, Eui Hoon
    • Journal of Korea Water Resources Association / v.57 no.2 / pp.73-85 / 2024
  • Predicting the water quality of rivers and reservoirs is necessary for the management of water resources. Artificial Neural Networks (ANNs) have been used in many studies to predict water quality with high accuracy. Previous studies have used Gradient Descent (GD)-based optimizers, the ANN operators that search for parameters. However, GD-based optimizers have two disadvantages: they may converge to a local optimum, and they lack a structure for storing and comparing solutions. This study developed improved optimizers to overcome these disadvantages. The proposed optimizers combine adaptive moments (Adam) and Nesterov-accelerated adaptive moments (Nadam), which have low learning errors among GD-based optimizers, with Harmony Search (HS) or Novel Self-adaptive Harmony Search (NSHS). To evaluate the performance of Long Short-Term Memory (LSTM) networks using the improved optimizers, water quality data from the Dasan water quality monitoring station were used for training and prediction. Comparing the learning results, the Mean Squared Error (MSE) of LSTM using Nadam combined with NSHS (NadamNSHS) was the lowest, at 0.002921. In addition, the prediction rankings by MSE and $R^2$ for the four water quality indices were compared for each optimizer. Comparing the average ranking of each optimizer, LSTM using NadamNSHS ranked highest, with an average rank of 2.25.
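The "solution storage and comparison structure" that the abstract finds missing in GD-based optimizers corresponds to the harmony memory in Harmony Search. The sketch below is plain Harmony Search on a toy cost function, not the paper's NadamHS or NadamNSHS hybrids; all parameter names and default values (`hmcr`, `par`, `bandwidth`, and so on) are assumptions chosen for illustration.

```python
import random


def harmony_search(cost, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=500, seed=0):
    """Minimize `cost` over box `bounds` with basic Harmony Search."""
    rng = random.Random(seed)
    # Harmony memory: a stored pool of candidate solutions and their costs.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(memory_size)]
    costs = [cost(h) for h in memory]

    for _ in range(iterations):
        candidate = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Reuse a stored value, optionally nudged (pitch adjustment).
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Otherwise draw a fresh random value.
                value = rng.uniform(lo, hi)
            candidate.append(min(max(value, lo), hi))

        # Comparison step: replace the worst stored solution if the new one is better.
        candidate_cost = cost(candidate)
        worst = max(range(memory_size), key=costs.__getitem__)
        if candidate_cost < costs[worst]:
            memory[worst] = candidate
            costs[worst] = candidate_cost

    best = min(range(memory_size), key=costs.__getitem__)
    return memory[best], costs[best]


def sphere(x):
    """Toy cost function with its minimum at the origin."""
    return sum(v * v for v in x)


print(harmony_search(sphere, [(-5.0, 5.0)] * 3))
```

Because the memory only ever replaces its worst member with a better candidate, the best stored cost never gets worse from one iteration to the next.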

A Study on Real-Time Walking Action Control of Biped Robot with Twenty Six Joints Based on Voice Command (음성명령기반 26관절 보행로봇 실시간 작업동작제어에 관한 연구)

  • Jo, Sang Young;Kim, Min Sung;Yang, Jun Suk;Koo, Young Mok;Jung, Yang Geun;Han, Sung Hyun
    • Journal of Institute of Control, Robotics and Systems / v.22 no.4 / pp.293-300 / 2016
  • Voice recognition is one of the most convenient methods of communication between humans and robots. This study proposes a speech recognition method that uses speech recognizers based on the Hidden Markov Model (HMM), combined with other techniques, to enhance biped robot control. In the past, Artificial Neural Networks (ANN) and Dynamic Time Warping (DTW) were used; however, they are now less commonly applied to speech recognition systems. This research confirms that the HMM, an accepted high-performance technique, can be successfully employed to model speech signals. High recognition accuracy can be obtained by using HMMs. Apart from speech modeling techniques, multiple feature extraction methods have been studied to detect speech stresses caused by emotions and the environment in order to improve speech recognition rates. The procedure consists of two parts: recognizing robot commands using multiple HMM recognizers, and sending the recognized commands to control a robot. In this paper, a practical voice recognition system that can recognize many task commands is proposed. The proposed system consists of a general-purpose microprocessor and a voice recognition processor that can recognize a limited number of voice patterns. Simulation and experiment illustrated the reliability of the voice recognition rates for application to the manufacturing process.

Feasibility Evaluation of High-Tech New Product Development Projects Using Support Vector Machines

  • Shin, Teak-Soo;Noh, Jeon-Pyo
    • Proceedings of the Korea Intelligent Information System Society Conference / 2005.11a / pp.241-250 / 2005
  • New product development (NPD) is defined as the transformation of a market opportunity and a set of assumptions about product technology into a product available for sale. Managers charged with project selection decisions in the NPD process, such as go/no-go choices and specific resource allocation decisions, face a complicated problem. The ability to develop successful new products has therefore been identified as a major determinant in sustaining a firm's competitive advantage. The purpose of this study is to develop a new evaluation model for NPD project selection in the high-tech industry using support vector machines (SVM). The evaluation model is developed in two phases. In the first phase, a binary (go/no-go) classification prediction model, i.e., an SVM for high-tech NPD project selection, is developed. In the second phase, using the predicted output value of the SVM, a feasibility grade is calculated for the final NPD project decision. In this study, the feasibility grades are divided into three levels. We assume that the frequency of NPD project cases is distributed symmetrically across the feasibility grades and that misclassification errors are partially reduced by using multiple grades. However, the range of each grade level can be changed according to a firm's NPD strategy. Our proposed feasibility grade method is more reasonable for NPD decision problems because it considers the risk factor of NPD in terms of future NPD success probability. In our empirical study using Korean NPD cases, the SVM significantly outperformed the benchmark models, ANN and logistic regression, in hit ratio. The feasibility grades generated from the predicted output value of the SVM also showed that they can offer a useful guideline for NPD project selection.

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.241-254 / 2011
  • Financial time-series forecasting is one of the most important issues because it is essential for the risk management of financial institutions. Researchers have therefore tried to forecast financial time series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have been widely applied to this research area because they do not require large amounts of training data and have a low risk of overfitting. However, a user must determine several design factors heuristically in order to use SVM. For example, the selection of an appropriate kernel function and its parameters and proper feature subset selection are major design factors of SVM. Beyond these factors, proper selection of an instance subset may also improve the forecasting performance of SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVM, especially in the domain of stock market prediction. Instance selection tries to choose proper instance subsets from the original training data. It may be considered a method of knowledge refinement that maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize the instance selection process and the parameters simultaneously. We call the model ISVM (SVM with Instance selection). Experiments on stock market data are carried out using ISVM. In this study, the GA searches for optimal or near-optimal values of kernel parameters and relevant instances for SVMs. The chromosomes in the GA setting therefore need two sets of parameters: the codes for kernel parameters and those for instance selection.
For the controlling parameters of the GA search, the population size is set at 50 organisms, the crossover rate at 0.7, and the mutation rate at 0.1. As the stopping condition, 50 generations are permitted. The application data used in this study consist of technical indicators and the direction of change in the daily Korea stock price index (KOSPI). The total number of samples is 2218 trading days. We separate the whole data into three subsets: training, test, and hold-out data sets, containing 1056, 581, and 581 samples respectively. This study compares ISVM with several comparative models, including logistic regression (logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with optimized parameters (PSVM). Specifically, PSVM uses kernel parameters optimized by the genetic algorithm. The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% on the hold-out data. For ISVM, only 556 of the 1056 original training samples are used to produce the result. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the other comparative models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level, and performs better than Logit, SVM, and PSVM at the 5% statistical significance level.
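The two-sample test for proportions used above compares the hit ratios of two models. A minimal sketch of the usual pooled z-test follows; the abstract does not give the per-model hit counts, so the counts in the example are invented, and the two-sided p-value is an assumption about how the test was applied.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-sample z-test for the difference between two proportions."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if standard_error == 0.0:
        return 0.0, 1.0  # both samples are all hits or all misses
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Invented counts: model A is correct on 60 of 100 cases, model B on 40 of 100.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p:.4f}")
```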

Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)

  • Kim, Seongjin;Ryoo, Eunchung;Jung, Min Kyu;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.185-202 / 2012
  • Since the value of information has been realized in the information society, the use and collection of information have become important. Just as an artistic painting can be described in thousands of words, a facial expression carries a great deal of information. Following this idea, there have recently been a number of attempts to provide customers and companies with intelligent services that perceive human emotions through facial expressions. For example, MIT Media Lab, the leading organization in this research area, has developed a human emotion prediction model and has applied its studies commercially. In academia, conventional methods such as Multiple Regression Analysis (MRA) and Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized for its low prediction accuracy. This is inevitable since MRA can only explain a linear relationship between the dependent variable and the independent variables. To mitigate the limitations of MRA, some studies, such as Jung and Kim (2012), have used ANN as an alternative and reported that ANN generated more accurate predictions than statistical methods like MRA. However, ANN has also been criticized for overfitting and for the difficulty of network design (e.g., setting the number of layers and the number of nodes in the hidden layers). Against this background, we propose a novel model using Support Vector Regression (SVR) in order to increase prediction accuracy. SVR is an extension of the Support Vector Machine (SVM) designed to solve regression problems. The model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold $\varepsilon$) to the model prediction.
Using SVR, we tried to build a model that can measure the level of arousal and valence from facial features. To validate the usefulness of the proposed model, we collected data on facial reactions to appropriate visual stimuli and extracted features from the data. Next, preprocessing steps were taken to choose statistically significant variables. In total, 297 cases were used for the experiment. As comparative models, we also applied MRA and ANN to the same data set. For SVR, we adopted the $\varepsilon$-insensitive loss function and the grid search technique to find the optimal values of parameters such as $C$, $d$, $\sigma^2$, and $\varepsilon$. For ANN, we adopted a standard three-layer backpropagation network, which has a single hidden layer. The learning rate and momentum rate of ANN were set to 10%, and we used the sigmoid function as the transfer function of the hidden and output nodes. We repeated the experiments, varying the number of nodes in the hidden layer over n/2, n, 3n/2, and 2n, where n is the number of input variables. The stopping condition for ANN was set to 50,000 learning events. We used MAE (Mean Absolute Error) as the measure for performance comparison. From the experiment, we found that SVR achieved the highest prediction accuracy on the hold-out data set compared with MRA and ANN. Regardless of the target variable (the level of arousal or the level of positive/negative valence), SVR showed the best performance on the hold-out data set. ANN also outperformed MRA; however, it showed considerably lower prediction accuracy than SVR for both target variables. The findings of our research are expected to be useful to researchers and practitioners who wish to build models for recognizing human emotions.
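The $\varepsilon$-insensitive loss described above is what makes SVR ignore training points that lie close to the prediction. The sketch below shows that loss next to MAE, the comparison measure named in the abstract; the function names and sample values are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values: the first prediction falls inside the tube (epsilon = 0.1).
targets = [1.0, 2.0, 3.0]
predictions = [1.05, 2.5, 1.0]
print(epsilon_insensitive_loss(targets, predictions, epsilon=0.1))
print(mean_absolute_error(targets, predictions))
```

Samples with zero loss contribute nothing to the fit, which is why the resulting model depends only on the points on or outside the tube (the support vectors).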

Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics (빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축)

  • Jo, Nam-ok;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.33-56 / 2016
  • Many researchers have focused on developing bankruptcy prediction models with enhanced performance using statistical methods, including multiple discriminant analysis (MDA) and logit analysis, or artificial intelligence techniques, including artificial neural networks (ANN), decision trees, and support vector machines (SVM). Most bankruptcy prediction models in academic studies have used financial ratios as the main input variables. The bankruptcy of a firm is associated with both its financial state and the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed, even though relying only on financial ratios has drawbacks. Accounting information, such as financial ratios, is based on past data and is usually determined one year before bankruptcy. Thus, a time lag exists between the closing of financial statements and the point of credit evaluation. In addition, financial ratios do not capture environmental factors, such as external economic conditions. Using only financial ratios may therefore be insufficient for constructing a bankruptcy prediction model, because they essentially reflect past internal accounting information while neglecting recent information. Qualitative information must thus be added to the conventional bankruptcy prediction model to supplement accounting information. Due to the lack of an analytic mechanism for obtaining and processing qualitative information from various sources, previous studies have made only limited use of qualitative information. Recently, however, big data analytics, such as text mining techniques, have been drawing much attention in academia and industry as an increasing amount of unstructured text data becomes available on the web. A few previous studies have sought to adopt big data analytics in business prediction modeling.
Nevertheless, the use of qualitative information from the web for business prediction modeling is still at an early stage and restricted to limited applications, such as stock prediction and movie revenue prediction. It is therefore necessary to apply big data analytics techniques, such as text mining, to various business prediction problems, including credit risk evaluation. Analytic methods are required for processing qualitative information represented as unstructured text because of the complexity of managing and processing such data. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms using both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well qualitative information is transformed into quantitative information suitable for incorporation into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as a mechanism for processing qualitative information. A sentiment index is computed at the industry level from a large amount of text data to quantify the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles. The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between each occurring term and the actual economic condition of the industry rather than the inherent semantics of the term. The experimental results showed that incorporating qualitative information based on big data analytics into the traditional bankruptcy prediction model based on accounting information is effective for enhancing predictive performance.
The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy. In particular, a negative sentiment variable improved the accuracy of corporate bankruptcy prediction because the bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.
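The keyword-based sentiment analysis described above can be pictured as scoring each article against a lexicon and averaging the scores into an index. The sketch below is only an illustration of that idea: the lexicon entries, the whitespace tokenization, and the averaging scheme are all hypothetical, and the paper's domain-specific lexicon and index construction are not reproduced here.

```python
def sentiment_index(articles, lexicon):
    """Average lexicon-based sentiment over a collection of articles.

    Each article is scored as the mean polarity of the lexicon terms it
    contains; articles with no lexicon terms are skipped.
    """
    article_scores = []
    for text in articles:
        # Naive tokenization: lowercase, split on whitespace, trim punctuation.
        tokens = [token.strip(".,;:!?\"'()") for token in text.lower().split()]
        polarities = [lexicon[token] for token in tokens if token in lexicon]
        if polarities:
            article_scores.append(sum(polarities) / len(polarities))
    if not article_scores:
        return 0.0
    return sum(article_scores) / len(article_scores)


# Hypothetical construction-sector lexicon and headlines, for illustration only.
toy_lexicon = {"slump": -1, "default": -1, "recovery": 1}
headlines = [
    "Housing slump deepens default risk.",
    "New orders signal recovery",
]
print(sentiment_index(headlines, toy_lexicon))
```

An index like this could then enter the prediction model as one more input variable alongside the financial ratios.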