• Title/Summary/Keyword: Decision support techniques

Search Result 220, Processing Time 0.024 seconds

Data-driven Model Prediction of Harmful Cyanobacterial Blooms in the Nakdong River in Response to Increased Temperatures Under Climate Change Scenarios (기후변화 시나리오의 기온상승에 따른 낙동강 남세균 발생 예측을 위한 데이터 기반 모델 시뮬레이션)

  • Gayeon Jang;Minkyoung Jo;Jayun Kim;Sangjun Kim;Himchan Park;Joonhong Park
    • Journal of Korean Society on Water Environment
    • /
    • v.40 no.3
    • /
    • pp.121-129
    • /
    • 2024
  • Harmful cyanobacterial blooms (HCBs) are caused by the rapid proliferation of cyanobacteria and are believed to be exacerbated by climate change. However, the extent to which HCBs will be stimulated in the future due to increased temperature remains uncertain. This study aims to predict the future occurrence of cyanobacteria in the Nakdong River, which has the highest incidence of HCBs in South Korea, based on temperature rise scenarios. Representative Concentration Pathways (RCPs) were used as the basis for these scenarios. Data-driven model simulations were conducted, and out of the four machine learning techniques tested (multiple linear regression, support vector regressor, decision tree, and random forest), the random forest model was selected for its relatively high prediction accuracy. The random forest model was used to predict the occurrence of cyanobacteria. The results of boxplot and time-series analyses showed that under the worst-case scenario (RCP8.5 (2100)), where temperature increases significantly, cyanobacterial abundance across all study areas was greatly stimulated. The study also found that the frequencies of HCB occurrences exceeding certain thresholds (100,000 and 1,000,000 cells/mL) increased under both the best-case scenario (RCP2.6 (2050)) and worst-case scenario (RCP8.5 (2100)). These findings suggest that the frequency of HCB occurrences surpassing a certain threshold level can serve as a useful diagnostic indicator of vulnerability to temperature increases caused by climate change. Additionally, this study highlights that water bodies currently susceptible to HCBs are likely to become even more vulnerable with climate change compared to those that are currently less susceptible.

Performance Evaluation of Machine Learning Model for Seismic Response Prediction of Nuclear Power Plant Structures considering Aging deterioration (원전 구조물의 경년열화를 고려한 지진응답예측 기계학습 모델의 성능평가)

  • Kim, Hyun-Su;Kim, Yukyung;Lee, So Yeon;Jang, Jun Su
    • Journal of Korean Association for Spatial Structures
    • /
    • v.24 no.3
    • /
    • pp.43-51
    • /
    • 2024
  • Dynamic responses of nuclear power plant structure subjected to earthquake loads should be carefully investigated for safety. Because nuclear power plant structure are usually constructed by material of reinforced concrete, the aging deterioration of R.C. have no small effect on structural behavior of nuclear power plant structure. Therefore, aging deterioration of R.C. nuclear power plant structure should be considered for exact prediction of seismic responses of the structure. In this study, a machine learning model for seismic response prediction of nuclear power plant structure was developed by considering aging deterioration. The OPR-1000 was selected as an example structure for numerical simulation. The OPR-1000 was originally designated as the Korean Standard Nuclear Power Plant (KSNP), and was re-designated as the OPR-1000 in 2005 for foreign sales. 500 artificial ground motions were generated based on site characteristics of Korea. Elastic modulus, damping ratio, poisson's ratio and density were selected to consider material property variation due to aging deterioration. Six machine learning algorithms such as, Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Artificial Neural Networks (ANN), eXtreme Gradient Boosting (XGBoost), were used t o construct seispic response prediction model. 13 intensity measures and 4 material properties were used input parameters of the training database. Performance evaluation was performed using metrics like root mean square error, mean square error, mean absolute error, and coefficient of determination. The optimization of hyperparameters was achieved through k-fold cross-validation and grid search techniques. The analysis results show that neural networks present good prediction performance considering aging deterioration.

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.143-156
    • /
    • 2012
  • People easily believe that news and stock index are closely related. They think that securing news before anyone else can help them forecast the stock prices and enjoy great profit, or perhaps capture the investment opportunity. However, it is no easy feat to determine to what extent the two are related, come up with the investment decision based on news, or find out such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract the information that can assist the investment decisions. The reality however is that the world is inundated with a massive wave of news in real time. And news is not patterned text. This study suggests the stock-index invest model based on "News Big Data" opinion mining that systematically collects, categorizes and analyzes the news and creates investment information. To verify the validity of the model, the relationship between the result of news opinion mining and stock-index was empirically analyzed by using statistics. Steps in the mining that converts news into information for investment decision making, are as follows. First, it is indexing information of news after getting a supply of news from news provider that collects news on real-time basis. Not only contents of news but also various information such as media, time, and news type and so on are collected and classified, and then are reworked as variable from which investment decision making can be inferred. Next step is to derive word that can judge polarity by separating text of news contents into morpheme, and to tag positive/negative polarity of each word by comparing this with sentimental dictionary. Third, positive/negative polarity of news is judged by using indexed classification information and scoring rule, and then final investment decision making information is derived according to daily scoring criteria. For this study, KOSPI index and its fluctuation range has been collected for 63 days that stock market was open during 3 months from July 2011 to September in Korea Exchange, and news data was collected by parsing 766 articles of economic news media M company on web page among article carried on stock information>news>main news of portal site Naver.com. In change of the price index of stocks during 3 months, it rose on 33 days and fell on 30 days, and news contents included 197 news articles before opening of stock market, 385 news articles during the session, 184 news articles after closing of market. Results of mining of collected news contents and of comparison with stock price showed that positive/negative opinion of news contents had significant relation with stock price, and change of the price index of stocks could be better explained in case of applying news opinion by deriving in positive/negative ratio instead of judging between simplified positive and negative opinion. And in order to check whether news had an effect on fluctuation of stock price, or at least went ahead of fluctuation of stock price, in the results that change of stock price was compared only with news happening before opening of stock market, it was verified to be statistically significant as well. In addition, because news contained various type and information such as social, economic, and overseas news, and corporate earnings, the present condition of type of industry, market outlook, the present condition of market and so on, it was expected that influence on stock market or significance of the relation would be different according to the type of news, and therefore each type of news was compared with fluctuation of stock price, and the results showed that market condition, outlook, and overseas news was the most useful to explain fluctuation of news. On the contrary, news about individual company was not statistically significant, but opinion mining value showed tendency opposite to stock price, and the reason can be thought to be the appearance of promotional and planned news for preventing stock price from falling. Finally, multiple regression analysis and logistic regression analysis was carried out in order to derive function of investment decision making on the basis of relation between positive/negative opinion of news and stock price, and the results showed that regression equation using variable of market conditions, outlook, and overseas news before opening of stock market was statistically significant, and classification accuracy of logistic regression accuracy results was shown to be 70.0% in rise of stock price, 78.8% in fall of stock price, and 74.6% on average. This study first analyzed relation between news and stock price through analyzing and quantifying sensitivity of atypical news contents by using opinion mining among big data analysis techniques, and furthermore, proposed and verified smart investment decision making model that could systematically carry out opinion mining and derive and support investment information. This shows that news can be used as variable to predict the price index of stocks for investment, and it is expected the model can be used as real investment support system if it is implemented as system and verified in the future.

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.29-45
    • /
    • 2012
  • Bond rating is regarded as an important event for measuring financial risk of companies and for determining the investment returns of investors. As a result, it has been a popular research topic for researchers to predict companies' credit ratings by applying statistical and machine learning techniques. The statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have been traditionally used in bond rating. However, one major drawback is that it should be based on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables and pre-existing functional forms relating the criterion variablesand the predictor variables. Those strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques also used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). Especially, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that can maximize the margin between two categories. SVM is simple enough to be analyzed mathematical, and leads to high performance in practical applications. SVM implements the structuralrisk minimization principle and searches to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum and thus, overfitting is unlikely to occur with SVM. In addition, SVM does not require too many data sample for training since it builds prediction models by only using some representative sample near the boundaries called support vectors. A number of experimental researches have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance. First, SVM is originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification such as One-Against-One, One-Against-All have been proposed, but they do not improve the performance in multi-class classification problem as much as SVM for binary-class classification. Second, approximation algorithms (e.g. decomposition methods, sequential minimal optimization algorithm) could be used for effective multi-class computation to reduce computation time, but it could deteriorate classification performance. Third, the difficulty in multi-class prediction problems is in data imbalance problem that can occur when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built due to skewed boundary and thus the reduction in the classification accuracy of such a classifier. SVM ensemble learning is one of machine learning methods to cope with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing weight on the misclassified observations through iterations. The observations that are incorrectly predicted by previous classifiers are chosen more often than examples that are correctly predicted. Thus Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve multiclass prediction problem. Since MGM-Boost introduces the notion of geometric mean into AdaBoost, it can perform learning process considering the geometric mean-based accuracy and errors of multiclass. This study applies MGM-Boost to the real-world bond rating case for Korean companies to examine the feasibility of MGM-Boost. 10-fold cross validations for threetimes with different random seeds are performed in order to ensure that the comparison among three different classifiers does not happen by chance. For each of 10-fold cross validation, the entire data set is first partitioned into tenequal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained the results for classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows the higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%)in terms of geometric mean-based prediction accuracy. T-test is used to examine whether the performance of each classifiers for 30 folds is significantly different. The results indicate that performance of MGM-Boost is significantly different from AdaBoost and SVM classifiers at 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-classproblems such as bond rating.

Risk Assessment of Pine Tree Dieback in Sogwang-Ri, Uljin (울진 소광리 금강소나무 고사발생 특성 분석 및 위험지역 평가)

  • Kim, Eun-Sook;Lee, Bora;Kim, Jaebeom;Cho, Nanghyun;Lim, Jong-Hwan
    • Journal of Korean Society of Forest Science
    • /
    • v.109 no.3
    • /
    • pp.259-270
    • /
    • 2020
  • Extreme weather events, such as heat and drought, have occurred frequently over the past two decades. This has led to continuous reports of cases of forest damage due to physiological stress, not pest damage. In 2014, pine trees were collectively damaged in the forest genetic resources reserve of Sogwang-ri, Uljin, South Korea. An investigation was launched to determine the causes of the dieback, so that a forest management plan could be prepared to deal with the current dieback, and to prevent future damage. This study aimedto 1) understand the topographic and structural characteristics of the area which experienced pine tree dieback, 2) identify the main causes of the dieback, and 3) predict future risk areas through the use of machine-learning techniques. A model for identifying risk areas was developed using 14 explanatory variables, including location, elevation, slope, and age class. When three machine-learning techniques-Decision Tree, Random Forest (RF), and Support Vector Machine (SVM) were applied to the model, RF and SVM showed higher predictability scores, with accuracies over 93%. Our analysis of the variable set showed that the topographical areas most vulnerable to pine dieback were those with high altitudes, high daily solar radiation, and limited water availability. We also found that, when it came to forest stand characteristics, pine trees with high vertical stand densities (5-15 m high) and higher age classes experienced a higher risk of dieback. The RF and SVM models predicted that 9.5% or 115 ha of the Geumgang Pine Forest are at high risk for pine dieback. Our study suggests the need for further investigation into the vulnerable areas of the Geumgang Pine Forest, and also for climate change adaptive forest management steps to protect those areas which remain undamaged.

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.237-262
    • /
    • 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and the sentence classification task on the training set was performed after reading each sentence in person. Then, based on the training set, a Support Vector Machine model was utilized to predict the tone of sentences in the test set. Finally, the machine learning model calculated the percentages of positive, neutral, and negative sentences in each prospectus. To predict the price movement of an IPO stock, four different machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use quantitative variables using technical analysis and prospectus tone variables together show higher accuracy than models that use only quantitative variables. More specifically, the prediction accuracy was improved by 1.45% points in the Random Forest model, 4.34% points in the Artificial Neural Network model, and 5.07% points in the Support Vector Machine model. After testing the performance of these machine learning techniques, the Artificial Neural Network model using both quantitative variables and prospectus tone variables was the model with the highest prediction accuracy rate, which was 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to verify the statistically significant difference between the models. The model using only quantitative variables and the model using both the quantitative variables and the prospectus tone variables were compared, and it was confirmed that the predictive performance improved significantly at a 1% significance level.

Design of Web-GIS based SWG Simulator for Disseminating Integrated Water Information (통합 물정보 제공을 위한 웹 GIS 기반의 SWG 시뮬레이터 설계)

  • Park, Yonggil;Kim, Kyehyun;Lee, Sungjoo;Yoo, Jaehyun
    • Spatial Information Research
    • /
    • v.23 no.1
    • /
    • pp.19-31
    • /
    • 2015
  • Due to the global warming and unstable abnormal climate changes, water resources differences between regions and water shortage are occurring. Therefore, the water resources management is becoming more important for the stable securement of future water supply and demand. Researches on Smart Water Grid (SWG), which is considered as a new method, that can stably secure and maintain the water resources, are actively being conducted but it is still in infancy. Thus, this study aimed to design SWG simulator based on GIS in order to provide integrated water information in web environment. The user's requirements were analyzed for system development and important functions such as SWG current situation checking, future prediction, filtration plant situation checking functions were designed and data expression techniques using GIS and HTML5 were applied to enhance the understanding of the users. Also, when the emergency situations occurred, the solving process of the situations are reproduced to check the solution process using scenario reproduction functions. Use-case, class, sequence diagram, which are a design for real system development and defines the system usage contents of users, were written, and the story board was written to check the final development contents. This study designed a SWG simulator in order to support the water maintenance reacting to climate changes. The development of system is expected to help securing information to deal with emergency situations such as water shortage and help the decision maker to make decision through reproduction of scenario. The major functions were designed for the convenience of water resource manager and producer but new contents for consumers must be developed to enable duplex information transmission.

Development of newly recruited privates on-the-job Training Achievements Group Classification Model (신병 주특기교육 성취집단 예측모형 개발)

  • Kwak, Ki-Hyo;Suh, Yong-Moo
    • Journal of the military operations research society of Korea
    • /
    • v.33 no.2
    • /
    • pp.101-113
    • /
    • 2007
  • The period of military personnel service will be phased down by 2014 according to 'The law of National Defense Reformation' issued by the Ministry of National Defense. For this reason, the ROK army provides discrimination education to 'newly recruited privates' for more effective individual performance in the on-the-job training. For the training to be more effective, it would be essential to predict the degree of achievements by new privates in the training. Thus, we used data mining techniques to develop a classification model which classifies the new privates into one of two achievements groups, so that different skills of education are applied to each group. The target variable for this model is a binary variable, whose value can be either 'a group of general control' or 'a group of special control'. We developed four pure classification models using Neural Network, Decision Tree, Support Vector Machine and Naive Bayesian. We also built four hybrid models, each of which combines k-means clustering algorithm with one of these four mining technique. Experimental results demonstrated that the highest performance model was the hybrid model of k-means and Neural Network. We expect that various military education programs could be supported by these classification models for better educational performance.

An Investigation on Expanding Co-occurrence Criteria in Association Rule Mining (온라인 연관관계 분석의 장바구니 기준에 대한 연구)

  • Kim, Mi-Sung;Kim, Nam-Gyu
    • CRM연구
    • /
    • v.4 no.2
    • /
    • pp.19-29
    • /
    • 2011
  • There is a large difference between purchasing patterns in an online shopping mall and in an offline market. This difference may be caused mainly by the difference in accessibility of online and offline markets. It means that an interval between the initial purchasing decision and its realization appears to be relatively short in an online shopping mall, because a customer can make an order immediately. Because of the short interval between a purchasing decision and its realization, an online shopping mall transaction usually contains fewer items than that of an offline market. In an offline market, customers usually keep some items in mind and buy them all at once a few days after deciding to buy them, instead of buying each item individually and immediately. On the contrary, more than 70% of online shopping mall transactions contain only one item. This statistic implies that traditional data mining techniques cannot be directly applied to online market analysis, because hardly any association rules can survive with an acceptable level of Support because of too many Null Transactions. Most market basket analyses on online shopping mall transactions, therefore, have been performed by expanding the co-occurrence criteria of traditional association rule mining. While the traditional co-occurrence criteria defines items purchased in one transaction as concurrently purchased items, the expanded co-occurrence criteria regards items purchased by a customer during some predefined period (e.g., a day) as concurrently purchased items. In studies using expanded co-occurrence criteria, however, the criteria has been defined arbitrarily by researchers without any theoretical grounds or agreement. The lack of clear grounds of adopting a certain co-occurrence criteria degrades the reliability of the analytical results. Moreover, it is hard to derive new meaningful findings by combining the outcomes of previous individual studies. In this paper, we attempt to compare expanded co-occurrence criteria and propose a guideline for selecting an appropriate one. First of all, we compare the accuracy of association rules discovered according to various co-occurrence criteria. By doing this experiment we expect that we can provide a guideline for selecting appropriate co-occurrence criteria that corresponds to the purpose of the analysis. Additionally, we will perform similar experiments with several groups of customers that are segmented by each customer's average duration between orders. By this experiment, we attempt to discover the relationship between the optimal co-occurrence criteria and the customer's average duration between orders. Finally, by a series of experiments, we expect that we can provide basic guidelines for developing customized recommendation systems.

  • PDF

Development Direction of Maritime Manned-Unmanned Systems through Measurement of Combat Effectiveness against Major Threats on Sea Lines of Communication (해상교통로 상 주요 위협별 전투 효과 측정을 통한 해양 유·무인 복합체계 발전방향)

  • Yong-Hoon Kim;Yonghoon Ha
    • Journal of Industrial Convergence
    • /
    • v.21 no.11
    • /
    • pp.29-41
    • /
    • 2023
  • In this study, assuming that the maritime manned-unmanned systems, which will be used as the main force of the ROK Navy in the future, conducts its sea line of communication(SLOC) protection operations, the combat effectiveness against major threats was measured, and through this, the development direction of the manned-unmanned systems was suggested. Multi-criteria decision-making techniques such as Delphi and AHP were used to measure combat effectiveness, and the AHP survey was conducted on 40 naval officers, including 25 senior officers who are well-understood in the combat effectiveness of the weapons system and MUM-T. As an evaluation index for measuring combat effectiveness, the OODA loop was set as the main attribute, followed by Observe(0.358), Orient(0.315), Act(0.217), and Decide(0.110). The combat effectiveness of each major threat in SLOC, the lowest alternative, was measured to be 1.68 times higher than the response to maritime conflicts in neighboring countries and 3.61 times higher than the response to transnational threats. These results are expected to support rational decision-making in determining the level of technology required for acquisition of marine manned-unmanned systems and establishing operational plans for naval forces.