Search | Korea Science

Design and evaluation of artificial intelligence models for abnormal data detection and prediction

Hae-Jong Joo;Ho-Bin Song
- Journal of Platform Technology / v.11 no.6 / pp.3-12 / 2023
In today's system operation, failures are hard to detect and act on immediately when staff are few relative to the amount of equipment, or when failures occur during vulnerable time periods, and this can delay failure recovery. In addition, various algorithms exist for detecting abnormal data, and it is important to select an appropriate algorithm for each problem. In this paper, an ensemble-based isolation forest model was used to efficiently detect multivariate point anomalies that deviate from the mean distribution in a data set generated to predict system failure and minimize service interruption. Because significant changes in memory usage are observed together with changes in CPU usage, an LSTM autoencoder was used for collective anomalies, in which two or more features are compared and one feature exhibits an abnormal pattern in response to a change in another. Evaluation indicators were also defined for the models presented in this study, and the AI models were evaluated against them.
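The isolation forest idea named in this abstract can be sketched in a few lines: points that random axis-aligned splits isolate quickly receive scores near 1. The following is a simplified, standard-library illustration, not the authors' implementation; the function names, the (CPU %, memory %) sample values, and the parameter defaults are all assumptions made for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        # Simplification: stop instead of retrying with another feature.
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus a correction for unsplit leaf contents."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1] per point; higher means more anomalous."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    scores = []
    for x in data:
        mean_depth = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Hypothetical (CPU %, memory %) readings: a tight cluster plus one outlier.
data = [(50.0 + i % 5, 30.0 + i % 3) for i in range(30)] + [(99.0, 95.0)]
scores = anomaly_scores(data)
most_anomalous = data[scores.index(max(scores))]
```

In practice a maintained library implementation would normally be preferred over a hand-written one; the sketch only shows why short average path lengths translate into high anomaly scores.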

Selecting Optimal Algorithms for Stroke Prediction: Machine Learning-Based Approach

Kyung Tae CHOI;Kyung-A KIM;Myung-Ae CHUNG;Min Soo KANG
- Korean Journal of Artificial Intelligence / v.12 no.2 / pp.1-7 / 2024
In this paper, we compare three models (logistic regression, Random Forest, and XGBoost) for predicting stroke occurrence using data from the Korea National Health and Nutrition Examination Survey (KNHANES). We evaluated these models using various metrics, focusing mainly on recall and F1 score. Initially, the logistic regression model showed a satisfactory recall score among the three models; however, it was excluded from further consideration because it did not meet the F1 score threshold, which was set at a minimum of 0.5. The F1 score is crucial because it considers both precision and recall, providing a balanced measure of a model's accuracy. Among the models that met the criteria, XGBoost achieved the highest recall and performed well in stroke prediction. In particular, XGBoost performed strongly not only in recall but also in F1 score and AUC, so this study concludes that it is the optimal algorithm for predicting stroke occurrence.
https://doi.org/10.24225/kjai.2024.12.2.1
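The screening rule described in this abstract (a model must reach an F1 score of at least 0.5 before its recall is compared) can be illustrated with a small sketch. The confusion-matrix counts below are invented for illustration and are not the paper's results; the function names are likewise assumptions.

```python
F1_THRESHOLD = 0.5  # minimum F1 score stated in the abstract


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def passes_f1_screen(tp, fp, fn, threshold=F1_THRESHOLD):
    """True if the model's F1 score meets the screening threshold."""
    _, _, f1 = precision_recall_f1(tp, fp, fn)
    return f1 >= threshold


# Hypothetical model A: catches most positives but raises many false alarms.
precision_a, recall_a, f1_a = precision_recall_f1(tp=90, fp=400, fn=10)

# Hypothetical model B: balanced precision and recall.
precision_b, recall_b, f1_b = precision_recall_f1(tp=80, fp=20, fn=20)
```

Model A has the higher recall yet fails the screen because its low precision drags F1 down, which mirrors how a high-recall model could be excluded under such a rule.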

Estimation of ultimate bearing capacity of shallow foundations resting on cohesionless soils using a new hybrid M5'-GP model

Khorrami, Rouhollah;Derakhshani, Ali
- Geomechanics and Engineering / v.19 no.2 / pp.127-139 / 2019
Available methods to determine the ultimate bearing capacity of shallow foundations may not be accurate enough owing to the complicated failure mechanism and diversity of the underlying soils. Accordingly, applying new methods of artificial intelligence can improve the prediction of the ultimate bearing capacity. The M5' model tree and genetic programming are two robust artificial intelligence methods used for prediction purposes. The model tree is able to categorize the data and present linear models, while genetic programming can give nonlinear models. In this study, a combination of these methods, called the M5'-GP approach, is employed to predict the ultimate bearing capacity of shallow foundations so that the advantages of both methods are exploited simultaneously. Factors governing the bearing capacity of shallow foundations, including width of the foundation (B), embedment depth of the foundation (D), length of the foundation (L), effective unit weight of the soil (${\gamma}$), and internal friction angle of the soil (${\varphi}$), are considered for modeling. To develop the new model, experimental data from large- and small-scale tests were collected from the literature. Evaluation of the new model by statistical indices reveals its better performance compared with both traditional and recent approaches. Moreover, sensitivity analysis of the proposed model indicates the significance of various predictors. Additionally, it is inferred that the new model compares favorably with different models presented by various researchers based on a comprehensive ranking system.
https://doi.org/10.12989/gae.2019.19.2.127 KSCI

Development of ensemble machine learning model considering the characteristics of input variables and the interpretation of model performance using explainable artificial intelligence (수질자료의 특성을 고려한 앙상블 머신러닝 모형 구축 및 설명가능한 인공지능을 이용한 모형결과 해석에 대한 연구)

Park, Jungsu
- Journal of Korean Society of Water and Wastewater / v.36 no.4 / pp.239-248 / 2022
The prediction of algal blooms is an important field of study in algal bloom management, and chlorophyll-a concentration (Chl-a) is commonly used to represent the status of an algal bloom. In recent years, advanced machine learning algorithms have increasingly been used for the prediction of algal blooms. In this study, XGBoost (XGB), an ensemble machine learning algorithm, was used to develop a model to predict Chl-a in a reservoir. Daily observations of water quality and climate data were used for training and testing the model. In the first step of the study, the input variables were clustered into two groups (low- and high-value groups) based on the observed values of water temperature (TEMP), total organic carbon concentration (TOC), total nitrogen concentration (TN), and total phosphorus concentration (TP). For each of the four water quality items, two XGB models were developed using only the data in each clustered group (Model 1). The results were compared with the predictions of an XGB model developed using the entire data set before clustering (Model 2). Model performance was evaluated using three indices, including the root mean squared error-observation standard deviation ratio (RSR). Model 1 improved performance for TEMP, TN, and TP, with RSR values of 0.503, 0.477, and 0.493, respectively, while the RSR of Model 2 was 0.521. On the other hand, Model 2 performed better than Model 1 for TOC, where the RSR of Model 1 was 0.532. Explainable artificial intelligence (XAI) is an ongoing field of research in machine learning. Shapley value analysis, a novel XAI algorithm, was also used for the quantitative interpretation of the performance of the XGB model developed in this study.
https://doi.org/10.11001/jksww.2022.36.4.239 KSCI
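The RSR index used in this abstract divides the root mean squared error by the standard deviation of the observations, so 0 is a perfect fit and lower values are better. A minimal sketch follows, assuming the common definition in which both terms use the same sample count (so the counts cancel); the sample values are invented and the function name is an assumption.

```python
import math


def rsr(observed, predicted):
    """RMSE-observation standard deviation ratio (lower is better, 0 is perfect)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared prediction errors (numerator of RMSE before normalising).
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Sum of squared deviations of the observations from their mean.
    sso = sum((o - mean_obs) ** 2 for o in observed)
    if sso == 0:
        raise ValueError("observations have zero variance")
    return math.sqrt(sse) / math.sqrt(sso)


# Hypothetical Chl-a observations and model predictions.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
example_rsr = rsr(observed, predicted)

# Predicting the observed mean everywhere gives an RSR of exactly 1.
baseline_rsr = rsr(observed, [25.0] * 4)
```

Under this definition, an RSR of about 0.5, as reported in the abstract, means the typical prediction error is about half the natural spread of the observations.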

Development of Prediction Model for Diabetes Using Machine Learning

Kim, Duck-Jin;Quan, Zhixuan
- Korean Journal of Artificial Intelligence / v.6 no.1 / pp.16-20 / 2018
The development of modern information technology has increased the amount of big data on patient information and diseases. In this study, we developed a prediction model for diabetes using the health examination data provided by the public data portal in 2016. In addition, we graphically visualized diabetes incidence by sex, age, residence area, and income level. As a result, the incidence of diabetes differed by residence area and income level, and the probability of an accurate prediction for males and females was about 65%. In addition, it was confirmed that X in males and Y in females had a strong influence on diabetes. This predictive model can be used to identify patients at high and low risk of diabetes and to alert seriously ill patients, thereby dramatically improving the re-admission rate. Ultimately, it may contribute to improving public health and reducing chronic disease management costs through continuous target selection and management.
https://doi.org/10.24225/kjai.2018.6.1.16

Prediction of California bearing ratio (CBR) for coarse- and fine-grained soils using the GMDH-model

Mintae Kim;Seyma Ordu;Ozkan Arslan;Junyoung Ko
- Geomechanics and Engineering / v.33 no.2 / pp.183-194 / 2023
This study presents the prediction of the California bearing ratio (CBR) of coarse- and fine-grained soils using artificial intelligence technology. The group method of data handling (GMDH) algorithm, an artificial neural network-based model, was used in the prediction of the CBR values. In the design of the prediction models, various combinations of independent input variables for both coarse- and fine-grained soils have been used. The results obtained from the designed GMDH-type neural networks (GMDH-type NN) were compared with other regression models, such as linear, support vector, and multilayer perceptron regression methods. The performance of models was evaluated with a regression coefficient (R²), root-mean-square error (RMSE), and mean absolute error (MAE). The results showed that GMDH-type NN algorithm had higher performance than other regression methods in the prediction of CBR value for coarse- and fine-grained soils. The GMDH model had an R² of 0.938, RMSE of 1.87, and MAE of 1.48 for the input variables {G, S, and MDD} in coarse-grained soils. For fine-grained soils, it had an R² of 0.829, RMSE of 3.02, and MAE of 2.40, when using the input variables {LL, PI, MDD, and OMC}. The performance evaluations revealed that the GMDH-type NN models were effective in predicting CBR values of both coarse- and fine-grained soils.
https://doi.org/10.12989/gae.2023.33.2.183
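The three indices reported in this abstract can be computed as in the sketch below. It assumes R² is the coefficient of determination, 1 - SS_res / SS_tot, which the abstract does not state explicitly; the CBR values and the function name are invented for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2 (coefficient of determination), RMSE, and MAE as a dict."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance, R^2 is undefined")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical measured and predicted CBR values (%).
measured = [2.0, 4.0, 6.0, 8.0]
estimated = [3.0, 3.0, 7.0, 7.0]
metrics = regression_metrics(measured, estimated)
```

RMSE and MAE are in the units of the target (CBR percent here), while R² is unitless, which is why the abstract can compare R² across soil types but the error magnitudes only within each data set.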

A Methodology for Bankruptcy Prediction in Imbalanced Datasets using eXplainable AI (데이터 불균형을 고려한 설명 가능한 인공지능 기반 기업부도예측 방법론 연구)

Heo, Sun-Woo;Baek, Dong Hyun
- Journal of Korean Society of Industrial and Systems Engineering / v.45 no.2 / pp.65-76 / 2022
Recently, not only traditional statistical techniques but also machine learning algorithms have been used to make more accurate bankruptcy predictions. However, the insolvency rate of companies dealing with financial institutions is very low, resulting in a data imbalance problem. Because data imbalance negatively affects the performance of artificial intelligence models, the imbalance must be handled first. In addition, as artificial intelligence algorithms become more advanced for precise decision-making, regulatory pressure to secure the transparency of artificial intelligence models is gradually increasing, for example by mandating explanation functions for such models. Therefore, this study aims to present guidelines for an eXplainable Artificial Intelligence-based corporate bankruptcy prediction methodology that applies the SMOTE technique and the LIME algorithm to address the data imbalance problem and the model transparency problem in predicting corporate bankruptcy. The implications of this study are as follows. First, it was confirmed that SMOTE can effectively solve the data imbalance issue, a problem that is easily overlooked in predicting corporate bankruptcy. Second, through the LIME algorithm, the basis for the machine learning model's bankruptcy predictions was visualized, and improvement priorities were derived for the financial variables that increase the possibility of corporate bankruptcy. Third, the scope of application of the algorithms in future research was expanded by confirming, through a case application, the possibility of using SMOTE and LIME.
https://doi.org/10.11627/jksie.2022.45.2.065 KSCI
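The core step of SMOTE mentioned in this abstract is interpolation: each synthetic minority sample is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours. The sketch below shows only that step with the standard library; it is not the study's pipeline, and the function name, parameters, and toy feature values are assumptions.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by nearest-neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority class (for example, bankrupt firms) with two scaled features.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote_sketch(minority, n_synthetic=6)
```

Because every synthetic point lies between two real minority points, oversampling of this kind should be applied to the training split only, so that no interpolated information leaks into evaluation data.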

A Knowledge Integration Model for Corporate Dividend Prediction

Kim, Jin-Hwa;Won, Chae-Hwan;Bae, Jae-Kwon
- Proceedings of the Korea Society of Management Information Systems Conference (한국경영정보학회:학술대회논문집) / 2008.06a / pp.129-134 / 2008
Dividends are one of the essential factors determining the value of a firm. According to valuation theory in finance, discounted cash flow (DCF) is the most popular and widely used method for the valuation of any asset. Since dividends play a key role in pricing firm value by DCF, accurate prediction of future dividends is naturally the most important task in valuation. Although dividend forecasting matters in practice for investment and financing decisions, it is not easy to find good theoretical models that can predict future dividends accurately, apart from the Marsh and Merton (1987) model. Thus, a method that predicts future dividends better than Marsh and Merton could contribute significantly to the enhancement of firm value. Therefore, the most important goal of this study is to develop a better method than the Marsh and Merton model by applying artificial intelligence techniques.
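The role of dividends in a DCF valuation can be made concrete with a short sketch: each forecast dividend is discounted back to the present and the results are summed. This is a generic finite-horizon illustration with a constant discount rate, not the model developed in the paper; the dividend figures, the rate, and the function name are assumptions.

```python
def present_value_of_dividends(dividends, discount_rate):
    """Discount a list of forecast dividends (paid at the end of years 1, 2, ...)."""
    if discount_rate <= -1.0:
        raise ValueError("discount rate must be greater than -100%")
    return sum(d / (1.0 + discount_rate) ** t
               for t, d in enumerate(dividends, start=1))


# Hypothetical forecast: 110 at the end of year 1, 121 at the end of year 2, 10% rate.
forecast = [110.0, 121.0]
value = present_value_of_dividends(forecast, discount_rate=0.10)

# The same valuation if the year-2 forecast were 10% too high.
overstated = present_value_of_dividends([110.0, 133.1], discount_rate=0.10)
```

The comparison shows why forecast accuracy matters: an error in a predicted dividend flows directly into the estimated value, scaled only by the discount factor for that year.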

Optimizing Artificial Neural Network-Based Models to Predict Rice Blast Epidemics in Korea

Lee, Kyung-Tae;Han, Juhyeong;Kim, Kwang-Hyung
- The Plant Pathology Journal / v.38 no.4 / pp.395-402 / 2022
To predict rice blast, many machine learning methods have been proposed. As the quality and quantity of input data are essential for machine learning techniques, this study develops three artificial neural network (ANN)-based rice blast prediction models by combining two ANN models, the feed-forward neural network (FFNN) and long short-term memory, with diverse input datasets, and compares their performance. The Blast_Weather_FFNN model had the highest recall score (66.3%) for rice blast prediction. This model requires two types of input data: blast occurrence data for the last 3 years and weather data (daily maximum temperature, relative humidity, and precipitation) between January and July of the prediction year. This study showed that the performance of an ANN-based disease prediction model was improved by applying suitable machine learning techniques together with the optimization of hyperparameter tuning involving input data. Moreover, we highlight the importance of the systematic collection of long-term disease data.
https://doi.org/10.5423/PPJ.NT.04.2022.0062 KSCI

Integration of Heterogeneous Models for Predicting Consumer Purchasing Behavior (소비자 구매행동 예측을 위한 이질적인 모형들의 통합)

Bae, Jae-Gwon;Kim, Jin-Hwa
- Proceedings of the Korea Intelligent Information System Society Conference / 2007.11a / pp.489-498 / 2007
For better predictions and classifications in customer recommendation, this study proposes an integrative model that efficiently combines the statistical and artificial intelligence models currently in use. In particular, it builds the integrative prediction model by combining Association Rule, Frequency Matrix, and Rule Induction models. The data set for the tests was collected from convenience store G, the number one brand in S. Korea. The data set contains sales information on customer transactions from September 1, 2005 to December 7, 2005. About 1,000 transactions were selected for a specific item. Using this data set, the study proposes an integrated model that predicts whether or not a customer buys a specific product, for use in target marketing strategy. The performance of the integrated model is compared with that of the other models. The experimental results show that the integrated model outperforms each of the individual models (Association Rule, Frequency Matrix, and Rule Induction).
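Of the component models named in this abstract, the association rule is the easiest to illustrate: a rule "antecedent implies consequent" is scored by its support (how often both appear together) and its confidence (how often the consequent appears given the antecedent). The sketch below uses invented baskets and hypothetical function names; it does not reproduce the study's integrated model.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def predicts_purchase(basket, antecedent, confidence, min_confidence=0.6):
    """Predict a purchase when the rule applies and is confident enough."""
    return set(antecedent) <= set(basket) and confidence >= min_confidence


# Hypothetical convenience-store transactions.
transactions = [
    {"milk", "bread"},
    {"milk"},
    {"bread"},
    {"milk", "bread", "gum"},
]
support, confidence = rule_metrics(transactions, {"milk"}, {"bread"})
will_buy_bread = predicts_purchase({"milk", "gum"}, {"milk"}, confidence)
```

The `min_confidence` cutoff of 0.6 is an arbitrary choice for the example; in a setting like the one described, such a cutoff would be tuned on held-out transactions.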

Search Result 423, Processing Time 0.024 seconds
