• Title/Summary/Keyword: Ensemble prediction

Search Result 372, Processing Time 0.035 seconds

Prediction of ultimate shear strength and failure modes of R/C ledge beams using machine learning framework

  • Ahmed M. Yousef;Karim Abd El-Hady;Mohamed E. El-Madawy
    • Structural Monitoring and Maintenance
    • /
    • v.9 no.4
    • /
    • pp.337-357
    • /
    • 2022
  • The objective of this study is to present a data-driven machine learning (ML) framework for predicting ultimate shear strength and failure modes of reinforced concrete ledge beams. Experimental tests were collected on these beams with different loading, geometric and material properties. The database was analyzed using different ML algorithms including decision trees, discriminant analysis, support vector machine, logistic regression, nearest neighbors, naïve bayes, ensemble and artificial neural networks to identify the governing and critical parameters of reinforced concrete ledge beams. The results showed that ML framework can effectively identify the failure mode of these beams either web shear failure, flexural failure or ledge failure. ML framework can also derive equations for predicting the ultimate shear strength for each failure mode. A comparison of the ultimate shear strength of ledge failure was conducted between the experimental results and the results from the proposed equations and the design equations used by international codes. These comparisons indicated that the proposed ML equations predict the ultimate shear strength of reinforced concrete ledge beams better than the design equations of AASHTO LRFD-2020 or PCI-2020.

Development of Rainfall Ensemble Prediction Model based on Radar Rainfall (레이더 강우량 기반 강우앙상블 예측모형 개발)

  • Kim, Ho-Jun;Uranchimeg, Sumiya;Ryou, Minsuk;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.276-276
    • /
    • 2021
  • 최근 댐과 같은 수공구조물의 건설로 대규모 홍수피해는 급격히 줄어들었지만, 돌발홍수(flash flood)로 인한 저지대 침수 등의 도시홍수 발생빈도가 급증하고 있다. 2020년에는 최장의 장마가 관측되었으며, 전국적으로 홍수로 인한 침수피해가 발생하였다. 홍수에 선제적으로 대응하기 위해서 신뢰성 있는 홍수예·경보가 필요하며, 이를 위해서는 신속하고 정확성있는 강우예측이 선행되어야 한다. 이에 본 연구에서는 초단기 강우예측을 목적으로 둔 레이더 기반의 강우앙상블 예측모형을 개발하였다. 라그랑지안 지속성(Lagrangian persistence)을 기반으로 개발하였으며, 강우장의 이동 패턴은 이류특성을 활용해 추정하였다. 즉, 강우장의 예측정확도를 향상시키기 위해 공간적 규모별 캐스캐이드(cascade) 방법으로 분리해 이동 경로를 추정하였다. 예측시간에 따른 강우량은 각 캐스캐이드에 자기회귀모형을 적용하였다. 레이더 강우량은 2016-2020년 사이에 발생한 강우사상에 대한 환경부 홍수통제소에서 제공한 레이더 합성장을 이용하였다. 예측강우량에 대한 평가는 RMSE, Pearson's Correlation, FSS 등 통계치를 통해 수행하였다. 본 연구에서 소개된 강우예측 모형은 초단기 홍수예측에 정확도 높은 강우 정보를 제공할 수 있으며, 이에 따라 홍수피해를 저감하는데 도움이 될 것으로 판단된다.

  • PDF

Automated Phase Identification in Shingle Installation Operation Using Machine Learning

  • Dutta, Amrita;Breloff, Scott P.;Dai, Fei;Sinsel, Erik W.;Warren, Christopher M.;Wu, John Z.
    • International conference on construction engineering and project management
    • /
    • 2022.06a
    • /
    • pp.728-735
    • /
    • 2022
  • Roofers get exposed to increased risk of knee musculoskeletal disorders (MSDs) at different phases of a sloped shingle installation task. As different phases are associated with different risk levels, this study explored the application of machine learning for automated classification of seven phases in a shingle installation task using knee kinematics and roof slope information. An optical motion capture system was used to collect knee kinematics data from nine subjects who mimicked shingle installation on a slope-adjustable wooden platform. Four features were used in building a phase classification model. They were three knee joint rotation angles (i.e., flexion, abduction-adduction, and internal-external rotation) of the subjects, and the roof slope at which they operated. Three ensemble machine learning algorithms (i.e., random forests, decision trees, and k-nearest neighbors) were used for training and prediction. The simulations indicate that the k-nearest neighbor classifier provided the best performance, with an overall accuracy of 92.62%, demonstrating the considerable potential of machine learning methods in detecting shingle installation phases from workers knee joint rotation and roof slope information. This knowledge, with further investigation, may facilitate knee MSD risk identification among roofers and intervention development.

  • PDF

Prediction on Busan's Gross Product and Employment of Major Industry with Logistic Regression and Machine Learning Model (로지스틱 회귀모형과 머신러닝 모형을 활용한 주요산업의 부산 지역총생산 및 고용 효과 예측)

  • Chae-Deug Yi
    • Korea Trade Review
    • /
    • v.47 no.2
    • /
    • pp.69-88
    • /
    • 2022
  • This paper aims to predict Busan's regional product and employment using the logistic regression models and machine learning models. The following are the main findings of the empirical analysis. First, the OLS regression model shows that the main industries such as electricity and electronics, machine and transport, and finance and insurance affect the Busan's income positively. Second, the binomial logistic regression models show that the Busan's strategic industries such as the future transport machinery, life-care, and smart marine industries contribute on the Busan's income in large order. Third, the multinomial logistic regression models show that the Korea's main industries such as the precise machinery, transport equipment, and machinery influence the Busan's economy positively. And Korea's exports and the depreciation can affect Busan's economy more positively at the higher employment level. Fourth, the voting ensemble model show the higher predictive power than artificial neural network model and support vector machine models. Furthermore, the gradient boosting model and the random forest show the higher predictive power than the voting model in large order.

A Prediction Model for the Development of Cataract Using Random Forests (Random Forests 기법을 이용한 백내장 예측모형 - 일개 대학병원 건강검진 수검자료에서 -)

  • Han, Eun-Jeong;Song, Ki-Jun;Kim, Dong-Geon
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.4
    • /
    • pp.771-780
    • /
    • 2009
  • Cataract is the main cause of blindness and visual impairment, especially, age-related cataract accounts for about half of the 32 million cases of blindness worldwide. As the life expectancy and the expansion of the elderly population are increasing, the cases of cataract increase as well, which causes a serious economic and social problem throughout the country. However, the incidence of cataract can be reduced dramatically through early diagnosis and prevention. In this study, we developed a prediction model of cataracts for early diagnosis using hospital data of 3,237 subjects who received the screening test first and then later visited medical center for cataract check-ups cataract between 1994 and 2005. To develop the prediction model, we used random forests and compared the predictive performance of this model with other common discriminant models such as logistic regression, discriminant model, decision tree, naive Bayes, and two popular ensemble model, bagging and arcing. The accuracy of random forests was 67.16%, sensitivity was 72.28%, and main factors included in this model were age, diabetes, WBC, platelet, triglyceride, BMI and so on. The results showed that it could predict about 70% of cataract existence by screening test without any information from direct eye examination by ophthalmologist. We expect that our model may contribute to diagnose cataract and help preventing cataract in early stages.

Development of Predictive Model for Length of Stay(LOS) in Acute Stroke Patients using Artificial Intelligence (인공지능을 이용한 급성 뇌졸중 환자의 재원일수 예측모형 개발)

  • Choi, Byung Kwan;Ham, Seung Woo;Kim, Chok Hwan;Seo, Jung Sook;Park, Myung Hwa;Kang, Sung-Hong
    • Journal of Digital Convergence
    • /
    • v.16 no.1
    • /
    • pp.231-242
    • /
    • 2018
  • The efficient management of the Length of Stay(LOS) is important in hospital. It is import to reduce medical cost for patients and increase profitability for hospitals. In order to efficiently manage LOS, it is necessary to develop an artificial intelligence-based prediction model that supports hospitals in benchmarking and reduction ways of LOS. In order to develop a predictive model of LOS for acute stroke patients, acute stroke patients were extracted from 2013 and 2014 discharge injury patient data. The data for analysis was classified as 60% for training and 40% for evaluation. In the model development, we used traditional regression technique such as multiple regression analysis method, artificial intelligence technique such as interactive decision tree, neural network technique, and ensemble technique which integrate all. Model evaluation used Root ASE (Absolute error) index. They were 23.7 by multiple regression, 23.7 by interactive decision tree, 22.7 by neural network and 22.7 by esemble technique. As a result of model evaluation, neural network technique which is artificial intelligence technique was found to be superior. Through this, the utility of artificial intelligence has been proved in the development of the prediction LOS model. In the future, it is necessary to continue research on how to utilize artificial intelligence techniques more effectively in the development of LOS prediction model.

A Study on the Prediction of Disc Cutter Wear Using TBM Data and Machine Learning Algorithm (TBM 데이터와 머신러닝 기법을 이용한 디스크 커터마모 예측에 관한 연구)

  • Tae-Ho, Kang;Soon-Wook, Choi;Chulho, Lee;Soo-Ho, Chang
    • Tunnel and Underground Space
    • /
    • v.32 no.6
    • /
    • pp.502-517
    • /
    • 2022
  • As the use of TBM increases, research has recently increased to to analyze TBM data with machine learning techniques to predict the exchange cycle of disc cutters, and predict the advance rate of TBM. In this study, a regression prediction of disc cutte wear of slurry shield TBM site was made by combining machine learning based on the machine data and the geotechnical data obtained during the excavation. The data were divided into 7:3 for training and testing the prediction of disc cutter wear, and the hyper-parameters are optimized by cross-validated grid-search over a parameter grid. As a result, gradient boosting based on the ensemble model showed good performance with a determination coefficient of 0.852 and a root-mean-square-error of 3.111 and especially excellent results in fit times along with learning performance. Based on the results, it is judged that the suitability of the prediction model using data including mechanical data and geotechnical information is high. In addition, research is needed to increase the diversity of ground conditions and the amount of disc cutter data.

Water Level Prediction on the Golok River Utilizing Machine Learning Technique to Evaluate Flood Situations

  • Pheeranat Dornpunya;Watanasak Supaking;Hanisah Musor;Oom Thaisawasdi;Wasukree Sae-tia;Theethut Khwankeerati;Watcharaporn Soyjumpa
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2023.05a
    • /
    • pp.31-31
    • /
    • 2023
  • During December 2022, the northeast monsoon, which dominates the south and the Gulf of Thailand, had significant rainfall that impacted the lower southern region, causing flash floods, landslides, blustery winds, and the river exceeding its bank. The Golok River, located in Narathiwat, divides the border between Thailand and Malaysia was also affected by rainfall. In flood management, instruments for measuring precipitation and water level have become important for assessing and forecasting the trend of situations and areas of risk. However, such regions are international borders, so the installed measuring telemetry system cannot measure the rainfall and water level of the entire area. This study aims to predict 72 hours of water level and evaluate the situation as information to support the government in making water management decisions, publicizing them to relevant agencies, and warning citizens during crisis events. This research is applied to machine learning (ML) for water level prediction of the Golok River, Lan Tu Bridge area, Sungai Golok Subdistrict, Su-ngai Golok District, Narathiwat Province, which is one of the major monitored rivers. The eXtreme Gradient Boosting (XGBoost) algorithm, a tree-based ensemble machine learning algorithm, was exploited to predict hourly water levels through the R programming language. Model training and testing were carried out utilizing observed hourly rainfall from the STH010 station and hourly water level data from the X.119A station between 2020 and 2022 as main prediction inputs. Furthermore, this model applies hourly spatial rainfall forecasting data from Weather Research and Forecasting and Regional Ocean Model System models (WRF-ROMs) provided by Hydro-Informatics Institute (HII) as input, allowing the model to predict the hourly water level in the Golok River. The evaluation of the predicted performances using the statistical performance metrics, delivering an R-square of 0.96 can validate the results as robust forecasting outcomes. The result shows that the predicted water level at the X.119A telemetry station (Golok River) is in a steady decline, which relates to the input data of predicted 72-hour rainfall from WRF-ROMs having decreased. In short, the relationship between input and result can be used to evaluate flood situations. Here, the data is contributed to the Operational support to the Special Water Resources Management Operation Center in Southern Thailand for flood preparedness and response to make intelligent decisions on water management during crisis occurrences, as well as to be prepared and prevent loss and harm to citizens.

  • PDF

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.39-54
    • /
    • 2013
  • Recent explosive increase of electronic commerce provides many advantageous purchase opportunities to customers. In this situation, customers who do not have enough knowledge about their purchases, may accept product recommendations. Product recommender systems automatically reflect user's preference and provide recommendation list to the users. Thus, product recommender system in online shopping store has been known as one of the most popular tools for one-to-one marketing. However, recommender systems which do not properly reflect user's preference cause user's disappointment and waste of time. In this study, we propose a novel recommender system which uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user's preference. The research data is collected from the real-world online shopping store, which deals products from famous art galleries and museums in Korea. The data initially contain 5759 transaction data, but finally remain 3167 transaction data after deletion of null data. In this study, we transform the categorical variables into dummy variables and exclude outlier data. The proposed model consists of two steps. The first step predicts customers who have high likelihood to purchase products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. We perform above data mining techniques using SAS E-Miner software. In this study, we partition datasets into two sets as modeling and validation sets for the logistic regression and decision trees. We also partition datasets into three sets as training, test, and validation sets for the artificial neural network model. The validation dataset is equal for the all experiments. Then we composite the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. Bagging is the abbreviation of "Bootstrap Aggregation" and it composite outputs from several machine learning techniques for raising the performance and stability of prediction or classification. This technique is special form of the averaging method. Bumping is the abbreviation of "Bootstrap Umbrella of Model Parameter," and it only considers the model which has the lowest error value. The results show that bumping outperforms bagging and the other predictors except for "Poster" product group. For the "Poster" product group, artificial neural network model performs better than the other models. In the second step, we use the market basket analysis to extract association rules for co-purchased products. We can extract thirty one association rules according to values of Lift, Support, and Confidence measure. We set the minimum transaction frequency to support associations as 5%, maximum number of items in an association as 4, and minimum confidence for rule generation as 10%. This study also excludes the extracted association rules below 1 of lift value. We finally get fifteen association rules by excluding duplicate rules. Among the fifteen association rules, eleven rules contain association between products in "Office Supplies" product group, one rules include the association between "Office Supplies" and "Fashion" product groups, and other three rules contain association between "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender systems provides list of recommendations to the proper customers. We test the usability of the proposed system by using prototype and real-world transaction and profile data. For this end, we construct the prototype system by using the ASP, Java Script and Microsoft Access. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The participants for the survey are 173 persons who use MSN Messenger, Daum Caf$\acute{e}$, and P2P services. We evaluate the user satisfaction using five-scale Likert measure. This study also performs "Paired Sample T-test" for the results of the survey. The results show that the proposed model outperforms the random selection model with 1% statistical significance level. It means that the users satisfied the recommended product list significantly. The results also show that the proposed system may be useful in real-world online shopping store.

A Methodology of Customer Churn Prediction based on Two-Dimensional Loyalty Segmentation (이차원 고객충성도 세그먼트 기반의 고객이탈예측 방법론)

  • Kim, Hyung Su;Hong, Seung Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.111-126
    • /
    • 2020
  • Most industries have recently become aware of the importance of customer lifetime value as they are exposed to a competitive environment. As a result, preventing customers from churn is becoming a more important business issue than securing new customers. This is because maintaining churn customers is far more economical than securing new customers, and in fact, the acquisition cost of new customers is known to be five to six times higher than the maintenance cost of churn customers. Also, Companies that effectively prevent customer churn and improve customer retention rates are known to have a positive effect on not only increasing the company's profitability but also improving its brand image by improving customer satisfaction. Predicting customer churn, which had been conducted as a sub-research area for CRM, has recently become more important as a big data-based performance marketing theme due to the development of business machine learning technology. Until now, research on customer churn prediction has been carried out actively in such sectors as the mobile telecommunication industry, the financial industry, the distribution industry, and the game industry, which are highly competitive and urgent to manage churn. In addition, These churn prediction studies were focused on improving the performance of the churn prediction model itself, such as simply comparing the performance of various models, exploring features that are effective in forecasting departures, or developing new ensemble techniques, and were limited in terms of practical utilization because most studies considered the entire customer group as a group and developed a predictive model. As such, the main purpose of the existing related research was to improve the performance of the predictive model itself, and there was a relatively lack of research to improve the overall customer churn prediction process. In fact, customers in the business have different behavior characteristics due to heterogeneous transaction patterns, and the resulting churn rate is different, so it is unreasonable to assume the entire customer as a single customer group. Therefore, it is desirable to segment customers according to customer classification criteria, such as loyalty, and to operate an appropriate churn prediction model individually, in order to carry out effective customer churn predictions in heterogeneous industries. Of course, in some studies, there are studies in which customers are subdivided using clustering techniques and applied a churn prediction model for individual customer groups. Although this process of predicting churn can produce better predictions than a single predict model for the entire customer population, there is still room for improvement in that clustering is a mechanical, exploratory grouping technique that calculates distances based on inputs and does not reflect the strategic intent of an entity such as loyalties. This study proposes a segment-based customer departure prediction process (CCP/2DL: Customer Churn Prediction based on Two-Dimensional Loyalty segmentation) based on two-dimensional customer loyalty, assuming that successful customer churn management can be better done through improvements in the overall process than through the performance of the model itself. CCP/2DL is a series of churn prediction processes that segment two-way, quantitative and qualitative loyalty-based customer, conduct secondary grouping of customer segments according to churn patterns, and then independently apply heterogeneous churn prediction models for each churn pattern group. Performance comparisons were performed with the most commonly applied the General churn prediction process and the Clustering-based churn prediction process to assess the relative excellence of the proposed churn prediction process. The General churn prediction process used in this study refers to the process of predicting a single group of customers simply intended to be predicted as a machine learning model, using the most commonly used churn predicting method. And the Clustering-based churn prediction process is a method of first using clustering techniques to segment customers and implement a churn prediction model for each individual group. In cooperation with a global NGO, the proposed CCP/2DL performance showed better performance than other methodologies for predicting churn. This churn prediction process is not only effective in predicting churn, but can also be a strategic basis for obtaining a variety of customer observations and carrying out other related performance marketing activities.