• Title/Summary/Keyword: Machine Learning Models

Machine learning-based techniques to facilitate the production of stone nano powder-reinforced manufactured-sand concrete

  • Zanyu Huang;Qiuyue Han;Adil Hussein Mohammed;Arsalan Mahmoodzadeh;Nejib Ghazouani;Shtwai Alsubai;Abed Alanazi;Abdullah Alqahtani
    • Advances in nano research / v.15 no.6 / pp.533-539 / 2023
  • This study examines the potential of four machine learning (ML)-based models to estimate the splitting tensile strength (STS) of manufactured sand concrete (MSC). The ML models were trained and tested on 310 experimental data points. The effects of stone nanopowder content (SNPC), curing age (CA), and water-to-cement (W/C) ratio on the STS of MSC were also studied. According to the results, the support vector regression (SVR) model had the highest correlation with the experimental data, although all of the optimized ML models showed promise in estimating the STS of MSC. Both the ML and laboratory results showed that MSC containing 10% SNPC had improved STS.

Prediction of Hardness for Cold Forging Manufacturing through Machine Learning (기계학습을 활용한 냉간단조 부품 제조 경도 예측 연구)

  • K. Kim;J.-G. Park;U. R. Heo;Y. H. Lee;D. H. Chang;H. W. Yang
    • Transactions of Materials Processing / v.32 no.6 / pp.329-334 / 2023
  • Heat treatment plays an essential role in enhancing mechanical properties in cold forging. However, the process relies heavily on the experience and skill of individual workers. This study aims to predict hardness using machine learning in order to optimize production efficiency in cold forging manufacturing. Random Forest (RF), Gradient Boosting Regressor (GBR), Extra Trees (ET), and AdaBoost (ADA) models were used. The RF, GBR, and ET models showed excellent performance. However, unlike the RF model, the GBR and ET models relied heavily on temperature. We therefore suggest that the RF model is more reliable for predicting hardness, because it takes into account the various variables that arise during the cold forging process.
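The observation that GBR and ET leaned heavily on temperature is the kind of finding usually checked with a feature-importance measure. The sketch below shows permutation importance, one common model-agnostic option; it is not necessarily the measure the authors used. The model `toy_hardness_model` and the "holding time" feature are hypothetical, and any fitted regressor's predict function could be substituted.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[List[float]],
    y: List[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing feature j with the shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical model: hardness depends only on temperature (feature 0)
# and ignores a hypothetical holding-time feature (feature 1).
def toy_hardness_model(row: Row) -> float:
    return 20.0 + 0.05 * row[0]


X = [[800.0 + 10 * i, float(i % 3)] for i in range(20)]
y = [toy_hardness_model(row) for row in X]
print(permutation_importance(toy_hardness_model, X, y))
```

A feature the model ignores scores exactly zero, while shuffling a feature the model depends on raises the error; a model that "leans" on one variable shows one dominant score.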

Machine Learning Based Model Development and Optimization for Predicting Radiation (방사선량률 예측을 위한 기계학습 기반 모델 개발 및 최적화 연구)

  • SiHyun Lee;HongYeon Lee;JungMin Yeom
    • Journal of Radiation Industry / v.17 no.4 / pp.551-557 / 2023
  • In recent years, radiation has become a socially important issue, increasing the need for accurate prediction of radiation levels. In this study, weather data and radiation dose rates were collected for about six months in Jangseong, Jeollanam-do, and the correlations between the hourly radiation dose rate (nSv h⁻¹) and temperature, humidity, cumulative precipitation, wind direction, wind speed, local air pressure, sea-level pressure, and solar radiation were analyzed. Machine learning models, namely Multiple Linear Regression (MLR), Random Forest (RF), XGBoost, and LightGBM, were then used to predict the hourly dose rate using only the important variables. Evaluated by RMSE (root mean squared error) and R² (coefficient of determination), the XGBoost model performed best among the models, with an RMSE of 22.92 and an R² of 0.73. After the hyperparameters of all models were optimized with grid search and variables measured inside the instrument were added, both XGBoost and LightGBM improved to an RMSE of 2.39 and an R² of 0.99.
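The abstract ranks models by RMSE and R² and tunes them with grid search. A minimal plain-Python sketch of both ideas follows. The `fit_and_score` callback, the parameter names `max_depth` and `learning_rate`, and the toy scoring rule are hypothetical stand-ins for actually training an XGBoost or LightGBM model and measuring its validation RMSE.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def grid_search(
    grid: Dict[str, List[float]],
    fit_and_score: Callable[[Dict[str, float]], float],
) -> Tuple[Dict[str, float], float]:
    """Try every combination in `grid`; keep the one with the lowest score."""
    names = list(grid)
    best_params, best_score = {}, math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = fit_and_score(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical scorer standing in for "train, then return validation RMSE".
def toy_fit_and_score(params: Dict[str, float]) -> float:
    return abs(params["max_depth"] - 6) + abs(params["learning_rate"] - 0.1) * 10


y_true = [100.0, 110.0, 120.0, 130.0]
y_pred = [102.0, 108.0, 121.0, 129.0]
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
print(grid_search({"max_depth": [3, 6, 9], "learning_rate": [0.05, 0.1, 0.3]}, toy_fit_and_score))
```

Grid search cost grows with the product of the grid sizes (nine fits here), which is why it is usually applied to a handful of key hyperparameters.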

A Performance Comparison of Machine Learning Classification Methods for Soil Creep Susceptibility Assessment (땅밀림 위험지 평가를 위한 기계학습 분류모델 비교)

  • Lee, Jeman;Seo, Jung Il;Lee, Jin-Ho;Im, Sangjun
    • Journal of Korean Society of Forest Science / v.110 no.4 / pp.610-621 / 2021
  • Soil creep, caused primarily by earthquakes and torrential rainfall, has occurred widely across the country. To prevent or mitigate casualties and property damage in advance, the Korea Forest Service has attempted to identify soil creep-susceptible areas using a discriminant value table. With advances in computing technology, machine learning-based classification models have been employed to manage mountain disasters such as landslides and debris flows. This study aims to quantify soil creep susceptibility using several classifiers, namely k-Nearest Neighbor (k-NN), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM). To develop the classification models, we selected 292 records from 4,618 field survey records; about 70% of the selected data were used for training and the remaining 30% for testing. On the test dataset, the models achieved classification accuracies of 0.727 for k-NN, 0.750 for NB, 0.807 for RF, and 0.750 for SVM, with Cohen's kappa values of 0.534, 0.580, 0.673, and 0.585 and AUC values of 0.872, 0.912, 0.943, and 0.834, respectively. Overall, the classifiers ranked RF, NB, SVM, and k-NN in order of performance. Our findings indicate that machine learning classifiers can provide valuable information for establishing and implementing natural disaster management plans in mountainous areas.
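Accuracy and Cohen's kappa, two of the metrics reported above, can both be computed from a confusion matrix. Kappa corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below uses a made-up binary confusion matrix, not the study's data.

```python
from typing import List


def accuracy(cm: List[List[int]]) -> float:
    """Diagonal fraction of a square confusion matrix (rows = true, cols = predicted)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohens_kappa(cm: List[List[int]]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / total
    row_sums = [sum(cm[i]) for i in range(k)]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the true and predicted marginal proportions per class.
    p_e = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical test-set confusion matrix, classes: [not susceptible, susceptible].
cm = [[40, 10], [5, 45]]
print(accuracy(cm), cohens_kappa(cm))
```

Here accuracy is 0.85 but half of that agreement would be expected by chance, so kappa is 0.7; this is why the abstract reports kappa alongside accuracy.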

Learning Graphical Models for DNA Chip Data Mining

  • Zhang, Byoung-Tak
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.59-60 / 2000
  • The past few years have seen a dramatic increase in gene expression data obtained from DNA microarrays or DNA chips. Going beyond a generic view of the genome, microarray data can distinguish between gene populations in different tissues of the same organism and in different states of cells belonging to the same tissue. This affords a cell-wide view of metabolic and regulatory processes under different conditions, providing an effective basis for new diagnoses and therapies of diseases. In this talk we present machine learning techniques for effective mining of DNA microarray data. A brief introduction to the research field of machine learning from the computer science and artificial intelligence point of view is followed by a review of recently developed learning algorithms applied to the analysis of DNA chip gene expression data. Emphasis is put on graphical models, such as Bayesian networks, latent variable models, and generative topographic mapping. Finally, we report our own results of applying these learning methods to two important problems: the identification of cell cycle-regulated genes and the discovery of cancer classes by gene expression monitoring. The data sets were provided by the CAMDA-2000 competition (Critical Assessment of Techniques for Microarray Data Mining).

Performance Evaluation of a Feature-Importance-based Feature Selection Method for Time Series Prediction

  • Hyun, Ahn
    • Journal of information and communication convergence engineering / v.21 no.1 / pp.82-89 / 2023
  • Various machine learning models can achieve high predictive power on massive time series. However, these models are prone to unstable computational costs because of the high dimensionality of the feature space and non-optimized hyperparameter settings. Because training a model on a high-dimensional feature set can be time-consuming, we evaluate a feature-importance-based feature selection method that trades off predictive power against computational cost in time series prediction. For the performance evaluation, we used two machine learning techniques to build prediction models from a retail sales dataset. First, we ranked the features in the prediction models using impurity-based and Local Interpretable Model-agnostic Explanations (LIME)-based feature importance measures. Then, we applied recursive feature elimination to remove unimportant features sequentially. As a result, we obtained a subset of features that reduces model training time while preserving acceptable model performance.
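The rank-then-eliminate procedure in this abstract can be sketched as a generic recursive feature elimination loop. The `importance_fn` callback stands in for retraining a model on the current feature subset and reading its impurity- or LIME-based importances; the fixed importance table and the retail feature names are hypothetical.

```python
from typing import Callable, Dict, List


def recursive_feature_elimination(
    features: List[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
    step: int = 1,
) -> List[str]:
    """Repeatedly re-rank the remaining features and drop the `step` least important."""
    current = list(features)
    while len(current) > n_keep:
        scores = importance_fn(current)  # in practice: retrain on `current`, then rank
        n_drop = min(step, len(current) - n_keep)
        ranked = sorted(current, key=lambda f: scores[f])
        dropped = set(ranked[:n_drop])
        current = [f for f in current if f not in dropped]
    return current


# Hypothetical importances; a real importance_fn would retrain the model on `subset`.
FIXED_IMPORTANCE = {"lag_1": 0.40, "lag_7": 0.25, "day_of_week": 0.15,
                    "promo": 0.12, "holiday": 0.05, "store_id": 0.03}


def toy_importance(subset: List[str]) -> Dict[str, float]:
    return {f: FIXED_IMPORTANCE[f] for f in subset}


print(recursive_feature_elimination(list(FIXED_IMPORTANCE), toy_importance, n_keep=3))
```

Recomputing importances after each removal (rather than ranking once) is what makes the elimination recursive; it costs one retraining per step, which is the overhead the tradeoff above has to absorb.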

Classifying Sub-Categories of Apartment Defect Repair Tasks: A Machine Learning Approach (아파트 하자 보수 시설공사 세부공종 머신러닝 분류 시스템에 관한 연구)

  • Kim, Eunhye;Ji, HongGeun;Kim, Jina;Park, Eunil;Ohm, Jay Y.
    • KIPS Transactions on Software and Data Engineering / v.10 no.9 / pp.359-366 / 2021
  • Many construction companies in Korea invest considerable human and financial resources in building systems that manage apartment defect data and categorize repair tasks. To reduce this burden, this study proposes machine learning models that automatically classify defect complaint text into the subcategories of 'finishing work' (one of the defect repair tasks). The proposed models combine two word representation methods (Bag-of-Words and Term Frequency-Inverse Document Frequency (TF-IDF)) with two machine learning classifiers (Support Vector Machine and Random Forest). We conducted both binary and multiclass classification over nine subcategories of finishing work: home appliance installation work, paperwork, painting work, plastering work, interior masonry work, plaster finishing work, indoor furniture installation work, kitchen facility installation work, and tiling work. The classifier combining the TF-IDF representation with Random Forest achieved more than 90% in accuracy, precision, recall, and F1 score. These results suggest that automated defect classification systems can be built on the proposed machine learning models.
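TF-IDF weights a term by how often it appears in a complaint, discounted by how many complaints contain it. A minimal sketch using raw term counts and the plain idf(t) = log(N / df(t)) variant follows; libraries often use smoothed variants instead. The example complaints are invented English stand-ins for the study's Korean text, and whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """One {term: weight} dict per document: raw count * log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenization
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


complaints = [
    "wallpaper peeling in bedroom",
    "tile cracked in bathroom",
    "tile loose in kitchen",
]
for vec in tf_idf(complaints):
    print(vec)
```

A word such as "in" that appears in every complaint gets weight zero, while category-specific words such as "wallpaper" or "tile" keep high weights; these sparse vectors are the kind of input an SVM or Random Forest classifier would consume.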

A New Ensemble Machine Learning Technique with Multiple Stacking (다중 스태킹을 가진 새로운 앙상블 학습 기법)

  • Lee, Su-eun;Kim, Han-joon
    • The Journal of Society for e-Business Studies / v.25 no.3 / pp.1-13 / 2020
  • Machine learning refers to techniques that generate models capable of solving specific problems by generalizing from given data. Generating a high-performance model requires both high-quality training data and a suitable learning algorithm for the generalization process. One way to improve model performance is the ensemble approach, which generates multiple models rather than a single one; ensemble techniques include bagging, boosting, and stacking. This paper proposes a new ensemble technique, multiple stacking, that outperforms conventional stacking. Its learning structure resembles that of deep learning: each layer consists of a combination of stacking models, and layers are added so as to minimize the misclassification rate of each layer. Experiments on four types of datasets show that the proposed method outperforms existing ones.
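For readers unfamiliar with stacking, the sketch below shows one conventional stacking layer, not the authors' multiple-stacking method: base learners produce out-of-fold predictions (here from a simple interleaved 3-fold split), and a meta-learner is trained on those predictions. The multiple-stacking approach described above would, roughly, repeat this with further layers. The nearest-centroid and 1-nearest-neighbour learners and the toy data are illustrative choices.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], int]
Fitter = Callable[[List[Vector], List[int]], Model]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Predict the label whose class centroid is closest."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def fit_one_nn(X: List[Vector], y: List[int]) -> Model:
    """Predict the label of the single nearest training point."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda d: sq_dist(x, d[0]))[1]


def fit_stacking(X, y, base_fitters: List[Fitter], meta_fitter: Fitter, k: int = 3) -> Model:
    """One stacking layer: out-of-fold base predictions become the meta-learner's features."""
    n = len(X)
    meta_X = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in (list(range(i, n, k)) for i in range(k)):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for j, fit in enumerate(base_fitters):
            base = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                meta_X[i][j] = float(base(X[i]))
    meta_model = meta_fitter(meta_X, y)
    # Refit each base learner on all data for use at prediction time.
    final_bases = [fit(X, y) for fit in base_fitters]
    return lambda x: meta_model([float(m(x)) for m in final_bases])


X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_stacking(X, y, [fit_nearest_centroid, fit_one_nn], fit_nearest_centroid)
print(model([0.5, 0.5]), model([5.5, 5.5]))
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for points the base learners have already seen, keeps it from simply trusting whichever base learner overfits most.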

Assessment of maximum liquefaction distance using soft computing approaches

  • Kishan Kumar;Pijush Samui;Shiva S. Choudhary
    • Geomechanics and Engineering / v.37 no.4 / pp.395-418 / 2024
  • Liquefaction-related damage typically occurs in the epicentral region of an earthquake. Using a recently updated global liquefaction database, this study builds multiple ML models to predict the limiting distance, that is, the maximum epicentral distance (Re), maximum fault distance (Rf), or maximum hypocentral distance (Rh), within which an earthquake of a given magnitude can cause damage. Four machine learning models were developed in Python: LSTM (Long Short-Term Memory), BiLSTM (Bidirectional Long Short-Term Memory), CNN (Convolutional Neural Network), and XGB (Extreme Gradient Boosting). All four proposed models performed better than empirical models for limiting distance assessment, and the XGB model outperformed the other three. Several statistical parameters were used to evaluate how well the proposed models predict limiting distances, and rank analysis, an error matrix, and a Taylor diagram were used to compare their accuracy. The proposed ML models are more robust than existing models and may be used to assess the minimum energy of an earthquake-induced liquefaction event or, in rapid disaster mapping, to estimate the maximum distance of a liquefied site for a given earthquake.
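A Taylor diagram summarizes each model by three statistics computed against the observations: the standard deviation of the predictions σ_p, the correlation R, and the centred root-mean-square difference E′, which satisfy E′² = σ_p² + σ_o² − 2·σ_p·σ_o·R. A minimal sketch computing them follows, assuming population (divide-by-n) standard deviations; the distances are toy numbers, not the study's data.

```python
import math
from typing import Sequence, Tuple


def taylor_stats(obs: Sequence[float], pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (std of predictions, Pearson correlation, centred RMS difference)."""
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(pred) / n
    sd_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sd_p = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    r = cov / (sd_o * sd_p)
    # Centred RMS difference: RMS of the difference between anomalies (mean bias removed).
    e_c = math.sqrt(sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(obs, pred)) / n)
    return sd_p, r, e_c


obs = [12.0, 25.0, 40.0, 55.0, 80.0]   # hypothetical observed limiting distances (km)
pred = [15.0, 22.0, 43.0, 50.0, 85.0]  # hypothetical model predictions (km)
sd_p, r, e_c = taylor_stats(obs, pred)
print(sd_p, r, e_c)
```

Because the three statistics are tied by the law-of-cosines identity above, a single point on the diagram (radius σ_p, angle arccos R) encodes all of them, which is what makes the diagram convenient for ranking several models at once.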

Application of data mining and statistical measurement of agricultural high-quality development

  • Yan Zhou
    • Advances in nano research / v.14 no.3 / pp.225-234 / 2023
  • This study aims to use big data resources and statistical analysis to derive reliable guidance for achieving high-quality, high-yield agricultural production. To this end, soil type, rainfall, and temperature data, together with annual wheat production, were collected for a specific region. The acquired data were cleaned using statistical methods to remove incomplete and defective records. Several machine learning classification methods were then used to distinguish the different factors and their influence on final crop yields. The efficacy of the machine learning methods is discussed by comparing the models' predicted crop yields with the actual values using the correlation coefficient and the mean squared error. The results show that the machine learning methods predict crop yields with high accuracy, and that the random forest (RF) classification approach gives the best results among the classification methods used in this study.