• Title/Summary/Keyword: Machine Learning Models

Application of Statistical and Machine Learning Techniques for Habitat Potential Mapping of Siberian Roe Deer in South Korea

  • Lee, Saro;Rezaie, Fatemeh
    • Proceedings of the National Institute of Ecology of the Republic of Korea / v.2 no.1 / pp.1-14 / 2021
  • This study prepared habitat potential maps for the Siberian roe deer in South Korea using three geographic information system-based models: frequency ratio (FR), a bivariate statistical approach, and convolutional neural network (CNN) and long short-term memory (LSTM), two machine learning algorithms. Field observations recorded 741 locations as preferred roe deer habitat. The dataset was split 70:30 for model construction and validation. Using the FR model, 10 influential factors were selected for modelling: altitude, valley depth, slope height, topographic position index (TPI), topographic wetness index (TWI), normalized difference water index, drainage density, road density, radar intensity, and morphological feature. Variable importance analysis showed that TPI, TWI, altitude, and valley depth had the greatest impact on prediction. The area under the receiver operating characteristic (ROC) curve was used to assess the prediction accuracy of the three models. All three models performed similarly, but the LSTM model had relatively higher predictive ability than the FR and CNN models, with accuracies of 76% and 73% in training and validation, respectively. The LSTM map was categorized into five potential classes (very low, low, moderate, high, and very high) with proportions of 19.70%, 19.81%, 19.31%, 19.86%, and 21.31%, respectively. The resulting potential maps may be valuable for monitoring and preserving Siberian roe deer habitats.
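
The abstract does not spell out the FR formula. As a minimal sketch, assuming the commonly used definition (share of presence points in a factor class divided by share of study-area cells in that class), the ratio could be computed as follows. The function name and the toy altitude classes are hypothetical and are not taken from the paper.

```python
from collections import Counter


def frequency_ratio(cell_classes, presence_classes):
    """Return FR per class: (share of presence points) / (share of area cells).

    Assumes the commonly used FR definition; the paper's exact procedure
    is not given in the abstract.
    """
    if not cell_classes or not presence_classes:
        raise ValueError("both inputs must be non-empty")
    area_counts = Counter(cell_classes)
    presence_counts = Counter(presence_classes)
    total_area = len(cell_classes)
    total_presence = len(presence_classes)
    ratios = {}
    for cls, n_cells in area_counts.items():
        presence_share = presence_counts.get(cls, 0) / total_presence
        area_share = n_cells / total_area
        ratios[cls] = presence_share / area_share
    return ratios


# Hypothetical toy data: the altitude class of every raster cell in the
# study area, and the altitude class at each observed presence location.
cells = ["low"] * 50 + ["mid"] * 30 + ["high"] * 20
presence = ["low"] * 2 + ["mid"] * 6 + ["high"] * 2

fr = frequency_ratio(cells, presence)
for cls in ("low", "mid", "high"):
    print(cls, round(fr[cls], 3))
```

Under this definition, a ratio above 1 indicates that a class holds more presence points than its share of the area would suggest, and a ratio below 1 indicates the opposite.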

Evaluation of Surrogate Monitoring Parameters for SS and T-P Using Multiple Linear Regression and Random Forest (다중 선형 회귀 분석과 랜덤 포레스트를 이용한 SS, T-P 대리모니터링 기법 평가)

  • Jeung, Minhyuk;Beom, Jina;Choi, Dongho;Kim, Young-joo;Her, Younggu;Yoon, Kwangsik
    • Journal of The Korean Society of Agricultural Engineers / v.63 no.2 / pp.51-60 / 2021
  • Effective nonpoint source (NPS) pollution management requires frequent water quality monitoring, which is often costly to implement in practice. Statistical techniques and machine learning methods make it possible to identify and focus on fundamental environmental variables that are closely related to the NPS pollutants of interest. This study developed surrogate models to predict the concentrations of suspended sediment (SS) and total phosphorus (T-P) from turbidity and runoff discharge rates using multiple linear regression (MLR) and random forest (RF) methods. The RF models provided acceptable performance in predicting SS and T-P, especially when runoff discharge rates were high, and outperformed the MLR models in all cases. This finding highlights the potential of RF models as a tool for identifying fundamental environmental variables that are inexpensive to measure or freely available yet still provide the information required to quantify the concentrations of NPS pollutants. The analysis of relative importance showed that the temporal variations of SS and T-P concentrations were explained more effectively by those of turbidity than by those of runoff discharge rate. This study demonstrated that advanced statistical techniques such as machine learning can help improve the efficiency of NPS pollutant monitoring.

Machine Learning Methods to Predict Vehicle Fuel Consumption

  • Ko, Kwangho
    • Journal of the Korea Society of Computer and Information / v.27 no.9 / pp.13-20 / 2022
  • This study proposes and analyzes machine learning (ML) models for predicting vehicle fuel consumption (FC) in real time. A car was test-driven to measure vehicle speed, acceleration, road gradient, and FC for the training dataset. Various ML models were trained with speed, acceleration, and road gradient as features and FC as the target. Two kinds of ML models were studied: regression models (linear regression and k-nearest neighbors regression) and classification models (k-nearest neighbors classifier, logistic regression, decision tree, random forest, and gradient boosting). Prediction accuracy for real-time FC is low, in the range of 0.5 to 0.6, and the classification models are more accurate than the regression models. The prediction error for total FC is very low, about 0.2% to 2.0%, and here the regression models are more accurate than the classification models. This is attributed to the coefficient of determination (R²) used as the accuracy score: as the coefficient decreases, the predicted values are distributed around the mean of the targets. Therefore, regression models are suited to total FC prediction and classification models to real-time FC prediction.
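
The contrast between low per-sample accuracy and low total-FC error can be illustrated with a small sketch. It assumes the standard definition of R² and a simple percentage error on the summed values. The sample numbers are hypothetical and are not the study's data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def total_error_percent(actual, predicted):
    """Absolute error of the summed prediction, as a percentage of the true total."""
    return abs(sum(predicted) - sum(actual)) / sum(actual) * 100.0


# Hypothetical instantaneous FC samples (arbitrary units).
actual = [1.0, 3.0, 2.0, 4.0]
# A predictor that always outputs the mean of the targets.
mean_only = [2.5, 2.5, 2.5, 2.5]

r2 = r_squared(actual, mean_only)
err = total_error_percent(actual, mean_only)
print("R2:", r2, "total error %:", err)
```

A predictor that always returns the target mean scores an R² of zero on individual samples, yet its summed prediction matches the true total. This is consistent with the abstract's observation that regression models can be weak in real time but accurate for total FC.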

Intelligent prediction of engineered cementitious composites with limestone calcined clay cement (LC3-ECC) compressive strength based on novel machine learning techniques

  • Enming Li;Ning Zhang;Bin Xi;Vivian WY Tam;Jiajia Wang;Jian Zhou
    • Computers and Concrete / v.32 no.6 / pp.577-594 / 2023
  • Engineered cementitious composites with limestone calcined clay cement (LC3-ECC), a green, low-carbon, high-toughness concrete, have recently received significant research attention. However, the complicated relationship between potential influential factors and LC3-ECC compressive strength makes that strength difficult to predict. To address this, machine learning-based prediction models for the compressive strength of LC3-ECC are proposed and developed for the first time. The models combine three novel meta-heuristic algorithms (golden jackal optimization, butterfly optimization, and whale optimization) with support vector regression (SVR) to improve prediction accuracy. A new dataset on LC3-ECC compressive strength was assembled from 156 data points in previous studies and used to develop the SVR-based models. Thirteen potential factors affecting the compressive strength of LC3-ECC were considered in the models. The results show that all hybrid SVR prediction models reach a coefficient of determination (R²) above 0.95 for the testing set and 0.97 for the training set. Radar and Taylor plots also show better overall prediction performance for the hybrid SVR models than for several traditional machine learning techniques, which confirms the superiority of the three proposed methods. The successful development of these predictive models can provide scientific guidance for LC3-ECC materials and can be extended to similar low-carbon, sustainable cement-based materials.

A Deep Learning Model for Extracting Consumer Sentiments using Recurrent Neural Network Techniques

  • Ranjan, Roop;Daniel, AK
    • International Journal of Computer Science & Network Security / v.21 no.8 / pp.238-246 / 2021
  • The rapid rise of the Internet and social media has resulted in a large number of text-based reviews being posted on social media sites. Using machine learning technologies to analyze the emotional context of these comments aids in understanding the Quality of Service (QoS) of any product or service, and the classification and analysis of user reviews helps improve QoS. Unlike traditional categorization models, which are based on a set of rules, machine learning algorithms have evolved into a powerful tool for analyzing user sentiment. In sentiment categorization, Bidirectional Long Short-Term Memory (BiLSTM) has shown significant results, and the Convolutional Neural Network (CNN) has shown promising results. Using convolution and pooling layers, a CNN can successfully extract local information. BiLSTM uses two LSTM directions to increase the amount of contextual information available to deep learning models. The suggested hybrid model combines the benefits of these two deep learning-based algorithms. The data source for analysis and classification was user reviews of Indian Railway Services on Twitter. The suggested hybrid model uses the Keras Embedding technique as its input source. The model takes in data and generates lower-dimensional features that lead to a categorization result. The suggested hybrid model's performance was compared using Keras and Word2Vec embeddings, and the proposed model showed a significant improvement, with an accuracy of 95.19 percent.

Improving Chest X-ray Image Classification via Integration of Self-Supervised Learning and Machine Learning Algorithms

  • Tri-Thuc Vo;Thanh-Nghi Do
    • Journal of information and communication convergence engineering / v.22 no.2 / pp.165-171 / 2024
  • In this study, we present a novel approach for enhancing chest X-ray image classification (normal, COVID-19, edema, mass nodules, and pneumothorax) by combining contrastive learning and machine learning algorithms. A vast amount of unlabeled data was leveraged to learn representations, improving data efficiency as a means of addressing the limited availability of labeled X-ray images. Our approach involves training classification algorithms on features extracted from a linear fine-tuned Momentum Contrast (MoCo) model. The MoCo architecture with a ResNet34, ResNet50, or ResNet101 backbone is trained to learn features from unlabeled data. Instead of only fine-tuning the linear classifier layer on the MoCo-pretrained model, we propose training nonlinear classifiers as substitutes for softmax in deep networks. The empirical results show that the linear fine-tuned ImageNet-pretrained models achieved a highest accuracy of only 82.9% and the linear fine-tuned MoCo-pretrained models a highest accuracy of 84.8%, whereas our proposed method offered a significant improvement, with a highest accuracy of 87.9%.

Online Selective-Sample Learning of Hidden Markov Models for Sequence Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.3 / pp.145-152 / 2015
  • We consider an online selective-sample learning problem for sequence classification, where the goal is to learn a predictive model using a stream of data samples whose class labels can be selectively queried by the algorithm. Given that there is a limit to the total number of queries permitted, the key issue is choosing the most informative and salient samples for their class labels to be queried. Recently, several aggressive selective-sample algorithms have been proposed under a linear model for static (non-sequential) binary classification. We extend the idea to hidden Markov models for multi-class sequence classification by introducing reasonable measures for the novelty and prediction confidence of the incoming sample with respect to the current model, on which the query decision is based. For several sequence classification datasets/tasks in online learning setups, we demonstrate the effectiveness of the proposed approach.

Implementing a Branch-and-bound Algorithm for Transductive Support Vector Machines

  • Park, Chan-Kyoo
    • Management Science and Financial Engineering / v.16 no.1 / pp.81-117 / 2010
  • Semi-supervised learning incorporates unlabeled examples, whose labels are unknown, as well as labeled examples into the learning process. Although the transductive support vector machine (TSVM), one of the semi-supervised learning models, was proposed about a decade ago, its application to large-scale data has remained limited because of its high computational complexity. Our previous research addressed this limitation by introducing a branch-and-bound algorithm for finding an optimal solution to TSVM. In this paper, we propose three new techniques to enhance the performance of the branch-and-bound algorithm. The first tightens the min-cut bound, one of the two bounding strategies. The second exploits a graph-based approximation to a support vector machine problem to avoid the most time-consuming step. The third tries to fix the labels of unlabeled examples whose labels can be clearly predicted from the labeled examples. Experimental results demonstrate that the proposed techniques can drastically reduce the number of subproblems and, consequently, the computational time.

Wind power forecasting based on time series and machine learning models (시계열 모형과 기계학습 모형을 이용한 풍력 발전량 예측 연구)

  • Park, Sujin;Lee, Jin-Young;Kim, Sahm
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.723-734 / 2021
  • Wind energy is a rapidly developing form of renewable energy that is attracting development and investment in response to climate change. As renewable energy policies and power plant installations are promoted, the supply of wind power in Korea is gradually expanding, and attempts to accurately predict demand are increasing. In this paper, the ARIMA and ARIMAX models, which are time series techniques, and the SVR, Random Forest, and XGBoost models, which are machine learning models, were compared and analyzed for predicting wind power generation in the Jeonnam and Gyeongbuk regions. Mean absolute error (MAE) and mean absolute percentage error (MAPE) were used as indicators to compare the models' predictions. The models were trained on hourly raw data from January 1, 2018 to October 24, 2020 and then used to predict wind power generation for the 168 hours from October 25, 2020 to October 31, 2020. In the comparison of predictive power, the Random Forest and XGBoost models showed the best performance for Jeonnam and Gyeongbuk, respectively. In future research, we will forecast wind power generation using not only machine learning models but also data mining techniques that have recently been actively researched.
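
The two comparison indicators named in the abstract can be sketched using their standard definitions. The function names and the sample values below are hypothetical, and the zero-value guard is an assumption of this sketch, since MAPE is undefined whenever an actual value is zero.

```python
def mae(actual, forecast):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Raises ValueError if any actual value is zero, because the
    percentage error is undefined there (an assumption of this sketch).
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return sum(ratios) / len(ratios) * 100.0


# Hypothetical hourly generation values and forecasts (arbitrary units).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print("MAE:", mae(actual, forecast))
print("MAPE (%):", round(mape(actual, forecast), 2))
```

MAE reports the error in the units of the data, while MAPE is scale-free, which makes it easier to compare regions whose generation levels differ.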

Default Prediction of Automobile Credit Based on Support Vector Machine

  • Chen, Ying;Zhang, Ruirui
    • Journal of Information Processing Systems / v.17 no.1 / pp.75-88 / 2021
  • The automobile credit business has developed rapidly in recent years, and defaults occur frequently. Credit default brings great losses to automobile financial institutions, so the successful prediction of automobile credit default is of great significance. First, missing values are deleted; then random forest is used for feature selection, and the sample data are randomly grouped. Finally, six prediction models are constructed: support vector machine (SVM), random forest, k-nearest neighbor (KNN), logistic regression, decision tree, and artificial neural network (ANN). The results show that all six machine learning models can be used to predict automobile credit default. Among them, the decision tree has the highest accuracy, 0.79, but SVM has the best overall performance. Random grouping can also improve the efficiency of model operation to a certain extent, especially for SVM.