• Title/Summary/Keyword: Model over-fitting

Modeling the Natural Occurrence of Selected Dipterocarp Genera in Sarawak, Borneo

  • Teo, Stephen;Phua, Mui-How
    • Journal of Forest and Environmental Science / v.28 no.3 / pp.170-178 / 2012
  • Dipterocarps (Dipterocarpaceae) are a commercially important timber-producing family and the dominant keystone tree family in the rain forests of Borneo. Borneo's landscape has been changing at an unprecedented rate in recent years, which affects this important biodiversity. This paper attempts to model the natural occurrence of dipterocarp species in Sarawak (their distribution including areas that held natural forest before conversion to other land uses, as opposed to their current distribution), which is important for forest biodiversity conservation and management. The local modeling method of Inverse Distance Weighting was compared with a commonly used statistical method (Binary Logistic Regression) to build the best natural distribution models for three genera (12 species) of dipterocarps. A database of species occurrence data and pseudo-absence data was constructed and divided into two halves for model building and validation. For logistic regression modeling, climatic, topographical and edaphic parameters were used. Proxy variables were used to represent parameters that were highly correlated (p>0.75) to avoid over-fitting. The results show that Inverse Distance Weighting produced the best and most consistent predictions, with an average accuracy of over 80%. This study demonstrates that a local interpolation method can be used to model the natural distribution of dipterocarp species. Inverse Distance Weighting proved to be the better method, and the possible reasons are discussed.
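
A minimal sketch of Inverse Distance Weighting, the local interpolation method named in the abstract above. The function name `idw_predict`, the power parameter of 2, and the sample coordinates are assumptions made for illustration; the abstract does not state the settings the authors used.

```python
import math

def idw_predict(known_points, query, power=2.0):
    """Inverse Distance Weighting estimate at `query`.

    known_points: list of ((x, y), value) pairs, for example value 1.0 for
    a presence record and 0.0 for a pseudo-absence record (hypothetical data).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known_points:
        dist = math.hypot(query[0] - x, query[1] - y)
        if dist == 0.0:
            return value  # the query coincides with a sample point
        weight = 1.0 / dist ** power  # nearer samples get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical records: one presence at the origin, one pseudo-absence at x = 2.
samples = [((0.0, 0.0), 1.0), ((2.0, 0.0), 0.0)]
estimate = idw_predict(samples, (1.0, 0.0))  # equidistant from both samples
```

Thresholding such an estimate (for example at 0.5) would turn it into a presence/absence prediction that can be checked against the held-out half of the data.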

Novel Image Classification Method Based on Few-Shot Learning in Monkey Species

  • Wang, Guangxing;Lee, Kwang-Chan;Shin, Seong-Yoon
    • Journal of information and communication convergence engineering / v.19 no.2 / pp.79-83 / 2021
  • This paper proposes a novel image classification method based on few-shot learning, intended mainly to address model overfitting and non-convergence in image classification tasks on small datasets and to improve classification accuracy. The method extends the basic convolutional neural network (CNN) model through model structure optimization and extracts more image features by adding convolutional layers, thereby improving classification accuracy. We incorporated certain measures to improve the performance of the model. First, we used general methods such as setting a lower learning rate and shuffling to promote rapid convergence of the model. Second, we used data expansion techniques to preprocess small datasets, increasing the amount of training data and suppressing over-fitting. We applied the model to 10 monkey species and achieved outstanding performance. Experiments indicated that the proposed method achieved an accuracy of 87.92%, which is 26.1% higher than that of the traditional CNN method and 1.1% higher than that of the deep convolutional neural network ResNet50.
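
The abstract above does not say which data expansion operations were applied. As a hedged illustration of the general idea, the sketch below enlarges a training set with two common geometric transforms. The function names and the tiny 2x2 "image" (a nested list of pixel values) are assumptions, not the authors' pipeline.

```python
def flip_horizontal(image):
    """Mirror each row of a 2D image given as a list of rows."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image):
    """Return the original image plus two transformed copies."""
    return [image, flip_horizontal(image), rotate_90(image)]

# Hypothetical 2x2 grayscale image.
image = [[1, 2],
         [3, 4]]
expanded = augment(image)  # three training samples from one original
```

Each transformed copy keeps the original label, so a small dataset yields more training samples without new photographs.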

Decision Tree Techniques with Feature Reduction for Network Anomaly Detection (네트워크 비정상 탐지를 위한 속성 축소를 반영한 의사결정나무 기술)

  • Kang, Koohong
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.4 / pp.795-805 / 2019
  • Recently, there has been growing interest in network anomaly detection technology to tackle unknown attacks. To this end, diverse studies have applied data mining, machine learning, and deep learning to detect network anomalies. In this paper, we evaluate the feasibility of the decision tree, one of the most popular data mining techniques for classification, for network anomaly detection on the NSL-KDD data set. To handle the over-fitting problem of the decision tree, we select 13 features from the original 41 features of the data set using the chi-square test, and then model the decision tree using TensorFlow and scikit-learn, yielding binary classification accuracies of 84% and 70% on the KDDTest+ and KDDTest-21 test sets of NSL-KDD, respectively. These results are improvements of 3% and 6% over the previous binary classification accuracies of 81% and 64% obtained with decision tree technologies.
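
A minimal sketch of chi-square feature ranking, the kind of selection step the abstract above describes. It is written in plain Python rather than with the libraries the paper names, and the feature names `flag_a` and `flag_b`, the toy data, and the helper names are all hypothetical.

```python
def chi_square_score(feature, labels):
    """Chi-square statistic of the contingency table between a
    categorical feature and the class labels."""
    n = len(labels)
    f_vals = sorted(set(feature))
    c_vals = sorted(set(labels))
    observed = {(f, c): 0 for f in f_vals for c in c_vals}
    for f, c in zip(feature, labels):
        observed[(f, c)] += 1
    f_tot = {f: sum(observed[(f, c)] for c in c_vals) for f in f_vals}
    c_tot = {c: sum(observed[(f, c)] for f in f_vals) for c in c_vals}
    score = 0.0
    for f in f_vals:
        for c in c_vals:
            expected = f_tot[f] * c_tot[c] / n  # count expected under independence
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score

def select_top_k(columns, labels, k):
    """Names of the k features with the largest chi-square scores."""
    scores = {name: chi_square_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical binary features: flag_a tracks the label, flag_b is unrelated.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "flag_a": [1, 1, 1, 1, 0, 0, 0, 0],
    "flag_b": [1, 0, 1, 0, 1, 0, 1, 0],
}
selected = select_top_k(columns, labels, k=1)
```

Keeping only the highest-scoring features gives the tree fewer uninformative columns to split on, which is the over-fitting control the abstract refers to.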

An improved Maxwell creep model for salt rock

  • Wang, Jun-Bao;Liu, Xin-Rong;Song, Zhan-Ping;Shao, Zhu-Shan
    • Geomechanics and Engineering / v.9 no.4 / pp.499-511 / 2015
  • The creep property of salt rock significantly influences the long-term stability of underground salt rock storage. Triaxial creep tests were performed to investigate the creep behavior of salt rock. The test results indicate that the creep of salt rock has a nonlinear characteristic, which is related to stress level and creep time: the higher the stress level and the longer the creep time, the more obvious the nonlinear characteristic. The elastic modulus of salt rock decreases as creep time increases, which shows that creep damage is produced by the gradual expansion of internal cracks, defects, etc., causing degradation of mechanical properties. Meanwhile, the creep rate of salt rock also decreases as creep time increases in the primary creep stage, which indicates that the mechanical properties of salt rock are hardened and strengthened. That is to say, damage and hardening exist simultaneously during the creep of salt rock. Considering both the damage effect and the hardening effect, an improved Maxwell creep model is proposed by connecting in series an elastic body that softens over time with a viscous body that hardens over time, and its creep equation is derived. Creep test data of salt rock are used to evaluate the reasonableness and applicability of the improved Maxwell model. The fitted curves are in excellent agreement with the creep test data, and, compared with the classical Burgers model, the improved Maxwell model predicts the long-term creep deformation of salt rock precisely, illustrating that the model describes the creep property of salt rock well.
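
For orientation, the classical Maxwell model that the abstract above builds on connects a spring and a dashpot in series, so their strains add under a constant stress. The symbols are assumptions introduced here: \sigma is the applied stress, E the elastic modulus, and \eta the viscosity coefficient. The improved model lets the elastic and viscous parameters change with time, but the abstract does not give those functional forms, so they are not reproduced.

```latex
\varepsilon(t) = \frac{\sigma}{E} + \frac{\sigma}{\eta}\, t
```

The first term is the instantaneous elastic strain and the second is the viscous strain, which grows linearly in time. That constant creep rate is why the classical form cannot capture the decreasing rate seen in the primary creep stage.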

Side Information Extrapolation Using Motion-aligned Auto Regressive Model for Compressed Sensing based Wyner-Ziv Codec

  • Li, Ran;Gan, Zongliang;Cui, Ziguan;Wu, Minghu;Zhu, Xiuchang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.2 / pp.366-385 / 2013
  • In this paper, we propose a compressed sensing (CS) based Wyner-Ziv (WZ) codec that uses side information (SI) extrapolation based on a motion-aligned auto regressive (MAAR) model to improve the compression performance of low-delay distributed video coding (DVC). In the CS based WZ codec, the WZ frame is divided into small blocks and CS measurements of each block are acquired at the encoder, and a specific CS reconstruction algorithm is proposed to correct errors in the SI using the CS measurements at the decoder. To generate high quality SI, a MAAR model is introduced to improve the inaccurate motion field of the auto regressive (AR) model, and Tikhonov regularization of the MAAR coefficients and overlapped block based interpolation are performed to reduce block effects and errors from over-fitting. Simulation experiments show that the proposed CS based WZ codec with MAAR based SI generation achieves better results than other SI extrapolation methods.
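
To show how Tikhonov regularization limits over-fitting of auto regressive coefficients, here is a deliberately reduced sketch: a single-coefficient AR(1) fit with a squared penalty on the coefficient. The MAAR model in the abstract above has many motion-aligned coefficients; the one-coefficient form, the function name, and the sample series are simplifying assumptions.

```python
def ridge_ar1_coefficient(series, lam):
    """Fit x[t] ~ a * x[t-1] by least squares with a Tikhonov penalty lam * a**2.

    Minimizing sum((x[t] - a * x[t-1])**2) + lam * a**2 gives the closed form
    a = sum(x[t-1] * x[t]) / (sum(x[t-1]**2) + lam).
    Assumes the denominator is nonzero.
    """
    past = series[:-1]
    present = series[1:]
    sxy = sum(p * q for p, q in zip(past, present))
    sxx = sum(p * p for p in past)
    return sxy / (sxx + lam)

# Hypothetical series that doubles at every step.
series = [1.0, 2.0, 4.0, 8.0]
unregularized = ridge_ar1_coefficient(series, lam=0.0)
regularized = ridge_ar1_coefficient(series, lam=21.0)
```

A larger `lam` pulls the coefficient toward zero, trading some fit on the observed samples for stability when few or noisy samples are available.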

New Parametric Affine Modeling and Control for Skid-to-Turn Missiles (STT(Skid-to-Turn)미사일의 매개변수화 어파인 모델링 및 제어)

  • Chwa, Dong-Kyoung;Park, Jin-Young;Kim, Jinho;Song, Chan-Ho
    • Journal of Institute of Control, Robotics and Systems / v.6 no.8 / pp.727-731 / 2000
  • This paper presents a new practical autopilot design approach to acceleration control for tail-controlled STT (Skid-to-Turn) missiles. The approach is novel in that the proposed parametric affine missile model adopts acceleration as the controlled output and considers the couplings between the forces, as well as the moments, and the control fin deflections. The aerodynamic coefficients in the proposed model are expressed in closed form with fittable parameters over the whole operating range. The parameters are fitted from aerodynamic coefficient look-up tables by a function approximation technique that combines local parametric models through curve fitting using the corresponding influence functions. To employ the results of parametric affine modeling in autopilot controller design, we derived a parametric affine missile model and designed a feedback linearizing controller for the obtained model. Stability analysis of the overall closed-loop system is provided, considering the uncertainties arising from approximation errors. The validity of the proposed modeling and control approach is demonstrated through simulations for an STT missile.

A Study on Weight Estimation Model of Floating Offshore Structures using Enhanced Genetic Programming Method (개선된 유전적 프로그래밍 방법을 이용한 부유식 해양 구조물의 중량 추정 모델 연구)

  • Um, Tae-Sub;Roh, Myung-Il;Shin, Hyunkyoung
    • Journal of the Society of Naval Architects of Korea / v.52 no.1 / pp.1-7 / 2015
  • The weight of floating offshore structures such as FPSOs, TLPs, semi-submersibles, and floating offshore wind turbines, estimated at the preliminary design stage, is a direct measure of both construction cost and basic performance. Weight data for floating offshore structures such as FPSOs and TLPs were collected through literature investigation and internet search. In this study, a weight estimation model based on genetic programming is suggested for FPSOs. The model was established by fixing the independent variables based on these data. In addition, correlation analysis was performed to compensate for a weak point of genetic programming: it is apt to over-fit when the number of data points is small relative to the number of independent variables. That is, reducing the number of variables through analysis of the correlation between the independent variables can be expected to have the same effect as increasing the amount of weight data. The developed weight estimation model had an error rate within 2%.
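
A minimal sketch of the variable-reduction idea in the abstract above: compute pairwise Pearson correlations and keep only one variable from each highly correlated group. The 0.9 threshold, the variable names, and the values are hypothetical; the abstract does not report the criterion the authors used.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(variables, threshold=0.9):
    """Keep a variable only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical design variables for four structures.
variables = {
    "length": [100.0, 150.0, 200.0, 250.0],
    "breadth": [20.0, 31.0, 39.0, 50.0],  # nearly proportional to length
    "depth": [30.0, 10.0, 40.0, 20.0],    # uncorrelated with length
}
kept = drop_correlated(variables)
```

With fewer independent variables, the same number of weight records constrains the fitted model more tightly.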

Boosting the Performance of the Predictive Model on the Imbalanced Dataset Using SVM Based Bagging and Out-of-Distribution Detection (SVM 기반 Bagging과 OoD 탐색을 활용한 제조공정의 불균형 Dataset에 대한 예측모델의 성능향상)

  • Kim, Jong Hoon;Oh, Hayoung
    • KIPS Transactions on Software and Data Engineering / v.11 no.11 / pp.455-464 / 2022
  • Datasets from a manufacturing process have two distinctive characteristics: severe class imbalance and many Out-of-Distribution samples. Strategies such as over-sampling the minority class and down-sampling the majority class are well known for handling class imbalance, and SMOTE has recently been chosen to address the issue. Out-of-Distribution samples, however, have been studied only with neural networks. Out-of-Distribution detection has rarely been applied to predictive models that use conventional machine learning algorithms such as SVM, Random Forest and KNN. It is known that conventional machine learning algorithms are much better than neural networks in prediction performance, because neural networks are vulnerable to over-fitting and require much bigger datasets than conventional machine learning algorithms do. We therefore suggest a new approach that uses Out-of-Distribution detection based on the SVM algorithm. In addition, the bagging technique is adopted to improve the precision of the model.

Survival Prediction of Rats with Hemorrhagic Shocks Using Support Vector Machine (지원벡터기계를 이용한 출혈을 일으킨 흰쥐에서의 생존 예측)

  • Jang, K.H.;Choi, J.L.;Yoo, T.K.;Kwon, M.K.;Kim, D.W.
    • Journal of Biomedical Engineering Research / v.33 no.1 / pp.1-7 / 2012
  • Hemorrhagic shock is a common cause of death in emergency rooms. Early diagnosis of hemorrhagic shock makes it possible for physicians to treat patients successfully. Therefore, the purpose of this study was to select an optimal survival prediction model using physiological parameters for the two analyzed periods: two and five minutes before and after the bleeding end. We obtained heart rates, mean arterial pressures, respiration rates and temperatures from 45 rats. These physiological parameters were used as the training and testing data sets for survival prediction models based on an artificial neural network (ANN) and a support vector machine (SVM). We applied 5-fold cross-validation to avoid over-fitting and to select the optimal survival prediction model. In conclusion, the SVM model showed slightly better accuracy than the ANN model for survival prediction during the entire analysis period.
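
A minimal sketch of how 5-fold cross-validation partitions a sample of 45 subjects, the sample size reported in the abstract above. The helper name, the shuffling seed, and the round-robin fold assignment are assumptions; the abstract does not describe how the folds were formed.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]   # k disjoint, near-equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# 45 subjects split into 5 folds of 9; each fold serves once as the test set.
splits = list(k_fold_indices(45, k=5))
```

Each candidate model is trained on 36 subjects and scored on the remaining 9, and averaging the five scores gives an accuracy estimate that does not reward memorizing the training data.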

Development of the Algorithm for Optimizing Wavelength Selection in Multiple Linear Regression

  • Hoeil Chung
    • Near Infrared Analysis / v.1 no.1 / pp.1-7 / 2000
  • A convenient algorithm for optimizing wavelength selection in multiple linear regression (MLR) has been developed. MOP (MLR Optimization Program) was developed to test all possible MLR calibration models in a given spectral range and to find an optimal MLR model with external validation capability. MOP generates calibration models from all possible combinations of wavelengths and simultaneously calculates SEC (Standard Error of Calibration) and SEV (Standard Error of Validation) by predicting the samples in a validation data set. With SEC and SEV determined, it then calculates another parameter called SAD, the sum of SEC, SEV, and the absolute difference between SEC and SEV: SAD = SEC + SEV + Abs(SEC - SEV). SAD is a useful parameter for finding an optimal calibration model without over-fitting, because it simultaneously evaluates SEC, SEV, and the difference in error between calibration and validation. The calibration model with the smallest SAD value is chosen as the optimum because its calibration and validation errors are both minimal and similar in scale. To evaluate the capability of MOP, the determination of benzene content in unleaded gasoline was examined. MOP successfully found the optimal calibration model and showed better calibration and independent prediction performance than conventional MLR calibration.
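
The SAD criterion defined in the abstract above can be sketched in a few lines. The candidate model names and their SEC/SEV values are hypothetical and stand in for the calibration models that MOP would generate from wavelength combinations.

```python
def sad(sec, sev):
    """SAD = SEC + SEV + |SEC - SEV|, as defined in the abstract."""
    return sec + sev + abs(sec - sev)

# Hypothetical (SEC, SEV) pairs for three candidate calibration models.
candidates = {
    "model_a": (0.10, 0.30),  # low calibration error, poor validation: over-fitted
    "model_b": (0.18, 0.20),  # errors small and similar in scale
    "model_c": (0.15, 0.28),
}

# Choose the model with the smallest SAD.
best = min(candidates, key=lambda name: sad(*candidates[name]))
```

Algebraically, SEC + SEV + |SEC - SEV| equals twice the larger of the two errors, so minimizing SAD penalizes a model whose validation error is much larger than its calibration error, which is the signature of over-fitting.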