• Title/Summary/Keyword: Training Data Sets

Discrimination between earthquake and explosion by using seismic spectral characteristics and linear discriminant analysis (지진파 스펙트럼특성과 선형판별분석을 이용한 자연지진과 인공지진 식별)

  • 제일영;전정수;이희일
    • Proceedings of the Earthquake Engineering Society of Korea Conference / 2003.09a / pp.13-19 / 2003
  • A discrimination method based on seismic signals was studied for identifying surface explosions. Multivariate discriminant analysis was performed using seismic spectral characteristics. Four single discriminants based on seismic source theory (Pg/Lg, Lg1/Lg2, Pg1/Pg2, and Rg/Lg) were applied to explosion and earthquake training data sets. The Pg/Lg discriminant was the most effective of the four, but it could not perfectly separate the samples in the training data sets. In this study, a compound linear discriminant analysis was defined using the common characteristics of the training data sets for the single discriminants, with the single discriminants serving as independent variables. With this analysis, all samples in the training data sets were correctly discriminated, and the probability of misclassification was lowered to 0.7%.
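
The idea of combining single discriminants in one linear discriminant can be sketched as follows. This is a minimal two-class Fisher discriminant, not the paper's compound formulation: it uses only two features for brevity, the feature values are made up for illustration, and the names `fit_lda` and `classify` are hypothetical.

```python
def mean_vec(rows):
    """Column means of a list of 2-feature samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mean):
    """2x2 within-class scatter matrix for one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_lda(class_a, class_b):
    """Fisher discriminant: w = Sw^-1 (mean_a - mean_b), threshold at the midpoint."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def classify(w, threshold, x):
    """Samples scoring above the threshold fall on the class_a (explosion) side."""
    score = w[0] * x[0] + w[1] * x[1]
    return "explosion" if score > threshold else "earthquake"

# Made-up values standing in for two single discriminants per event
# (for example, log Pg/Lg and log Rg/Lg); they are NOT data from the paper.
explosions = [(0.8, 0.5), (1.0, 0.7), (0.9, 0.4), (1.1, 0.6)]
earthquakes = [(0.2, 0.1), (0.3, 0.3), (0.1, 0.2), (0.4, 0.0)]

w, threshold = fit_lda(explosions, earthquakes)
labels = [classify(w, threshold, x) for x in explosions + earthquakes]
```

Each single discriminant becomes one input variable, and the fitted weights decide how much each one contributes to the combined score.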

Video augmentation technique for human action recognition using genetic algorithm

  • Nida, Nudrat;Yousaf, Muhammad Haroon;Irtaza, Aun;Velastin, Sergio A.
    • ETRI Journal / v.44 no.2 / pp.327-338 / 2022
  • Classification models for human action recognition require robust features and large training sets for good generalization. Data augmentation methods are therefore employed on imbalanced training sets to achieve higher accuracy. However, samples generated by data augmentation only reflect existing samples within the training set, so their feature representations are less diverse and contribute to less precise classification. This paper presents new data augmentation and action representation approaches to grow training sets. The proposed approach is based on two fundamental concepts: virtual video generation for augmentation and representation of the action videos through robust features. Virtual videos are generated from the motion history templates of action videos, which are passed through a convolutional neural network to generate deep features. Furthermore, guided by the objective function of a genetic algorithm, the spatiotemporal features of different samples are combined to generate the representations of the virtual videos, which are then classified by an extreme learning machine classifier on the MuHAVi-Uncut, iXMAS, and IAVID-1 datasets.

Empirical modeling of flexural and splitting tensile strengths of concrete containing fly ash by GEP

  • Saridemir, Mustafa
    • Computers and Concrete / v.17 no.4 / pp.489-498 / 2016
  • In this paper, the flexural strength ($f_{fs}$) and splitting tensile strength ($f_{sts}$) of concrete containing different proportions of fly ash are modeled using gene expression programming (GEP). Two GEP models, GEP-I and GEP-II, are constructed to predict the $f_{fs}$ and $f_{sts}$ values, respectively. In these models, the age of specimen, cement, water, sand, aggregate, superplasticizer and fly ash are used as independent input parameters. The GEP-I model is built from 292 experimental data, divided into 170, 86 and 36 data for the training, testing and validating sets, respectively. Similarly, the GEP-II model is built from 278 experimental data, divided into 142, 70 and 66 data for the training, testing and validating sets, respectively. The experimental data in the validating sets are independent of the training and testing sets. The statistical parameters obtained from the models indicate that the proposed empirical models have good prediction and generalization capability.
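
The three-way partition reported for GEP-I can be sketched as below. The abstract does not say how records were assigned to each set, so a seeded random shuffle is assumed here; `three_way_split` and the seed are hypothetical, and integers stand in for the experimental records.

```python
import random

def three_way_split(records, n_train, n_test, n_valid, seed=0):
    """Shuffle records and cut them into training, testing and validating sets."""
    records = list(records)
    if n_train + n_test + n_valid != len(records):
        raise ValueError("split sizes must add up to the number of records")
    random.Random(seed).shuffle(records)
    train = records[:n_train]
    test = records[n_train:n_train + n_test]
    valid = records[n_train + n_test:]
    return train, test, valid

# GEP-I sizes from the abstract: 292 records -> 170 / 86 / 36.
train, test, valid = three_way_split(range(292), 170, 86, 36)
```

Calling the same function with `range(278), 142, 70, 66` gives the GEP-II sizes. Keeping the validating slice disjoint from the other two is what lets it serve as an independent check.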

An Improved Deep Learning Method for Animal Images (동물 이미지를 위한 향상된 딥러닝 학습)

  • Wang, Guangxing;Shin, Seong-Yoon;Shin, Kwang-Weong;Lee, Hyun-Chang
    • Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.123-124 / 2019
  • This paper proposes an improved deep learning method based on small data sets for animal image classification. First, we use a CNN to build a training model for small data sets and use data augmentation to expand the samples of the training set. Second, using a network pre-trained on large-scale datasets, such as VGG16, the bottleneck features of the small dataset are extracted and stored in two NumPy files as the new training and test datasets. Finally, a fully connected network is trained on the new datasets. We use the well-known Kaggle Dogs vs Cats dataset, a two-category classification dataset, as the experimental dataset.

Development of kNN QSAR Models for 3-Arylisoquinoline Antitumor Agents

  • Tropsha, Alexander;Golbraikh, Alexander;Cho, Won-Jea
    • Bulletin of the Korean Chemical Society / v.32 no.7 / pp.2397-2404 / 2011
  • A variable selection k nearest neighbor QSAR modeling approach was applied to a data set of 80 3-arylisoquinolines exhibiting cytotoxicity against a human lung tumor cell line (A-549). All compounds were characterized with molecular topology descriptors calculated with the MolconnZ program. Seven compounds were randomly selected from the original dataset and used as an external validation set. The remaining subset of 73 compounds was divided into multiple training (56 to 61 compounds) and test (17 to 12 compounds) sets using a chemical diversity sampling method developed in this group. Highly predictive models, characterized by leave-one-out cross-validated $R^2$ ($q^2$) values greater than 0.8 for the training sets and $R^2$ values greater than 0.7 for the test sets, were obtained. The robustness of the models was confirmed by the Y-randomization test: all models built using training sets with randomly shuffled activities were characterized by low $q^2 \leq 0.26$ and $R^2 \leq 0.22$ for training and test sets, respectively. The twelve best models (with the highest values of both $q^2$ and $R^2$) predicted the activities of the external validation set of seven compounds with $R^2$ ranging from 0.71 to 0.93.
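
A leave-one-out $q^2$ and a Y-randomization check can be sketched as below. This assumes the common definition $q^2 = 1 - \mathrm{PRESS}/\mathrm{SS}_{total}$ and a plain unweighted kNN average; the paper's variable selection, descriptors and exact kNN weighting are not reproduced. The function names and the toy data are hypothetical.

```python
import random

def knn_predict(train_x, train_y, query, k):
    """Average the activities of the k training points nearest to query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

def loo_q2(xs, ys, k=2):
    """Leave-one-out q^2 = 1 - PRESS / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Hold out sample i and predict it from all remaining samples.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        pred = knn_predict(rest_x, rest_y, xs[i], k)
        press += (ys[i] - pred) ** 2
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / total

# Toy one-descriptor data set, made up for illustration only.
xs = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
q2 = loo_q2(xs, ys, k=2)

# Y-randomization: shuffle the activities and recompute q^2.
shuffled = list(ys)
random.Random(0).shuffle(shuffled)
q2_shuffled = loo_q2(xs, shuffled, k=2)
```

A model that captures real structure should score a markedly higher `q2` than `q2_shuffled`; comparable values would suggest a chance correlation.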

Comparison of Machine Learning-Based Radioisotope Identifiers for Plastic Scintillation Detector

  • Jeon, Byoungil;Kim, Jongyul;Yu, Yonggyun;Moon, Myungkook
    • Journal of Radiation Protection and Research / v.46 no.4 / pp.204-212 / 2021
  • Background: Identification of radioisotopes for plastic scintillation detectors is challenging because their spectra have poor energy resolution and lack photopeaks. To overcome this weakness, many researchers have conducted radioisotope identification studies using machine learning algorithms; however, the effect of data normalization on radioisotope identification has not yet been addressed. Furthermore, studies on machine learning-based radioisotope identifiers for plastic scintillation detectors are limited. Materials and Methods: In this study, machine learning-based radioisotope identifiers were implemented, and their performance under different data normalization methods was compared. Eight classes of radioisotopes, consisting of combinations of $^{22}$Na, $^{60}$Co, and $^{137}$Cs, and the background, were defined. The training set was generated by random sampling based on probability density functions acquired from experiments and simulations, and the test set was acquired from experiments. A support vector machine (SVM), an artificial neural network (ANN), and a convolutional neural network (CNN) were implemented as radioisotope identifiers with six data normalization methods and trained using the generated training set. Results and Discussion: The implemented identifiers were evaluated on test sets acquired from experiments with and without gain shifts to confirm their robustness against the gain shift effect. Among the three machine learning-based radioisotope identifiers, prediction accuracy followed the order SVM > ANN > CNN, and training time was ranked in the same order, with the SVM the shortest. Conclusion: The SVM exhibited the highest prediction accuracy for the combined test sets, and its training time was the shortest among the three identifiers. The CNN exhibited the smallest variation in prediction accuracy across classes, even though it had the lowest prediction accuracy for the combined test sets.

Modelling the flexural strength of mortars containing different mineral admixtures via GEP and RA

  • Saridemir, Mustafa
    • Computers and Concrete / v.19 no.6 / pp.717-724 / 2017
  • In this paper, four formulas are proposed via gene expression programming (GEP)-based models and regression analysis (RA) to predict the flexural strength ($f_s$) of mortars containing different mineral admixtures, namely ground granulated blast-furnace slag (GGBFS), silica fume (SF) and fly ash (FA), at different ages. Three formulas, obtained from the GEP-I, GEP-II and GEP-III models, predict $f_s$ from the age of specimen, water-binder ratio and compressive strength. One further formula, obtained from the RA, predicts $f_s$ from the compressive strength. To derive these formulas, 972 data from experimental studies on mortar mixtures were gathered from the literature. Of these, 734 data were divided, without pre-planning, into the training and testing sets of the GEP and RA models. The formulas were then validated with the 238 data not used in the training and testing sets. The $f_s$ results obtained from the training, testing and validation sets are compared with the experimental results and with formulas given in the literature for concrete. These comparisons show that the results of the formulas obtained from the GEP and RA models are well compatible with the experimental results and are very credible relative to the results of other formulas.

Training for Huge Data set with On Line Pruning Regression by LS-SVM

  • Kim, Dae-Hak;Shim, Joo-Yong;Oh, Kwang-Sik
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.137-141 / 2003
  • LS-SVM (least squares support vector machine) is a widely applicable and useful machine learning technique for classification and regression analysis. LS-SVM can be a good substitute for statistical methods, but computational difficulties remain in inverting the matrix for a huge data set. In the modern information society, huge data sets are easily obtained in online or batch mode. For these kinds of huge data sets, we suggest an online pruning regression method based on LS-SVM. With a relatively small number of pruned support vectors, we can obtain almost the same performance as regression with the full data set.

Forecasting Water Levels Of Bocheong River Using Neural Network Model

  • Kim, Ji-tae;Koh, Won-joon;Cho, Won-cheol
    • Water Engineering Research / v.1 no.2 / pp.129-136 / 2000
  • Predicting water levels is a difficult task because it involves many uncertainties. A neural network, which is well suited to such problems, is therefore introduced. One-day-ahead forecasting of the river stage in the Bocheong River is carried out using the neural network model. Historical water levels at the Snagye gauging point, located downstream on the Bocheong River, and average rainfall over the Bocheong River basin are selected as training data sets. With these data sets, the network is trained using the back-propagation algorithm. Water levels in 1997 and 1998 are then predicted with the trained network. To improve accuracy, a filtering method is introduced into the prediction scheme. The predicted results are shown to be in good agreement with observed water levels, and the filtering method can overcome the lack of training patterns.

Classification of Remote Sensing Data using Random Selection of Training Data and Multiple Classifiers (훈련 자료의 임의 선택과 다중 분류자를 이용한 원격탐사 자료의 분류)

  • Park, No-Wook;Yoo, Hee Young;Kim, Yihyun;Hong, Suk-Young
    • Korean Journal of Remote Sensing / v.28 no.5 / pp.489-499 / 2012
  • In this paper, a classifier ensemble framework for remote sensing data classification is presented that combines classification results generated from both different training sets and different classifiers. The core idea of the framework is to increase the diversity among classification results, by varying both the training sets and the classifiers, in order to improve classification accuracy. First, training sets with different sampling densities are generated and used as inputs for supervised classification with classifiers that show different discrimination capabilities. Several preliminary classification results are then combined via a majority voting scheme to generate a final classification result. A case study of land-cover classification using multi-temporal ENVISAT ASAR data sets is carried out to illustrate the potential of the framework. In the case study, nine classification results were combined, generated from three different training sets and three different classifiers: a maximum likelihood classifier, a multi-layer perceptron classifier, and a support vector machine. The results showed that complementary information on the discrimination of the land-cover classes of interest could be extracted within the proposed framework, and the best classification accuracy was obtained. In a comparison of different combinations, combining classification results from classifiers with little diversity did not improve classification accuracy. It is therefore recommended to ensure greater diversity between classifiers when designing multiple classifier systems.
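
The majority voting step can be sketched as below. This is a minimal illustration that treats each preliminary classification result as a flat list of per-pixel labels; the tie-breaking rule, the function name `majority_vote` and the class names are assumptions, since the abstract does not specify them.

```python
from collections import Counter

def majority_vote(label_maps):
    """Combine per-pixel labels from several classification results by majority vote."""
    n_pixels = len(label_maps[0])
    if any(len(m) != n_pixels for m in label_maps):
        raise ValueError("all classification results must cover the same pixels")
    combined = []
    for votes in zip(*label_maps):
        # most_common(1) returns the most frequent label; on a tie it returns
        # the label that appeared first among the votes (Python 3.7+).
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Three hypothetical preliminary results for four pixels (made-up labels).
result_a = ["rice", "water", "forest", "rice"]
result_b = ["rice", "water", "urban", "forest"]
result_c = ["water", "water", "forest", "forest"]

final = majority_vote([result_a, result_b, result_c])
```

With nine inputs, as in the case study, the same function applies unchanged. Voting only helps where the inputs disagree in different places, which is why diversity between classifiers matters.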