• Title/Summary/Keyword: Data Generalization

A Study on the Development of DGA based on Deep Learning (Deep Learning 기반의 DGA 개발에 대한 연구)

  • Park, Jae-Gyun;Choi, Eun-Soo;Kim, Byung-June;Zhang, Pan
    • Korean Journal of Artificial Intelligence / v.5 no.1 / pp.18-28 / 2017
  • Many companies now use systems based on artificial intelligence. The accuracy of such systems depends on the amount of training data and on the choice of an appropriate algorithm. However, training data with a large number of entities is not easy to obtain, and small data sets suffer from large generalization errors due to overfitting. To minimize this generalization error, this study proposes DGA, which combines a machine-learning-based genetic algorithm with deep-learning-based dropout so that relatively high accuracy can be expected even when only a small data set is available. The idea of the paper is to determine the active state of the nodes: a new fitness function is defined using the gradient of the loss function. The proposed DGA algorithm compensates for the stochastic inconsistency of dropout. It also addresses the problems of fitness-function complexity and limited model expression range in genetic algorithms. In experiments on MNIST data, the proposed algorithm achieved an accuracy of 75.3%, whereas dropout alone achieved 41.4%, showing that DGA outperforms dropout alone.
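For context on the baseline this abstract compares against, here is a minimal sketch of standard (inverted) dropout, where each node's active state is drawn at random. It is not the DGA method: the abstract states that DGA chooses the active states with a genetic algorithm and a gradient-based fitness function, and gives no further details. The function name `dropout` and its parameters are hypothetical.

```python
import random

def dropout(activations, drop_prob, rng):
    """Standard inverted dropout (baseline only, not DGA).

    Each activation is kept with probability 1 - drop_prob and the
    survivors are rescaled so the expected value is unchanged.
    Returns the masked activations and the 0/1 mask of active nodes.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    # 1 = node active, 0 = node dropped; drawn independently at random.
    mask = [1 if rng.random() < keep_prob else 0 for _ in activations]
    out = [a * m / keep_prob for a, m in zip(activations, mask)]
    return out, mask

rng = random.Random(0)
acts = [0.5, 1.0, 1.5, 2.0]
out, mask = dropout(acts, 0.5, rng)
```

The random `mask` is the part a method like DGA would replace with a searched configuration of active nodes.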

Randomized Bagging for Bankruptcy Prediction (랜덤화 배깅을 이용한 재무 부실화 예측)

  • Min, Sung-Hwan
    • Journal of Information Technology Services / v.15 no.1 / pp.153-166 / 2016
  • Ensemble classification is an approach that combines individually trained classifiers in order to improve prediction accuracy over individual classifiers. Ensemble techniques have been shown to be very effective in improving the generalization ability of a classifier. However, the base classifiers need to be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Bagging is one of the most popular ensemble methods. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on these bootstrap samples. In this study, we propose a new bagging variant, Randomized Bagging (RBagging), to improve on the standard bagging ensemble model. The proposed model was applied to the bankruptcy prediction problem using a real data set, and the results were compared with those of other models. The experimental results showed that the proposed model outperformed the standard bagging model.
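The standard bagging procedure described in this abstract (bootstrap samples drawn with replacement, one base classifier per sample, majority vote) can be sketched as follows. This illustrates plain bagging only; the abstract does not describe how RBagging randomizes it. The toy one-feature stump classifier and all names here are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def train_stump(sample):
    """Toy base classifier: threshold halfway between the two class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # Degenerate bootstrap sample with a single class: predict it always.
        constant = 1 if xs1 else 0
        return lambda x: constant
    mean0, mean1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    threshold = (mean0 + mean1) / 2.0
    high = 1 if mean1 > mean0 else 0  # class found above the threshold
    return lambda x: high if x > threshold else 1 - high

def bagging_fit(data, n_models, rng):
    """Train one base classifier per bootstrap sample."""
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Majority vote over the base classifiers."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data: (feature, label) pairs.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(data, n_models=11, rng=random.Random(0))
```

Because each stump sees a different bootstrap sample, the thresholds differ slightly, which is the source of the diversity the abstract refers to.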

On the Exponentiated Generalized Modified Weibull Distribution

  • Aryal, Gokarna;Elbatal, Ibrahim
    • Communications for Statistical Applications and Methods / v.22 no.4 / pp.333-348 / 2015
  • In this paper, we study a generalization of the modified Weibull distribution. The generalization follows the recent work of Cordeiro et al. (2013) and is based on a class of exponentiated generalized distributions that can be interpreted as a double construction of Lehmann. We introduce a class of exponentiated generalized modified Weibull (EGMW) distribution and provide a list of some well-known distributions embedded within the proposed distribution. We derive some mathematical properties of this class that include ordinary moments, generating function and order statistics. We propose a maximum likelihood method to estimate model parameters and provide simulation results to assess the model performance. Real data is used to illustrate the usefulness of the proposed distribution for modeling reliability data.
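As a reminder of the construction this abstract refers to, the exponentiated generalized class built on a baseline distribution function $G(x)$ with density $g(x)$ is usually written as below. The symbols $\alpha$ and $\beta$ for the two extra shape parameters are a notational assumption made here, and the abstract does not state which parameterization of the modified Weibull baseline $G$ the paper uses.

```latex
F(x) = \left[ 1 - \{1 - G(x)\}^{\alpha} \right]^{\beta},
\qquad
f(x) = \alpha \beta \, g(x) \, \{1 - G(x)\}^{\alpha - 1}
       \left[ 1 - \{1 - G(x)\}^{\alpha} \right]^{\beta - 1},
\qquad \alpha, \beta > 0 .
```

Setting $\beta = 1$ or $\alpha = 1$ recovers the two Lehmann-type exponentiated families, which is why the class can be read as a double construction of Lehmann.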

Prediction of Remaining Useful Life of Lithium-ion Battery based on Multi-kernel Support Vector Machine with Particle Swarm Optimization

  • Gao, Dong;Huang, Miaohua
    • Journal of Power Electronics / v.17 no.5 / pp.1288-1297 / 2017
  • Estimating the remaining useful life (RUL) of lithium-ion (Li-ion) batteries is important for intelligent battery management systems (BMS). Data mining technology is becoming increasingly mature, and with the arrival of the era of big data, RUL estimation of Li-ion batteries based on data-driven prognostics is becoming more accurate. However, the support vector machine (SVM) applied to predict the RUL of Li-ion batteries uses the traditional single radial basis kernel function. This type of classifier has weak generalization ability and is prone to the problem of data migration, which results in inaccurate prediction of the RUL of Li-ion batteries. In this study, a novel multi-kernel SVM (MSVM) based on a polynomial kernel and a radial basis kernel function is proposed. Moreover, the particle swarm optimization algorithm is used to search for the kernel parameters, penalty factor, and weight coefficient of the MSVM model. Finally, this paper uses the NASA battery dataset to form the observed data sequence for regression prediction. Results show that the improved algorithm not only has better prediction accuracy and stronger generalization ability but also decreases training time and computational complexity.
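One common way to build a multi-kernel function from a polynomial kernel and a radial basis kernel is a weighted sum, sketched below. The convex-combination form, the function names, and the default parameter values are assumptions for illustration; the abstract only says that the kernel parameters and a weight coefficient are searched by particle swarm optimization.

```python
import math

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def multi_kernel(x, z, weight=0.5, degree=2, coef0=1.0, gamma=0.5):
    """Assumed form: weight * polynomial + (1 - weight) * RBF."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return (weight * poly_kernel(x, z, degree, coef0)
            + (1.0 - weight) * rbf_kernel(x, z, gamma))

# Example evaluation on two small feature vectors.
k = multi_kernel([1.0, 0.0], [1.0, 0.0], weight=0.5)
```

A convex combination of two valid kernels is itself a valid kernel, so in this form `weight`, `degree`, `coef0`, and `gamma` are the kind of quantities an optimizer such as PSO would tune together with the SVM penalty factor.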

The Selection Methodology of Road Network Data for Generalization of Digital Topographic Map (수치지형도 일반화를 위한 도로 네트워크 데이터의 선택 기법 연구)

  • Park, Woo Jin;Lee, Young Min;Yu, Ki Yun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.31 no.3 / pp.229-238 / 2013
  • Developing methodologies that generate small-scale maps from large-scale maps through map generalization is of great importance for managing digital topographic maps, including producing and updating them. In this study, a selection methodology for generalizing the road network data in a digital topographic map is investigated and evaluated. Existing maps at 1:5,000 and 1:25,000 scales are compared, and the criteria for selecting road network data, namely the number of objects and the relative importance of the road network, are analyzed using Töpfer's radical law and a Logit model. The selection model derived from the analysis is applied to test data, and road network data at 1:18,000 and 1:72,000 scales are generated from the 1:5,000 digital topographic map. Qualitative and quantitative evaluations showed that road objects with relatively high importance were selected appropriately for each target scale level.
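In its basic form, Töpfer's radical law estimates how many objects to keep on a derived map as n_target = n_source * sqrt(M_source / M_target), where M is the scale denominator. The sketch below applies that basic form to the scales named in the abstract. The function name and the object count of 1,000 are hypothetical, and the paper may use a variant of the law with additional constants.

```python
import math

def radical_law_count(n_source, source_scale_denominator, target_scale_denominator):
    """Basic Töpfer radical law: objects to retain at the target scale."""
    ratio = source_scale_denominator / target_scale_denominator
    return n_source * math.sqrt(ratio)

# Hypothetical example: 1,000 road objects on a 1:5,000 source map.
keep_at_18000 = radical_law_count(1000, 5000, 18000)   # roughly 527 objects
keep_at_72000 = radical_law_count(1000, 5000, 72000)   # roughly 264 objects
```

The law only gives the number of objects to keep. Deciding which objects to keep is the role of an importance measure, which in this abstract comes from the Logit model.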

Modified n-Level Skip-Lot Sampling Inspection Plans

  • Cho, Gyo-Young
    • Journal of the Korean Data and Information Science Society / v.19 no.3 / pp.811-818 / 2008
  • This paper generalizes the modified two-level skip-lot sampling plan (MTSkSP2) to n levels. General formulas for the operating characteristic (OC) function, average sample number (ASN), and average outgoing quality (AOQ) of the plan are derived using Markov chain properties.

Properties of Extended Gamma Distribution

  • Lee, In-Suk;Kim, Sang-Moon
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.753-758 / 2004
  • A generalization of the gamma distribution is defined by slightly modifying the form of Kobayashi's generalized gamma function (1991). We define a new extended gamma distribution and study some of its properties.

The Study on Simplification in Digital Map Generalization (수치지도 일반화에 있어서 단순화에 관한 연구)

  • 최병길
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.19 no.2 / pp.199-208 / 2001
  • Digital maps in Korea have been produced and used independently at separate scales such as 1:1,000, 1:5,000, and 1:25,000. Therefore, whenever spatial data at another scale is needed, the digital maps have to be produced all over again, which is time-consuming and uneconomical. To solve these problems, much research has been carried out on map generalization, which derives small-scale digital maps from large-scale master data. This paper aims to analyze the characteristics of converting large-scale data to small scale through the simplification step of map generalization. For this purpose, an algorithm for the simplification process of digital maps is proposed, and its simplification characteristics are investigated through an experiment converting 1:5,000 scale data to 1:25,000 scale. The results show that the Area-Preservation algorithm agrees well with the original data in terms of the area and features of the building layer, compared with the Douglas-Peucker and Reumann-Witkam algorithms.
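Of the simplification algorithms compared in this abstract, Douglas-Peucker is the most widely known, and a minimal sketch of it follows. It keeps the end points of a line, finds the intermediate vertex farthest from the chord joining them, and recurses on both halves if that distance exceeds a tolerance. This variant measures distance to the chord segment rather than to the infinite line, and the function names and sample coordinates are hypothetical. It is not the Area-Preservation algorithm the paper favors.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices farther than tolerance from the chord."""
    if len(points) < 3:
        return list(points)
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        # Every intermediate vertex is close to the chord: drop them all.
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # the split vertex appears once

line = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(line, tolerance=0.1)
# simplified == [(0, 0), (2, 0), (3, 2), (4, 0)]
```

Because the criterion is purely distance-based, the algorithm does not try to preserve the area of closed shapes such as building outlines, which is the property the abstract evaluates.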

The Design of Optimal Fuzzy-Neural networks Structure by Means of GA and an Aggregate Weighted Performance Index (유전자 알고리즘과 합성 성능지수에 의한 최적 퍼지-뉴럴 네트워크 구조의 설계)

  • Oh, Sung-Kwun;Yoon, Ki-Chan;Kim, Hyun-Ki
    • Journal of Institute of Control, Robotics and Systems / v.6 no.3 / pp.273-283 / 2000
  • In this paper, we suggest an optimal design method for Fuzzy-Neural Network (FNN) models of complex and nonlinear systems. The FNNs use simplified inference as the fuzzy inference method and the Error Back Propagation algorithm as the learning rule. We use the HCM (Hard C-Means) clustering algorithm to find the initial parameters of the membership functions. An approach covering parameters such as the membership-function parameters, learning rates, and momentum, together with a weighted value, is proposed to achieve a sound balance between the approximation and generalization abilities of the model. By selecting and adjusting a weighting factor of an aggregate objective function, which depends on the number of data and a certain degree of nonlinearity (distribution of I/O data), we show that it is feasible and effective to design an optimal FNN model structure with a mutual balance and dependency between approximation and generalization abilities. This methodology sheds light on the role and impact of the different model parameters on performance (especially the mapping and predicting capabilities of rule-based computing). To evaluate the performance of the proposed model, we use time series data for a gas furnace, data from a sewage treatment process, and data from a traffic route choice process.

Area-wise relational knowledge distillation

  • Sungchul Cho;Sangje Park;Changwon Lim
    • Communications for Statistical Applications and Methods / v.30 no.5 / pp.501-516 / 2023
  • Knowledge distillation (KD) refers to extracting knowledge from a large and complex model (teacher) and transferring it to a relatively small model (student). This can be done by training the teacher model to obtain the activation function values of the hidden or output layers and then training the student model on the same training data together with the obtained values. Recently, relational KD (RKD) has been proposed to extract knowledge about relative differences in the training data. This method improved the performance of the student model compared with conventional KD. In this paper, we propose a new method for RKD by introducing a new loss function. The proposed loss function is defined using the area difference between the teacher model and the student model in a specific hidden layer, and it is shown that the model can be successfully compressed and that its generalization performance can be improved. In the model compression study on audio data, the accuracy of the model using the proposed method is up to 1.8% higher than that of the existing method. In the model generalization study, introducing the RKD method into self-KD on image data improves accuracy by up to 0.5%.
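As an illustration of the conventional output-layer KD that this abstract takes as its starting point, the sketch below computes a soft-target loss: the KL divergence between temperature-softened teacher and student output distributions. It is not the area-wise relational loss the paper proposes, whose definition the abstract does not give. The function names and the temperature value are assumptions, and the customary temperature-squared scaling factor is omitted for brevity.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's softened distribution exactly and grows as the two diverge. Relational variants instead compare structure across several training examples rather than individual outputs.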