• Title/Summary/Keyword: sampling and classification


Supervised Classification Using Training Parameters and Prior Probability Generated from VITD - The Case of QuickBird Multispectral Imagery

  • Eo, Yang-Dam;Lee, Gyeong-Wook;Park, Doo-Youl;Park, Wang-Yong;Lee, Chang-No
    • Korean Journal of Remote Sensing
    • /
    • v.24 no.5
    • /
    • pp.517-524
    • /
    • 2008
  • To classify satellite imagery into geospatial features of interest, a supervised classifier must be trained to distinguish these features through training sampling. However, even for the same imagery, classification results can differ according to the operator's experience and expertise in the training process. Users who apply a classification result in practice need research that delivers consistent results as well as improved accuracy. The experiment compares classification results for training processes that use VITD polygons as prior probabilities and training parameters, instead of manual sampling. The results show that classification using VITD polygons as prior probabilities achieves the highest accuracy among the methods compared. Training by unsupervised classification with VITD produced classification results similar to those of manual training and/or training with prior probabilities.
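
How a prior probability can change a per-pixel decision may be easier to see in a small sketch. The abstract does not name the classifier, so the example below assumes a single-band Gaussian maximum-likelihood rule with a log-prior term added. The class statistics, the prior values, and the function names are all hypothetical.

```python
import math

def gaussian_log_likelihood(x, mean, std):
    """Log density of a 1-D Gaussian evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def classify(x, class_stats, priors):
    """Return the class that maximizes log-likelihood plus log-prior."""
    scores = {
        name: gaussian_log_likelihood(x, mean, std) + math.log(priors[name])
        for name, (mean, std) in class_stats.items()
    }
    return max(scores, key=scores.get)

# Hypothetical single-band statistics (mean, std) for two classes.
class_stats = {"water": (40.0, 10.0), "forest": (60.0, 10.0)}

# Equal priors reduce the rule to plain maximum likelihood.
equal_priors = {"water": 0.5, "forest": 0.5}
# Hypothetical priors, e.g. area fractions derived from reference polygons.
area_priors = {"water": 0.1, "forest": 0.9}

pixel = 48.0
label_equal = classify(pixel, class_stats, equal_priors)
label_area = classify(pixel, class_stats, area_priors)
print(label_equal, label_area)
```

With equal priors the pixel value 48 is closer to the water mean and is labeled water. With the area-based priors the same pixel is labeled forest, because the large forest prior outweighs the small likelihood difference.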

Log-polar Sampling based Voxel Classification for Pulmonary Nodule Detection in Lung CT scans (흉부 CT 영상에서 폐 결절 검출을 위한 Log-polar Sampling기반 Voxel Classification 방법)

  • Choi, Wook-Jin;Choi, Tae-Sun
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.6 no.1
    • /
    • pp.37-44
    • /
    • 2013
  • In this paper, we propose a pulmonary nodule detection system based on voxel classification. The proposed system consists of three main steps. In the first step, we segment the lung volume. In the second step, the lung structures are initially segmented. In the last step, we classify the nodules using voxel classification. To describe the characteristics of each voxel, we extract log-polar sampling based features. A Support Vector Machine is applied to the extracted features to classify voxels into nodules and non-nodules.
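
The idea of log-polar sampling can be sketched as follows. This is a 2-D simplification (the paper works on 3-D voxels), and the ring count, angle count, radii, and nearest-neighbour lookup are assumptions chosen for illustration, not the authors' settings.

```python
import math

def log_polar_offsets(n_rings, n_angles, r_min, r_max):
    """Return (dy, dx) offsets on rings whose radii grow geometrically."""
    offsets = []
    for i in range(n_rings):
        # Radii are evenly spaced in log(r), from r_min to r_max.
        t = i / (n_rings - 1) if n_rings > 1 else 0.0
        radius = r_min * (r_max / r_min) ** t
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            offsets.append((radius * math.sin(theta), radius * math.cos(theta)))
    return offsets

def log_polar_features(image, cy, cx, offsets, fill=0.0):
    """Sample the image at nearest-neighbour positions around (cy, cx)."""
    height, width = len(image), len(image[0])
    features = []
    for dy, dx in offsets:
        y, x = round(cy + dy), round(cx + dx)
        inside = 0 <= y < height and 0 <= x < width
        features.append(image[y][x] if inside else fill)
    return features

# Toy 9x9 image whose intensity equals the column index.
image = [[float(x) for x in range(9)] for _ in range(9)]
offsets = log_polar_offsets(n_rings=3, n_angles=4, r_min=1.0, r_max=4.0)
features = log_polar_features(image, cy=4, cx=4, offsets=offsets)
print(features)
```

The rings at radii 1, 2, and 4 sample densely near the center and sparsely farther out, which is the property that makes log-polar features compact. The resulting vector could then be passed to a classifier such as an SVM.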

Classification of Class-Imbalanced Data: Effect of Over-sampling and Under-sampling of Training Data (계급불균형자료의 분류: 훈련표본 구성방법에 따른 효과)

  • 김지현;정종빈
    • The Korean Journal of Applied Statistics
    • /
    • v.17 no.3
    • /
    • pp.445-457
    • /
    • 2004
  • Given class-imbalanced data in a two-class classification problem, practitioners often over-sample and/or under-sample the training data to balance it. We investigate the validity of this practice. We also study the effect of such sampling on boosting of classification trees. Through experiments on twelve real datasets, we observe that keeping the natural distribution of the training data is the best approach if one plans to apply boosting methods to class-imbalanced data.
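
For readers unfamiliar with the two resampling practices being evaluated, here is a minimal sketch of random over-sampling and random under-sampling. The helper names and the 90/10 toy dataset are hypothetical; the paper's own experimental setup is not described in the abstract.

```python
import random

def split_by_class(samples):
    """Group (features, label) pairs by label."""
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))
    return by_class

def random_oversample(samples, rng):
    """Duplicate samples at random until every class matches the largest one."""
    by_class = split_by_class(samples)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend(group + extra)
    return balanced

def random_undersample(samples, rng):
    """Discard samples at random until every class matches the smallest one."""
    by_class = split_by_class(samples)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    return balanced

rng = random.Random(0)
# Toy training set: 90 majority samples (label 0), 10 minority samples (label 1).
data = [((i,), 0) for i in range(90)] + [((i,), 1) for i in range(10)]
over = random_oversample(data, rng)
under = random_undersample(data, rng)
print(len(over), len(under))
```

Over-sampling yields 90 samples per class by repeating minority samples, while under-sampling yields 10 per class by discarding majority samples. Either way the training distribution no longer matches the natural one, which is the practice the paper questions for boosting.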

A Cost Effective Reference Data Sampling Algorithm Using Fractal Analysis

  • Lee, Byoung-Kil;Eo, Yang-Dam;Jeong, Jae-Joon;Kim, Yong-Il
    • ETRI Journal
    • /
    • v.23 no.3
    • /
    • pp.129-137
    • /
    • 2001
  • Random or systematic sampling is commonly used to assess the accuracy of classification results. In remote sensing, these sampling methods require much time and tedious work to acquire sufficient ground truth data. A more effective sampling method that can represent the characteristics of the population is therefore required. In this study, fractal analysis is adopted as an index for reference sampling. The fractal dimensions of the whole study area and of its sub-regions are calculated to select the sub-regions whose dimensionality is most similar to that of the whole area. The whole area's classification accuracy is then compared with those of the sub-regions, and it is verified that the accuracies of the selected sub-regions are similar to that of the whole area. A new reference sampling method based on this procedure is proposed. The results show that it is possible to reduce the sampling area and sample size while keeping the same level of accuracy as the existing methods.

Classification Analysis for Unbalanced Data (불균형 자료에 대한 분류분석)

  • Kim, Dongah;Kang, Suyeon;Song, Jongwoo
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.3
    • /
    • pp.495-509
    • /
    • 2015
  • We study classification problems in which the proportions of the two groups differ significantly, known as the unbalanced classification problem. It is usually more difficult to classify classes accurately in unbalanced data than in balanced data. If we apply classification methods to unbalanced data, most observations are likely to be assigned to the bigger group, because doing so minimizes the misclassification loss. However, misclassifying the smaller group as the larger group can cause a bigger loss in most real applications. We compare several classification methods for unbalanced data using sampling techniques (up and down sampling). We also check the total loss of different classification methods when an asymmetric loss is applied to simulated and real data. We use the misclassification rate, G-mean, ROC and AUC (area under the curve) for the performance comparison.
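
Why the misclassification rate alone can mislead on unbalanced data is easy to show with a small sketch. It uses the common definition of G-mean as the square root of sensitivity times specificity; the abstract does not spell out its formula, so that definition, the function names, and the 95/5 toy labels are assumptions.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def misclassification_rate(y_true, y_pred):
    """Fraction of observations whose predicted label is wrong."""
    wrong = sum(1 for truth, pred in zip(y_true, y_pred) if truth != pred)
    return wrong / len(y_true)

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy data: 95 observations in the big group (0), 5 in the small group (1).
y_true = [0] * 95 + [1] * 5
# A trivial classifier that assigns everything to the bigger group.
y_pred = [0] * 100

rate = misclassification_rate(y_true, y_pred)
score = g_mean(y_true, y_pred)
print(rate, score)
```

The trivial classifier has a misclassification rate of only 5 percent, yet its G-mean is zero because it never detects the smaller group.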

Fault Location and Classification of Combined Transmission System: Economical and Accurate Statistic Programming Framework

  • Tavalaei, Jalal;Habibuddin, Mohd Hafiz;Khairuddin, Azhar;Mohd Zin, Abdullah Asuhaimi
    • Journal of Electrical Engineering and Technology
    • /
    • v.12 no.6
    • /
    • pp.2106-2117
    • /
    • 2017
  • An effective statistical feature extraction approach based on sampling of fault data in a combined transmission system is presented in this paper. The proposed algorithm achieves high accuracy at minimum cost in predicting fault location and classifying fault type. The algorithm requires impedance measurement data from one end of the transmission line. Modal decomposition is used to extract the positive sequence impedance. The fault signal is then decomposed using the discrete wavelet transform. Statistical sampling is used to extract appropriate fault features from the decomposed signal as a benchmark to train the classifier. A Support Vector Machine (SVM) is used to illustrate the performance of statistical sampling. The overall sampling time does not exceed 1 1/4 cycles, taking into account the interval time. The proposed method samples in two steps. The first step takes 3/4 cycle of during-fault impedance and the second step takes 1/4 cycle of post-fault impedance. The interval time between the two steps is assumed to be 1/4 cycle. Extensive studies using MATLAB software show accurate fault location estimation and fault type classification by the proposed method. The classifier result is presented and compared with well-established travelling wave methods, and the performance of the algorithms is analyzed and discussed.

A novel reliability analysis method based on Gaussian process classification for structures with discontinuous response

  • Zhang, Yibo;Sun, Zhili;Yan, Yutao;Yu, Zhenliang;Wang, Jian
    • Structural Engineering and Mechanics
    • /
    • v.75 no.6
    • /
    • pp.771-784
    • /
    • 2020
  • Reliability analysis techniques combined with various surrogate models have attracted increasing attention because of their accuracy and efficiency. However, they primarily focus on structures with continuous response, while very little research has been carried out on reliability analysis for structures with discontinuous response. Furthermore, existing adaptive reliability analysis methods based on importance sampling (IS) still have some intractable defects when dealing with small failure probabilities, and there is no related research on reliability analysis for structures involving both discontinuous response and small failure probability. Therefore, this paper proposes a novel reliability analysis method called AGPC-IS for such structures, which combines adaptive Gaussian process classification (GPC) and adaptive-kernel-density-estimation-based IS. In AGPC-IS, an efficient adaptive strategy for design of experiments (DoE), which takes into consideration the classification uncertainty, the sampling uniformity and the regional classification accuracy improvement, is developed to improve the accuracy of the Gaussian process classifier. Adaptive kernel density estimation is introduced to construct the quasi-optimal density function of IS. In addition, a novel and more precise stopping criterion is developed from the perspective of the stability of the failure probability estimate. The efficiency, superiority and practicability of AGPC-IS are verified by three examples.
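
To illustrate why importance sampling helps with small failure probabilities, the sketch below estimates P(X > 4) for a standard normal variable. It uses a fixed Gaussian proposal centered at the threshold, which is a deliberate simplification and not the adaptive kernel density proposal of AGPC-IS. The limit state, the threshold, the sample size, and the function name are all assumptions.

```python
import math
import random

def failure_probability_is(threshold, n_samples, rng):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling."""
    total = 0.0
    for _ in range(n_samples):
        # Draw from the proposal N(threshold, 1), so that roughly half of
        # the samples land in the failure region.
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Weight = target density / proposal density
            #        = exp(-threshold * x + threshold**2 / 2).
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples

rng = random.Random(1)
threshold = 4.0
estimate = failure_probability_is(threshold, 20000, rng)
# Closed-form reference value of the standard normal tail probability.
exact = 0.5 * math.erfc(threshold / math.sqrt(2))
print(estimate, exact)
```

Plain Monte Carlo with the same 20,000 samples would expect fewer than one failure, since the true probability is about 3.2e-5. Shifting the sampling density toward the failure region and reweighting recovers a usable estimate.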

Support Vector Machine based on Stratified Sampling

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.9 no.2
    • /
    • pp.141-146
    • /
    • 2009
  • The support vector machine is a classification algorithm based on statistical learning theory. It has performed well in many data mining applications. However, the algorithm has some problems, one of which is its heavy computing cost. This makes it difficult to use the support vector machine in dynamic and online systems. To overcome this problem, we propose to use stratified sampling from statistical sampling theory. Stratified sampling helps reduce the size of the training data. In our paper, although the size of the data is small, the accuracy is maintained. We verify the improved performance with experimental results on data sets from the UCI machine learning repository.
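
The reduction step can be sketched as a proportionate stratified sample. The abstract does not say how the strata are defined, so this example assumes the class label serves as the stratum. The function name, sampling fraction, and toy data are hypothetical.

```python
import random

def stratified_sample(samples, fraction, rng):
    """Draw the same fraction from each stratum (here: each class label)."""
    strata = {}
    for features, label in samples:
        strata.setdefault(label, []).append((features, label))
    subset = []
    for group in strata.values():
        # Keep at least one sample so that no stratum disappears.
        k = max(1, round(fraction * len(group)))
        subset.extend(rng.sample(group, k))
    return subset

rng = random.Random(42)
# Toy training set: 80 samples of class "a" and 20 of class "b".
data = [((i,), "a") for i in range(80)] + [((i,), "b") for i in range(20)]
subset = stratified_sample(data, 0.25, rng)
print(len(subset))
```

The subset holds a quarter of the data while preserving the 4:1 class ratio, so a classifier trained on it sees the same class proportions at a lower computing cost.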

A Deep Learning Based Over-Sampling Scheme for Imbalanced Data Classification (불균형 데이터 분류를 위한 딥러닝 기반 오버샘플링 기법)

  • Son, Min Jae;Jung, Seung Won;Hwang, Een Jun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.7
    • /
    • pp.311-316
    • /
    • 2019
  • The classification problem is to predict the class to which an input sample belongs. One of the most popular ways to do this is to train a machine learning algorithm on a given dataset. In this case, the dataset should have a well-balanced class distribution for the best performance. When the dataset has an imbalanced class distribution, however, classification performance can be very poor. To overcome this problem, we propose an over-sampling scheme that balances the number of samples per class by using Conditional Generative Adversarial Networks (CGAN). CGAN is a generative model developed from Generative Adversarial Networks (GAN), which can learn data characteristics and generate data similar to real data. CGAN can therefore generate data for a class that has few samples, so that the problem induced by an imbalanced class distribution is mitigated and classification performance is improved. Experiments using actual collected data show that the over-sampling technique using CGAN is effective and that it is superior to existing over-sampling techniques.

Deeper SSD: Simultaneous Up-sampling and Down-sampling for Drone Detection

  • Sun, Han;Geng, Wen;Shen, Jiaquan;Liu, Ningzhong;Liang, Dong;Zhou, Huiyu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.12
    • /
    • pp.4795-4815
    • /
    • 2020
  • Drone detection can be considered a specific kind of small object detection, which has always been a challenge because such objects are small and have few features. To improve the detection rate of drones, we design a Deeper SSD network, which uses a large-scale input image and a deeper convolutional network to obtain more features that benefit small object classification. At the same time, to improve object classification performance, we implemented up-sampling modules to increase the number of features in the low-level feature map. In addition, to improve object localization performance, we adopted down-sampling modules so that context information can be used directly by the high-level feature map. Our proposed Deeper SSD and its variants are successfully applied to self-designed drone datasets. Our experiments demonstrate the effectiveness of the Deeper SSD and its variants, which are useful for detecting and recognizing small drones. The proposed methods can also detect small and large objects simultaneously.