• Title/Summary/Keyword: Large-set Classification

Performance comparison of SVM and neural networks for large-set classification problems (대용량 분류에서 SVM과 신경망의 성능 비교)

  • Lee Jin-Seon;Kim Young-Won;Oh Il-Seok
    • The KIPS Transactions:PartB / v.12B no.1 s.97 / pp.25-30 / 2005
  • In this paper, we analyze and compare the performance of the modular FFMLP (feedforward multilayer perceptron) and the SVM (Support Vector Machine) on large-set classification problems. Overall, the SVM outperformed the modular FFMLP in correct recognition rate and in other aspects. Additionally, the recognition rate of the SVM degraded more slowly than that of the neural network as the number of classes increased. The trend of the recognition rate as a function of the rejection rate was also analyzed, and an SVM parameter set (kernel functions and related variables) suited to large-set classification problems was identified.
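
The relationship between rejection rate and recognition rate mentioned in this abstract can be illustrated with a minimal sketch. The function name `reject_curve`, the confidence scores, and the thresholds are hypothetical and are not taken from the paper. The assumption is that a classifier rejects any sample whose top confidence falls below a threshold, and that the recognition rate is then measured over the accepted samples only.

```python
def reject_curve(confidences, correct, thresholds):
    """Return (threshold, rejection_rate, recognition_rate) triples.

    confidences: top-class confidence per sample (hypothetical values)
    correct:     True where the top-class prediction was right
    """
    n = len(confidences)
    curve = []
    for t in thresholds:
        # Keep only the samples the classifier is confident enough about.
        accepted = [ok for c, ok in zip(confidences, correct) if c >= t]
        rejection_rate = 1 - len(accepted) / n
        # Recognition rate is undefined when every sample is rejected.
        recognition_rate = sum(accepted) / len(accepted) if accepted else None
        curve.append((t, rejection_rate, recognition_rate))
    return curve


# Illustrative data only: five samples, three of them classified correctly.
confidences = [0.95, 0.90, 0.60, 0.55, 0.30]
correct = [True, True, True, False, False]
curve = reject_curve(confidences, correct, [0.0, 0.5, 0.8])
```

Raising the threshold rejects more samples and, in this toy data, raises the recognition rate on what remains, which is the kind of trade-off such an analysis tracks.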

Support Vector Machine Classification Using Training Sets of Small Mixed Pixels: An Appropriateness Assessment of IKONOS Imagery

  • Yu, Byeong-Hyeok;Chi, Kwang-Hoon
    • Korean Journal of Remote Sensing / v.24 no.5 / pp.507-515 / 2008
  • Many studies have used a large number of pure pixels as their approach to training set design. The training set used, however, varies between classifiers. Recent research has reported that small mixed pixels between classes are actually more useful than larger pure pixels of each class in Support Vector Machine (SVM) classification. We evaluated the usability of small mixed pixels as a training set for the classification of high-resolution satellite imagery. We presented an advanced approach for obtaining mixed pixels readily, and evaluated its appropriateness for land cover classification of IKONOS satellite imagery. The results showed that the accuracy of the classification based on small mixed pixels is nearly identical to the accuracy of the classification based on large pure pixels. However, they also showed a limitation: the small mixed pixels used may provide insufficient information to separate the classes. Small mixed pixels from the class border region provide cost-effective training sets, but their use together with other pixels should be considered for high-resolution satellite imagery or relatively complex land cover situations.

Deep Learning for Pet Image Classification (애완동물 분류를 위한 딥러닝)

  • Shin, Kwang-Seong;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.151-152 / 2019
  • In this paper, we propose an improved learning method based on a small data set for animal image classification. First, a CNN creates a training model for the small data set and uses the data set to expand the training set. Second, bottleneck features of the small data set are extracted using a network pre-trained on a large data set, such as VGG16, and stored in two NumPy files as a new training data set and a new test data set. Finally, a fully connected network is trained on the new data set.

Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set (대용량 자료에서 핵심적인 소수의 변수들의 선별과 로지스틱 회귀 모형의 전개)

  • Lim, Yong-B.;Cho, J.;Um, Kyung-A;Lee, Sun-Ah
    • Journal of Korean Society for Quality Management / v.34 no.2 / pp.129-135 / 2006
  • With advances in computer technology, it is possible to keep all the related information for monitoring equipment under control, along with huge amounts of real-time manufacturing data, in a database. Thus, the statistical analysis of large data sets with hundreds of thousands of observations and hundreds of independent variables, some of whose values are missing at many observations, is needed even though it is a formidable computational task. A tree-structured approach to classification is capable of screening important independent variables and their interactions. In a Six Sigma project handling a large amount of manufacturing data, one of the goals is to screen the vital few variables from among the trivial many. In this paper we review and summarize the CART, C4.5 and CHAID algorithms and propose a simple method of screening the vital few variables by selecting the variables screened in common by all three algorithms. We also discuss how to develop a logistic regression model on a large data set, illustrated with a large finance data set collected by a credit bureau for the purpose of predicting company bankruptcy.
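
The screening rule described in this abstract, keeping only the variables selected by all three tree algorithms, amounts to a set intersection. The sketch below assumes each algorithm has already been run and has produced a set of selected variables. The function name `screen_vital_few` and the variable names are invented for illustration and do not come from the paper.

```python
def screen_vital_few(*selected_sets):
    """Keep only the variables that every algorithm selected."""
    common = set(selected_sets[0])
    for selected in selected_sets[1:]:
        common &= set(selected)
    return sorted(common)


# Hypothetical selections; running CART, C4.5 and CHAID is not shown here.
cart = {"debt_ratio", "cash_flow", "firm_age", "sales_growth"}
c45 = {"debt_ratio", "cash_flow", "sector"}
chaid = {"cash_flow", "debt_ratio", "firm_age"}

vital_few = screen_vital_few(cart, c45, chaid)
```

Requiring agreement across all three algorithms is a conservative choice: a variable picked by only one or two of them is dropped.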

A Wavelet based Feature Selection Method to Improve Classification of Large Signal-type Data (웨이블릿에 기반한 시그널 형태를 지닌 대형 자료의 feature 추출 방법)

  • Jang, Woosung;Chang, Woojin
    • Journal of Korean Institute of Industrial Engineers / v.32 no.2 / pp.133-140 / 2006
  • Large signal-type data sets are difficult to classify, especially if the data sets are non-stationary. In this paper, large signal-type, non-stationary data sets are wavelet transformed so that distinct features of the data are extracted in the wavelet domain rather than the time domain. For classification, a few wavelet coefficients representing class properties are fed to statistical classification methods: Linear Discriminant Analysis, Quadratic Discriminant Analysis, Neural Network, etc. Applying our wavelet-based feature selection method to a mass spectrometry data set for ovarian cancer diagnosis resulted in 100% classification accuracy.
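
The idea of selecting a few discriminative wavelet coefficients can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the abstract does not name the wavelet or the selection criterion, so the sketch uses the Haar wavelet and ranks coefficients by the absolute difference of class means. The function names `haar_dwt` and `top_coefficients` are hypothetical.

```python
import math


def haar_dwt(signal):
    """Full Haar decomposition; the signal length must be a power of two."""
    details = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # coarser detail levels go in front
    return approx + details


def top_coefficients(class_a, class_b, k):
    """Indices of the k coefficients whose class means differ the most."""
    wa = [haar_dwt(s) for s in class_a]
    wb = [haar_dwt(s) for s in class_b]
    gaps = []
    for j in range(len(wa[0])):
        mean_a = sum(w[j] for w in wa) / len(wa)
        mean_b = sum(w[j] for w in wb) / len(wb)
        gaps.append(abs(mean_a - mean_b))
    return sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)[:k]


# Toy signals (assumed data): class B flips sign in its second half.
class_a = [[1, 1, 1, 1], [1, 1, 1, 1]]
class_b = [[1, 1, -1, -1], [1, 1, -1, -1]]
selected = top_coefficients(class_a, class_b, 2)
```

The selected indices would then be used to build a reduced feature vector for a downstream classifier such as linear discriminant analysis.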

A Study on Development of Classification Indicators in Transportation Sector Energy Conservation DB (에너지절약 DB 구축을 위한 수송부문 분류지표 설정)

  • Lim, Ki Choo
    • Journal of Energy Engineering / v.25 no.3 / pp.149-156 / 2016
  • This paper surveyed and analyzed overseas cases of DB development in order to set the scope of a DB to be developed for analyzing energy-saving policies in the domestic transportation sector. On that basis, a broad classification system was established, and detailed classification indicators fitting each broad indicator were set based on an analysis of domestic and overseas cases, so as to determine the scope of DB development in the transportation sector required to analyze domestic energy-saving policies. Accordingly, six items at the broadest level of classification were determined: energy consumption, energy basic unit, greenhouse gas emissions, economic indicators, transportation volume / transportation records, and basic automobile data. The broad classification and the sub-items, determined by surveying expert opinions, were set and proposed as DB classification indicators.

Rough Set-Based Approach for Automatic Emotion Classification of Music

  • Baniya, Babu Kaji;Lee, Joonwhoan
    • Journal of Information Processing Systems / v.13 no.2 / pp.400-416 / 2017
  • Music emotion is an important component in the field of music information retrieval and computational musicology. This paper proposes an approach for automatic emotion classification based on rough set (RS) theory. In the proposed approach, four different sets of music features are extracted, representing dynamics, rhythm, spectral characteristics, and harmony. From the features, five different statistical parameters are considered as attributes, including up to the $4^{th}$ order central moments of each feature and the covariance components between features. The large number of attributes is controlled by the RS-based approach, in which superfluous features are removed to obtain the indispensable ones. In addition, the RS-based approach makes it possible to visualize which attributes play a significant role in the generated rules, and to determine the strength of each rule for classification. Experiments were performed to find out which audio features, and which of the statistical parameters derived from them, are important for emotion classification. The resulting indispensable attributes and the usefulness of the covariance components are also discussed. The overall classification accuracy with all statistical parameters was better than that of existing methods on a pair of datasets.
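
The statistical attributes named in this abstract, central moments up to the 4th order and covariance between features, can be computed as in the sketch below. The abstract does not list the exact five parameters, so this sketch only assumes the mean, the 2nd to 4th central moments, and a population covariance. The function names and the sample values are hypothetical.

```python
def central_moments(values, max_order=4):
    """Return {1: mean, k: k-th central moment} for k = 2..max_order."""
    n = len(values)
    mean = sum(values) / n
    moments = {1: mean}  # slot 1 holds the mean itself, not a central moment
    for k in range(2, max_order + 1):
        moments[k] = sum((v - mean) ** k for v in values) / n
    return moments


def covariance(x, y):
    """Population covariance between two equally long feature sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


# Assumed frame-level values of one audio feature.
moments = central_moments([1, 2, 3, 4, 5])
cov = covariance([1, 2, 3], [2, 4, 6])
```

Each feature contributes several such attributes, which is why the attribute count grows quickly and a reduction step like the RS-based one becomes useful.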

A Data Mining Procedure for Unbalanced Binary Classification (불균형 이분 데이터 분류분석을 위한 데이터마이닝 절차)

  • Jung, Han-Na;Lee, Jeong-Hwa;Jun, Chi-Hyuck
    • Journal of Korean Institute of Industrial Engineers / v.36 no.1 / pp.13-21 / 2010
  • Predicting customer contract cancellation is essential for insurance companies, but it is a difficult problem because the customer database is large and the target (cancelled) customers make up only a small proportion of it. This paper proposes a new data mining approach to binary classification that handles large-scale unbalanced data. Over-sampling, clustering, regularized logistic regression and boosting are incorporated in the proposed approach. The proposed approach was applied to a real data set from the insurance domain, and the results were compared with those of other classification techniques.
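
Of the steps listed in this abstract, over-sampling is the simplest to illustrate. The sketch below shows plain random over-sampling with replacement, which is an assumption: the abstract does not say which over-sampling scheme the paper uses, and the clustering, regularized logistic regression, and boosting steps are not shown. The function name `oversample_minority` and the data are hypothetical.

```python
import random


def oversample_minority(rows, labels, minority_label, seed=0):
    """Duplicate random minority rows until both classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("no rows carry the minority label")
    majority_count = len(labels) - len(minority)
    # range() is empty when the classes are already balanced.
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    new_rows = list(rows) + extra
    new_labels = list(labels) + [minority_label] * len(extra)
    return new_rows, new_labels


# Assumed toy data: 2 cancelled customers (label 1) out of 10.
rows = list(range(10))
labels = [1, 1] + [0] * 8
balanced_rows, balanced_labels = oversample_minority(rows, labels, 1)
```

Over-sampling should be applied to the training split only, since duplicated rows in a test split would inflate the measured accuracy.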

Training Data Sets Construction from Large Data Set for PCB Character Recognition

  • NDAYISHIMIYE, Fabrice;Gang, Sumyung;Lee, Joon Jae
    • Journal of Multimedia Information System / v.6 no.4 / pp.225-234 / 2019
  • Deep learning has become increasingly popular in both academia and industry. Various domains, including pattern recognition and computer vision, have witnessed the great power of deep neural networks. However, current studies on deep learning mainly focus on quality data sets with balanced class labels, while training on poor-quality and imbalanced data sets poses great challenges for classification tasks. In this paper, we propose data-analysis-based data reduction techniques for selecting good and diverse data samples from a large data set for a deep learning model. Furthermore, data sampling techniques can be applied to reduce the size of the raw data by retrieving its useful knowledge as representatives. Therefore, instead of dealing with a large volume of raw data, we can use data reduction techniques to sample the data without losing important information. We group PCB characters into classes and train the ResNet56 v2 and SENet deep learning models in order to improve the classification performance of an optical character recognition (OCR) character classifier.

A New Distributed Parallel Algorithm for Pattern Classification using Neural Network Model

  • Kim, Dae-Su;Baeg, Soon-Cheol
    • ETRI Journal / v.13 no.2 / pp.34-41 / 1991
  • In this paper, a new distributed parallel algorithm for pattern classification based upon the Self-Organizing Neural Network (SONN) [10-12] is developed. This system works without any information about the number of clusters or the cluster centers. The SONN model showed good performance in finding classification information, cluster centers, the number of salient clusters, and membership information. However, the sequential version takes a considerable amount of time when the input data set is very large, so a parallel algorithm is desirable. A new distributed parallel algorithm is developed and experimental results are presented.
