• Title/Summary/Keyword: optimal classification method

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho; Kim, Myung-Kyu; Cha, Myung-Hoon; In, Joo-Ho; Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most classification research has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, together with the Bayesian classifier and NNA (Neural Network Algorithm), which are known as statistics-based methods. However, these methods face space and time limitations when classifying the very large number of web pages on today's internet. Moreover, most classification studies use a unigram feature representation, which does not capture the real meaning of words well. Korean web page classification poses an additional problem because many Korean words have multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in such environments (large data sets and polysemous words). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimension. This produces a new low-dimensional semantic space for representing vectors, which makes classification efficient and allows the latent meaning of words or documents (web pages) to be analyzed. Although LSA is good at classification, it has a drawback: when SVD reduces the dimensions of the matrix and creates the new semantic space, it considers which dimensions represent the vectors well, but not which dimensions discriminate between them well. This is one reason why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA method that selects the optimal dimensions for both discriminating and representing vectors, minimizing this drawback and improving performance. The proposed method shows better and more stable performance than other LSA methods in low-dimensional spaces. In addition, we obtain further improvement in classification by creating and selecting features through stopword removal and statistical weighting.
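
Standard LSA truncates the SVD to the k largest singular values, which favours dimensions that represent the documents well. The sketch below illustrates the contrast the abstract draws: it first keeps the m most representative latent dimensions, then ranks them by a per-dimension Fisher ratio (between-class over within-class scatter) and keeps the k most discriminative. The Fisher-ratio criterion, the two-stage m/k scheme, and the name `select_lsa_dimensions` are assumptions for illustration, not the authors' selection rule; it also assumes NumPy is available.

```python
import numpy as np


def select_lsa_dimensions(X, labels, m, k):
    """Project documents into LSA space and keep k discriminative dimensions.

    X is a term-document matrix (terms x documents); labels holds one class
    label per document. Hypothetical two-stage scheme: keep the m dimensions
    with the largest singular values, then keep the k of those with the
    highest Fisher ratio.
    """
    _, s, vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    # Document coordinates in the top-m latent dimensions: (Sigma_m V_m^T)^T
    docs = (s[:m, None] * vt[:m]).T  # shape: documents x m

    labels = np.asarray(labels)
    overall_mean = docs.mean(axis=0)
    between = np.zeros(docs.shape[1])
    within = np.zeros(docs.shape[1])
    for c in np.unique(labels):
        members = docs[labels == c]
        class_mean = members.mean(axis=0)
        between += len(members) * (class_mean - overall_mean) ** 2
        within += ((members - class_mean) ** 2).sum(axis=0)

    # Per-dimension Fisher ratio; the small constant avoids division by zero
    fisher = between / (within + 1e-12)
    chosen = np.argsort(fisher)[::-1][:k]
    return docs[:, chosen], chosen
```

For a toy matrix in which two documents use only terms 0 and 1 and two others use only terms 2 and 3, `select_lsa_dimensions(X, ["A", "A", "B", "B"], m=2, k=1)` returns one coordinate per document that is the same within each class and different across classes.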

Investigation of Indicator Kriging for Evaluating Proper Rock Mass Classification based on Electrical Resistivity and RMR Correlation Analysis (RMR과 전기비저항의 상관성 해석에 기초하여 지시크리깅을 적용한 최적 암반 분류 기법 고찰)

  • Lee, Kyung-Ju; Ha, Hee-Sang; Ko, Kwang-Buem; Kim, Ji-Soo
    • Tunnel and Underground Space / v.19 no.5 / pp.407-420 / 2009
  • In this study, a geostatistical technique based on indicator kriging was applied to determine the optimal rock mass classification by integrating various kinds of information, such as borehole data and geophysical data. To obtain an optimal kriging result, a suitable technique is needed for effectively integrating the hard (borehole) and soft (geophysical) data. In addition, the model parameters of the variogram must be determined beforehand. An iterative non-linear inversion method was implemented to determine the model parameters of the theoretical variogram. To verify the algorithm, the behavior of the objective function and the precision of convergence were investigated, revealing that the gradient of the objective function with respect to the range is extremely small. The algorithm was then applied to field data from a mountainous area where large-scale tunnel construction is planned. Resistivity information from an AMT survey, used as soft data, was incorporated with RMR information from borehole data, used as hard data. Finally, RMR profiles were constructed and interpreted at the tunnel elevation and at the upper 1D level.
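
Indicator kriging works on a 0/1 transform of the hard data and needs a fitted variogram model before the kriging system can be solved. The sketch below shows those preliminary steps for 1-D positions (for example, depth along a borehole): an indicator transform of RMR values at a hypothetical class threshold, an experimental semivariogram, and a spherical model whose range is fitted by a simple grid search. The grid search stands in for the paper's iterative non-linear inversion and is an assumption, as are the spherical model, the fixed nugget and partial sill, and all function names; the kriging system itself is omitted.

```python
def indicator(values, threshold):
    """Indicator transform: 1 where the value is at or below the threshold."""
    return [1 if v <= threshold else 0 for v in values]


def spherical(h, nugget, partial_sill, rng):
    """Spherical variogram model: 0 at h = 0, levels off at nugget + partial_sill."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def experimental_variogram(xs, zs, lag, n_lags):
    """Classical semivariogram estimate for 1-D positions, binned by lag."""
    result = []
    for k in range(1, n_lags + 1):
        lo, hi = (k - 0.5) * lag, (k + 0.5) * lag
        total, pairs = 0.0, 0
        for i in range(len(xs)):
            for j in range(i + 1, len(xs)):
                if lo <= abs(xs[i] - xs[j]) < hi:
                    total += (zs[i] - zs[j]) ** 2
                    pairs += 1
        if pairs:
            # gamma(h) = sum of squared differences / (2 * number of pairs)
            result.append((k * lag, total / (2 * pairs)))
    return result


def fit_range(gammas, nugget, partial_sill, candidates):
    """Return the candidate range with the smallest squared misfit (grid search)."""
    def misfit(rng):
        return sum((g - spherical(h, nugget, partial_sill, rng)) ** 2 for h, g in gammas)
    return min(candidates, key=misfit)
```

A derivative-free search like this is unaffected by the very small gradient with respect to the range that the abstract reports, although it only finds the best value among the supplied candidates.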

Feature Selection Using Submodular Approach for Financial Big Data

  • Attigeri, Girija; Manohara Pai, M.M.; Pai, Radhika M.
    • Journal of Information Processing Systems / v.15 no.6 / pp.1306-1325 / 2019
  • As the world moves toward digitization, data is generated from various sources at an ever faster rate. Its volume has become enormous, and it is termed big data. The financial sector is one domain that needs to leverage this big data to identify financial risks, fraudulent activities, and so on. Designing predictive models for such financial big data is imperative for maintaining the health of a country's economy. Financial data has many features, such as transaction history, repayment data, purchase data, and investment data. The main problem in predictive modeling is finding the right subset of representative features from which the model can be constructed for a particular task. This paper proposes a correlation-based method that uses submodular optimization to select the optimal number of features, thereby reducing the dimensionality of the data for faster and better prediction. The key proposition is that the optimal feature subset should contain features that are highly correlated with the class label but not correlated with each other. Experiments are conducted on loan data to understand the effect of the various subsets on different classification algorithms. The IBM Bluemix BigData platform is used for experimentation, along with a Spark notebook. The results indicate that the proposed approach achieves considerable accuracy with optimal subsets in significantly less execution time. The algorithm is also compared with existing feature selection and extraction algorithms.
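
The key proposition (features highly correlated with the label but not with each other) can be written as a set function: the sum of each selected feature's absolute correlation with the label, minus a penalty `lam` times the absolute correlations between selected pairs. Because the penalty is a sum of non-negative pairwise weights, this objective is submodular, and the standard greedy heuristic adds the feature with the largest marginal gain until no gain is positive. This particular objective, the weight `lam`, and the function names are illustrative assumptions; the paper's exact formulation may differ.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        return 0.0
    return cov / (sd_a * sd_b)


def greedy_select(features, label, lam=1.0, max_k=None):
    """Greedy maximisation of relevance minus lam * pairwise redundancy.

    features maps a feature name to its column of values.
    """
    relevance = {f: abs(pearson(col, label)) for f, col in features.items()}
    selected = []
    while max_k is None or len(selected) < max_k:
        best, best_gain = None, 0.0
        for f in features:
            if f in selected:
                continue
            # Marginal gain of adding f to the current subset
            redundancy = sum(abs(pearson(features[f], features[s])) for s in selected)
            gain = relevance[f] - lam * redundancy
            if gain > best_gain:
                best, best_gain = f, gain
        if best is None:  # no feature has a positive marginal gain
            break
        selected.append(best)
    return selected
```

With `lam=1.0`, an exact duplicate of an already selected feature can never have a positive gain, so it is never added.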

Fuzzy Learning Method Using Genetic Algorithms

  • Choi, Sangho; Cho, Kyung-Dal; Park, Sa-Joon; Lee, Malrey; Kim, Kitae
    • Journal of Korea Multimedia Society / v.7 no.6 / pp.841-850 / 2004
  • This paper proposes a method based on a GA (genetic algorithm) and the GDM (gradient descent method) for removing unnecessary rules and generating relevant rules from the fuzzy rules corresponding to several fuzzy partitions. The aim of the proposed method is to find a minimum set of fuzzy rules that can correctly classify all training patterns. When a fine fuzzy partition is used with conventional methods, the number of fuzzy rules becomes enormous and the performance of the fuzzy inference system degrades. This paper applies the GA as a means of finding optimal solutions over fuzzy partitions. In each rule, the antecedent part is made up of the membership functions of a fuzzy set, and the consequent part is a real number. The membership functions and the number of fuzzy inference rules are tuned by the GA, while the real numbers in the consequent parts of the rules are tuned by the gradient descent method. The proposed method is shown to outperform the conventional method in formulating and solving a combinatorial optimization problem with two objectives: maximizing the number of correctly classified patterns and minimizing the number of fuzzy rules.
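
The two objectives (maximize correctly classified patterns, minimize the number of rules) are commonly combined into one weighted fitness for a GA that searches over rule subsets encoded as bit strings. The sketch below does this for 1-D patterns with triangular membership functions and single-winner inference (the rule with the highest membership decides the class). The weights, the triangular rules, the class-label consequents, and all names are assumptions; the paper's real-valued consequents tuned by gradient descent are not modelled.

```python
import random


def membership(x, center, width):
    """Triangular membership function centred at `center`."""
    return max(0.0, 1.0 - abs(x - center) / width)


def classify(x, rules, mask):
    """Single-winner inference over the rules switched on in `mask`."""
    best_cls, best_mu = None, 0.0
    for (center, width, cls), on in zip(rules, mask):
        if on:
            mu = membership(x, center, width)
            if mu > best_mu:
                best_cls, best_mu = cls, mu
    return best_cls


def fitness(mask, rules, patterns, w_ncp=10.0, w_s=1.0):
    """Weighted objective: reward correct patterns, penalise rule count."""
    ncp = sum(1 for x, y in patterns if classify(x, rules, mask) == y)
    return w_ncp * ncp - w_s * sum(mask)


def ga_select(rules, patterns, pop_size=20, generations=50, p_mut=0.1, seed=0):
    """Elitist GA with one-point crossover and bit-flip mutation."""
    rnd = random.Random(seed)
    n = len(rules)

    def score(mask):
        return fitness(mask, rules, patterns)

    # Seed the population with the all-rules individual plus random subsets
    pop = [[1] * n] + [[rnd.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[:2]  # the best individuals survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rnd.sample(pop[: pop_size // 2], 2)
            cut = rnd.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [1 - g if rnd.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)
```

Because of elitism and the all-rules starting individual, the returned subset is never worse than using every rule under this fitness.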

Damage Detection and Classification System for Sewer Inspection using Convolutional Neural Networks based on Deep Learning (CNN을 이용한 딥러닝 기반 하수관 손상 탐지 분류 시스템)

  • Hassan, Syed Ibrahim; Dang, Lien-Minh; Im, Su-hyeon; Min, Kyung-bok; Nam, Jun-young; Moon, Hyeon-joon
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.3 / pp.451-457 / 2018
  • We propose an automatic system, based on artificial intelligence and deep learning, for detecting and classifying damage in a sewer inspection database. To optimize performance, we implemented a system that is robust against environmental variations such as changes in illumination and shadow. The proposed system implements crack detection and damage classification using a deep-learning-based Convolutional Neural Network (CNN). To train the CNN model on damaged areas, 9,941 CCTV images with 256×256 pixel resolution were used. As a result, a recognition rate of 98.76% was obtained. A total of 646 images with 720×480 pixel resolution were extracted from various sewer databases for performance evaluation. The proposed system achieves an optimal recognition rate for the automatic detection and classification of damage in sewer databases constructed in various environments.

Feature Selection for Anomaly Detection Based on Genetic Algorithm (유전 알고리즘 기반의 비정상 행위 탐지를 위한 특징선택)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.9 no.7 / pp.1-7 / 2018
  • Feature selection, a data preprocessing technique, is a major research area in many applications that deal with large datasets. It has been used in pattern recognition, machine learning, and data mining, and is now widely applied in fields such as text classification, image retrieval, intrusion detection, and genome analysis. The proposed method is based on a genetic algorithm, which is a meta-heuristic algorithm. There are two approaches to finding feature subsets: the filter method and the wrapper method. In this study, we use a wrapper method, which evaluates feature subsets using an actual classifier, to find an optimal feature subset. The training dataset used in the experiment has severe class imbalance, which makes it difficult to improve classification performance for rare classes. After preprocessing the training dataset with SMOTE, we select features and evaluate them with various machine learning algorithms.
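
SMOTE, used here to rebalance the training set, creates each synthetic minority sample at a random point on the line segment between a minority point and one of its k nearest minority neighbours. A minimal sketch, assuming numeric feature tuples; `k`, the seed, and the function name are illustrative choices rather than the study's settings.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic samples by interpolating between minority neighbours."""
    rnd = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rnd.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of the base point (Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rnd.choice(neighbours)
        gap = rnd.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour)))
    return synthetic
```

Because every new point lies between two existing minority points, the oversampled class fills in its own region instead of duplicating samples exactly.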

Analysis and Detection Method for Line-shaped Echoes using Support Vector Machine (Support Vector Machine을 이용한 선에코 특성 분석 및 탐지 방법)

  • Lee, Hansoo; Kim, Eun Kyeong; Kim, Sungshin
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.665-670 / 2014
  • An SVM is a binary classifier that finds the optimal hyperplane separating training data into two groups. Owing to its remarkable performance, the SVM is applied in various fields, for tasks such as inductive inference, binary classification, and prediction. It is also a representative black-box model, and there is much active research on analyzing trained SVM classifiers. This paper studies a method for automatically detecting line-shaped echoes (sun strobe echoes and radial interference echoes) using the SVM algorithm, because line-shaped echoes appear relatively often and disturb the weather forecasting process. Using a spatial clustering method and corrected reflectivity data from the weather radar, the training data is built from features such as mean reflectivity, size, appearance, and centroid altitude. The trained SVM classifier is verified on actual occurrences of line-shaped echoes, and its characteristics are analyzed using the decision tree method.
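
The "optimal hyperplane" is the maximum-margin one, which a linear soft-margin SVM reaches by minimizing λ/2·‖w‖² plus the average hinge loss. The sketch below trains such a classifier with Pegasos-style stochastic subgradient steps on numeric feature vectors (in this setting, values such as mean reflectivity or centroid altitude). It is a linear-kernel illustration under several assumptions: labels in {-1, +1}, the bias folded into the weights as a constant feature (so it is also regularized), deterministic cyclic passes instead of random sampling, and hypothetical function names.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient training of a linear SVM; y holds -1/+1 labels."""
    # Append a constant 1.0 feature so the last weight acts as the bias
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)  # decreasing step size
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: regulariser gradient
            if margin < 1:  # hinge loss active: move towards the correct side
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A kernel (non-linear) SVM, which the study may well have used, would replace the dot product with a kernel function and is not covered by this sketch.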

Bearing Faults Identification of an Induction Motor using Acoustic Emission Signals and Histogram Modeling (음향 방출 신호와 히스토그램 모델링을 이용한 유도전동기의 베어링 결함 검출)

  • Jang, Won-Chul; Seo, Jun-Sang; Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.19 no.11 / pp.17-24 / 2014
  • This paper proposes a fault detection method for low-speed rolling element bearings of an induction motor using acoustic emission signals and histogram modeling. The proposed method performs envelope modeling of the histogram of normalized fault signals. It then extracts significant features of each fault using partial autocorrelation coefficients and selects them using a distance evaluation technique. Finally, using these features as inputs, support vector regression (SVR) classifies the bearing's inner, outer, and roller faults. To obtain optimal classification performance, we evaluate the proposed method while varying the adjustable parameter of the Gaussian radial basis function of the SVR from 0.01 to 1.0 and the number of features from 2 to 150. Experimental results show that the proposed fault identification method, using an adjustable-parameter value of 0.64-0.65 and 75 features, achieves 91% classification performance and outperforms conventional fault diagnosis methods.
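
Partial autocorrelation coefficients can be computed from the sample autocorrelation with the Durbin-Levinson recursion; the lag-k coefficient measures the correlation between samples k apart after removing the linear effect of the intermediate lags. A pure-Python sketch, assuming the input is a non-constant list of floats; the function names and the choice of the biased autocorrelation estimator are assumptions.

```python
def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] (r[0] = 1)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [1.0] + [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def pacf(x, max_lag):
    """Partial autocorrelations phi_kk for k = 1..max_lag (Durbin-Levinson)."""
    r = autocorrelation(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            phi = [phi_kk]
        else:
            num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the order-(k-1) AR coefficients to order k
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
        phi_prev = phi
    return out
```

For k = 2 the recursion reduces to the familiar closed form (r2 - r1²) / (1 - r1²).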

Fruit Fly Optimization based EEG Channel Selection Method for BCI (BCI 시스템을 위한 Fruit Fly Optimization 알고리즘 기반 최적의 EEG 채널 선택 기법)

  • Yu, Xin-Yang; Yu, Je-Hun; Sim, Kwee-Bo
    • Journal of Institute of Control, Robotics and Systems / v.22 no.3 / pp.199-203 / 2016
  • A brain-computer interface (BCI) provides an alternative means of acting on the world. Brain signals can be recorded as electrical activity along the scalp using an electrode cap. By analyzing the EEG, it is possible to determine whether a person is thinking about moving his/her hand or foot, and this information can be transferred to a machine and translated into commands. However, we do not know which information relates to motor imagery or which channels are good for extracting features. A general approach is to use all electrode channels to analyze the EEG signals, but this causes problems such as overfitting and difficulty in removing noise and artifacts. To overcome these problems, we used a new optimization method, the Fruit Fly Optimization Algorithm (FOA), to select the best channels, and then combined them with the CSP (common spatial pattern) method to extract features, improving the classification accuracy of linear discriminant analysis. We also used particle swarm optimization (PSO) and a genetic algorithm (GA) to select the optimal EEG channels and compared their performance with that of the FOA. The results show that, for some subjects, the FOA is better at selecting the optimal EEG channels in a short time.
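
FOA alternates a random "smell" search, in which individual flies sample positions around the swarm location, with a "vision" step that moves the swarm to the best-smelling fly. The sketch below adapts that loop to channel selection by treating each coordinate of a position in [0, 1] as one channel and selecting channels whose coordinate exceeds 0.5. This binary adaptation, the step size, and the `score` callback (in practice something like the cross-validated LDA accuracy of CSP features) are hypothetical; the original FOA's distance-based smell concentration is not reproduced.

```python
import random


def foa_select_channels(score, n_channels, n_flies=10, iterations=100, step=0.3, seed=0):
    """FOA-inspired binary search for a channel mask that maximises score(mask)."""
    rnd = random.Random(seed)

    def to_mask(position):
        return [1 if p > 0.5 else 0 for p in position]

    swarm = [rnd.random() for _ in range(n_channels)]  # initial swarm location
    best_mask = to_mask(swarm)
    best_score = score(best_mask)
    for _ in range(iterations):
        # Smell phase: each fly searches randomly around the swarm location
        flies = [
            [min(1.0, max(0.0, s + rnd.uniform(-step, step))) for s in swarm]
            for _ in range(n_flies)
        ]
        top_score, top_fly = max(((score(to_mask(f)), f) for f in flies), key=lambda t: t[0])
        # Vision phase: the swarm moves to the best fly only if it improves
        if top_score > best_score:
            best_score, best_mask, swarm = top_score, to_mask(top_fly), top_fly
    return best_mask, best_score
```

Since the swarm only moves on improvement, the returned score is never lower than the score of the initial random mask.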

Principal Discriminant Variate (PDV) Method for Classification of Multicollinear Data: Application to Diagnosis of Mastitic Cows Using Near-Infrared Spectra of Plasma Samples

  • Jiang, Jian-Hui; Tsenkova, Roumiana; Yu, Ru-Qin; Ozaki, Yukihiro
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1244-1244 / 2001
  • In linear discriminant analysis, two properties are important for the effectiveness of discriminant function modeling. The first is the separability of the discriminant function for different classes; separability reaches its optimum when the ratio of between-class to within-class variance is maximized. The second is the stability of the discriminant function against noise present in the measurement variables. Stability can be optimized by seeking the discriminant variates in a principal variation subspace, i.e., the directions that account for the majority of the total variation of the data. An unstable discriminant function exhibits inflated variance when predicting future unclassified objects and is therefore exposed to a significantly increased risk of erroneous prediction. An ideal discriminant function should therefore not only separate different classes with a minimum misclassification rate on the training set, but also be stable enough that the prediction variance for unclassified objects is as small as possible. In other words, an optimal classifier should strike a balance between separability and stability. This is especially significant for multivariate spectroscopy-based classification, where multicollinearity always leads to discriminant directions located in low-spread subspaces. A new regularized discriminant analysis technique, the principal discriminant variate (PDV) method, has been developed for effectively handling the multicollinear data commonly encountered in multivariate spectroscopy-based classification. The motivation behind this method is to seek a sequence of discriminant directions that not only optimize the separability between different classes but also account for maximal variation in the data. Three different formulations of the PDV method are suggested, and an effective computing procedure is proposed for a PDV method.
Near-infrared (NIR) spectra of blood plasma samples from mastitic and healthy cows were used to evaluate the behavior of the PDV method in comparison with principal component analysis (PCA), discriminant partial least squares (DPLS), soft independent modeling of class analogies (SIMCA), and Fisher linear discriminant analysis (FLDA). The results demonstrate that the PDV method exhibits improved stability in prediction without significant loss of separability. The PDV method clearly discriminates between the NIR spectra of blood plasma samples from mastitic and healthy cows. Moreover, the proposed method outperforms PCA, DPLS, SIMCA, and FLDA, indicating that PDV is a promising tool for discriminant analysis of spectra-characterized samples with only small compositional differences, and thus a useful means for spectroscopy-based clinical applications.
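
The separability/stability trade-off can be made concrete by scoring a candidate direction twice: by its two-class Fisher ratio (separability) and by the share of total variance it captures (the principal-variation property tied to stability). The toy 2-D data below is constructed so that the high-variance direction does not separate the classes while the separating direction carries little variance, the situation that multicollinear spectra produce. This only illustrates the trade-off; it does not implement any of the PDV formulations, and the data, the unit-length directions, and the names are hypothetical.

```python
def project(points, w):
    """Project 2-D points onto direction w."""
    return [x * w[0] + y * w[1] for x, y in points]


def fisher_ratio(class_a, class_b, w):
    """Between-class separation over within-class scatter along w."""
    a, b = project(class_a, w), project(class_b, w)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    within = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    return (mean_a - mean_b) ** 2 / within


def variance_fraction(points, w):
    """Share of total variance captured along the unit vector w."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    total = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points)
    proj = project(points, w)
    mean_p = sum(proj) / n
    return sum((v - mean_p) ** 2 for v in proj) / total


# Hypothetical data: large spread along x, class difference along y
class_a = [(-3.0, 1.0), (3.0, 1.5)]
class_b = [(-3.0, -1.0), (3.0, -1.5)]
for w in [(1.0, 0.0), (0.0, 1.0)]:
    print(w, fisher_ratio(class_a, class_b, w), variance_fraction(class_a + class_b, w))
```

Here the x-axis captures about 85% of the variance but has a Fisher ratio of 0, while the y-axis has a Fisher ratio of 25 but captures only about 15%; a method that balances both criteria, as PDV aims to, would look for directions between these extremes.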