• Title/Summary/Keyword: Data feature analysis

Datawise Discriminant Analysis For Feature Extraction (자료별 분류분석(DDA)에 의한 특징추출)

  • Park, Myoung-Soo;Choi, Jin-Young
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.1 / pp.90-95 / 2009
  • This paper presents a new feature extraction algorithm that addresses the problems of linear discriminant analysis (LDA), which is widely used for linear dimensionality reduction. The scatter matrices in LDA are defined by the distances between each datum and its class mean, and by those between the class means and the mean of the whole data set. Using these scatter matrices can cause computational problems and limits the number of features that can be extracted. In addition, these definitions assume that the data distribution is unimodal and normal; when this assumption does not hold, appropriate features are not obtained. In this paper we define a new scatter matrix based on differently weighted distances between individual data, and present a feature extraction algorithm that uses it. With this new method, the problems of LDA mentioned above can be avoided, and features appropriate for discriminating the data can be obtained. The performance of the new method is shown by experiments.
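
For reference, the scatter matrices this abstract describes are conventionally written as below. This is standard LDA notation, not notation taken from the paper: $x_i$ is a datum, $\mathcal{C}_c$ is the index set of class $c$ with $N_c$ members and mean $\mu_c$, and $\mu$ is the mean of all data. The last line is only a hypothetical sketch of a scatter matrix built from weighted distances between individual data; the weights $w_{ij}$ are a placeholder, since the abstract does not define them.

```latex
\begin{aligned}
S_W &= \sum_{c=1}^{C} \sum_{i \in \mathcal{C}_c} (x_i - \mu_c)(x_i - \mu_c)^{\top} \\
S_B &= \sum_{c=1}^{C} N_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top} \\
S_{\text{pair}} &= \sum_{i} \sum_{j} w_{ij} \, (x_i - x_j)(x_i - x_j)^{\top}
\end{aligned}
```

Because the weighted deviations $N_c(\mu_c - \mu)$ sum to zero, $S_B$ has rank at most $C-1$, which is where the limit on the number of LDA features comes from.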

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems / v.14 no.5 / pp.1167-1175 / 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have very few samples and very high dimensionality. Therefore, to classify microarray data, a dimensionality reduction process is required. Dimensionality reduction can eliminate redundancy in the data, so that the features used in classification are only those that have a high correlation with their class. There are two types of dimensionality reduction, namely feature selection and feature extraction. In this paper, we use the k-means algorithm as the clustering approach for feature selection. The proposed approach groups features that have the same characteristics into one cluster, so that redundancy in microarray data is removed. The result of clustering is ranked using the Relief algorithm such that the best-scoring element of each cluster is obtained. The best elements of all clusters are selected and used as features in the classification process. Next, the Random Forest algorithm is used for classification. Based on the simulation, the proposed approach achieved 85.87%, 98.9%, and 89% accuracy on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. The accuracy of the proposed approach is therefore higher than that of the approach using Random Forest without clustering.
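
The cluster-then-select step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`kmeans`, `fisher_score`, `select_features`) and the synthetic data are hypothetical, and a simple Fisher score stands in for the Relief ranking. The final Random Forest step is omitted; the selected columns would simply be passed on to a classifier.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                centers[j] = members.mean(axis=0)
    return labels


def fisher_score(X, y):
    """Per-feature ratio of between-class to within-class variation.

    Assumption: used here as a stand-in for the Relief ranking.
    """
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)


def select_features(X, y, k):
    """Cluster the feature columns, then keep the best-scoring one per cluster."""
    labels = kmeans(X.T, k)  # each feature (column of X) is treated as a point
    scores = fisher_score(X, y)
    selected = []
    for j in np.unique(labels):
        members = np.flatnonzero(labels == j)
        selected.append(int(members[scores[members].argmax()]))
    return sorted(selected)


# Synthetic stand-in for expression data: 40 samples, 12 features.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 12))
X[:, 0] += 3.0 * y                                # feature 0 separates the classes
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=40)    # near-duplicate of feature 0
selected = select_features(X, y, k=4)
print(selected)
```

Clustering the columns rather than the rows is what lets redundant features fall into the same cluster, so that only one representative of each group survives.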

Terrain Feature Extraction and Classification using Contact Sensor Data (접촉식 센서 데이터를 이용한 지질 특성 추출 및 지질 분류)

  • Park, Byoung-Gon;Kim, Ja-Young;Lee, Ji-Hong
    • The Journal of Korea Robotics Society / v.7 no.3 / pp.171-181 / 2012
  • Outdoor mobile robots face various terrain types with different characteristics. To run safely and carry out its mission, a mobile robot should recognize the terrain type and its physical and geometric characteristics. Controlling its motion appropriately for the characteristics of each terrain is essential. One way to determine the terrain type is to use non-contact sensor data, such as vision and laser sensor data. Another way is to use contact sensor data, such as body slope, vibration, and motor current, which reflect the reaction from the ground to the tire. In this paper, we present experimental results on terrain classification using contact sensor data. We built a mobile robot for collecting contact sensor data and collected data on four experimental terrains. By analyzing the collected data, we propose a new terrain feature extraction method that considers physical characteristics, and confirm that it can classify the four experimental terrains. We also confirm, using a back-propagation learning algorithm, that the proposed method and the Fast Fourier Transform (FFT) based terrain feature extraction typically used in previous studies have similar classification performance. However, the two methods differ in the amount of data that contains the terrain feature information. We therefore define an index, determined by the amount of terrain feature information and the classification error rate, that evaluates classification efficiency. Comparing the two methods with this index shows that our method is more efficient than the existing method.
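
As a rough illustration of the FFT-based baseline mentioned in this abstract, the sketch below turns a vibration signal into a small vector of band energies. The abstract does not say how the FFT features were built, so the function name `band_energies`, the equal-width banding, the sample rate, and the synthetic signals are all assumptions made for the example.

```python
import numpy as np


def band_energies(signal, sample_rate, n_bands=4):
    """Split the one-sided power spectrum into equal-width bands and sum each."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                 # drop the DC offset
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    edges = np.linspace(0.0, freqs[-1], n_bands + 1)
    # np.digitize maps each frequency bin to a band number in 1..n_bands;
    # the clip folds the top edge (the Nyquist bin) into the last band.
    idx = np.clip(np.digitize(freqs, edges), 1, n_bands) - 1
    return np.bincount(idx, weights=power, minlength=n_bands)


# Two hypothetical vibration traces sampled at 100 Hz for 2 seconds.
sample_rate = 100
t = np.arange(200) / sample_rate
smooth_terrain = np.sin(2 * np.pi * 5 * t)    # energy concentrated at 5 Hz
rough_terrain = np.sin(2 * np.pi * 40 * t)    # energy concentrated at 40 Hz

feats_low = band_energies(smooth_terrain, sample_rate)
feats_high = band_energies(rough_terrain, sample_rate)
print(feats_low.argmax(), feats_high.argmax())
```

A feature vector like this would then be fed to a classifier such as the back-propagation network the abstract mentions; its length is one way to think about the "amount of data" that the efficiency index weighs against the error rate.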

Elongated Radial Basis Function for Nonlinear Representation of Face Data

  • Kim, Sang-Ki;Yu, Sun-Jin;Lee, Sang-Youn
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.7C / pp.428-434 / 2011
  • Recently, subspace analysis has raised its performance to a higher level through the adoption of kernel-based nonlinearity. In particular, the radial basis function (RBF), owing to its nonparametric nature, has shown promising results in face recognition. However, because of the endemic small sample size problem of face data, conventional kernel-based feature extraction methods have difficulty representing the data. In this paper, we introduce a novel variant of the RBF kernel to alleviate this problem. By adopting the concept of the nearest feature line classifier, we show both the effectiveness and the generalizability of the proposed method, particularly with regard to the small sample size issue.

Feature Extraction on High Dimensional Data Using Incremental PCA (점진적인 주성분분석기법을 이용한 고차원 자료의 특징 추출)

  • Kim Byung-Joo
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.7 / pp.1475-1479 / 2004
  • High-dimensional data requires efficient feature extraction techniques. Although PCA (Principal Component Analysis) is a well-known feature extraction method, it requires a large amount of memory and has a high computational cost. In this paper we use incremental PCA for feature extraction on high-dimensional data. Experiments show that the proposed method is superior to the APEX model.
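
To make the memory argument concrete, the sketch below processes the data chunk by chunk instead of holding it all at once. It is a simpler stand-in, not the incremental PCA of the paper, whose update rule the abstract does not give: it keeps a running mean and scatter matrix and performs a single eigendecomposition at the end. The class name `StreamingPCA` and the synthetic data are hypothetical. Note that it still stores a d x d matrix, which true incremental PCA methods avoid for very large d.

```python
import numpy as np


class StreamingPCA:
    """Accumulates mean and scatter chunk by chunk, then eigendecomposes once."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = np.zeros(n_features)
        # Running sum of (x - mean)(x - mean)^T over all samples seen so far.
        self.scatter = np.zeros((n_features, n_features))

    def partial_fit(self, chunk):
        chunk = np.asarray(chunk, dtype=float)
        m = len(chunk)
        chunk_mean = chunk.mean(axis=0)
        centered = chunk - chunk_mean
        delta = chunk_mean - self.mean
        total = self.n + m
        # Pairwise formula for merging the scatter of two groups of samples.
        self.scatter += centered.T @ centered + np.outer(delta, delta) * self.n * m / total
        self.mean += delta * m / total
        self.n = total
        return self

    def components(self, k):
        """Return the top-k principal directions (as rows) and their variances."""
        cov = self.scatter / (self.n - 1)
        eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][:k]
        return eigvecs[:, order].T, eigvals[order]


# Synthetic data, fed in six chunks as if it did not fit in memory.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))
pca = StreamingPCA(n_features=5)
for chunk in np.array_split(X, 6):
    pca.partial_fit(chunk)
components, variances = pca.components(k=2)
print(variances)
```

The accumulated covariance equals the one computed from the full data set, so the chunked pass gives the same principal components as batch PCA while only one chunk is in memory at a time.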

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology / v.40 no.8 / pp.726-732 / 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction technique for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we use an improved FCBF (fast correlation-based filter), a feature selection method, on supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results on supervised data sets show that the performance of PCA is very poor for large-scale features.

Stacked Autoencoder Based Malware Feature Refinement Technology Research (Stacked Autoencoder 기반 악성코드 Feature 정제 기술 연구)

  • Kim, Hong-bi;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.4 / pp.593-603 / 2020
  • The amount of malicious code has grown exponentially as malicious code generation tools have spread along with the development of networks, and existing malicious code detection methods are limited in their ability to respond. In response, machine-learning-based malicious code detection methods are evolving. In this paper, features are extracted from the PE header for machine-learning-based malicious code detection and then processed with an autoencoder; we study how to extract these features and their feature importance. Specifically, 549 features composed of information such as DLL/API that can be identified from PE files and are commonly used in malware analysis are extracted, and an autoencoder is applied to the extracted features to improve the performance of machine-learning-based malware detection. By compressing the data and thereby extracting its features effectively, the approach achieved excellent accuracy and reduced the processing time by a factor of two. The test results were shown to be useful for classifying malware groups; in future work, a classifier such as SVM will be introduced to continue research toward more accurate malware detection.

The Extraction of End-Pixels in Feature Space for Remote Sensing Data and Its Applications

  • YUAN Lu;SUN Wei-dong
    • Proceedings of the KSRS Conference / 2004.10a / pp.136-139 / 2004
  • The extraction of 'end-pixels' (i.e., end-members) aims to quantify the abundance of different materials in a single pixel, and has become popular in subpixel analysis of hyperspectral data sets. In this paper, we present a new concept called 'End-Pixel of Features (EPF)' that extends the concept of end-pixels to multispectral and even panchromatic data. The algorithm combines the advantages of previous simplex and clustering methods to search for the EPFs in the feature space and reduce the effects of noise. Experimental results show that the proposed methodology can be applied successfully to hyperspectral data and other remote sensing data.

Comparisons of Linear Feature Extraction Methods (선형적 특징추출 방법의 특성 비교)

  • Oh, Sang-Hoon
    • The Journal of the Korea Contents Association / v.9 no.4 / pp.121-130 / 2009
  • In this paper, feature extraction methods, one approach to reducing the dimensionality of high-dimensional data, are empirically investigated. We selected the traditional PCA (Principal Component Analysis), ICA (Independent Component Analysis), NMF (Non-negative Matrix Factorization), and sNMF (Sparse NMF) for comparison. ICA yields features similar to those of the simple cells of V1. NMF implements a "parts-based representation in the brain", and sNMF is an improved version of NMF. To investigate the extracted features visually, handwritten digits are used. The extracted features are also used to train multi-layer perceptrons for a recognition test. The characteristics of each feature extraction method will be useful when applying feature extraction methods to many real-world problems.
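
Of the four methods compared in this abstract, NMF is the easiest to sketch in a few lines. The example below uses the multiplicative update rules of Lee and Seung for the squared-error objective, which may differ from the variant used in the paper; the function name `nmf`, the iteration counts, and the synthetic matrix are assumptions made for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic non-negative data with an exact rank-3 structure.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 15))

W_short, H_short = nmf(V, rank=3, n_iter=1)
W, H = nmf(V, rank=3, n_iter=200)
err_short = np.linalg.norm(V - W_short @ H_short)
err_long = np.linalg.norm(V - W @ H)
print(err_short, err_long)
```

Because the factors can only add, never cancel, the rows of `H` tend to behave like additive parts, which is the sense of the "parts-based representation" quoted in the abstract.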

Feature Impact Evaluation Based Pattern Classification System

  • Rhee, Hyun-Sook
    • Journal of the Korea Society of Computer and Information / v.23 no.11 / pp.25-30 / 2018
  • A pattern classification system is often an important component of intelligent systems. In this paper, we present a pattern classification system consisting of a feature selection module, a knowledge base construction module, and a decision module. We introduce a feature impact evaluation selection method based on fuzzy cluster analysis that considers the computational approach and the generalization capability for the given data characteristics. A fuzzy neural network, OFUN-NET, based on an unsupervised-learning data mining technique, produces the knowledge base for representative clusters. 240 blemish pattern images are prepared and applied to the proposed system. Experimental results show the feasibility of the proposed classification system as an automated defect inspection tool.