• Title/Summary/Keyword: Feature Dimension Reduction

Search Result 106

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.3 / pp.531-539 / 2019
  • This paper proposes training features for malware family analysis and evaluates the multi-class classification performance of ensemble models. Training data are built by extracting API and DLL information from malware executables, and the decision-tree-based Random Forest and XGBoost algorithms are used. API, API-DLL, and DLL-CM features for malware detection and family classification are derived by analyzing the API and DLL information frequently used by malware and converting high-dimensional features into low-dimensional ones. The proposed feature selection method offers reduced data dimension and fast learning. In the performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy on the malware family dataset is 92.0% for XGBoost, and the false positive rate on the malware family dataset including benign samples is about 3.5% for both Random Forest and XGBoost.
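
One way to picture the conversion from high-dimensional to low-dimensional API features is to keep only the API names that appear in the most samples and count them per executable. The sketch below illustrates that idea only; it is not the authors' procedure, and the function names and import lists are hypothetical.

```python
from collections import Counter

def select_frequent_apis(samples, k):
    """Return the k API names that occur in the most samples (ties broken alphabetically)."""
    doc_freq = Counter()
    for apis in samples:
        doc_freq.update(set(apis))  # count each API once per sample
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]

def to_feature_vector(apis, vocabulary):
    """Count how often each selected API appears in one sample."""
    counts = Counter(apis)
    return [counts[name] for name in vocabulary]

# Hypothetical import lists extracted from three executables.
samples = [
    ["CreateFileA", "WriteFile", "RegSetValueExA", "WriteFile"],
    ["CreateFileA", "VirtualAlloc", "WriteFile"],
    ["CreateFileA", "GetProcAddress"],
]
vocabulary = select_frequent_apis(samples, k=2)
features = [to_feature_vector(s, vocabulary) for s in samples]
```

Each sample is reduced from an open-ended list of API names to a fixed vector of length k, which is the kind of input a tree ensemble such as Random Forest or XGBoost expects.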

Operation diagnostic based on PCA for wastewater treatment (PCA를 이용한 하폐수처리시설 운전상태진단)

  • Jun Byong-Hee;Park Jang-Hwan;Chun Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.3 / pp.383-388 / 2006
  • The SBR (sequencing batch reactor) is one of the most common sewage/wastewater treatment processes and is particularly advantageous for high-concentration wastewater such as sewage wastewater. A Kernel PCA based fault diagnosis system for the biological reaction in a full-scale wastewater treatment plant is proposed using only common bio-chemical sensors such as ORP (Oxidation-Reduction Potential) and DO (Dissolved Oxygen). During SBR operation, the operating status can be divided into normal status and abnormal statuses such as controller malfunction, influent disturbance, and instrument trouble. To classify and diagnose these statuses, the data went through preprocessing, dimension reduction using PCA, LDA, and K-PCA, and feature reduction. Diagnosis using differential data was superior to diagnosis using raw data, and fused data gave better results than the other data. The combination of K-PCA and LDA also gave better results than LDA alone or PCA+LDA. Finally, the fault recognition rate reached at most about 97.03% when only ORP or DO was used, while the fusion method achieved a better maximum of 98.02%.

Prediction of Implicit Protein - Protein Interaction Using Optimal Associative Feature Rule (최적 연관 속성 규칙을 이용한 비명시적 단백질 상호작용의 예측)

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Journal of KIISE:Software and Applications / v.33 no.4 / pp.365-377 / 2006
  • Proteins are known to perform biological functions by interacting with other proteins or compounds. Since protein interaction is intrinsic to most cellular processes, predicting protein interaction is an important issue in post-genomic biology, where abundant interaction data have been produced by many research groups. In this paper, we present an associative feature mining method to predict implicit protein-protein interactions of Saccharomyces cerevisiae from public protein interaction data. We discretized continuous-valued features using a maximal interdependence-based discretization approach. We also employed the feature dimension reduction filter (FDRF) method, which is based on information theory, to select optimal informative features, to boost prediction accuracy and overall mining speed, and to overcome the dimensionality problem of conventional data mining approaches. We used an association rule discovery algorithm for associative feature and rule mining to predict protein interaction. Using the discovered associative features, we predicted implicit protein interactions that had not been observed in the training data. In the experiments, the proposed method achieved about 96.5% prediction accuracy and ran about 29.4% faster than the conventional association rule mining method with no feature filter.
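
An information-theoretic feature filter can be illustrated by ranking discretized features by their mutual information with the class label and keeping the top ones. The sketch below is a generic filter of that kind, not the FDRF method itself, whose details the abstract does not give; the feature names and values are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def rank_features(columns, labels):
    """Return feature names sorted by decreasing mutual information, plus the scores."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=lambda name: -scores[name]), scores

# Hypothetical discretized features for six protein pairs (label 1 = interacting).
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "unrelated_flag":   [1, 0, 0, 1, 0, 0],  # statistically independent of the label
    "shared_motif":     [1, 1, 0, 1, 0, 0],  # partially informative
    "same_compartment": [1, 1, 1, 0, 0, 0],  # fully determines the label
}
ranking, scores = rank_features(columns, labels)
```

Dropping the lowest-ranked features before association rule mining shrinks the search space, which is consistent with the speed-up the abstract reports for filtering.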

Design of Robust Face Recognition System with Illumination Variation Realized with the Aid of CT Preprocessing Method (CT 전처리 기법을 이용하여 조명변화에 강인한 얼굴인식 시스템 설계)

  • Jin, Yong-Tak;Oh, Sung-Kwun;Kim, Hyun-Ki
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.1 / pp.91-96 / 2015
  • In this study, we introduce a face recognition system that is robust to illumination variation, realized with the aid of a CT preprocessing method. As the preprocessing algorithm, the Census Transform (CT) is used to extract local facial features under unilluminated conditions. The dimension of the preprocessed data is reduced using $(2D)^2$PCA, an extended type of PCA. The feature data extracted by the dimension reduction algorithm are used as inputs to the proposed radial basis function neural networks (RBFNN). The hidden layer of the RBFNN is built with the fuzzy c-means (FCM) clustering algorithm, and the connection weights of the networks are described as coefficients of a linear polynomial function. The essential design parameters of the proposed networks (including the number of inputs and the fuzzification coefficient) are optimized by the artificial bee colony (ABC) algorithm. The performance of the proposed system is evaluated on both the Yale Face Database B and the CMU PIE database.
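
The Census Transform mentioned above replaces each pixel with a bit string that records which neighbours are darker than it, so the result depends only on local intensity ordering. The following is a minimal 3x3 sketch; the bit order and the "neighbour darker than centre" convention are assumptions, since the abstract does not state which variant the authors used.

```python
def census_transform(image):
    """3x3 census transform of a 2D list of grey levels.

    Each interior pixel becomes an 8-bit code with one bit per neighbour,
    set when the neighbour is darker than the centre pixel.
    Border pixels are skipped, so the output is (h-2) x (w-2).
    """
    h, w = len(image), len(image[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            code = 0
            for dr, dc in offsets:
                code = (code << 1) | int(image[r + dr][c + dc] < image[r][c])
            row.append(code)
        out.append(row)
    return out

# Hypothetical 4x4 patch and a brighter, higher-contrast copy of it.
patch = [[12, 40, 33, 8], [25, 30, 31, 50], [9, 28, 45, 20], [60, 14, 22, 35]]
brighter = [[2 * p + 15 for p in row] for row in patch]
same_codes = census_transform(patch) == census_transform(brighter)
```

Because the codes use only comparisons, any strictly increasing change in brightness leaves them unchanged, which is why the transform is attractive as an illumination-robust preprocessing step.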

Integrating Discrete Wavelet Transform and Neural Networks for Prostate Cancer Detection Using Proteomic Data

  • Hwang, Grace J.;Huang, Chuan-Ching;Chen, Ta Jen;Yue, Jack C.;Ivan Chang, Yuan-Chin;Adam, Bao-Ling
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.319-324 / 2005
  • An integrated approach for prostate cancer detection using proteomic data is presented. Because proteomic data are high-dimensional, the discrete wavelet transform (DWT) is used in the first stage for data reduction as well as noise removal. After the DWT, the dimensionality is reduced from 43,556 to 1,599, so each sample of proteomic data can be represented by 1,599 wavelet coefficients. In the second stage, a voting method is used to select a common set of wavelet coefficients for all samples, which produces a 987-dimension subspace of wavelet coefficients. In the third stage, the Autoassociator algorithm reduces the dimensionality from 987 to 400. Finally, an artificial neural network (ANN) is applied to the 400-dimension space for prostate cancer detection. The integrated approach is examined on 9 categories of 2-class experiments and on 3- and 4-class experiments. All experiments used ten-fold cross-validation repeated 10 times (i.e., 10 partitions and 100 runs in total). For the 9 categories of 2-class experiments, the average testing accuracies are between 81% and 96%, and the average testing accuracies of the 3- and 4-way classifications are 85% and 84%, respectively. The integrated approach achieves promising results for the early detection and diagnosis of prostate cancer.
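
The first stage, using a DWT for data reduction, can be sketched by keeping only the approximation coefficients after several decomposition levels. The example below assumes the Haar wavelet and a hypothetical eight-point spectrum; the abstract does not say which wavelet or how many levels the authors used.

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input by repeating the last value
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_reduce(signal, levels):
    """Keep only the approximation coefficients after `levels` decompositions."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)  # detail (high-frequency) part is discarded
    return approx

spectrum = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # hypothetical intensities
reduced = haar_reduce(spectrum, levels=2)
```

Each level roughly halves the number of coefficients and discards high-frequency detail, which is how a single transform can serve for both dimension reduction and noise removal.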

Parameter Considering Variance Property for Speech Recognition in Noisy Environment (잡음환경에서의 음성인식을 위한 변이특성을 고려한 파라메터)

  • Park, Jin-Young;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.469-472 / 2005
  • This paper proposes an effective speech feature parameter that is robust to noise, for use in speech recognition systems. We first establish MFCC, the basic parameter used in ASR (Automatic Speech Recognition), and DCTCs, which apply the DCT to the basic parameter. We then propose delta-cepstrum and delta-delta-cepstrum parameters, which reconstruct the cepstrum so that it carries information about the variation of speech, and compare recognition performance using HMMs. The LDA algorithm is applied to reduce the dimension of each parameter, and the recognition results are compared. The results show that the dimension-reduced delta-delta-cepstrum parameter obtained with LDA improves recognition performance over the existing parameters in noisy environments under various conditions.
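
Delta and delta-delta cepstra describe how the cepstral coefficients change from frame to frame. A common way to compute them is a short regression over neighbouring frames; the sketch below uses that standard formula with a window of 2 and edge-frame padding, both of which are assumptions, as the abstract does not specify the authors' formulation.

```python
def delta(frames, window=2):
    """Regression-based delta features for a list of per-frame coefficient vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    where frames beyond either end are replaced by the nearest edge frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(num_frames):
        vec = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            vec.append(acc / denom)
        result.append(vec)
    return result

cepstra = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # hypothetical one-coefficient frames
delta_cep = delta(cepstra)            # first-order change per frame
delta_delta_cep = delta(delta_cep)    # change of the change
```

For this linear ramp the middle frame has a delta of 1 (the slope) and a delta-delta of 0, which shows that the two parameters capture velocity and acceleration of the cepstrum respectively.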

Dynamic RNN-CNN malware classifier correspond with Random Dimension Input Data (임의 차원 데이터 대응 Dynamic RNN-CNN 멀웨어 분류기)

  • Lim, Geun-Young;Cho, Young-Bok
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.5 / pp.533-539 / 2019
  • This study proposes a malware classification model that can handle input data of arbitrary length, using the Microsoft Malware Classification Challenge dataset. The approach builds on existing work that converts malware data into images. The proposed model generates many images when the malware data is large and a small image when the data is small. The generated images are learned as time series data by a Dynamic RNN. An Attention technique is applied so that only the highest-weighted RNN output is used, and this output is learned again by a Residual CNN to classify the malware. In experiments, the proposed model achieved a micro-average F1 score of 92% on the validation data set. The results show that a model capable of learning and classifying arbitrary-length data can perform well without special feature extraction or dimension reduction.

Design and Experiment of a Miniature 4/3-Way Proportional Valve for a Servo-Pneumatic Robot Hand (공압 구동식 로봇 손을 위한 소형 4/3-way 비례제어 밸브의 설계 및 실험)

  • 류시복;홍예선
    • Journal of the Korean Society for Precision Engineering / v.15 no.12 / pp.142-147 / 1998
  • Developing multi-degree-of-freedom robot hands is a topic that researchers have recently taken up in order to overcome existing limitations by adding flexibility and dexterity. In this study, an articulated servo-pneumatic robot hand system with direct-drive joints has been developed whose main feature is its minimized dimensions. A servo-pneumatic system is well suited to building a dexterous robot hand because of its high torque-to-weight and torque-to-volume ratios. This enables the design of a finger joint with an integrated rotary vane type actuator that is very robust and produces high output torque without reduction gears. To control the servo-pneumatic finger joints, a miniature proportional valve that can be attached to the robot hand is required. In this paper, a flapper nozzle type 4/3-way proportional directional valve has been designed and tested. The experimental results show that the developed valve can control a finger joint satisfactorily, without much vibratory joint movement or acoustic noise.

A Study on the Wear Condition Diagnosis of Grinding Wheel in Micro Drill-bit Grinding System (마이크로 드릴비트 연마 시스템 연삭휠의 마모 진단 연구)

  • Kim, Min-Seop;Hur, Jang-Wook
    • Journal of the Korean Society of Manufacturing Process Engineers / v.21 no.3 / pp.77-85 / 2022
  • In this study, to diagnose the grinding state of a micro drill bit, a sensor attachment location was selected through random vibration analysis of the grinding unit of the micro drill-bit grinding system. In addition, vibration data generated during drill bit grinding were collected from the grinding unit for grinding wheels in the steady and worn conditions, and data feature extraction and dimension reduction were performed. The wear of the micro-drill-bit grinding wheel was diagnosed by applying k-nearest neighbors (KNN), a machine-learning algorithm. The classification model showed excellent performance, with an accuracy of 99.2%. The precision, recall, and F1-score were higher than 99% in both the steady and worn conditions.
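
The KNN step can be illustrated with a small majority-vote classifier over reduced feature vectors. The sketch below assumes two hypothetical features per vibration record and Euclidean distance; the feature values, the choice of k, and the function name are illustrative and not taken from the paper.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional features after dimension reduction.
train_points = [(0.20, 3.0), (0.22, 3.1), (0.25, 2.9),
                (0.60, 4.5), (0.65, 4.8), (0.58, 4.4)]
train_labels = ["steady", "steady", "steady", "worn", "worn", "worn"]

print(knn_predict(train_points, train_labels, (0.62, 4.6)))
print(knn_predict(train_points, train_labels, (0.21, 3.0)))
```

KNN has no training phase beyond storing the labelled records, so its accuracy depends heavily on the quality of the extracted and reduced features.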

Feature Extraction and Classification of High Dimensional Biomedical Spectral Data (고차원을 갖는 생체 스펙트럼 데이터의 특징추출 및 분류기법)

  • Cho, Jae-Hoon;Park, Jin-Il;Lee, Dae-Jong;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.3 / pp.297-303 / 2009
  • In this paper, we propose a biomedical spectral pattern classification technique that uses a fusion scheme based on SpPCA and MLP in an extended feature space. Conventional PCA for dimension reduction cannot find an optimal transformation matrix when the input data are nonlinear. To overcome this drawback, we extract features with SpPCA in an extended space, which uses local patterns rather than whole patterns. In the classification step, individual MLP-based classifiers calculate the similarity to each class for the local features. Finally, the biomedical spectral patterns are classified by a fusion scheme that effectively combines the individual information. In simulations carried out to verify its effectiveness, the proposed method gave better classification results than conventional methods.