• Title/Summary/Keyword: Dimensionality Reduction

An Assessment of a Random Forest Classifier for a Crop Classification Using Airborne Hyperspectral Imagery

  • Jeon, Woohyun;Kim, Yongil
    • Korean Journal of Remote Sensing / v.34 no.1 / pp.141-150 / 2018
  • Crop type classification is essential for supporting agricultural decisions and resource monitoring. Remote sensing techniques, especially those using hyperspectral imagery, have been effective in agricultural applications. Hyperspectral imagery acquires contiguous, narrow spectral bands over a wide range. However, the large dimensionality results in unreliable classifier estimates and a high computational burden. Therefore, reducing the dimensionality of hyperspectral imagery is necessary. In this study, the Random Forest (RF) classifier was used for dimensionality reduction as well as classification. RF is an ensemble-learning algorithm based on the Classification and Regression Tree (CART), which has gained attention due to its high classification accuracy and fast processing speed. The RF performance for crop classification with airborne hyperspectral imagery was assessed. The study area was the cultivated area in Chogye-myeon, Habcheon-gun, Gyeongsangnam-do, South Korea, where the main crops are garlic, onion, and wheat. Parameter optimization was conducted to maximize the classification accuracy. Then, dimensionality reduction was conducted based on RF variable importance. The results show that the selected bands yield excellent classification accuracy without using the whole dataset. Moreover, a majority of the selected bands are concentrated in the visible (VIS) region, especially the region related to chlorophyll content. Therefore, it can be inferred that the phenological status after the mature stage influences red-edge spectral reflectance.

Analysis of Dimensionality Reduction Methods Through Epileptic EEG Feature Selection for Machine Learning in BCI (BCI에서 기계 학습을 위한 간질 뇌파 특징 선택을 통한 차원 감소 방법 분석)

  • Tong, Yang;Aliyu, Ibrahim;Lim, Chang-Gyoon
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.6 / pp.1333-1342 / 2018
  • Until now, electroencephalography (EEG) has been the most important and convenient method for the diagnosis and treatment of epilepsy. However, it is difficult to identify the wave characteristics of epileptic EEG signals because they are very weak, non-stationary and have strong background noise. In this paper, we analyse the effect of dimensionality reduction methods on epileptic EEG feature selection and classification. Three dimensionality reduction methods were investigated: Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA) and Linear Discriminant Analysis (LDA). The performance of each method was evaluated using Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Decision Tree (DT) and Random Forest (RF) classifiers. In the experiments, PCA reached its highest accuracy of 75% with SVM, LR and K-NN. KPCA reached its best performance of 85% with SVM and K-NN, while LDA achieved 100% accuracy with K-NN. Thus, LDA dimensionality reduction is found to provide the best classification result for epileptic EEG signals.
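
As an illustration of how LDA reduces dimensionality, here is a minimal sketch of the two-class Fisher discriminant in pure Python. It is not the authors' pipeline: the 2-D toy points, the function names, and the restriction to two features are all assumptions made for the example. It projects each sample onto the single direction that best separates the class means relative to the within-class scatter.

```python
def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter2(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_direction(class0, class1):
    """Return w = Sw^-1 (m1 - m0), the Fisher discriminant direction."""
    m0, m1 = mean2(class0), mean2(class1)
    s0, s1 = scatter2(class0, m0), scatter2(class1, m1)
    # Within-class scatter Sw is the sum of the per-class scatter matrices.
    sxx, sxy, syy = (a + b for a, b in zip(s0, s1))
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form inverse of a symmetric 2x2 matrix applied to (dx, dy).
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)

def project(points, w):
    """Reduce each 2-D point to one coordinate along w."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]

# Hypothetical toy data standing in for two classes of feature vectors.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(class0, class1)
proj0 = project(class0, w)
proj1 = project(class1, w)
print("direction:", w)
print("class 0 projections:", proj0)
print("class 1 projections:", proj1)
```

On this toy data the two classes do not overlap after projection, so a simple classifier such as K-NN can work in one dimension instead of two.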

Comparative Analysis of Dimensionality Reduction Techniques for Advanced Ransomware Detection with Machine Learning (기계학습 기반 랜섬웨어 공격 탐지를 위한 효과적인 특성 추출기법 비교분석)

  • Kim Han Seok;Lee Soo Jin
    • Convergence Security Journal / v.23 no.1 / pp.117-123 / 2023
  • To detect advanced ransomware attacks with machine learning-based models, the classification model must be trained on data with a high-dimensional feature space. In this case, the 'curse of dimensionality' is likely to occur. Therefore, dimensionality reduction of the features must be performed first in order to increase the accuracy of the learning model and improve execution speed while avoiding the 'curse of dimensionality'. In this paper, we classified ransomware by applying three machine learning models and two feature extraction techniques to two datasets whose feature spaces differ greatly in dimension. In the experiments, the dimensionality reduction techniques did not significantly improve performance in binary classification, and the same held for multi-class classification when the dimension of the feature space was small. However, when the dataset had a high-dimensional feature space, LDA (Linear Discriminant Analysis) showed excellent performance.

Identification of epistasis in ischemic stroke using multifactor dimensionality reduction and entropy decomposition

  • Park, Jung-Dae;Kim, Youn-Young;Lee, Chae-Young
    • BMB Reports / v.42 no.9 / pp.617-622 / 2009
  • We investigated the genetic associations of ischemic stroke by identifying epistasis of its heterogeneous subtypes such as small vessel occlusion (SVO) and large artery atherosclerosis (LAA). Epistasis was analyzed with 24 genes in 207 controls and 271 patients (SVO = 110, LAA = 95) using multifactor dimensionality reduction and entropy decomposition. The multifactor dimensionality reduction analysis with any of the 1- to 4-locus models showed no significant association with LAA (P > 0.05). The analysis of SVO, however, revealed a significant association in the best 3-locus model with P10L of TGF-$\beta_1$, C1013T of SPP1, and R485K of F5 (testing balanced accuracy = 63.17%, P < 0.05). Subsequent entropy analysis also revealed that such heterogeneity was present: a relatively large entropy was estimated among the 3 loci for SVO (5.43%), but only a relatively small entropy was estimated for LAA (1.81%). This suggests that the synergistic epistasis model might contribute specifically to the pathogenesis of SVO, which implies a different etiopathogenesis of the ischemic stroke subtypes.

Effective Dimensionality Reduction of Payload-Based Anomaly Detection in TMAD Model for HTTP Payload

  • Kakavand, Mohsen;Mustapha, Norwati;Mustapha, Aida;Abdullah, Mohd Taufik
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.8 / pp.3884-3910 / 2016
  • An Intrusion Detection System (IDS) in general handles a large amount of data that is highly redundant and irrelevant. This trait causes slow training and assessment procedures, high resource consumption and a poor detection rate. Due to their expensive computational requirements during both training and detection, IDSs are mostly ineffective for real-time anomaly detection. This paper proposes a dimensionality reduction technique based on Principal Component Analysis (PCA) that is able to enhance the performance of IDSs up to constant time O(1). Furthermore, the present study offers a feature selection approach for identifying major components in real time. The PCA algorithm transforms high-dimensional feature vectors into a low-dimensional feature space, which is used to determine the optimum volume of factors. The proposed approach was assessed using the HTTP packet payloads of the ISCX 2012 IDS and DARPA 1999 datasets. The experimental outcome demonstrated that our proposed anomaly detection achieved promising results, with a 97% detection rate and a 1.2% false positive rate for the ISCX 2012 dataset and a 100% detection rate and a 0.06% false positive rate for the DARPA 1999 dataset. Our proposed anomaly detection also achieved comparable performance in terms of computational complexity when compared to three state-of-the-art anomaly detection systems.
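
The PCA step described in this abstract (mapping high-dimensional feature vectors to a low-dimensional space) can be sketched as follows. This is a minimal pure-Python illustration, not the TMAD model: the toy feature rows and all function names are assumptions, and power iteration is used here as a simple stand-in for a full eigendecomposition to find only the first principal component.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def matvec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_component(cov, iters=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical 3-feature vectors; the second feature is twice the first,
# so most of the variance lies along a single direction.
rows = [[1.0, 2.0, 0.1], [2.0, 4.0, -0.1], [3.0, 6.0, 0.1],
        [4.0, 8.0, -0.1], [5.0, 10.0, 0.0]]

centered = center(rows)
cov = covariance(centered)
v = top_component(cov)

# Project each 3-D row onto the first principal component (3-D -> 1-D).
scores = [sum(a * b for a, b in zip(r, v)) for r in centered]

# Share of total variance captured by that one component.
eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print("first component:", v)
print("1-D scores:", scores)
print("explained variance ratio:", explained)
```

In practice the number of retained components would be chosen from the explained-variance ratios of several components; the sketch keeps one to stay short.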

Gene-Gene Interaction Analysis for the Accelerated Failure Time Model Using a Unified Model-Based Multifactor Dimensionality Reduction Method

  • Lee, Seungyeoun;Son, Donghee;Yu, Wenbao;Park, Taesung
    • Genomics & Informatics / v.14 no.4 / pp.166-172 / 2016
  • Although a large number of genetic variants have been identified as associated with common diseases through genome-wide association studies, there still exist limitations in explaining the missing heritability. One approach to solving this missing heritability problem is to investigate gene-gene interactions, rather than taking a single-locus approach. For gene-gene interaction analysis, the multifactor dimensionality reduction (MDR) method has been widely applied, since the constructive induction algorithm of MDR efficiently reduces high-order dimensions into one dimension by classifying multi-level genotypes into high- and low-risk groups. The MDR method has been extended to various phenotypes and has been improved to provide a significance test for gene-gene interactions. In this paper, we propose a simple method, called accelerated failure time (AFT) UM-MDR, in which the idea of a unified model-based MDR is extended to the survival phenotype by incorporating AFT-MDR into the classification step. The proposed AFT UM-MDR method is compared with AFT-MDR through simulation studies, and a short discussion is given.
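
The constructive induction step of basic case-control MDR described above can be sketched in a few lines. This is an illustrative sketch only, not AFT UM-MDR: the two-SNP toy counts, the function names, and the choice of the overall case:control ratio as the default threshold are assumptions. Each multi-locus genotype cell is labeled high-risk (1) or low-risk (0), which collapses the loci into a single binary attribute.

```python
from collections import Counter

def mdr_reduce(genotypes, labels, threshold=None):
    """Map each multi-locus genotype cell to 1 (high-risk) or 0 (low-risk).

    genotypes: list of tuples with one genotype code per locus.
    labels: list of 1 (case) or 0 (control), aligned with genotypes.
    threshold: case:control ratio above which a cell is high-risk;
               defaults to the overall case:control ratio (an assumption).
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    if threshold is None:
        threshold = n_cases / n_controls
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    risk = {}
    for cell in set(cases) | set(controls):
        # A cell containing cases but no controls is treated as high-risk.
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        risk[cell] = 1 if ratio > threshold else 0
    return risk

def balanced_accuracy(predicted, labels):
    """Mean of sensitivity and specificity."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

# Hypothetical counts per two-SNP genotype cell: (cases, controls).
cells = {(0, 0): (1, 3), (0, 1): (3, 1), (1, 0): (3, 1), (1, 1): (1, 3)}
genotypes, labels = [], []
for cell, (n_case, n_ctrl) in cells.items():
    genotypes += [cell] * (n_case + n_ctrl)
    labels += [1] * n_case + [0] * n_ctrl

risk = mdr_reduce(genotypes, labels)
predicted = [risk[g] for g in genotypes]   # the new one-dimensional attribute
ba = balanced_accuracy(predicted, labels)
print("risk labels per cell:", risk)
print("balanced accuracy:", ba)
```

A full MDR analysis would repeat this for every candidate locus combination under cross-validation and keep the model with the best testing balanced accuracy; the sketch shows only the reduction for one fixed pair of loci.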

Power of Expanded Multifactor Dimensionality Reduction with CART Algorithm (CART 알고리즘을 활용한 확장된 다중인자 차원축소방법의 검정력 평가)

  • Lee, Jea-Young;Lee, Jong-Hyeong;Lee, Ho-Guen
    • Communications for Statistical Applications and Methods / v.17 no.5 / pp.667-678 / 2010
  • It is important to detect gene-gene interactions in genome-wide association studies (GWAS), and many studies have addressed this problem. One approach is the multifactor dimensionality reduction (MDR) method. However, the MDR method cannot be applied to continuous data, so the expanded multifactor dimensionality reduction (E-MDR) method has been suggested. The goal of this study is to evaluate, by simulation, the power of E-MDR for identifying gene-gene interactions. We also applied the method to identify interaction effects of single nucleotide polymorphisms (SNPs) responsible for economic traits in a Korean cattle population (real data).

A Comparison Study on SVM MDR and D-MDR for Detecting Gene-Gene Interaction in Continuous Data (연속형자료의 유전자 상호작용 규명을 위한 SVM MDR과 D-MDR의 방법 비교)

  • Lee, Jong-Hyeong;Lee, Jea-Young
    • Communications for Statistical Applications and Methods / v.18 no.4 / pp.413-422 / 2011
  • The multifactor dimensionality reduction (MDR) method has generally been used to study major gene interaction effects; however, it cannot be applied to continuous data. In light of this, several methods have been suggested, such as Expanded MDR, Dummy MDR and SVM MDR. In this paper, we compare two of these methods, SVM MDR and D-MDR. In addition, we identify the gene-gene interaction effects of single nucleotide polymorphisms (SNPs) associated with economic traits in Hanwoo (Korean cattle). Lastly, we discuss a new method that takes into account the advantages of the other methods.

Multifactor-Dimensionality Reduction in the Presence of Missing Observations

  • Chung, Yu-Jin;Lee, Seung-Yeoun;Park, Tae-Sung
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.31-36 / 2005
  • The identification and characterization of susceptibility genes for common complex multifactorial diseases is a challenging task, in which the effect of a single genetic variation will likely depend on other genetic variations (gene-gene interaction) and environmental factors (gene-environment interaction). To address this issue, multifactor dimensionality reduction (MDR) has been proposed and implemented by Ritchie et al. (2001), Moore et al. (2002), Hahn et al. (2003) and Ritchie et al. (2003). MDR pools multilocus genotypes so that the dimension of the genotype predictors is effectively reduced from n to one, which improves the identification of polymorphism combinations associated with disease risk. However, MDR cannot handle missing observations appropriately, since a missing observation is treated as an additional genotype category. This approach may suffer from a sparseness problem: when high-order interactions are considered, an additional missing category makes the contingency table cells more sparse. We propose a new MDR approach with minimum loss of sample size by considering missing data over all possible multifactor classes. We evaluate the proposed MDR using prediction errors and cross-validation consistency.

Development of a Recommender System for E-Commerce Sites Using a Dimensionality Reduction Technique (차원 감소 기법을 이용한 전자 상거래 추천 시스템)

  • Kim, Yong-Soo;Yum, Bong-Jin;Kim, Nor-Man
    • Journal of Korean Institute of Industrial Engineers / v.36 no.3 / pp.193-202 / 2010
  • The recommender system is a typical software solution for personalized services, which are now popular on e-commerce sites. Most existing recommender systems are based on customers' explicit rating data on items (e.g., ratings on movies), and only recently have recommender systems based on implicit ratings been proposed as a better alternative. Implicit ratings of a customer on items that are clicked but not purchased can be inferred from the customer's navigational and behavioral patterns. In this article, a dimensionality reduction (DR) technique is newly applied to the implicit rating-based recommender system, and its effectiveness is assessed using an experimental e-commerce site. The experimental results indicate that the performance of the proposed approach is superior or at least similar to that of the conventional collaborative filtering (CF)-based approach unless the number of recommended products is 'large.' In addition, the proposed approach requires less memory space and is computationally more efficient.