• Title, Summary, Keyword: Dimensionality Reduction

An Assessment of a Random Forest Classifier for a Crop Classification Using Airborne Hyperspectral Imagery

  • Jeon, Woohyun;Kim, Yongil
    • Korean Journal of Remote Sensing / v.34 no.1 / pp.141-150 / 2018
  • Crop type classification is essential for supporting agricultural decisions and resource monitoring. Remote sensing techniques, especially those using hyperspectral imagery, have been effective in agricultural applications. Hyperspectral imagery acquires contiguous, narrow spectral bands over a wide spectral range. However, the large dimensionality results in unreliable classifier estimates and a high computational burden. Therefore, reducing the dimensionality of hyperspectral imagery is necessary. In this study, the Random Forest (RF) classifier was used for both dimensionality reduction and classification. RF is an ensemble-learning algorithm based on the Classification and Regression Tree (CART), and it has gained attention due to its high classification accuracy and fast processing speed. The RF performance for crop classification with airborne hyperspectral imagery was assessed. The study area was the cultivated area in Chogye-myeon, Habcheon-gun, Gyeongsangnam-do, South Korea, where the main crops are garlic, onion, and wheat. Parameter optimization was conducted to maximize the classification accuracy. Then, dimensionality reduction was conducted based on RF variable importance. The results show that the selected bands achieve excellent classification accuracy without using the whole dataset. Moreover, a majority of the selected bands are concentrated in the visible (VIS) region, especially the region related to chlorophyll content. Therefore, it can be inferred that the phenological status after the mature stage influences red-edge spectral reflectance.
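
The band-selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the data are synthetic, the names `informative` and `top_k` are hypothetical, and scikit-learn's impurity-based `feature_importances_` is assumed as the importance measure, which may differ from the one used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for hyperspectral pixels: rows are pixels, columns are bands.
rng = np.random.default_rng(0)
n_pixels, n_bands = 600, 50
X = rng.normal(size=(n_pixels, n_bands))

# Hypothetical ground truth: only bands 5, 6 and 7 carry class information.
informative = [5, 6, 7]
y = (X[:, informative].sum(axis=1) > 0).astype(int)

# Fit the forest on all bands, then rank bands by impurity-based importance.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]

# Keep the top_k most important bands (top_k is an arbitrary illustrative choice).
top_k = 10
selected = np.sort(ranking[:top_k])
X_reduced = X[:, selected]

print("selected bands:", selected.tolist())
print("reduced shape:", X_reduced.shape)
```

In practice the classifier would be refit on `X_reduced` and its accuracy compared with the full-band model on held-out pixels.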

Analysis of Dimensionality Reduction Methods Through Epileptic EEG Feature Selection for Machine Learning in BCI (BCI에서 기계 학습을 위한 간질 뇌파 특징 선택을 통한 차원 감소 방법 분석)

  • Tong, Yang;Aliyu, Ibrahim;Lim, Chang-Gyoon
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.6 / pp.1333-1342 / 2018
  • Until now, electroencephalography (EEG) has been the most important and convenient method for the diagnosis and treatment of epilepsy. However, it is difficult to identify the wave characteristics of epileptic EEG signals because they are very weak and non-stationary and have strong background noise. In this paper, we analyse the effect of dimensionality reduction methods on epileptic EEG feature selection and classification. Three dimensionality reduction methods were investigated: Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), and Linear Discriminant Analysis (LDA). The performance of each method was evaluated using Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Decision Tree (DT), and Random Forest (RF) classifiers. In the experiments, PCA reached its highest accuracy of 75% with SVM, LR, and K-NN. KPCA reached its best performance of 85% with SVM and K-NN, while LDA achieved 100% accuracy with K-NN. Thus, LDA dimensionality reduction is found to provide the best classification result for epileptic EEG signals.

A Comparison Study on SVM MDR and D-MDR for Detecting Gene-Gene Interaction in Continuous Data (연속형자료의 유전자 상호작용 규명을 위한 SVM MDR과 D-MDR의 방법 비교)

  • Lee, Jong-Hyeong;Lee, Jea-Young
    • Communications for Statistical Applications and Methods / v.18 no.4 / pp.413-422 / 2011
  • The multifactor dimensionality reduction (MDR) method has generally been used to study major gene interaction effects; however, it has not been applicable to continuous data. In light of this, several methods have been suggested, such as Expanded MDR, Dummy MDR, and SVM MDR. In this paper, we compare two of these methods, SVM MDR and D-MDR. In addition, we identify the gene-gene interaction effects of single nucleotide polymorphisms (SNPs) associated with economic traits in Hanwoo (Korean cattle). Lastly, we discuss a new method that takes into account the advantages of the other methods.

Power of Expanded Multifactor Dimensionality Reduction with CART Algorithm (CART 알고리즘을 활용한 확장된 다중인자 차원축소방법의 검정력 평가)

  • Lee, Jea-Young;Lee, Jong-Hyeong;Lee, Ho-Guen
    • Communications for Statistical Applications and Methods / v.17 no.5 / pp.667-678 / 2010
  • It is important to detect gene-gene interactions in genome-wide association studies (GWAS), and many studies have addressed this problem. One such approach is the multifactor dimensionality reduction (MDR) method. However, the MDR method cannot be applied to continuous data, so the expanded multifactor dimensionality reduction (E-MDR) method has been suggested. The goal of this study is to evaluate, by simulation, the power of E-MDR for identifying gene-gene interactions. We also applied the method to identify interaction effects of single nucleotide polymorphisms (SNPs) responsible for economic traits in a Korean cattle population (real data).

Development of a Recommender System for E-Commerce Sites Using a Dimensionality Reduction Technique (차원 감소 기법을 이용한 전자 상거래 추천 시스템)

  • Kim, Yong-Soo;Yum, Bong-Jin;Kim, Nor-Man
    • Journal of Korean Institute of Industrial Engineers / v.36 no.3 / pp.193-202 / 2010
  • The recommender system is a typical software solution for personalized services which are now popular in e-commerce sites. Most of the existing recommender systems are based on customers' explicit rating data on items (e.g., ratings on movies), and it is only recently that recommender systems based on implicit ratings have been proposed as a better alternative. Implicit ratings of a customer on those items that are clicked but not purchased can be inferred from the customer's navigational and behavioral patterns. In this article, a dimensionality reduction (DR) technique is newly applied to the implicit rating-based recommender system, and its effectiveness is assessed using an experimental e-commerce site. The experimental results indicate that the performance of the proposed approach is superior or at least similar to the conventional collaborative filtering (CF)-based approach unless the number of recommended products is 'large.' In addition, the proposed approach requires less memory space and is computationally more efficient.

Identification of epistasis in ischemic stroke using multifactor dimensionality reduction and entropy decomposition

  • Park, Jung-Dae;Kim, Youn-Young;Lee, Chae-Young
    • BMB Reports / v.42 no.9 / pp.617-622 / 2009
  • We investigated the genetic associations of ischemic stroke by identifying epistasis of its heterogeneous subtypes such as small vessel occlusion (SVO) and large artery atherosclerosis (LAA). Epistasis was analyzed with 24 genes in 207 controls and 271 patients (SVO = 110, LAA = 95) using multifactor dimensionality reduction and entropy decomposition. The multifactor dimensionality reduction analysis with any of the 1- to 4-locus models showed no significant association with LAA (P > 0.05). The analysis of SVO, however, revealed a significant association in the best 3-locus model with P10L of TGF-$\beta$1, C1013T of SPP1, and R485K of F5 (testing balanced accuracy = 63.17%, P < 0.05). Subsequent entropy analysis also revealed that such heterogeneity was present: quite a large entropy was estimated among the 3 loci for SVO (5.43%), but only a relatively small entropy was estimated for LAA (1.81%). This suggests that the synergistic epistasis model might contribute specifically to the pathogenesis of SVO, which implies a different etiopathogenesis of the ischemic stroke subtypes.

Important SNPs Identification from the Economic Traits for the High Quality Korean Cattle (고품질 한우를 위한 여러 경제형질에서의 주요 SNP 규명)

  • Lee, Jea-Young;Kim, Dong-Chul
    • Communications for Statistical Applications and Methods / v.16 no.1 / pp.67-74 / 2009
  • To produce high-quality Korean cattle, gene markers that influence various economic traits have been identified. In testing for statistical significance among SNP markers, Lee et al. (2008b) found that the SNP(19_1)*SNP(28_2) marker was an important marker for LMA (longissimus muscle dorsi area). In addition, the expanded multifactor dimensionality reduction (expanded MDR) method is applied to CWT (carcass cold weight) and ADG (average daily gain) from among the comprehensive economic traits. The results showed that the SNP(19_1)*SNP(28_2) interaction marker was a good and very meaningful marker for economic traits.

Gene-Gene Interaction Analysis for the Accelerated Failure Time Model Using a Unified Model-Based Multifactor Dimensionality Reduction Method

  • Lee, Seungyeoun;Son, Donghee;Yu, Wenbao;Park, Taesung
    • Genomics & Informatics / v.14 no.4 / pp.166-172 / 2016
  • Although a large number of genetic variants have been identified as associated with common diseases through genome-wide association studies, there still exist limitations in explaining the missing heritability. One approach to solving this missing heritability problem is to investigate gene-gene interactions, rather than taking a single-locus approach. For gene-gene interaction analysis, the multifactor dimensionality reduction (MDR) method has been widely applied, since the constructive induction algorithm of MDR efficiently reduces high-order dimensions into one dimension by classifying multi-level genotypes into high- and low-risk groups. The MDR method has been extended to various phenotypes and has been improved to provide a significance test for gene-gene interactions. In this paper, we propose a simple method, called accelerated failure time (AFT) UM-MDR, in which the idea of a unified model-based MDR is extended to the survival phenotype by incorporating AFT-MDR into the classification step. The proposed AFT UM-MDR method is compared with AFT-MDR through simulation studies, and a short discussion is given.
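
The constructive induction step mentioned in this abstract, collapsing multi-locus genotypes into a single high-/low-risk attribute, can be illustrated with a small sketch of classic case-control MDR. This is not the AFT UM-MDR method proposed in the paper. The function names and toy data are hypothetical, and the rule "a cell is high-risk when its case:control ratio is at least the overall ratio" is assumed as the usual convention.

```python
from collections import defaultdict


def mdr_reduce(genotypes, labels, loci):
    """Collapse the genotype combination at `loci` into one binary risk attribute."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    threshold = n_cases / n_controls  # overall case:control ratio

    # Count controls and cases in every multi-locus genotype cell.
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in loci)
        counts[cell][y] += 1

    # A cell is high-risk if its case:control ratio reaches the threshold.
    high_risk = set()
    for cell, (controls, cases) in counts.items():
        if controls == 0 or cases / controls >= threshold:
            high_risk.add(cell)

    reduced = [1 if tuple(row[i] for i in loci) in high_risk else 0
               for row in genotypes]
    return reduced, high_risk


def balanced_accuracy(pred, labels):
    """Mean of sensitivity and specificity."""
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return 0.5 * (tp / pos + tn / neg)


# Toy data: two loci with an XOR-like interaction and no marginal effect.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
labels = [0, 0, 1, 1, 1, 1, 0, 0]  # 1 = case, 0 = control

reduced, high_risk = mdr_reduce(genotypes, labels, loci=(0, 1))
print("high-risk cells:", sorted(high_risk))
print("balanced accuracy:", balanced_accuracy(reduced, labels))
```

A full MDR analysis would repeat this over all candidate locus combinations and score each model by cross-validated balanced accuracy; the sketch scores one combination on the training data only.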

Effective Dimensionality Reduction of Payload-Based Anomaly Detection in TMAD Model for HTTP Payload

  • Kakavand, Mohsen;Mustapha, Norwati;Mustapha, Aida;Abdullah, Mohd Taufik
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.8 / pp.3884-3910 / 2016
  • An Intrusion Detection System (IDS) generally processes a large amount of data that is highly redundant and irrelevant. This causes slow training and assessment procedures, high resource consumption, and a poor detection rate. Due to their expensive computational requirements during both training and detection, IDSs are mostly ineffective for real-time anomaly detection. This paper proposes a dimensionality reduction technique, based on Principal Component Analysis (PCA), that is able to enhance the performance of IDSs up to constant time O(1). Furthermore, the present study offers a feature selection approach for identifying major components in real time. The PCA algorithm transforms high-dimensional feature vectors into a low-dimensional feature space, which is used to determine the optimum volume of factors. The proposed approach was assessed using the HTTP packet payloads of the ISCX 2012 IDS and DARPA 1999 datasets. The experimental outcome demonstrated that the proposed anomaly detection achieved promising results, with a 97% detection rate and a 1.2% false positive rate for the ISCX 2012 dataset and a 100% detection rate and a 0.06% false positive rate for the DARPA 1999 dataset. The proposed anomaly detection also achieved comparable performance in terms of computational complexity when compared to three state-of-the-art anomaly detection systems.
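
The PCA projection this abstract relies on, mapping high-dimensional feature vectors into a low-dimensional space, might look like the sketch below. It assumes NumPy and synthetic data; the abstract does not say how payload features are extracted, so the feature matrix and the names `pca_fit` and `pca_transform` are hypothetical.

```python
import numpy as np


def pca_fit(X, k):
    """Return the mean, the top-k principal directions, and their variance ratios."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, Vt[:k], explained[:k]


def pca_transform(X, mean, components):
    """Project centered data onto the retained principal directions."""
    return (X - mean) @ components.T


# Synthetic stand-in: 200 feature vectors in 20 dimensions lying near a 3-D subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

mean, components, explained = pca_fit(X, k=3)
Z = pca_transform(X, mean, components)

print("reduced shape:", Z.shape)
print("variance explained by 3 components:", float(explained.sum()))
```

Once `mean` and `components` are fitted, projecting a new feature vector costs one matrix-vector product whose size is fixed by the chosen number of components.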

Asymptotic Test for Dimensionality in Sliced Inverse Regression (분할 역회귀모형에서 차원결정을 위한 점근검정법)

  • Park, Chang-Sun;Kwak, Jae-Guen
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.381-393 / 2005
  • Sliced Inverse Regression (SIR), a promising technique for dimension reduction in regression analysis, and an associated chi-square test for dimensionality were introduced by Li (1991). However, Li's test requires the predictors to be normally distributed and has been found to depend heavily on the number of slices. We provide a unified asymptotic test for determining the dimensionality of the SIR model, which is based on probabilistic principal component analysis and is free of the normality assumption on predictors. Illustrative results with simulated and real examples are also provided.
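
For context, basic SIR and the statistic behind Li's chi-square test can be sketched as below. This illustrates the baseline the abstract refers to, not the probabilistic-PCA-based test the paper proposes. It assumes NumPy, synthetic data with a single true direction, and equal-count slices; the function names are hypothetical.

```python
import numpy as np


def sir(X, y, n_slices=10):
    """Sliced inverse regression: eigenvalues and directions, largest first."""
    n, p = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)

    # Whiten the predictors with the inverse square root of the covariance.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Slice the sample by sorted response and average Z within each slice.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Eigen-decompose the weighted covariance of slice means.
    lam, eta = np.linalg.eigh(M)
    lam, eta = lam[::-1], eta[:, ::-1]
    directions = inv_sqrt @ eta  # map back to the original predictor scale
    return lam, directions


def li_statistic(lam, n, d, n_slices):
    """Statistic and degrees of freedom for testing dimension d (normal predictors)."""
    p = len(lam)
    stat = n * lam[d:].sum()
    df = (p - d) * (n_slices - d - 1)
    return stat, df


# Synthetic single-index model: y depends on X only through X @ beta.
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.normal(size=(n, p))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = np.exp(X @ beta) + 0.1 * rng.normal(size=n)

lam, directions = sir(X, y, n_slices=10)
stat, df = li_statistic(lam, n, d=1, n_slices=10)
b = directions[:, 0]
cosine = abs(b @ beta) / np.linalg.norm(b)

print("eigenvalues:", np.round(lam, 3))
print("alignment with true direction:", round(float(cosine), 3))
print("statistic for d = 1:", round(float(stat), 2), "on", df, "degrees of freedom")
```

The statistic would be compared with a chi-square quantile on `df` degrees of freedom. Changing `n_slices` changes both values, which is the sensitivity to the number of slices that the abstract notes.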