• Title/Summary/Keyword: Dimensionality Reduction

Binary classification by the combination of Adaboost and feature extraction methods (특징 추출 알고리즘과 Adaboost를 이용한 이진분류기)

  • Ham, Seaung-Lok;Kwak, No-Jun
    • Journal of the Institute of Electronics Engineers of Korea CI / v.49 no.4 / pp.42-53 / 2012
  • In the pattern recognition and machine learning community, classification is a classical problem and the most widely researched area. Adaptive boosting, also known as Adaboost, has been successfully applied to binary classification problems. It is a boosting algorithm that constructs a strong classifier through a weighted combination of weak classifiers. Meanwhile, PCA and LDA are the most popular linear feature extraction methods, used mainly for dimensionality reduction. In this paper, a combination of Adaboost and feature extraction methods is proposed for efficient classification of two-class data. Conventionally, the roles of feature extraction and classification have been distinct: a feature extraction method and a classifier are applied sequentially to classify input variables into several categories. In this paper, these two steps are combined into one, resulting in good classification performance. More specifically, each projection vector is treated as a weak classifier in the Adaboost algorithm to constitute a strong classifier for binary classification problems. The proposed algorithm was applied to the UCI and FRGC datasets and showed better recognition rates than the sequential application of feature extraction and classification methods.
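
The abstract does not give the exact construction, so the following is only a minimal sketch of the general idea, assuming PCA directions as the projection vectors and a thresholded projection (a decision stump) as each weak classifier. All function names and the synthetic two-class data are hypothetical and are not taken from the paper.

```python
import numpy as np

def pca_directions(X, k):
    """Top-k principal directions of X, returned as columns (assumed feature extractor)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T

def stump_predict(z, thr, pol):
    """Weak classifier: threshold a 1-D projection, output labels in {-1, +1}."""
    return np.where(pol * (z - thr) >= 0, 1, -1)

def best_stump(z, y, w):
    """Threshold and polarity with the lowest weighted error on projection z."""
    best_err, best_thr, best_pol = np.inf, 0.0, 1
    for thr in np.unique(z):
        for pol in (1, -1):
            err = w[stump_predict(z, thr, pol) != y].sum()
            if err < best_err:
                best_err, best_thr, best_pol = err, thr, pol
    return best_err, best_thr, best_pol

def fit(X, y, directions, rounds=5):
    """Standard discrete AdaBoost where each round picks one projection vector."""
    Z = X @ directions
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        stumps = [best_stump(Z[:, j], y, w) + (j,) for j in range(Z.shape[1])]
        err, thr, pol, j = min(stumps, key=lambda s: s[0])
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this weak classifier
        w = w * np.exp(-alpha * y * stump_predict(Z[:, j], thr, pol))
        w = w / w.sum()                            # re-normalise sample weights
        model.append((alpha, j, thr, pol))
    return model

def predict(model, X, directions):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    Z = X @ directions
    score = sum(a * stump_predict(Z[:, j], t, p) for a, j, t, p in model)
    return np.where(score >= 0, 1, -1)

# Hypothetical two-class data: two Gaussian blobs in 4 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (50, 4)), rng.normal(-2.0, 1.0, (50, 4))])
y = np.array([1] * 50 + [-1] * 50)
directions = pca_directions(X, k=2)
model = fit(X, y, directions)
accuracy = float(np.mean(predict(model, X, directions) == y))
```

In this sketch the feature extraction step only supplies candidate directions; the boosting loop decides which of them to use and how to weight them, which is one way of merging the two stages into a single classifier.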

Polymorphisms of XRCC1 and XRCC2 DNA Repair Genes and Interaction with Environmental Factors Influence the Risk of Nasopharyngeal Carcinoma in Northeast India

  • Singh, Seram Anil;Ghosh, Sankar Kumar
    • Asian Pacific Journal of Cancer Prevention / v.17 no.6 / pp.2811-2819 / 2016
  • Multiple genetic and environmental factors have been reported to play a key role in the development of nasopharyngeal carcinoma (NPC). Here, we investigated interactions of XRCC1 Arg399Gln and XRCC2 Arg188His polymorphisms and environmental factors in modulating susceptibility to NPC in Northeast India. One hundred NPC patients, 90 first-degree relatives of patients and 120 controls were enrolled in the study. XRCC1 Arg399Gln and XRCC2 Arg188His polymorphisms were determined using PCR-RFLP, and the results were confirmed by DNA sequencing. Logistic regression (LR) and multifactor dimensionality reduction (MDR) approaches were applied for statistical analysis. The XRCC1 Gln/Gln genotype showed increased risk (OR=2.76; P<0.024) of NPC. However, individuals with both XRCC1 and XRCC2 polymorphic variants had a 3.2-fold elevated risk (P<0.041). An enhanced risk of NPC was also observed in smoked meat consumers (OR=4.07; P=0.004), fermented fish consumers (OR=4.34; P=0.001), and tobacco-betel quid chewers (OR=7.00; P=0.0001) carrying XRCC1 polymorphic variants. However, smokers carrying the defective XRCC1 gene showed the highest risk (OR=7.47; P<0.0001). On MDR analysis, the best model for NPC risk was the five-factor combination of XRCC1 variant genotype, fermented fish, smoked meat, smoking and chewing (CVC=10/10; TBA=0.636; P<0.0001), whereas in interaction entropy graphs, smoked meat and tobacco chewing showed synergistic interactions with XRCC1. These findings suggest that the interaction of genetic and environmental factors might increase susceptibility to NPC in Northeast Indian populations.

A Feature Selection Method Based on Fuzzy Cluster Analysis (퍼지 클러스터 분석 기반 특징 선택 방법)

  • Rhee, Hyun-Sook
    • The KIPS Transactions:PartB / v.14B no.2 / pp.135-140 / 2007
  • Feature selection is a preprocessing technique commonly used on high-dimensional data. It studies how to select a subset or list of attributes that are used to construct models describing the data. Feature selection methods attempt to explore the intrinsic properties of data by employing statistics or information theory. Recent developments have involved approaches such as correlation methods, dimensionality reduction and mutual information techniques. Feature selection has become the focus of much research in application areas with massive and complex data sets. In this paper, we provide a feature selection method that considers data characteristics and generalization capability. It provides a computational approach to feature selection based on fuzzy cluster analysis of attribute values and its performance measures. We apply it to a system for classifying computer viruses and compare it with a heuristic method using the contrast concept. Experimental results show that the proposed approach can give a feature ranking, select the features, and improve system performance.

Hydrodynamic Aspects on Three-dimensional Effects of Vertical-axis Tidal Stream Turbine (조류발전용 수직축 터빈의 유체동력학적 3차원 효과에 관한 연구)

  • Hyun, B.S.;Lee, J.K.
    • Journal of the Korean Society for Marine Environment & Energy / v.16 no.2 / pp.61-70 / 2013
  • Hydrodynamic aspects of three-dimensional effects were investigated in this study for simple and convenient conversion of tidal stream energy using a Vertical-Axis Turbine (VAT). A numerical approach was used to reveal the differences in flow physics between 2-D estimation and rigorous 3-D simulation. It was shown that the 3-D effects were dominant, mainly due to the variation of tip vortices around the tip region of the rotor blade, causing a loss of lift for a steadily translating hydrofoil and a reduction of torque for a rotating turbine blade. The 3-D effect was found to be rather prominent for the typical VATs considered in this paper. A simple yet efficient 2-D approach with a correction for three-dimensionality was also proposed for practical design and analysis of VATs.

Random Forest Based Abnormal ECG Dichotomization using Linear and Nonlinear Feature Extraction (선형-비선형 특징추출에 의한 비정상 심전도 신호의 랜덤포레스트 기반 분류)

  • Kim, Hye-Jin;Kim, Byeong-Nam;Jang, Won-Seuk;Yoo, Sun-K.
    • Journal of Biomedical Engineering Research / v.37 no.2 / pp.61-67 / 2016
  • This paper presents a random forest based method for arrhythmia classification using both heart rate (HR) and heart rate variability (HRV) features. We analyzed the MIT-BIH arrhythmia database, which contains half-hour ECG recordings from 48 subjects. This study included not only linear features but also nonlinear features to improve classification performance. We classified abnormal ECG using mean_NN (mean of heart rate), SD1/SD2 (geometrical feature of the Poincaré HRV plot), SE (spectral entropy), and pNN100 (percentage of a heart rate longer than 100 ms), the features among the combined linear and nonlinear set that most affected classification accuracy. We compared the proposed method with neural networks to evaluate the accuracy of the algorithm. When the features extracted from the HRV were used as input variables for the classifier, random forest used only the most contributing variables for classification, unlike the neural networks. This characteristic of random forest enables dimensionality reduction of the input variables, increases the efficiency of the classifier, and yields results faster, with 11.1% higher accuracy than the neural networks.
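
As a rough illustration of the HRV features named above, the sketch below computes mean_NN, the Poincaré descriptors SD1 and SD2, and a pNN100 value from a list of NN intervals in milliseconds. It assumes the conventional definitions (SD1 and SD2 derived from the variances of the intervals and of their successive differences; pNN100 as the percentage of successive differences exceeding 100 ms), which may differ from the exact definitions used in the paper. Spectral entropy is omitted, and the sample intervals are invented.

```python
import math
import statistics

def hrv_features(nn_ms, pnn_threshold_ms=100.0):
    """Time-domain and Poincare HRV features from NN intervals (ms)."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]  # successive differences
    var_nn = statistics.variance(nn_ms)
    var_diff = statistics.variance(diffs)
    # Poincare plot: SD1 is the spread across the identity line, SD2 along it.
    sd1 = math.sqrt(0.5 * var_diff)
    sd2 = math.sqrt(max(2.0 * var_nn - 0.5 * var_diff, 0.0))
    # Assumed pNNx definition: share of successive differences above the threshold.
    pnn = 100.0 * sum(abs(d) > pnn_threshold_ms for d in diffs) / len(diffs)
    return {
        "mean_NN": statistics.fmean(nn_ms),
        "SD1": sd1,
        "SD2": sd2,
        "SD1/SD2": sd1 / sd2 if sd2 > 0 else math.inf,
        "pNN100": pnn,
    }

# Hypothetical NN intervals with a few irregular beats.
nn = [800, 810, 790, 950, 805, 795, 800, 1020, 810, 800]
features = hrv_features(nn)
```

A feature vector like this one would then be passed to the classifier; a random forest can rank such inputs by importance, which is the dimensionality reduction effect the abstract refers to.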

Data anomaly detection and Data fusion based on Incremental Principal Component Analysis in Fog Computing

  • Yu, Xue-Yong;Guo, Xin-Hui
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.10 / pp.3989-4006 / 2020
  • Intelligent agriculture monitoring is based on the perception and analysis of environmental data, which enables the monitoring of the production environment and the control of environmental regulation equipment. As the scale of the application continues to expand, a large amount of data is generated at the perception layer and uploaded to the cloud service, which brings challenges of insufficient bandwidth and processing capacity. This paper proposes a fog-based hybrid data analysis architecture that combines offline and real-time analysis to enable real-time data processing on resource-constrained IoT devices. Furthermore, we propose a data processing algorithm based on incremental principal component analysis, which achieves data dimensionality reduction and updates the principal components. We also introduce the Squared Prediction Error (SPE) value and detect data anomalies by combining the SPE value with a data fusion algorithm. To ensure the accuracy and effectiveness of the algorithm, we design a regular-SPE hybrid model update strategy, which enables the principal components to be updated on demand when data anomalies are found. In addition, this strategy can significantly reduce the growth in resource consumption caused by the data analysis architecture. Simulations based on practical datasets confirm that the proposed algorithm can perform data fusion and exception processing in real time on resource-constrained devices, and that our model update strategy can reduce overall system resource consumption while ensuring the accuracy of the algorithm.
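
To make the SPE idea concrete, here is a minimal sketch that fits a PCA model and flags a sample whose squared reconstruction residual exceeds a threshold. It substitutes a batch SVD for the incremental update described in the abstract, and the 99th-percentile threshold, the synthetic sensor data, and all names are assumptions made for illustration only.

```python
import numpy as np

def fit_pca(X, k):
    """Batch PCA via SVD; returns the mean and the loading matrix P of shape (d, k)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T

def spe(X, mean, P):
    """Squared Prediction Error: squared norm of the part of x outside the PCA subspace."""
    Xc = np.atleast_2d(X) - mean
    residual = Xc - Xc @ P @ P.T
    return np.sum(residual ** 2, axis=1)

# Hypothetical readings: 5 channels driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
W = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0, 0.0]])
X = rng.normal(size=(200, 2)) @ W + 0.01 * rng.normal(size=(200, 5))

mean, P = fit_pca(X, k=2)
spe_train = spe(X, mean, P)
threshold = np.percentile(spe_train, 99)   # assumed empirical control limit

# A reading that breaks the usual correlation structure on channel 4.
anomaly = X[0] + np.array([0.0, 0.0, 0.0, 1.0, 0.0])
spe_anomaly = spe(anomaly, mean, P)[0]
is_anomaly = bool(spe_anomaly > threshold)
```

In an on-demand update scheme of the kind the abstract describes, a flagged sample like this would be the trigger for refreshing the principal components rather than updating them on every batch.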

Genetic Design of Granular-oriented Radial Basis Function Neural Network Based on Information Proximity (정보 유사성 기반 입자화 중심 RBF NN의 진화론적 설계)

  • Park, Ho-Sung;Oh, Sung-Kwun;Kim, Hyun-Ki
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.2 / pp.436-444 / 2010
  • In this study, we introduce and discuss the concept of granular-oriented radial basis function neural networks (GRBF NNs). In contrast to the typical architectures encountered in radial basis function neural networks (RBF NNs), our main objective is to develop a design strategy for GRBF NNs as follows: (a) The architecture of the network fully reflects the structure encountered in the training data, which are granulated with the aid of clustering techniques. More specifically, the output space is granulated using K-Means clustering, while the information granules in the multidimensional input space are formed using a so-called context-based Fuzzy C-Means, which takes into account the structure already formed in the output space. (b) The innovative facet of the network's development involves a dynamic reduction of the dimensionality of the input space: the information granules are formed in a subspace of the overall input space, which is obtained by selecting a suitable subset of input variables so that this subspace retains the structure of the entire space. As this search is combinatorial in character, we use genetic optimization to determine the optimal input subspaces. A series of numeric studies using nonlinear process data and a dataset from the machine learning repository provide detailed insight into the nature of the algorithm and its parameters and offer some comparative analysis.

Sonar Target Classification using Generalized Discriminant Analysis (일반화된 판별분석 기법을 이용한 능동소나 표적 식별)

  • Kim, Dong-wook;Kim, Tae-hwan;Seok, Jong-won;Bae, Keun-sung
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.1 / pp.125-130 / 2018
  • Linear discriminant analysis (LDA) is a statistical analysis method generally used for dimensionality reduction of feature vectors or for classification. However, for a data set that cannot be linearly separated, linear separation can be made possible by mapping the feature vectors into a higher-dimensional space using a nonlinear function. This method is called generalized discriminant analysis (GDA) or kernel discriminant analysis. In this paper, we carried out target classification experiments with active sonar target signals available on the Internet using both linear discriminant analysis and generalized discriminant analysis. The experimental results are analyzed, compared, and discussed. On 104 test data, the LDA method showed a correct recognition rate of 73.08%, whereas the GDA method achieved 95.19%, which is also better than the conventional MLP or kernel-based SVM.
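
The effect of the nonlinear mapping can be seen on a toy example. The sketch below uses an explicit feature map as a stand-in for the implicit kernel mapping that GDA relies on; the two-ring data, the map `phi`, and the hand-picked hyperplane are all hypothetical and unrelated to the sonar data in the paper.

```python
import math

def ring(radius, n, label):
    """n points evenly spaced on a circle of the given radius, all with one label."""
    return [((radius * math.cos(2 * math.pi * i / n),
              radius * math.sin(2 * math.pi * i / n)), label) for i in range(n)]

# Inner ring is class -1, outer ring is class +1: not linearly separable in 2-D.
data = ring(1.0, 40, -1) + ring(3.0, 40, +1)

def class_mean(points, label):
    xs = [x for x, y in points if y == label]
    return tuple(sum(c) / len(xs) for c in zip(*xs))

# Both class means sit at the origin, so the between-class direction LDA
# depends on carries no information in the original space.
mean_gap = math.dist(class_mean(data, -1), class_mean(data, +1))

def phi(x):
    """Assumed nonlinear map into 3-D: append the squared radius as a new coordinate."""
    return (x[0], x[1], x[0] ** 2 + x[1] ** 2)

def linear_predict(z, w, b):
    """Ordinary linear decision rule applied in the mapped space."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b >= 0 else -1

# A plane at squared radius 4 separates the rings after mapping.
w, b = (0.0, 0.0, 1.0), -4.0
accuracy = sum(linear_predict(phi(x), w, b) == y for x, y in data) / len(data)
```

Kernel methods avoid computing `phi` explicitly and work with inner products in the mapped space instead, but the separating surface they find plays the same role as the plane used here.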

The Impact of Choline Acetyltransferase Polymorphism on the Expression of Mild Cognitive Impairment (Choline Acetyltransferase 유전자 다형성이 경도인지손상 발현에 미치는 영향)

  • Lee, Jung-Jae;Park, Joon-Hyuk;Lee, Seok-Bum;Huh, Yoon-Seok;Kim, Tae-Hui;Youn, Jong-Chul;Jhoo, Jin-Hyeong;Lee, Dong-Young;Park, Koung-Un;Kim, Ki-Woong
    • Korean Journal of Biological Psychiatry / v.17 no.4 / pp.218-225 / 2010
  • Objectives: The potential association between choline acetyltransferase (CHAT) polymorphism and the risk of mild cognitive impairment (MCI) has not been investigated in Korea. We examined the main effect of CHAT polymorphism and its interaction with apolipoprotein E (APOE) polymorphism in the development of MCI in an elderly Korean sample. Methods: We analyzed the CHAT 2384G > A polymorphism and APOE polymorphism among 149 subjects with MCI and 298 normal controls. We tested the association between MCI and CHAT A allele status using a logistic regression model. In addition, we employed generalized multifactor dimensionality reduction (GMDR) to investigate the interaction between CHAT and APOE with regard to the risk of MCI. Results: The CHAT A allele was associated with MCI risk (OR = 1.59, 95% CI = 1.02-2.48, p = 0.042). No significant gene-gene interaction between CHAT and APOE was found with the GMDR method (testing balanced accuracy = 0.540, p = 0.055). Conclusion: The CHAT A allele was associated with MCI risk in the Korean elderly. Its interaction with the APOE $\varepsilon 4$ allele was not significant with regard to the development of MCI.
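
For readers unfamiliar with the reported statistics, the sketch below shows how an unadjusted odds ratio with a Wald 95% confidence interval and a balanced accuracy are computed from counts. The abstract does not report the underlying counts, so every number here is invented, and the paper's logistic regression may include covariates that this simple 2x2 calculation ignores.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI from a 2x2 table.

    a: carriers among cases, b: non-carriers among cases,
    c: carriers among controls, d: non-carriers among controls.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity, as used to score MDR-type models."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Hypothetical counts, not taken from the study.
or_, lower, upper = odds_ratio_wald_ci(a=20, b=80, c=10, d=90)
bal_acc = balanced_accuracy(tp=60, fn=40, tn=50, fp=50)
```

A confidence interval whose lower bound stays above 1 corresponds to a significant association at the 5% level, while a balanced accuracy close to 0.5 means the model separates cases from controls little better than chance.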

The Study of Facebook Marketing Application Method: Facebook 'Likes' Feature and Predicting Demographic Information (페이스북 마케팅 활용 방안에 대한 연구: 페이스북 '좋아요' 기능과 인구통계학적 정보 추출)

  • Yu, Seong Jong;Ahn, Seun;Lee, Zoonky
    • The Journal of Bigdata / v.1 no.1 / pp.61-66 / 2016
  • With big data analysis, companies use customized marketing strategies based on customers' information. However, because of concerns about privacy and identity theft, people have started erasing their personal information or changing their privacy settings on social networking sites. Facebook, the most used social networking site, has a feature called 'Likes' that can be used to predict users' demographic profiles, such as sex and age range. To build an accurate analysis model for the study, the 'Likes' data were processed using Gaussian RBF and nFactors for dimensionality reduction. With random forest and 5-fold cross-validation, the results show an accuracy of 75% for sex and 97.85% for age. With this study, we expect to provide a useful guideline for companies and marketers who struggle to collect customers' data.
