• Title/Summary/Keyword: Mutual Information (MI)

A New Variable Selection Method Based on Mutual Information Maximization by Replacing Collinear Variables for Nonlinear Quantitative Structure-Property Relationship Models

  • Ghasemi, Jahan B.; Zolfonoun, Ehsan
    • Bulletin of the Korean Chemical Society / v.33 no.5 / pp.1527-1535 / 2012
  • Selection of the most informative molecular descriptors from the original data set is a key step in the development of quantitative structure activity/property relationship models. Recently, mutual information (MI) has gained increasing attention in feature selection problems. This paper presents an effective MI-based feature selection approach, named mutual information maximization by replacing collinear variables (MIMRCV), for nonlinear quantitative structure-property relationship models. The proposed variable selection method was applied to three different QSPR datasets: soil degradation half-lives of 47 organophosphorus pesticides, GC-MS retention times of 85 volatile organic compounds, and water-to-micellar cetyltrimethylammonium bromide partition coefficients of 62 organic compounds. The obtained results revealed that using MIMRCV as the feature selection method improves the predictive quality of the developed models compared to conventional MI-based variable selection algorithms.
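As a rough illustration of the general idea (not the authors' exact MIMRCV algorithm), a greedy ranking can select variables by MI with the target while skipping, i.e. "replacing", any candidate that is collinear with an already-selected variable. The binning scheme and correlation threshold below are illustrative assumptions:

```python
import numpy as np

def discrete_mi(x, y):
    """Plug-in mutual information between two integer-coded arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == xv) * np.mean(y == yv)))
    return mi

def select_replacing_collinear(X, y, n_bins=4, corr_thresh=0.9, n_select=2):
    """Greedy MI ranking that skips any candidate collinear with an
    already-selected variable (an illustrative reading of MIMRCV)."""
    # discretize each column into equal-frequency bins for the MI estimate
    Xb = np.column_stack([
        np.searchsorted(np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1]), col)
        for col in X.T])
    order = np.argsort([-discrete_mi(Xb[:, j], y) for j in range(X.shape[1])])
    selected = []
    for j in order:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_thresh for k in selected):
            selected.append(int(j))
        if len(selected) == n_select:
            break
    return selected
```

With a target driven by one feature and a near-duplicate of that feature present, exactly one of the collinear pair survives selection.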

Variable Selection Based on Mutual Information

  • Huh, Moon-Y.; Choi, Byong-Su
    • Communications for Statistical Applications and Methods / v.16 no.1 / pp.143-155 / 2009
  • A best-subset selection procedure based on mutual information (MI) between a set of explanatory variables and a dependent class variable is suggested. The derivation of multivariate MI is based on normal mixtures; several types of normal mixtures are proposed, together with a best-subset selection algorithm. Four real data sets are employed to demonstrate the efficiency of the proposals.
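A crude sketch of the setting, assuming (unlike the paper's fuller normal-mixture derivation) that every density is approximated by a single Gaussian: I(X; C) = H(X) - Σ_c p_c H(X | C = c), with forward selection adding the variable that most increases the estimated MI:

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a d-variate Gaussian: 0.5*log((2*pi*e)^d |cov|)."""
    d = cov.shape[0]
    return 0.5 * np.log(((2 * np.pi * np.e) ** d) * np.linalg.det(cov))

def gaussian_mi(X, c):
    """I(X; C) = H(X) - sum_c p_c H(X|C=c), every density fit by one
    Gaussian -- a crude stand-in for the paper's normal mixtures."""
    hx = gaussian_entropy(np.atleast_2d(np.cov(X, rowvar=False)))
    hxc = sum(np.mean(c == k) *
              gaussian_entropy(np.atleast_2d(np.cov(X[c == k], rowvar=False)))
              for k in np.unique(c))
    return hx - hxc

def forward_select(X, c, n_select=2):
    """Greedy subset search: add the variable giving the largest MI gain."""
    selected = []
    for _ in range(n_select):
        rest = [j for j in range(X.shape[1]) if j not in selected]
        best = max(rest, key=lambda j: gaussian_mi(X[:, selected + [j]], c))
        selected.append(best)
    return selected
```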

Similarity Measurement using Gabor Energy Feature and Mutual Information for Image Registration

  • Ye, Chul-Soo
    • Korean Journal of Remote Sensing / v.27 no.6 / pp.693-701 / 2011
  • Image registration is an essential process in analyzing time series of satellite images for the purposes of image fusion and change detection. Mutual information (MI) is commonly used as a similarity measure for image registration because of its robustness to noise. Due to radiometric differences, however, it is not easy to apply MI to multi-temporal satellite images directly on pixel intensities. Richer image features for MI can be obtained by employing a Gabor filter whose characteristics, such as filter size, frequency, and orientation, vary adaptively for each pixel. In this paper we employ the Bidirectional Gabor Filter Energy (BGFE), defined from Gabor filter features, and apply the BGFE as the image feature in the MI similarity calculation. The experimental results show that the proposed method is more robust than the conventional MI method combined with intensity or gradient magnitude.
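The core mechanism, MI as a registration similarity measure, can be sketched without the Gabor-feature stage: compute histogram-based MI between two images and exhaustively search for the integer shift that maximizes it (a minimal sketch; the paper's feature extraction and search are more elaborate):

```python
import numpy as np

def image_mi(a, b, bins=16):
    """Histogram-based mutual information between two equally sized images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def best_shift(ref, mov, max_shift=3):
    """Exhaustive search for the integer (row, col) shift of `mov` that
    maximizes MI against `ref` -- MI peaks at correct alignment."""
    shifts = [(dr, dc) for dr in range(-max_shift, max_shift + 1)
                       for dc in range(-max_shift, max_shift + 1)]
    return max(shifts, key=lambda s: image_mi(ref, np.roll(mov, s, axis=(0, 1))))
```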

Sample-spacing Approach for the Estimation of Mutual Information (SAMPLE-SPACING 방법에 의한 상호정보의 추정)

  • Huh, Moon-Yul; Cha, Woon-Ock
    • The Korean Journal of Applied Statistics / v.21 no.2 / pp.301-312 / 2008
  • Mutual information is a measure of the association of an explanatory variable with a target variable, used for variable ranking and variable subset selection. This study concerns the sample-spacing approach, which can estimate mutual information from data consisting of continuous explanatory variables and a categorical target variable without estimating a joint probability density function. The results of Monte Carlo simulation and experiments with real-world data show that m = 1 is preferable when using sample spacing.
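A minimal sketch of the approach, using one common form of the one-sided m-spacing entropy estimator (the paper's exact variant may differ in details): entropies come from order-statistic spacings, so no density is ever fit, and MI follows as H(X) minus the class-weighted conditional entropies:

```python
import numpy as np

def spacing_entropy(x, m=1):
    """One-sided m-spacing estimate of differential entropy
    (the paper finds m = 1 preferable)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    gaps = x[m:] - x[:-m]                       # X_(i+m) - X_(i)
    return float(np.mean(np.log((n + 1) / m * gaps)))

def spacing_mi(x, c, m=1):
    """MI between continuous x and categorical c without any density fit:
    H(x) minus the class-weighted conditional entropies."""
    return spacing_entropy(x, m) - sum(
        np.mean(c == k) * spacing_entropy(x[c == k], m) for k in np.unique(c))
```

On an evenly spaced grid (the ideal sample from Uniform(0,1), whose differential entropy is 0) the estimate is essentially exact.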

Mutual Information Technique for Selecting Input Variables of RDAPS (RDAPS 입력자료 선정을 위한 Mutual Information기법 적용)

  • Han, Kwang-Hee; Ryu, Yong-Jun; Kim, Tae-Soon; Heo, Jun-Haeng
    • Proceedings of the Korea Water Resources Association Conference / 2009.05a / pp.1141-1144 / 2009
  • The artificial neural network (ANN) technique models the activity of neurons in the human brain; it has developed over a long period, is applied in many fields, and has been actively studied in hydrology as well. Short-term numerical weather prediction data such as RDAPS are relatively accurate for qualitative judgments, such as whether rainfall occurs, but very inaccurate for quantitative estimates, such as the exact rainfall amount, so post-processing techniques such as ANNs are used to improve accuracy. When applying an ANN, the most important step is input variable selection, since the choice of inputs strongly affects the results. In this study, mutual information is adopted as the input variable selection technique to examine the accuracy of ANN input selection. Mutual information uses the entropy of the given data to express the independence or dependence between variables; the MI value ranges from 0 to 1, where values near 0 indicate that the variables are independent and values near 1 indicate that they are dependent. To assess the accuracy of mutual information for ANN input selection, the processing capability and accuracy of the ANN were compared when using conventional input selection techniques and when using mutual information.
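Raw MI is unbounded, so a score constrained to [0, 1] as described here implies some normalization; one common convention (an assumption on our part, since the abstract does not spell it out) divides by the smaller marginal entropy:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (zero cells ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def normalized_mi(x, y):
    """MI scaled into [0, 1]: 0 for independent variables, 1 for a
    deterministic relation. Normalization by min marginal entropy is
    one common convention, assumed here for illustration."""
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (xi, yi), 1)          # joint counts
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    mi = entropy(px) + entropy(py) - entropy(joint.ravel())
    return mi / min(entropy(px), entropy(py))
```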

Input Variables Selection by Principal Component Analysis and Mutual Information Estimation (주요성분분석과 상호정보 추정에 의한 입력변수선택)

  • Cho, Yong-Hyun; Hong, Seong-Jun
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.2 / pp.220-225 / 2007
  • This paper presents an efficient input variable selection method using both principal component analysis (PCA) and adaptive partition mutual information (AP-MI) estimation. PCA, which is based on 2nd-order statistics, is applied to prevent overestimation by quickly removing the dependence between input variables. AP-MI estimation is applied to estimate accurate dependence information by equally partitioning the samples of each input variable when calculating the probability density function. The proposed method has been applied to two input variable selection problems: 7 artificial signals of 500 samples and 24 environmental pollution signals of 55 samples. The experimental results show that the proposed method offers fast and accurate selection performance, and that it performs better than both AP-MI estimation without PCA and regular partition MI estimation.
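A minimal sketch of the two ingredients, assuming "adaptive partition" can be read as equal-frequency binning (each bin holds roughly the same number of samples): PCA removes 2nd-order dependence between inputs, and the partitioned plug-in estimate gives the MI:

```python
import numpy as np

def pca_decorrelate(X):
    """Project onto principal components so the transformed inputs are
    uncorrelated at 2nd order, as the paper uses PCA to do."""
    Xc = X - X.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ vecs

def adaptive_partition_mi(x, y, n_bins=4):
    """Plug-in MI after equal-frequency ('adaptive') partitioning of the
    continuous input -- one reading of the paper's AP-MI estimate."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    xb = np.searchsorted(edges, x)
    mi = 0.0
    for xv in np.unique(xb):
        for yv in np.unique(y):
            pxy = np.mean((xb == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(xb == xv) * np.mean(y == yv)))
    return mi
```

For a label that is exactly "above/below the median", the equal-frequency partition recovers the true MI of log 2.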

Input Variable Selection by Using Fixed-Point ICA and Adaptive Partition Mutual Information Estimation (고정점 알고리즘의 독립성분분석과 적응분할의 상호정보 추정에 의한 입력변수선택)

  • Cho, Yong-Hyun
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.5 / pp.525-530 / 2006
  • This paper presents an efficient input variable selection method using both fixed-point independent component analysis (FP-ICA) and adaptive partition mutual information (AP-MI) estimation. FP-ICA, which is based on the secant method, is applied to quickly find independent components among the input variables. AP-MI estimation is applied to estimate accurate dependence information by equally partitioning the samples of each input variable when calculating the probability density function (PDF). The proposed method has been applied to two input variable selection problems: 7 artificial signals of 500 samples and 24 environmental pollution signals of 55 samples. The experimental results show that the proposed method offers fast and accurate selection performance, and that it performs better than both AP-MI estimation without FP-ICA and regular partition MI estimation.

k-Nearest Neighbor-Based Approach for the Estimation of Mutual Information (상호정보 추정을 위한 k-최근접이웃 기반방법)

  • Cha, Woon-Ock; Huh, Moon-Yul
    • Communications for Statistical Applications and Methods / v.15 no.6 / pp.977-991 / 2008
  • This study concerns the k-nearest-neighbor-based approach to estimating mutual information when the target variable is either categorical or continuous. The results of Monte Carlo simulation and experiments with real-world data show that k = 1 is preferable. In practical applications with real-world data, our study shows that jittering and bootstrapping are needed.
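A brute-force sketch of a Kraskov-style k-NN estimator adapted to a categorical target (in the spirit described here; the authors' exact estimator may differ in details): for each point, take the distance to its k-th nearest neighbor within the same class, count how many points of any class fall inside that radius, and combine the counts through the digamma function. The digamma helper uses a standard recurrence-plus-asymptotic-series approximation:

```python
import numpy as np

def digamma(x):
    """Digamma via psi(x) = psi(x+1) - 1/x plus an asymptotic series
    (accurate enough for this sketch)."""
    x = float(x)
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    return r + np.log(x) - 1.0 / (2 * x) - 1.0 / (12 * x ** 2) + 1.0 / (120 * x ** 4)

def knn_mi(x, y, k=1):
    """Nearest-neighbor MI estimate for continuous x, categorical y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    n = len(x)
    terms = []
    for i in range(n):
        same = np.sort(np.abs(x[y == y[i]] - x[i]))
        d = same[k]                              # k-th same-class neighbor (index 0 is self)
        m = np.sum(np.abs(x - x[i]) <= d) - 1    # neighbors within d over all classes
        terms.append(digamma(np.sum(y == y[i])) + digamma(m))
    return digamma(n) + digamma(k) - np.mean(terms)
```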

Hybrid Feature Selection Using Genetic Algorithm and Information Theory

  • Cho, Jae Hoon; Lee, Dae-Jong; Park, Jin-Il; Chun, Myung-Geun
    • International Journal of Fuzzy Logic and Intelligent Systems / v.13 no.1 / pp.73-82 / 2013
  • In pattern classification, feature selection is an important factor in classifier performance. In particular, when classifying data with a large number of features or variables, the accuracy and computational time of the classifier can be improved by using a relevant feature subset that removes irrelevant, redundant, or noisy data. The proposed method consists of two parts: a wrapper part with an improved genetic algorithm (GA) using a new reproduction method, and a filter part using mutual information (MI) to reduce computational complexity. Experimental results show that this method can achieve better performance on pattern recognition problems than other conventional solutions.
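A minimal sketch of the filter-plus-wrapper pattern, with illustrative stand-ins throughout: a simple binned-MI filter for ranking, and a tiny GA over feature bitmasks whose fitness is the training accuracy of a nearest-centroid classifier on binary 0/1 labels (the paper's improved reproduction operator and classifier are not reproduced here):

```python
import numpy as np

def mi_filter(X, y, keep):
    """Filter stage: rank features by binned MI with the label, keep the top `keep`."""
    def mi(col):
        xb = np.searchsorted(np.quantile(col, [0.25, 0.5, 0.75]), col)
        s = 0.0
        for xv in np.unique(xb):
            for yv in np.unique(y):
                pxy = np.mean((xb == xv) & (y == yv))
                if pxy > 0:
                    s += pxy * np.log(pxy / (np.mean(xb == xv) * np.mean(y == yv)))
        return s
    return np.argsort([-mi(X[:, j]) for j in range(X.shape[1])])[:keep]

def ga_wrapper(X, y, pop=20, gens=15, seed=0):
    """Wrapper stage: GA over boolean feature masks, scored by the training
    accuracy of a nearest-centroid classifier (assumes binary 0/1 labels)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def fitness(mask):
        if not mask.any():
            return 0.0
        Xi = X[:, mask]
        c0, c1 = Xi[y == 0].mean(axis=0), Xi[y == 1].mean(axis=0)
        pred = np.linalg.norm(Xi - c1, axis=1) < np.linalg.norm(Xi - c0, axis=1)
        return np.mean(pred == (y == 1))

    popm = rng.random((pop, d)) < 0.5               # random initial bitmasks
    for _ in range(gens):
        fit = np.array([fitness(m) for m in popm])
        parents = popm[np.argsort(-fit)[:pop // 2]]  # truncation selection (elitist)
        kids = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, d)                 # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            kids.append(child ^ (rng.random(d) < 0.1))   # bit-flip mutation
        popm = np.vstack([parents, kids])
    fit = np.array([fitness(m) for m in popm])
    return popm[np.argmax(fit)]
```

In practice the filter would run first to prune the feature pool, and the GA would then search only the surviving features.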