A New Variable Selection Method Based on Mutual Information Maximization by Replacing Collinear Variables for Nonlinear Quantitative Structure-Property Relationship Models

  • Ghasemi, Jahan B. (Chemistry Department, Faculty of Sciences, K.N. Toosi University of Technology)
  • Zolfonoun, Ehsan (Chemistry Department, Faculty of Sciences, K.N. Toosi University of Technology)
  • Received: 2011.11.21
  • Reviewed: 2012.02.03
  • Published: 2012.05.20


Selecting the most informative molecular descriptors from the original data set is a key step in developing quantitative structure-activity/property relationship (QSAR/QSPR) models. Recently, mutual information (MI) has gained increasing attention in feature selection problems. This paper presents an MI-based feature selection approach, named mutual information maximization by replacing collinear variables (MIMRCV), for nonlinear QSPR models. The proposed variable selection method was applied to three different QSPR datasets: soil degradation half-lives of 47 organophosphorus pesticides, GC-MS retention times of 85 volatile organic compounds, and water-to-micellar cetyltrimethylammonium bromide partition coefficients of 62 organic compounds. The results show that using MIMRCV as the feature selection method improves the predictive quality of the developed models compared with conventional MI-based variable selection algorithms.
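The abstract does not spell out the steps of MIMRCV, so the following is only a generic sketch of MI-based descriptor ranking with a collinearity filter, not the authors' algorithm. It assumes a simple equal-width histogram estimate of MI and a Pearson correlation threshold for collinearity. The names `mutual_information`, `pearson`, `select_descriptors`, `n_bins`, and `r_max` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def _bin(values, n_bins):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Histogram (plug-in) estimate of MI between x and y, in nats.

    Assumption: equal-width binning; other estimators could be swapped in.
    """
    bx, by = _bin(x, n_bins), _bin(y, n_bins)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_descriptors(X, y, k, r_max=0.9, n_bins=4):
    """Rank descriptors by MI with y, then greedily keep up to k of them,
    skipping any candidate whose |r| with an already kept descriptor
    reaches r_max (hypothetical collinearity threshold).

    X is a dict mapping descriptor name -> list of values.
    """
    ranked = sorted(X, key=lambda name: mutual_information(X[name], y, n_bins),
                    reverse=True)
    selected = []
    for name in ranked:
        if all(abs(pearson(X[name], X[s])) < r_max for s in selected):
            selected.append(name)
        if len(selected) == k:
            break
    return selected


if __name__ == "__main__":
    # Toy data: d2 is an exact linear copy of d1, d3 is nonlinear in y.
    y = [float(i) for i in range(16)]
    X = {
        "d1": list(y),
        "d2": [2 * v + 1 for v in y],
        "d3": [(v - 7.5) ** 2 for v in y],
    }
    print(select_descriptors(X, y, k=2))
```

In this sketch a collinear candidate is simply skipped in favor of the next most informative descriptor. Whether MIMRCV handles replacement in the same way cannot be determined from the abstract alone.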




Cited by

  1. Predicting Degradation Half-life of Organophosphorus Pesticides in Soil Using Three-Dimensional Molecular Interaction Fields : vol.2, pp.2, 2012,