A New Variable Selection Method Based on Mutual Information Maximization by Replacing Collinear Variables for Nonlinear Quantitative Structure-Property Relationship Models

  • Ghasemi, Jahan B. (Chemistry Department, Faculty of Sciences, K.N. Toosi University of Technology)
  • Zolfonoun, Ehsan (Chemistry Department, Faculty of Sciences, K.N. Toosi University of Technology)
  • Received: 2011.11.21
  • Accepted: 2012.02.03
  • Published: 2012.05.20

Abstract

Selecting the most informative molecular descriptors from the original data set is a key step in developing quantitative structure-activity/property relationship models. Recently, mutual information (MI) has gained increasing attention in feature selection problems. This paper presents an effective mutual information-based feature selection approach, named mutual information maximization by replacing collinear variables (MIMRCV), for nonlinear quantitative structure-property relationship (QSPR) models. The proposed variable selection method was applied to three QSPR data sets: soil degradation half-lives of 47 organophosphorus pesticides, GC-MS retention times of 85 volatile organic compounds, and water-to-micellar cetyltrimethylammonium bromide partition coefficients of 62 organic compounds. The results show that using MIMRCV for feature selection improves the predictive quality of the developed models compared with conventional MI-based variable selection algorithms.
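
The abstract does not spell out the MIMRCV procedure, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea the abstract names: rank descriptors by their estimated mutual information with the property, and avoid keeping descriptors that are collinear with ones already chosen. Everything in it is an assumption made for illustration: the histogram MI estimator, the Pearson cutoff `r_max`, the function names, and the synthetic data. In particular, it simply skips a collinear candidate and does not reproduce the replacement step that gives MIMRCV its name.

```python
import math
import random
from collections import Counter


def _bin(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant column: fall back to width 1
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Histogram (plug-in) estimate of I(X; Y) in nats."""
    bx, by = _bin(x, n_bins), _bin(y, n_bins)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, y, k, r_max=0.9):
    """Greedy selection (illustrative): take descriptors in order of
    decreasing MI with y, skipping any whose |r| with an already
    selected descriptor exceeds r_max."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: mutual_information(columns[j], y),
                    reverse=True)
    selected = []
    for j in ranked:
        if all(abs(pearson(columns[j], columns[s])) <= r_max for s in selected):
            selected.append(j)
        if len(selected) == k:
            break
    return selected


# Synthetic demo data (hypothetical, not from the paper).
random.seed(0)
n = 200
x0 = [random.random() for _ in range(n)]
x1 = [2.0 * v + random.gauss(0, 0.01) for v in x0]   # near-copy of x0
x2 = [random.random() for _ in range(n)]             # pure noise
x3 = [random.random() for _ in range(n)]
y = [math.sin(3 * a) + b ** 2 for a, b in zip(x0, x3)]  # nonlinear in x0, x3

columns = [x0, x1, x2, x3]
selected = select_features(columns, y, k=2)
print(selected)
```

Because `x1` is almost a rescaled copy of `x0`, the correlation filter keeps at most one of the two, whichever ranks higher by estimated MI. A k-nearest-neighbour MI estimator would be a natural substitute for the histogram estimate on small data sets like the ones the abstract describes, at the cost of a more involved implementation.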
