Early Detection of Lung Cancer Risk Using Data Mining

  • Ahmed, Kawsar (Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University) ;
  • Abdullah-Al-Emran, Abdullah-Al-Emran (Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University) ;
  • Jesmin, Tasnuba (Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University) ;
  • Mukti, Roushney Fatima (Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University) ;
  • Rahman, Md. Zamilur (Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University) ;
  • Ahmed, Farzana (Department of Mathematics and Natural Science, BRAC University)
  • Published : 2013.01.31


Background: Lung cancer is the leading cause of cancer death worldwide. Identifying genetic as well as environmental risk factors is therefore very important for developing novel methods of lung cancer prevention. However, this is a multi-layered problem. A lung cancer risk prediction system is proposed here that is easy to use, cost-effective and time-saving. Materials and Methods: Initially, data from 400 cancer and non-cancer patients were collected from different diagnostic centres, pre-processed, and clustered using a K-means clustering algorithm to separate relevant from non-relevant data. Next, significant frequent patterns were discovered using the AprioriTid and decision tree algorithms. Results: Finally, prediction tools for a lung cancer risk prediction system were developed using the significant patterns. This system should prove helpful in detecting a person's predisposition to lung cancer. Conclusions: Most people in Bangladesh who have lung cancer are unaware of it, and the majority of cases are diagnosed at late stages, when cure is impossible. Early prediction of lung cancer should therefore play a pivotal role in the diagnostic process and in an effective preventive strategy.
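
The frequent-pattern step described in Materials and Methods can be illustrated with a minimal sketch. It is a plain level-wise (Apriori-style) itemset miner, not the AprioriTid variant the authors used, which additionally replaces repeated database scans with encoded transaction sets. The function name `frequent_itemsets`, the risk-factor labels, the toy records and the support threshold are all hypothetical and do not come from the study.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support records.

    Simplified Apriori-style search (an assumption for illustration),
    not the AprioriTid algorithm named in the abstract.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of records that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical toy records: risk factors present for each person.
records = [
    {"smoker", "chronic_cough", "age_over_50"},
    {"smoker", "chronic_cough"},
    {"smoker", "age_over_50"},
    {"chronic_cough", "age_over_50"},
    {"smoker", "chronic_cough", "age_over_50"},
]

patterns = frequent_itemsets(records, min_support=3)
for itemset, count in sorted(patterns.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In a pipeline like the one in the abstract, such patterns would be mined only from the records that the clustering step kept as relevant, and the surviving patterns would then feed the decision tree stage.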


