Application of Data Mining Techniques to Explore Predictors of HCC in Egyptian Patients with HCV-related Chronic Liver Disease

  • Published : 2015.02.04


Background: Hepatocellular carcinoma (HCC) is the second most common malignancy in Egypt. Data mining is a method of predictive analysis that can explore large volumes of information to discover hidden patterns and relationships. Our aim was to develop a non-invasive algorithm for the prediction of HCC. Such an algorithm should be economical, reliable, easy to apply, and acceptable to domain experts. Methods: This cross-sectional study enrolled 315 patients with hepatitis C virus (HCV)-related chronic liver disease (CLD): 135 patients with HCC, 116 cirrhotic patients without HCC, and 64 patients with chronic hepatitis C. Using data mining analysis, we constructed a decision tree learning algorithm to predict HCC. Results: The decision tree algorithm was able to predict HCC with a recall (sensitivity) of 83.5% and a precision (specificity) of 83.3% using only routine data. Of the 315 instances, 259 (82.2%) were correctly classified and 56 (17.8%) were incorrectly classified. Out of 29 attributes, serum alpha-fetoprotein (AFP), with an optimal cutoff value of $\geq 50.3$ ng/ml, was selected as the best predictor of HCC. To a lesser extent, male sex, presence of cirrhosis, AST > 64 U/L, and ascites were also associated with HCC. Conclusion: Data mining analysis allows the discovery of hidden patterns and enables the development of models to predict HCC from routine data, as an alternative to CT and liver biopsy. This study has highlighted a new cutoff for AFP ($\geq 50.3$ ng/ml). A score of >2 risk variables (out of 5) can predict HCC with a sensitivity of 96% and a specificity of 82%.
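The five-variable score described in the abstract can be illustrated with a minimal Python sketch. It assumes that each risk variable contributes one point and that a total above 2 is read as a prediction of HCC; the abstract does not state how the score is weighted, so the equal weighting is an assumption. The `Patient` record and the function names are hypothetical and do not come from the study.

```python
from dataclasses import dataclass

AFP_CUTOFF_NG_ML = 50.3  # cutoff reported in the abstract (AFP >= 50.3 ng/ml)
AST_CUTOFF_U_L = 64.0    # cutoff reported in the abstract (AST > 64 U/L)
SCORE_THRESHOLD = 2      # abstract: a score of > 2 (out of 5) predicts HCC


@dataclass
class Patient:
    """Hypothetical record layout holding the five routine variables."""
    afp_ng_ml: float
    ast_u_l: float
    is_male: bool
    has_cirrhosis: bool
    has_ascites: bool


def risk_score(p: Patient) -> int:
    """Count how many of the five risk variables are present.

    Assumption: every variable adds exactly one point.
    """
    return sum([
        p.afp_ng_ml >= AFP_CUTOFF_NG_ML,
        p.is_male,
        p.has_cirrhosis,
        p.ast_u_l > AST_CUTOFF_U_L,
        p.has_ascites,
    ])


def predicts_hcc(p: Patient) -> bool:
    """Return True when more than two risk variables are present."""
    return risk_score(p) > SCORE_THRESHOLD
```

For example, a male patient with AFP of 120 ng/ml and AST of 80 U/L but no cirrhosis or ascites would score 3 under these assumptions and be flagged, whereas a female cirrhotic patient with AFP of exactly 50.3 ng/ml and AST of exactly 64 U/L would score 2 and would not.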


