Improving the Accuracy of Early Diagnosis of Thyroid Nodule Type Based on the SCAD Method

  • Shahraki, Hadi Raeisi (Department of Biostatistics, Shiraz University of Medical Sciences)
  • Pourahmad, Saeedeh (Department of Biostatistics, Shiraz University of Medical Sciences)
  • Paydar, Shahram (Trauma Research Center, Department of Surgery, Shiraz University of Medical Sciences)
  • Azad, Mohsen (Mother and Child Welfare Research Center, Hormozgan University of Medical Sciences)
  • Published: 2016.06.01


Although early diagnosis of thyroid nodule type is very important, the diagnostic accuracy of standard tests remains a challenging issue. We aimed to find an optimal combination of factors to improve diagnostic accuracy in distinguishing malignant from benign thyroid nodules before surgery. In a prospective study from 2008 to 2012, 345 patients referred for thyroidectomy were enrolled. The sample was split into a training set and a testing set in a ratio of 7:3. The training set was used for estimation, variable selection, and obtaining a linear combination of factors. We used smoothly clipped absolute deviation (SCAD) logistic regression to obtain a sparse optimal combination of factors. A receiver operating characteristic (ROC) curve was used to evaluate the performance of the estimated model in the testing set. The mean age of the examined patients (66 male and 279 female) was $40.9 \pm 13.4$ years (range 15-90 years). Some 54.8% of the patients (24.3% male and 75.7% female) had benign and 45.2% (14% male and 86% female) had malignant thyroid nodules. In addition to the maximum diameters of nodules and lobes, their volumes were considered as related factors for malignancy prediction (a total of 16 factors). The SCAD method estimated the coefficients of 8 factors to be zero and eliminated them from the model, yielding a sparse model that combined the effects of the remaining 8 factors to distinguish malignant from benign thyroid nodules. An optimal cut-off point on the ROC curve for the estimated model was obtained (p=0.44), and the area under the curve (AUC) was 77% (95% CI: 68%-85%). Sensitivity, specificity, positive predictive value and negative predictive value for this model were 70%, 72%, 71% and 76%, respectively.
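The cut-off point and the four diagnostic indices reported above can be illustrated with a minimal sketch. It assumes predicted malignancy probabilities from some already fitted model and chooses the cut-off by maximizing Youden's J (sensitivity + specificity - 1). The abstract does not state which criterion was used, so that choice, the function names, and the toy data are assumptions for illustration only.

```python
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def diagnostic_indices(y_true: Sequence[int], p_hat: Sequence[float],
                       cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when p_hat >= cutoff is called malignant (1)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def youden_cutoff(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Observed probability that maximizes Youden's J (an assumed criterion).

    Assumes both classes are present in y_true.
    """
    best_cut, best_j = None, float("-inf")
    for c in sorted(set(p_hat)):
        idx = diagnostic_indices(y_true, p_hat, c)
        j = idx["sensitivity"] + idx["specificity"] - 1.0
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut


# Toy data (hypothetical, not from the study): 1 = malignant, 0 = benign.
y = [0, 0, 1, 1, 0, 1]
p = [0.10, 0.30, 0.35, 0.80, 0.20, 0.70]
cut = youden_cutoff(y, p)
print(cut, diagnostic_indices(y, p, cut))
```

In practice the cut-off would be chosen on one data set and the indices reported on held-out data, as the training/testing split in the study suggests.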
An increase of 10 percent or more in the accuracy of early diagnosis of thyroid nodule type by statistical methods (SCAD and artificial neural network (ANN) methods), compared with the results of fine-needle aspiration (FNA) testing, indicates that statistical modeling methods are helpful in disease diagnosis. In addition, the factor ranking offered by these methods is valuable in the clinical context.
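The shape of the SCAD penalty shows why it can set some coefficients exactly to zero while leaving large ones almost unshrunk. The sketch below uses the standard piecewise definition of the penalty with the conventional default a = 3.7. The tuning values used in the study are not given in the abstract, so `lam`, `a`, the function names, and the omission of an intercept are assumptions made for brevity.

```python
import math
from typing import Sequence


def scad_penalty(beta: float, lam: float, a: float = 3.7) -> float:
    """Standard SCAD penalty for a single coefficient (requires lam >= 0 and a > 2)."""
    if lam < 0 or a <= 2:
        raise ValueError("SCAD requires lam >= 0 and a > 2")
    b = abs(beta)
    if b <= lam:
        # Behaves like the LASSO penalty near zero, which produces exact zeros.
        return lam * b
    if b <= a * lam:
        # Quadratic transition between the linear and the constant region.
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    # Constant for large coefficients, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


def penalized_objective(beta: Sequence[float], X: Sequence[Sequence[float]],
                        y: Sequence[int], lam: float, a: float = 3.7) -> float:
    """Mean logistic negative log-likelihood plus the summed SCAD penalty."""
    nll = 0.0
    for row, yi in zip(X, y):
        eta = sum(bj * xj for bj, xj in zip(beta, row))
        # log(1 + exp(eta)) computed in a numerically stable way.
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    return nll / len(y) + sum(scad_penalty(bj, lam, a) for bj in beta)


# The penalty grows linearly, then bends, then stays flat.
for value in (0.5, 2.0, 10.0):
    print(value, scad_penalty(value, lam=1.0))
```

Minimizing such an objective over `beta` on the training set, with `lam` chosen by cross-validation or an information criterion, is one common way to obtain a sparse linear combination of factors; dedicated packages handle the optimization.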


Supported by : Shiraz University of Medical Sciences Research Council


