Statistical methods for testing tumor heterogeneity

  • Lee, Dong Neuck (이동녘), Department of Applied Statistics, Chung-Ang University
  • Lim, Changwon (임창원), Department of Applied Statistics, Chung-Ang University
  • Received : 2018.10.25
  • Accepted : 2019.01.03
  • Published : 2019.06.30

Abstract

Understanding tumor heterogeneity, which arises from differences in the growth patterns and rates of change of metastatic tumors, is important for assessing the sensitivity of tumor cells to drugs and for finding appropriate therapies. When N samples fall into known groups, differences in population means can be tested with a t-test or ANOVA. These methods cannot be used, however, when the groups are not distinguished, as is the case for the data considered in this paper. Several statistical methods have been studied for testing heterogeneity between samples, and the minimum combination t-test is one of them. In this paper, we propose a maximum combination t-test that also considers combinations that split the data into two groups at different ratios. We also propose a method based on the idea that testing the heterogeneity of a sample is equivalent to testing whether the optimal number of clusters in a cluster analysis is one; this method uses the gap statistic. A simulation study indicates that the proposed methods, the maximum combination t-test and the gap statistic, have better type I error rates and power than the previously proposed method, and we illustrate them with an analysis of real data.
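
The abstract does not spell out the test statistic, so the following Python sketch illustrates only the general idea of a combination t-test over unequal splits; it should not be read as the authors' procedure. It assumes one-dimensional measurements, a pooled-variance two-sample t statistic, a minimum group size of two, and Monte Carlo calibration against a standard normal null. The function names `max_split_t` and `heterogeneity_p_value` are hypothetical. For a fixed group size the pooled |t| is largest when one group holds the smallest (or largest) values, so scanning cut points of the sorted data gives the same maximum as enumerating every subset.

```python
import numpy as np


def max_split_t(x, min_size=2):
    """Largest absolute pooled t statistic over all two-group splits of 1-D data.

    Assumption: every group must contain at least `min_size` observations.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    best = 0.0
    for k in range(min_size, n - min_size + 1):
        a, b = x[:k], x[k:]
        diff = abs(a.mean() - b.mean())
        # Pooled within-group sum of squares for this candidate split.
        ss = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
        if ss == 0.0:
            if diff > 0.0:
                return float("inf")  # groups are perfectly separated
            continue  # all values identical: this split carries no signal
        se = np.sqrt(ss / (n - 2) * (1.0 / k + 1.0 / (n - k)))
        best = max(best, diff / se)
    return best


def heterogeneity_p_value(x, n_null=200, min_size=2, seed=0):
    """Monte Carlo p-value for the maximum statistic.

    Assumption: under homogeneity the data are i.i.d. normal. The statistic is
    location and scale invariant, so standard normal draws serve as the null.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = max_split_t(x, min_size)
    null = np.array(
        [max_split_t(rng.standard_normal(x.size), min_size) for _ in range(n_null)]
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_null + 1)
```

Because the maximum is taken over many splits, the statistic no longer follows a t distribution, which is why this sketch calibrates it by simulation instead of using t tables.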

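
The clustering idea in the abstract, that a homogeneous sample corresponds to an optimal cluster count of one, can be sketched with the gap statistic of Tibshirani et al. (2001). The Python sketch below is an illustration under stated assumptions, not the paper's implementation: the data are one-dimensional with at least three distinct values, the reference distribution is uniform over the observed range, and only k = 1 and k = 2 are compared. The name `gap_one_vs_two` and its return keys are hypothetical. The sample is flagged as heterogeneous when Gap(1) < Gap(2) - s_2, which is the "smallest k with Gap(k) >= Gap(k+1) - s_(k+1)" rule evaluated at k = 1.

```python
import numpy as np


def within_ss(x, k):
    """Within-cluster sum of squares for the best k-cluster split, k in {1, 2}."""
    x = np.sort(np.asarray(x, dtype=float))
    if k == 1:
        return ((x - x.mean()) ** 2).sum()
    # In one dimension the optimal 2-means partition is a cut of the sorted values.
    return min(
        ((x[:i] - x[:i].mean()) ** 2).sum() + ((x[i:] - x[i:].mean()) ** 2).sum()
        for i in range(1, x.size)
    )


def gap_one_vs_two(x, n_ref=200, seed=0):
    """Compare Gap(1) with Gap(2) using uniform reference samples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Assumption: reference data are uniform over the range of the observations.
    refs = rng.uniform(x.min(), x.max(), size=(n_ref, x.size))
    gaps, sds = [], []
    for k in (1, 2):
        log_w_ref = np.log([within_ss(r, k) for r in refs])
        gaps.append(log_w_ref.mean() - np.log(within_ss(x, k)))
        # Simulation error term s_k from Tibshirani et al. (2001).
        sds.append(log_w_ref.std() * np.sqrt(1.0 + 1.0 / n_ref))
    return {
        "gap1": gaps[0],
        "gap2": gaps[1],
        "s2": sds[1],
        "heterogeneous": bool(gaps[0] < gaps[1] - sds[1]),
    }
```

A call such as `gap_one_vs_two(measurements)["heterogeneous"]` returns a yes/no decision; unlike the t-test sketch it yields no p-value, so its type I error rate has to be assessed by simulation.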

Figure 3.1. Power across differences between means (k = 2). Red: gap statistic; green: minimum combination t-test; blue: maximum combination t-test.

Figure 3.2. Power across differences between means (k = 3). Red: gap statistic; green: minimum combination t-test; blue: maximum combination t-test.

Figure 4.1. Gap statistic across the number of clusters (k), patients 1–4.

Figure 4.2. Gap statistic across the number of clusters (k), patients 5–10.

Figure 4.3. Results for the real data set using the gap statistic and the maximum combination t-test, patients 1 and 2.

Figure 4.4. Results for the real data set using the gap statistic and the maximum combination t-test, patients 3 and 5.

Figure 4.5. Results for the real data set using the gap statistic and the maximum combination t-test, patients 7–9.

Table 3.1. The probability of making a type I error

Table 4.1. Results for the real data set using the gap statistic and the minimum combination t-test

Table 4.2. Results for the real data set using the maximum combination t-test

References

  1. Baker, F. B. and Hubert, L. J. (1976). A graph-theoretic approach to goodness-of-fit in complete-link hierarchical clustering, Journal of the American Statistical Association, 71, 870-878. https://doi.org/10.1080/01621459.1976.10480961
  2. Davies, D. L. and Bouldin, D. W. (1979). A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1, 224-227. https://doi.org/10.1109/TPAMI.1979.4766909
  3. Dunn, J. C. (1974). Well-separated clusters and optimal fuzzy partitions, Journal of Cybernetics, 4, 95-104. https://doi.org/10.1080/01969727408546059
  4. Dunn, O. J. (1961). Multiple comparisons among means, Journal of the American Statistical Association, 56, 52-64. https://doi.org/10.1080/01621459.1961.10482090
  5. Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance, Transactions of the Royal Society of Edinburgh, 52, 399-433. https://doi.org/10.1017/S0080456800012163
  6. Hartigan, J. and Wong, M. (1979). Algorithm AS 136: A K-means clustering algorithm, Journal of the Royal Statistical Society Series C (Applied Statistics), 28, 100-108.
  7. Heo, M. and Lim, C. (2017). A minimum combination t-test method for testing differences in population means based on a group of samples of size one, The Korean Journal of Applied Statistics, 30, 301-309. https://doi.org/10.5351/KJAS.2017.30.2.301
  8. Kruskal, W. H. and Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis, Journal of the American Statistical Association, 47, 583-621. https://doi.org/10.1080/01621459.1952.10483441
  9. Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., and Hornik, K. (2018). cluster: Cluster analysis basics and extensions, R package version 2.0.7-1.
  10. Rousseeuw, P. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, 20, 53-65. https://doi.org/10.1016/0377-0427(87)90125-7
  11. Student (1908). The probable error of a mean, Biometrika, 6, 1-25. https://doi.org/10.1093/biomet/6.1.1
  12. Thorndike, R. L. (1953). Who belongs in the family?, Psychometrika, 18, 267-276. https://doi.org/10.1007/BF02289263
  13. Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 411-423. https://doi.org/10.1111/1467-9868.00293
  14. Yan, M. and Ye, K. (2007). Determining the number of clusters using the weighted gap statistic, Biometrics, 63, 1031-1037. https://doi.org/10.1111/j.1541-0420.2007.00784.x
  15. Yoo, J., Kim, Y., Lim, C., Heo, M., Hwang, I., and Chong, S. (2017). Assessment of spatial tumor heterogeneity using CT phenotypic features estimated by semi-automated 3D CT volumetry of multiple pulmonary metastatic nodules: A preliminary study, unpublished manuscript.