A Study on Statistical Methods for Testing Tumor Heterogeneity

• Lee, Dong Neuck (이동녘; Department of Applied Statistics, Chung-Ang University) ;
• Lim, Changwon (임창원; Department of Applied Statistics, Chung-Ang University)
• Received : 2018.10.25
• Accepted : 2019.01.03
• Published : 2019.06.30

Abstract

Understanding tumor heterogeneity, which arises from differences in the growth patterns and rates of change of metastatic tumors, is important for understanding the sensitivity of tumor cells to drugs and for finding appropriate therapies. When the group membership of each of the N samples is known, differences in population means can usually be tested with a t-test or ANOVA. These methods cannot be used, however, when the groups are not distinguished, as is the case for the data considered in this paper. Statistical methods for testing heterogeneity between samples in this setting have been studied, and the minimum combination t-test method is one of them. In this paper, we propose a maximum combination t-test method that takes into account combinations that bisect the data at different ratios. We also propose a method based on the idea that examining the heterogeneity of a sample is equivalent to testing whether the optimal number of clusters in a cluster analysis is one. A simulation study showed that the proposed methods, the maximum combination t-test and the gap statistic, have better type I error and power than the previously proposed method, and we also report the results of a real data analysis.
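
As a rough illustration of the clustering idea described in the abstract, the following Python sketch computes the gap statistic of Tibshirani, Walther, and Hastie (2001) for one-dimensional data and treats a sample as heterogeneous when the estimated number of clusters exceeds one. The function names, the simple 1-D k-means, the uniform reference distribution over the data range, and the settings k_max and n_ref are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np


def kmeans_1d_wss(x, k, n_iter=100):
    """Within-cluster sum of squares from a basic 1-D k-means (Lloyd's algorithm)."""
    x = np.sort(np.asarray(x, dtype=float))
    # Deterministic start (an assumption): centers at evenly spaced quantiles.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return float(np.sum((x - centers[labels]) ** 2))


def gap_statistic(x, k_max=3, n_ref=200, seed=0):
    """Return Gap(k) and s_k for k = 1..k_max.

    Reference data are drawn uniformly over the observed range, and fresh
    reference samples are drawn for each k (a simplification).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans_1d_wss(x, k))
        ref = np.array([
            np.log(kmeans_1d_wss(rng.uniform(lo, hi, x.size), k))
            for _ in range(n_ref)
        ])
        gaps.append(ref.mean() - log_w)
        s.append(ref.std() * np.sqrt(1.0 + 1.0 / n_ref))
    return np.array(gaps), np.array(s)


def estimated_clusters(gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}; k_max if none qualifies."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1
    return len(gaps)


# Hypothetical sample: two well-separated groups of 20 measurements each.
rng = np.random.default_rng(42)
sample = np.concatenate([rng.normal(0.0, 0.5, 20), rng.normal(10.0, 0.5, 20)])
gaps, s = gap_statistic(sample)
k_hat = estimated_clusters(gaps, s)
print("estimated number of clusters:", k_hat)
print("heterogeneous:", k_hat > 1)
```

Under this reading, the heterogeneity decision reduces to checking whether the selection rule stops at k = 1, which requires only Gap(1), Gap(2), and s_2.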

File

Figure 3.1. Power across differences between means (k = 2). red: gap statistic; green: minimum combination t-test; blue: maximum combination t-test.

Figure 3.2. Power across differences between means (k = 3). red: gap statistic; green: minimum combination t-test; blue: maximum combination t-test.

Figure 4.1. Gap statistic across the number of clusters (k), patient numbers 1–4.

Figure 4.2. Gap statistic across the number of clusters (k), patient numbers 5–10.

Figure 4.3. Results for the real data set using the gap statistic and maximum combination t-test, patient numbers 1 and 2.

Figure 4.4. Results for the real data set using the gap statistic and maximum combination t-test, patient numbers 3 and 5.

Figure 4.5. Results for the real data set using the gap statistic and maximum combination t-test, patient numbers 7–9.

Table 3.1. The probability of making a type I error

Table 4.1. Results for the real data set using the gap statistic and minimum combination t-test

Table 4.2. Results for the real data set using the maximum combination t-test
