Trend Analysis of Thyroid Cancer Research in Korea with Text Mining Techniques

  • Lee, Tae-Gyeong (Dept. of Applied Mathematics, Kumoh National Institute of Technology)
  • Heo, Seong-Min (Dept. of Applied Mathematics, Kumoh National Institute of Technology)
  • Shin, Seung-Hyeok (Dept. of Applied Mathematics, Kumoh National Institute of Technology)
  • Yang, Ji-Yeon (Dept. of Applied Mathematics, Kumoh National Institute of Technology)
  • Received: 2018.10.18
  • Accepted: 2018.11.10
  • Published: 2018.12.31

Abstract

In this paper, we propose a text-centered approach to identifying research trends in thyroid cancer in Korea. We combine statistical analysis, text mining, and machine learning techniques with our clinical insights to find associations between terms and to discover informative clusters in the literature. The incidence of thyroid cancer in Korea increased rapidly in the 2000s, which fueled the debate over overdiagnosis, but the number of patients undergoing surgery has recently decreased significantly as a result of deliberate reform efforts from various circles. We analyzed the abstracts and keywords of related research papers from DBpia. Most papers from the 1980s were case reports, and some papers in the 1990s discussed the early detection of thyroid cancer by mass screening. In the 2000s, many papers focused on different diagnostic techniques and the detection of small cancers, whereas in the 2010s many placed greater emphasis on patients' quality of life. The topics of thyroid cancer research have clearly changed over the past decades. The results of this study can serve as a reference guide for current and future research directions.
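The decade-by-decade comparison of keywords described in the abstract can be illustrated with a minimal sketch. The records below are invented toy data, not the DBpia corpus, and `decade_of` and `top_keywords_by_decade` are hypothetical helper names; the abstract does not state which software the authors used.

```python
from collections import Counter, defaultdict

# Toy (year, keywords) records, invented purely for illustration.
RECORDS = [
    (1985, ["case report", "thyroid carcinoma"]),
    (1988, ["case report", "metastasis"]),
    (1996, ["mass screening", "early detection"]),
    (2004, ["ultrasonography", "fine needle aspiration"]),
    (2007, ["ultrasonography", "microcarcinoma"]),
    (2013, ["quality of life", "thyroidectomy"]),
    (2016, ["quality of life", "overdiagnosis"]),
]


def decade_of(year):
    """Map a publication year to the first year of its decade (1985 -> 1980)."""
    return year // 10 * 10


def top_keywords_by_decade(records, n=1):
    """Count each keyword once per paper and return the n most frequent per decade."""
    counts = defaultdict(Counter)
    for year, keywords in records:
        # Deduplicate within a paper while keeping the original order.
        unique_keywords = list(dict.fromkeys(keywords))
        counts[decade_of(year)].update(unique_keywords)
    return {
        decade: counter.most_common(n)
        for decade, counter in sorted(counts.items())
    }


result = top_keywords_by_decade(RECORDS)
for decade, top in result.items():
    print(f"{decade}s: {top}")
```

Per-decade counts of this kind are the usual input to a word cloud; a real analysis would also need to normalize spelling variants and synonyms before counting.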

Keywords

Image: CPTSCQ_2018_v23n12_153_f0001.png

Fig. 1. The number of publications in domestic journals over time

Image: CPTSCQ_2018_v23n12_153_f0002.png

Fig. 4. Dendrogram from hierarchical clustering of keywords with cluster annotations

Image: CPTSCQ_2018_v23n12_153_f0003.png

Fig. 2. Word clouds of keywords by decade

Image: CPTSCQ_2018_v23n12_153_f0004.png

Fig. 3. (a) Complete and (b) simple social networks of keywords with cluster annotations

Table 1. The number of publications and the average length of abstracts by discipline of the journals

Image: CPTSCQ_2018_v23n12_153_t0001.png

Table 2. The top correlated words with specific main terminologies

Image: CPTSCQ_2018_v23n12_153_t0002.png
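One common way to obtain a list of top correlated words like the one in Table 2 is to correlate binary term-presence indicators across abstracts. The sketch below assumes Pearson correlation on 0/1 indicators (the phi coefficient); the measure actually used in the paper is not stated on this page. The documents are invented toy data, and `presence`, `pearson`, and `top_correlated` are hypothetical helper names.

```python
import math

# Toy abstracts, invented purely for illustration.
DOCS = [
    "ultrasonography guided fine needle aspiration of thyroid nodule",
    "fine needle aspiration cytology of thyroid nodule",
    "quality of life after thyroidectomy",
    "ultrasonography screening for thyroid nodule",
    "quality of life and fatigue after thyroidectomy",
]


def presence(term, docs):
    """Return a 0/1 vector marking which documents contain the term as a whole token."""
    return [1 if term in doc.split() else 0 for doc in docs]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_correlated(target, docs, n=3):
    """Rank every other vocabulary word by its correlation with the target term."""
    vocab = sorted({word for doc in docs for word in doc.split()} - {target})
    target_vec = presence(target, docs)
    scored = [(word, pearson(target_vec, presence(word, docs))) for word in vocab]
    # Highest correlation first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


top = top_correlated("thyroidectomy", DOCS)
for word, score in top:
    print(f"{word}: {score:.2f}")
```

In practice, stop words such as "of" and "after" would be filtered out first so that only clinically meaningful terms are ranked.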

Table 3. Frequency of appearances of statistical related terms in abstracts

Image: CPTSCQ_2018_v23n12_153_t0003.png

Cited by

  1. A Convergence Study on the Topics of Thyroid Cancer Papers in Korea vol.10, pp.2, 2018, https://doi.org/10.15207/jkcs.2019.10.2.075