Frequency Analysis of Scientific Texts on the Hypoxia Using Bibliographic Data

Frequency Analysis of Research Terms in the Field of Hypoxia Research Using Bibliographic Information of Papers

  • Lee, GiSeop (Ocean Data Science Section, KIOST) ;
  • Lee, JiYoung (Marine Environment Research Division, National Institute of Fisheries Science) ;
  • Cho, HongYeon (Ocean Data Science Section, KIOST)
  • Received : 2019.03.18
  • Accepted : 2019.06.20
  • Published : 2019.06.30

Abstract

Frequency analysis of scientific terms using bibliographic information is conceptually simple, but as the volume of relevant data grows, manual analysis of all the data becomes practically impossible or feasible only to a very limited extent. In addition, as oceanographic research has expanded in scale and scope, the allocation of research resources among topics has become an important issue. In this study, a frequency analysis of scientific terms was performed using text mining. The data, drawn from a general-purpose scholarly database, comprise 2,878 articles. Hypoxia, an important issue in the marine environment, was selected as the research field, and the frequencies of related words were analyzed. The most frequently used terms were 'Organic matter', 'Bottom water', and 'Dead zone', and terms for specific areas also showed high frequency. The results can serve as a basis for allocating research resources according to the frequency of use of related terms in a specific field when planning a large research project represented by a single word.
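As a rough illustration of the kind of term-frequency analysis described above, the following minimal Python sketch counts two-word terms (bigrams) across a set of abstracts. It is not the authors' implementation: the function name `bigram_counts`, the sample sentences, the stop-word list, and the optional per-article `weights` argument (for example, a citation-based weight) are all assumptions made for illustration.

```python
import re
from collections import Counter


def bigram_counts(abstracts, weights=None, stopwords=frozenset()):
    """Count adjacent word pairs over a list of abstract strings.

    `weights` is a hypothetical per-article weight (e.g. derived from
    citations); when omitted, every article counts once.
    """
    if weights is None:
        weights = [1] * len(abstracts)
    counts = Counter()
    for text, weight in zip(abstracts, weights):
        # Lower-case and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        # Form bigrams first, then drop any pair containing a stop word,
        # so that only truly adjacent words are paired.
        for first, second in zip(tokens, tokens[1:]):
            if first in stopwords or second in stopwords:
                continue
            counts[f"{first} {second}"] += weight
    return counts


# Hypothetical sample data, not taken from the study.
docs = [
    "Hypoxia in bottom water is linked to organic matter.",
    "Organic matter decay depletes oxygen in bottom water.",
    "The dead zone expands when bottom water stays stratified.",
]
stop = {"in", "is", "to", "the", "when"}

counts = bigram_counts(docs, stopwords=stop)
weighted = bigram_counts(docs, weights=[10, 0, 1], stopwords=stop)

print(counts.most_common(2))  # [('bottom water', 3), ('organic matter', 2)]
```

Passing `weights` lets highly weighted articles contribute more to each term's total, which is one simple way to contrast unweighted and weighted rankings.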

Fig. 1. Comparisons of visualized top 100 Bigrams according to the citation weights (a, b); Variations of Bigram distribution (c) and rank (d)

Fig. 2. Bigram frequency including ‘bay’ in Hypoxia articles: non citation weighted

Fig. 3. Bigram frequency including ‘bay’ in Hypoxia articles: non citation weighted

Fig. 4. Bigram frequency including ‘bay’ in Hypoxia articles: citation weighted

Fig. 5. Co-occurring word network during 2001−2010

Fig. 6. Co-occurring word network during the last decade (2001−2010)

Fig. 7. Distributions of raw and normalized citation (a, b) and annual variations of total citation (c)

Table 1. Ten most frequent words ranked by inverse document frequency (IDF)
