Status and Quality Analysis on the Biodiversity Data of East Asian Vascular Plants Mobilized through the Global Biodiversity Information Facility (GBIF)

  • Chang, Chin-Sung (Department of Forest Sciences and The Arboretum, Seoul National University) ;
  • Kwon, Shin-Young (Department of Forest Sciences and The Arboretum, Seoul National University) ;
  • Kim, Hui (Department of Pharmaceutical Resources, Mokpo National University)
  • Received : 2021.04.25
  • Accepted : 2021.05.28
  • Published : 2021.06.30

Abstract

Biodiversity informatics applies information technology to organizing, accessing, visualizing, and analyzing primary biodiversity data, and to managing those data quantitatively through scientific names, both accepted names and synonyms. We reviewed the GBIF data published by China, Japan, Taiwan, and domestic institutes of the Republic of Korea, such as NIBR, NIE, and KNA, and assessed several aspects of data quality using BRAHMS software. Most data from these four Asian countries have quality problems: inconsistent data; missing georeferences, collectors, collection dates, and place names (gazetteers); and other invalid data forms. The major problem is that biodiversity management institutions in East Asia use unstructured databases and simple spreadsheet-type data. Owing to the nature of biodiversity information, unless data relationships are structured, it is impossible to secure the integrity of scientific names, personal names, geographical names, literature, and ecological information. Ensuring data quality requires both database management that enforces data integrity and training systems for taxonomists, who serve as long-term data managers and correct errors. Thus, publishers in East Asia have an essential role not only in using specialized software to manage biodiversity data but also in developing structured databases and ensuring their integration and value within biodiversity publishing platforms.
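
The completeness problems named in the abstract (missing georeferences, collectors, collection dates, and place names) can be illustrated with a small check. The sketch below is not the BRAHMS workflow used in the study. It is a minimal Python illustration that assumes occurrence records are dictionaries keyed by Darwin Core terms such as `scientificName`, `decimalLatitude`, `recordedBy`, `eventDate`, and `locality`. The function names and sample records are hypothetical.

```python
from collections import Counter

# Assumption: records use Darwin Core term names as keys.
REQUIRED_FIELDS = (
    "scientificName", "decimalLatitude", "decimalLongitude",
    "recordedBy", "eventDate", "locality",
)

def missing_fields(record):
    """Return the required fields that are absent or blank in one record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing

def has_invalid_coordinates(record):
    """True if coordinates are present but unparsable or out of range."""
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat in (None, "") or lon in (None, ""):
        return False  # counted as missing, not as invalid
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return True
    return not (-90 <= lat <= 90 and -180 <= lon <= 180)

def summarize(records):
    """Count how often each quality problem occurs across all records."""
    counts = Counter()
    for record in records:
        counts.update(missing_fields(record))
        if has_invalid_coordinates(record):
            counts["invalidCoordinates"] += 1
    return counts

# Hypothetical sample records, invented for illustration only.
SAMPLE_RECORDS = [
    {"scientificName": "Examplea accepta", "decimalLatitude": "37.46",
     "decimalLongitude": "126.95", "recordedBy": "Collector A",
     "eventDate": "2019-05-12", "locality": "Hypothetical site 1"},
    {"scientificName": "Examplea accepta", "decimalLatitude": "",
     "decimalLongitude": "", "recordedBy": " ",
     "eventDate": "2019-06-01", "locality": "Hypothetical site 2"},
    {"scientificName": "Examplea accepta", "decimalLatitude": "137.46",
     "decimalLongitude": "26.95", "recordedBy": "Collector B",
     "eventDate": None, "locality": ""},
]

report = summarize(SAMPLE_RECORDS)
print(report)
```

A per-field tally like this only detects gaps and impossible values. It does not judge whether a name, collector, or locality is correct, which is the part the abstract assigns to trained taxonomists.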

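Biodiversity informatics applies information science to biodiversity information; it builds and uses primary species-occurrence data on the basis of species information, including scientific names organized as accepted names and synonyms. In this study, we used BRAHMS software to assess the quality of Northeast Asian data published to the Global Biodiversity Information Facility (GBIF) against the criterion of fitness for use, and thereby confirmed the need for biodiversity data cleaning. Data published by Korean biodiversity institutions such as the National Institute of Biological Resources (NIBR), the National Institute of Ecology (NIE), and the Korea National Arboretum (KNA), as well as by Japan, China, and Taiwan, contain errors in scientific names, geographic information, collectors, and dates that stem from problems in the data-cleaning process. These errors in basic attribute data arise because biodiversity management institutions in East Asia use unstructured databases and flat, spreadsheet-type data. Given the nature of biodiversity information, data integrity for scientific names, personal names, place names, literature, and ecological information cannot be secured unless these varied kinds of information are structured. Overcoming these data management problems in East Asia requires structuring the data, improving understanding of data cleaning, and training professional taxonomists who serve as long-term data managers to correct errors. Biodiversity data managers need measures to prevent future errors, such as revising and extending documented management guidelines based on an analysis of error causes, and these measures must be applied in the system. All of these processes should be carried out and recorded within the database. Rather than relying on the current simple data structures, biodiversity data publishers in East Asia need to adopt specialized, advanced software for biodiversity data management or develop databases of comparable sophistication.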
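
The contrast the abstract draws between flat spreadsheet-type data and structured databases can be made concrete with a small relational example. The sketch below uses Python's standard `sqlite3` module and is only an illustration under stated assumptions: the table layout, column names, taxon names, and the helper `try_insert_specimen` are all hypothetical, and it does not reproduce the schema of BRAHMS or of any institution's database. It shows how declared relationships let the database itself reject a specimen that points to an unknown scientific name or lacks a collection date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: names and collectors live in their own tables,
# and each specimen must reference an existing row in both.
conn.executescript("""
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    accepted_id     INTEGER REFERENCES taxon(taxon_id)  -- NULL for accepted names
);
CREATE TABLE collector (
    collector_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE specimen (
    specimen_id  INTEGER PRIMARY KEY,
    taxon_id     INTEGER NOT NULL REFERENCES taxon(taxon_id),
    collector_id INTEGER NOT NULL REFERENCES collector(collector_id),
    event_date   TEXT NOT NULL
);
""")

with conn:
    # Invented names: one accepted name and one synonym linked to it.
    conn.execute("INSERT INTO taxon VALUES (1, 'Examplea accepta', NULL)")
    conn.execute("INSERT INTO taxon VALUES (2, 'Examplea synonyma', 1)")
    conn.execute("INSERT INTO collector VALUES (1, 'Collector A')")

def try_insert_specimen(taxon_id, collector_id, event_date):
    """Insert a specimen; return False if an integrity constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO specimen (taxon_id, collector_id, event_date) "
                "VALUES (?, ?, ?)",
                (taxon_id, collector_id, event_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert_specimen(1, 1, "2020-05-01")             # valid references
unknown_taxon = try_insert_specimen(99, 1, "2020-05-01")  # no such taxon
no_date = try_insert_specimen(1, 1, None)                 # missing collection date
stored = conn.execute("SELECT COUNT(*) FROM specimen").fetchone()[0]
```

In a flat spreadsheet the second and third rows would be accepted silently. Here the constraints refuse them at entry time, which is one way the integrity of names, people, and dates can be maintained by the database rather than by later cleaning.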
