Development of Collaborative Environment for Community-driven Scientific Data Curation

  • Received : 2017.06.21
  • Accepted : 2017.07.26
  • Published : 2017.09.28

Abstract

The importance of data curation is increasingly recognized as the need for data reuse grows rapidly. Because of the recent data explosion, scientists invest almost 90% of their effort in retrieving and collecting the data needed for their studies. To reduce this effort, this paper describes the development and application of a collaborative environment for community-driven data curation, which is essential for enhancing the reusability and citability of scientific data. The environment focuses on cross-linking data (or data collections) with their associated literature in order to capture and organize the interrelations among research results in a specific domain. It also provides rich contextual information as metadata to help users understand the data. The cross-linking is implemented with the DOI system, which guarantees persistent, global access to the data and to their relationships with the literature. The environment has been adopted by a globally well-known research group on intrinsically disordered proteins to build a community-driven curated database. The curated database is expected to substantially reduce the effort researchers spend retrieving and collecting the data required for scientific discovery.
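The cross-linking idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record schema, the class and function names, the relation label, and the DOIs are all hypothetical placeholders chosen for illustration. It only shows how DOI-identified datasets and publications might carry contextual metadata and be linked so that either side can be found from the other.

```python
from dataclasses import dataclass, field

DOI_RESOLVER = "https://doi.org/"  # public DOI proxy; the DOIs used below are placeholders


@dataclass
class CuratedRecord:
    """A dataset or a publication identified by its DOI (hypothetical schema)."""
    doi: str
    kind: str  # "dataset" or "literature"
    metadata: dict = field(default_factory=dict)  # contextual information


class CurationIndex:
    """In-memory store of records and the cross-links between them."""

    def __init__(self):
        self.records = {}   # doi -> CuratedRecord
        self.links = set()  # (data_doi, literature_doi, relation)

    def add(self, record):
        self.records[record.doi] = record

    def link(self, data_doi, literature_doi, relation="IsSupplementTo"):
        # Refuse links to DOIs that have not been curated yet.
        for doi in (data_doi, literature_doi):
            if doi not in self.records:
                raise KeyError(f"unknown DOI: {doi}")
        self.links.add((data_doi, literature_doi, relation))

    def related(self, doi):
        """Return the DOIs linked to `doi` in either direction, sorted."""
        found = set()
        for data_doi, literature_doi, _ in self.links:
            if doi == data_doi:
                found.add(literature_doi)
            elif doi == literature_doi:
                found.add(data_doi)
        return sorted(found)


def resolver_url(doi):
    """Build the URL at which a DOI can be resolved."""
    return DOI_RESOLVER + doi


if __name__ == "__main__":
    index = CurationIndex()
    index.add(CuratedRecord("10.0000/example-dataset", "dataset",
                            {"method": "NMR", "curator": "community"}))
    index.add(CuratedRecord("10.0000/example-article", "literature",
                            {"title": "Placeholder article"}))
    index.link("10.0000/example-dataset", "10.0000/example-article")
    for doi in index.related("10.0000/example-article"):
        print(resolver_url(doi), index.records[doi].metadata)
```

Storing each link once and querying it in both directions keeps the dataset-to-literature and literature-to-dataset views consistent. A production system would presumably register the relations with a DOI registration agency rather than hold them in memory.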
