Effective Indexing for Evolving Data Collection by Using Ontology

  • 김종욱 (Department of Media Software, Sangmyung University)
  • 배명수 (Department of Radiology, Asan Medical Center, Seoul)
  • Received : 2013.11.04
  • Accepted : 2013.12.18
  • Published : 2014.02.28

Abstract

Data created and shared on the Web is characterized by massive volumes of user-generated content across many applications, and by content that evolves dynamically with user interests. To benefit from Web data, it is therefore essential to provide (a) mechanisms that enable scalable processing of large data collections and (b) organization schemes that reduce the navigational overhead within complex and dynamically growing content. Of these two needs, this paper addresses the second: we develop an indexing scheme that leverages ontologies to reduce the time and effort needed to access relevant information. In particular, to account for the evolving nature of Web content, the proposed technique computes, from an existing large general-purpose ontology, the sub-ontology that best matches a given data collection. Case studies show that the proposed indexing scheme helps organize dynamically evolving content.
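
As a rough illustration of what "computing the sub-ontology that best matches a data collection" could look like, the sketch below prunes a toy taxonomy down to the concepts mentioned in a document collection, together with their ancestors. This is not the algorithm proposed in the paper, which the abstract does not describe. The toy ontology `PARENT`, the word-match criterion, the `min_support` threshold, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy ontology: concept -> parent concept (None marks the root).
PARENT = {
    "entity": None,
    "science": "entity",
    "sports": "entity",
    "physics": "science",
    "biology": "science",
    "soccer": "sports",
    "tennis": "sports",
}


def concept_support(documents, concepts):
    """Count, for each concept, how many documents mention its label as a word."""
    support = Counter()
    for doc in documents:
        tokens = set(doc.lower().split())
        for concept in concepts:
            if concept in tokens:
                support[concept] += 1
    return support


def extract_sub_ontology(parent, documents, min_support=1):
    """Keep concepts mentioned in at least `min_support` documents, plus all
    of their ancestors, so the result is still a connected hierarchy."""
    support = concept_support(documents, parent.keys())
    kept = set()
    for concept, count in support.items():
        if count < min_support:
            continue
        node = concept
        # Walk up to the root; stop early once an already-kept ancestor is hit.
        while node is not None and node not in kept:
            kept.add(node)
            node = parent[node]
    return {c: parent[c] for c in parent if c in kept}


docs = [
    "New physics results announced",
    "Physics and biology share methods",
    "Local tennis final tonight",
]
sub = extract_sub_ontology(PARENT, docs, min_support=1)
strict = extract_sub_ontology(PARENT, docs, min_support=2)
```

Here `sub` drops only the unmentioned concept `soccer`, while `strict` keeps just `entity`, `science`, and `physics`. Under this sketch, a collection that evolves over time would be handled by rerunning `extract_sub_ontology` on the updated documents.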
