GORank: Semantic Similarity Search for Gene Products using Gene Ontology

Kim, Ki-Sung; Yoo, Sang-Won; Kim, Hyoung-Joo

Journal of KIISE:Databases (한국정보과학회논문지:데이타베이스)

Volume 33 Issue 7 / Pages 682-692 / 2006 / 1229-7739 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

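Kim, Ki-Sung (School of Electrical Engineering and Computer Science, Seoul National University);
Yoo, Sang-Won (School of Electrical Engineering and Computer Science, Seoul National University);
Kim, Hyoung-Joo (School of Electrical Engineering and Computer Science, Seoul National University)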

Published: 2006.12.15

Abstract

Searching for gene products that have similar biological functions is crucial in bioinformatics. Modern biological databases describe the functions of gene products using the Gene Ontology (GO). In this paper, we propose a technique for semantic similarity search over gene products that uses their GO annotations. To this end, we define an information-theoretic measure of semantic similarity between gene products and propose a search algorithm based on this measure. We adapt Fagin's Threshold Algorithm to process semantic similarity queries as follows. First, because our similarity function is not monotonic, we redefine the threshold for our measure. Second, to reduce the number of disk accesses, we propose cluster skipping and an access ordering for the inverted index lists. Experiments with real GO and annotation data show that GORank is efficient and scalable.
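The abstract does not spell out the measure, but the reference list cites Lin's information-theoretic definition of similarity. As a rough illustration only, the Python sketch below computes Lin's term similarity on a hypothetical toy ontology. The information content of a term is IC(t) = -log p(t), where p(t) is the fraction of annotations made to t or any of its descendants, and two terms score 2·IC(a) / (IC(t1) + IC(t2)), with a their most informative common ancestor. The term names, the annotation counts, and the best-match-average rule used to lift term similarity to gene products are all assumptions for illustration, not GORank's own definitions.

```python
import math
from typing import Dict, Iterable, List, Set

# Hypothetical toy ontology (not real GO): term -> set of is-a parents.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalysis"},
}

# Hypothetical counts of gene products annotated directly to each term.
DIRECT_COUNTS: Dict[str, int] = {
    "root": 0, "binding": 2, "catalysis": 2,
    "dna_binding": 3, "rna_binding": 1, "kinase": 2,
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every ancestor reachable via is-a edges."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content() -> Dict[str, float]:
    """IC(t) = -log p(t) = log(N / count(t)), counting annotations to t or any descendant."""
    totals = {t: 0 for t in PARENTS}
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            totals[anc] += n
    grand_total = totals["root"]
    return {t: math.log(grand_total / totals[t]) for t in PARENTS}


def lin_similarity(t1: str, t2: str, ic: Dict[str, float]) -> float:
    """Lin (1998): 2 * IC(most informative common ancestor) / (IC(t1) + IC(t2))."""
    shared = max(ic[t] for t in ancestors(t1) & ancestors(t2))
    denom = ic[t1] + ic[t2]
    return 2 * shared / denom if denom > 0 else 1.0


def gene_product_similarity(a: Iterable[str], b: Iterable[str], ic: Dict[str, float]) -> float:
    """Assumed aggregation (not from the paper): symmetric best-match average."""
    terms_a: List[str] = list(a)
    terms_b: List[str] = list(b)

    def best_match_average(xs: List[str], ys: List[str]) -> float:
        return sum(max(lin_similarity(x, y, ic) for y in ys) for x in xs) / len(xs)

    return (best_match_average(terms_a, terms_b) + best_match_average(terms_b, terms_a)) / 2


if __name__ == "__main__":
    ic = information_content()
    print(lin_similarity("dna_binding", "rna_binding", ic))
    print(gene_product_similarity(["dna_binding", "kinase"], ["rna_binding"], ic))
```

Sibling terms under a rarely used parent score higher than siblings under a very general one, which is the property that makes information-theoretic measures attractive for an unevenly annotated ontology.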

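Searching for gene products with similar biological characteristics is an essential technique in bioinformatics research. Most current biological databases describe the biological characteristics of gene products using terms from the Gene Ontology. In this paper, we propose a method that uses these gene product annotations to retrieve semantically similar gene products. To this end, we first define a semantic similarity between gene products based on information theory, and then propose a semantic similarity search algorithm that uses this similarity. To process semantic similarity search, we use a variant of Fagin's threshold algorithm, modified as follows. First, because our similarity function is not monotonically increasing, we redefine the threshold to suit it. We also propose cluster skipping, which uses the structure of the inverted index lists to skip intermediate searches, and an access order for the inverted index lists. We evaluated performance using real GO and annotation data and showed that the proposed algorithm is efficient.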
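For reference, the Python sketch below shows the classic Threshold Algorithm (TA) of Fagin, Lotem, and Naor, the starting point that GORank adapts. It assumes a monotone aggregate (a plain sum) over per-list scores, treats an object missing from a list as scoring 0.0 there, and uses a hypothetical function name, `threshold_algorithm`. Because GORank's similarity function is not monotonic, the paper redefines the threshold; this sketch does not reproduce that redefinition, nor cluster skipping or the access ordering of the inverted index lists.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

ScoredList = List[Tuple[str, float]]  # (object id, score), sorted by descending score


def threshold_algorithm(
    lists: Sequence[ScoredList],
    k: int,
    aggregate: Callable[[Sequence[float]], float] = sum,
) -> List[Tuple[float, str]]:
    """Classic TA for a monotone aggregate; returns the top-k (score, id) pairs.

    Assumption: an object absent from a list scores 0.0 in that list.
    """
    if k <= 0 or not lists:
        return []
    # Random-access tables, one per list: object id -> score.
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    top: List[Tuple[float, str]] = []  # min-heap holding the best k (score, id) pairs
    seen = set()
    max_len = max(len(lst) for lst in lists)

    for depth in range(max_len):
        frontier = []  # score at the current sorted-access position of each list
        for lst in lists:
            if depth >= len(lst):
                frontier.append(0.0)  # exhausted: every unseen object scores 0.0 here
                continue
            obj, score = lst[depth]
            frontier.append(score)
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: fetch obj's score from every list and aggregate.
            total = aggregate([table.get(obj, 0.0) for table in lookup])
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # Threshold: an upper bound for any unseen object, valid only if aggregate is monotone.
        threshold = aggregate(frontier)
        if len(top) == k and top[0][0] >= threshold:
            break

    return sorted(top, reverse=True)


if __name__ == "__main__":
    # Hypothetical per-term lists of gene products with per-term scores.
    lists = [
        [("g1", 0.9), ("g2", 0.8), ("g3", 0.1)],
        [("g2", 0.9), ("g3", 0.7), ("g1", 0.2)],
    ]
    print(threshold_algorithm(lists, k=1))
```

The stopping test is what makes TA efficient: once the k-th best aggregate seen so far reaches the threshold computed from the current sorted-access frontier, no unseen object can do better, so the remaining list entries are never read. That guarantee rests on monotonicity, which is why a non-monotonic measure needs a different threshold.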

Keywords

semantic similarity search

References

The Gene Ontology Consortium, Creating the Gene Ontology Resource: Design and Implementation. Genome Res, 2001. 11(8): p. 1425-33 https://doi.org/10.1101/gr.180801
Lin, D. An Information-theoretic Definition of Similarity. in 15th International Conf. on Machine Learning. 1998. San Francisco, CA
Fagin, R., A. Lotem, and M. Naor, Optimal Aggregation Algorithms for Middleware, Journal of Computer and System Sciences, 2003. 66(4): p. 614-656 https://doi.org/10.1016/S0022-0000(03)00026-6
Aslam, J.A. and M. Frost. An Information-theoretic Measure for Document Similarity. in SIGIR. 2003. Toronto, Canada https://doi.org/10.1145/860435.860545
Maguitman, A.G. and F. Menczer. Algorithmic Detection of Semantic Similarity. in WWW. 2005. Chiba, Japan https://doi.org/10.1145/1060745.1060765
Lee, J.H., M.H. Kim, and Y.J. Lee, Information Retrieval based on Conceptual Distance in is-a Hierarchies. Journal of Documentation, 1993. 49(2): p. 188-207 https://doi.org/10.1108/eb026913
Rada, R., et al., Development and Application of a Metric on Semantic Nets. IEEE Transactions on Systems, Man and Cybernetics, 1989. 19(1): p. 17-30 https://doi.org/10.1109/21.24528
Lord, P.W., et al. Semantic Similarity Measures as Tools for Exploring the Gene Ontology. in Pacific Symposium on Biocomputing. 2003
Resnik, P., Semantic Similarity in a Taxonomy: An Information-based Measure and its Application to Problems of Ambiguity in Natural Language. Journal of Artificial Intelligence Research, 1999. 11: p. 95-130
Jiang, J.J. and D.W. Conrath, Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, in International Conference Research on Computational Linguistics. 1997: Taiwan
Cover, T. and J. Thomas, Elements of Information Theory. 1991: Wiley-Interscience
Hjaltason, G.R. and H. Samet, Index-Driven Similarity Search in Metric Spaces. ACM Transactions on Database Systems, 2003. 28(4): p. 517-580 https://doi.org/10.1145/958942.958948
Böhm, C., S. Berchtold, and D.A. Keim, Searching in High-Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases. ACM Computing Surveys, 2001. 33(3): p. 322-373 https://doi.org/10.1145/502807.502809
Chavez, E., et al., Searching in Metric Spaces. ACM Computing Surveys, 2001. 33(3): p. 273-321 https://doi.org/10.1145/502807.502808
Güntzer, U., W.-T. Balke, and W. Kießling, Optimizing Multi-Feature Queries for Image Databases, in VLDB. 2000. Cairo, Egypt
Azuaje, F., H. Wang, and O. Bodenreider, Ontology-driven Similarity Approaches to Supporting Gene Functional Assessment, in ISMB SIG Meeting on Bio-Ontologies. 2005
