Identification of Heterogeneous Prognostic Genes and Prediction of Cancer Outcome using PageRank

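  • Jonghwan Choi (최종환; Department of Computer Science and Engineering, Incheon National University);
  • Jaegyoon Ahn (안재균; Department of Computer Science and Engineering, Incheon National University)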
  • Received : 2017.07.13
  • Accepted : 2017.11.21
  • Published : 2018.01.15

Abstract

Identifying genes that help predict the prognosis of patients with cancer is one of the challenges in providing appropriate therapies. Several classification models based on gene expression data have been proposed to find such prognostic genes. However, the heterogeneity of cancer limits the accuracy of prognosis prediction. In this paper, we integrate microarray data with biological network data using a modified PageRank algorithm to identify prognostic genes, and we predict the prognosis of patients with six cancer types (including breast carcinoma) using the K-Nearest Neighbor algorithm. To address the heterogeneity of cancer, we first separate samples with similar gene expression patterns using K-Means clustering and then apply the modified PageRank. The proposed method achieved higher prediction accuracy than existing algorithms for finding gene biomarkers, and GO enrichment analysis identified cluster-specific biological processes.
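
The pipeline summarized in the abstract (cluster samples, rank genes on a network, then classify by nearest neighbors) can be illustrated with a minimal NumPy sketch. The abstract does not say how PageRank is modified, so this sketch assumes a personalized PageRank whose restart vector is each gene's absolute correlation with the outcome; that choice, the damping factor, and all function names (`kmeans`, `gene_relevance`, `personalized_pagerank`, `knn_predict`, `top_genes_per_cluster`) are hypothetical and not taken from the paper.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one cluster label per sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def gene_relevance(X, y):
    """Absolute Pearson correlation of each gene (column of X) with the outcome y.
    Assumption: this is only one plausible way to seed the ranking."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = 1.0  # constant genes or outcomes get relevance 0
    return np.abs(Xc.T @ yc) / denom

def personalized_pagerank(adj, restart, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for r = (1 - d) * s + d * W r, with W the column-normalized
    adjacency matrix and s the normalized restart vector."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated genes simply pass on no score
    W = adj / col_sums
    s = restart + 1e-12            # avoid dividing by zero for an all-zero restart
    s = s / s.sum()
    r = s.copy()
    for _ in range(max_iter):
        r_next = (1 - damping) * s + damping * (W @ r)
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote over the k nearest training samples (binary labels 0/1)."""
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) >= 0.5).astype(int)

def top_genes_per_cluster(X, y, adj, n_clusters=2, n_top=5):
    """Cluster the samples, then rank genes separately within each cluster."""
    labels = kmeans(X, n_clusters)
    selected = {}
    for c in range(n_clusters):
        mask = labels == c
        if mask.sum() < 2:         # too few samples to estimate a correlation
            continue
        scores = personalized_pagerank(adj, gene_relevance(X[mask], y[mask]))
        selected[c] = np.argsort(scores)[::-1][:n_top]
    return labels, selected
```

Here `X` is a samples-by-genes expression matrix, `y` a binary outcome vector, and `adj` a gene-by-gene adjacency matrix of the biological network. A classifier for one cluster would then call `knn_predict` on the columns listed in `selected[c]`. Ranking genes per cluster rather than over all samples is what lets different patient subgroups yield different prognostic genes.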

Acknowledgement

Supported by: National Research Foundation of Korea
