Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability

Jung, Yong; Seo, Hwa-Jeong; Park, Yu-Rang; Kim, Ji-Hun; Bien, Sang Jay; Kim, Ju-Han

doi:10.5808/GI.2011.9.1.019

Genomics & Informatics

Volume 9, Issue 1 / Pages 19-27 / 2011 / 1598-866X (pISSN) / 2234-0742 (eISSN)

Korea Genome Organization (한국유전체학회)

Jung, Yong (Seoul National University Biomedical Informatics) ;
Seo, Hwa-Jeong (Medical Informatics, Graduate School of Public Health, Gachon University of Medicine and Science) ;
Park, Yu-Rang (Seoul National University Biomedical Informatics) ;
Kim, Ji-Hun (Seoul National University Biomedical Informatics) ;
Bien, Sang Jay (Seoul National University Biomedical Informatics) ;
Kim, Ju-Han (Seoul National University Biomedical Informatics)

Accepted : 2011.03.02
Published : 2011.03.31

Abstract

The Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, and that collection has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation. It is hard to tell whether preprocessing has been applied to a dataset and, if so, in what way. Standard-based integration of heterogeneous data formats and metadata is necessary for comprehensive data query, analysis and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of the GEO Data table and mapped the attributes of GEO metadata onto MIAME elements. We also distinguished non-preprocessed raw datasets from processed and other datasets using a two-step classification method. Most of the procedures were developed as semi-automated algorithms that use some text mining techniques. We localized 2,967 Platforms, 4,867 Series and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema, and developed a comprehensive query interface for extracting them. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/.
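
The mapping of GEO metadata attributes onto MIAME elements can be pictured with a minimal Python sketch. The attribute labels follow GEO's SOFT text format, and the six section names are the MIAME parts. The lookup table `GEO_TO_MIAME` and the functions `parse_soft_attributes` and `to_miame` are hypothetical names introduced here for illustration; the assignments are assumptions, not the mapping actually used in GEOQuest.

```python
# Illustrative assignment of GEO SOFT attribute labels to MIAME sections.
# This table is an assumption made for the example, not the paper's mapping.
GEO_TO_MIAME = {
    "Series_summary": "experimental design",
    "Series_overall_design": "experimental design",
    "Platform_technology": "array design",
    "Platform_manufacturer": "array design",
    "Sample_organism_ch1": "samples",
    "Sample_source_name_ch1": "samples",
    "Sample_hyb_protocol": "hybridizations",
    "Sample_scan_protocol": "measurements",
    "Sample_data_processing": "normalization controls",
}


def parse_soft_attributes(text):
    """Collect '!key = value' lines of SOFT-formatted text into a dict of lists."""
    attrs = {}
    for line in text.splitlines():
        if line.startswith("!") and " = " in line:
            key, value = line[1:].split(" = ", 1)
            attrs.setdefault(key.strip(), []).append(value.strip())
    return attrs


def to_miame(attrs):
    """Group attribute values by MIAME section and list the keys left unmapped."""
    record, unmapped = {}, []
    for key, values in attrs.items():
        section = GEO_TO_MIAME.get(key)
        if section is None:
            unmapped.append(key)  # candidates for manual review
        else:
            record.setdefault(section, {})[key] = values
    return record, sorted(unmapped)


if __name__ == "__main__":
    # Made-up SOFT fragment; the accession is a placeholder, not a real record.
    sample = (
        "^SAMPLE = GSM0000\n"
        "!Sample_title = liver rep1\n"
        "!Sample_organism_ch1 = Homo sapiens\n"
        "!Sample_data_processing = RMA normalized\n"
    )
    record, unmapped = to_miame(parse_soft_attributes(sample))
    print(record)
    print(unmapped)
```

Returning the unmapped keys separately reflects the semi-automated character of the procedure: anything the table does not cover is left for a curator rather than silently dropped.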
