• Title/Summary/Keyword: genomic data

Exploring cancer genomic data from the cancer genome atlas project

  • Lee, Ju-Seog
    • BMB Reports / v.49 no.11 / pp.607-611 / 2016
  • The Cancer Genome Atlas (TCGA) has compiled genomic, epigenomic, and proteomic data from more than 10,000 samples derived from 33 types of cancer, aiming to improve our understanding of the molecular basis of cancer development. The availability of this genome-wide information provides an unprecedented opportunity for uncovering new key regulators of signaling pathways or new roles for known pathway members. To take advantage of this advance, it will be necessary to learn systematic approaches that can help uncover novel genes reflecting genetic alterations, prognosis, or response to treatment. This minireview describes the current status of the TCGA project and explains how to use TCGA data.

Currents in Integrative Biochip Informatics

  • Kim, Ju-Han
    • Proceedings of the Korean Society for Bioinformatics Conference / 2001.10a / pp.1-9 / 2001
  • scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational sciences and information technology. The informatics revolutions in both clinical informatics and bioinformatics will change the current paradigm of biomedical sciences and the practice of clinical medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high-throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding forever, much the same way that biochemistry did a generation ago. In this talk, I will describe how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics. Basic data preprocessing with normalization and filtering, primary pattern analysis, and machine learning algorithms will be presented. Issues of integrated biochip informatics technologies, including multivariate data projection, gene-metabolic pathway mapping, automated biomolecular annotation, text mining of factual and literature databases, and integrated management of biomolecular databases, will be discussed. Each step will be illustrated with real examples from ongoing research activities in the context of clinical relevance. Issues of linking molecular genotype and clinical phenotype information will also be discussed.

Pan-Genomics of Lactobacillus plantarum Revealed Group-Specific Genomic Profiles without Habitat Association

  • Choi, Sukjung;Jin, Gwi-Deuk;Park, Jongbin;You, Inhwan;Kim, Eun Bae
    • Journal of Microbiology and Biotechnology / v.28 no.8 / pp.1352-1359 / 2018
  • Lactobacillus plantarum is a lactic acid bacterium that promotes animal intestinal health as a probiotic and is found in a wide variety of habitats. Here, we investigated the genomic features of different clusters of L. plantarum strains via pan-genomic analysis. We compared the genomes of 108 L. plantarum strains that were available from the NCBI GenBank database. These genomes were 2.9-3.7 Mbp in size and 44-45% in G+C content. A total of 8,847 orthologs were collected, and 1,709 genes were identified to be shared as core genes by all the strains analyzed. On the basis of SNPs from the core genes, 108 strains were clustered into five major groups (G1-G5) that are different from previous reports and are not clearly associated with habitats. Analysis of group-specific enriched or depleted genes revealed that G1 and G2 were rich in genes for carbohydrate utilization (L-arabinose, L-rhamnose, and fructooligosaccharides) and that G3, G4, and G5 possessed more genes for the restriction-modification system and MazEF toxin-antitoxin. These results indicate that there are critical differences in gene content and survival strategies among genetically clustered L. plantarum strains, regardless of habitats.
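
The distinction between the full ortholog set and the core genes shared by all strains can be illustrated with set operations. The following is a minimal sketch, assuming each genome has already been reduced to a set of ortholog IDs; the strain names, ortholog IDs, and the `pan_and_core` helper are hypothetical and are not taken from the study.

```python
# Toy presence/absence data: strain name -> set of ortholog IDs.
# All names here are hypothetical, for illustration only.
strain_orthologs = {
    "strain_A": {"og1", "og2", "og3", "og4"},
    "strain_B": {"og1", "og2", "og4", "og5"},
    "strain_C": {"og1", "og2", "og6"},
}


def pan_and_core(genomes):
    """Return (pan-genome, core genome) as sets of ortholog IDs."""
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)        # orthologs present in at least one strain
    core = set.intersection(*gene_sets)  # orthologs present in every strain
    return pan, core


pan, core = pan_and_core(strain_orthologs)
print(len(pan), sorted(core))
```

In this toy case the pan-genome has six orthologs and the core genome has two; in the abstract the corresponding figures are 8,847 orthologs and 1,709 core genes.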

The influence of a first-order antedependence model and hyperparameters in BayesCπ for genomic prediction

  • Li, Xiujin;Liu, Xiaohong;Chen, Yaosheng
    • Asian-Australasian Journal of Animal Sciences / v.31 no.12 / pp.1863-1870 / 2018
  • Objective: Bayesian first-order antedependence models, which specify single nucleotide polymorphism (SNP) effects as being spatially correlated in the conventional BayesA/B, have given more accurate genomic prediction than their classical counterparts. Given the advantages of BayesCπ over BayesA/B, we developed hyper-BayesCπ, ante-BayesCπ, and ante-hyper-BayesCπ to evaluate the influence of the antedependence model and of hyperparameters for $v_g$ and $s_g^2$ on BayesCπ. Methods: Three public datasets (two simulated and one from mice) were used to validate the proposed methods. Their genomic prediction performance was compared with that of traditional BayesCπ, ante-BayesA, and ante-BayesB. Results: In both simulated and real data analyses, hyper-BayesCπ, ante-BayesCπ, and ante-hyper-BayesCπ were comparable with BayesCπ, ante-BayesB, and ante-BayesA in prediction accuracy and bias, except that ante-BayesB performed significantly worse when using a small number of SNPs and $\pi = 0.95$. Conclusion: Hyper-BayesCπ is recommended because, unlike BayesCπ, it does not require a pre-estimated total genetic variance of the trait, and it shortens computing time compared with ante-BayesB. Although the antedependence model showed no advantage in BayesCπ in our study, larger real datasets from high-density chips could be used to re-evaluate it in the future.

An Ontology-Based GIS for Genomic Data Management of Rumen Microbes

  • Jelokhani-Niaraki, Saber;Tahmoorespur, Mojtaba;Minuchehr, Zarrin;Nassiri, Mohammad Reza
    • Genomics & Informatics / v.13 no.1 / pp.7-14 / 2015
  • During recent years, there has been exponential growth in biological information. With the emergence of large datasets in biology, life scientists are encountering bottlenecks in handling the biological data. This study presents an integrated geographic information system (GIS)-ontology application for handling microbial genome data. The application uses a linear referencing technique as one of the GIS functionalities to represent genes as linear events on the genome layer, where users can define/change the attributes of genes in an event table and interactively see the gene events on a genome layer. Our application adopted ontology to portray and store genomic data in a semantic framework, which facilitates data-sharing among biology domains, applications, and experts. The application was developed in two steps. In the first step, the genome-annotated data were prepared and stored in a MySQL database. The second step involved the connection of the database to both ArcGIS and Protégé as the GIS engine and ontology platform, respectively. We have designed this application specifically to manage the genome-annotated data of rumen microbial populations. Such a GIS-ontology application offers powerful capabilities for visualizing, managing, reusing, sharing, and querying genome-related data.
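
The idea of treating genes as linear events along a genome can be sketched independently of any GIS software. The example below is only an illustration, assuming an event table in which each row has a start and end coordinate; the `GeneEvent` class, the `events_in_window` helper, and all gene records are hypothetical and do not reproduce the ArcGIS or MySQL schema used by the authors.

```python
from dataclasses import dataclass


@dataclass
class GeneEvent:
    """One row of a hypothetical event table: a gene located by genome coordinates."""
    gene_id: str
    start: int   # first base of the gene (1-based, inclusive)
    end: int     # last base of the gene (inclusive)
    product: str  # editable attribute attached to the event


# Hypothetical event table for one genome "route".
event_table = [
    GeneEvent("geneA", 100, 200, "hypothetical protein"),
    GeneEvent("geneB", 350, 900, "cellulase"),
    GeneEvent("geneC", 1200, 1500, "transporter"),
]


def events_in_window(events, lo, hi):
    """Return events that overlap the coordinate window [lo, hi]."""
    return [e for e in events if e.start <= hi and e.end >= lo]


# Changing an attribute only touches the event table, not the genome layer.
event_table[0].product = "xylanase"

hits = events_in_window(event_table, 150, 400)
print([e.gene_id for e in hits])
```

Keeping coordinates and attributes in one table means an attribute edit does not require redrawing or re-storing the underlying genome, which is the practical appeal of linear referencing.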

Bioinformatics services for analyzing massive genomic datasets

  • Ko, Gunhwan;Kim, Pan-Gyu;Cho, Youngbum;Jeong, Seongmun;Kim, Jae-Yoon;Kim, Kyoung Hyoun;Lee, Ho-Yeon;Han, Jiyeon;Yu, Namhee;Ham, Seokjin;Jang, Insoon;Kang, Byunghee;Shin, Sunguk;Kim, Lian;Lee, Seung-Won;Nam, Dougu;Kim, Jihyun F.;Kim, Namshin;Kim, Seon-Young;Lee, Sanghyuk;Roh, Tae-Young;Lee, Byungwook
    • Genomics & Informatics / v.18 no.1 / pp.8.1-8.10 / 2020
  • The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

Predicting the Accuracy of Breeding Values Using High Density Genome Scans

  • Lee, Deuk-Hwan;Vasco, Daniel A.
    • Asian-Australasian Journal of Animal Sciences / v.24 no.2 / pp.162-172 / 2011
  • In this paper, simulation was used to determine the accuracy of genomic breeding values for polygenic traits associated with many thousands of markers obtained from high-density genome scans. The statistical approach was based upon stochastically simulating a pedigree with a specified base population and a specified set of population parameters, including the effective and noneffective marker distances and generation time. For this population, marker and quantitative trait locus (QTL) genotypes were generated using either a single linkage group or a multiple linkage group model. Single nucleotide polymorphisms (SNPs) were simulated for an entire bovine genome (excluding the sex chromosome, n = 29), including linkage and recombination. Individuals drawn from the simulated population with specified marker and QTL genotypes were randomly mated for ten generations to establish appropriate levels of linkage disequilibrium. Phenotype and genomic SNP data sets were obtained from individuals starting after two generations. Genetic prediction was accomplished by statistically modeling the genomic relationship matrix and standard BLUP methods. The effect of the number of linkage groups was also investigated to determine its influence on the accuracy of breeding values for genomic selection. With high-density scan data (0.08 cM marker distance), the accuracies of breeding values for juveniles in the single linkage group model were 0.60 for a low-heritability trait (0.10) and 0.82 for a high-heritability trait (0.50). Estimates of 0.38 and 0.60 were obtained for the same cases in the multiple linkage group model. Unexpectedly, the use of BLUP regression methods across many chromosomes was found to reduce the accuracy of breeding value determination. The reasons for this remain a target for further research, but Mendelian sampling may play a fundamental role in producing this effect.
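
The abstract does not say how the genomic relationship matrix was constructed. One widely used construction is VanRaden's first method, G = ZZ' / (2 Σ p_j (1 - p_j)), where Z holds genotypes coded 0/1/2 and centered by twice the allele frequency. The sketch below assumes that construction purely for illustration; the genotype matrix is made up, and the function name is hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """VanRaden-style G from a list of rows of 0/1/2 allele counts.

    Assumption: allele frequencies are estimated from the sample itself,
    and at least one SNP is polymorphic (otherwise the scale is zero).
    """
    n = len(genotypes)       # individuals
    m = len(genotypes[0])    # SNPs
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Scaling term 2 * sum(p * (1 - p)) puts G on the scale of pedigree relationships.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    # Center each genotype by its expected value 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Hypothetical genotypes: 3 individuals x 4 SNPs.
toy_genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
G = genomic_relationship_matrix(toy_genotypes)
for row in G:
    print([round(value, 3) for value in row])
```

In a GBLUP analysis this matrix takes the place of the pedigree-based relationship matrix in the standard mixed-model equations.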

Mutation Spectra of BRCA Genes in Iranian Women with Early Onset Breast Cancer - 15 Years Experience

  • Yassaee, Vahid Reza;Ravesh, Zeinab;Soltani, Ziba;Hashemi-Gorji, Feyzollah;Poorhosseini, Seyed Mohammad;Anbiaee, Robab;Joulaee, Azadeh
    • Asian Pacific Journal of Cancer Prevention / v.17 no.sup3 / pp.149-153 / 2016
  • Breast cancer is the most common cancer in Iran. In recent years, an upward trend has been observed in the Iranian population. Early detection by molecular approaches may reduce breast cancer morbidity and mortality. We provided consultation to 3,782 women diagnosed with early onset breast cancer during the past 15 years (1999-2014). To establish a data set of BRCA gene alterations in Iranian families at risk, 254 women who met our criteria were analyzed. A total of 46 alterations were identified, including 18 variants of unknown clinical significance (39.1%), 18 missense mutations (39.1%), 7 indels (15.2%), and 3 large rearrangement sequences (6%). Further scanning of affected families revealed that 49% of healthy relatives harbor identical causative mutations. This is the first report of comprehensive BRCA analysis in Iranian women with early onset breast cancer. Our findings provide valuable molecular data to support physicians and patients in making the best decisions on disease management.
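
The reported percentages can be checked directly from the counts given in the abstract. The short sketch below recomputes each share of the 46 alterations; the dictionary keys are shortened labels chosen here, not terms from the paper.

```python
# Alteration counts as reported in the abstract (46 in total).
alteration_counts = {
    "variants of unknown clinical significance": 18,
    "missense mutations": 18,
    "indels": 7,
    "large rearrangements": 3,
}

total = sum(alteration_counts.values())
# Share of each class, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in alteration_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The first three classes reproduce the reported 39.1%, 39.1%, and 15.2%. The three large rearrangements come to about 6.5% of 46, so the 6% in the abstract appears to be a truncated rather than a rounded value.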

Heritability Estimated Using 50K SNPs Indicates Missing Heritability Problem in Holstein Breeding

  • Shin, Donghyun;Park, Kyoung-Do;Ka, Sojoeng;Kim, Heebal;Cho, Kwang-hyeon
    • Genomics & Informatics / v.13 no.4 / pp.146-151 / 2015
  • Previous studies in Holstein using pedigree data have shown 35% to 51.8% heritability in milk production traits, such as milk yield, fat, and protein. Other studies have indicated that variation in complex human traits can be captured by common single-nucleotide polymorphisms (SNPs) and that the genetic variation attributed to each chromosome is proportional to its length. Using genome-wide estimation and partitioning approaches, we analyzed three quantitative traits relevant to milk production in Korean Holstein data collected from 462 individuals genotyped for 54,609 SNPs. For all three traits (milk yield, fat, and protein), we estimated a nominally significant (p = 0.1) proportion of variance explained by all SNPs on the Illumina BovineSNP50 Beadchip ($h^2_G$). These common SNPs explained most of the narrow-sense heritability. Longer genomic regions tended to provide more phenotypic variation information, with a correlation of 0.46~0.53 between the estimate of variance explained by individual chromosomes and their physical length. These results suggest that polygenicity is ubiquitous for Holstein milk production traits. They will expand our knowledge of recent animal breeding approaches, such as genomic selection in Holstein.
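
The relationship between chromosome length and variance explained is summarized in the abstract as a correlation. A minimal sketch of that calculation follows, assuming a plain Pearson correlation (the abstract does not name the coefficient used); the chromosome lengths and variance estimates below are invented placeholder values, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-chromosome values (placeholders only):
# physical length in Mbp and proportion of phenotypic variance explained.
chrom_length_mbp = [158.0, 137.0, 121.0, 120.0, 105.0, 85.0, 64.0, 43.0]
variance_explained = [0.031, 0.018, 0.027, 0.012, 0.020, 0.009, 0.015, 0.004]

r = pearson(chrom_length_mbp, variance_explained)
print(f"r = {r:.2f}")
```

A positive coefficient of this kind is what would be expected if many loci of small effect are spread roughly evenly across the genome, so that longer chromosomes carry more of them.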