• Title/Summary/Keyword: Genomics Approach


Prediction of Mammalian MicroRNA Targets - Comparative Genomics Approach with Longer 3' UTR Databases

  • Nam, Seungyoon;Kim, Young-Kook;Kim, Pora;Kim, V. Narry;Shin, Seokmin;Lee, Sanghyuk
    • Genomics & Informatics / v.3 no.3 / pp.53-62 / 2005
  • MicroRNAs play an important role in regulating gene expression, but identifying their targets is difficult because of their short length and imperfect complementarity. Burge and coworkers developed a program called TargetScan that allowed imperfect complementarity and established a procedure favoring targets with multiple binding sites conserved in multiple organisms. We improved their algorithm in two major respects: (i) using a well-defined UTR (untranslated region) database, and (ii) examining the extent of conservation specifically inside the 3' UTR. The average UTR length in our database, based on the ECgene annotation, is more than twice that of Ensembl. TargetScan was then used to identify putative binding sites. The extent of conservation varies significantly inside the 3' UTR. We used the 'tight' tracks in the UCSC genome browser to select binding sites conserved in multiple species. By combining the longer 3' UTR data, TargetScan, and tightly conserved blocks of genomic DNA, we identified 107 putative target genes with multiple binding sites conserved in multiple species, of which 85 are novel.
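
    The idea of favoring UTRs with multiple binding sites can be illustrated with a minimal sketch. It is not TargetScan and not the authors' pipeline: it assumes that a binding site is a perfect match to the reverse complement of miRNA positions 2-8 (a common seed convention), and it ignores conservation and imperfect complementarity entirely. The function names are hypothetical.

    ```python
    from __future__ import annotations

    # Illustrative sketch only: a perfect seed-match scan, not TargetScan itself.
    COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


    def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
        """Return the UTR motif complementary to miRNA positions start..end (1-based).

        Positions 2-8 are an assumed seed definition, not taken from the abstract.
        """
        seed = mirna.upper().replace("T", "U")[start - 1:end]
        # Reverse complement: the UTR is read 5'->3' on the opposite strand.
        return "".join(COMPLEMENT[base] for base in reversed(seed))


    def find_seed_sites(utr: str, mirna: str) -> list[int]:
        """Return 0-based start positions of every seed match in the UTR."""
        site = seed_site(mirna)
        utr = utr.upper().replace("T", "U")
        positions = []
        idx = utr.find(site)
        while idx != -1:
            positions.append(idx)
            idx = utr.find(site, idx + 1)  # step by one so overlapping hits count
        return positions


    def has_multiple_sites(utr: str, mirna: str, minimum: int = 2) -> bool:
        """True if the UTR carries at least `minimum` seed matches."""
        return len(find_seed_sites(utr, mirna)) >= minimum
    ```

    A longer UTR annotation gives such a scan more sequence to search, which is one reason the choice of UTR database matters for the number of candidate sites.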

Molecular Genetics of the Model Legume Medicago truncatula

  • Nam, Young-Woo
    • The Plant Pathology Journal / v.17 no.2 / pp.67-70 / 2001
  • Medicago truncatula is a diploid legume plant related to the forage crop alfalfa. Recently, it has been chosen as a model species for genomic studies because of its small genome, self-fertility, short generation time, and high transformation efficiency. M. truncatula engages in symbiosis with the nitrogen-fixing soil bacterium Rhizobium meliloti. M. truncatula mutants that are defective in nodulation and developmental processes have been generated. Some of these mutants exhibit altered phenotypes in symbiotic responses such as root hair deformation, expression of nodulin genes, and calcium spiking. Thus, the genes controlling these traits are likely to encode functions required for Nod-factor signal transduction pathways. To facilitate genome analysis and map-based cloning of symbiotic genes, a bacterial artificial chromosome library was constructed. An efficient polymerase chain reaction-based screening of the library was devised to speed up physical mapping of specific genomic regions. As a genomics approach, comparative mapping revealed high levels of macro- and microsynteny between M. truncatula and other legume genomes. Expressed sequence tags and microarray profiles reflecting the genetic and biochemical events associated with the development and environmental interactions of M. truncatula have been assembled in databases. Together, these genomics programs will help enrich our understanding of legume biology.


High-performance computing for SARS-CoV-2 RNAs clustering: a data science-based genomics approach

  • Oujja, Anas;Abid, Mohamed Riduan;Boumhidi, Jaouad;Bourhnane, Safae;Mourhir, Asmaa;Merchant, Fatima;Benhaddou, Driss
    • Genomics & Informatics / v.19 no.4 / pp.49.1-49.11 / 2021
  • Genomic data constitute one of the fastest-growing datasets in the world. By 2025, they are expected to become the fourth largest source of Big Data, thus mandating adequate high-performance computing (HPC) platforms for processing. With the latest unprecedented and unpredictable mutations in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the research community is in critical need of ICT tools to process SARS-CoV-2 RNA data, e.g., by clustering it, and thus to assist in tracking virus mutations and predicting future ones. In this paper, we present an HPC-based SARS-CoV-2 RNA clustering tool. We adopt a data science approach, from data collection, through analysis, to visualization. In the analysis step, we present how our clustering approach leverages HPC and the longest common subsequence (LCS) algorithm. The approach uses the Hadoop MapReduce programming paradigm and adapts the LCS algorithm to efficiently compute the length of the LCS for each pair of SARS-CoV-2 RNA sequences. The sequences are extracted from the U.S. National Center for Biotechnology Information (NCBI) Virus repository. The computed LCS lengths are used to measure the dissimilarities between RNA sequences in order to identify existing clusters. In addition, we present a comparative study of the LCS algorithm's performance under variable workloads and different numbers of Hadoop worker nodes.
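
    The pairwise LCS step described above can be sketched on a single machine as follows. This is a minimal illustration, not the authors' Hadoop MapReduce implementation: the normalization used to turn an LCS length into a dissimilarity (one minus the LCS length divided by the longer sequence length) is an assumption, and the function names are hypothetical.

    ```python
    from itertools import combinations


    def lcs_length(a: str, b: str) -> int:
        """Length of the longest common subsequence, using two rolling DP rows."""
        if len(a) < len(b):
            a, b = b, a  # keep the shorter sequence as the row to save memory
        prev = [0] * (len(b) + 1)
        for ch_a in a:
            curr = [0]
            for j, ch_b in enumerate(b, start=1):
                if ch_a == ch_b:
                    curr.append(prev[j - 1] + 1)
                else:
                    curr.append(max(prev[j], curr[j - 1]))
            prev = curr
        return prev[-1]


    def lcs_dissimilarity(a: str, b: str) -> float:
        """Assumed normalization: 0.0 for identical sequences, up to 1.0."""
        longest = max(len(a), len(b))
        if longest == 0:
            return 0.0
        return 1.0 - lcs_length(a, b) / longest


    def pairwise_dissimilarities(sequences: dict) -> dict:
        """Map each unordered pair of sequence IDs to its dissimilarity."""
        return {
            (id_a, id_b): lcs_dissimilarity(sequences[id_a], sequences[id_b])
            for id_a, id_b in combinations(sorted(sequences), 2)
        }
    ```

    Each pair is independent of every other pair, which is what makes the computation a natural fit for distribution across worker nodes.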

Prediction of Metal Ion Binding Sites in Proteins from Amino Acid Sequences by Using Simplified Amino Acid Alphabets and Random Forest Model

  • Kumar, Suresh
    • Genomics & Informatics / v.15 no.4 / pp.162-169 / 2017
  • Metal ions bound by metal-binding proteins, or metalloproteins, are important for protein stability and also serve as cofactors in various functions such as controlling metabolism, regulating signal transport, and metal homeostasis. In structural genomics, prediction of metal-binding proteins helps in selecting a suitable growth medium for overexpression studies and in obtaining the functional protein. Computational prediction using machine learning has been widely used in various fields of bioinformatics, on the premise that all the information is contained in the amino acid sequence. In this study, random forest machine learning prediction systems were deployed with simplified amino acid alphabets to predict individual major metal ion binding sites, including copper, calcium, cobalt, iron, magnesium, manganese, nickel, and zinc.
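
    A simplified amino acid alphabet can be illustrated with a short sketch. The six-group scheme below is a generic physicochemical grouping chosen for illustration only; the abstract does not state which reduced alphabets the study used, and the symbols and function names are hypothetical.

    ```python
    from collections import Counter

    # Hypothetical six-letter reduced alphabet (assumption, not the paper's scheme).
    GROUPS = {
        "AVLIMC": "h",  # aliphatic / hydrophobic
        "FWY": "a",     # aromatic
        "STNQ": "p",    # polar, uncharged
        "KRH": "+",     # positively charged
        "DE": "-",      # negatively charged
        "GP": "s",      # special conformational roles
    }
    REDUCE = {aa: symbol for members, symbol in GROUPS.items() for aa in members}
    SYMBOLS = sorted(set(REDUCE.values()))  # fixed feature order


    def reduce_sequence(seq: str) -> str:
        """Rewrite a protein sequence in the reduced alphabet ('x' for unknowns)."""
        return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


    def composition(seq: str) -> list:
        """Fraction of each reduced symbol in the sequence, in SYMBOLS order."""
        reduced = reduce_sequence(seq)
        counts = Counter(reduced)
        total = len(reduced) or 1  # avoid division by zero for empty input
        return [counts[symbol] / total for symbol in SYMBOLS]
    ```

    Composition vectors of this kind, computed over a window around a candidate residue, are one plausible input to a random forest classifier such as scikit-learn's `RandomForestClassifier`; the feature design actually used in the study is not described in the abstract.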

Global Optimization of Clusters in Gene Expression Data of DNA Microarrays by Deterministic Annealing

  • Lee, Kwon Moo;Chung, Tae Su;Kim, Ju Han
    • Genomics & Informatics / v.1 no.1 / pp.20-24 / 2003
  • The analysis of DNA microarray data is one of the most important tasks in functional genomics research. The matrix representation of microarray data and its successive 'optimal' incisional hyperplanes provide a useful platform for developing optimization algorithms that determine the optimal partitioning of a pairwise proximity matrix representing a completely connected, weighted graph. We developed a deterministic annealing (DA) approach to determine the successive optimal binary partitioning. The DA algorithm demonstrated good performance, with the ability to find the 'globally optimal' binary partitions. In addition, objects that have not been clustered at a small nonzero temperature are considered very sensitive to even small randomness and can be used to estimate the reliability of the clustering.

cDNA Microarray gene expression profiling of hydroxyurea, paclitaxel and p-anisidine that are genotoxic compounds with differing tumorigenicity results

  • Lee, Michael;Jung Kwon;Kim, Se-Nyun;Kim, Ja-Eun;Koh, Woo-Suk;Song, Chang-Woo;Chung, Moon-Koo
    • Proceedings of the Korean Society of Toxicology Conference / 2003.05a / pp.36-37 / 2003
  • The potential application of toxicogenomics to predictive toxicology has been discussed widely, but the utility of the approach remains largely unproven. Using cDNA microarrays, we have compared the gene expression profiles produced in mouse lymphoma cells by three genotoxic compounds: hydroxyurea (a carcinogen), p-anisidine (a noncarcinogen), and paclitaxel (carcinogenicity unknown). (omitted)


Chemical Genetics and Chemical Genomics: High Throughput Profiling of Drugs, Therapeutic Genes and Disease Networks

  • Kim, Tae-Kook
    • Proceedings of the PSK Conference / 2003.10a / pp.97-99 / 2003
  • With advances in determining the entire DNA sequence of the human genome, it is now critical to systematically identify the functions of the many genes in the human genome. These biological problems, especially those in human diseases including cancer, should be addressed in human cells, in which genetic approaches have been extremely difficult to implement. To overcome this, my efforts have focused on developing a novel “chemical genetic/genomic approach” that uses small molecules to “probe and identify” the function of genes in a specific biological process or pathway in human cells. (omitted)


Digenic or oligogenic mutations in presumed monogenic disorders: A review

  • Afif Ben-Mahmoud;Vijay Gupta;Cheol-Hee Kim;Lawrence C Layman;Hyung-Goo Kim
    • Journal of Genetic Medicine / v.20 no.1 / pp.15-24 / 2023
  • Monogenic disorders are traditionally attributed to the presence of mutations in a single gene. However, recent advancements in genomics have revealed instances where the phenotypic expression of apparently monogenic disorders cannot be fully explained by mutations in a single gene alone. This review article aims to explore the emerging concept of digenic or oligogenic inheritance in seemingly monogenic disorders. We discuss the underlying mechanisms, clinical implications, and the challenges associated with deciphering the contribution of multiple genes in the development and manifestation of such disorders. We present relevant studies and highlight the importance of adopting a broader genetic approach in understanding the complex genetic architecture of these conditions.

Identification of Causal and/or Rare Genetic Variants for Complex Traits by Targeted Resequencing in Population-based Cohorts

  • Kim, Yun-Kyoung;Hong, Chang-Bum;Cho, Yoon-Shin
    • Genomics & Informatics / v.8 no.3 / pp.131-137 / 2010
  • Genome-wide association studies (GWASs) have greatly contributed to the identification of common variants responsible for numerous complex traits. This approach, however, depends on an LD-based tagging SNP microarray chip and therefore has unavoidable limitations in detecting causal and/or rare variants. In an effort to detect potential causal and/or rare variants for complex traits, such as type 2 diabetes (T2D) and triglycerides (TGs), we conducted targeted resequencing of loci identified by the Korea Association REsource (KARE) GWAS. The target regions for resequencing comprised whole exons, exon-intron boundaries, and regulatory regions of genes that appeared within 1 Mb of the GWA signal boundary. From 124 individuals selected in population-based cohorts, a total of 0.7 Mb of target regions was captured by the NimbleGen sequence capture 385K array. Subsequent sequencing, carried out on the Roche 454 Genome Sequencer FLX, generated about 110,000 sequence reads per individual. Sequence reads were mapped to the human reference genome using the SSAHA2 program. An average of 62.2% of total reads mapped to targets, with an average 22-fold coverage. A total of 5,983 SNPs (average 846 SNPs per individual) were called and annotated by GATK software, with 96.5% accuracy as estimated by comparison with Affymetrix 5.0 genotype data from the same individuals. About 51% of total SNPs were singletons, which can be considered possible rare variants in the population. Among SNPs that appeared in exons, which account for about 20% of total SNPs, 304 nonsynonymous singletons were tested with Polyphen to predict the protein damage caused by mutation. In total, we detected 9 and 6 potentially functional rare SNPs for T2D and triglycerides, respectively, which calls for replication genotyping in independent populations to prove their bona fide relevance to the traits.

Comparison of Plasma Proteome Expression between the Young and Mature Adult Pigs

  • Jeong, Jin Young;Nam, Jin Sun;Kim, Jang Mi;Jeong, Hak Jae;Kim, Kyung Woon;Lee, Hyun-Jeong
    • Reproductive and Developmental Biology / v.37 no.4 / pp.247-253 / 2013
  • Here, we present an approach to blood plasma proteome profiling and a comparison between young and adult pigs, as a prerequisite for identifying biomarkers related to health condition, growth performance, and meat quality. To profile the porcine plasma proteome, blood samples were collected from 19 young piglets and 20 adult male barrows, and the plasma was retrieved. Protein profiling was then performed using one- and two-dimensional electrophoresis. Protein spots were detected and then identified by MALDI-TOF-TOF and LC-MS-MS. More than thirty-six and twenty-eight protein spots were selected in young piglets and adult pigs, respectively, and twenty-three proteins were identified. The proteome profile images of the two groups were compared using Image Master Version 7.0. The images showed that most plasma proteins from young piglets were more clearly separated and concentrated in the 2DE display than those from adults. Detailed image analysis was carried out to look for specific proteins related to age progression. It demonstrated that the characteristics of proteome expression can be distinct between age stages. Further investigation is needed to understand the age-dependent changes in protein conformation and the biological meaning of these differences in proteome expression between young and mature adult pigs.