• 제목/요약/키워드: biomedical informatics

검색결과 277건 처리시간 0.022초

Protein Ontology: Semantic Data Integration in Proteomics

  • Sidhu, Amandeep S.;Dillon, Tharam S.;Chang, Elizabeth;Sidhu, Baldev S.
    • 한국생물정보학회:학술대회논문집
    • /
    • 한국생물정보시스템생물학회 2005년도 BIOINFO 2005
    • /
    • pp.388-391
    • /
    • 2005
  • The Protein Structural and Functional Conservation need a common language for data definition. With the help of common language provided by Protein Ontology the high level of sequence and functional conservation can be extended to all organisms with the likelihood that proteins that carry out core biological processes will again be probable orthologues. The structural and functional conservation in these proteins presents both opportunities and challenges. The main opportunity lies in the possibility of automated transfer of protein data annotations from experimentally traceable model organisms to a less traceable organism based on protein sequence similarity. Such information can be used to improve human health or agriculture. The challenge lies in using a common language to transfer protein data annotations among different species of organisms. First step in achieving this huge challenge is producing a structured, precisely defined common vocabulary using Protein Ontology. The Protein Ontology described in this paper covers the sequence, structure and biological roles of Protein Complexes in any organism.

  • PDF

Association Analysis of Reactive Oxygen Species-Hypertension Genes Discovered by Literature Mining

  • Lim, Ji Eun;Hong, Kyung-Won;Jin, Hyun-Seok;Oh, Bermseok
    • Genomics & Informatics
    • /
    • 제10권4호
    • /
    • pp.244-248
    • /
    • 2012
  • Oxidative stress, which results in an excessive product of reactive oxygen species (ROS), is one of the fundamental mechanisms of the development of hypertension. In the vascular system, ROS have physical and pathophysiological roles in vascular remodeling and endothelial dysfunction. In this study, ROS-hypertension-related genes were collected by the biological literature-mining tools, such as SciMiner and gene2pubmed, in order to identify the genes that would cause hypertension through ROS. Further, single nucleotide polymorphisms (SNPs) located within these gene regions were examined statistically for their association with hypertension in 6,419 Korean individuals, and pathway enrichment analysis using the associated genes was performed. The 2,945 SNPs of 237 ROS-hypertension genes were analyzed, and 68 genes were significantly associated with hypertension (p < 0.05). The most significant SNP was rs2889611 within MAPK8 (p = $2.70{\times}10^{-5}$; odds ratio, 0.82; confidence interval, 0.75 to 0.90). This study demonstrates that a text mining approach combined with association analysis may be useful to identify the candidate genes that cause hypertension through ROS or oxidative stress.

Introns: The Functional Benefits of Introns in Genomes

  • Jo, Bong-Seok;Choi, Sun Shim
    • Genomics & Informatics
    • /
    • 제13권4호
    • /
    • pp.112-118
    • /
    • 2015
  • The intron has been a big biological mystery since it was first discovered in several aspects. First, all of the completely sequenced eukaryotes harbor introns in the genomic structure, whereas no prokaryotes identified so far carry introns. Second, the amount of total introns varies in different species. Third, the length and number of introns vary in different genes, even within the same species genome. Fourth, all introns are copied into RNAs by transcription and DNAs by replication processes, but intron sequences do not participate in protein-coding sequences. The existence of introns in the genome should be a burden to some cells, because cells have to consume a great deal of energy to copy and excise them exactly at the correct positions with the help of complicated spliceosomal machineries. The existence throughout the long evolutionary history is explained, only if selective advantages of carrying introns are assumed to be given to cells to overcome the negative effect of introns. In that regard, we summarize previous research about the functional roles or benefits of introns. Additionally, several other studies strongly suggesting that introns should not be junk will be introduced.

Expressional Subpopulation of Cancers Determined by G64, a Co-regulated Module

  • Min, Jae-Woong;Choi, Sun Shim
    • Genomics & Informatics
    • /
    • 제13권4호
    • /
    • pp.132-136
    • /
    • 2015
  • Studies of cancer heterogeneity have received considerable attention recently, because the presence or absence of resistant sub-clones may determine whether or not certain therapeutic treatments are effective. Previously, we have reported G64, a co-regulated gene module composed of 64 different genes, can differentiate tumor intra- or inter-subpopulations in lung adenocarcinomas (LADCs). Here, we investigated whether the G64 module genes were also expressed distinctively in different subpopulations of other cancers. RNA sequencing-based transcriptome data derived from 22 cancers, except LADC, were downloaded from The Cancer Genome Atlas (TCGA). Interestingly, the 22 cancers also expressed the G64 genes in a correlated manner, as observed previously in an LADC study. Considering that gene expression levels were continuous among different tumor samples, tumor subpopulations were investigated using extreme expressional ranges of G64-i.e., tumor subpopulation with the lowest 15% of G64 expression, tumor subpopulation with the highest 15% of G64 expression, and tumor subpopulation with intermediate expression. In each of the 22 cancers, we examined whether patient survival was different among the three different subgroups and found that G64 could differentiate tumor subpopulations in six other cancers, including sarcoma, kidney, brain, liver, and esophageal cancers.

Allelic Frequencies of 20 Visible Phenotype Variants in the Korean Population

  • Lim, Ji Eun;Oh, Bermseok
    • Genomics & Informatics
    • /
    • 제11권2호
    • /
    • pp.93-96
    • /
    • 2013
  • The prediction of externally visible characteristics from DNA has been studied for forensic genetics over the last few years. Externally visible characteristics include hair, skin, and eye color, height, and facial morphology, which have high heritability. Recent studies using genome-wide association analysis have identified genes and variations that correlate with human visible phenotypes and developed phenotype prediction programs. However, most prediction models were constructed and validated based on genotype and phenotype information on Europeans. Therefore, we need to validate prediction models in diverse ethnic populations. In this study, we selected potentially useful variations for forensic science that are associated with hair and eye color, iris pattern, and facial morphology, based on previous studies, and analyzed their frequencies in 1,920 Koreans. Among 20 single nucleotide polymorphisms (SNPs), 10 SNPs were polymorphic, 6 SNPs were very rare (minor allele frequency < 0.005), and 4 SNPs were monomorphic in the Korean population. Even though the usability of these SNPs should be verified by an association study in Koreans, this study provides 10 potential SNP markers for forensic science for externally visible characteristics in the Korean population.

Genome-Wide Association Studies of the Korea Association REsource (KARE) Consortium

  • Hong, Kyung-Won;Kim, Hyung-Lae;Oh, Berm-Seok
    • Genomics & Informatics
    • /
    • 제8권3호
    • /
    • pp.101-102
    • /
    • 2010
  • During the last decade, large community cohorts have been established by the Korea National Institutes of Health (KNIH), and enormous epidemiological and clinical data have been accumulated. Using these information and samples in the cohorts, KNIH set out to do a large-scale genome-wide association study (GWAS) in 2007, and the Korea Association REsource (KARE) consortium was launched to analyze the data to identify the underlying genetic risk factors of diseases and diverse health indexes, such as blood pressure, obesity, bone density, and blood biochemical traits. The consortium consisted of 6 research divisions, formed by 25 principal investigators in 19 organizations, including 18 universities, 2 institutes, and 1 company. Each division focused on one of the following subjects: the identification of genetic factors, the statistical analysis of gene-gene interactions, the genetic epidemiology of gene-environment interactions, copy number variation, the bioinformatics related to a GWAS, and a GWAS of nutrigenomics. In this special issue, the study results of the KARE consortium are provided as 9 articles. We hope that this special issue might encourage the genomics community to share data and scientists, including clinicians, to analyze the valuable Korean data of KARE.

Improving methods for normalizing biomedical text entities with concepts from an ontology with (almost) no training data at BLAH5 the CONTES

  • Ferre, Arnaud;Ba, Mouhamadou;Bossy, Robert
    • Genomics & Informatics
    • /
    • 제17권2호
    • /
    • pp.20.1-20.5
    • /
    • 2019
  • Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations, such as it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for the out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality in the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words, with a specific method.

A Universal Analysis Pipeline for Hybrid Capture-Based Targeted Sequencing Data with Unique Molecular Indexes

  • Kim, Min-Jung;Kim, Si-Cho;Kim, Young-Joon
    • Genomics & Informatics
    • /
    • 제16권4호
    • /
    • pp.29.1-29.5
    • /
    • 2018
  • Hybrid capture-based targeted sequencing is being used increasingly for genomic variant profiling in tumor patients. Unique molecular index (UMI) technology has recently been developed and helps to increase the accuracy of variant calling by minimizing polymerase chain reaction biases and sequencing errors. However, UMI-adopted targeted sequencing data analysis is slightly different from the methods for other types of omics data, and its pipeline for variant calling is still being optimized in various study groups for their own purposes. Due to this provincial usage of tools, our group built an analysis pipeline for global application to many studies of targeted sequencing generated with different methods. First, we generated hybrid capture-based data using genomic DNA extracted from tumor tissues of colorectal cancer patients. Sequencing libraries were prepared and pooled together, and an 8-plexed capture library was processed to the enrichment step before 150-bp paired-end sequencing with Illumina HiSeq series. For the analysis, we evaluated several published tools. We focused mainly on the compatibility of the input and output of each tool. Finally, our laboratory built an analysis pipeline specialized for UMI-adopted data. Through this pipeline, we were able to estimate even on-target rates and filtered consensus reads for more accurate variant calling. These results suggest the potential of our analysis pipeline in the precise examination of the quality and efficiency of conducted experiments.

Improving classification of low-resource COVID-19 literature by using Named Entity Recognition

  • Lithgow-Serrano, Oscar;Cornelius, Joseph;Kanjirangat, Vani;Mendez-Cruz, Carlos-Francisco;Rinaldi, Fabio
    • Genomics & Informatics
    • /
    • 제19권3호
    • /
    • pp.22.1-22.5
    • /
    • 2021
  • Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) clinical repository-a repository of classified and translated academic articles related to COVID-19 and relevant to the clinical practice-where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7) hackathon, we performed experiments to explore the use of named-entity-recognition (NER) to improve the classification. We processed the literature with OntoGene's Biomedical Entity Recogniser (OGER) and used the resulting identified Named Entities (NE) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER extracted features. In these proof-of-concept experiments, we observed a clear gain on COVID-19 literature classification. In particular, NE's origin was useful to classify document types and NE's type for clinical specialties. Due to the limitations of the small dataset, we can only conclude that our results suggests that NER would benefit this classification task. In order to accurately estimate this benefit, further experiments with a larger dataset would be needed.

Identification of key genes and functional enrichment analysis of liver fibrosis in nonalcoholic fatty liver disease through weighted gene co-expression network analysis

  • Yue Hu;Jun Zhou
    • Genomics & Informatics
    • /
    • 제21권4호
    • /
    • pp.45.1-45.11
    • /
    • 2023
  • Nonalcoholic fatty liver disease (NAFLD) is a common type of chronic liver disease, with severity levels ranging from nonalcoholic fatty liver to nonalcoholic steatohepatitis (NASH). The extent of liver fibrosis indicates the severity of NASH and the risk of liver cancer. However, the mechanism underlying NASH development, which is important for early screening and intervention, remains unclear. Weighted gene co-expression network analysis (WGCNA) is a useful method for identifying hub genes and screening specific targets for diseases. In this study, we utilized an mRNA dataset of the liver tissues of patients with NASH and conducted WGCNA for various stages of liver fibrosis. Subsequently, we employed two additional mRNA datasets for validation purposes. Gene set enrichment analysis (GSEA) was conducted to analyze gene function enrichment. Through WGCNA and subsequent analyses, complemented by validation using two additional datasets, we identified five genes (BICC1, C7, EFEMP1, LUM, and STMN2) as hub genes. GSEA analysis indicated that gene sets associated with liver metabolism and cholesterol homeostasis were uniformly downregulated. BICC1, C7, EFEMP1, LUM, and STMN2 were identified as hub genes of NASH, and were all related to liver metabolism, NAFLD, NASH, and related diseases. These hub genes might serve as potential targets for the early screening and treatment of NASH.