• Title/Summary/Keyword: gene annotation

Improving classification of low-resource COVID-19 literature by using Named Entity Recognition

  • Lithgow-Serrano, Oscar;Cornelius, Joseph;Kanjirangat, Vani;Mendez-Cruz, Carlos-Francisco;Rinaldi, Fabio
    • Genomics & Informatics / v.19 no.3 / pp.22.1-22.5 / 2021
  • Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) clinical repository, a repository of classified and translated academic articles related to COVID-19 and relevant to clinical practice, where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7), we performed experiments to explore the use of named-entity recognition (NER) to improve the classification. We processed the literature with OntoGene's Biomedical Entity Recogniser (OGER) and used the identified named entities (NEs) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER-extracted features. In these proof-of-concept experiments, we observed a clear gain in COVID-19 literature classification. In particular, the NE's origin was useful for classifying document types, and the NE's type for classifying clinical specialties. Given the limitations of the small dataset, we can only conclude that our results suggest that NER would benefit this classification task. To estimate this benefit accurately, further experiments with a larger dataset would be needed.

Complete genome sequence of Lactococcus taiwanensis strain K_LL004, encoding hydrolytic enzymes of plant polysaccharides isolated from grasshopper (Oxya chinensis sinuosa)

  • Hyunok Doo;Hyeri Kim;Jin Ho Cho;Minho Song;Eun Sol Kim;Jae Hyoung Cho;Sheena Kim;Gi Beom Keum;Jinok Kwak;Sriniwas Pandey;Hyeun Bum Kim;Ju-Hoon Lee
    • Journal of Animal Science and Technology / v.65 no.3 / pp.679-682 / 2023
  • The Lactococcus taiwanensis strain K_LL004 was isolated from the gut of a grasshopper (Oxya chinensis sinuosa) collected from a local farm in Korea. L. taiwanensis strain K_LL004 is a functional probiotic candidate with the ability to hydrolyze plant polysaccharides. The complete genome of L. taiwanensis strain K_LL004 contains one circular chromosome (1,995,099 bp) with a guanine + cytosine (GC) content of 38.8%. Annotation identified 1,929 protein-coding sequences, 19 rRNA genes, and 62 tRNA genes. L. taiwanensis strain K_LL004 has genes encoding hydrolytic enzymes, such as beta-glucosidase and beta-xylosidase, that hydrolyze plant polysaccharides.

The Brassica rapa Tissue-specific EST Database (배추의 조직 특이적 발현유전자 데이터베이스)

  • Yu, Hee-Ju;Park, Sin-Gi;Oh, Mi-Jin;Hwang, Hyun-Ju;Kim, Nam-Shin;Chung, Hee;Sohn, Seong-Han;Park, Beom-Seok;Mun, Jeong-Hwan
    • Horticultural Science & Technology / v.29 no.6 / pp.633-640 / 2011
  • Brassica rapa is an A genome model species for Brassica crop genetics, genomics, and breeding. With the completion of B. rapa genome sequencing, functional analysis of the genome is the next issue to address. Expressed sequence tags (ESTs) are fundamental resources supporting annotation and functional analysis of the genome, including identification of tissue-specific genes and promoters. As of July 2011, 147,217 ESTs from 39 cDNA libraries of B. rapa had been reported in the public database. However, little information can be retrieved from the sequences because organized databases are lacking. To leverage the sequence information and to maximize the use of publicly available EST collections, the Brassica rapa tissue-specific EST database (BrTED) was developed. BrTED includes sequence information for 23,962 unigenes assembled by the StackPack program. The unigene set is used as a query unit for various analyses, such as BLAST against the TAIR gene model, functional annotation using MIPS and UniProt, gene ontology analysis, and prediction of tissue-specific unigene sets based on statistical tests. The database is composed of two main units: an EST sequence processing and information retrieval unit, and a tissue-specific expression profile analysis unit. Information and data in both units are tightly interconnected through a web-based browsing system. In an RT-PCR evaluation of 29 selected unigene sets, amplicons were successfully amplified from the target tissues of B. rapa. BrTED allows the user to identify and analyze the expression of genes of interest and aids efforts to interpret the B. rapa genome through functional genomics. In addition, it can be used as a public resource providing reference information for studying the genus Brassica and other closely related crucifer crops.

Identification and Characterization of Protein Encoded by orf382 as $\small{L}$-Threonine Dehydrogenase

  • Ma, Fei;Wang, Tianwen;Ma, Xingyuan;Wang, Ping
    • Journal of Microbiology and Biotechnology / v.24 no.6 / pp.748-755 / 2014
  • In the genome annotation of Escherichia coli MG1655, orf382 (1,149 bp) is designated as a gene encoding an alcohol dehydrogenase that may be Fe-dependent. In this study, the gene was amplified from the genome by PCR and overexpressed in Escherichia coli BL21(DE3). The recombinant $6{\times}$His-tagged protein was then purified and characterized. In an enzymatic assay using different hydroxyl-containing substrates (n-butanol, $\small{L}$-threonine, ethanol, isopropanol, glucose, glycerol, $\small{L}$-serine, lactic acid, citric acid, methanol, or $\small{D}$-threonine), the enzyme showed the highest activity on $\small{L}$-threonine. Characterization of a mutant constructed by knocking out orf382 also implied a role for the enzyme in the metabolism of $\small{L}$-threonine into glycine. Considering the presence of the tested substrates in living E. coli cells and the previous literature, we believe that the suitable name for the enzyme is $\small{L}$-threonine dehydrogenase (LTDH). With $\small{L}$-threonine as the substrate, the enzyme exhibited the best catalytic performance at $39^{\circ}C$ and pH 9.8 with $NAD^+$ as the cofactor. The Km values determined for $\small{L}$-threonine (Km = $11.29{\mu}M$), ethanol ($222.5{\mu}M$), and n-butanol ($8.02{\mu}M$) also confirmed the enzyme as an LTDH. Furthermore, the LTDH was shown to be an ion-containing protein based on inductively coupled plasma-atomic emission spectrometry, with an isoelectric point of pH 5.4. Moreover, a circular dichroism analysis revealed that the metal ion was structurally and enzymatically essential, as its removal markedly changed the ${\alpha}$-helix percentage (from 12.6% to 6.3%).

Elucidation of the Biosynthetic Pathway of Vitamin B Groups and Potential Secondary Metabolite Gene Clusters Via Genome Analysis of a Marine Bacterium Pseudoruegeria sp. M32A2M

  • Cho, Sang-Hyeok;Lee, Eunju;Ko, So-Ra;Jin, Sangrak;Song, Yoseb;Ahn, Chi-Yong;Oh, Hee-Mock;Cho, Byung-Kwan;Cho, Suhyung
    • Journal of Microbiology and Biotechnology / v.30 no.4 / pp.505-514 / 2020
  • Among complex microbial interactions, the symbiotic relationship between algae and marine bacteria is well studied. The mutual benefit between algae and bacteria arises from nutrient and vitamin exchange. Analyzing the genome sequence of a bacterium is necessary to predict its symbiotic relationships. In this study, the genome of a marine bacterium, Pseudoruegeria sp. M32A2M, isolated from the south-eastern isles (GeoJe-Do) of South Korea, was sequenced and analyzed. A draft genome (91 scaffolds) of 5.5 Mb with a DNA G+C content of 62.4% was obtained. In total, 5,101 features were identified from gene annotation, and 4,927 genes were assigned to functional proteins. We also identified transcription core proteins, RNA polymerase subunits, and sigma factors. In addition, full flagella-related gene clusters involving the flagellar body, motor, regulator, and other accessory compartments were detected, even though the genus Pseudoruegeria is known to comprise non-motile bacteria. Examination of annotated KEGG pathways revealed that Pseudoruegeria sp. M32A2M has the metabolic pathways for all seven vitamin Bs, including thiamin (vitamin B1), biotin (vitamin B7), and cobalamin (vitamin B12), which are necessary for symbiosis with vitamin B-auxotrophic algae. We also identified gene clusters for seven secondary metabolites including ectoine, homoserine lactone, beta-lactone, terpene, lasso peptide, bacteriocin, and non-ribosomal proteins.

Gene expression profile of the early embryonic gene of the silkworm, Bombyx mori (누에 수정란 초기발현유전자 데이터베이스 구축)

  • Choi, Kwang-Ho;Goo, Tae-Won;Kim, Seong-Ryul;Kim, Sung-Wan;Chun, Jae-Buhm;Park, Seoung-Won;Kang, Seok-Woo
    • Journal of Sericultural and Entomological Science / v.51 no.2 / pp.191-196 / 2013
  • This study aimed to identify useful genes whose transcripts are expressed specifically in the early embryonic stage of the silkworm, Bombyx mori. We constructed and analyzed a full-length cDNA library from silkworm eggs 2 to 6 hours after oviposition. A total of 960 clones were randomly selected, and the 5' ends of the inserts were sequenced to generate 652 expressed sequence tags (ESTs). Assembly of the 652 ESTs yielded 334 unique ESTs. Annotation of the 334 unique ESTs by BLAST search revealed that 156 (47%) of the sequences represented known genes, whereas 178 (53%) had no matches in the database. Of the 156 known genes, the most abundant were the heat shock protein hsp20.8 gene (12 times) and the ubiquitin-like protein gene (11 times). The ESTs with matches in the database were grouped according to their putative molecular functions. Among the thirteen functional categories, the largest groups were protein synthesis (9.6%) and cellular organization (8.1%). Further detailed studies on the molecular functions and biological roles of these genes and their promoters will provide well-defined information for their application.

Coordinated alteration of mRNA-microRNA transcriptomes associated with exosomes and fatty acid metabolism in adipose tissue and skeletal muscle in grazing cattle

  • Muroya, Susumu;Ogasawara, Hideki;Nohara, Kana;Oe, Mika;Ojima, Koichi;Hojito, Masayuki
    • Asian-Australasian Journal of Animal Sciences / v.33 no.11 / pp.1824-1836 / 2020
  • Objective: On the hypothesis that grazing of cattle prompts organs to secrete or internalize circulating microRNAs (c-miRNAs) in parallel with changes in energy metabolism, we aimed to clarify biological events in adipose, skeletal muscle, and liver tissues in grazing Japanese Shorthorn (JSH) steers using a transcriptomic approach. Methods: The subcutaneous fat (SCF), biceps femoris muscle (BFM), and liver of JSH steers after three months of grazing or housing were analyzed using microarray and quantitative polymerase chain reaction (qPCR), followed by gene ontology (GO) and functional annotation analyses. Results: The transcriptomic results indicated that SCF was highly responsive to grazing compared to BFM and liver tissues. 'Exosome', 'Carbohydrate metabolism' and 'Lipid metabolism' were extracted as the relevant GO terms in SCF and BFM, and/or liver from the >1.5-fold-altered mRNAs in grazing steers. The qPCR analyses showed a trend of upregulated gene expression related to exosome secretion and internalization (charged multivesicular body protein 4A, vacuolar protein sorting-associated protein 4B, vesicle associated membrane protein 7, caveolin 1) in the BFM and SCF, as well as upregulation of lipolysis-associated mRNAs (carnitine palmitoyltransferase 1A, hormone-sensitive lipase, perilipin 1, adipose triglyceride lipase, fatty acid binding protein 4) and most of the microRNAs (miRNAs) in SCF. Moreover, gene expression related to fatty acid uptake and inter-organ signaling (solute carrier family 27 member 4 and angiopoietin-like 4) was upregulated in BFM, suggesting activation of SCF-BFM organ crosstalk for energy metabolism. Meanwhile, expression of plasma exosomal miR-16a, miR-19b, miR-21-5p, and miR-142-5p was reduced. According to bioinformatic analyses, the c-miRNA target genes are associated with the terms 'Endosome', 'Caveola', 'Endocytosis', 'Carbohydrate metabolism', and with pathways related to environmental information processing and the endocrine system. Conclusion: Exosome and fatty acid metabolism-related gene expression was altered in SCF of grazing cattle, which could be regulated by miRNAs such as miR-142-5p. These changes occurred coordinately in both the SCF and BFM, suggesting involvement of exosomes in the SCF-BFM organ crosstalk to modulate energy metabolism.

Systematic approaches to identify functional genes using the FOX-hunting system in Chinese cabbage (FOX hunting system을 이용한 배추 기능유전자 탐색)

  • Lee, In-Hoo;Jung, Yu-Jin;Park, Jong-In;Nou, Ill-Sup;Kang, Kwon-Kyoo
    • Journal of Plant Biotechnology / v.37 no.2 / pp.174-185 / 2010
  • Full-length cDNAs are essential for the correct annotation of genomic sequences and for the functional analysis of genes and their products. To elucidate the functions of a large population of Chinese cabbage (Brassica rapa) genes and to search efficiently for agriculturally useful genes, we have been taking advantage of the full-length cDNA Over-eXpresser (FOX) gene hunting system. mRNA was purified from flower organs, leaves, and stem tissue using an oligo(dT) column. About 120,000 cDNAs from the library were introduced into the $\lambda$-pFLCIII-F vector, of which 115,000 were introduced into the T-DNA binary vector pBigs for transformation studies. We used normalized full-length cDNA and introduced each cDNA into Arabidopsis by in planta transformation. Full-length Chinese cabbage cDNAs were expressed independently under the CaMV 35S promoter in Arabidopsis. Selfed seeds were harvested from transgenic Arabidopsis. We selected 2,500 transgenic plants with a hygromycin tolerance test and obtained a number of transgenic mutants. Each transgenic Arabidopsis plant was examined for morphological changes, fertility, and leaf colour. As a result, 285 possible morphological mutants were identified. The introduced cDNA was isolated by PCR amplification of genomic DNA from the transgenic mutants. Sequencing and BLAST analysis showed that most of the introduced cDNAs were complete cDNAs and functional genes. We also examined the effect of bromelain on enhancing resistance to soft rot in the transgenic Chinese cabbage 'Osome'. The bromelain gene identified with the FOX hunting system was transformed into Chinese cabbage using Agrobacterium methods. Transformants were screened by PCR, and RT-PCR and real-time PCR were then performed to analyze expression of the cysteine protease gene in the T1 and T2 generations. The anti-bacterial activity of bromelain was tested in Chinese cabbages infected with soft rot bacteria. The results showed that the over-expressed bromelain gene from pineapple conferred enhanced resistance to soft rot in Chinese cabbage.

Draft Genome Assembly and Annotation for Cutaneotrichosporon dermatis NICC30027, an Oleaginous Yeast Capable of Simultaneous Glucose and Xylose Assimilation

  • Wang, Laiyou;Guo, Shuxian;Zeng, Bo;Wang, Shanshan;Chen, Yan;Cheng, Shuang;Liu, Bingbing;Wang, Chunyan;Wang, Yu;Meng, Qingshan
    • Mycobiology / v.50 no.1 / pp.66-78 / 2022
  • The identification of oleaginous yeast species capable of simultaneously utilizing xylose and glucose as substrates to generate value-added biological products is an area of key economic interest. We have previously demonstrated that the Cutaneotrichosporon dermatis NICC30027 yeast strain is capable of simultaneously assimilating both xylose and glucose, resulting in considerable lipid accumulation. However, as no high-quality genome sequencing data or associated annotations for this strain are available at present, it remains challenging to study the metabolic mechanisms underlying this phenotype. Herein, we report a 39,305,439 bp draft genome assembly for C. dermatis NICC30027 comprised of 37 scaffolds, with 60.15% GC content. Within this genome, we identified 524 tRNAs, 142 sRNAs, 53 miRNAs, 28 snRNAs, and eight rRNA clusters. Moreover, repeat sequences totaling 1,032,129 bp in length were identified (2.63% of the genome), as were 14,238 unigenes that were 1,789.35 bp in length on average (64.82% of the genome). The NCBI non-redundant protein sequences (NR) database was employed to successfully annotate 11,795 of these unigenes, while 3,621 and 11,902 were annotated with the Swiss-Prot and TrEMBL databases, respectively. Unigenes were additionally subjected to pathway enrichment analyses using the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Cluster of Orthologous Groups of proteins (COG), Clusters of orthologous groups for eukaryotic complete genomes (KOG), and Non-supervised Orthologous Groups (eggNOG) databases. Together, these results provide a foundation for future studies aimed at clarifying the mechanistic basis for the ability of C. dermatis NICC30027 to simultaneously utilize glucose and xylose to synthesize lipids.

Quantitative Expression Analysis of Functional Genes in Four Dog Breeds (개의 네 품종에서 기능 유전자들에 대한 정량적 발현 분석)

  • Gim, Jeong-An;Kim, Sang-Hoon;Lee, Hee-Eun;Jeong, Hoim;Nam, Gyu-Hwi;Kim, Min Kyu;Huh, Jae-Won;Choi, Bong-Hwan;Kim, Heui-Soo
    • Journal of Life Science / v.25 no.8 / pp.861-869 / 2015
  • The dog, one of the domesticated species, has been selectively bred by humans for various purposes. There are many dog breeds, artificially selected for specific behaviors and morphologies. Dogs serve humans as working dogs for guiding, rescue, detection, and other tasks. Working dogs require a good temperament, such as gentleness, robustness, and patience, to perform their special duties. Many studies have concentrated on finding genetic markers for selecting high-quality working dogs. In this study, we confirmed the quantitative expression patterns of eight genes: ABAT (4-Aminobutyrate Aminotransferase), PLCB1 (Phospholipase C, Beta 1), SLC10A4 (Solute Carrier Family 10, Member 4), WNT1 (Wingless-Type MMTV Integration Site Family, Member 1), BARX2 (BarH-Like Homeobox 2), NEUROD6 (Neuronal Differentiation 6), SEPT9 (Septin 9), and TBR1 (T-Box, Brain, 1). Expression was measured in brain tissues from four dog breeds (Beagle, Sapsaree, Shepherd and Jindo), because these genes are expressed and function mostly in the brain. In particular, the BARX2, SEPT9, SLC10A4, TBR1 and WNT1 genes were highly expressed in Beagle and Jindo, whereas Sapsaree and German Shepherd showed the opposite pattern. The biological significance of all the genes was assessed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) to determine their gene ontology (GO) classes. From these analyses, we suppose that these eight genes could provide useful information on brain development and the intelligence of organisms. Taken together, these results could provide clues for discovering biomarkers related to functional traits in the brain and could be beneficial for selecting superior working dogs.