• Title/Summary/Keyword: Annotation tool

Genome Sequencing and Genome-Wide Identification of Carbohydrate-Active Enzymes (CAZymes) in the White Rot Fungus Flammulina fennae

  • Lee, Chang-Soo;Kong, Won-Sik;Park, Young-Jin
    • Microbiology and Biotechnology Letters / v.46 no.3 / pp.300-312 / 2018
  • Whole-genome sequencing of the wood-rotting fungus Flammulina fennae was carried out to identify carbohydrate-active enzymes (CAZymes). De novo assembly (k-mer size 31) of short reads from next-generation sequencing yielded a total genome length of 32,423,623 base pairs (39% GC). A total of 11,591 gene models were predicted in the assembled F. fennae genome by ab initio gene prediction with the AUGUSTUS tool. In a genome-wide comparison, 6,715 orthologous groups shared at least one gene with F. fennae, and 10,667 (92%) of the 11,591 F. fennae genes had orthologs among the Dikarya. Additionally, F. fennae contained 23 species-specific genes, of which 16 were paralogous. CAZyme identification and annotation revealed 513 CAZymes in the F. fennae genome, including 82 auxiliary activities, 220 glycoside hydrolases, 85 glycosyltransferases, 20 polysaccharide lyases, 57 carbohydrate esterases, and 45 carbohydrate-binding modules. The genome information of F. fennae improves the understanding of this basidiomycete fungus. The CAZyme gene information will be useful for detailed studies of lignocellulosic biomass degradation in biotechnological and industrial applications.

Annotation Tool for Construction Korean PropBank and Sejong Semantic Tagged Corpus (한국어 PropBank 및 세종 의미 표지 부착 말뭉치 구축을 위한 도구)

  • Han, Dae-Yong;Choi, Han-Gil;Lee, Jung-Kuk;Kim, Jong-Dae;Park, Chan-Young;Song, Hye-Jung;Kim, Yu-Seop
    • Annual Conference on Human and Language Technology / 2012.10a / pp.35-39 / 2012
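  • Semantically annotated corpora are essential for semantic role labeling, but such corpora for Korean remain underdeveloped compared with those for languages such as English and Chinese. In this paper, we developed a software tool for building the Korean Proposition Bank (hereafter PropBank) and the Sejong semantically tagged corpus for Korean semantic analysis. The tool uses the dependency relations among sentence constituents to find the arguments of a given predicate, and it draws on the PropBank frame files and the Sejong predicate case-frame dictionary so that users can build the Korean PropBank and the Sejong semantically tagged corpus efficiently.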
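
The abstract above describes finding the arguments of a given predicate from dependency relations. A minimal sketch of that idea follows, assuming a sentence is stored as tokens with head indices. The `Token` class, the label names, and the toy sentence are hypothetical illustrations, not the tool's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    form: str    # surface form
    head: int    # index of the governing token (0 means root)
    deprel: str  # dependency label (hypothetical label set)


def candidate_arguments(tokens, predicate_idx):
    """Return the tokens that depend directly on the predicate."""
    return [t for t in tokens if t.head == predicate_idx]


# Toy sentence "Mina read a book"; the analysis is illustrative only.
sentence = [
    Token(1, "Mina", 2, "nsubj"),
    Token(2, "read", 0, "root"),
    Token(3, "a", 4, "det"),
    Token(4, "book", 2, "obj"),
]

args = candidate_arguments(sentence, predicate_idx=2)
print([(t.form, t.deprel) for t in args])
```

In a real annotation tool, an annotator would then assign a semantic role to each candidate by consulting a frame file or case-frame dictionary. This sketch stops at candidate selection.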

Development of KHapmap Browser using DAS for Korean HapMap Research

  • Jin, Hoon;Kim, Seung-Ho;Kim, Young-Uk;Park, Young-Kyu;Ji, Mi-Hyun;Kim, Young-Joo
    • Genomics & Informatics / v.6 no.2 / pp.57-63 / 2008
  • The Korean HapMap Project has been under way for five years, since it started in June 2003. The project generated data for a total of 1,764,000 Korean SNPs and formally registered the data in the dbSNP of NCBI (The dbSNP website. 2008). We have developed a series of software programs for association studies and for comparing and analyzing Korean HapMap data against four other populations (CEPH, Yoruba, Han Chinese, and Japanese). The KHapmap Browser was developed and integrated to provide haplotype retrieval and comparative study tools across human ethnicities for comprehensive disease association studies (http://www.khapmap.org). On that basis, GBrowse was adopted in the KHapmap Browser for Korea-specific genetic data, and the browser commits to providing extended services through the distributed sequence annotation system (DAS). Dynamic linking from the KHapmap Browser to other tools in our intranet environment provides many functions beyond those of GBrowse without DAS. The KHapmap Browser is expected to be an invaluable tool for the study of Korean and international HapMap data.

PPeditor: A Corpus Annotation Tool for Korean Dependency Structures (PPeditor: 한국어 의존구조 말뭉치 구축 도구)

  • Park, Eun-Jin;Kim, Jae-Hoon;Kim, Kang-Min;Kim, Chang-Hyun
    • Proceedings of the Korea Information Processing Society Conference / 2005.05a / pp.741-744 / 2005
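  • Developing an effective language processing system requires a large corpus annotated with linguistic information. Building such a corpus, however, takes considerable time and effort, and corpus annotation tools are generally used to reduce both. In this paper, we designed and implemented a tool for building a Korean dependency-structure corpus. The tool has several features: 1) it can be used broadly, regardless of the specific application domain; 2) it links analysis stages with analysis errors, which helps annotators stay focused; 3) it keeps errors from accumulating as far as possible, which greatly improved the quality of the resulting corpus; 4) annotated information can be shared among annotators, which maximizes consistency; 5) the interface was designed so that even novice users can use the tool easily. Using the tool, eight researchers built a dependency-structure corpus of 10,000 sentences in about two months (averaging four hours a day). The corpus is annotated with morpheme, chunking, and dependency-structure information.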

Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability

  • Jung, Yong;Seo, Hwa-Jeong;Park, Yu-Rang;Kim, Ji-Hun;Bien, Sang Jay;Kim, Ju-Han
    • Genomics & Informatics / v.9 no.1 / pp.19-27 / 2011
  • Gene Expression Omnibus (GEO) holds the largest collection of gene-expression microarray data, and that collection has grown exponentially. Microarray data in GEO have been generated in many different formats and often lack standardized annotation and documentation. It is hard to know whether preprocessing has been applied to a dataset and, if so, in what way. Standard-based integration of heterogeneous data formats and metadata is necessary for comprehensive data query, analysis, and mining. We attempted to integrate the heterogeneous microarray data in GEO based on the Minimum Information About a Microarray Experiment (MIAME) standard. We unified the data fields of the GEO Data table and mapped the attributes of GEO metadata to MIAME elements. We also distinguished non-preprocessed raw datasets from processed and other datasets using a two-step classification method. Most of the procedures were developed as semi-automated algorithms with some degree of text mining. We localized 2,967 Platforms, 4,867 Series, and 103,590 Samples covering 279 organisms, integrated them into a standard-based relational schema, and developed a comprehensive query interface for extracting them. Our tool, GEOQuest, is available at http://www.snubi.org/software/GEOQuest/.

A Genome-Wide Analysis of Antibiotic Producing Genes in Streptomyces globisporus SP6C4

  • Kim, Da-Ran;Kwak, Youn-Sig
    • The Plant Pathology Journal / v.37 no.4 / pp.389-395 / 2021
  • Soil is the major source of plant-associated microbes. Several fungal and bacterial species live within plant tissues. Actinomycetes are well known for producing a variety of antibiotics, and they contribute to improving plant health. In our previous report, Streptomyces globisporus SP6C4 colonized plant tissues and was able to move to other tissues from the initially colonized ones. This strain has excellent antifungal and antibacterial activities and provides a suppressive effect upon various plant diseases. Here, we report the genome-wide analysis of antibiotic producing genes in S. globisporus SP6C4. A total of 15 secondary metabolite biosynthetic gene clusters were predicted using antiSMASH. We used the CRISPR/Cas9 mutagenesis system, and each biosynthetic gene was predicted via protein basic local alignment search tool (BLAST) and rapid annotation using subsystems technology (RAST) server. Three gene clusters were shown to exhibit antifungal or antibacterial activity, viz. cluster 16 (lasso peptide), cluster 17 (thiopeptide-lantipeptide), and cluster 20 (lantipeptide). The results of the current study showed that SP6C4 has a variety of antimicrobial activities, and this strain is beneficial in agriculture.

Mask Region-Based Convolutional Neural Network (R-CNN) Based Image Segmentation of Rays in Softwoods

  • Hye-Ji, YOO;Ohkyung, KWON;Jeong-Wook, SEO
    • Journal of the Korean Wood Science and Technology / v.50 no.6 / pp.490-498 / 2022
  • This study evaluated how well artificial intelligence can segment rays in images of tangential thin sections of conifers. The model applied was the Mask region-based convolutional neural network (Mask R-CNN), and the softwoods selected for the study were Picea jezoensis, Larix gmelinii, Abies nephrolepis, Abies koreana, Ginkgo biloba, Taxus cuspidata, Cryptomeria japonica, Cedrus deodara, and Pinus koraiensis. For digital imaging, thin sections 10-15 µm thick were cut with a microtome and stained with a 1:1 mixture of 0.5% astra blue and 1% safranin. Rays were the detection targets, and the Computer Vision Annotation Tool was used to annotate rays in the training images taken from the tangential sections. Mask R-CNN detected rays with a mean average precision of 0.837 and saved more than half of the time required for ground-truth annotation. During image analysis, however, some rays were split into two or more fragments, which caused errors in the measured ray height. Improving the image processing algorithms will require further work on merging the fragments of a ray into a single segment and on increasing the precision of the boundary between rays and neighboring tissues.
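
Mean average precision for instance segmentation is built on the intersection over union (IoU) between a predicted mask and its ground-truth mask. The following is a minimal sketch of that IoU step only, assuming masks are given as nested lists of 0/1 values. It is not the study's evaluation code, and the toy masks are invented for illustration.

```python
def mask_iou(pred, truth):
    """Intersection over union of two same-sized binary masks."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks have no overlap to measure; report 0.0 by convention.
    return inter / union if union else 0.0


# Toy case: the prediction covers only the upper part of a ray.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

iou = mask_iou(pred, truth)
print(round(iou, 3))
```

A ray that the model splits into fragments yields several partial masks like `pred` above. Each fragment overlaps the full ground-truth ray only partly, which is one way the fragmentation described in the abstract can lower per-instance scores.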

Construction of Training Data and Model Training for YOLOv4-based Factory Operation Safety Management (YOLOv4 기반의 공장 근로자 안전관리를 위한 학습 데이터 구축과 모델 학습)

  • Lee, Taejun;Cho, Minwoo;Song, Jiho;Hwang, Chulhyun;Jung, Heokyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.252-254 / 2021
  • According to the Institute for Occupational Safety and Health, there were 109,242 industrial injuries in 2019, an increase of 6.8% from 2018. In response, the government and companies are discussing the development of ICT-based core technologies for preventing on-site safety accidents in the construction field. Technologies using computer vision and artificial intelligence have recently come into wide use in such fields. In this paper, we built training data for the safety management of factory workers and trained a YOLOv4-based model. We expect this work to serve as an initial study for predicting hazardous situations for factory workers.
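
Training data for YOLO-family detectors is commonly stored as one text line per object: a class index followed by the box centre, width, and height, all normalized by the image size. The sketch below converts a pixel-coordinate box to that layout. The abstract does not state which label format the authors used, so the box, image size, and class index 0 are assumptions made for illustration.

```python
def to_yolo(box, img_w, img_h):
    """Convert (x_min, y_min, x_max, y_max) in pixels to normalized
    (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height


# Hypothetical worker bounding box in a 1000 x 800 image.
label = to_yolo((100, 200, 300, 600), img_w=1000, img_h=800)

# Class index 0 is an assumed id for a "worker" class.
line = "0 " + " ".join(f"{v:.6f}" for v in label)
print(line)
```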

SOP (Search of Omics Pathway): A Web-based Tool for Visualization of KEGG Pathway Diagrams of Omics Data

  • Kim, Jun-Sub;Yeom, Hye-Jung;Kim, Seung-Jun;Kim, Ji-Hoon;Park, Hye-Won;Oh, Moon-Ju;Hwang, Seung-Yong
    • Molecular & Cellular Toxicology / v.3 no.3 / pp.208-213 / 2007
  • With the development and popularization of microarray technology, which makes it possible to investigate the expression patterns of thousands of genes simultaneously, toxicogenomics researchers can interpret genome-scale interactions among genes exposed to toxicants or toxicant-related environments. The primary goal of toxicogenomics is to identify the functional context among groups of genes that are differentially or similarly coexpressed under a specific toxic substance. Public reference databases with transcriptome, proteome, and biological pathway information are also needed to analyze these complex omics data. However, because these databases are heterogeneous and independent, it is hard to analyze large sets of omics annotations and their pathway information one database at a time. Several public database websites do link to one another, but they return unnecessary information along with the appropriate information. A systematically integrated database suited to experimenters' needs is therefore required. For these reasons, we propose the SOP (Search of Omics Pathway) database system, an integrated biological database that converts the heterogeneous features of public databases into a combined form. In addition, SOP offers user-friendly web interfaces that let users submit gene queries for the biological interpretation of gene lists derived from omics experiments. The SOP web interface outputs an omics annotation table and visualized pathway maps from the KEGG PATHWAY database. We believe that SOP will be a helpful tool for interpreting genes or proteins from omics experiments, for making new discoveries through pathway analysis, and for designing new hypotheses for subsequent toxicogenomics experiments.

Echocardiography Core Laboratory Validation of a Novel Vendor-Independent Web-Based Software for the Assessment of Left Ventricular Global Longitudinal Strain

  • Ernest Spitzer;Benjamin Camacho;Blaz Mrevlje;Hans-Jelle Brandendburg;Claire B. Ren
    • Journal of Cardiovascular Imaging / v.31 no.3 / pp.135-141 / 2023
  • BACKGROUND: Global longitudinal strain (GLS) is an accurate and reproducible parameter of left ventricular (LV) systolic function that has shown meaningful prognostic value. Fast, user-friendly, and accurate tools are required for its widespread implementation. We aimed to compare a novel web-based tool with two established algorithms for strain analysis and to test its reproducibility. METHODS: Thirty echocardiographic datasets with focused LV acquisitions were analyzed by two readers using three different semi-automated endocardial GLS algorithms. Analyses were repeated by one reader to assess intra-observer variability. CAAS Qardia (Pie Medical Imaging) was compared with 2DCPA and AutoLV (TomTec). RESULTS: Mean GLS values were -15.0 ± 3.5% from Qardia, -15.3 ± 4.0% from 2DCPA, and -15.2 ± 3.8% from AutoLV. Mean GLS values from Qardia and 2DCPA were not statistically different (p = 0.359), with a bias of -0.3%, limits of agreement (LOA) of 3.7%, and an intraclass correlation coefficient (ICC) of 0.88. Mean GLS values from Qardia and AutoLV were not statistically different (p = 0.637), with a bias of -0.2%, LOA of 3.4%, and an ICC of 0.89. The coefficient of variation (CV) for intra-observer variability was 4.4% for Qardia, 8.4% for 2DCPA, and 7.7% for AutoLV. The CV for inter-observer variability was 4.5%, 8.1%, and 8.0%, respectively. CONCLUSIONS: In echocardiographic datasets of good image quality analyzed at an independent core laboratory using a standardized annotation method, a novel web-based tool for GLS analysis showed consistent results when compared with two algorithms of an established platform. Moreover, inter- and intra-observer reproducibility results were excellent.
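
The bias and limits of agreement reported above are the usual outputs of a Bland-Altman comparison. The sketch below shows one common way to compute them, assuming the bias is the mean paired difference and the LOA half-width is 1.96 times the standard deviation of those differences. The abstract does not state its exact convention, and the GLS values here are made up for illustration, not taken from the study.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, half-width of the 95% limits of agreement)
    for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the differences
    return bias, half_width


# Illustrative GLS readings (%) from two hypothetical tools.
tool_a = [-15.2, -14.8, -16.1, -13.9, -15.5]
tool_b = [-15.0, -15.3, -15.8, -14.4, -15.1]

bias, loa = bland_altman(tool_a, tool_b)
print(f"bias = {bias:.2f}%, LOA = ±{loa:.2f}%")
```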