• Title/Summary/Keyword: Data annotation

Functional Annotation and Analysis of Korean Patented Biological Sequences Using Bioinformatics

  • Lee, Byung Wook;Kim, Tae Hyung;Kim, Seon Kyu;Kim, Sang Soo;Ryu, Gee Chan;Bhak, Jong
    • Molecules and Cells / v.21 no.2 / pp.269-275 / 2006
  • A recent report of the Korean Intellectual Property Office (KIPO) showed that the number of biological sequence-based patents is rapidly increasing in Korea. We present the biological features of Korean patented sequences through bioinformatic analysis. The analysis is divided into two steps. The first is an annotation step, in which the patented sequences were annotated with the Reference Sequence (RefSeq) database. The second is an association step, in which the patented sequences were linked to genes, diseases, pathways, and biological functions. We used the Entrez Gene, Online Mendelian Inheritance in Man (OMIM), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases. Through the association analysis, we found that nearly 2.6% of human genes were associated with Korean patents, compared with 20% of human genes in U.S. patents. The association between biological functions and patented sequences indicated that genes whose products act as hormones involved in defense responses in the extracellular environment were the most highly targeted for patenting. The analysis data are available at http://www.patome.net.
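The two-step analysis can be pictured as two chained lookups: patented sequences are mapped to genes through their best RefSeq hits, and the genes are then linked to functional terms. The Python below is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes the similarity search against RefSeq has already been run. The functions `annotate`, `associate` and `patented_gene_percent` are hypothetical, and so is every identifier in the toy data.

```python
from typing import Dict, Set

# Hypothetical sketch of the annotation and association steps; all
# identifiers are placeholders, not data from the patent study.


def annotate(best_refseq_hit: Dict[str, str],
             refseq_to_gene: Dict[str, str]) -> Dict[str, str]:
    """Annotation step: map each patented sequence to a gene via its best
    RefSeq hit; sequences whose hit has no gene mapping are dropped."""
    return {seq: refseq_to_gene[acc]
            for seq, acc in best_refseq_hit.items()
            if acc in refseq_to_gene}


def associate(seq_to_gene: Dict[str, str],
              gene_to_terms: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Association step: link each annotated gene to its functional terms
    (for example GO terms, KEGG pathways or OMIM diseases)."""
    links: Dict[str, Set[str]] = {}
    for gene in seq_to_gene.values():
        links.setdefault(gene, set()).update(gene_to_terms.get(gene, set()))
    return links


def patented_gene_percent(seq_to_gene: Dict[str, str], total_genes: int) -> float:
    """Percentage of all genes hit by at least one patented sequence."""
    return 100.0 * len(set(seq_to_gene.values())) / total_genes


# Toy example with made-up identifiers.
seq_to_gene = annotate(
    {"pat1": "RS_A", "pat2": "RS_B", "pat3": "RS_A", "pat4": "RS_UNMAPPED"},
    {"RS_A": "GENE_X", "RS_B": "GENE_Y"},
)
links = associate(seq_to_gene, {"GENE_X": {"hormone activity"}})
print(seq_to_gene)                               # {'pat1': 'GENE_X', 'pat2': 'GENE_Y', 'pat3': 'GENE_X'}
print(patented_gene_percent(seq_to_gene, 100))  # 2.0
```

Genes are counted once even when several patented sequences hit them. That deduplication is what a figure like "2.6% of human genes" implies.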

Chromosome-Centric Human Proteome Study of Chromosome 11 Team

  • Hwang, Heeyoun;Kim, Jin Young;Yoo, Jong Shin
    • Mass Spectrometry Letters / v.12 no.3 / pp.60-65 / 2021
  • As part of the Chromosome-centric Human Proteome Project (C-HPP), we have developed a few algorithms for the accurate identification of missing proteins, alternative splicing variants (ASVs), and single amino acid variants (SAAVs), and for the characterization of functionally unannotated proteins. Using LC-MS/MS data from human brain and olfactory epithelial tissue, we found missing proteins, novel and known ASVs, and SAAVs, and validated their existence using synthetic peptides. According to the neXtProt database, the number of missing proteins in chromosome 11 shows a decreasing trend. Meanwhile, advances in genomic and transcriptomic sequencing have greatly increased the number of known protein variants in chromosome 11. We developed a web solution named SAAvpedia for the identification and functional annotation of SAAVs, and the SAAV information is automatically transferred to the neXtProt web page through a REST API service. Among the 73 uPE1 proteins in chromosome 11, we studied the functional annotation of CCDC90B (NX_Q9GZT6), SMAP (NX_O00193), and C11orf52 (NX_Q96A22).
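To detect a SAAV in LC-MS/MS data, a search engine needs the variant peptide that carries the substituted residue. The Python below is a generic sketch of that idea. It is not the C-HPP team's algorithm and not part of SAAvpedia. It uses a simplified trypsin rule (cleave after K or R unless followed by P) and ignores missed cleavages. The helper names and the toy protein sequence are hypothetical.

```python
from typing import List, Tuple


def apply_saav(protein: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one residue; pos is 1-based (as in p.L6F-style notation)."""
    if not 1 <= pos <= len(protein):
        raise ValueError("position lies outside the protein")
    if protein[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {protein[pos - 1]}")
    return protein[:pos - 1] + alt + protein[pos:]


def tryptic_digest(protein: str) -> List[Tuple[int, str]]:
    """Simplified trypsin rule: cleave after K or R unless the next residue is P.
    Returns (0-based start, peptide) pairs; missed cleavages are ignored."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append((start, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        peptides.append((start, protein[start:]))
    return peptides


def variant_peptide(protein: str, pos: int, ref: str, alt: str) -> str:
    """Digest the variant protein and return the peptide covering the SAAV."""
    variant = apply_saav(protein, pos, ref, alt)
    for start, peptide in tryptic_digest(variant):
        if start <= pos - 1 < start + len(peptide):
            return peptide
    raise ValueError("position lies outside the protein")


# Toy sequence, not a real chromosome 11 protein.
print(tryptic_digest("MSAKTLLVERPGAK"))                # [(0, 'MSAK'), (4, 'TLLVERPGAK')]
print(variant_peptide("MSAKTLLVERPGAK", 6, "L", "F"))  # TFLVERPGAK
```

The digest runs on the variant sequence, not the reference. A substitution that creates or removes a K/R cleavage site therefore changes the peptide boundaries correctly.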

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics / v.17 no.2 / pp.17.1-17.3 / 2019
  • Text mining has become an important research method in biology; its original purpose was to extract biological entities, such as genes, proteins, and phenotypic traits, from scientific papers in order to extend knowledge. However, few thorough studies of text mining and application development have been performed for plant molecular biology data, especially for rice, resulting in a lack of datasets for named-entity recognition tasks in this species. Because few benchmarks are available for rice, we faced various difficulties in applying advanced machine learning methods to the accurate analysis of the rice literature. To evaluate several approaches for automatically extracting gene/protein entities, we built a new dataset for rice as a benchmark. This dataset consists of titles and abstracts of scientific papers focusing on the rice species, downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task on rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using this dataset, to facilitate open comparison and evaluation of different approaches to the task.
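A benchmark like this is typically scored against simple baselines. The Python below is a minimal sketch of the simplest one, a dictionary lookup that tags gene/protein names with character offsets. The output is modelled on the PubAnnotation JSON layout (`text`, plus `denotations` carrying `span.begin`, `span.end` and `obj`). This is not the OryzaGP authors' method. The lexicon and the sentence are illustrative and not taken from the dataset.

```python
import json
import re
from typing import Dict


def tag_entities(text: str, lexicon: Dict[str, str]) -> Dict:
    """Dictionary-lookup baseline: find lexicon names in text and return a
    PubAnnotation-style document with character-offset denotations."""
    # Longer names first, so the regex alternation prefers the longest match.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    denotations = [
        {"id": f"T{n}",
         "span": {"begin": m.start(), "end": m.end()},
         "obj": lexicon[m.group(0)]}
        for n, m in enumerate(pattern.finditer(text), start=1)
    ]
    return {"text": text, "denotations": denotations}


# Illustrative lexicon and sentence, not drawn from the OryzaGP dataset.
doc = tag_entities("We examined Hd1 and OsMADS1 in rice.",
                   {"Hd1": "Gene", "OsMADS1": "Gene"})
print(json.dumps(doc, indent=2))
```

A lookup like this misses unseen names and spelling variants. That gap is exactly what a shared benchmark lets machine-learning approaches be measured against.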

High quality genome sequence of Treponema phagedenis KS1 isolated from bovine digital dermatitis

  • Espiritu, Hector M.;Mamuad, Lovelia L.;Jin, Su-jeong;Kim, Seon-ho;Lee, Sang-suk;Cho, Yong-il
    • Journal of Animal Science and Technology / v.62 no.6 / pp.948-951 / 2020
  • Treponema phagedenis KS1, a fastidious anaerobe, was isolated from a dairy cow infected with bovine digital dermatitis (BDD) in Chungnam, Korea. Initial data indicated that T. phagedenis KS1 exhibited putatively virulent phenotypic characteristics. This study reports the whole-genome assembly and annotation of T. phagedenis KS1 (KCTC14157BP) to assist in identifying putative pathogenicity-related factors. The whole genome of T. phagedenis KS1 was sequenced using the PacBio RSII and Illumina HiSeq X Ten platforms. The assembled T. phagedenis KS1 genome comprises 16 contigs with a total size of 3,769,422 bp and an overall guanine-cytosine (GC) content of 40.03%. Annotation revealed 3,460 protein-coding genes, as well as 49 transfer RNA-coding and 6 ribosomal RNA-coding genes. The results of this study provide insight into the pathogenicity of T. phagedenis KS1.
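The assembly figures reported here (contig count, total size, GC content) are simple summaries over the contig sequences. The Python below sketches how such numbers are computed from a FASTA file. It is an illustration with toy contigs, not the authors' tooling. One assumption matters: GC content is taken over unambiguous A/C/G/T bases only, and assembly tools may treat N bases differently.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA reader: header ID -> upper-cased sequence."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def assembly_stats(contigs: Dict[str, str]) -> Dict[str, float]:
    """Contig count, total length, and GC content over A/C/G/T bases."""
    total = sum(len(seq) for seq in contigs.values())
    gc = sum(seq.count("G") + seq.count("C") for seq in contigs.values())
    acgt = sum(seq.count(b) for seq in contigs.values() for b in "ACGT")
    return {"contigs": len(contigs), "total_bp": total,
            "gc_percent": 100.0 * gc / acgt if acgt else 0.0}


# Two toy contigs, not KS1 data.
toy = ">contig_1\nATGC\n>contig_2\nGGAT\nTA\n"
print(assembly_stats(parse_fasta(toy)))  # {'contigs': 2, 'total_bp': 10, 'gc_percent': 40.0}
```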

Implementation of Annotation-Based and Content-Based Image Retrieval System using Edge Feature Information of Images (영상의 에지 특징정보를 이용한 주석기반 및 내용기반 영상 검색 시스템의 구현)

  • Lee, Tae-Dong;Kim, Min-Koo
    • Journal of KIISE:Computing Practices and Letters / v.7 no.5 / pp.510-521 / 2001
  • An image retrieval system should be designed to search images quickly and efficiently by extracting accurate feature information from images whose characteristics are increasingly large and complex. Image databases differ fundamentally from traditional databases, and these differences raise new issues in image search and data modeling, calling for new methods of database generation and efficient image retrieval. In this paper, to extract the edge feature information used for search from an input image, we extracted edges by convolving the input image with a Laplacian mask. We then implemented an annotation-based and content-based image retrieval system that searches images quickly and efficiently by generating an image database from the extracted edge feature information and metadata. The system improves the performance of image content retrieval because it uses an image index composed of content-based edge feature information, representing the low level of the image, and annotation-based edge feature information, representing the high level of the image. In conclusion, because the proposed method constructs the image database from metadata, the proposed system enables accurate management of accumulated image-content information as well as the sharing and reuse of images.
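The edge-extraction step convolves the input image with a Laplacian mask. The Python below is a minimal, dependency-free sketch of that operation on a grayscale image stored as nested lists. The abstract does not say which mask the authors used, so the common 4-neighbour Laplacian is an assumption. Marking edges by thresholding the response magnitude is also a simplification; zero-crossing detection is another common choice.

```python
from typing import List

Image = List[List[int]]

# 4-neighbour Laplacian mask (an assumption; the paper does not say which mask).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def convolve3x3(image: Image, kernel: Image) -> Image:
    """2-D convolution with a 3x3 kernel; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[dy + 1][dx + 1] * image[y - dy][x - dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out


def edge_map(image: Image, threshold: int = 5) -> Image:
    """Mark pixels whose Laplacian response magnitude exceeds the threshold."""
    response = convolve3x3(image, LAPLACIAN)
    return [[1 if abs(v) > threshold else 0 for v in row] for row in response]


# Toy 5x5 image: dark left half, bright right half (a vertical step edge).
img = [[0, 0, 10, 10, 10] for _ in range(5)]
print(convolve3x3(img, LAPLACIAN)[2])  # [0, 10, -10, 0, 0]
print(edge_map(img)[2])                # [0, 1, 1, 0, 0]
```

The response flips sign across the step. This is why the Laplacian marks both sides of an edge, and why some systems look for zero crossings instead.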

Video Event Detection according to Generating of Semantic Unit based on Moving Object (객체 움직임의 의미적 단위 생성을 통한 비디오 이벤트 검출)

  • Shin, Ju-Hyun;Baek, Sun-Kyoung;Kim, Pan-Koo
    • Journal of Korea Multimedia Society / v.11 no.2 / pp.143-152 / 2008
  • Many researchers are currently studying methods of event expression for the semantic retrieval of video data. However, most approaches still rely on annotation-based retrieval, in which each data item is annotated, or on content-based retrieval using low-level features. We therefore propose a method that creates motion units and extracts events through these units, enabling more semantic retrieval than existing methods. First, we classify motions by event unit. Second, we define semantic units for the classified object motions. To use these units for event extraction, we create rules that match low-level features, allowing semantic events to be retrieved at the level of video shots. To evaluate the method's usefulness, we conducted an experiment extracting semantic events from video and obtained a precision of approximately 80%.
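The pipeline can be illustrated in three steps. Low-level motion (an object's per-frame displacement) is quantised into motion units, rules map unit sequences to semantic events, and precision is computed over the detected events. The Python below is a hypothetical sketch of that idea. The unit labels (`RIGHT`, `STOP`, ...), the stillness threshold, the rule table and the event name are all invented for illustration and are not the authors' definitions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def motion_unit(dx: float, dy: float, still: float = 1.0) -> str:
    """Quantise one displacement into a coarse motion unit (hypothetical labels)."""
    if math.hypot(dx, dy) < still:
        return "STOP"
    if abs(dx) >= abs(dy):
        return "RIGHT" if dx > 0 else "LEFT"
    return "DOWN" if dy > 0 else "UP"  # image y axis points downward


def to_units(track: List[Point]) -> List[str]:
    """Turn per-frame object centroids into a list of units, merging repeats."""
    units: List[str] = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        u = motion_unit(x1 - x0, y1 - y0)
        if not units or units[-1] != u:
            units.append(u)
    return units


def detect_events(units: List[str],
                  rules: Dict[Tuple[str, ...], str]) -> List[Tuple[int, str]]:
    """Report (unit index, event) wherever a rule's unit pattern occurs."""
    hits = []
    for pattern, event in rules.items():
        for i in range(len(units) - len(pattern) + 1):
            if tuple(units[i:i + len(pattern)]) == pattern:
                hits.append((i, event))
    return sorted(hits)


def precision(detected: List[str], relevant: List[str]) -> float:
    """Fraction of detected events that are correct."""
    return len(set(detected) & set(relevant)) / len(set(detected))


track = [(0, 0), (5, 0), (10, 0), (10.2, 0), (10.2, 0.1)]
units = to_units(track)
print(units)                                                        # ['RIGHT', 'STOP']
print(detect_events(units, {("RIGHT", "STOP"): "move-then-stop"}))  # [(0, 'move-then-stop')]
print(precision(["e1", "e2", "e3", "e4", "e5"], ["e1", "e2", "e3", "e4"]))  # 0.8
```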

Implementation of a Video Retrieval System Using Annotation and Comparison Area Learning of Key-Frames (키 프레임의 주석과 비교 영역 학습을 이용한 비디오 검색 시스템의 구현)

  • Lee, Keun-Wang;Kim, Hee-Sook;Lee, Jong-Hee
    • Journal of Korea Multimedia Society / v.8 no.2 / pp.269-278 / 2005
  • To process video data effectively, the content information of the video data must be loaded into a database, and a semantics-based retrieval method must be available for the various queries of users. In this paper, we propose a video retrieval system that supports semantic retrieval of massive video data for various users, using the user's keywords and comparison-area learning based on an automatic agent. Based on the user's initial query and the user's selection among the key frames extracted for that query, the agent adds detailed annotations to the extracted key frames. In addition, the key frame selected by the user serves as a query image, and the most similar key frames are found through color histogram comparison and the proposed comparison-area learning method. In experiments, the designed and implemented system achieved a precision of more than 93%.
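The color-histogram comparison step can be sketched as follows. Each key frame is reduced to a normalised RGB histogram, and the frame with the highest similarity to the query image is returned. The abstract does not name the similarity measure, so this sketch assumes histogram intersection. The Python is illustrative only: the comparison-area learning component is not modelled, and the frames and bin count are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]


def color_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Normalised joint RGB histogram with `bins` levels per channel."""
    step = 256 // bins
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        ri, gi, bi = (min(c // step, bins - 1) for c in (r, g, b))
        hist[(ri * bins + gi) * bins + bi] += 1
    return [c / len(pixels) for c in hist]


def intersection(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection: 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(query: List[Pixel], key_frames: Dict[str, List[Pixel]]) -> str:
    """Return the name of the key frame whose histogram best matches the query."""
    q = color_histogram(query)
    return max(key_frames,
               key=lambda k: intersection(q, color_histogram(key_frames[k])))


# Toy "frames" given as flat pixel lists.
query = [(250, 0, 0)] * 8 + [(0, 0, 250)] * 2
frames = {"red_frame": [(240, 10, 10)] * 10, "blue_frame": [(0, 0, 240)] * 10}
print(most_similar(query, frames))  # red_frame
```

Histograms ignore spatial layout. Restricting the comparison to a selected region of the frame, which appears to be the role of the paper's comparison area, is one way to recover some of that information.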

EST Knowledge Integrated Systems (EKIS): An Integrated Database of EST Information for Research Application

  • Kim, Dae-Won;Jung, Tae-Sung;Choi, Young-Sang;Nam, Seong-Hyeuk;Kwon, Hyuk-Ryul;Kim, Dong-Wook;Choi, Han-Suk;Choi, Sang-Heang;Park, Hong-Seog
    • Genomics & Informatics / v.7 no.1 / pp.38-40 / 2009
  • The EST Knowledge Integrated System, EKIS (http://ekis.kribb.re.kr), was established as part of the Korean Ministry of Education, Science and Technology's initiative for genome sequencing and application research of biological model organisms (the GEAR project). The goals of EKIS are to collect EST information from GEAR projects and to build an integrated database that provides transcriptomic and metabolomic information for biological scientists. EKIS consists of five independent categories, with several retrieval systems in each category, to incorporate massive EST data from the high-throughput sequencing of 65 different species. Through the EKIS database, scientists can freely access information including BLAST-based functional annotation as well as GeneChip and KEGG pathway information. By integrating complex data into a framework of existing EST knowledge, EKIS provides new insights into specialized metabolic pathway information for applied industrial materials.

A Scheme for News Videos based on MPEG-7 and Its Summarization Mechanism by using the Key-Frames of Selected Shot Types (MPEG-7을 기반으로 한 뉴스 동영상 스키마 및 샷 종류별 키프레임을 이용한 요약 생성 방법)

  • Jeong, Jin-Guk;Sim, Jin-Sun;Nang, Jong-Ho;Kim, Gyung-Su;Ha, Myung-Hwan;Jung, Byung-Heei
    • Journal of KIISE:Computing Practices and Letters / v.8 no.5 / pp.530-539 / 2002
  • Recently, there has been much research on developing archive systems for news videos, which usually have a fixed structure. However, because previously proposed archive systems use different metadata representation and storage schemes for news video, exchanging this metadata between them has been very difficult. This paper proposes a scheme for news video based on MPEG-7 MDS, an international standard for representing multimedia content, and a summarization mechanism that reflects the characteristics of the shots in news videos. The proposed scheme uses MPEG-7 MDS description schemes such as VideoSegment and TextAnnotation to preserve the original structure of news video, and the proposed summarization mechanism uses a slide-show-style presentation of key frames with associated audio to reduce the data size of the summary video.
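The Python below sketches how shots could be written out as MPEG-7-style VideoSegment elements carrying a TextAnnotation. It also shows a slide-show summary that keeps key frames of selected shot types. This is a simplified illustration built with the standard library's ElementTree. It omits the MPEG-7 namespace and `xsi:type` attributes, so the output has not been validated against the MPEG-7 schema. The `Shot` fields, the shot-type labels and the choice of which types to keep are hypothetical, because the abstract does not list them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Shot:
    start: str      # MPEG-7 style time point, e.g. "T00:00:00"
    duration: str   # MPEG-7 style duration, e.g. "PT15S"
    shot_type: str  # hypothetical labels such as "anchor" or "report"
    key_frame: str  # hypothetical key-frame image file
    caption: str


def build_description(shots: List[Shot]) -> ET.Element:
    """Simplified MPEG-7-like tree: namespaces and xsi:type attributes omitted."""
    root = ET.Element("Mpeg7")
    content = ET.SubElement(ET.SubElement(root, "Description"), "MultimediaContent")
    decomposition = ET.SubElement(ET.SubElement(content, "Video"), "TemporalDecomposition")
    for i, shot in enumerate(shots, start=1):
        seg = ET.SubElement(decomposition, "VideoSegment", id=f"shot{i}")
        note = ET.SubElement(seg, "TextAnnotation")
        ET.SubElement(note, "FreeTextAnnotation").text = shot.caption
        media_time = ET.SubElement(seg, "MediaTime")
        ET.SubElement(media_time, "MediaTimePoint").text = shot.start
        ET.SubElement(media_time, "MediaDuration").text = shot.duration
    return root


def summarize(shots: List[Shot], keep_types: Set[str]) -> List[str]:
    """Slide-show summary: key frames of the shot types chosen for the summary."""
    return [s.key_frame for s in shots if s.shot_type in keep_types]


shots = [
    Shot("T00:00:00", "PT15S", "anchor", "kf_001.jpg", "Anchor introduces the story"),
    Shot("T00:00:15", "PT40S", "report", "kf_002.jpg", "Field report"),
]
root = build_description(shots)
print(ET.tostring(root, encoding="unicode"))
print(summarize(shots, {"anchor"}))  # ['kf_001.jpg']
```

A standard description vocabulary is what makes the metadata exchangeable between archive systems. Even in this simplified form, any consumer that knows the element names can walk the segments.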

XML Based Meta-data Specification for Industrial Speech Databases (산업용 음성 DB를 위한 XML 기반 메타데이터)

  • Joo, Young-Hee;Hong, Ki-Hyung
    • MALSORI / v.55 / pp.77-91 / 2005
  • In this paper, we propose an XML-based metadata specification for industrial speech databases. Building speech databases is very time-consuming and expensive. Recently, with government support, a huge amount of speech corpora has been collected as speech databases. However, the formats and metadata of speech databases differ depending on the institutions that construct them. To improve the reusability and portability of speech databases, a standard representation scheme should be adopted by all institutions that construct speech databases. ETRI proposed an XML-based annotation scheme [5] for speech databases, but the scheme has an overly simple and flat modeling structure and may cause duplicated information. To overcome these disadvantages of the previous scheme, we first define speech databases more formally and then identify the objects appearing in them. We then design a data model for speech databases in an object-oriented way. Based on this data model, we develop the metadata specification for industrial speech databases.
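The contrast between a flat scheme and an object-oriented model shows up in how shared information is stored. In the sketch below, each speaker is described once and utterances refer to it by ID, rather than every utterance repeating the speaker's details. The Python is a hypothetical illustration of that design idea, not the specification proposed in the paper. The class names, fields and XML element names (`SpeechDatabase`, `Speaker`, `Utterance`) are invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Speaker:
    speaker_id: str
    gender: str
    age: int


@dataclass
class Utterance:
    audio_file: str
    transcript: str
    speaker_id: str  # reference to a Speaker object instead of a copy of its fields


@dataclass
class SpeechDatabase:
    name: str
    speakers: List[Speaker] = field(default_factory=list)
    utterances: List[Utterance] = field(default_factory=list)

    def to_xml(self) -> ET.Element:
        """Serialise each object once; utterances point to speakers by ID."""
        root = ET.Element("SpeechDatabase", name=self.name)
        speakers_el = ET.SubElement(root, "Speakers")
        for s in self.speakers:
            ET.SubElement(speakers_el, "Speaker", id=s.speaker_id,
                          gender=s.gender, age=str(s.age))
        utterances_el = ET.SubElement(root, "Utterances")
        for u in self.utterances:
            el = ET.SubElement(utterances_el, "Utterance",
                               file=u.audio_file, speaker=u.speaker_id)
            el.text = u.transcript
        return root


# Toy database with one speaker and two utterances.
db = SpeechDatabase("toy_db", [Speaker("S001", "F", 34)],
                    [Utterance("s001_0001.wav", "hello", "S001"),
                     Utterance("s001_0002.wav", "thank you", "S001")])
print(ET.tostring(db.to_xml(), encoding="unicode"))
```

Because speaker attributes live in one place, correcting a speaker's metadata touches a single element. A flat per-utterance layout would need every copy updated.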
