• Title/Summary/Keyword: Data annotation

A biomedically oriented automatically annotated Twitter COVID-19 dataset

  • Hernandez, Luis Alberto Robles;Callahan, Tiffany J.;Banda, Juan M.
    • Genomics & Informatics
    • /
    • v.19 no.3
    • /
    • pp.21.1-21.5
    • /
    • 2021
  • The use of social media data, such as Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time and to study the societal implications of interventions as well as the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by because of the high cost of manual annotation and the effort needed to identify the correct texts. When datasets are available, they are usually very small, and their annotations don't generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several spaCy-based annotation frameworks against a manually annotated gold-standard dataset. After selecting the best method for automatic annotation, we annotated 120 million tweets and released them publicly for future downstream use within the biomedical domain.
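The comparison of annotation frameworks against a gold standard described in this abstract can be illustrated with a minimal sketch. It assumes exact-match scoring over annotation tuples; the tuple layout, the framework names, and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

# An annotation is (tweet_id, start_offset, end_offset, concept_label).
# This tuple layout is an assumption made for illustration only.
Annotation = Tuple[str, int, int, str]


def precision_recall_f1(predicted: Set[Annotation],
                        gold: Set[Annotation]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 of predicted vs. gold annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold-standard annotations and outputs of two candidate frameworks.
gold: Set[Annotation] = {
    ("t1", 0, 5, "fever"),
    ("t1", 10, 15, "cough"),
    ("t2", 3, 11, "headache"),
}
framework_outputs: Dict[str, Set[Annotation]] = {
    "framework_a": {("t1", 0, 5, "fever"), ("t2", 3, 11, "headache")},
    "framework_b": {("t1", 0, 5, "fever"), ("t1", 20, 24, "pain")},
}

# Score every framework and keep the one with the highest F1.
scores = {name: precision_recall_f1(pred, gold)
          for name, pred in framework_outputs.items()}
best = max(scores, key=lambda name: scores[name][2])
print(best, scores[best])
```

Exact matching is the strictest choice; a relaxed variant that accepts overlapping spans would be a reasonable alternative when annotation boundaries are noisy.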

XPERANTO-TOX: an Integrated Toxicogenomics Knowledgebase

  • Woo Jung-Hoon;Kim Hyeoun-Eui;Kong Gu;Kim Ju-Han
    • Genomics & Informatics
    • /
    • v.4 no.1
    • /
    • pp.40-44
    • /
    • 2006
  • Toxicogenomics combines transcriptome, proteome, and metabolome profiling with conventional toxicology to investigate the interaction between biological molecules and toxicants or environmental stress in disease causation. Toxicogenomics faces the problem of comparing and integrating data from different sources. Because of the unusual characteristics of toxicogenomic data, researchers need support from data analysis and annotation to extract meaningful information. Existing repositories claim to serve as toxicogenomics databases, but they offer only limited capabilities for toxicogenomic research. To support toxicologists facing a flood of toxicogenomic data, we propose a novel toxicogenomics knowledgebase system, XPERANTO-TOX. XPERANTO-TOX is an integrated system for toxicogenomic data management and analysis. It is composed of three distinct but closely connected parts. First, the Data Storage System is a repository for many kinds of '-omics' data and conventional toxicology data. Second, the Data Analysis System consists of analytical modules for integrated toxicogenomics data. Finally, the Data Annotation System gives researchers extensive insight into the data.

Patome: Database of Patented Bio-sequences

  • Kim, SeonKyu;Lee, ByungWook
    • Genomics & Informatics
    • /
    • v.3 no.3
    • /
    • pp.94-97
    • /
    • 2005
  • We have built a database server called Patome, which contains annotation information for patented bio-sequences from the Korean Intellectual Property Office (KIPO). The aims of Patome are to annotate Korean patent bio-sequences and to provide information on the patent relationships of public database entries. The patent sequences were annotated with the Reference Sequence (RefSeq) database or NCBI's nr database. Both the raw patent data and the annotated data are stored in the database. The annotation information can be used to determine whether a particular RefSeq ID or NCBI nr ID is related to a Korean patent. The Patome infrastructure consists of three components: the database itself, a sequence data loader, and an online database query interface. The database can be queried by submission number, organism, title, applicant name, or accession number. Patome can be accessed at http://www.patome.net. The information will be updated every two months.

Design And Implementation of Video Retrieval System for Using Semantic-based Annotation (의미 기반 주석을 이용한 비디오 검색 시스템의 설계 및 구현)

  • 홍수열
    • Journal of the Korea Society of Computer and Information
    • /
    • v.5 no.3
    • /
    • pp.99-105
    • /
    • 2000
  • Video has become an important element of multimedia computing and communication environments, with applications as varied as broadcasting, education, publishing, and military intelligence. The need for efficient multimedia data retrieval methods keeps growing because of various large-scale multimedia applications. Accordingly, the retrieval and representation of video data has become one of the main research issues in video databases. There have been two main approaches to representing video data: (1) content-based video retrieval and (2) annotation-based video retrieval. This paper designs and implements a video retrieval system that uses semantic-based annotation.

Efficient Semi-automatic Annotation System based on Deep Learning

  • Hyunseok Lee;Hwa Hui Shin;Soohoon Maeng;Dae Gwan Kim;Hyojeong Moon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.18 no.6
    • /
    • pp.267-275
    • /
    • 2023
  • This paper presents the development of specialized software for annotating volumes of interest on 18F-FDG PET/CT images, with the goal of facilitating the study and diagnosis of head and neck cancer (HNC). To achieve an efficient annotation process, we employed an SE-Norm-Residual Layer-based U-Net model. This model showed outstanding proficiency in segmenting cancerous regions within 18F-FDG PET/CT scans of HNC cases. A manual annotation function was also integrated, allowing researchers and clinicians to validate and refine annotations based on dataset characteristics. The workspace displays a fusion of the PET and CT images, enhancing user convenience through simultaneous visualization. The performance of the deep learning model was validated using the HECKTOR 2021 dataset, and the semi-automatic annotation functions were developed subsequently. We began with image preprocessing, including resampling, normalization, and co-registration, followed by an evaluation of the deep learning model's performance. The model was then integrated into the software as an initial automatic segmentation step. Users can manually refine the pre-segmented regions to correct false positives and false negatives. Annotation images are saved along with their corresponding 18F-FDG PET/CT fusion images, enabling their use across various domains. In this study, we developed semi-automatic annotation software designed to generate annotated lesion images efficiently, with applications in HNC research and diagnosis. The findings indicated that this software surpasses conventional tools, particularly for HNC-specific annotation with 18F-FDG PET/CT data. Consequently, the developed software offers a robust solution for producing annotated datasets, driving advances in the study and diagnosis of HNC.
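The refinement step in this abstract, in which a user corrects false positives and false negatives of the automatic segmentation, can be summarized with a small sketch. It assumes masks are stored as sets of voxel indices and reports the Dice overlap as well; the representation, the function name `refinement_summary`, and the use of Dice are illustrative assumptions, not details given in the paper.

```python
from typing import Dict, Set, Tuple

# A voxel is identified by its (z, y, x) index; this convention is an assumption.
Voxel = Tuple[int, int, int]


def refinement_summary(auto_mask: Set[Voxel],
                       refined_mask: Set[Voxel]) -> Dict[str, float]:
    """Compare an automatic segmentation with its manually refined version."""
    false_positives = auto_mask - refined_mask  # voxels the user removed
    false_negatives = refined_mask - auto_mask  # voxels the user added
    overlap = len(auto_mask & refined_mask)
    total = len(auto_mask) + len(refined_mask)
    # Dice coefficient: 2 * |A ∩ B| / (|A| + |B|); defined as 1.0 for two empty masks.
    dice = 2 * overlap / total if total else 1.0
    return {
        "false_positives": len(false_positives),
        "false_negatives": len(false_negatives),
        "dice": dice,
    }


# Hypothetical masks: the user removed one voxel and added another.
auto = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
refined = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
summary = refinement_summary(auto, refined)
print(summary)
```

In practice the masks would be 3D arrays on the resampled, co-registered PET/CT grid; sets of indices keep the sketch dependency-free.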

Development of semi-automatic annotation tool for building land cover image data set (토지 관련 이미지 분석 데이터 셋 구축을 위한 반자동 annotation 도구 개발)

  • Jang, Dalwon;Lee, Jaewon;Lee, JongSeol
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2019.11a
    • /
    • pp.69-70
    • /
    • 2019
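  • This paper proposes a semi-automatic annotation tool for building the image dataset needed for research on classifying land information. The proposed tool takes synthetic aperture radar (SAR) images as input and was created to support the development of a system that distinguishes water, farmland, forest, and buildings, but it can also be used to develop land-related image analysis systems with other purposes. When a SAR image is supplied together with GPS information, the tool retrieves official land category information based on the GPS information and reorganizes it to generate a first-pass labeling result automatically. The nationally managed land category information is largely helpful for the classification criteria of the target system, but because it differs in some respects, a manual correction tool is then used to complete the annotation of the image data.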
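The two-stage workflow in this abstract (automatic first-pass labels from land category records, followed by manual correction) can be sketched as follows. The category names, the mapping table, and both function names are hypothetical; the GPS-based lookup of land category records is assumed to have already happened and is not shown.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from official land categories to the four target classes.
# The real category list and mapping used by the tool are not given in the abstract.
CATEGORY_TO_CLASS: Dict[str, str] = {
    "river": "water",
    "reservoir": "water",
    "paddy": "farmland",
    "dry field": "farmland",
    "forest": "forest",
    "building site": "building",
}


def first_pass_labels(parcel_categories: List[str]) -> List[Optional[str]]:
    """Automatically label each parcel; unmapped categories stay None."""
    return [CATEGORY_TO_CLASS.get(category) for category in parcel_categories]


def apply_manual_corrections(labels: List[Optional[str]],
                             corrections: Dict[int, str]) -> List[Optional[str]]:
    """Overwrite selected labels with values chosen by a human annotator."""
    corrected = list(labels)  # copy so the first-pass result is preserved
    for index, label in corrections.items():
        corrected[index] = label
    return corrected


# Categories as they might come back from a GPS-based lookup (illustrative values).
categories = ["paddy", "river", "road", "forest"]
auto_labels = first_pass_labels(categories)
final_labels = apply_manual_corrections(auto_labels, {2: "building"})
print(auto_labels)
print(final_labels)
```

Leaving unmapped categories as `None` makes the parcels that still need manual attention easy to find.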

Robust Syntactic Annotation of Corpora and Memory-Based Parsing

  • Hinrichs, Erhard W.
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2002.02a
    • /
    • pp.1-1
    • /
    • 2002
  • This talk provides an overview of current work in my research group on the syntactic annotation of the Tübingen corpus of spoken German and of the German Reference Corpus (Deutsches Referenzkorpus: DEREKO) of written texts. Morpho-syntactic and syntactic annotation, as well as annotation of function-argument structure, is performed automatically for these corpora by a hybrid architecture that combines robust symbolic parsing with finite-state methods ("chunk parsing" in the sense of Abney) with memory-based parsing (in the sense of Daelemans). The resulting robust annotations can be used by theoretical linguists, who are interested in large-scale empirical data, and by computational linguists, who need training material for a wide range of language technology applications. To aid retrieval of annotated trees from the treebank, a query tool, VIQTORYA, with a graphical user interface and a logic-based query language has been developed. VIQTORYA allows users to query the treebanks for linguistic structures at the word level, at the level of individual phrases, and at the clausal level.

Development of Video Data-base and a Video Annotation Tool for Evaluation of Smart CCTV System (지능형CCTV시스템 성능평가를 위한 영상DB와 영상 주석도구 개발)

  • Park, Jang-Sik;Yi, Seung-Jai
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.9 no.7
    • /
    • pp.739-745
    • /
    • 2014
  • This paper proposes an evaluation method for intelligent CCTV systems, together with recorded evaluation videos and a video DB. The evaluation videos are recorded separately for far, mid, and near zones. The video DB stores video recording information, detection areas, and ground truth in XML format. A video annotation tool is also proposed for creating ground truth efficiently. The tool writes the ground truth for each video and includes an evaluation function that compares system alarms with the ground truth.
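Comparing system alarms with XML ground truth, as this abstract describes, might look like the sketch below. The XML element and attribute names, the frame-based matching rule, and the function names are all assumptions for illustration; the actual schema of the video DB is not given in the abstract.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical ground-truth file: event type plus start and end frame numbers.
GROUND_TRUTH_XML = """
<video name="sample.avi" zone="near">
  <event type="intrusion" start="120" end="180"/>
  <event type="loitering" start="400" end="520"/>
</video>
"""

Event = Tuple[str, int, int]  # (type, start_frame, end_frame)
Alarm = Tuple[str, int]       # (type, frame at which the alarm fired)


def load_events(xml_text: str) -> List[Event]:
    """Parse ground-truth events from the assumed XML layout."""
    root = ET.fromstring(xml_text.strip())
    return [(e.get("type"), int(e.get("start")), int(e.get("end")))
            for e in root.iter("event")]


def evaluate_alarms(events: List[Event], alarms: List[Alarm]) -> Dict[str, int]:
    """Count detected events, missed events, and false alarms."""
    detected = set()
    false_alarms = 0
    for alarm_type, frame in alarms:
        # An alarm matches an event of the same type whose interval contains it.
        matches = [i for i, (etype, start, end) in enumerate(events)
                   if etype == alarm_type and start <= frame <= end]
        if matches:
            detected.update(matches)
        else:
            false_alarms += 1
    return {
        "detected": len(detected),
        "missed": len(events) - len(detected),
        "false_alarms": false_alarms,
    }


events = load_events(GROUND_TRUTH_XML)
result = evaluate_alarms(events, [("intrusion", 150), ("loitering", 300)])
print(result)
```

Here the second alarm fires outside the loitering interval, so it counts as a false alarm and the loitering event counts as missed.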

From Tombstones to Corpora: TSML for Research on Language, Culture, Identity and Gender Differences

  • Streiter, Oliver;Voltmer, Leonhard;Goudin, Yoann
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2007.11a
    • /
    • pp.450-458
    • /
    • 2007
  • Tombstone inscriptions represent a linguistic genre that yields insights into culture and language. Creating corpora from tombstones is thus a complementary approach to the study of languages and cultures. For the annotation of tombstone corpora, we propose TSML, the Tombstone-Markup-Language, developed during the large-scale annotation of Taiwanese tombstones and a number of tombstones from China, Indonesia, and Europe. We discuss our conceptual framework for the annotation of tombstones and present preliminary research data to show the usefulness of the annotations. Finally, we encourage researchers to participate in the specification of TSML so that an annotation language for annotations across cultures and languages can be obtained soon.

Annotation Technique Development based on Apparel Attributes for Visual Apparel Search Technology (비주얼 의류 검색기술을 위한 의류 속성 기반 Annotation 기법 개발)

  • Lee, Eun-Kyung;Kim, Yang-Weon;Kim, Seon-Sook
    • Fashion & Textile Research Journal
    • /
    • v.17 no.5
    • /
    • pp.731-740
    • /
    • 2015
  • Mobile (smartphone) search engine marketing is increasingly important. Accordingly, visual apparel search technology that gives easier and faster access to visual information in the apparel field is urgently needed. This study helps establish a proper classification system for apparel search, based on an analysis of search techniques used in apparel search applications and on existing domestic and overseas apparel sites. An annotation technique is developed in accordance with visual attributes and apparel categories, using data obtained by web crawling and apparel image collection. The categorical composition of apparel is divided into wearing, image, and style. The web evaluation site traces the correlations between the apparel category and apparel factors that depend on visual attributes. An appraisal team of 10 individuals evaluated 2860 merchandise images. The data analysis covered the correlations between apparel, sleeve length, and apparel category (based on an average analysis) and the correlation between fastener and apparel category (based on an average analysis). The results can serve as the basis for a mobile apparel search system that enhances consumer convenience, since it enables an effective search by type, price, distributor, and apparel image from a mobile photograph of apparel being worn.