• Title/Abstract/Keywords: Data annotation

세종계획 언어자원 기반 한국어 명사은행 (Korean Nominal Bank, Using Language Resources of Sejong Project)

  • 김동성
    • 한국언어정보학회지:언어와정보 / Vol. 17, No. 2 / pp.67-91 / 2013
  • This paper describes the Korean Nominal Bank, a project that provides argument structure for instances of predicative nouns in the Sejong parsed corpus. We use the language resources of the Sejong project so that the same data set is annotated with successive levels of annotation, since a new kind of language-resource building project could otherwise yield new information that is processed separately and in isolation. Our annotation scheme is based on the Sejong electronic dictionary, the semantically tagged corpus, and the syntactically analyzed corpus. Our work also draws on deep linguistic knowledge of the syntax-semantics interface in general. We consider semantic theories including the Frame Semantics of Fillmore (1976), the argument structure of Grimshaw (1990), and the argument alternations of Levin (1993) and Levin and Rappaport Hovav (2005). Various syntactic theories are needed to explain various sentence types, including empty categories, raising, and left (or right) dislocation. We also need an account of idiosyncratic lexical features such as collocation.

Functional annotation of lung cancer-associated genetic variants by cell type-specific epigenome and long-range chromatin interactome

  • Lee, Andrew J.;Jung, Inkyung
    • Genomics & Informatics / Vol. 19, No. 1 / pp.3.1-3.12 / 2021
  • Functional interpretation of noncoding genetic variants associated with complex human diseases and traits remains a challenge. In an effort to enhance our understanding of common germline variants associated with lung cancer, we categorize regulatory elements based on eight major cell types of human lung tissue. Our results show that 21.68% of lung cancer-associated risk variants are linked to noncoding regulatory elements, nearly half of which are cell type-specific. Integrative analysis of high-resolution long-range chromatin interactome maps and single-cell RNA-sequencing data of lung tumors uncovers a number of putative target genes of these variants and functionally relevant cell types, which display a potential biological link to cancer susceptibility. The present study greatly expands the scope of functional annotation of lung cancer-associated genetic risk factors and dictates probable cell types involved in lung carcinogenesis.

Survey of Temporal Information Extraction

  • Lim, Chae-Gyun;Jeong, Young-Seob;Choi, Ho-Jin
    • Journal of Information Processing Systems / Vol. 15, No. 4 / pp.931-956 / 2019
  • Documents contain information that can be used in various applications, such as question answering (QA), information retrieval (IR), and recommendation systems. To use this information, it is necessary to develop methods for extracting it from documents written in natural language. There are several kinds of information (e.g., temporal information, spatial information, semantic role information), and each kind is extracted with different methods. This paper reviews existing studies on methods for extracting temporal information and discusses several related issues: the task boundary of temporal information extraction, the history of the annotation languages and shared tasks, the research issues, the applications that use temporal information, and the evaluation metrics. Although the history of temporal information extraction tasks is not long, many studies have tried various methods. This paper reports which approach is known to work better for extracting each particular part of the temporal information, and also suggests a direction for future research.

Transcriptome analysis of internal and external stress mechanisms in Aster spathulifolius Maxim.

  • Sivagami, Jean Claude;Park, SeonJoo
    • 한국자원식물학회: Conference Proceedings / 한국자원식물학회 2019 Spring Conference / pp.35-35 / 2019
  • Aster spathulifolius Maxim. belongs to the Asteraceae family and is distributed only in Korea and Japan. It is recognized as a traditional medicinal plant and is economically valuable as an ornamental. However, within the Asteraceae family, the genus Aster lacks genomic resources and information on molecular function. Therefore, we used high-throughput RNA-sequencing transcriptome data of A. spathulifolius to investigate function at the molecular level. De novo assembly produced 98,660 unigenes with an N50 value of 1,126 bp. The unigenes were functionally annotated against databases including the NCBI nucleotide (Nt) and non-redundant protein (Nr) databases, Pfam, UniProt, KEGG, and transcription factor (TF) databases. In addition, the distribution of SSR markers was analyzed for future perspectives. Further, comparing A. spathulifolius with two other Asteraceae species, Karelinia caspica and Chrysanthemum morifolium, shows the number of genes regulated under internal and external stress (salt tolerance, and heat and drought stress, respectively), which helps to explain the molecular basis of responses to different environmental stresses.
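
The N50 value quoted in the abstract above is a standard assembly statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The following is a minimal sketch of that calculation in Python. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk from the longest sequence down, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half
        # of the total assembled bases is the N50.
        if running >= half_total:
            return length


# Hypothetical unigene lengths, for illustration only.
toy_lengths = [2000, 1500, 1126, 800, 500, 300]
print(n50(toy_lengths))  # 1500
```

With the toy input, the total is 6,226 bp, so the half-way point (3,113 bp) is reached once the two longest sequences are counted, giving an N50 of 1,500 bp.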

Visualizing the phenotype diversity: a case study of Alexander disease

  • Dohi, Eisuke;Bangash, Ali Haider
    • Genomics & Informatics / Vol. 19, No. 3 / pp.28.1-28.4 / 2021
  • Since only a small number of patients have a rare disease, it is difficult to identify all of the features of these diseases. This is especially true for patients with uncommon presentations of rare diseases. It can also be difficult for patients, their families, and even clinicians to know which one of a number of disease phenotypes the patient is exhibiting. To address this issue, during Biomedical Linked Annotation Hackathon 7 (BLAH7), we tried to extract Alexander disease patient data in Portable Document Format. We then visualized the phenotypic diversity of those Alexander disease patients with uncommon presentations. This led to us identifying several issues that we need to overcome in our future work.

음성 DB의 메타데이타 표준화 (Meta-data Standardization of Speech Database)

  • 김상훈
    • 대한음성학회: Conference Proceedings / 대한음성학회 October 2003 Conference Proceedings / pp.61-64 / 2003
  • In this paper, we introduce a new method for describing the annotation information of a speech database. As one structured description method, an XML-based description, as standardized by W3C, is applied to represent the metadata of the speech database. It will be continuously revised through the speech technology standard forum during this year.

Lessons from Developing an Annotated Corpus of Patient Histories

  • Rost, Thomas Brox;Huseth, Ola;Nytro, Oystein;Grimsmo, Anders
    • Journal of Computing Science and Engineering / Vol. 2, No. 2 / pp.162-179 / 2008
  • We have developed a tool for annotation of electronic health record (EHR) data. Currently we are in the process of manually annotating a corpus of Norwegian general practitioners' EHRs with mainly linguistic information. The purpose of this project is to attain a linguistically annotated corpus of patient histories from general practice. This corpus will be put to future use in medical language processing and information extraction applications. The paper outlines some of our practical experiences from developing such a corpus and, in particular, the effects of semi-automated annotation. We have also done some preliminary experiments with part-of-speech tagging based on our corpus. The results indicated that relevant training data from the clinical domain give better results for the tagging task in this domain than training the tagger on a corpus from a more general domain. We are planning to expand the corpus annotations with medical information at a later stage.

OrCanome: a Comprehensive Resource for Oral Cancer

  • Bhartiya, Deeksha;Kumar, Amit;Singh, Harpreet;Sharma, Amitesh;Kaushik, Anita;Kumari, Suchitra;Mehrotra, Ravi
    • Asian Pacific Journal of Cancer Prevention / Vol. 17, No. 3 / pp.1333-1336 / 2016
  • Oral cancer is one of the most prevalent cancers in India, but the underlying mechanisms are poorly understood. Cancer research has benefited immensely from genome-scale high-throughput studies, which have contributed to expanding the volume of data. Such datasets also exist for oral cancer genes, but there has been no consolidated approach to integrate the data to reveal meaningful biological information. OrCanome is one of the largest and most comprehensive user-friendly databases of oral cancer. It features a compilation of over 900 genes dysregulated in oral cancer and provides detailed annotations of the genes, transcripts and proteins along with additional information encompassing expression, inhibitors, epitopes and pathways. The resource has been envisioned as a one-stop solution for genomic, transcriptomic and proteomic annotation of these genes, and the integrated approach will facilitate the identification of potential biomarkers and therapeutic targets.

RNA-Seq를 이용한 제주산 메밀의 발아초기 전사체 프로파일 분석 (Transcriptomic Profile Analysis of Jeju Buckwheat using RNA-Seq Data)

  • 한송이;정성진;오대주;정용환;김찬식;김재훈
    • 한국산학기술학회논문지 / Vol. 19, No. 1 / pp.537-545 / 2018
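  • In this study, to collect diverse information on the transcripts expressed during early germination of buckwheat, we extracted RNA from Yangjeol buckwheat and Daegwan 3-3 and performed transcriptome analysis. Total RNA was extracted from seeds of Jeju-grown Yangjeol buckwheat and Daegwan 3-3 and at 12, 24, and 36 hours after germination, and was sequenced on the Illumina HiSeq 2000 platform. The raw data were processed with the DynamicTrim and LengthSort programs of the SolexaQA package, followed by assembly and annotation. From the RNA-seq raw data, we obtained 16.5 Gb and 16.2 Gb of transcriptome data, corresponding to about 84.2% and 81.5% of the raw data, respectively. We obtained 43,494 representative transcripts totaling 47 Mb, of which 23,165 showed sequence similarity to entries in the annotation DB. Gene ontology analysis of the buckwheat representative transcripts showed that genes were most abundant in metabolic process (49.49%) for biological process, in cell (46.12%) for cellular component, and in catalytic activity (80.43%) for molecular function. For gibberellin receptor GID1C, which is involved in seed germination, expression increased over time in both Yangjeol buckwheat and Daegwan 3-3. For gibberellin 20-oxidase 1, expression increased within 12 hours after germination in Yangjeol buckwheat, whereas in Daegwan 3-3 it continued to increase up to 36 hours. These stage-by-stage transcriptome data for early germination of Jeju buckwheat are expected to help elucidate the mechanisms underlying functional and morphological differences between species.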

온톨로지 기반의 모션 캡처 데이터베이스 설계 및 구현 (Design & Implementation of a Motion Capture Database Based on Motion Ontologies)

  • 정현숙
    • 한국멀티미디어학회논문지 / Vol. 8, No. 5 / pp.618-632 / 2005
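  • This paper proposes a framework for the effective storage and semantics-based retrieval of motion capture data. Motion capture technology is widely used to obtain realistic character motion, but it suffers from several problems caused by the lack of standardization for retrieving and storing motion capture data. In addition, because previously captured motion data carry no semantic annotation, it is difficult for animators to retrieve only the partial motions they need from captured data and combine them to create new motions. The aim of this paper is to improve the reusability of motion capture data. First, we propose a standard format for integrating different motion capture data formats. The proposed format is an XML-based markup language called MCML (Motion Capture Markup Language). MCML is useful for converting between or integrating different formats, and because the motion database is built from MCML files, it also improves the reusability of motion capture data. We also define a motion ontology to assign meaning to the partial motions in motion capture data and to link motions semantically. Ontology-based data access makes it possible to search for and navigate partial motions and the motions related to them.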
