• Title/Abstract/Keywords: NER

105 search results, processing time 0.035 s

Named entity recognition using transfer learning and small human- and meta-pseudo-labeled datasets

  • Kyoungman Bae;Joon-Ho Lim
    • ETRI Journal / Vol. 46, No. 1 / pp.59-70 / 2024
  • We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome challenges related to labeled data scarcity and domain shifts, we use transfer learning to leverage our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method using a teacher/student framework with labeled and unlabeled data. Our model presents two modifications. First, the student model is updated with an average loss from both human- and pseudo-labeled data. Second, the influence of noisy pseudo-labeled data is mitigated by considering feedback scores and updating the teacher model only when below a threshold (0.0005). We achieve the target NER performance in the spoken language domain and improve that in the written language domain by proposing a straightforward rollback method that reverts to the best model based on scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.
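The two modifications and the rollback step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and class names are hypothetical, the strict "less than" comparison is an assumption, and real training would operate on tensors rather than plain floats. Only the threshold value 0.0005 comes from the abstract.

```python
from dataclasses import dataclass
from typing import Any, Optional

FEEDBACK_THRESHOLD = 0.0005  # value quoted in the abstract


def student_loss(human_loss: float, pseudo_loss: float) -> float:
    """Average the losses from the human-labeled and pseudo-labeled batches."""
    return (human_loss + pseudo_loss) / 2.0


def should_update_teacher(feedback_score: float,
                          threshold: float = FEEDBACK_THRESHOLD) -> bool:
    """Gate the teacher update (assumption: update only for scores below the threshold)."""
    return feedback_score < threshold


@dataclass
class BestModelRollback:
    """Hypothetical helper: remember the best checkpoint on the human-labeled dev set."""
    best_score: float = float("-inf")
    best_state: Optional[Any] = None

    def step(self, state: Any, dev_score: float) -> Any:
        # Keep the new state if it improves the dev score; otherwise revert.
        if dev_score > self.best_score:
            self.best_score = dev_score
            self.best_state = state
            return state
        return self.best_state
```

In a training loop, `step` would be called after each evaluation on the scarce human-labeled data, and training would continue from the state it returns.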

A method for metadata extraction from a collection of records using Named Entity Recognition in Natural Language Processing

  • 송치호
    • 한국기록관리학회지 / Vol. 24, No. 2 / pp.65-88 / 2024
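  • This exploratory study examines a method for extracting metadata values and descriptive information embedded in records through named entity recognition (NER), a task in natural language processing (NLP), which is a subfield of artificial intelligence. The study material consisted of handwritten records of the Guro Industrial Complex produced in the 1960s and 1970s (about 1,200 pages and some 80,000 words). After preprocessing, including digitization, named entities in the record text were recognized using a publicly available language API built on Google's BERT language model. As a result, 173 personal names and 314 names of organizations and institutions contained in the historical records of the Guro Industrial Complex were extracted, and these are expected to be usable as direct search terms for the content of the records. The study also identifies the problems that arise when the theoretical methodology of NLP is applied to real records consisting of semi-structured and unstructured text, and presents solutions and implications to consider.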
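As a rough illustration of how recognized entities could be turned into search terms, the sketch below groups distinct entity strings by type. The abstract does not name the API or its output format, so the `(text, type)` tuple shape, the type labels, and the function name are all assumptions.

```python
from collections import defaultdict


def build_search_terms(mentions):
    """Group distinct entity strings by entity type, keeping first-seen order.

    `mentions` is an iterable of (text, entity_type) pairs; this shape is an
    assumption, not the output format of the API used in the study.
    """
    index = defaultdict(list)
    for text, entity_type in mentions:
        name = text.strip()
        if name and name not in index[entity_type]:
            index[entity_type].append(name)
    return dict(index)


# Illustrative placeholder mentions, not data from the study.
mentions = [
    ("Person A", "PERSON"),
    ("Organization X", "ORGANIZATION"),
    ("Person A", "PERSON"),  # a repeated mention is stored once
    ("Person B", "PERSON"),
]
terms = build_search_terms(mentions)
```

Counting the entries per type in `terms` gives the kind of totals the abstract reports (distinct personal names and distinct organization names).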

Evaluation of Structural Performance of Reinforced Concrete Beams using Hybrid Retrofitting with Groove and Embedding FRP Rod and CFRP Sheet

  • 하기주;하영주
    • 한국구조물진단유지관리공학회 논문집 / Vol. 18, No. 4 / pp.41-49 / 2014
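  • In this study, experiments were carried out to evaluate the structural performance of reinforced concrete beams retrofitted with surface-deformed embedded FRP rods and CFRP sheets, with the aim of improving the structural performance of existing reinforced concrete buildings. A total of seven specimens were fabricated and tested, varying the amount of surface-deformed embedded FRP rod and the presence or absence of CFRP sheet reinforcement, and the following conclusions were drawn from the test results. The NER series specimens, retrofitted with surface-deformed embedded FRP rods, showed a 12% to 46% increase in load-carrying capacity compared with the standard specimen NBS, while the NERL series specimens, retrofitted with both surface-deformed embedded FRP rods and CFRP sheets, showed a 22% to 77% increase in maximum load-carrying capacity over NBS. In addition, the NER series specimens failed by bond slip and cover separation, whereas the NERL series specimens failed by bond slip alone, owing to the concrete confinement provided by the continuous CFRP sheet reinforcement and the increased bond strength between the base concrete and the surface-deformed embedded FRP rods.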

Ion beam etching of sub-30nm scale Magnetic Tunnel Junction for minimizing sidewall leakage path

  • Kim, Dae-Hong;Kim, Bong-Ho;Chun, Sung-Woo;Kwon, Ji-Hun;Choi, Seon-Jun;Lee, Seung-Beck
    • 한국자기학회:학술대회 개요집 / 한국자기학회 2011년도 자성 및 자성재료 국제학술대회 / pp.29-30 / 2011
  • We have demonstrated the fabrication of sub-30 nm MTJ pillars with PMA characteristics. The multi-step IBE process, performed at $45^{\circ}$ and $30^{\circ}$ using NER, resulted in almost vertical side profiles. Redeposition on the sidewalls of the NER prevented lateral etching of the resist hard mask, allowing vertical MTJ side profile formation without any reduction in the lithographically defined lateral dimensions of the resist. For the 28 nm STT-MTJ pillars, the measured TMR ratio was 13% with a resistance of 1 $k{\Omega}$, which was due to remaining redeposition layers less than 0.1 nm thick. With further optimization of the multi-step IBE conditions, it will be possible to fabricate fully operating sub-30 nm perpendicular STT-MTJ structures for application to future non-volatile memories.

The Effect of Korean Soysauce and Soypaste Making on Soybean Protein Quality -Part III. Changes in the Lysine Availability-

  • 이철호
    • 한국식품과학회지 / Vol. 8, No. 2 / pp.63-69 / 1976
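  • Changes in the availability of lysine in soybean protein during the making of traditional Korean soy sauce and soybean paste were measured by chemical and biological methods, and the differences between the results obtained by the different methods were examined. According to the FDNB-reactive lysine measured by the TLMI method, lysine availability decreased during the cooking of the soybeans and the fermentation of the meju, but increased again during the 8-month ripening of the meju mash, reaching almost the same level as in the raw soybean protein. In contrast, the biological value (BV), NPU, NER, and relative lysine availability determined in rat feeding trials indicated that lysine availability continued to decrease not only during meju making but also during ripening.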

A Novel UV-Sensitivity Mutation Induces Nucleotide Excision Repair Phenotype and Shows Epistatic Relationships with UvsF and UvsB Groups in Aspergillus nidulans

  • Baptista, F.;Castro-Prado, M.A.A.
    • Journal of Microbiology / Vol. 39, No. 2 / pp.102-108 / 2001
  • The DNA damage response has a central role in the maintenance of genomic integrity, while mutations in related genes may result in a range of disorders including neoplastic formations. The uvsZ1 mutation characterized in this report is a novel uvs mutation in Aspergillus nidulans, resulting in a nucleotide excision repair (NER) phenotype: UV-sensitivity before DNA synthesis (quiescent cells), high UV-induced mutation frequency, and probable absence of involvement with mitotic and meiotic recombination. The mutation is recessive and non-allelic to the previously characterized uvsA101 mutation, which is also located on the paba-y interval on chromosome I. uvsZ1 showed wild-type sensitivity to MMS, which suggests non-involvement of this mutation with BER. Epistasis tests showed that the uvsZ gene product is probably involved in the same repair pathways as the UVSB or UVSH proteins. Although mutations in these proteins result in an NER phenotype, UVSB is related to cell cycle control and UVSH is associated with the post-replicational repair pathway. The epistatic interaction among the uvsZ1, uvsB413, and uvsH77 mutations indicates that different repair systems may be related to the common steps of the DNA damage response in Aspergillus nidulans.

OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Liu, Yusha;Yao, Xinzhi;Xia, Jingbo
    • Genomics & Informatics / Vol. 19, No. 3 / pp.27.1-27.4 / 2021
  • Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pretrained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
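Dictionary-based annotation of the kind mentioned in this abstract can be sketched as below. This is a generic longest-match lookup, not the OryzaGP pipeline: the function name, the case-insensitive matching, and the identifiers in the example dictionary are all assumptions made for illustration.

```python
import re


def annotate(text, dictionary):
    """Return (start, end, surface, identifier) tuples for dictionary matches.

    Longer names are matched first so that a short name never overrides a
    longer overlapping one. Matching is case-insensitive and restricted to
    whole tokens (no word character directly before or after the match).
    """
    taken = [False] * len(text)
    spans = []
    for name in sorted(dictionary, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        for m in pattern.finditer(text):
            if not any(taken[m.start():m.end()]):
                for i in range(m.start(), m.end()):
                    taken[i] = True
                spans.append((m.start(), m.end(), m.group(0), dictionary[name]))
    return sorted(spans)


# Placeholder identifiers, not real database accessions.
gene_dictionary = {"OsMADS1": "ID:0001", "MADS1": "ID:0002"}
sentence = "OsMADS1 regulates flowering; MADS1 is also studied."
annotations = annotate(sentence, gene_dictionary)
```

Each returned tuple links a text span to a dictionary identifier, which covers both the recognition (NER) and normalisation (NEN) steps in a single lookup.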

Improving classification of low-resource COVID-19 literature by using Named Entity Recognition

  • Lithgow-Serrano, Oscar;Cornelius, Joseph;Kanjirangat, Vani;Mendez-Cruz, Carlos-Francisco;Rinaldi, Fabio
    • Genomics & Informatics / Vol. 19, No. 3 / pp.22.1-22.5 / 2021
  • Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) clinical repository (a repository of classified and translated academic articles related to COVID-19 and relevant to clinical practice), where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7), we performed experiments to explore the use of named-entity recognition (NER) to improve the classification. We processed the literature with OntoGene's Biomedical Entity Recogniser (OGER) and used the resulting identified Named Entities (NE) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER-extracted features. In these proof-of-concept experiments, we observed a clear gain on COVID-19 literature classification. In particular, the NE's origin was useful for classifying document types and the NE's type for clinical specialties. Due to the limitations of the small dataset, we can only conclude that our results suggest that NER would benefit this classification task. In order to accurately estimate this benefit, further experiments with a larger dataset would be needed.
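The idea of feeding entity information to a classifier as extra input features can be sketched as follows. This is an illustrative feature builder only: the `type` and `origin` keys, the feature-name prefixes, and the example values are assumptions, and the sketch does not reproduce OGER's actual output format or the authors' classifier.

```python
from collections import Counter


def text_features(text):
    """Bag-of-words counts, with a prefix so they cannot collide with NE features."""
    return Counter("tok=" + token for token in text.lower().split())


def entity_features(entities):
    """Count each entity's type and its source database (its 'origin')."""
    feats = Counter()
    for ent in entities:
        feats["ne_type=" + ent["type"]] += 1
        feats["ne_origin=" + ent["origin"]] += 1
    return feats


def combined_features(text, entities):
    """Baseline text features extended with the entity-derived features."""
    feats = text_features(text)
    feats.update(entity_features(entities))  # Counter.update adds counts
    return dict(feats)


# Placeholder entities with hypothetical type and origin labels.
example_entities = [
    {"type": "disease", "origin": "DatabaseA"},
    {"type": "chemical", "origin": "DatabaseB"},
    {"type": "disease", "origin": "DatabaseA"},
]
features = combined_features("Treatment of severe disease", example_entities)
```

A baseline model would use only `text_features`; comparing it with a model trained on `combined_features` mirrors the with/without comparison described in the abstract.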

Development of a Tourism Information QA Service for the Task-oriented Chatbot Service

  • Hoon-chul Kang;Myeong-Gyun Kang;Jeong-Woo Jwa
    • International Journal of Advanced Culture Technology / Vol. 12, No. 3 / pp.73-79 / 2024
  • The smart tourism chatbot service provides smart tourism services to users easily and conveniently along with the smart tourism app. In this paper, a tourism information QA (Question Answering) service is proposed based on the task-oriented smart tourism chatbot system [13]. The tourism information QA service is an MRC (Machine Reading Comprehension)-based QA system that finds answers in context and provides them to users. The tourism information QA system consists of NER (Named Entity Recognition), DST (Dialogue State Tracking), Neo4J graph DB, and QA servers. The proposed tourism information QA service uses the tourism information NER model and DST model to identify the intent of the user's question and retrieves the appropriate context for the answer from the Neo4J tourism knowledge base. The QA model finds answers in the context and provides them to users through the smart tourism app. We develop the tourism information QA model by applying transfer learning to the BigBird model, which can process a context of 4,096 tokens, using the tourism information QA dataset.
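The flow described in this abstract (NER, then DST, then context retrieval, then QA) can be sketched with simple stand-ins. Everything here is hypothetical: a plain dictionary replaces the Neo4J knowledge base, keyword rules replace the NER and DST models, and the final step returns the retrieved context as-is instead of running an MRC model.

```python
def recognize_entities(question, known_places):
    """Stand-in for the NER model: known place names mentioned in the question."""
    q = question.lower()
    return [place for place in known_places if place.lower() in q]


def track_intent(question):
    """Stand-in for the DST model: map keywords in the question to a slot name."""
    q = question.lower()
    if "open" in q or "hours" in q:
        return "opening_hours"
    if "where" in q or "address" in q:
        return "address"
    return "description"


def retrieve_context(kb, entity, slot):
    """Stand-in for the graph DB lookup; returns None when nothing is stored."""
    return kb.get(entity, {}).get(slot)


def answer(question, kb):
    """Run the four stages in order and return the context as the answer."""
    entities = recognize_entities(question, kb.keys())
    if not entities:
        return None
    slot = track_intent(question)
    # A real MRC model would extract an answer span from this context.
    return retrieve_context(kb, entities[0], slot)


# Placeholder knowledge base, not data from the paper.
tourism_kb = {
    "Sample Museum": {
        "opening_hours": "Open 09:00-18:00 daily.",
        "address": "1 Example Street.",
    }
}
```

For example, `answer("When does Sample Museum open?", tourism_kb)` resolves the entity to "Sample Museum", the slot to `opening_hours`, and returns the stored context for that slot.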