• Title/Abstract/Keywords: NER

105 search results (processing time: 0.029 s)

Integrated Char-Word Embedding on Chinese NER using Transformer

  • 김춘광;조인휘
    • Korea Information Processing Society (KIPS): Conference Proceedings / KIPS 2021 Spring Conference / pp.415-417 / 2021
  • Because Chinese sentences are written as continuous strings of characters without explicit word boundaries, and the vocabulary is huge, Chinese NER (Named Entity Recognition) has usually been based on character representations. In recent years, many studies on Chinese NER have reconsidered how to integrate word information into the model. However, traditional sequence models have complex structures and slow inference, and they require additional dictionary information, which makes them difficult to deploy in industry. The approach in this paper is state-of-the-art and parallelizable: it integrates character and word embeddings so that the model learns word information. The proposed model is easy to implement, outperforms traditional models in speed and efficiency, and improves the F1-score on two datasets.
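The character-word integration described above can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes the sentence has already been split into words by some external segmenter (the abstract does not say how word boundaries are obtained), assumes plain concatenation as the integration step, and uses hypothetical toy lookup tables `char_emb` and `word_emb`. The Transformer encoder that would consume these vectors is omitted.

```python
from typing import Dict, List

Vector = List[float]


def integrate_char_word(
    words: List[str],
    char_emb: Dict[str, Vector],
    word_emb: Dict[str, Vector],
    word_dim: int,
) -> List[Vector]:
    """Return one vector per character: [char embedding ; embedding of its word].

    Assumption (hypothetical): integration is done by concatenation, and the input
    is already segmented into words. Out-of-vocabulary words fall back to a zero
    vector of length word_dim; every character must be present in char_emb.
    """
    out: List[Vector] = []
    for word in words:
        w_vec = word_emb.get(word, [0.0] * word_dim)
        for ch in word:
            # Each character keeps its own identity and also sees the word it belongs to.
            out.append(list(char_emb[ch]) + list(w_vec))
    return out


char_emb = {"北": [1.0, 0.0], "京": [0.0, 1.0], "欢": [1.0, 1.0], "迎": [0.5, 0.5], "你": [0.2, 0.8]}
word_emb = {"北京": [9.0, 9.0], "欢迎": [7.0, 7.0]}
vecs = integrate_char_word(["北京", "欢迎", "你"], char_emb, word_emb, word_dim=2)
```

Each character stays a separate position in the sequence, so a character-level tagger can still emit one label per character while seeing word-level context.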

Chinese-clinical-record Named Entity Recognition using IDCNN-BiLSTM-Highway Network

  • Tinglong Tang;Yunqiao Guo;Qixin Li;Mate Zhou;Wei Huang;Yirong Wu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 7 / pp.1759-1772 / 2023
  • Chinese named entity recognition (NER) is a challenging task that seeks to find, recognize, and classify various types of information elements in unstructured text. Because Chinese text has no natural word boundaries like the spaces in English text, Chinese NER is much more difficult. At present, most deep learning-based NER models are built on a bidirectional long short-term memory network (BiLSTM), yet their performance still leaves room for improvement. To further improve performance on Chinese NER tasks, we propose a new NER model, IDCNN-BiLSTM-Highway, which combines the BiLSTM, the iterated dilated convolutional neural network (IDCNN), and the highway network. In our model, IDCNN is used to achieve multiscale context aggregation over long sequences of words. The highway network effectively connects different network layers, allowing information to pass through them smoothly without attenuation. Finally, the globally optimal tag sequence is obtained by a conditional random field (CRF). The experimental results show that, compared with other popular deep learning-based NER models, our model achieves superior performance on two Chinese NER datasets, Resume and Yidu-S4k, with F1-scores of 94.98 and 77.59, respectively.
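The final CRF step selects the globally best tag sequence, which for a linear-chain CRF is done with Viterbi decoding. Below is a minimal pure-Python sketch for a generic linear-chain CRF, assuming emission scores already produced by an encoder such as the IDCNN-BiLSTM-Highway stack (not shown) and omitting start/end transition scores for brevity. The function name and toy scores are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Find the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. from an encoder).
    transitions[i][j]: score of moving from tag i to tag j.
    Start/end transitions are omitted (a simplifying assumption).
    Returns (best tag indices, best total score).
    """
    n_tags = len(transitions)
    # score[j]: best score of any path ending in tag j at the current position
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        ptrs: List[int] = []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[best_last]


# Toy tag set: 0 = O, 1 = B, 2 = I; the transition O -> I is heavily penalized.
emissions = [[2.0, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.0]]
transitions = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
path, best = viterbi_decode(emissions, transitions)
```

A greedy per-token argmax on these emissions would pick O, I, O, which the penalized O-to-I transition rules out; Viterbi instead returns O, B, O with total score 3.5.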

Encoding Dictionary Feature for Deep Learning-based Named Entity Recognition

  • Ronran, Chirawan;Unankard, Sayan;Lee, Seungwoo
    • International Journal of Contents / Vol. 17, No. 4 / pp.1-15 / 2021
  • Named entity recognition (NER) is a crucial NLP task that aims to extract information from texts. To build NER systems, deep learning (DL) models are trained with dictionary features by mapping each word in the dataset to dictionary features and generating a unique index. However, this technique might generate noisy labels, which pose significant challenges for the NER task. In this paper, we proposed DL-dictionary features and evaluated them on two datasets: the OntoNotes 5.0 dataset and our new infectious disease outbreak dataset, GFID. We concatenated (1) BiLSTM character embeddings and (2) pre-trained embeddings with (3) our proposed features, namely the Convolutional Neural Network (CNN), BiLSTM, and self-attention dictionary features. The combined features (1-3) were fed into a BiLSTM-Conditional Random Field (CRF) to predict named entity classes as outputs. We compared these outputs with the predictions of models using the BiLSTM character features, pre-trained embeddings, and the dictionary features from previous research, which used exact matching and partial matching dictionary techniques. The findings showed that the model employing our dictionary features outperformed models that used existing dictionary features. We also computed the F1 score on the GFID dataset to apply this technique to extracting medical or healthcare information.
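For comparison with the proposed learned dictionary features, the exact-matching baseline mentioned above can be sketched as a greedy longest-match lookup that turns a gazetteer into per-token B-/I-/O features. The gazetteer, the entity types `DIS` and `LOC`, and `max_len` are hypothetical, and this is not the CNN, BiLSTM, or self-attention dictionary encoding the paper proposes.

```python
from typing import Dict, List, Tuple


def dictionary_features(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str],
    max_len: int = 4,
) -> List[str]:
    """Tag each token with B-/I-<TYPE> if it lies in an exact dictionary match, else O.

    Greedy longest match from left to right; the (hypothetical) gazetteer maps
    token tuples to an entity type.
    """
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            ent_type = gazetteer.get(tuple(tokens[i:i + n]))
            if ent_type is not None:
                feats[i] = "B-" + ent_type
                for k in range(i + 1, i + n):
                    feats[k] = "I-" + ent_type
                matched = n
                break
        i += matched if matched else 1
    return feats


gazetteer = {("dengue", "fever"): "DIS", ("dengue",): "DIS", ("New", "York"): "LOC"}
feats = dictionary_features(["Outbreak", "of", "dengue", "fever", "in", "New", "York"], gazetteer)
```

Longest match prefers "dengue fever" over the shorter entry "dengue"; a hard-match scheme like this is exactly where noisy labels can arise when dictionary entries are ambiguous.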

Recognition of DNA Damage in Mammals

  • Lee, Suk-Hee
    • BMB Reports / Vol. 34, No. 6 / pp.489-495 / 2001
  • DNA damage caused by UV and environmental agents is the major cause of genomic instability and must be repaired; otherwise, it can give rise to cancer. Accordingly, mammalian cells operate several DNA repair pathways that are responsible not only for identifying various types of DNA damage but also for removing it. In mammals, the nucleotide excision repair (NER) machinery is responsible for repairing most, if not all, of the bulky adducts caused by UV and chemical agents. Although most of the proteins involved in the NER pathway have been identified, only recently have we begun to gain some insight into the mechanism by which proteins recognize damaged DNA. Binding of the Xeroderma pigmentosum group C protein (XPC)-hHR23B complex to damaged DNA is the initial damage recognition step in NER, which leads to the recruitment of XPA and RPA to form a damage recognition complex. Formation of the damage recognition complex not only stabilizes the low-affinity binding of XPA to damaged DNA but also induces structural distortion, both of which are likely necessary for the recruitment of TFIIH and two structure-specific endonucleases for dual incision.


Characterization of Hrq1-Rad14 Interaction in Saccharomyces cerevisiae

  • 민문희;김민지;최유진;유민주;김유라;안효빈;김채현;권채연;배성호
    • Korean Journal of Microbiology / Vol. 50, No. 2 / pp.95-100 / 2014
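  • Hrq1 is a novel RecQ helicase discovered through bioinformatic analysis of fungal genomes. This protein shares the highest homology with human RECQL4, and recent genetic and biochemical studies have suggested that it plays some role in maintaining genome stability. In this study, we used a yeast two-hybrid assay to examine whether yeast genes homologous to human genes known to interact with RECQL4 interact with Hrq1. Of the 11 genes examined, we found that Rad14, one of the nucleotide excision repair (NER) factors, interacts with Hrq1. We also confirmed a direct interaction between Hrq1 and Rad14 by a pull-down assay using purified proteins. The yeast two-hybrid interaction between Hrq1 and Rad14 was further enhanced by DNA damage induced by 4-nitroquinoline-1-oxide, and this enhancement depended on Rad4, another NER factor. These results suggest that Hrq1 may play a role in the NER process through its interaction with Rad14.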

Binding Pattern Elucidation of NNK and NNAL Cigarette Smoke Carcinogens with NER Pathway Enzymes: an Onco-Informatics Study

  • Jamal, Qazi Mohammad Sajid;Dhasmana, Anupam;Lohani, Mohtashim;Firdaus, Sumbul;Ansari, Md Yousuf;Sahoo, Ganesh Chandra;Haque, Shafiul
    • Asian Pacific Journal of Cancer Prevention / Vol. 16, No. 13 / pp.5311-5317 / 2015
  • Cigarette smoke derivatives such as NNK (4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone) and NNAL (4-(methylnitrosamino)-1-(3-pyridyl)-1-butan-1-ol) are well-known carcinogens. We analyzed the interaction of enzymes involved in the NER (nucleotide excision repair) pathway with the ligands NNK and NNAL. Binding was characterized for the enzymes showing interaction equivalent to or better than that of the positive (+Ve) control. The highest docking energies obtained between NNK and the enzymes RAD23A, CCNH, CDK7, and CETN2 were -7.13 kcal/mol, -7.27 kcal/mol, -8.05 kcal/mol, and -7.58 kcal/mol, respectively. Similarly, the highest docking energies obtained between NNAL and the enzymes RAD23A, CCNH, CDK7, and CETN2 were -7.46 kcal/mol, -7.94 kcal/mol, -7.83 kcal/mol, and -7.67 kcal/mol, respectively. To determine the effect of NNK and NNAL on the enzymes involved in the NER pathway, protein-protein interaction and protein-complex (i.e., enzymes docked with NNK/NNAL) interaction analyses were applied. The carcinogens were found to be capable of reducing the normal functioning of genes such as RAD23A (HR23A), CCNH, CDK7, and CETN2. In silico analysis indicated loss of function of these genes and their corresponding enzymes, which might be a cause of altered DNA repair pathways, leading to damage buildup and ultimately contributing to cancer formation.

Performance Comparison Analysis on Named Entity Recognition system with Bi-LSTM based Multi-task Learning

  • 김경민;한승규;오동석;임희석
    • Journal of Digital Convergence / Vol. 17, No. 12 / pp.243-248 / 2019
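  • Multi-task learning (MTL) refers to a training approach in which a single neural network performs several tasks simultaneously, with the tasks influencing one another as the network learns. In this study, we build a traditional-culture corpus ourselves, use it as training data, and conduct a comparative performance analysis of named entity recognition models that apply multi-task learning. During training, the learning parameters for part-of-speech tagging (POS tagging) and named entity recognition (NER) are each passed through a Bi-LSTM layer, and the joint loss of the two losses is finally computed through the respective Bi-LSTM layers. As a result, we show that the Bi-LSTM model with MTL achieves a 1.1% to 4.6% performance improvement over a single Bi-LSTM model.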
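The joint loss described above can be sketched as the sum of the per-token NER and POS-tagging losses. The weighting parameter `pos_weight` and the use of mean negative log-likelihood per task are assumptions for illustration (the abstract only says the two losses are combined into a joint loss), and the shared Bi-LSTM layers that would produce the per-token probabilities are omitted.

```python
import math
from typing import Sequence


def token_nll(probs: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Mean negative log-likelihood of the gold label at each token."""
    return -sum(math.log(p[g]) for p, g in zip(probs, gold)) / len(gold)


def joint_loss(
    ner_probs: Sequence[Sequence[float]],
    ner_gold: Sequence[int],
    pos_probs: Sequence[Sequence[float]],
    pos_gold: Sequence[int],
    pos_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Joint MTL loss (assumed form): NER loss plus a weighted POS-tagging loss."""
    return token_nll(ner_probs, ner_gold) + pos_weight * token_nll(pos_probs, pos_gold)


loss = joint_loss(
    ner_probs=[[0.5, 0.5], [0.25, 0.75]], ner_gold=[0, 1],
    pos_probs=[[1.0, 0.0], [0.0, 1.0]], pos_gold=[0, 1],
)
```

Because both task losses are backpropagated through shared layers, gradients from POS tagging also shape the representation used for NER, which is the mutual influence the abstract refers to.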

Deep recurrent neural networks with word embeddings for Urdu named entity recognition

  • Khan, Wahab;Daud, Ali;Alotaibi, Fahd;Aljohani, Naif;Arafat, Sachi
    • ETRI Journal / Vol. 42, No. 1 / pp.90-100 / 2020
  • Named entity recognition (NER) continues to be an important task in natural language processing because it is a subtask or subproblem of information extraction and machine translation. In Urdu language processing, it is a very difficult task. This paper proposes various deep recurrent neural network (DRNN) learning models with word embeddings. Experimental results demonstrate that they improve upon current state-of-the-art NER approaches for Urdu. The DRNN models evaluated include forward and bidirectional extensions of the long short-term memory and back propagation through time approaches. The proposed models consider both language-dependent features, such as part-of-speech tags, and language-independent features, such as the "context windows" of words. The effectiveness of the DRNN models with word embeddings for NER in Urdu is demonstrated using three datasets. The results reveal that the proposed approach significantly outperforms previous conditional random field and artificial neural network approaches. The best f-measure values achieved on the three benchmark datasets using the proposed deep learning approaches are 81.1%, 79.94%, and 63.21%, respectively.
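The language-independent "context window" feature mentioned above can be sketched as a fixed-size window of neighboring words around each token. The window size and the `<PAD>` token are hypothetical choices, not values reported in the paper.

```python
from typing import List


def context_windows(tokens: List[str], size: int = 2, pad: str = "<PAD>") -> List[List[str]]:
    """Return, for each token, the tokens within `size` positions on either side.

    Positions before the start or after the end of the sentence are filled with `pad`
    (a hypothetical padding symbol), so every window has length 2 * size + 1.
    """
    padded = [pad] * size + tokens + [pad] * size
    return [padded[i:i + 2 * size + 1] for i in range(len(tokens))]


windows = context_windows(["a", "b", "c"], size=1)
```

Each window can then be mapped through the word-embedding table and fed to the recurrent model alongside language-dependent features such as POS tags.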

Growth of abalone (Haliotis discus hannai) in cages using epibiont control measures

  • Han, Jido;Jeon, Mi Ae;Kim, Da Woon;Park, Hon;Kim, Byong Hak;Lee, Deok Chan
    • Fisheries and Aquatic Sciences / Vol. 24, No. 12 / pp.400-405 / 2021
  • In this study, the relationship between abalone growth and the presence of epibionts was investigated in abalone cultured in Goheung, Jeollanam-do, where high water temperatures and epibiont attachment are severe problems. The experiment was conducted for eight months (May-December 2020), and 40 abalone were collected every month. Water temperature ranged from 13.5℃ to 26.6℃ and dissolved oxygen from 4.0 to 10.2 ㎍/L, with water temperature at its highest and dissolved oxygen at its lowest in August. The shell height of abalone grew to 117.7% (81.8 ± 1.9 mm) in cultures where epibionts were removed (ER) and to 111% (77.4 ± 3.3 mm) where they were not (non-epibionts, NER). Total weight (TW) and body weight increased significantly and steadily with ER, whereas TW increased sharply after August with NER. No significant difference in the condition index was observed between ER and NER. The monthly proportion of epibionts increased significantly in July and reached 69.9% in December.

Korean Entity Recognition System using Bi-directional LSTM-CNN-CRF

  • 이동엽;임희석
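    • KIISE Special Interest Group on Language Engineering: Conference Proceedings (Hangul and Korean Language Information Processing) / 29th Conference on Hangul and Korean Language Information Processing, 2017 / pp.327-329 / 2017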
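  • A named entity recognition system identifies words or phrases in documents that denote named entities, such as person names (PS), locations (LC), and organizations (OG), and labels them with the corresponding entity type. To develop such a system, we propose a feature construction method based on deep-learning word embedding features, the morphological characteristics of sentences, and a pre-built lexicon, together with a method that learns the constructed features using models such as bi-directional LSTM, CNN, and CRF. For the experiments, we used the 2016klpNER data provided by the 2017 Korean Language Information System Competition. Of the 4,258 sentences in total, 3,406 were used for training, 426 for validation, and 426 for testing. In the entity chunk-level evaluation with BIO tagging, the proposed model achieved a test accuracy of 98.9% and an F1-score of 89.4%.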

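The entity chunk-level evaluation with BIO tagging mentioned above can be sketched as: extract (type, start, end) chunks from the gold and predicted tags, then compute precision, recall, and F1 over exact chunk matches. This is an illustrative scorer, not the competition's official evaluation script; treating a stray `I-` tag as the start of a new chunk is an assumption (strict scorers may discard it instead).

```python
from typing import List, Optional, Set, Tuple

Chunk = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_chunks(tags: List[str]) -> Set[Chunk]:
    """Extract entity chunks from a BIO tag sequence.

    An I- tag that does not continue a chunk of the same type starts a new chunk
    (a lenient convention assumed here).
    """
    chunks: Set[Chunk] = set()
    start: Optional[int] = None
    ent: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        prefix, _, typ = tag.partition("-")
        continues = prefix == "I" and typ == ent
        if ent is not None and not continues:
            chunks.add((ent, start, i))
            start, ent = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, ent = i, typ
    return chunks


def chunk_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over exactly matching chunks."""
    g, p = bio_chunks(gold), bio_chunks(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = ["B-PS", "I-PS", "O", "B-LC", "O", "B-OG"]
pred = ["B-PS", "I-PS", "O", "B-LC", "I-LC", "O"]
p, r, f = chunk_f1(gold, pred)
```

Only the PS chunk matches exactly here, so precision is 1/2, recall 1/3, and F1 0.4; this is why chunk-level F1 is typically well below token-level accuracy, as in the 89.4% versus 98.9% reported above.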