• Title/Summary/Keyword: Noisy Labels

Search Result 14, Processing Time 0.028 seconds

Encoding Dictionary Feature for Deep Learning-based Named Entity Recognition

  • Ronran, Chirawan;Unankard, Sayan;Lee, Seungwoo
    • International Journal of Contents
    • /
    • v.17 no.4
    • /
    • pp.1-15
    • /
    • 2021
  • Named entity recognition (NER) is a crucial task for NLP, which aims to extract information from texts. To build NER systems, deep learning (DL) models are learned with dictionary features by mapping each word in the dataset to dictionary features and generating a unique index. However, this technique might generate noisy labels, which pose significant challenges for the NER task. In this paper, we proposed DL-dictionary features, and evaluated them on two datasets, including the OntoNotes 5.0 dataset and our new infectious disease outbreak dataset named GFID. We used (1) a Bidirectional Long Short-Term Memory (BiLSTM) character and (2) pre-trained embedding to concatenate with (3) our proposed features, named the Convolutional Neural Network (CNN), BiLSTM, and self-attention dictionaries, respectively. The combined features (1-3) were fed through BiLSTM - Conditional Random Field (CRF) to predict named entity classes as outputs. We compared these outputs with other predictions of the BiLSTM character, pre-trained embedding, and dictionary features from previous research, which used the exact matching and partial matching dictionary technique. The findings showed that the model employing our dictionary features outperformed other models that used existing dictionary features. We also computed the F1 score with the GFID dataset to apply this technique to extract medical or healthcare information.

A Multiresolution Image Segmentation Method using Stabilized Inverse Diffusion Equation (안정화된 역 확산 방정식을 사용한 다중해상도 영상 분할 기법)

  • Lee Woong-Hee;Kim Tae-Hee;Jeong Dong-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.41 no.1
    • /
    • pp.38-46
    • /
    • 2004
  • Image segmentation is the task which partitions the image into meaningful regions and considered to be one of the most important steps in computer vision and image processing. Image segmentation is also widely used in object-based video compression such as MPEG-4 to extract out the object regions from the given frame. Watershed algorithm is frequently used to obtain the more accurate region boundaries. But, it is well known that the watershed algorithm is extremely sensitive to gradient noise and usually results in oversegmentation. To solve such a problem, we propose an image segmentation method which is robust to noise by using stabilized inverse diffusion equation (SIDE) and is more efficient in segmentation by employing multiresolution approach. In this paper, we apply both the region projection method using labels of adjacent regions and the region merging method based on region adjacency graph (RAG). Experimental results on noisy image show that the oversegmenation is reduced and segmentation efficiency is increased.

Supervised Rank Normalization with Training Sample Selection (학습 샘플 선택을 이용한 교사 랭크 정규화)

  • Heo, Gyeongyong;Choi, Hun;Youn, Joo-Sang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.20 no.1
    • /
    • pp.21-28
    • /
    • 2015
  • Feature normalization as a pre-processing step has been widely used to reduce the effect of different scale in each feature dimension and error rate in classification. Most of the existing normalization methods, however, do not use the class labels of data points and, as a result, do not guarantee the optimality of normalization in classification aspect. A supervised rank normalization method, combination of rank normalization and supervised learning technique, was proposed and demonstrated better result than others. In this paper, another technique, training sample selection, is introduced in supervised feature normalization to reduce classification error more. Training sample selection is a common technique for increasing classification accuracy by removing noisy samples and can be applied in supervised normalization method. Two sample selection measures based on the classes of neighboring samples and the distance to neighboring samples were proposed and both of them showed better results than previous supervised rank normalization method.

Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label (약한 레이블을 이용한 확장 합성곱 신경망과 게이트 선형 유닛 기반 음향 이벤트 검출 및 태깅 알고리즘)

  • Park, Chungho;Kim, Donghyun;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.414-423
    • /
    • 2020
  • In this paper, we propose a Dilated Convolution Gate Linear Unit (DCGLU) to mitigate the lack of sparsity and small receptive field problems caused by the segmentation map extraction process in sound event detection with weak labels. In the advent of deep learning framework, segmentation map extraction approaches have shown improved performance in noisy environments. However, these methods are forced to maintain the size of the feature map to extract the segmentation map as the model would be constructed without a pooling operation. As a result, the performance of these methods is deteriorated with a lack of sparsity and a small receptive field. To mitigate these problems, we utilize GLU to control the flow of information and Dilated Convolutional Neural Networks (DCNNs) to increase the receptive field without additional learning parameters. For the performance evaluation, we employ a URBAN-SED and self-organized bird sound dataset. The relevant experiments show that our proposed DCGLU model outperforms over other baselines. In particular, our method is shown to exhibit robustness against nature sound noises with three Signal to Noise Ratio (SNR) levels (20 dB, 10 dB and 0 dB).