• Title/Summary/Keyword: Self-Supervised Learning

Self-supervised Learning Method using Heterogeneous Mass Corpus for Sentence Embedding Model (이종의 말뭉치를 활용한 자기 지도 문장 임베딩 학습 방법)

  • Kim, Sung-Ju;Suh, Soo-Bin;Park, Jin-Seong;Park, Sung-Hyun;Jeon, Dong-Hyeon;Kim, Seon-Hoon;Kim, Kyung-Duk;Kang, In-Ho
    • Annual Conference on Human and Language Technology / 2020.10a / pp.32-36 / 2020
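  • Various unsupervised and supervised methods have been studied for building sentence encoders that embed the meaning of a sentence well. Supervised approaches are limited by the difficulty of building a sufficient amount of labeled data. Unsupervised approaches so far, in contrast, have been confined to corpora of a single format and have defined the task as generating or predicting the sentence that follows the current input sentence. This paper proposes a self-supervised learning method that uses, in addition to document-style corpora such as Wikipedia, news, and knowledge encyclopedias, large heterogeneous corpora with varied structure, such as 지식인 (Jisikin) Q&A data and search click logs. When a self-supervised task suited to each corpus type was designed and trained, performance in the unsupervised-model evaluation on the KorSTS dataset improved by about 7 points over the baseline model.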

Self-Supervised Spatiotemporal Learning For Video Using Variable Rotate Angle And Speed Prediction (비디오에서의 다양한 회전 각도와 회전 속도를 사용한 시 공간 자기 지도학습)

  • Kim, Taehoon;Hwang, Wonjun
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2020.07a / pp.732-735 / 2020
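  • Existing supervised learning methods perform well, but training them requires both video data and ground-truth labels. These labels must be attached manually, which costs considerable time and money. To address this problem, we studied applying rotation prediction, one of several self-supervised learning methods, to video data. We propose two methods. In the first, rather than simply rotating the input video as a whole, each frame of the input video is rotated at a constant speed as time passes, and the network classifies the rotation among four angles [0, 90, 180, 270]. In the second, each frame is rotated by a fixed angle as the video progresses, and the network classifies the rotation speed among four values [1x, 0.5x, 0.25x, 0.125x]. After training the network with these proposed pretext tasks, we fine-tuned the trained model, ran video classification experiments, and report the results.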
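
The speed-classification pretext task described in this abstract could be sketched roughly as follows. This is only an illustration under a stated assumption, not the authors' implementation: the abstract does not define how speed maps to rotation, so the sketch assumes that at speed s the rotation advances by one 90-degree step every 1/s frames. The names `SPEEDS`, `rot90`, and `make_speed_sample` are hypothetical.

```python
SPEEDS = [1.0, 0.5, 0.25, 0.125]  # the four speed classes listed in the abstract


def rot90(frame, k):
    """Rotate a 2D frame (list of rows) counterclockwise by k quarter turns."""
    for _ in range(k % 4):
        frame = [list(row) for row in zip(*frame)][::-1]
    return frame


def make_speed_sample(clip, speed_idx):
    """Return (rotated_clip, label) for the speed-classification task.

    Assumption (not from the paper): at speed s the rotation advances by
    one 90-degree step every 1/s frames, so 1x turns on every frame and
    0.125x turns once every 8 frames. The label is the speed class index.
    """
    frames_per_step = int(round(1 / SPEEDS[speed_idx]))
    rotated = [rot90(f, t // frames_per_step) for t, f in enumerate(clip)]
    return rotated, speed_idx


# Toy clip: 16 identical 2x2 "frames" standing in for real video frames.
clip = [[[1, 2], [3, 4]] for _ in range(16)]
rotated, label = make_speed_sample(clip, speed_idx=1)  # 0.5x speed
```

A network trained on such samples would receive `rotated` as input and predict `label`; the angle-classification variant would swap the label for the rotation angle class.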

Traffic Attributes Correlation Mechanism based on Self-Organizing Maps for Real-Time Intrusion Detection (실시간 침입탐지를 위한 자기 조직화 지도(SOM)기반 트래픽 속성 상관관계 메커니즘)

  • Hwang, Kyoung-Ae;Oh, Ha-Young;Lim, Ji-Young;Chae, Ki-Joon;Nah, Jung-Chan
    • The KIPS Transactions:PartC / v.12C no.5 s.101 / pp.649-658 / 2005
  • Because network-based attacks cause extensive damage, it is very important to detect intrusions quickly at an early stage. However, intrusion detection based on supervised learning requires either preprocessing enormous amounts of data or analysis by an administrator. Detecting abnormal traffic this way has two difficulties: the administrator's analysis might be incorrect, and real-time detection might be missed. In this paper, we propose a traffic attribute correlation analysis mechanism based on self-organizing maps (SOM) for real-time intrusion detection. The proposed mechanism has three steps. First, unsupervised learning builds map clusters composed of similar traffic. Second, each map cluster is labeled to divide the map into normal and abnormal traffic; this step uses a rule created through correlation analysis with the SOM. Finally, the mechanism performs real-time detection and is updated gradually. In extensive experiments, the proposed mechanism, which combines unsupervised and supervised learning, performed better at real-time intrusion detection than supervised learning alone.

Domain Adaptation for Opinion Classification: A Self-Training Approach

  • Yu, Ning
    • Journal of Information Science Theory and Practice / v.1 no.1 / pp.10-26 / 2013
  • Domain transfer is a widely recognized problem for machine learning algorithms because models built upon one data domain generally do not perform well in another data domain. This is especially a challenge for tasks such as opinion classification, which often have to deal with insufficient quantities of labeled data. This study investigates the feasibility of self-training in dealing with the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for effectiveness in sparse data situations and feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. Findings of this study suggest that, when there are limited labeled data, self-training is a promising approach for opinion classification, although the contributions vary across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.
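
As a rough illustration of the generic self-training loop this abstract refers to, the sketch below trains on labeled source-domain points and then repeatedly adds confidently pseudo-labeled target-domain points. It is not the study's classifier or feature set: the 1-D features, the nearest-centroid model, and the names `fit_centroids`, `predict`, `self_train`, and `min_margin` are all assumptions chosen to keep the example small.

```python
def fit_centroids(points, labels):
    """Compute the mean of the 1-D feature for each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, margin); margin is the distance gap between the
    two nearest centroids and serves as a crude confidence score."""
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    return dists[0][1], dists[1][0] - dists[0][0]


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Grow the training set with confident pseudo-labels, then refit."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(train_x, train_y)
        confident, rest = [], []
        for x in pool:
            label, margin = predict(centroids, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break  # nothing left that the model is sure about
        for x, label in confident:
            train_x.append(x)
            train_y.append(label)
        pool = [x for x, _ in rest]
    return fit_centroids(train_x, train_y)


# Labeled "source domain" points and unlabeled "target domain" points.
model = self_train([0.0, 1.0, 9.0, 10.0], ["neg", "neg", "pos", "pos"],
                   [2.0, 3.0, 7.0, 8.0, 5.2])
```

The ambiguous point 5.2 never clears the margin threshold, so it is left out of training, which mirrors the usual practice of only trusting high-confidence pseudo-labels.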

A Self-Supervised Detector Scheduler for Efficient Tracking-by-Detection Mechanism

  • Park, Dae-Hyeon;Lee, Seong-Ho;Bae, Seung-Hwan
    • Journal of the Korea Society of Computer and Information / v.27 no.10 / pp.19-28 / 2022
  • In this paper, we propose the Detector Scheduler, which determines the best tracking-by-detection (TBD) mechanism for real-time, highly accurate multi-object tracking (MOT). The Detector Scheduler decides whether to run a detector by measuring the dissimilarity of features between different frames. Because it is difficult to generate ground truth (GT) for training the Detector Scheduler, we also propose a self-supervision method that trains it from tracking results. The proposed self-supervision method generates pseudo labels indicating whether to run a detector when the dissimilarity of the object cardinality or appearance between frames increases. To this end, we propose the Detector Scheduling Loss for training the Detector Scheduler. As a result, the proposed method achieves real-time, highly accurate multi-object tracking by boosting the overall tracking speed while preserving tracking accuracy as much as possible.
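
The pseudo-labeling idea in this abstract might be sketched along the following lines. This is a guess at the general shape only: the abstract does not give the dissimilarity measure or any thresholds, so cosine dissimilarity, the `appearance_thresh` value, and the function names are assumptions, and the Detector Scheduling Loss itself is not shown.

```python
import math


def cosine_dissimilarity(a, b):
    """1 - cosine similarity; treated as maximal (1.0) for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def pseudo_label(prev_count, cur_count, prev_feat, cur_feat,
                 appearance_thresh=0.2):
    """Return 1 (run the detector) or 0 (skip it) from tracking results.

    Assumption: the detector should run when the number of tracked
    objects changes or the appearance features drift past a threshold.
    """
    cardinality_changed = prev_count != cur_count
    appearance_changed = (
        cosine_dissimilarity(prev_feat, cur_feat) > appearance_thresh
    )
    return int(cardinality_changed or appearance_changed)


labels = [
    pseudo_label(3, 3, [1.0, 0.0], [1.0, 0.0]),  # nothing changed
    pseudo_label(3, 4, [1.0, 0.0], [1.0, 0.0]),  # object count changed
    pseudo_label(3, 3, [1.0, 0.0], [0.0, 1.0]),  # appearance changed
]
```

Labels produced this way could then supervise a small scheduler network, which is the role the abstract assigns to its pseudo labels.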

Fuzzy Neural Network Model Using Learning Rules That Fuzzify LVQ (Learning Vector Quantization) (LVQ(Learning Vector Quantization)을 퍼지화한 학습 법칙을 사용한 퍼지 신경회로망 모델)

  • Kim, Yong-Su
    • Proceedings of the Korea Intelligent Information System Society Conference / 2005.05a / pp.186-189 / 2005
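  • This paper proposes new fuzzy learning rules that fuzzify LVQ. Fuzzy LVQ learning rule 1 uses a fuzzy learning rate in place of the conventional learning rate, based on a fuzzification of conditional probability. Fuzzy LVQ learning rule 2 reflects the fact that input vectors lying between classes carry more information about the decision boundary. These new fuzzy learning rules were applied to the improved IAFC (Integrated Adaptive Fuzzy Clustering) neural network, a fuzzy neural network that combines the advantages of the ART-1 (Adaptive Resonance Theory) neural network and Kohonen's Self-Organizing Feature Map. The iris data set was used to compare the proposed supervised IAFC neural network 1 and supervised IAFC neural network 2 with a backpropagation neural network, and supervised IAFC neural network 2 outperformed the backpropagation neural network.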

Modeling of Self-Constructed Clustering and Performance Evaluation (자기-구성 클러스터링의 모델링 및 성능평가)

  • Ryu Jeong woong;Kim Sung Suk;Song Chang kyu;Kim Sung Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.6C / pp.490-496 / 2005
  • In this paper, we propose a self-constructed clustering algorithm based on the inference information of a fuzzy model. The method automatically detects and optimizes the number of clusters and their parameters from input-output data. The proposed method improves clustering performance through an extended supervised learning technique that uses output information as well as input characteristics. To reflect output information in the clustering similarity measure, we use the TSK fuzzy model. Conceptually, we design a learning method that feeds output information back into the clustering, because the proposed algorithm separates the classes in the input data space. Simulations show that the proposed method is more effective than previous ones.

Synthetic Data Generation and Performance Analysis for Anomaly Detection (이상 탐지를 위한 합성 데이터 생성 및 성능 분석)

  • Hwang, Ju-hyo;Jin, Kyo-hong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.19-21 / 2022
  • Anomaly detection using self-supervised learning typically generates synthetic data to learn to classify normal and abnormal samples, and uses real abnormal data as test data to measure anomaly detection performance. One study that uses this approach to generate synthetic data similar to normal data performed anomaly detection by cutting a specific patch from the original image and pasting it elsewhere. With this approach, the degree of similarity to normal data depends on the number and size of patches, which affects anomaly detection performance. In this paper, synthetic data were generated with varying patch sizes and counts, their similarity to normal data was analyzed using a pre-trained model, and anomaly detection performance was measured by training the model.
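
The cut-and-paste generation of synthetic anomalies mentioned in this abstract can be illustrated with a minimal sketch. It assumes an image represented as a 2D list of pixel values and uniformly random source and destination positions; the function name `cut_paste` and its parameters are hypothetical, and the abstract does not specify how the cited study chooses patch positions.

```python
import random


def cut_paste(image, patch_h, patch_w, n_patches=1, rng=None):
    """Copy n_patches rectangles from random positions of a normal image
    and paste each at another random position, returning a new image.

    The patch size and count are the two knobs the abstract says control
    how similar the synthetic sample stays to normal data.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the original untouched
    for _ in range(n_patches):
        sy = rng.randrange(h - patch_h + 1)  # source top-left corner
        sx = rng.randrange(w - patch_w + 1)
        dy = rng.randrange(h - patch_h + 1)  # destination top-left corner
        dx = rng.randrange(w - patch_w + 1)
        for i in range(patch_h):
            for j in range(patch_w):
                out[dy + i][dx + j] = image[sy + i][sx + j]
    return out


# Toy 8x8 "image" with unique pixel values so pasted regions are visible.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
synthetic = cut_paste(image, patch_h=3, patch_w=3, n_patches=2,
                      rng=random.Random(0))
```

In a self-supervised setup, `image` would be labeled normal and `synthetic` abnormal, and a classifier trained on such pairs would later be tested on real abnormal data.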

Abnormal Vibration Diagnostics Algorithm of Rotating Machinery Using Self-Organizing Feature Map and Learning Vector Quantization (자기조직화특징지도와 학습벡터양자화를 이용한 회전기계의 이상진동진단 알고리듬)

  • 양보석;서상윤;임동수;이수종
    • Journal of KSNVE / v.10 no.2 / pp.331-337 / 2000
  • The need to diagnose rotating machinery, which is widely used in industry, is increasing. Much research has been conducted on processing field vibration signal data to diagnose faults in specific machinery. Neural networks, usually trained with the back-propagation algorithm, have been used as the pattern recognition tool for such signals in rotating machinery diagnosis. In this paper, the self-organizing feature map (SOFM), an unsupervised learning algorithm, is used for abnormal defect diagnosis of rotating machinery, and learning vector quantization (LVQ), a supervised learning algorithm, is then used to improve the quality of the classifier's decision regions.
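
To make the supervised refinement step concrete, here is a minimal sketch of the standard LVQ1 update rule: the prototype nearest to a sample moves toward it when their classes match and away from it otherwise. The abstract does not say which LVQ variant or learning rate the paper uses, so LVQ1, `lr=0.1`, the toy prototypes, and the class names are assumptions.

```python
def lvq1_step(prototypes, proto_labels, x, y, lr=0.1):
    """Apply one LVQ1 update in place and return the winning index.

    prototypes   : list of feature vectors (e.g. taken from SOFM nodes)
    proto_labels : class label attached to each prototype
    x, y         : one labeled training sample
    """
    # Squared Euclidean distance from the sample to every prototype.
    dists = [sum((p_i - x_i) ** 2 for p_i, x_i in zip(p, x))
             for p in prototypes]
    k = dists.index(min(dists))
    # Attract the winner if its class matches, repel it otherwise.
    sign = 1.0 if proto_labels[k] == y else -1.0
    prototypes[k] = [p_i + sign * lr * (x_i - p_i)
                     for p_i, x_i in zip(prototypes[k], x)]
    return k


prototypes = [[0.0, 0.0], [1.0, 1.0]]
proto_labels = ["normal", "fault"]
winner = lvq1_step(prototypes, proto_labels, x=[0.2, 0.0], y="normal")
```

Repeating this step over labeled vibration features would gradually shift the prototypes so that the boundaries between classes sit where the labels say they should.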

Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

  • William Xiu Shun Wong;Donghoon Lee;Namgyu Kim
    • Asia Pacific Journal of Information Systems / v.29 no.4 / pp.789-816 / 2019
  • Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on the types of words contained in the document corpus and the types of features generated for classification. Rather than proposing a modified version of an existing algorithm or creating a new one, we attempt to modify how the data are used. Classifier performance is usually affected by the quality of the training data on which the classifier is built. We assume that data from different domains may have different noise characteristics, which can be exploited when training the classifier. We therefore attempt to make the classifier more robust, and thereby improve classification accuracy, by artificially injecting heterogeneous data into the learning process. A semi-supervised approach was applied to use the heterogeneous data when training the document classifier. However, unlabeled data may degrade the performance of the document classifier, so we further propose an algorithm that extracts only the documents that contribute to improving the classifier's accuracy.