• Title/Summary/Keyword: 학습 데이터

Search Result 6,405, Processing Time 0.031 seconds

A Study on Collecting and Structuring Language Resource for Named Entity Recognition and Relation Extraction from Biomedical Abstracts (생의학 분야 학술 논문에서의 개체명 인식 및 관계 추출을 위한 언어 자원 수집 및 통합적 구조화 방안 연구)

  • Kang, Seul-Ki;Choi, Yun-Soo;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.51 no.4
    • /
    • pp.227-248
    • /
    • 2017
  • This paper introduces an integrated model for systematically constructing a linguistic resource database that can be used by machine learning-based biomedical information extraction systems. The proposed method suggests an orderly process of collecting and constructing dictionaries and training sets for both named-entity recognition and relation extraction. Multiple heterogeneous structures for the resources which are collected from diverse sources are analyzed to derive essential items and fields for constructing the integrated database. All the collected resources are converted and refined to build an integrated linguistic resource storage. In this paper, we constructed entity dictionaries of gene, protein, disease and drug, which are considered core linguistic elements or core named entities in the biomedical domains and conducted verification tests to measure their acceptability.

An Implementation of Federated Learning based on Blockchain (블록체인 기반의 연합학습 구현)

  • Park, June Beom;Park, Jong Sou
    • The Journal of Bigdata
    • /
    • v.5 no.1
    • /
    • pp.89-96
    • /
    • 2020
  • Deep learning using an artificial neural network has been recently researched and developed in various fields such as image recognition, big data and data analysis. However, federated learning has emerged to solve issues of data privacy invasion and problems that increase the cost and time required to learn. Federated learning presented learning techniques that would bring the benefits of distributed processing system while solving the problems of existing deep learning, but there were still problems with server-client system and motivations for providing learning data. So, we replaced the role of the server with a blockchain system in federated learning, and conducted research to solve the privacy and security problems that are associated with federated learning. In addition, we have implemented a blockchain-based system that motivates users by paying compensation for data provided by users, and requires less maintenance costs while maintaining the same accuracy as existing learning. In this paper, we present the experimental results to show the validity of the blockchain-based system, and compare the results of the existing federated learning with the blockchain-based federated learning. In addition, as a future study, we ended the thesis by presenting solutions to security problems and applicable business fields.

Effect of Application of Ensemble Method on Machine Learning with Insufficient Training Set in Developing Automated English Essay Scoring System (영작문 자동채점 시스템 개발에서 학습데이터 부족 문제 해결을 위한 앙상블 기법 적용의 효과)

  • Lee, Gyoung Ho;Lee, Kong Joo
    • Journal of KIISE
    • /
    • v.42 no.9
    • /
    • pp.1124-1132
    • /
    • 2015
  • In order to train a supervised machine learning algorithm, it is necessary to have non-biased labels and a sufficient amount of training data. However, it is difficult to collect the required non-biased labels and a sufficient amount of training data to develop an automatic English Composition scoring system. In addition, an English writing assessment is carried out using a multi-faceted evaluation of the overall level of the answer. Therefore, it is difficult to choose an appropriate machine learning algorithm for such work. In this paper, we show that it is possible to alleviate these problems through ensemble learning. The results of the experiment indicate that the ensemble technique exhibited an overall performance that was better than that of other algorithms.

MBTI-Based Learning Types Design Using Machine Learning (머신러닝을 활용한 MBTI 기반 학습유형설계)

  • Oh, Sumin;Sohn, Seoyoung;Yang, Hyeseong;Park, Minseo
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.6
    • /
    • pp.207-213
    • /
    • 2022
  • MBTI(Myer Briggs Type Indicator) is an effective personality type test to intuitively identify and classify people's tendencies. Accordingly, there are active attempts to apply MBTI to the learning area, but research on creating new learning types using MBTI is insufficient. Therefore, this paper examines the factors that affect learning and implements new learning types MY,STI(MY, Study Type Indicator) by applying them to a machine learning algorithm that has these characteristics. Data were collected by conducting a learning type test made with Google Forms on 144 general people, and supervised learning was used during machine learning. As a result, the accuracies of MY,STI were 0.933, 0.866, 0.844, and 0.733 for each learning method, learning motivation, presence or absence of external stimulus, and learning time criteria, respectively.

Federated Learning Privacy Invasion Study in Batch Situation Using Gradient-Based Restoration Attack (그래디언트 기반 재복원공격을 활용한 배치상황에서의 연합학습 프라이버시 침해연구)

  • Jang, Jinhyeok;Ryu, Gwonsang;Choi, Daeseon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.5
    • /
    • pp.987-999
    • /
    • 2021
  • Recently, Federated learning has become an issue due to privacy invasion caused by data. Federated learning is safe from privacy violations because it does not need to be collected into a server and does not require learning data. As a result, studies on application methods for utilizing distributed devices and data are underway. However, Federated learning is no longer safe as research on the reconstruction attack to restore learning data from gradients transmitted in the Federated learning process progresses. This paper is to verify numerically and visually how well data reconstruction attacks work in various data situations. Considering that the attacker does not know how the data is constructed, divide the data with the class from when only one data exists to when multiple data are distributed within the class, and use MNIST data as an evaluation index that is MSE, LOSS, PSNR, and SSIM. The fact is that the more classes and data, the higher MSE, LOSS, and PSNR and SSIM are, the lower the reconstruction performance, but sufficient privacy invasion is possible with several reconstructed images.

A Study on Designing Metadata Standard for Building AI Training Dataset of Landmark Images (랜드마크 이미지 AI 학습용 데이터 구축을 위한 메타데이터 표준 설계 방안 연구)

  • Kim, Jinmook
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.54 no.2
    • /
    • pp.419-434
    • /
    • 2020
  • The purpose of the study is to design and propose metadata standard for building AI training dataset of landmark images. In order to achieve the purpose, we first examined and analyzed the state of art of the types of image retrieval systems and their indexing methods, comprehensively. We then investigated open training dataset and machine learning tools for image object recognition. Sequentially, we selected metadata elements optimized for the AI training dataset of landmark images and defined the input data for each element. We then concluded the study with implications and suggestions for the development of application services using the results of the study.

Numerical Reasoning Dataset Augmentation Using Large Language Model and In-Context Learning (대규모 언어 모델 및 인컨텍스트 러닝을 활용한 수치 추론 데이터셋 증강)

  • Yechan Hwang;Jinsu Lim;Young-Jun Lee;Ho-Jin Choi
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.203-208
    • /
    • 2023
  • 본 논문에서는 대규모 언어 모델의 인컨텍스트 러닝과 프롬프팅을 활용하여 수치 추론 태스크 데이터셋을 효과적으로 증강시킬 수 있는 방법론을 제안한다. 또한 모델로 하여금 수치 추론 데이터의 이해를 도울 수 있는 전처리와 요구사항을 만족하지 못하는 결과물을 필터링 하는 검증 단계를 추가하여 생성되는 데이터의 퀄리티를 보장하고자 하였다. 이렇게 얻어진 증강 절차를 거쳐 증강을 진행한 뒤 추론용 모델 학습을 통해 다른 증강 방법론보다 우리의 방법론으로 증강된 데이터셋으로 학습된 모델이 더 높은 성능을 낼 수 있음을 보였다. 실험 결과 우리의 증강 데이터로 학습된 모델은 원본 데이터로 학습된 모델보다 모든 지표에서 2%p 이상의 성능 향상을 보였으며 다양한 케이스를 통해 우리의 모델이 수치 추론 학습 데이터의 다양성을 크게 향상시킬 수 있음을 확인하였다.

  • PDF

Anomaly Detection In Real Power Plant Vibration Data by MSCRED Base Model Improved By Subset Sampling Validation (Subset 샘플링 검증 기법을 활용한 MSCRED 모델 기반 발전소 진동 데이터의 이상 진단)

  • Hong, Su-Woong;Kwon, Jang-Woo
    • Journal of Convergence for Information Technology
    • /
    • v.12 no.1
    • /
    • pp.31-38
    • /
    • 2022
  • This paper applies an expert independent unsupervised neural network learning-based multivariate time series data analysis model, MSCRED(Multi-Scale Convolutional Recurrent Encoder-Decoder), and to overcome the limitation, because the MCRED is based on Auto-encoder model, that train data must not to be contaminated, by using learning data sampling technique, called Subset Sampling Validation. By using the vibration data of power plant equipment that has been labeled, the classification performance of MSCRED is evaluated with the Anomaly Score in many cases, 1) the abnormal data is mixed with the training data 2) when the abnormal data is removed from the training data in case 1. Through this, this paper presents an expert-independent anomaly diagnosis framework that is strong against error data, and presents a concise and accurate solution in various fields of multivariate time series data.

A Supervised Learning Framework for Physics-based Controllers Using Stochastic Model Predictive Control (확률적 모델예측제어를 이용한 물리기반 제어기 지도 학습 프레임워크)

  • Han, Daseong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.27 no.1
    • /
    • pp.9-17
    • /
    • 2021
  • In this paper, we present a simple and fast supervised learning framework based on model predictive control so as to learn motion controllers for a physic-based character to track given example motions. The proposed framework is composed of two components: training data generation and offline learning. Given an example motion, the former component stochastically controls the character motion with an optimal controller while repeatedly updating the controller for tracking the example motion through model predictive control over a time window from the current state of the character to a near future state. The repeated update of the optimal controller and the stochastic control make it possible to effectively explore various states that the character may have while mimicking the example motion and collect useful training data for supervised learning. Once all the training data is generated, the latter component normalizes the data to remove the disparity for magnitude and units inherent in the data and trains an artificial neural network with a simple architecture for a controller. The experimental results for walking and running motions demonstrate how effectively and fast the proposed framework produces physics-based motion controllers.

Learning Behavior Analysis of Bayesian Algorithm Under Class Imbalance Problems (클래스 불균형 문제에서 베이지안 알고리즘의 학습 행위 분석)

  • Hwang, Doo-Sung
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.6
    • /
    • pp.179-186
    • /
    • 2008
  • In this paper we analyse the effects of Bayesian algorithm in teaming class imbalance problems and compare the performance evaluation methods. The teaming performance of the Bayesian algorithm is evaluated over the class imbalance problems generated by priori data distribution, imbalance data rate and discrimination complexity. The experimental results are calculated by the AUC(Area Under the Curve) values of both ROC(Receiver Operator Characteristic) and PR(Precision-Recall) evaluation measures and compared according to imbalance data rate and discrimination complexity. In comparison and analysis, the Bayesian algorithm suffers from the imbalance rate, as the same result in the reported researches, and the data overlapping caused by discrimination complexity is the another factor that hampers the learning performance. As the discrimination complexity and class imbalance rate of the problems increase, the learning performance of the AUC of a PR measure is much more variant than that of the AUC of a ROC measure. But the performances of both measures are similar with the low discrimination complexity and class imbalance rate of the problems. The experimental results show 4hat the AUC of a PR measure is more proper in evaluating the learning of class imbalance problem and furthermore gets the benefit in designing the optimal learning model considering a misclassification cost.