• Title/Summary/Keyword: biased data (편향된 데이터)

Search Result 163, Processing Time 0.026 seconds

A Study on Selecting Principle Component Variables Using Adaptive Correlation (적응적 상관도를 이용한 주성분 변수 선정에 관한 연구)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.3
    • /
    • pp.79-84
    • /
    • 2021
  • Processing high-dimensional data requires a feature extraction method that reflects the features well while maintaining the properties of the data. Principal component analysis, which converts high-dimensional data into low-dimensional data and expresses it with fewer variables than the original, is a representative feature extraction method. In this study, we propose a principal component analysis method based on adaptive correlation for selecting principal component variables when extracting features from high-dimensional data. The proposed method analyzes the principal components of the data by adaptively reflecting the correlation among the input data, so that unsuitable variables can be excluded from the candidate list. It analyzes the principal component hierarchy by the eigenvector coefficient values to prevent the selection of principal components low in the hierarchy, and uses correlation analysis to minimize the data duplication that induces data bias. In this way, we propose a method of selecting principal component variables that represent the characteristics of the actual data well by reducing the influence of data bias during selection.
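
The correlation-based redundancy check mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names (`pearson`, `prune_correlated`), the 0.95 threshold, and the toy data are all assumptions made for illustration, and the greedy keep-or-drop rule is only one simple way to avoid selecting near-duplicate variables.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def prune_correlated(columns, threshold=0.95):
    """Greedily keep a variable only if its absolute correlation with
    every variable kept so far is below the (hypothetical) threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is nearly 2 * "a", so it carries duplicated information.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 4.0, 6.2, 8.1, 9.9],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = prune_correlated(columns)
print(selected)  # ['a', 'c']
```

In this sketch the result depends on the order in which variables are visited; a fuller treatment would combine the check with the eigenvector-based ranking the abstract describes.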

Adaptive Face Recognition System Using Genetic Alogrithm (유전 알고리즘을 사용한 환경 적응형 얼굴 인식 시스템)

  • 조병모;전인자;이필규
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.10d
    • /
    • pp.574-576
    • /
    • 2002
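  • When performing recognition on 2D images, the quality of the input image is a very important factor. In particular, in tasks such as face recognition, where real-time input data is compared against previously registered data, even a good algorithm struggles to achieve high performance if the condition of the input image differs greatly from that of the registered image. An adaptive method is therefore needed that raises overall performance by bringing the test input image close to the quality of the registered image. In this paper, we use a genetic algorithm to generate, from a single sample image, an optimal combination of filters and a feature extraction mask for removing environment-dependent factors, and we run recognition tests with them. In an experiment with synthetic biased-illumination noise added, the recognition rate improved from about 25% before evolution to about 92% after evolution; in an experiment with random impulse noise, it improved from about 47% before evolution to about 84% after evolution.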


Dynamic Sampling Scheduler for Unbalanced Data Classification (불균형 범주 분류를 위한 동적 샘플링 스케줄러)

  • Seong, Su-Jin;Park, Won-Joo;Lee, Yong-Tae;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology
    • /
    • 2021.10a
    • /
    • pp.221-226
    • /
    • 2021
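  • To address the class-imbalance classification problem, we propose a scheduling method that switches between class-size-based batch sampling methods during training. As per-class sampling probabilities, we applied the reciprocal of the class size (LWRS-Reciporcal) and the additive inverse (반수) of the class ratio (LWRS-Ratio) in separate experiments, and confirmed that the LWRS-Reciporcal method is more effective at improving F1 performance. In addition, to mitigate the further bias that a fixed sampling probability can introduce, we designed a scheduling method that switches the sampling method during training. As a result, when the sampling method was switched according to whether the validation performance had been updated, the sum of f1 score and accuracy improved over the baseline by 0.7% on the naver shopping dataset and 0.8% on KLUE-TC, the best performance among the settings tested.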
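
The reciprocal-of-class-size sampling probability described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `reciprocal_class_probs` and `next_sampler` are hypothetical names, the "uniform" alternative sampler is an assumption (the abstract does not say which method is switched to), and the toggle-on-no-improvement rule is only one plausible reading of switching "according to whether the validation performance had been updated".

```python
from collections import Counter


def reciprocal_class_probs(labels):
    """Per-class sampling probability proportional to 1 / class size."""
    counts = Counter(labels)
    weights = {c: 1.0 / n for c, n in counts.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def next_sampler(current, improved):
    """Hypothetical switching rule: keep the current sampler while the
    validation score improves, otherwise toggle to the other one."""
    if improved:
        return current
    return "uniform" if current == "reciprocal" else "reciprocal"


# Toy imbalanced label set: 90 majority-class and 10 minority-class samples.
labels = ["pos"] * 90 + ["neg"] * 10
probs = reciprocal_class_probs(labels)
print({c: round(p, 3) for c, p in probs.items()})  # {'pos': 0.1, 'neg': 0.9}
```

With these weights the minority class is drawn nine times as often as the majority class at the class level, which is the kind of fixed skew that a switching schedule is meant to temper.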


A Case Study of Data Editing for the Korean Housing Price Survey (주택가격동향조사를 위한 데이터편집 사례연구)

  • Park, Jin-Woo;Park, Hyun-Joo;Kim, Jin-Eok
    • Survey Research
    • /
    • v.6 no.1
    • /
    • pp.83-98
    • /
    • 2005
  • A large-scale survey database may contain erroneous or missing data. Incomplete or erroneous data may be produced during data collection or data capture. Since erroneous data can cause bias and inconsistency, data editing, the procedure for detecting and adjusting individual errors in data records, is very important work in a statistical survey. In this paper, we introduce an editing process for the housing price survey to encourage discussion of the topic. We explain how to decide on appropriate edit rules and show some related data. Furthermore, we describe input editing procedures that are appropriate for an on-line survey, and how to find and eliminate erroneous data through output editing.


A Vision Transformer Based Recommender System Using Side Information (부가 정보를 활용한 비전 트랜스포머 기반의 추천시스템)

  • Kwon, Yujin;Choi, Minseok;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.3
    • /
    • pp.119-137
    • /
    • 2022
  • Recent recommendation system studies apply various deep learning models to better represent user and item interactions. One noteworthy study is ONCF (Outer product-based Neural Collaborative Filtering), which builds a two-dimensional interaction map via an outer product and employs a CNN (Convolutional Neural Network) to learn high-order correlations from the map. However, ONCF has limitations in recommendation performance due to the problems with CNN and the absence of side information. ONCF using CNN has an inductive bias problem that causes poor performance on data whose distribution does not appear in the training data. This paper proposes to employ a Vision Transformer (ViT) instead of the vanilla CNN used in ONCF, because ViT has shown better results than state-of-the-art CNNs in many image classification cases. In addition, we propose a new architecture to reflect side information that ONCF did not consider. Unlike previous studies that reflect side information in a neural network using simple input combination methods, this study uses an independent auxiliary classifier to reflect side information more effectively in the recommender system. ONCF used a single latent vector each for user and item, but in this study a channel is constructed from multiple vectors so that the model can learn more diverse representations and obtain an ensemble effect. The experiments showed that our deep learning model improved recommendation performance compared to ONCF.
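
The two-dimensional interaction map built via an outer product, as described in this abstract, can be sketched in a few lines. The function name `outer_product` and the embedding values are hypothetical; in practice the vectors would be learned embeddings, and the resulting map would be fed to a CNN or ViT.

```python
def outer_product(user_vec, item_vec):
    """Build the interaction map: entry (i, j) is user_vec[i] * item_vec[j]."""
    return [[u * v for v in item_vec] for u in user_vec]


# Toy embeddings (made-up values, not learned).
user_vec = [1.0, 2.0]
item_vec = [3.0, 4.0, 5.0]

interaction_map = outer_product(user_vec, item_vec)
print(interaction_map)  # [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
```

Stacking several such maps, one per pair of user and item vectors, would give a multi-channel input of the kind the abstract refers to.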

Characteristics of Propulsion System at the High Altitude Flight Test of 50m-long Airship (50m급 비행선의 고고도 비행시험에서 추진시스템 특성)

  • Jung Yong-Wun;Yang Soo-Seok;Kim Dong-Min
    • Proceedings of the Korean Society of Propulsion Engineers Conference
    • /
    • 2006.05a
    • /
    • pp.41-44
    • /
    • 2006
  • The propulsion system of the VIA-50A airship consists of an engine, generator, inverter, motor, and propeller. The motor and propeller were designed to tilt up to $120^{\circ}$ for thrust vector control. During the flight test, various condition data of the airship were obtained by wireless telecommunication and analyzed in real time. In this paper, we present the flight test results of the propulsion system. Considering the design requirements and normal ranges, we verified that all constituent parts operated in normal condition during the high altitude flight test.


A Design and Implementation of Missing Person Identification System using face Recognition

  • Shin, Jong-Hwan;Park, Chan-Mi;Lee, Heon-Ju;Lee, Seoung-Hyeon;Lee, Jae-Kwang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.2
    • /
    • pp.19-25
    • /
    • 2021
  • This paper proposes a method of finding missing persons based on face-recognition technology and deep learning. A real-time face-recognition technology was developed that performs face verification and improves the accuracy of face identification through data augmentation for face recognition and convolutional neural network (CNN)-based image learning, after pre-processing images transmitted from a mobile device. In identifying a missing person's image using the implemented system, the model trained on both original and blur-processed data performed best. Further, a model using the pre-trained Noisy Student outperformed one without it, but had the limitation of producing high bias and variance.

A Clustering-based Undersampling Method to Prevent Information Loss from Text Data (텍스트 데이터의 정보 손실을 방지하기 위한 군집화 기반 언더샘플링 기법)

  • Jong-Hwi Kim;Saim Shin;Jin Yea Jang
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.251-256
    • /
    • 2022
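  • Class imbalance causes a classification model to be trained with a bias toward the majority class, degrading classification performance on minority classes. Undersampling is a representative remedy that reduces the number of majority-class samples to balance them with the minority class, but existing undersampling studies in the text domain have applied only relatively simple techniques such as word embeddings and random sampling. In this paper, we propose an undersampling method that minimizes the information loss of text data through transformer-based sentence embeddings and clustering-based sampling. To validate the proposed method, in a sentiment analysis experiment we trained models on training sets extracted with the proposed method and with random sampling, and compared their performance. The model using the proposed method showed classification accuracy higher than the model using random sampling by at least 0.2% and at most 2.0%, confirming the effectiveness of the proposed clustering-based undersampling technique.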
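
The sampling half of a clustering-based undersampling scheme can be sketched as below. This is a simplified illustration, not the paper's method: it assumes cluster assignments are already available (for example from clustering sentence embeddings with any algorithm), and the function name `cluster_undersample`, the proportional quota rule, and the toy data are all assumptions.

```python
import random
from collections import defaultdict


def cluster_undersample(items, cluster_ids, target_size, seed=0):
    """Draw roughly target_size majority-class items, spreading the draws
    across clusters in proportion to cluster size so that every cluster
    stays represented. Rounding means the total can differ slightly."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for item, cid in zip(items, cluster_ids):
        clusters[cid].append(item)

    total = len(items)
    sample = []
    for cid in sorted(clusters):
        members = clusters[cid]
        quota = max(1, round(target_size * len(members) / total))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Toy majority class: 100 items in three clusters of size 60, 30, and 10.
items = list(range(100))
cluster_ids = [0] * 60 + [1] * 30 + [2] * 10
sample = cluster_undersample(items, cluster_ids, target_size=10)
print(len(sample))  # 10
```

Compared with drawing 10 items uniformly at random, this keeps at least one item from the smallest cluster, which is the intuition behind limiting information loss.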


Vibration Anomaly Detection of One-Class Classification using Multi-Column AutoEncoder

  • Sang-Min, Kim;Jung-Mo, Sohn
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.2
    • /
    • pp.9-17
    • /
    • 2023
  • In this paper, we propose a one-class vibration anomaly detection system for bearing defect diagnosis. To reduce the economic and time loss caused by bearing failure, an accurate defect diagnosis system is essential, and deep learning-based defect diagnosis systems are widely studied to solve the problem. However, abnormal data is difficult to obtain in the actual data collection environment for deep learning training, which causes data bias. Therefore, a one-class classification method using only normal data is used. In the general approach, the characteristics of vibration data are extracted by learning the compression and restoration process through an AutoEncoder, and anomaly detection is performed by training a one-class classifier on the extracted features. However, this method cannot efficiently extract the characteristics of the vibration data because it does not consider their frequency characteristics. To solve this problem, we propose an AutoEncoder model that considers the frequency characteristics of vibration data. For classification performance, an accuracy of 0.910, precision of 1.0, recall of 0.820, and f1-score of 0.901 were obtained. The network design considering the vibration characteristics performed better than existing methods.
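
As a quick consistency check on the reported metrics, the f1-score is the harmonic mean of precision and recall. The short sketch below (the function name `f1_score` is just an illustrative helper, not a library call) recomputes it from the precision and recall quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(1.0, 0.820)
print(round(f1, 3))  # 0.901
```

The recomputed value agrees with the f1-score of 0.901 stated in the abstract.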

A Study on Building Knowledge Base for Intelligent Battlefield Awareness Service

  • Jo, Se-Hyeon;Kim, Hack-Jun;Jin, So-Yeon;Lee, Woo-Sin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.4
    • /
    • pp.11-17
    • /
    • 2020
  • In this paper, we propose a method to build a knowledge base based on natural language processing for an intelligent battlefield awareness service. The current command and control system manages and utilizes the collected battlefield information and tactical data at a basic level such as registration, storage, and sharing, while information fusion and situation analysis are performed by an analyst. Because of the analyst's time constraints and cognitive limitations, generally only one interpretation is drawn, and biased thinking can be reflected. Therefore, it is essential for the command and control system to be aware of the battlefield situation and to establish an intelligent decision support system. To do this, it is necessary to build a knowledge base specialized for the command and control system and to develop intelligent battlefield awareness services based on it. In this paper, among the entity names suggested in the exobrain corpus, which is private data, the top 250 types of meaningful names were applied, and the weapon system entity type was additionally identified to properly represent battlefield information. Based on this, we propose a way to build a battlefield-aware knowledge base through mention extraction, cross-reference resolution, and relationship extraction.