• Title/Summary/Keyword: 데이터편향 (data bias)

Search Result 169

Adaptive Face Recognition System Using Genetic Algorithm (유전 알고리즘을 사용한 환경 적응형 얼굴 인식 시스템)

  • 조병모;전인자;이필규
    • Proceedings of the Korean Information Science Society Conference / 2002.10d / pp.574-576 / 2002
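  • When recognition is performed on 2D images, the quality of the input image is a critical factor. In particular, when real-time input data is compared against previously registered data, as in face recognition, even a good algorithm struggles to achieve high performance if the conditions of the input image and the registered image differ greatly. An adaptive method is therefore needed that raises overall performance by bringing the test input image close to the level of the registered images. In this paper, we use a genetic algorithm to generate, from a single sample image, the optimal filter combination and feature extraction mask for removing environment-dependent factors, and we run recognition tests with them. In an experiment with synthetic biased-illumination noise added, the recognition rate improved from about 25% before evolution to about 92% after evolution. In an experiment with random impulse noise, it improved from about 47% before evolution to about 84% after evolution.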

Dynamic Sampling Scheduler for Unbalanced Data Classification (불균형 범주 분류를 위한 동적 샘플링 스케줄러)

  • Seong, Su-Jin;Park, Won-Joo;Lee, Yong-Tae;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology / 2021.10a / pp.221-226 / 2021
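  • To address class-imbalanced classification, we propose a scheduling method that switches between class-size-based batch sampling methods during training. We ran experiments using, as the per-class sampling probability, the reciprocal of the class size (LWRS-Reciprocal) and the opposite (반수) of the class ratio (LWRS-Ratio), and confirmed that LWRS-Reciprocal is more effective at improving F1. In addition, to mitigate the further bias that a fixed sampling probability can introduce, we designed a scheduling method that switches the sampling method during training. As a result, when the sampling method was switched according to whether the validation performance had been updated, the sum of F1 score and accuracy improved over the baseline by 0.7% on the naver shopping dataset and by 0.8% on KLUE-TC, which was the best performance observed.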
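
The reciprocal-of-class-size sampling described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `reciprocal_class_weights` and `sample_batch` and the toy labels are hypothetical, and the scheduler that switches sampling methods is not shown.

```python
from collections import Counter
import random


def reciprocal_class_weights(labels):
    """Per-sample probabilities proportional to 1 / (size of the sample's class)."""
    counts = Counter(labels)
    raw = [1.0 / counts[y] for y in labels]
    total = sum(raw)
    return [w / total for w in raw]


def sample_batch(examples, labels, batch_size, rng):
    """Draw a batch with replacement using the reciprocal class-size weights."""
    probs = reciprocal_class_weights(labels)
    return rng.choices(examples, weights=probs, k=batch_size)


# Toy data (hypothetical): 8 samples of class "a", 2 samples of class "b".
labels = ["a"] * 8 + ["b"] * 2
examples = list(range(len(labels)))
probs = reciprocal_class_weights(labels)

# Total probability mass per class; each class ends up with the same share.
class_mass = {
    c: sum(p for p, y in zip(probs, labels) if y == c) for c in sorted(set(labels))
}
batch = sample_batch(examples, labels, batch_size=4, rng=random.Random(0))
```

With these weights every class receives the same total probability mass regardless of its size, which is why a fixed setting of this kind can over-correct and introduce a bias of its own.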

A Case Study of Data Editing for the Korean Housing Price Survey (주택가격동향조사를 위한 데이터편집 사례연구)

  • Park, Jin-Woo;Park, Hyun-Joo;Kim, Jin-Eok
    • Survey Research / v.6 no.1 / pp.83-98 / 2005
  • Large-scale survey databases may contain erroneous or missing data. Incomplete or erroneous data may be produced during data collection or data capture. Because erroneous data can cause bias and inconsistency, data editing, the procedure for detecting and adjusting individual errors in data records, is an important part of a statistical survey. In this paper, we introduce an editing process for the housing price survey to encourage discussion of the topic. We explain how to choose appropriate edit rules and show related data. Furthermore, we describe input editing procedures that are appropriate for online surveys, and how to find and eliminate erroneous data through output editing.
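
To make the idea of edit rules concrete, here is a small sketch of rule-based checks on survey records. The field names (`price`, `prev_price`, `area_m2`), the thresholds, and the records are all hypothetical and are not taken from the housing price survey; the sketch only shows the general shape of flagging records that violate a rule.

```python
def price_is_positive(rec):
    """Range edit: a reported price must be present and positive."""
    return rec["price"] is not None and rec["price"] > 0


def area_in_range(rec):
    """Range edit: floor area must fall inside an assumed plausible interval."""
    return rec["area_m2"] is not None and 10 <= rec["area_m2"] <= 500


def change_is_plausible(rec):
    """Consistency edit: price may not move more than 50% from the previous wave."""
    if not rec["price"] or not rec["prev_price"]:
        return True  # a missing value is reported by the range edits instead
    return abs(rec["price"] / rec["prev_price"] - 1) <= 0.5


EDIT_RULES = {
    "price_is_positive": price_is_positive,
    "area_in_range": area_in_range,
    "change_is_plausible": change_is_plausible,
}


def violated_rules(rec, rules=EDIT_RULES):
    """Return the names of all edit rules that the record fails."""
    return [name for name, rule in rules.items() if not rule(rec)]


# Hypothetical records: one clean, one with a missing price, one with a jump.
records = [
    {"id": 1, "price": 300, "prev_price": 290, "area_m2": 84},
    {"id": 2, "price": None, "prev_price": 310, "area_m2": 59},
    {"id": 3, "price": 3000, "prev_price": 300, "area_m2": 84},
]

flagged = {}
for rec in records:
    failures = violated_rules(rec)
    if failures:
        flagged[rec["id"]] = failures
```

Flagged records would then be reviewed or corrected rather than silently dropped; how that adjustment is done is outside this sketch.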

A Vision Transformer Based Recommender System Using Side Information (부가 정보를 활용한 비전 트랜스포머 기반의 추천시스템)

  • Kwon, Yujin;Choi, Minseok;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.119-137 / 2022
  • Recent recommendation system studies apply various deep learning models to better represent user and item interactions. One noteworthy study is ONCF (Outer product-based Neural Collaborative Filtering), which builds a two-dimensional interaction map via an outer product and employs a CNN (Convolutional Neural Network) to learn high-order correlations from the map. However, ONCF has limitations in recommendation performance due to the problems with CNNs and the absence of side information. ONCF using a CNN has an inductive bias problem that causes poor performance on data whose distribution does not appear in the training data. This paper proposes to employ a Vision Transformer (ViT) instead of the vanilla CNN used in ONCF, because ViT has shown better results than state-of-the-art CNNs in many image classification cases. In addition, we propose a new architecture to reflect side information that ONCF did not consider. Unlike previous studies that reflect side information in a neural network using simple input combination methods, this study uses an independent auxiliary classifier to reflect side information more effectively in the recommender system. ONCF used a single latent vector each for the user and the item, but in this study, a channel is constructed using multiple vectors to enable the model to learn more diverse representations and to obtain an ensemble effect. The experiments showed that our deep learning model improved recommendation performance compared to ONCF.
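
The two-dimensional interaction map mentioned in this abstract can be sketched with NumPy. The embedding size, the random vectors, and the way several vectors are paired into channels are assumptions made for illustration; the abstract does not specify how the paper constructs its channels.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 8  # embedding dimension (hypothetical)

# Single latent vector per user and item, as in the ONCF description:
# the outer product gives a k x k map of pairwise embedding interactions.
user_vec = rng.normal(size=k)
item_vec = rng.normal(size=k)
interaction_map = np.outer(user_vec, item_vec)  # shape (k, k)

# Multi-vector variant (assumed pairing): c user vectors and c item vectors,
# with the i-th user vector paired to the i-th item vector, stacked as channels.
c = 3
user_vecs = rng.normal(size=(c, k))
item_vecs = rng.normal(size=(c, k))
channels = np.einsum("ci,cj->cij", user_vecs, item_vecs)  # shape (c, k, k)
```

The resulting `(c, k, k)` array has the same layout as a small multi-channel image, which is what makes an image model such as a CNN or ViT applicable to it.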

Exploration of Types and Context of Errors in the Weather Data Analysis Process (기상 데이터 분석 과정에서 나타나는 오류의 유형과 맥락 탐색)

  • Seok-Young Hong
    • Journal of the Korean Society of Earth Science Education / v.17 no.2 / pp.153-167 / 2024
  • This study explored the errors that occurred during high school students' data analysis processes and the contexts in which they occurred. For the study, 222 data inquiry reports produced by 74 students from 'A' High School were collected, and the detailed error types were examined across the data analysis process: data collection and preprocessing, data representation, and data interpretation. The results showed that, in the data interpretation process, students had a somewhat insufficient understanding of the seasonal variations and periodic patterns of weather elements. In the data representation process, various types of errors were identified, involving the basic units in graphs, legend settings, and trend lines. The causes of these errors include the features of authoring tools, misconceptions related to weather elements, and cognitive biases. Based on the results, educational implications were derived for big data education, a significant topic in future science education, and related follow-up studies were suggested.

Characteristics of Propulsion System at the High Altitude Flight Test of 50m-long Airship (50m급 비행선의 고고도 비행시험에서 추진시스템 특성)

  • Jung Yong-Wun;Yang Soo-Seok;Kim Dong-Min
    • Proceedings of the Korean Society of Propulsion Engineers Conference / 2006.05a / pp.41-44 / 2006
  • The propulsion system of the VIA-50A airship consists of an engine, generator, inverter, motor, and propeller. The motor and propeller were designed so that they can be tilted to $120^{\circ}$ for thrust vector control. During the flight test, various condition data of the airship were obtained by wireless telecommunication and analyzed in real time. In this paper, we present the flight test results for the propulsion system. Considering the design requirements and normal ranges, we verified that all components operated under normal conditions during the high-altitude flight test.

A Design and Implementation of Missing Person Identification System Using Face Recognition

  • Shin, Jong-Hwan;Park, Chan-Mi;Lee, Heon-Ju;Lee, Seoung-Hyeon;Lee, Jae-Kwang
    • Journal of the Korea Society of Computer and Information / v.26 no.2 / pp.19-25 / 2021
  • This paper proposes a method of finding missing persons based on face-recognition technology and deep learning. A real-time face-recognition technology was developed that performs face verification and improves the accuracy of face identification through data augmentation for face recognition and convolutional neural network (CNN)-based image learning, after pre-processing of images transmitted from a mobile device. In identifying a missing person's image using the implemented system, the model that learned both original and blur-processed data performed the best. Further, a model using the pre-trained Noisy Student outperformed one that did not use it, but had the limitation of producing high bias and variance.

A Clustering-based Undersampling Method to Prevent Information Loss from Text Data (텍스트 데이터의 정보 손실을 방지하기 위한 군집화 기반 언더샘플링 기법)

  • Jong-Hwi Kim;Saim Shin;Jin Yea Jang
    • Annual Conference on Human and Language Technology / 2022.10a / pp.251-256 / 2022
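  • Class imbalance causes a classification model to be trained with a bias toward the majority class, which degrades classification performance on the minority class. Undersampling, which reduces the amount of majority-class data to balance it against the minority class, is a representative remedy for imbalance. However, prior undersampling work in the text domain has applied only relatively simple techniques such as word embeddings and random sampling. In this paper, we propose an undersampling method that minimizes information loss in text data by using transformer-based sentence embeddings and clustering-based sampling. To validate the method, we trained models in a sentiment analysis experiment on training sets drawn by the proposed method and by random sampling, and compared their performance. Models using the proposed method achieved classification accuracy between 0.2% and 2.0% higher than models using random sampling, confirming the effectiveness of the proposed clustering-based undersampling technique.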
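
One way to picture clustering-based undersampling is the sketch below. It is not the paper's procedure: random vectors stand in for transformer sentence embeddings, the clustering is a plain NumPy k-means, and the selection rule (keep the majority-class sample closest to each centroid) is an assumption, as are the names `kmeans_centers` and `cluster_undersample`.

```python
import numpy as np


def kmeans_centers(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every sample to every center: shape (n, k).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers


def cluster_undersample(x_major, n_keep, seed=0):
    """Indices of at most n_keep majority samples, one nearest to each centroid."""
    centers = kmeans_centers(x_major, n_keep, seed=seed)
    dists = ((x_major[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest_sample_per_center = dists.argmin(axis=0)
    return np.unique(nearest_sample_per_center)


# Stand-in for sentence embeddings of 200 majority-class texts (hypothetical).
rng = np.random.default_rng(1)
majority_embeddings = rng.normal(size=(200, 16))
kept = cluster_undersample(majority_embeddings, n_keep=20)
```

Compared with random sampling, picking one representative per cluster spreads the retained samples across the embedding space, which is the intuition behind limiting information loss.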

On the Performance Biases Arising from Inconsistencies in Evaluation Methodologies of Deepfake Detection Models (딥페이크 탐지 모델의 검증 방법론 불일치에 따른 성능 편향 분석 연구)

  • Hyunjoon Kim;Hong Eun Ahn;Leo Hyun Park;Taekyoung Kwon
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.5 / pp.885-893 / 2024
  • As deepfake technology advances, its increasing misuse has spurred extensive research into detection models. In existing studies, the performance evaluations of these models, which involve selecting training and test datasets, data preprocessing, and data augmentation, are often compromised by arbitrarily chosen validation methodologies. This leads to biases under standardized conditions. This paper reviews these methodologies to pinpoint what diminishes evaluation reliability. Experiments in standardized environments reveal how difficult it is to compare performance in absolute terms. The findings highlight the need for a consistent validation methodology to boost evaluation reliability and enable fair comparisons.

Vibration Anomaly Detection of One-Class Classification using Multi-Column AutoEncoder

  • Sang-Min, Kim;Jung-Mo, Sohn
    • Journal of the Korea Society of Computer and Information / v.28 no.2 / pp.9-17 / 2023
  • In this paper, we propose a one-class vibration anomaly detection system for bearing defect diagnosis. To reduce the economic and time losses caused by bearing failure, an accurate defect diagnosis system is essential, and deep learning-based defect diagnosis systems are widely studied to solve the problem. However, abnormal data is difficult to obtain in real data collection environments for deep learning training, which causes data bias. Therefore, a one-class classification method using only normal data is used. In the general approach, the characteristics of vibration data are extracted by learning the compression and restoration process through an AutoEncoder, and anomaly detection is performed by training a one-class classifier on the extracted features. However, this method cannot efficiently extract the characteristics of the vibration data because it does not consider their frequency characteristics. To solve this problem, we propose an AutoEncoder model that considers the frequency characteristics of vibration data. For classification performance, an accuracy of 0.910, precision of 1.0, recall of 0.820, and F1-score of 0.901 were obtained. The network design that considers the vibration characteristics achieved better performance than existing methods.
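
The reported metrics can be checked against one another with a short calculation. The F1 formula is the standard harmonic mean of precision and recall. The confusion-matrix counts are hypothetical: they assume a test set with 100 positive and 100 negative samples, which is one configuration consistent with the reported numbers and is not stated in the abstract.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
precision, recall = 1.0, 0.820
f1 = f1_from(precision, recall)

# Hypothetical counts that reproduce the reported metrics: precision 1.0 means
# no false positives, and recall 0.820 means 82 of 100 positives are detected.
tp, fn, fp, tn = 82, 18, 0, 100
accuracy = (tp + tn) / (tp + fn + fp + tn)
precision_check = tp / (tp + fp)
recall_check = tp / (tp + fn)
```

Rounded to three decimals, `f1` matches the reported 0.901, and under the assumed balanced test set the accuracy comes out at the reported 0.910.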