• Title/Summary/Keyword: Data Bias

Fashion Category Oversampling Automation System

  • Minsun Yeu;Do Hyeok Yoo;SuJin Bak
    • Journal of the Korea Society of Computer and Information / v.29 no.1 / pp.31-40 / 2024
  • In the domestic online fashion platform industry, manual registration of product information by individual business owners causes inconvenience and reliability problems, especially when many product groups are registered at once. Moreover, bias is significantly heightened by low-quality product images and an imbalance in data quantity. This study therefore proposes a ResNet50 model that aims to minimize data bias through oversampling and performs multi-class classification over 13 fashion categories. Transfer learning is employed to optimize resource utilization and shorten training time. The results indicate that, with data augmentation, discrimination improves by up to 33.4% in classes with insufficient data compared with a basic convolutional neural network (CNN) model. The reliability of the outcomes is supported by precision and confirmed by the recall curve. The study is expected to help advance the domestic online fashion platform industry.
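
The abstract does not say which oversampling technique was used. As a minimal sketch, assuming plain random duplication of minority-class samples (not necessarily the authors' method), class balancing before training could look like this. The function name `random_oversample` and the toy image ids and category labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical toy data: ten image ids with imbalanced category labels.
images = ["img%d" % i for i in range(10)]
cats = ["shirt"] * 7 + ["skirt"] * 2 + ["hat"] * 1
bal_images, bal_cats = random_oversample(images, cats)
counts = Counter(bal_cats)
print(counts)
```

In an image pipeline the duplicated samples would normally be passed through augmentation (flips, crops, color jitter) so that the copies are not pixel-identical.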

Implementation of a Real-time Data fusion Algorithm for Flight Test Computer (비행시험통제컴퓨터용 실시간 데이터 융합 알고리듬의 구현)

  • Lee, Yong-Jae;Won, Jong-Hoon;Lee, Ja-Sung
    • Journal of the Korea Institute of Military Science and Technology / v.8 no.4 s.23 / pp.24-31 / 2005
  • This paper presents an implementation of a real-time multi-sensor data fusion algorithm for a flight test computer. The sensor data consist of positional information on the target from a radar, a GPS receiver and an INS. The data fusion algorithm is designed as a 21st-order distributed Kalman filter based on the PVA model with sensor bias states. Fault detection and correction logic is included in the algorithm to handle bad measurements and sensor faults. The statistical parameters for the states are obtained from Monte Carlo simulations and covariance analysis using test tracking data. The designed filter is verified with real data in both post-processing and real-time processing.
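
The abstract does not describe the fault detection logic. The sketch below is a deliberately simplified scalar illustration, assuming a random-walk state and a normalized-innovation gate. It is not the paper's 21-state distributed filter, and the function name, noise values and gate threshold are all hypothetical.

```python
def gated_kalman_1d(measurements, r=1.0, q=0.01, gate=9.0, x0=0.0, p0=100.0):
    """Scalar random-walk Kalman filter that skips any measurement whose
    normalized innovation squared exceeds `gate`."""
    x, p = x0, p0
    estimates, rejected = [], []
    for k, z in enumerate(measurements):
        p = p + q                  # predict: random-walk state, variance grows
        nu = z - x                 # innovation
        s = p + r                  # innovation variance
        if nu * nu / s > gate:
            rejected.append(k)     # treat as a bad measurement: no update
        else:
            gain = p / s
            x = x + gain * nu
            p = (1.0 - gain) * p
        estimates.append(x)
    return estimates, rejected


# Hypothetical measurement sequence with one gross outlier at index 3.
zs = [10.1, 9.9, 10.0, 50.0, 10.2, 9.8]
est, bad = gated_kalman_1d(zs)
print(bad, round(est[-1], 3))
```

A slowly varying sensor bias, as opposed to a one-off bad measurement, is usually handled differently: the bias is added to the state vector and estimated alongside position, which matches the "sensor bias states" mentioned in the abstract.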

A case study of competing risk analysis in the presence of missing data

  • Limei Zhou;Peter C. Austin;Husam Abdel-Qadir
    • Communications for Statistical Applications and Methods / v.30 no.1 / pp.1-19 / 2023
  • Observational data with missing or incomplete values are common in biomedical research. Multiple imputation is an effective approach to handling missing data, with the ability to decrease bias while increasing statistical power and efficiency. In recent years, propensity score (PS) matching has been increasingly used in observational studies to estimate treatment effects, as it can reduce confounding due to measured baseline covariates. In this paper, we describe in detail approaches to competing risk analysis in the setting of incomplete observational data when using PS matching. First, we used multiple imputation to impute several missing variables simultaneously, and then conducted PS matching to match statin-exposed patients with unexposed patients. Afterwards, we assessed the effect of statin exposure on the risk of heart failure-related hospitalizations or emergency visits by estimating both relative and absolute effects. Collectively, we provide a general methodological framework for assessing treatment effects in incomplete observational data. In addition, we present a practical approach to producing an overall cumulative incidence function (CIF) based on estimates from multiply imputed and PS-matched samples.
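
The abstract does not state how estimates from the imputed datasets were combined. A common choice is Rubin's rules, sketched below under that assumption. The function name `pool_rubin` and the numbers (five made-up effect estimates with equal within-imputation variances) are hypothetical, not results from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical effect estimates from five imputed, PS-matched datasets.
est = [0.50, 0.54, 0.46, 0.52, 0.48]
var = [0.010] * 5
q_bar, total_var = pool_rubin(est, var)
print(round(q_bar, 4), round(total_var, 4))
```

Rubin's rules assume an approximately normal estimator, so quantities such as hazard ratios are typically pooled on the log scale.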

1D-CNN-LSTM Hybrid-Model-Based Pet Behavior Recognition through Wearable Sensor Data Augmentation

  • Hyungju Kim;Nammee Moon
    • Journal of Information Processing Systems / v.20 no.2 / pp.159-172 / 2024
  • The number of healthcare products available for pets has increased in recent times, which has prompted active research into wearable devices for pets. However, the data collected through such devices are limited by outliers and missing values owing to the anomalous and irregular characteristics of pets. Hence, we propose pet behavior recognition based on a hybrid one-dimensional convolutional neural network (CNN) and long short-term memory (LSTM) model using pet wearable devices. An Arduino-based pet wearable device was first fabricated to collect gyroscope and accelerometer values for behavior recognition. Then, data augmentation was performed after replacing any missing values and outliers via preprocessing. At this stage, the behaviors were classified into five types. To prevent bias from specific actions in the data augmentation, the number of samples per behavior was compared and balanced, and CNN-LSTM-based deep learning was performed. The five subdivided behaviors and overall performance were then evaluated, and the overall accuracy of behavior recognition was found to be about 88.76%.

A Method to Evaluate the Radar Rainfall Accuracy for Hydrological Application (수문학적 활용을 위한 레이더 강우의 정확도 평가 방법)

  • Bae, Deg-Hyo;Phuong, Tran Ahn;Yoon, Seong-Sim
    • Journal of Korea Water Resources Association / v.42 no.12 / pp.1039-1052 / 2009
  • Radar measurements with high temporal and spatial resolution can be a valuable source of data, especially in areas where rain gauge installation is not practical. However, this kind of data contains many errors. The objective of this paper is to propose a method to statistically evaluate the quantitative and qualitative accuracy at different radar ranges, temporal intervals and rain gauge densities, and to use a bias adjustment technique to improve the quality of radar rainfall for hydrological application. The method is tested with data from 2 storm events collected at the Jindo (S band) and Kwanak (C band) radar stations. The results show that the accuracy of radar rainfall estimation increases as the time interval lengthens. Radar data at shorter ranges appear more accurate than data at longer ranges, especially for the C-band radar. Using a Monte Carlo simulation experiment, we find that the sampling error of the bias between radar and gauge rainfall decreases nonlinearly with increasing rain gauge density. The accuracy can be improved considerably if real-time bias adjustment is applied, making the adjusted radar rainfall adequate for hydrological application.
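
The abstract does not specify the bias adjustment technique. One widely used form is the mean field bias, the ratio of total gauge rainfall to total radar rainfall over the gauge locations. The sketch below assumes that form, and the rainfall values are made up.

```python
def mean_field_bias(gauge_mm, radar_mm):
    """Ratio of total gauge rainfall to total radar rainfall at gauge sites."""
    total_radar = sum(radar_mm)
    if total_radar <= 0:
        raise ValueError("radar total must be positive to form the ratio")
    return sum(gauge_mm) / total_radar


def adjust(radar_field_mm, bias):
    """Scale every radar rainfall value by the same multiplicative bias."""
    return [r * bias for r in radar_field_mm]


# Hypothetical accumulations (mm) at three gauge sites for one time step.
gauge = [10.0, 20.0, 30.0]
radar = [8.0, 16.0, 24.0]
bias = mean_field_bias(gauge, radar)
adjusted = adjust(radar, bias)
print(bias, adjusted)
```

With only a few gauges the ratio is itself a noisy estimate, which is consistent with the abstract's finding that the sampling error of the bias falls as rain gauge density increases.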

An Empirical Study on the Effect of Respondent Bias in PSM : Case in Apartment Pricing (PSM 가격평가 주체에 따른 아파트 가격결정 효용성 실증연구)

  • Cho, Han-Jin;Kim, Jong-Lim
    • Land and Housing Review / v.7 no.4 / pp.217-223 / 2016
  • PSM is a widely used pricing tool in the field because data collection is convenient and the analysis is intuitive. However, in a high-involvement environment, strategic respondent bias tends to lower the resulting price. Using three empirical cases of LH apartments for sale, we found that the acceptable price range and the optimal price range recognized by latent consumers are lower than those recognized by real estate agent respondents. This phenomenon is interpreted as the loss aversion effect of prospect theory, in which respondents lower the price to reduce their loss, and the effect is stronger in high-involvement situations. We also found that PSM results based on data from real estate agents are more useful in the real market than results based on latent consumer data distorted by loss aversion. The significance of this study lies in identifying a limitation of PSM as it is generally applied with consumer data. Further study is needed to develop a PSM measurement tool that minimizes the effect of strategic bias, and new approaches to reinterpreting the acceptable price range and the optimal price range should follow.

Design-based Properties of Least Square Estimators in Panel Regression Model (패널회귀모형에서 회귀계수 추정량의 설계기반 성질)

  • Kim, Kyu-Seong
    • Survey Research / v.12 no.3 / pp.49-62 / 2011
  • In this paper we investigate design-based properties of both the ordinary least squares estimator and the weighted least squares estimator for regression coefficients in a panel regression model. After linearizing the least squares estimators, we derive formulas for the approximate bias, variance and mean square error of the ordinary least squares estimator and for the approximate variance of the weighted least squares estimator. We also compare their magnitudes numerically through a simulation study. We treat three years of data from the Korean Welfare Panel Study as a finite population, take household income as the dependent variable and choose 7 household-related explanatory variables as independent variables in the panel regression model. We then calculate the approximate bias, variance and mean square error of the ordinary least squares estimator and the approximate variance of the weighted least squares estimator for sample sizes from 50 to 1,000 in steps of 50. The simulation study shows the following tendencies. First, the mean square error of the ordinary least squares estimator becomes larger than the variance of the weighted least squares estimator as the sample size increases. Next, the magnitude of the mean square error of the ordinary least squares estimator depends on the magnitude of the bias of the estimator: it is large when the bias is large. Finally, with regard to approximate variance, the variances of the ordinary least squares estimator are smaller than those of the weighted least squares estimator in many cases in the simulation.
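
To make the two estimators concrete, here is a minimal sketch for a single-regressor model, assuming the weights act as survey weights in the usual weighted least squares normal equations. The paper's panel model with seven regressors is not reproduced, and the data, weights and function name are hypothetical.

```python
def weighted_line_fit(x, y, w=None):
    """Fit y = a + b * x by weighted least squares; w=None gives OLS."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


# Hypothetical data: three points on y = 1 + 2x plus one high outlier.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 10.0]
w = [1.0, 1.0, 1.0, 0.1]   # made-up weights that down-weight the last unit

a_ols, b_ols = weighted_line_fit(x, y)
a_wls, b_wls = weighted_line_fit(x, y, w)
print(round(b_ols, 4), round(b_wls, 4))
```

The two slopes differ because the weights change how much each unit contributes. In the design-based setting of the abstract, ignoring unequal weights is what gives the ordinary least squares estimator its bias.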

Estimation of LRFD Resistance Bias Factors for Pullout Resistance of Soil-Nailing (쏘일네일링의 인발저항에 대한 LRFD 저항편향계수 산정)

  • Son, Byeong-Doo;Lim, Heui-Dae;Park, Joon-Mo
    • Journal of the Korean Geotechnical Society / v.31 no.10 / pp.5-16 / 2015
  • In view of the conversion of the Korea Construction Standards to Limit State Design (LSD), we analyzed the resistance bias factor for pullout resistance as part of the development of Load and Resistance Factor Design (LRFD) for soil nailing, on which very few studies have been conducted. To reflect the local characteristics of soil nailing, such as the level of design and construction, we collected statistics on pullout tests conducted at slope and excavation construction sites around the country. In this study, a database was built from the geotechnical properties, soil nailing specifications, and pullout test results. The resistance bias factors are calculated to determine the resistance factor of the pullout resistance for the gravity and pressurized grouting methods, which are the most commonly used methods in Korea and for which relatively sufficient data are available. We found the resistance bias factors to be 1.144 and 1.325, which are relatively conservative values for predicting the actual ultimate pullout resistance. This indicates that our designs are safer than those found in a research case in the United States (NCHRP Report); however, the uncertainty in the pullout resistance, $COV_R$, was 0.27-0.43, which is relatively high. In addition, the pressurized grouting method has a greater margin of safety than the gravity grouting method, and the actual ultimate pullout resistance determined using the pressurized grouting method has low uncertainty.
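
In LRFD calibration the resistance bias is conventionally the ratio of measured to predicted resistance. Its mean is the bias factor and its coefficient of variation plays the role of $COV_R$. The sketch below assumes that convention, and the pullout values are made up rather than taken from the paper's database.

```python
from statistics import mean, stdev


def bias_statistics(measured, predicted):
    """Return the mean bias factor and its coefficient of variation."""
    ratios = [m / p for m, p in zip(measured, predicted)]
    lam = mean(ratios)            # bias factor: mean of measured / predicted
    cov = stdev(ratios) / lam     # sample standard deviation over the mean
    return lam, cov


# Hypothetical ultimate pullout resistances (kN) from four tests.
measured = [120.0, 150.0, 90.0, 110.0]
predicted = [100.0, 120.0, 100.0, 100.0]
lam, cov = bias_statistics(measured, predicted)
print(round(lam, 4), round(cov, 4))
```

A bias factor above 1 means the prediction model underestimates the measured resistance on average, which is why the abstract calls values of 1.144 and 1.325 conservative.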

Weighting Effect on the Weighted Mean in Finite Population (유한모집단에서 가중평균에 포함된 가중치의 효과)

  • Kim, Kyu-Seong
    • Survey Research / v.7 no.2 / pp.53-69 / 2006
  • Weights can be constructed and applied at both the sample design stage and the analysis stage of a sample survey. At the design stage, weights are related to sample data acquisition quantities such as the sample selection probability and the response rate; at the analysis stage, weights are connected with external quantities such as population quantities and auxiliary information. The final weight is the product of all weights from both stages. In the present paper, we focus on the weight at the analysis stage and investigate the effect of such weights on the weighted mean when estimating the population mean. We consider a finite population in which each unit has a fixed pair of survey value and weight, and assume equal selection probability designs. Under this condition we derive formulas for the bias and the mean square error of the weighted mean, and show that the weighted mean is biased and that the direction and amount of the bias can be explained by the correlation between the survey variate and the weight: if the correlation coefficient is positive, the weighted mean overestimates the population mean; if it is negative, the weighted mean underestimates it. The magnitude of the bias also grows as the correlation coefficient grows. In addition to the theoretical derivation, we conduct a simulation study to show the size of the bias and mean square error numerically. In the simulation, nine weights with correlation coefficients with the survey variate ranging from -0.2 to 0.6 are generated, four sample sizes from 100 to 400 are considered, and biases and mean square errors are calculated in each case. As a result, for a sample size of 400 and a correlation coefficient of 0.55, the squared bias of the weighted mean accounts for up to 82% of the mean square error, which indicates that the weighted mean can be very seriously biased in some cases.
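
The link between the sign of the bias and the sign of the correlation can be checked at the population level. For a whole population, the weighted mean equals the plain mean plus the population covariance of weight and survey value divided by the mean weight. The sketch below verifies that identity on made-up numbers. It is an illustration only, not the paper's sample-level bias and mean square error formulas.

```python
def mean(values):
    return sum(values) / len(values)


def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def pop_cov(a, b):
    """Population covariance (divisor N, not N - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / len(a)


# Hypothetical finite population of five units.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
w_pos = [1.0, 1.5, 2.0, 2.5, 3.0]   # weights positively correlated with y
w_neg = list(reversed(w_pos))       # weights negatively correlated with y

gap_pos = weighted_mean(y, w_pos) - mean(y)
gap_neg = weighted_mean(y, w_neg) - mean(y)
predicted_pos = pop_cov(w_pos, y) / mean(w_pos)
predicted_neg = pop_cov(w_neg, y) / mean(w_neg)
print(gap_pos, gap_neg)
```

Positively correlated weights push the weighted mean above the population mean and negatively correlated weights push it below, matching the direction stated in the abstract.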

A GNSS Code Tracking Scheme Based in Slope Difference of Correlation Outputs (상관 함수의 기울기 차에 기반한 GNSS의 부호 추적 기법)

  • Yoo, Seung-Soo;Yoo, Seung-Hwan;Chong, Da-Hae;Ahn, Sang-Ho;Yoon, Seok-Ho;Kim, Sun-Yong
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.6C / pp.505-511 / 2008
  • The global navigation satellite system (GNSS) uses direct sequence/spread spectrum (DS/SS) modulation. To recover the information data, a DS/SS system first performs a two-step synchronization process: acquisition and tracking. The acquisition process adjusts the phase difference between the received and locally generated acquisition sequences to within ${\pm}T_c/2$ or less, where $T_c$ is the chip period. The tracking process performs fine synchronization. In this paper, we focus on the tracking issue. The single delta delay locked loop ($\Delta$-DLL) is the optimal tracking scheme for a GNSS in the absence of multipath signals, where $\Delta$ denotes the spacing between the early and late correlation time offsets. In multipath environments, however, the $\Delta$-DLL suffers from a large estimation bias (denoted by $\beta$) caused by distorted correlation values. Although modified schemes such as a $\Delta$-DLL with a narrow $\Delta$ and a double delta DLL (${\Delta}^{(2)}$-DLL) have been proposed to reduce the estimation bias, they cannot remove it completely and require a more accurate acquisition process. This paper proposes a novel tracking scheme that can dramatically reduce the estimation bias by using the maximum slope change among the correlation outputs.
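
The multipath bias $\beta$ of a conventional $\Delta$-DLL can be illustrated with a toy model. The sketch assumes an ideal triangular code autocorrelation, one in-phase multipath ray and a discriminator formed as the difference of two correlator outputs spaced $\Delta$ apart. It does not implement the slope-based scheme proposed in the paper, and all parameter values and function names are hypothetical.

```python
def tri(tau, tc=1.0):
    """Ideal triangular code autocorrelation with chip period tc."""
    return max(0.0, 1.0 - abs(tau) / tc)


def make_discriminator(delta=1.0, mp_amp=0.0, mp_delay=0.0):
    """Difference of two correlator outputs spaced delta apart, for a direct
    path plus one in-phase multipath ray (amplitude mp_amp, delay mp_delay)."""
    def corr(tau):
        return tri(tau) + mp_amp * tri(tau - mp_delay)

    def disc(tau):
        return corr(tau + delta / 2) - corr(tau - delta / 2)

    return disc


def find_lock(disc, lo=-0.5, hi=0.5, iters=60):
    """Bisection for the zero crossing of the discriminator (the lock point)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if disc(lo) * disc(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Timing error at lock, in chips, without and with the multipath ray.
clean_lock = find_lock(make_discriminator())
beta = find_lock(make_discriminator(mp_amp=0.5, mp_delay=0.5))
print(clean_lock, beta)
```

In this toy setup the loop locks at zero error on the clean signal but settles about one sixth of a chip late when the half-amplitude ray delayed by half a chip is present. That offset is the kind of estimation bias the abstract refers to.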