• Title/Abstract/Keywords: Outlier & Missing Value

Search results: 13 items (processing time: 0.018 s)

A Study on the Statistical Predictability of Drinking Water Qualities for Contamination Warning System

  • 박노석;이영주;채선하;윤석민
    • 상하수도학회지 / Vol. 29, No. 4 / pp. 469-479 / 2015
  • This study was conducted to analyze the feasibility of establishing a Contamination Warning System (CWS) capable of early detection of natural or intentional water quality accidents and of providing active and quick responses for the domestic C_water supply system. To evaluate the water quality data set, pH, turbidity, and free residual chlorine concentration data were collected, statistical values (mean, variation, range) were calculated for each, and their seasonal variability was analyzed using the independent t-test. Analysis of the distribution of outliers in the measurement data using a high-pass filter confirmed that many lower outliers appeared because of missing data. In addition, a linear filter model based on autoregressive models (AR(1) and AR(2)) was applied to estimate the state of each water quality data set. Analysis of how the autocorrelation coefficient structure varies with window size (6 to 48 hours) indicated that a window longer than 12 hours is needed to estimate the state of the water quality data satisfactorily.
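The windowed AR(1) idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Yule-Walker style estimate (AR coefficient taken as the lag-1 autocorrelation of the window), and the use of the window mean as the process mean are all assumptions made for illustration. Window size here is counted in samples, since the abstract does not state the sampling interval.

```python
from statistics import fmean


def lag1_autocorr(window):
    """Sample lag-1 autocorrelation of a window (0.0 if the window is constant)."""
    m = fmean(window)
    den = sum((x - m) ** 2 for x in window)
    if den == 0:
        return 0.0
    num = sum((window[i] - m) * (window[i - 1] - m) for i in range(1, len(window)))
    return num / den


def ar1_residuals(series, window_size):
    """One-step AR(1) prediction residuals using a trailing window.

    For each t >= window_size, the AR(1) coefficient is estimated from the
    preceding window_size samples (hypothetical choice: phi = lag-1
    autocorrelation), and the residual is x_t minus the prediction.
    """
    residuals = []
    for t in range(window_size, len(series)):
        window = series[t - window_size:t]
        m = fmean(window)
        phi = lag1_autocorr(window)
        predicted = m + phi * (series[t - 1] - m)
        residuals.append(series[t] - predicted)
    return residuals
```

Large residuals mark samples that the local AR(1) model does not explain, and repeating the calculation for several values of `window_size` is one way to see how sensitive the estimated coefficient is to the window length.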

An Application of Support Vector Machines to Customer Loyalty Classification of Korean Retailing Company Using R Language

  • 응위엔푸티엔;이영찬
    • 한국정보시스템학회지:정보시스템연구 / Vol. 26, No. 4 / pp. 17-37 / 2017
  • Purpose: Customer loyalty is the most important factor in customer relationship management (CRM), especially in the retailing industry, where customers have many options for where to spend their money. Classifying loyal customers from customer data can help retailing companies build more efficient marketing strategies and gain competitive advantages. This study aims to construct classification models that distinguish the loyal customers of a Korean retailing company using data mining techniques in the R language. Design/methodology/approach: To classify retailing customers, we used a combination of support vector machines (SVMs) and other machine learning (ML) classification algorithms, supported by recursive feature elimination (RFE). In particular, we first cleaned the dataset to remove outliers and impute missing values. We then used an RFE framework to select the most significant predictors. Finally, we constructed models with the classification algorithms, tuned their parameters, and compared their performance. Findings: The results reveal that ML classification techniques can work well with CRM data in the Korean retailing industry. Moreover, customer loyalty is affected not only by a distinctive factor such as net promoter score but also by purchase habits such as a preference for expensive goods or visits to multiple branches. We also show that, on the retailing customer dataset, the model constructed with the SVM algorithm performed better than the others. We expect that the models in this study can be used by other retailing companies to classify their customers so that they can focus on serving this potential VIP group. We also hope that the results of this ML approach using the R language will be useful to other researchers in selecting appropriate ML algorithms.
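The data-cleaning step mentioned in this abstract (removing outliers and imputing missing values before feature selection) might look roughly like the Python sketch below. The abstract does not say which rules the authors applied, and their work was done in R, so the 1.5 x IQR fence, the median fill value, and the function name `clean_column` are assumptions chosen only to make the step concrete.

```python
from statistics import median, quantiles


def clean_column(values, k=1.5):
    """Replace outliers and missing entries (None) in one numeric column.

    Assumed rules (not taken from the paper): a value is an outlier if it
    lies outside [Q1 - k*IQR, Q3 + k*IQR]; outliers and missing entries
    are both replaced by the median of the remaining inliers.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    inliers = [v for v in observed if lo <= v <= hi]
    fill = median(inliers)
    return [v if v is not None and lo <= v <= hi else fill for v in values]


# Example: one missing entry and one extreme purchase amount.
cleaned = clean_column([10, 12, None, 11, 13, 500, 12])
```

In this example the missing entry and the value 500 are both replaced by 12, the median of the inliers. Dropping the affected records instead of replacing them is an equally plausible reading of "remove outliers".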

Analysis of the Optimal Window Size of Hampel Filter for Calibration of Real-time Water Level in Agricultural Reservoirs

  • 주동혁;나라;김하영;최규훈;권재환;유승환
    • 한국농공학회논문집 / Vol. 64, No. 3 / pp. 9-24 / 2022
  • Currently, a vast amount of hydrologic data is accumulated in real time through automatic water level measuring instruments in agricultural reservoirs. At the same time, false and missing data points are also increasing. The applicability and reliability of quality control of hydrological data must be secured for efficient agricultural water management through calculation of water supply and disaster management. Considering the irregularities in hydrological data caused by irrigation water usage and rainfall patterns, the Korea Rural Community Corporation is currently applying the Hampel filter as a water level data quality management method. This method uses window size as a key parameter: if the window is too large, the data may be distorted, and if it is too small, many outliers are not removed, which reduces the reliability of the corrected data. Thus, the optimal window size must be selected for each individual reservoir. To ensure reliability, we compared and analyzed the RMSE (Root Mean Square Error) and NSE (Nash-Sutcliffe model efficiency coefficient) of the corrected data and the daily water level of the RIMS (Rural Infrastructure Management System) data, and the automatic outlier detection standards used by the Ministry of Environment. To select the optimal window size, we used the classification performance evaluation index of the error matrix and the rainfall data of the irrigation period, which showed the optimal value at a window size of 3 h. An efficient automatic calibration technique for reservoirs can reduce the manpower and time required for manual calibration, and is expected to improve the reliability of water level data and the value of water resources.
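For readers unfamiliar with the Hampel filter named in this abstract, the following Python sketch shows the commonly described form of the technique: a sliding window median, a scaled median absolute deviation (MAD), and replacement of flagged points by the window median. The parameter names `half_window` and `n_sigmas`, their defaults, and the sample data are illustrative assumptions, not the settings used by the Korea Rural Community Corporation. The window is counted in samples, and mapping it to hours (such as the 3 h reported above) depends on the sampling interval, which the abstract does not state.

```python
from statistics import median


def hampel_filter(series, half_window=3, n_sigmas=3.0):
    """Flag and replace outliers using a sliding median and scaled MAD.

    A point is flagged when it deviates from the window median by more
    than n_sigmas * 1.4826 * MAD, where 1.4826 scales the MAD to the
    standard deviation of normally distributed data. Windows are
    truncated at the ends of the series.
    """
    scale = 1.4826
    cleaned = list(series)
    flagged = []
    for i in range(len(series)):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]
        med = median(window)
        mad = median(abs(x - med) for x in window)
        if abs(series[i] - med) > n_sigmas * scale * mad:
            cleaned[i] = med
            flagged.append(i)
    return cleaned, flagged


# Hypothetical water levels (m) with one spurious spike at index 4.
levels = [10.0, 10.1, 10.2, 10.1, 55.0, 10.3, 10.2, 10.4, 10.3]
cleaned, flagged = hampel_filter(levels, half_window=3)
```

With these made-up readings only the spike at index 4 is flagged and replaced by the window median, 10.2. The trade-off described in the abstract is visible in `half_window`: a wide window can smooth over genuine rapid changes in level, while a narrow window can let clustered outliers dominate the median and escape detection.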