• Title/Summary/Keyword: removal of outliers

Development of Removal Techniques for PRC Outlier & Noise to Improve NDGPS Accuracy (국토해양부 NDGPS 정확도 향상을 위한 의사거리 보정치의 이상점 및 노이즈 제거기법 개발)

  • Kim, Koon-Tack; Kim, Hye-In; Park, Kwan-Dong
    • Journal of Korean Society for Geospatial Information Science / v.19 no.2 / pp.63-73 / 2011
  • Pseudorange corrections (PRC), which are used in DGPS as calibration messages, can contain outliers, noise, and anomalies, and these abnormal events are unpredictable. When such irregular PRC are used, the positioning error increases. In this paper, we propose a strategy for detecting and correcting outliers, noise, and anomalies by modeling the changing pattern of PRC through polynomial curve fitting. To validate the strategy, we compared positioning errors obtained without PRC calibration with those obtained with PRC calibration. The algorithm performed well: the horizontal RMS error was 3.84 m before the correction and 1.49 m after it.
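
The idea of modeling a PRC series with a fitted polynomial and correcting samples that stray from it can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the polynomial degree, the MAD-based threshold `k`, and the function names `polyfit`, `polyval`, and `clean_prc` are all assumptions made for the example.

```python
from statistics import median


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c[0..degree], lowest power first.
    """
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def polyval(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def clean_prc(times, prc, degree=2, k=5.0):
    """Flag samples far from a fitted polynomial and replace them.

    Assumed procedure (not taken from the paper): fit, flag residuals
    larger than k robust standard deviations, refit without the flagged
    samples, and substitute the refitted value for each flagged sample.
    """
    coeffs = polyfit(times, prc, degree)
    resid = [y - polyval(coeffs, t) for t, y in zip(times, prc)]
    center = median(resid)
    mad = median(abs(r - center) for r in resid)
    scale = max(1.4826 * mad, 1e-9)  # MAD scaled to a normal std. dev.
    flags = [abs(r - center) > k * scale for r in resid]
    # Refit on the unflagged samples so outliers do not bias the model.
    kept = [(t, y) for t, y, f in zip(times, prc, flags) if not f]
    coeffs = polyfit([t for t, _ in kept], [y for _, y in kept], degree)
    cleaned = [polyval(coeffs, t) if f else y
               for t, y, f in zip(times, prc, flags)]
    return cleaned, flags
```

Calling `clean_prc(times, prc)` returns the corrected series together with a per-sample flag list, so downstream positioning code can either use the corrected value or discard the flagged epoch.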

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension (고혈압 예측을 위한 이상치 탐지 알고리즘 및 데이터 통합 기법)

  • Khongorzul Dashdondov; Mi-Hye Kim; Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.417-419 / 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, its incidence has increased dramatically, not only among the elderly but also among young people, and machine-learning methods are increasingly used to diagnose its causes. In this study, we improved hypertension prediction by applying Mahalanobis distance-based multivariate outlier removal to the KNHANES database from the Korean national health data and the COVID-19 dataset from Kaggle. The study is divided into two modules. The first, a data preprocessing step, merges the datasets and applies decision-tree classifier-based feature selection. The second, a predictive analysis step, removes multivariate outliers from the experimental dataset using the Mahalanobis distance and then predicts hypertension. We compared the accuracy of each classification model; the best result was obtained by the proposed MAH_RF algorithm, with an accuracy of 82.66%. The proposed method can be used not only for hypertension but also for the detection of various other diseases such as stroke and cardiovascular disease.
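
Multivariate outlier removal with the Mahalanobis distance can be sketched in a few lines. The example below is an illustration under stated assumptions, not the MAH_RF pipeline itself: the cutoff is passed in by the caller (a chi-square quantile is a common choice, e.g. about 7.378 for two features at the 97.5% level), and the helper names `mean_and_cov`, `solve`, `mahalanobis_sq`, and `remove_outliers` are hypothetical.

```python
def mean_and_cov(rows):
    """Sample mean vector and covariance matrix of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_sq(row, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^-1 (x - mu)."""
    diff = [x - m for x, m in zip(row, mu)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


def remove_outliers(rows, threshold_sq):
    """Keep rows whose squared distance does not exceed threshold_sq."""
    mu, cov = mean_and_cov(rows)
    return [r for r in rows if mahalanobis_sq(r, mu, cov) <= threshold_sq]
```

Because the distance accounts for the correlation between features, a record such as `(2.0, 8.0)` among points lying near the line `y = x` is removed even though neither coordinate is extreme on its own. The filtered rows would then be passed to whichever classifier is being evaluated.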

A Study on the Modified Adaptive MMSE Filtering for Mixed-Noise Elimination in Image Signals (영상신호에서의 복합 잡음 제거를 위한 수정된 적응 MMSE 필터링에 관한 연구)

  • Lee, Je-Il; Kim, Sung-Hwan
    • The Journal of the Acoustical Society of Korea / v.15 no.4 / pp.70-76 / 1996
  • When an image is corrupted with mixed noise, the conventional MMSE filter cannot remove the noise properly, because impulse noise biases the minimum mean-square error estimate in regions close to outliers. In this paper, we propose a new method for removing mixed noise that combines the MMSE filtering structure with a direction-dependent local multi-windowing method and a ranked-order method. Evaluated by NMSE (normalized mean square error), the proposed method improved image quality by a factor of about 9.7 to 35.2 over the MMSE filter. The enhanced image obtained from the mixed-noise image was also better in both visual and quantitative terms.

A Prediction Method of Learning Outcomes based on Regression Model for Effective Peer Review Learning (효율적인 피어리뷰 학습을 위한 회귀 모델 기반 학습성과 예측 방법)

  • Shin, Hyo-Joung; Jung, Hye-Wuk; Cho, Kwang-Su; Lee, Jee-Hyoung
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.5 / pp.624-630 / 2012
  • Peer review learning is a method that improves students' learning outcomes through feedback between students and through the observation and analysis of other students. One of the important problems in a peer review system is finding proper evaluators for each learner, considering the characteristics of students, in order to improve learning outcomes. Some peer review systems assign evaluators to learners randomly or choose evaluators based on limited strategies. However, these systems do not consider the various characteristics of the learners and evaluators who participate in peer reviews. In this paper, we propose a novel approach to predicting learning outcomes, for use in peer review systems, that considers various characteristics of learners and evaluators. The proposed approach extracts representative attributes from student profiles and predicts learning outcomes using various regression models. To verify how much outliers affect the prediction of learning outcomes, we also apply several outlier removal methods to the regression models and compare their predictive performance. The experimental results show that the SVR model without outlier removal has an average error rate of 0.47% and the best predictive performance.

Background Removal and ROI Segmentation Algorithms for Chest X-ray Images (흉부 엑스레이 영상에서 배경 제거 및 관심영역 분할 기법)

  • Park, Jin Woo; Song, Byung Cheol
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.11 / pp.105-114 / 2015
  • This paper proposes methods to remove the background area and segment the region of interest (ROI) in chest X-ray images. Conventional algorithms for improving the detail or contrast of images normally use brightness and frequency information. If such algorithms are applied to the entire image, reliable visual quality cannot be obtained because of unnecessary information such as the background area. We therefore propose two effective algorithms to remove the background and segment the ROI from input X-ray images. The background removal algorithm first analyzes the histogram distribution of the input X-ray image. Next, the initial background is estimated by proper thresholding in the histogram domain and removed. Finally, the body contour or background area is refined using the popular guided filter. The ROI (lung) segmentation algorithm first determines an initial bounding box using the lung's inherent location information. Next, the main intensity value of the lung is computed by a vertical cumulative sum within the initial bounding box. Then, probable outliers are removed using a specific labeling and the pre-determined background information. Finally, a bounding box containing the lung is obtained. Simulation results show that the proposed background removal and ROI segmentation algorithms outperform previous works.

Development and Validation of Multiple Regression Models for the Prediction of Effluent Concentration in a Sewage Treatment Process (하수처리장 방류수 수질예측을 위한 다중회귀분석 모델 개발 및 검증)

  • Min, Sang-Yun; Lee, Seung-Pil; Kim, Jin-Sik; Park, Jong-Un; Kim, Man-Soo
    • Journal of Korean Society of Environmental Engineers / v.34 no.5 / pp.312-315 / 2012
  • In this study, a model that predicts effluent quality was implemented through multiple regression analysis using operation data from a sewage treatment plant to which a media process is applied. Multiple regression analyses were carried out case by case, according to the variable selection method, removal of outliers, and log transformation of variables, using one year of data from 2011. In the resulting prediction models, the prediction accuracy was over 0.87 for $COD_{Mn}$ of the treated water of the secondary clarifiers and over 0.81 for T-N. Using this model, it is expected that a range of operating conditions that do not exceed the effluent quality standards can be set. In conclusion, the model is expected to provide operators with proper guidance on effluent quality and energy costs within the operating range.

A Study on BASINS/WinHSPF for Evaluation of Non-point Source Reduction Efficiency in the Upstream of Nam-Han River Watershed (BASINS/WinHSPF를 이용한 남한강 상류 유역의 비점오염원 저감효율평가)

  • Yoon, Chun-Gyeong; Shin, Ah-Hyun; Jung, Kwang-Wook; Jang, Jae-Ho
    • Journal of Korean Society on Water Environment / v.23 no.6 / pp.951-960 / 2007
  • The Windows interface to Hydrological Simulation Program-FORTRAN (WinHSPF), developed by the United States Environmental Protection Agency (EPA), was applied to the upstream Nam-Han River watershed to examine its applicability for watershed-scale loading estimates and to evaluate non-point source control scenarios using BMPRAC in WinHSPF. The WinHSPF model was calibrated and verified for water flow using Ministry of Construction and Transportation data (MOCT, 3 stations, 2003~2005) and for water quality using Ministry of Environment data (MOE, 5 stations, 2000~2006). Water flow and water quality simulation results were satisfactory over the total simulation period. However, outliers occurred in the TN and TP time series data in some regions and periods, so further calibration of a wider range of parameters is required. Overall, the study results were within expectations considering the complexity of the watershed and the pollutant sources and land uses intermixed in it. Annual average pollutant loads were estimated for $BOD_5$, T-N, and T-P. Non-point source loading accounted for a large portion of the total pollutant loading, about 86.5~95.2%. In WinHSPF, BMPRAC was applied to evaluate non-point source control scenarios (constructed wetlands, wet detention ponds, and infiltration basins). All the scenarios showed non-point source removal efficiency. Overall, the HSPF model is adequate for simulating watershed characteristics, and its application is recommended for watershed management and the evaluation of best management practices.

A Study on Calculation of Sectional Travel Speeds of the Interrupted Traffic Flow with the Consideration of the Characteristics of Probe Data (프로브 자료의 특성을 고려한 단속류의 구간 통행속도 산출에 관한 연구)

  • Jeong, Yeon Tak; Jung, Hun Young
    • KSCE Journal of Civil and Environmental Engineering Research / v.34 no.6 / pp.1851-1861 / 2014
  • This study aims to calculate reliable sectional travel speeds by considering the characteristics of probe data collected in interrupted traffic flow. First, to analyze the characteristics of the probe data, we examined the distribution of the sectional travel times of each probe vehicle and compared the differences in the sectional travel speeds of the probe vehicles collected by DSRC. The results show that outliers should be removed from the distribution of sectional travel times, whereas the sectional travel speeds of the DSRC probe vehicles do not differ significantly from one another. Finally, based on the distribution characteristics of the sectional travel speeds of each probe vehicle and the representative values counted during a collection period, we derived the optimal outlier removal procedure and evaluated the estimation errors. The evaluation showed that the DSRC sectional travel speeds were similar to the values observed from actually running vehicles. In contrast, the sectional travel speeds of intra-city buses were less accurate than the DSRC sectional travel speeds. In the future, it will be necessary to improve the BIS process and to use the travel information collected in real time from intra-city buses to find various ways of applying it as traffic information.
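
The abstract does not spell out the outlier removal procedure, so the sketch below substitutes a generic interquartile-range (IQR) filter to show how removing extreme travel times changes a sectional speed estimate. The fence multiplier `k=1.5`, the section length, and the function names `iqr_filter` and `section_speed_kmh` are assumptions for illustration only.

```python
from statistics import mean, quantiles


def iqr_filter(travel_times, k=1.5):
    """Keep travel times inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(travel_times, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [t for t in travel_times if low <= t <= high]


def section_speed_kmh(section_length_m, travel_times_s):
    """Space-mean speed: section length divided by the mean travel time."""
    return section_length_m / mean(travel_times_s) * 3.6


# Hypothetical probe travel times (seconds) over a 1000 m section;
# the 300 s record might be a vehicle that stopped along the way.
times = [60, 62, 61, 63, 59, 64, 60, 62, 300, 61]
raw_speed = section_speed_kmh(1000, times)
filtered_speed = section_speed_kmh(1000, iqr_filter(times))
```

With the single 300 s record included, the estimated speed drops well below what the other nine vehicles experienced, which is why a filtering step usually precedes the speed calculation.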

Comparative analysis of wavelet transform and machine learning approaches for noise reduction in water level data (웨이블릿 변환과 기계 학습 접근법을 이용한 수위 데이터의 노이즈 제거 비교 분석)

  • Hwang, Yukwan; Lim, Kyoung Jae; Kim, Jonggun; Shin, Minhwan; Park, Youn Shik; Shin, Yongchul; Ji, Bongjun
    • Journal of Korea Water Resources Association / v.57 no.3 / pp.209-223 / 2024
  • In the context of the fourth industrial revolution, data-driven decision-making has increasingly become pivotal. However, the integrity of data analysis is compromised if data quality is not adequately ensured, potentially leading to biased interpretations. This is particularly critical for water level data, essential for water resource management, which often encounters quality issues such as missing values, spikes, and noise. This study addresses the challenge of noise-induced data quality deterioration, which complicates trend analysis and may produce anomalous outliers. To mitigate this issue, we propose a noise removal strategy employing Wavelet Transform, a technique renowned for its efficacy in signal processing and noise elimination. The advantage of Wavelet Transform lies in its operational efficiency: it reduces both time and costs as it obviates the need for acquiring the true values of collected data. This study conducted a comparative performance evaluation between our Wavelet Transform-based approach and the Denoising Autoencoder, a prominent machine learning method for noise reduction. The findings demonstrate that the Coiflets wavelet function outperforms the Denoising Autoencoder across various metrics, including Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Squared Error (MSE). The superiority of the Coiflets function suggests that selecting an appropriate wavelet function tailored to the specific application environment can effectively address data quality issues caused by noise. This study underscores the potential of Wavelet Transform as a robust tool for enhancing the quality of water level data, thereby contributing to the reliability of water resource management decisions.
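
The general shape of wavelet denoising (transform, shrink the detail coefficients, invert) can be sketched with the simplest wavelet. Note the substitutions: the study reports results for Coiflets wavelets, whereas this example uses a one-level Haar transform so that it stays dependency-free, and the universal threshold with a MAD noise estimate is an assumed choice rather than one stated in the abstract. All function names are hypothetical.

```python
import math
from statistics import median


def haar_forward(signal):
    """One-level Haar transform of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(values, t):
    """Shrink each coefficient toward zero by t, zeroing small ones."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in values]


def denoise(signal):
    """Soft-threshold the detail coefficients of an even-length signal."""
    approx, detail = haar_forward(signal)
    # Noise level estimated from the detail coefficients (MAD / 0.6745),
    # combined with the universal threshold sigma * sqrt(2 ln n).
    sigma = median(abs(d) for d in detail) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(signal)))
    return haar_inverse(approx, soft_threshold(detail, t))
```

No ground-truth water levels are needed to run `denoise`, which matches the operational advantage described above; true values are only required when scoring the result with metrics such as MAE or MSE.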

The Consideration on Calculation of Optimal Travel Speeds based on Analysis of AVI Data (AVI 수집 자료 분석에 근거한 최적 통행속도 산출에 관한 고찰)

  • Jeong, Yeon Tak; Jung, Hun Young
    • KSCE Journal of Civil and Environmental Engineering Research / v.35 no.3 / pp.625-637 / 2015
  • This study aims to calculate optimal travel speeds based on an analysis of AVI data collected in uninterrupted traffic flow. The results are as follows. First, we examined the distribution of the sectional travel times of each probe vehicle and compared the differences in the sectional travel speeds of the probe vehicles. The results show that outliers should be removed from the distribution of sectional travel times. Second, in the non-congested section, there were differences between types 1 (passenger automobiles) and 2 (automobiles for passengers and freight) and type 4 (special automobiles); thus, type 4 (special automobiles) should be removed when calculating sectional travel speeds. Third, based on these results, the optimal outlier removal procedures were applied. As a result, the MAPE was between 0.3% and 2.0% and the RMSE was between 0.3 and 2.3, indicating speeds very close to the actual average traffic speed. The minimum sample size was also satisfied at the 95% confidence level. The results of this study are expected to serve as a useful basis for local governments building AVI systems. In the future, studies integrating AVI data with other data will be needed to provide more accurate traffic information.
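
For reference, the two error measures quoted above follow their standard definitions and can be computed as shown below. The function names and the sample speeds are made up for illustration and are not data from the study.

```python
import math


def mape(observed, estimated):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - e) / o)
                       for o, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Root mean square error, in the units of the inputs."""
    return math.sqrt(sum((o - e) ** 2
                         for o, e in zip(observed, estimated)) / len(observed))


# Hypothetical observed vs. estimated sectional speeds (km/h).
observed = [50.0, 60.0, 80.0]
estimated = [49.0, 61.2, 80.0]
print(round(mape(observed, estimated), 2))  # 1.33
print(round(rmse(observed, estimated), 2))  # 0.9
```

MAPE is scale-free, which makes it comparable across sections with different speed levels, while RMSE is expressed in the same unit as the speeds and penalizes large individual errors more heavily.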