• 제목/요약/키워드: regression outlier

검색결과 116건 처리시간 0.022초

어림과 나머지 성분을 이용한 연안 수온자료의 이상자료 감지 (Outlier Detection of the Coastal Water Temperature Monitoring Data Using the Approximate and Detail Components)

  • 조홍연;오지희
    • 한국해양환경ㆍ에너지학회지
    • /
    • 제15권2호
    • /
    • pp.156-162
    • /
    • 2012
  • 연안 환경모니터링 사업이 확대되면서 방대하게 축적되어 있는 연안 환경모니터링 자료의 통계적 분석을 위해서는 모니터링 자료에서 빈번하게 발생하는 이상 자료의 감지 처리가 우선적으로 필요하다. 본 연구에서는 연안 환경모니터링 자료의 어림성분과 나머지(또는 잔차)성분을 이용한 이상자료 진단기법을 제안하였다. 주기함수를 이용한 조화분석 방법과 국지 회귀함수추정 방법을 이용하여 각각 어림성분과 나머지성분을 추출한 후, 추출된 나머지성분 자료에 범용적인 Grubbs 검정기법 및 수정표본점수기법을 적용하여 이상자료를 진단 제거한 후 이상자료가 제거된 자료로 재구성하는 방법이다. 제안된 이 기법을 국립수산과학원 실시간어장정보시스템 제공하는 연안 수온 연속 모니터링 자료에 적용한 결과 이상자료가 성공적으로 제거되는 양상을 보이는 것으로 파악되었다.

다변량 장기 종속 시계열에서의 이상점 탐지 (Outlier detection for multivariate long memory processes)

  • 김경희;유승연;백창룡
    • 응용통계연구
    • /
    • 제35권3호
    • /
    • pp.395-406
    • /
    • 2022
  • 본 논문에서는 장기 종속 다변량 시계열 자료에 대한 이상점 탐지 기법을 연구한다. 기존 다변량 시계열 이상점 탐지 방법은 단기 종속 시계열 모형인 VARMA에 기반한 방법으로, 장기억성을 띈 다변량 시계열 자료에는 적합하지 않다. 자기회귀 모형을 통해서 장기 종속성, 즉 장기억성을 고려하기 위해서는 높은 차수의 모형이 필요하고, 이는 곧 추정의 불안성으로 이어지기에 장기억성을 효율적으로 다룰 수 없기 때문이다. 따라서, 본 논문은 이러한 문제를 보완하고자 VHAR 구조에 기반한 이상점 탐지 방법을 제시하고자 한다. 또한 더욱 정확한 추론을 위해서 로버스트한 방법을 이용하여 VHAR 계수를 추정하였고 이를 활용하여 이상점을 탐지하였다. 모의실험 결과 우리가 제안한 방법론이 기존 VARMA에 기반한 방법론보다 이상점 탐지에 더 효과적임을 살펴볼 수 있었다. 주가지수에 대한 실증자료 분석에서도 기존의 방법론은 탐지하지 못하는 추가 이상점을 찾음을 확인할 수 있었다.

실선에 의한 표류 예측모델에 관한 연구 (Study of estimated model of drift through real ship)

  • 이창헌;김광일;유상록;김민선;한승훈
    • 수산해양기술연구
    • /
    • 제60권1호
    • /
    • pp.57-70
    • /
    • 2024
  • In order to present a predictive drift model, Jeju National University's training ship was tested for about 11 hours and 40 minutes, and 81 samples that selected one of the entire samples at ten-minute intervals were subjected to regression analysis after verifying outliers and influence points. In the outlier and influence point analysis, although there is a part where the wind direction exceeds 1 in the DFBETAS (difference in Betas) value, the CV (cumulative variable) value is 6%, close to 1. Therefore, it was judged that there would be no problem in conducting multiple regression analyses on samples. The standard regression coefficient showed how much current and wind affect the dependent variable. It showed that current speed and direction were the most important variables for drift speed and direction, with values of 47.1% and 58.1%, respectively. The analysis showed that the statistical values indicated the fit of the model at the significance level of 0.05 for multiple regression analysis. The multiple correlation coefficients indicating the degree of influence on the dependent variable were 83.2% and 89.0%, respectively. The determination of coefficients were 69.3% and 79.3%, and the adjusted determination of coefficients were 67.6% and 78.3%, respectively. In this study, a more quantitative prediction model will be presented because it is performed after identifying outliers and influence points of sample data before multiple regression analysis. Therefore, many studies will be active in the future by combining them.

Switching Regression Analysis via Fuzzy LS-SVM

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • 제17권2호
    • /
    • pp.609-617
    • /
    • 2006
  • A new fuzzy c-regression algorithm for switching regression analysis is presented, which combines fuzzy c-means clustering and least squares support vector machine. This algorithm can detect outliers in switching regression models while yielding the simultaneous estimates of the associated parameters together with a fuzzy c-partitions of data. It can be employed for the model-free nonlinear regression which does not assume the underlying form of the regression function. We illustrate the new approach with some numerical examples that show how it can be used to fit switching regression models to almost all types of mixed data.

  • PDF

On study for change point regression problems using a difference-based regression model

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods
    • /
    • 제26권6호
    • /
    • pp.539-556
    • /
    • 2019
  • This paper derive a method to solve change point regression problems via a process for obtaining consequential results using properties of a difference-based intercept estimator first introduced by Park and Kim (Communications in Statistics - Theory Methods, 2019) for outlier detection in multiple linear regression models. We describe the statistical properties of the difference-based regression model in a piecewise simple linear regression model and then propose an efficient algorithm for change point detection. We illustrate the merits of our proposed method in the light of comparison with several existing methods under simulation studies and real data analysis. This methodology is quite valuable, "no matter what regression lines" and "no matter what the number of change points".

유전자 알고리듬을 이용한 다중이상치 탐색

  • 고영현;이혜선;전치혁
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2000년도 추계학술발표회 논문집
    • /
    • pp.173-179
    • /
    • 2000
  • Genetic algorithm(GA) is applied for detecting multiple outliers. GA is a heuristic optimization tool solving for near optimal solution. We compare the performance of GA and the other diagnostic measures commonly used for detecting outliers in regression model. The results show that GA seems to have better performance than the others for the detection of multiple outliers.

  • PDF

2000년 미국대선 플로리다주의 투표결과 분석 (Statistical Outliers in Florida Counties at the Presidential Election 2000)

  • 김현철
    • 응용통계연구
    • /
    • 제15권1호
    • /
    • pp.21-32
    • /
    • 2002
  • We searched out in the votes data of the State of Florida at presidential election 2000. We used a multivariate regression analysis. We got there were several outliers including Palm Beach County. It means that we should analyze the number of disqualified ballots which were double-punched as well as the votes, to insist the " Butterfly Ballot" made Palm Beach outlier.

Estimation of irrigation supply from agricultural reservoirs based on reservoir storage data

  • Kang, Hansol;An, Hyunuk;Lee, Kwangya
    • 농업과학연구
    • /
    • 제46권4호
    • /
    • pp.999-1006
    • /
    • 2019
  • Recently, the quantitative management of agricultural water supply, which is the main source for water consumption in Korea, has become more important due to the effective water management organization of the Korean government. In this study, the estimation method for irrigation supply based on agricultural reservoir storage data was improved compared to previous research, in which drought year selection was unclear, and the outlier data for the rainfall-irrigation supply were not eliminated in the regression analysis. In this study, the drought year was selected by the ratio of annual precipitation to mean annual precipitation and the storage rate observed before the start of irrigation. The outlier data for the rainfall-irrigation supply were eliminated by the Grubbs & Beck test. The proposed method was applied to nine agricultural reservoirs for validation. As a result, the ratio of annual precipitation to mean annual precipitation is less than 53% and the storage rate observed before the start of irrigation is less than 55% it was judged to be the drought year. In addition, the drought supply factor, K, was found to be 0.70 on average, showing closer results to the observed reservoir rates. This shows that water management at the real is appling drought year practice. It was shown that the performance of the proposed method was satisfactory with NSE (Nash-Sutcliffe model efficiency coefficient) and R2 (coefficient of determiniation) except for a few cases.

ON THEIL'S METHOD IN FUZZY LINEAR REGRESSION MODELS

  • Choi, Seung Hoe;Jung, Hye-Young;Lee, Woo-Joo;Yoon, Jin Hee
    • 대한수학회논문집
    • /
    • 제31권1호
    • /
    • pp.185-198
    • /
    • 2016
  • Regression analysis is an analyzing method of regression model to explain the statistical relationship between explanatory variable and response variables. This paper propose a fuzzy regression analysis applying Theils method which is not sensitive to outliers. This method use medians of rate of increment based on randomly chosen pairs of each components of ${\alpha}$-level sets of fuzzy data in order to estimate the coefficients of fuzzy regression model. An example and two simulation results are given to show fuzzy Theils estimator is more robust than the fuzzy least squares estimator.

Robust Fuzzy Varying Coefficient Regression Analysis with Crisp Inputs and Gaussian Fuzzy Output

  • Yang, Zhihui;Yin, Yunqiang;Chen, Yizeng
    • Journal of Computing Science and Engineering
    • /
    • 제7권4호
    • /
    • pp.263-271
    • /
    • 2013
  • This study presents a fuzzy varying coefficient regression model after deleting the outliers to improve the feasibility and effectiveness of the fuzzy regression model. The objective of our methodology is to allow the fuzzy regression coefficients to vary with a covariate, and simultaneously avoid the impact of data contaminated by outliers. In this paper, fuzzy regression coefficients are represented by Gaussian fuzzy numbers. We also formulate suitable goodness of fit to evaluate the performance of the proposed methodology. An example is given to demonstrate the effectiveness of our methodology.