• Title/Abstract/Keywords: multiple outliers

Bootstrapping Regression Residuals

  • Imon, A.H.M. Rahmatullah;Ali, M. Masoom
    • Journal of the Korean Data and Information Science Society, Vol. 16, No. 3, pp. 665-682, 2005
  • The sample-reuse bootstrap technique has attracted both applied and theoretical statisticians since its introduction. In recent years, much attention has focused on applications of bootstrap methods in regression analysis. The bootstrap is a comparatively easy yet accurate computational method, but it depends heavily on high-speed computers and requires rigorous mathematical justification of its validity. It is now evident that the presence of multiple unusual observations can severely damage inferential procedures, and we suspect that bootstrap methods are not free from this problem. We first present a few examples that support this suspicion and then propose a new method for regression, the diagnostic-before-bootstrap method. Its usefulness is investigated through a few well-known examples and a Monte Carlo simulation under a variety of error and leverage structures.
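For reference, here is a minimal sketch of the standard residual bootstrap for a one-predictor least-squares fit: fit once, resample the fitted residuals with replacement, add them back to the fitted values, and refit. The data, `n_boot`, and the helper names are hypothetical, and this is not the authors' diagnostic-before-bootstrap method, which would add an outlier-diagnostic step before the resampling loop. The sketch also shows why the authors' concern is plausible: a single gross outlier stays in `resid` and is redrawn into many bootstrap samples.

```python
import random
import statistics

def ols_fit(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residual_bootstrap_slopes(x, y, n_boot=1000, seed=0):
    """Resample fitted residuals, rebuild y* = fitted + e*, and refit each time."""
    rng = random.Random(seed)
    b0, b1 = ols_fit(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        e_star = rng.choices(resid, k=len(resid))  # sample residuals with replacement
        y_star = [fi + ei for fi, ei in zip(fitted, e_star)]
        slopes.append(ols_fit(x, y_star)[1])
    return slopes

# Hypothetical data lying close to y = 2x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
slopes = residual_bootstrap_slopes(x, y)
print(round(statistics.fmean(slopes), 3), round(statistics.stdev(slopes), 4))
```

Because OLS residuals from a model with an intercept sum to zero, the resampled residuals are centred and the bootstrap slopes scatter around the original estimate; their standard deviation is the bootstrap standard error.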

Resampling-based Test of Hypothesis in L1-Regression

  • Kim, Bu-Yong
    • Communications for Statistical Applications and Methods, Vol. 11, No. 3, pp. 643-655, 2004
  • The $L_1$ estimator in the linear regression model is widely recognized as having superior robustness in the presence of vertical outliers. While $L_1$ estimation procedures and algorithms are well developed, less progress has been made on hypothesis testing in multiple $L_1$ regression. This article proposes computer-intensive resampling approaches, the jackknife and the bootstrap, for estimating the variance of the $L_1$ estimator and the scale parameter required to compute the test statistics. Monte Carlo simulation studies are performed to measure the power of the tests in small samples. The simulation results indicate that the bootstrap estimation method gives the most powerful test when it is used with the likelihood ratio test.
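A small sketch of the two resampling ideas for an $L_1$ (least absolute deviations, LAD) fit with one predictor. It assumes a brute-force LAD solver that checks every line through two data points; this is valid for one predictor because some optimal LAD line passes through at least two observations, but it costs O(n^3) and suits only toy data. The resampling scheme (pairs bootstrap), the helper names, and the data are hypothetical; the paper's multiple-regression solver and test statistics are not reproduced.

```python
import itertools
import random
import statistics

def lad_fit(x, y):
    """Least-absolute-deviations line for one predictor (brute force).

    Some optimal LAD line passes through two observations with distinct x,
    so scanning every such pair finds a minimiser. O(n^3): toy data only.
    """
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(abs(yk - intercept - slope * xk) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

def bootstrap_se_slope(x, y, n_boot=200, seed=1):
    """Pairs bootstrap: resample (x, y) rows with replacement and refit."""
    rng = random.Random(seed)
    n, slopes = len(x), []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        if len(set(xb)) < 2:
            continue  # a line needs at least two distinct x values
        slopes.append(lad_fit(xb, [y[i] for i in idx])[1])
    return statistics.stdev(slopes)

def jackknife_se_slope(x, y):
    """Delete-one jackknife: sqrt((n-1)/n * sum((s_i - mean)^2))."""
    n = len(x)
    slopes = [lad_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])[1] for i in range(n)]
    m = statistics.fmean(slopes)
    return ((n - 1) / n * sum((s - m) ** 2 for s in slopes)) ** 0.5

# Hypothetical data near y = 2x with one vertical outlier at x = 11.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1, 15.0, -0.2]
x = list(range(1, 13))
y = [2 * a + e for a, e in zip(x, noise)]
print(lad_fit(x, y), bootstrap_se_slope(x, y), jackknife_se_slope(x, y))
```

The LAD slope stays close to 2 despite the outlier, which is the robustness to vertical outliers that the abstract refers to.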

A Multivariate Analysis of Korean Professional Players Salary

  • 송종우
    • The Korean Journal of Applied Statistics (응용통계연구), Vol. 21, No. 3, pp. 441-453, 2008
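  • Assuming that professional athletes' salaries are determined by their individual performance and contribution to the team, this study predicted and analyzed the following year's salaries of professional basketball and baseball players from their previous-year performance. Data visualization techniques were used to examine relationships among variables, detect outliers, and diagnose models. The data were analyzed with multiple linear regression and regression tree models, the models were compared, and the optimal model was selected by cross-validation. In particular, rather than simply applying stepwise regression, which selects variables automatically, we first examined the relationships among the explanatory variables and between the explanatory and response variables, and then applied stepwise regression and regression trees to the variables selected in this way to choose appropriate variables and the final model. For professional basketball, points per game, assists, free throws made, and career length were important variables; for professional baseball pitchers, career length, strikeouts per nine innings, earned run average, and home runs allowed were important; and for professional baseball hitters, career length, number of hits, and free agent (FA) status were important.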
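The model comparison step can be illustrated with a minimal k-fold cross-validation sketch that pits a least-squares line against the simplest regression tree, a single split (a "stump"). Everything here is an assumption for illustration: one synthetic predictor, `k=5`, and the helper names; the study's actual variables, tree depth, and CV settings are not given in the abstract.

```python
import random
import statistics

def fit_linear(x, y):
    """Least-squares line; returns a prediction function."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    return lambda v: b0 + b1 * v

def fit_stump(x, y):
    """Depth-1 regression tree: the single split that minimises squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal x values
        left = [b for _, b in pairs[:k]]
        right = [b for _, b in pairs[k:]]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((b - ml) ** 2 for b in left) + sum((b - mr) ** 2 for b in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, ml, mr)
    _, cut, ml, mr = best
    return lambda v: ml if v <= cut else mr

def kfold_mse(fit, x, y, k=5, seed=0):
    """Mean squared prediction error over k held-out folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    errors = []
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        errors += [(y[i] - model(x[i])) ** 2 for i in test]
    return statistics.fmean(errors)

# Synthetic data with a linear trend plus noise.
rng = random.Random(42)
x = list(range(1, 21))
y = [3 + 0.5 * a + rng.gauss(0, 0.5) for a in x]
scores = {"linear": kfold_mse(fit_linear, x, y), "tree": kfold_mse(fit_stump, x, y)}
print(min(scores, key=scores.get), scores)
```

The model with the lower held-out error is selected; on this linear synthetic data the line should win, while a step-shaped relationship would favour the tree.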

Development and Validation of Multiple Regression Models for the Prediction of Effluent Concentration in a Sewage Treatment Process

  • 민상윤;이승필;김진식;박종운;김만수
    • Journal of Korean Society of Environmental Engineers (대한환경공학회지), Vol. 34, No. 5, pp. 312-315, 2012
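  • This study built models that predict effluent water quality through multiple regression analysis of real operating data from a sewage treatment plant that uses a media process. The regression analysis used one year of data from 2011, and case studies were carried out on the application of variable selection methods, the removal of outliers and influential observations, and the log transformation of variables. Evaluating the prediction accuracy of the resulting models gave values of 0.87 or higher for $COD_{Mn}$ in the secondary clarifier effluent and 0.81 or higher for T-N. The multiple regression models are expected to make it possible to set a range of operating conditions under which the effluent does not exceed the discharge water quality standards. Within that range, the models are expected to give operators appropriate operating guidance in terms of both water quality and energy cost.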

Detection of multi-type data anomaly for structural health monitoring using pattern recognition neural network

  • Gao, Ke;Chen, Zhi-Dan;Weng, Shun;Zhu, Hong-Ping;Wu, Li-Ying
    • Smart Structures and Systems, Vol. 29, No. 1, pp. 129-140, 2022
  • The effectiveness of system identification, damage detection, condition assessment, and other structural analyses relies heavily on the accuracy and reliability of the data measured by structural health monitoring (SHM) systems. However, data anomalies often occur in SHM systems and lead to inaccurate and untrustworthy analysis results, so anomalies in the raw data should be detected and cleansed before further analysis. Previous studies on data anomaly detection focused mainly on a single type of anomaly, for denoising or removing outliers, while existing methods for detecting multiple types of anomaly are usually time-consuming. Recognizing multiple anomaly patterns for real-time alarms and analysis in field monitoring therefore remains a challenge. To detect multi-type data anomalies efficiently and accurately in field SHM, this study proposes a pattern-recognition-based detection method with three main steps: extracting features from long time-series data samples, training a pattern recognition neural network (PRNN) on those features, and detecting data anomalies. The feature extraction step substantially reduces the training time of the network, which makes detection very fast. The method is verified on SHM data from two real long-span bridges. The results indicate that it recognizes multiple data anomalies with very high accuracy and low computational cost, demonstrating its applicability to field monitoring.

Influencing Factors on Self-Esteem in Adolescents

  • 한상숙;김경미
    • Journal of Korean Academy of Nursing (대한간호학회지), Vol. 36, No. 1, pp. 37-44, 2006
  • Purpose: This study was conducted to identify the major factors that affect self-esteem in adolescents. Methods: Data were collected through questionnaires from 1,155 middle and high school students in Seoul and Kyungkido, Korea. The instruments used in this study measured self-esteem, body image, problematic behavior, depression, school adjustment, and social support, and were modified and tested for validity and reliability. The data were analyzed with SPSS 11.0. Family harmony and counseling partner were treated as dummy variables. Seven outliers with absolute values greater than 3 were found and removed, and multiple regression was then used for further analysis. Result: The major factors affecting self-esteem in adolescents were depression, social support, body image, problematic behavior, school adjustment, and family harmony, which together explained 54.7% of the variance in self-esteem. Conclusion: The regression model from this study may serve to predict self-esteem in adolescents.
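The abstract does not name the statistic behind "greater than 3 in absolute value"; a common reading is the standardized residual $e_i / s$, where $s$ is the residual standard error. Under that assumption, a minimal one-predictor sketch of the screen-then-refit step looks like this (synthetic data, hypothetical helper names; the study itself used SPSS and several predictors).

```python
import random
import statistics

def ols_fit(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1

def drop_large_residuals(x, y, cutoff=3.0):
    """Fit OLS, flag |e_i / s| > cutoff, and refit without the flagged points."""
    b0, b1 = ols_fit(x, y)
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s = (sum(e * e for e in resid) / (len(x) - 2)) ** 0.5  # residual standard error
    flagged = [i for i, e in enumerate(resid) if abs(e / s) > cutoff]
    keep = [i for i in range(len(x)) if i not in flagged]
    refit = ols_fit([x[i] for i in keep], [y[i] for i in keep])
    return flagged, refit

# Synthetic data with one planted outlier.
rng = random.Random(7)
x = list(range(30))
y = [1.0 + 0.8 * a + rng.gauss(0, 0.5) for a in x]
y[15] += 10.0
flagged, (b0, b1) = drop_large_residuals(x, y)
print(flagged, round(b0, 2), round(b1, 2))
```

A caveat relevant to the search topic: each gross outlier inflates $s$, so when several are present they can mask one another and none may cross the cutoff. A one-pass rule like this is only a sketch, not a robust multiple-outlier procedure.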

The Effects of Physical Environment in Coffee Shops on Customer Brand Loyalty: With a Focus on the Comparison between Mediating Effects of Customer Satisfaction and Emotional Responses

  • 김수진;이형룡
    • Journal of the East Asian Society of Dietary Life (동아시아식생활학회지), Vol. 21, No. 4, pp. 609-624, 2011
  • The purpose of this study was to examine the physical environmental factors of coffee shops that determine customer brand loyalty, and to investigate the mediating effects of customer satisfaction and emotional responses on the causal relationship between those factors and brand loyalty. Data were collected from 400 coffee shop customers in Seoul and Gyeonggi in March 2011 through a self-administered questionnaire. Of these, 351 responses were used for the validity and reliability analysis; after 12 outliers were removed, 339 responses were used to derive the results. Multiple linear regression and stepwise regression were conducted after construct validity and reliability were verified. The results can be summarized as follows: (1) The physical environment of coffee shops consists of five dimensions: facility aesthetics, cleanliness, ambiance, layout, and internet environment. (2) Facility aesthetics, ambiance, and internet environment influenced brand loyalty. (3) The effects of cleanliness and layout on brand loyalty were not significant in the multivariate analysis. However, the relationship between cleanliness and brand loyalty was mediated by emotional responses, and the relationship between layout and brand loyalty was mediated by customer satisfaction. (4) The mediating effects of customer satisfaction were larger than those of emotional responses.

A Method to Detect Multiple Plane Areas by using the Iterative Randomized Hough Transform (IRHT) and the Plane Detection

  • 임성조;김대광;강동중
    • The Transactions of the Korean Institute of Electrical Engineers (전기학회논문지), Vol. 57, No. 11, pp. 2086-2094, 2008
  • Finding planar surfaces in 3D space is very important for the efficient and safe operation of a mobile robot. In this paper, we propose a method that uses a plane detection cell (PDC) and the iterative randomized Hough transform (IRHT) to find planar regions in a 3D range image. First, local planar regions are detected by PDCs in the target area of the range image. Each plane is then segmented by analyzing the peaks accumulated when the local direction and position information of each PDC votes in Hough space; this reduces the effect of noise and outliers and improves the efficiency of the HT. When segmenting each plane region, the IRHT repeatedly shrinks the planar region used for voting in the Hough parameter space to reduce the effect of noise and to solve the local-maxima problem in the parameter space. Range images generally contain many planes with different normal directions, so we first detect the largest plane region and then process the remaining region again. Through this procedure, all planar regions of interest in the range image can be segmented.
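The "largest plane first, then the rest" idea can be sketched with a plain randomized Hough transform: sample point triples, turn each into a plane $n \cdot x = d$, vote in a quantized $(n, d)$ space, accept the peak, remove its inliers, and repeat. This is a swapped-in simplification, not the authors' PDC/IRHT: there are no plane detection cells and no shrinking voting region, and the bin sizes, tolerance, `min_inliers`, and the point set are hypothetical.

```python
import math
import random
from collections import Counter

def plane_from_points(p, q, r):
    """Return (unit normal n, offset d) with n . x = d, or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-9:
        return None
    n = [c / norm for c in n]
    d = sum(n[i] * p[i] for i in range(3))
    # Canonical sign: make the largest normal component positive so that
    # (n, d) and (-n, -d) describe the same plane and vote for the same cell.
    k = max(range(3), key=lambda i: abs(n[i]))
    if n[k] < 0:
        n, d = [-c for c in n], -d
    return n, d

def rht_best_plane(points, rng, trials=500, step_n=0.05, step_d=0.1):
    """One randomized Hough pass: sample triples, vote in a quantized (n, d) space."""
    votes, params = Counter(), {}
    for _ in range(trials):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        cell = tuple(round(c / step_n) for c in n) + (round(d / step_d),)
        votes[cell] += 1
        params[cell] = plane
    if not votes:
        return None
    return params[votes.most_common(1)[0][0]]

def distance(plane, p):
    """Distance from point p to the plane n . x = d (n is a unit vector)."""
    n, d = plane
    return abs(sum(n[i] * p[i] for i in range(3)) - d)

def detect_planes(points, max_planes=3, tol=0.05, min_inliers=10, seed=0):
    """Peel off the best-supported plane, then search the remaining points."""
    rng = random.Random(seed)
    remaining, planes = list(points), []
    while len(planes) < max_planes and len(remaining) >= min_inliers:
        plane = rht_best_plane(remaining, rng)
        if plane is None:
            break
        inliers = [p for p in remaining if distance(plane, p) < tol]
        if len(inliers) < min_inliers:
            break
        planes.append((plane[0], plane[1], len(inliers)))
        remaining = [p for p in remaining if distance(plane, p) >= tol]
    return planes

# Hypothetical scene: a 10 x 10 floor (z = 0), a wall at x = 12, and clutter.
floor = [(x, y, 0) for x in range(10) for y in range(10)]
wall = [(12, y, z) for y in range(10) for z in range(1, 7)]
clutter = [(2.5, 3.5, 4.2), (7.1, 1.3, 3.3), (4.4, 8.8, 5.5)]
planes = detect_planes(floor + wall + clutter)
for n, d, count in planes:
    print([round(c, 3) for c in n], round(d, 3), count)
```

The floor collects the most votes and is extracted first; once its inliers are removed, the wall dominates the second pass, mirroring the largest-first order described in the abstract.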

Prediction of High Level Ozone Concentration in Seoul by Using Multivariate Statistical Analyses

  • 허정숙;김동술
    • Journal of Korean Society for Atmospheric Environment (한국대기환경학회지), Vol. 9, No. 3, pp. 207-215, 1993
  • To statistically predict $O_3$ levels in Seoul, this study used TMS (telemetered air monitoring system) data from the Department of Environment, which monitored 20 sites in 1989 and 1990. The data at each site were characterized by six major criteria pollutants ($SO_2$, TSP, CO, $NO_2$, THC, and $O_3$) and two meteorological parameters, wind speed and wind direction. To select appropriate variables and determine each pollutant's behavior, univariate statistical analyses were first carried out extensively, and then various applied statistical techniques, such as cluster analysis, regression analysis, and an expert system, were examined intensively. For the initial study of high-level $O_3$ prediction, the raw data set at each site was separated into two groups based on an $O_3$ level of 60 ppb. Hierarchical cluster analysis was applied to divide each group into smaller classes, and each class at each site had its own pattern. Next, multiple regression was applied repeatedly to each class to determine an $O_3$ prediction submodel and to identify outliers in each class based on a threshold on the standardized residual. Thus, a prediction submodel could be obtained for each homogeneous class. The study was extended to model $O_3$ prediction both for the current hour and for one hour ahead. Finally, an expert system was used to build a unified classification rule from examples of the homogeneous classes across all sites. In this way, a conceptual high-level $O_3$ prediction model was developed for use in an $O_3$ alert system.

A graph-based method for fitting planar B-spline curves with intersections

  • Bo, Pengbo;Luo, Gongning;Wang, Kuanquan
    • Journal of Computational Design and Engineering, Vol. 3, No. 1, pp. 14-23, 2016
  • This paper studies the problem of fitting B-spline curves to planar point clouds. A novel method is proposed for the most challenging case, in which multiple intersecting curves or self-intersecting curves are needed to represent the shape. A method based on the Delaunay triangulation of the data points is developed to identify connected components; it can also remove outliers. A skeleton representation captures the topological structure and is then used to build a weighted graph that decides how curve segments are merged. Unlike existing approaches, which use local shape information near intersections, our method considers the shape characteristics of curve segments over a larger scope and therefore gives more satisfactory results. By fitting each group of data points with a B-spline curve, we solve the problems of reconstructing curve structure from point clouds and of vectorizing simple line-drawing images by reconstructing the drawn lines.