Search | Korea Science

Clustering Observations for Detecting Multiple Outliers in Regression Models

Seo, Han-Son;Yoon, Min
- The Korean Journal of Applied Statistics / v.25 no.3 / pp.503-512 / 2012
Outlier detection in a linear regression model ultimately fails when similar observations are classified differently during a sequential procedure. In such cases, first identifying clusters and then applying the detection method to the clustered data can prevent this failure, and it is also computationally efficient because the amount of data is reduced. In this paper, we propose a clustering procedure for this purpose and provide examples that apply it to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.
https://doi.org/10.5351/KJAS.2012.25.3.503
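
The cluster-then-test idea can be illustrated with a minimal Python sketch. It is not the authors' procedure and does not implement the Hadi-Simonoff or Gentleman-Wilk methods. It assumes single-linkage clustering with a hypothetical distance threshold `eps`, fits a simple linear regression to the cluster means, and flags whole clusters whose internally studentized residual exceeds a hypothetical `cutoff`. Because every member inherits its cluster's label, near-identical observations cannot end up on opposite sides of the outlier decision.

```python
import math


def cluster(points, eps):
    """Single-linkage clustering: points within eps (Euclidean) share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two clusters
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def studentized_residuals(xs, ys):
    """Internally studentized residuals of the simple OLS fit y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in res) / (n - 2))
    lev = [1 / n + (x - xbar) ** 2 / sxx for x in xs]  # hat-matrix diagonal
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(res, lev)]


def flag_outliers(points, eps=0.5, cutoff=2.5):
    """Cluster first, test one mean per cluster, then label every member alike."""
    groups = cluster(points, eps)
    reps = [(sum(points[i][0] for i in g) / len(g),
             sum(points[i][1] for i in g) / len(g)) for g in groups]
    r = studentized_residuals([p[0] for p in reps], [p[1] for p in reps])
    flagged = set()
    for g, ri in zip(groups, r):
        if abs(ri) > cutoff:
            flagged.update(g)
    return sorted(flagged)
```

Fitting one unweighted representative per cluster is the data reduction the abstract refers to; weighting representatives by cluster size would be an equally reasonable design choice.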

Multivariate statistical analysis of the comparative antioxidant activity of the total phenolics and tannins in the water and ethanol extracts of dried goji berry (Lycium chinense) fruits

Kim, Joo-Shin;Kimm, Haklin Alex
- Korean Journal of Food Science and Technology / v.51 no.3 / pp.227-236 / 2019
Antioxidant activity in water and ethanol extracts of dried Lycium chinense fruit, attributable to their total phenolic and tannin content, was measured using several chemical and biochemical assays for radical scavenging and inhibition of lipid peroxidation, and the analysis was extended with a bootstrap statistical method. Previous statistical analyses mostly reported linear correlation and regression between antioxidant activity and increasing concentrations of phenolics and tannins in a concentration-dependent manner. The present study showed that multicomponent (multivariate) analysis, using multiple regression or regression planes, was more informative than linear regression of antioxidant activity on the concentration of individual components. In this paper, we present a multivariate analysis of antioxidant activity as a function of the combined phenolic and tannin contents in the water and ethanol extracts, which revealed hidden observations that were not evident from linear statistical analysis.
https://doi.org/10.9721/KJFST.2019.51.3.227
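
A regression plane combined with a bootstrap step can be sketched in plain Python. The sketch assumes two predictors (for example phenolic and tannin content), ordinary least squares via the normal equations, and a percentile bootstrap that resamples whole observations. The paper's exact bootstrap design is not given here, so `n_boot`, `alpha`, and the resampling scheme are illustrative assumptions.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular design")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_plane(rows):
    """OLS for y = b0 + b1*x1 + b2*x2; rows are (x1, x2, y) tuples."""
    X = [(1.0, x1, x2) for x1, x2, _ in rows]
    y = [r[2] for r in rows]
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(3)] for i in range(3)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(3)]
    return solve(xtx, xty)


def bootstrap_ci(rows, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for (b0, b1, b2), resampling whole rows."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n_boot:
        sample = [rng.choice(rows) for _ in rows]
        try:
            draws.append(fit_plane(sample))
        except ValueError:
            continue  # degenerate resample (collinear rows): draw again
    cis = []
    for j in range(3):
        vals = sorted(d[j] for d in draws)
        lo = vals[int(alpha / 2 * n_boot)]
        hi = vals[int((1 - alpha / 2) * n_boot) - 1]
        cis.append((lo, hi))
    return cis
```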

Orographic Precipitation Analysis with Regional Frequency Analysis and Multiple Linear Regression (지역빈도해석 및 다중회귀분석을 이용한 산악형 강수해석)

Yun, Hye-Seon;Um, Myoung-Jin;Cho, Won-Cheol;Heo, Jun-Haeng
- Journal of Korea Water Resources Association / v.42 no.6 / pp.465-480 / 2009
In this study, single and multiple linear regression models were used to derive the relationship between precipitation and altitude, latitude, and longitude on Jejudo. The single linear regression analysis used annual average precipitation to examine whether an orographic effect exists on Jejudo, and the multiple linear regression analysis examined whether the orographic effect applies to the quantiles for each duration and return period obtained from regional frequency analysis with the index flood method. The regression results show that the linear relationship between altitude and precipitation becomes stronger as the duration and return period increase. The multiple linear regression precipitation estimates (using altitude, latitude, and longitude) were found to be more reasonable than estimates obtained from altitude alone or from the altitude-latitude and altitude-longitude pairs. In particular, spatial distribution analysis by kriging in GIS produced realistic precipitation estimates, reproducing the high precipitation in the southeastern region that matches Jejudo's actual climate. However, the accuracy of the regression model decreased for short-duration precipitation and for high-altitude precipitation, even at long durations. Consequently, other factors that cause orographic effects need to be considered to improve the accuracy of precipitation estimates.
https://doi.org/10.3741/JKWRA.2009.42.6.465
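
The comparison between an altitude-only model and an altitude-latitude-longitude model can be sketched as two ordinary least-squares fits compared by R². The station data are hypothetical, and latitude and longitude are assumed to be supplied as offsets from a reference point, which keeps the normal equations well conditioned. Note that in-sample R² can never decrease when predictors are added, so a higher value alone does not prove the larger model is better.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from X'X b = X'y (X includes an intercept column)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(X, y, b):
    """Coefficient of determination 1 - SSE/SST for fitted coefficients b."""
    ybar = sum(y) / len(y)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(X, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst


def compare(stations, y):
    """stations: (altitude, dlat, dlon) tuples. Returns R^2 for each model."""
    designs = {
        "altitude": [(1.0, s[0]) for s in stations],
        "altitude+lat+lon": [(1.0, s[0], s[1], s[2]) for s in stations],
    }
    return {name: r_squared(X, y, ols(X, y)) for name, X in designs.items()}
```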

Autocovariance based estimation in the linear regression model (선형회귀 모형에서 자기공분산 기반 추정)

Park, Cheol-Yong
- Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.839-847 / 2011
In this study, we derive an autocovariance-based estimator of the regression coefficient vector in the multiple linear regression model. The method was suggested by Park (2009); although it does not seem intuitively attractive, the estimator is unbiased for the regression coefficient vector. When the vectors of explanatory variables satisfy certain regularity conditions, and under mild conditions that hold when the errors follow autoregressive and moving average models, the estimator has asymptotically the same distribution as the least squares estimator and also converges in probability to the regression coefficient vector. Finally, we provide a simulation study showing that the aforementioned theoretical results hold in small samples.

A Study on Defect Diagnostics for Health Monitoring of a Turbo-Shaft Engine for SUAV (스마트 무인기용 터보축 엔진의 성능진단을 위한 결함 예측에 관한 연구)

Park Juncheol;Roh Taeseong;Choi Dongwhan
- Proceedings of the Korean Society of Propulsion Engineers Conference / v.y2005m4 / pp.248-251 / 2005
In this paper, a health monitoring technique was studied for performance deterioration caused by defects in a gas turbine. The parameters for performance diagnostics were extracted using the GSP program to model the target engine, and a virtual sensor model for health monitoring was built from these data. The location and magnitude of defects in the engine components were determined using multiple linear regression and a weighting method in order to diagnose both single and multiple defects.

Determination of Research Octane Number using NIR Spectral Data and Ridge Regression

Jeong, Ho Il;Lee, Hye Seon;Jeon, Ji Hyeok
- Bulletin of the Korean Chemical Society / v.22 no.1 / pp.37-42 / 2001
Ridge regression is compared with multiple linear regression (MLR) for determining the Research Octane Number (RON) when the baseline and signal-to-noise ratio are varied. MLR analysis of near-infrared (NIR) spectroscopic data usually encounters a collinearity problem, which adversely affects long-term prediction performance. Ridge regression, a biased estimation method, can eliminate or greatly mitigate this collinearity problem. To evaluate the robustness of each calibration, the models developed by both methods were used to predict the RONs of gasoline spectra with varied baselines and signal-to-noise ratios. The ridge calibration model gave more stable predictions than MLR, especially when the spectral baselines were varied. In conclusion, ridge regression is a viable method for calibrating RON from NIR data when only a few wavelengths are available, as in a handheld device using a few diodes.
https://doi.org/10.5012/bkcs.2001.22.1.37
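
The ridge estimator has the textbook closed form b = (X'X + λI)⁻¹X'y on centred data. The sketch below implements that form in plain Python; the value of λ (`lam`) and the centring convention (intercept not penalised) are assumptions, not details from the paper. When columns such as absorbances at neighbouring wavelengths are nearly collinear, X'X is close to singular, and adding λ to its diagonal restores conditioning at the cost of some bias.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ridge(X, y, lam):
    """Ridge fit: solve (Xc'Xc + lam*I) b = Xc'yc on centred data.

    The intercept is recovered from the means and is not penalised.
    lam = 0 reduces to ordinary least squares (MLR).
    """
    n, p = len(X), len(X[0])
    xbar = [sum(r[j] for r in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[r[j] - xbar[j] for j in range(p)] for r in X]
    yc = [v - ybar for v in y]
    a = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    rhs = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    b = solve(a, rhs)
    intercept = ybar - sum(bj * mj for bj, mj in zip(b, xbar))
    return intercept, b
```

As λ grows the coefficient vector shrinks toward zero, so λ trades variance for bias; in practice it would be chosen by cross-validation or a similar criterion.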

Motion estimation method using multiple linear regression model (다중선형회귀모델을 이용한 움직임 추정방법)

김학수;임원택;이재철;이규원;박규택
- Journal of the Korean Institute of Telematics and Electronics S / v.34S no.10 / pp.98-103 / 1997
Given the small bit allocation for motion information in very low bit-rate coding, motion estimation using the block matching algorithm (BMA) fails to keep prediction errors at an acceptable level. The reason is that the motion model, or spatial transformation, assumed in block matching cannot precisely approximate real-world motion with a small number of parameters. To overcome this drawback of the conventional block matching algorithm, several triangle-based methods that use triangular patches instead of blocks have been proposed. To estimate motion in image sequences, these methods are usually based on a combination of the optical flow equation, an affine transform, and iteration, but their computational cost is high. This paper presents a fast motion estimation algorithm using a multiple linear regression model to overcome the shortcomings of both the BMA and the triangle-based methods. After describing the basic 2-D triangle-based method, the details of the proposed multiple linear regression model are presented along with motion estimation results for one standard video sequence representative of MPEG-4 class A data. The simulation results show that the proposed method improves the average PSNR by about 1.24 dB compared with the BMA method and reduces the computational cost by about 25% compared with the 2-D triangle-based method.
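
The abstract does not specify the paper's regression model. As a hedged illustration of how multiple linear regression can estimate motion for a patch, the sketch below fits a generic six-parameter affine motion model by least squares from point correspondences, treating each output coordinate as its own regression on (x, y, 1). The correspondences are assumed to be given; finding them is outside the scope of the sketch.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy; a is untouched
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine motion from point pairs src[i] -> dst[i].

    Model: x' = a1*x + a2*y + a3 and y' = a4*x + a5*y + a6.
    Both coordinates share the design matrix, so X'X is built once.
    """
    X = [(x, y, 1.0) for x, y in src]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):  # k = 0 regresses x', k = 1 regresses y'
        xty = [sum(r[i] * d[k] for r, d in zip(X, dst)) for i in range(3)]
        params.extend(solve(xtx, xty))
    return params


def warp(params, x, y):
    """Apply the affine motion (a1..a6) to the point (x, y)."""
    a1, a2, a3, a4, a5, a6 = params
    return a1 * x + a2 * y + a3, a4 * x + a5 * y + a6
```

Three non-collinear correspondences (such as a triangle's vertices) determine the six parameters exactly; additional points turn the fit into a genuine least-squares regression that averages out matching noise.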

MapReduce-based Localized Linear Regression for Electricity Price Forecasting (전기 가격 예측을 위한 맵리듀스 기반의 로컬 단위 선형회귀 모델)

Han, Jinju;Lee, Ingyu;On, Byung-Won
- The Transactions of the Korean Institute of Electrical Engineers P / v.67 no.4 / pp.183-190 / 2018
Accurate electricity price prediction is an important task in the electricity trading market. Various approaches to electricity price forecasting have been proposed, and linear regression-based approaches are known to perform best. However, the use of such methods is limited by their low accuracy and performance. With traditional linear regression methods, it is not practical to find a nonlinear regression model that explains the training data well: if the training data are complex (i.e., few individual observations and many features), it is difficult to find an n-term polynomial that fits them. On the other hand, when a linear regression model is used to approximate a nonlinear one, accuracy drops considerably because the model does not accurately reflect the characteristics of the training data. To address this problem, we propose a new electricity price forecasting method that divides the entire dataset into multiple split datasets and finds the best linear regression model for each split. To improve performance, we also adapt the proposed localized linear regression method to MapReduce, a framework for parallel processing of data stored in the Hadoop distributed file system. Our experimental results show that the proposed model outperforms the existing linear regression model: its accuracy is improved by 45%, and it runs five times faster than the existing linear regression-based model.
https://doi.org/10.5370/KIEEP.2018.67.4.183
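
The split-then-fit structure and its map/reduce form can be mimicked in a single process. In this sketch, observations are split into fixed-width segments of x (a hypothetical splitting rule; the paper's rule is not given here), the map step emits per-observation sufficient statistics keyed by segment, and the reduce step sums them before each segment's line is solved. This is plain Python, not Hadoop code.

```python
from collections import defaultdict
from functools import reduce


def mapper(x, y, width):
    """Emit (segment key, sufficient statistics) for one observation."""
    return int(x // width), (1, x, y, x * x, x * y)


def combine(s, t):
    """Reducer: sufficient statistics add element-wise."""
    return tuple(a + b for a, b in zip(s, t))


def fit_local_models(xs, ys, width):
    """Return {segment: (intercept, slope)} for one OLS line per segment."""
    grouped = defaultdict(list)
    for key, stats in (mapper(x, y, width) for x, y in zip(xs, ys)):
        grouped[key].append(stats)  # shuffle step: group by key
    models = {}
    for key, stats_list in grouped.items():
        n, sx, sy, sxx, sxy = reduce(combine, stats_list)
        denom = n * sxx - sx * sx
        if denom == 0:  # all x equal in this segment: fall back to the mean
            models[key] = (sy / n, 0.0)
            continue
        b = (n * sxy - sx * sy) / denom
        models[key] = ((sy - b * sx) / n, b)
    return models


def predict(models, x, width):
    """Predict with the local model of the segment that x falls in."""
    a, b = models[int(x // width)]
    return a + b * x
```

Because the statistics combine by addition, which is associative and commutative, partial sums computed on different workers can be merged in any order; that property is what lets the reduce step run in parallel.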

Subset selection in multiple linear regression: An improved Tabu search

Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
- Journal of Advanced Marine Engineering and Technology / v.40 no.2 / pp.138-145 / 2016
This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is an important combinatorial optimization problem in multivariate statistics: selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Because the problem is NP-complete, finding the optimal solution becomes more difficult as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, both have shortcomings: the tabu search method requires a large amount of computing time, and the hybrid algorithm produces less accurate solutions. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the number of neighborhood moves and adopts an effective move search strategy. To evaluate the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in both computing time and solution quality.
https://doi.org/10.5916/jkosme.2016.40.2.138
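
For orientation, a basic tabu search over variable subsets is sketched below. It is the plain algorithm (single-variable flip moves, a fixed tabu tenure, and an aspiration rule), not the improved variant proposed in the paper. The `score` function is a hypothetical placeholder; in subset selection it would typically be a criterion computed from the fitted regression, such as an information criterion, where lower is better.

```python
def tabu_search(n_vars, score, iters=100, tenure=3, start=None):
    """Minimise score(frozenset of selected variable indices) by single flips.

    Each iteration moves to the best non-tabu neighbour (even if it is worse,
    which lets the search leave local minima). A variable just flipped stays
    tabu for `tenure` iterations unless flipping it beats the best score so far.
    """
    current = frozenset(start or ())
    best, best_score = current, score(current)
    tabu = {}  # variable index -> last iteration at which flipping it is forbidden
    for it in range(iters):
        candidates = []
        for j in range(n_vars):
            neighbour = current ^ {j}  # add or drop variable j
            s = score(neighbour)
            if tabu.get(j, -1) < it or s < best_score:  # aspiration criterion
                candidates.append((s, j, neighbour))
        if not candidates:
            continue
        s, j, current = min(candidates)  # ties broken by variable index
        tabu[j] = it + tenure
        if s < best_score:
            best, best_score = current, s
    return best, best_score
```

Evaluating every single-flip neighbour costs `n_vars` score evaluations per iteration, which is the expense the paper's reduced neighbourhood aims to cut.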

On study for change point regression problems using a difference-based regression model

Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
- Communications for Statistical Applications and Methods / v.26 no.6 / pp.539-556 / 2019
This paper derives a method for solving change point regression problems using properties of the difference-based intercept estimator first introduced by Park and Kim (Communications in Statistics - Theory and Methods, 2019) for outlier detection in multiple linear regression models. We describe the statistical properties of the difference-based regression model in a piecewise simple linear regression model and then propose an efficient algorithm for change point detection. We illustrate the merits of the proposed method through comparisons with several existing methods in simulation studies and real data analysis. The methodology is valuable because it applies regardless of the form of the regression lines and the number of change points.
https://doi.org/10.29220/CSAM.2019.26.6.539
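
The difference-based estimator itself is not reproduced here. As a simpler, swapped-in baseline for the same problem, the sketch below searches exhaustively for a single change point in a piecewise simple linear regression by minimizing the combined residual sum of squares of two separate least-squares lines. It assumes x values sorted in ascending order, exactly one change point, and a hypothetical minimum segment length `min_seg`.

```python
def sse_line(xs, ys):
    """Residual sum of squares of a simple OLS line: Syy - Sxy^2 / Sxx."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    return syy - sxy * sxy / sxx if sxx > 0 else syy


def best_change_point(xs, ys, min_seg=3):
    """Return (k, total_sse) for the split xs[:k] | xs[k:] with the lowest SSE."""
    best = None
    for k in range(min_seg, len(xs) - min_seg + 1):
        total = sse_line(xs[:k], ys[:k]) + sse_line(xs[k:], ys[k:])
        if best is None or total < best[1]:
            best = (k, total)
    return best
```

The search needs one pair of fits per candidate split and handles only one change point; detecting an unknown number of change points efficiently is the harder setting the paper's algorithm addresses.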
