• Title/Summary/Keyword: Linear Regression Algorithm


A Bayesian Approach to Detecting Outliers Using Variance-Inflation Model

  • Lee, Sangjeen;Chung, Younshik
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.805-814 / 2001
  • The problem of 'outliers', observations that look suspicious in some way, has long been one of the main concerns of experimenters and data analysts. We propose a model for the outlier problem and analyze it in the linear regression model using a Bayesian approach with the variance-inflation model. We use the ideas of Geweke (1996), which are based on the data augmentation method, to detect outliers in the linear regression model. The advantage of the proposed method is that it finds the subset of the data that is most suspicious under the given model, as measured by posterior probability. A sampling-based approach can be used to carry out the complicated Bayesian computation. Finally, the proposed methodology is applied to simulated data and to real data.
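The abstract does not state the model itself. As a rough orientation only, one common textbook form of a variance-inflation outlier model is sketched below; the symbols $\delta_i$, $k$, and $\alpha$ are assumptions introduced here and may differ from the paper's notation.

```latex
% Assumed textbook form, not taken from the paper.
% delta_i = 1 marks observation i as an outlier with inflated error variance.
\begin{aligned}
y_i &= x_i^{\top}\beta + \varepsilon_i, \qquad i = 1,\dots,n, \\
\varepsilon_i \mid \delta_i &\sim N\!\left(0,\; k^{2\delta_i}\sigma^{2}\right), \qquad k > 1, \\
\delta_i &\sim \mathrm{Bernoulli}(\alpha).
\end{aligned}
```

Under this form, data augmentation treats the indicators $\delta_i$ as latent variables that are sampled together with $\beta$ and $\sigma^2$, and the posterior probability $P(\delta_i = 1 \mid y)$ measures how suspicious observation $i$ is.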


RADIOMETRIC RESTORATION OF SHADOW AREAS FROM KOMPSAT-2 IMAGERY

  • Choi, Jae-Wan;Kim, Hye-Jin;Han, You-Kyung;Kim, Yong-Il
    • Proceedings of the KSRS Conference / 2008.10a / pp.371-374 / 2008
  • In very high-spatial-resolution remote sensing imagery, it is difficult to extract feature information for various objects because of occlusion and shadows. Nevertheless, the varied but faint information within shadows can be useful in GIS-based applications and remote sensing analysis. In this paper, we develop a radiometric restoration method for shadow areas using a KOMPSAT-2 satellite image. After the shadows are detected, nearby non-shadow pixels are extracted using a morphological filter. An iterative linear regression method is then applied to estimate the relationship between shadow and non-shadow pixels, and the shadows are restored using the parameters of the fitted regression. Tests show that restoring shadowed areas with our method improves image quality.
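The regression step described in this abstract might look roughly like the sketch below. It is not the authors' implementation: the abstract does not say what is iterated, so the sketch assumes the line is refitted after discarding pixel pairs with large residuals. The function names, the threshold `k`, and the sample values are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def iterative_fit(shadow, sunlit, n_iter=3, k=2.0):
    """Refit after dropping pairs whose residual exceeds k * RMS residual.

    The trimming rule is an assumption made for this sketch.
    """
    pairs = list(zip(shadow, sunlit))
    slope, intercept = fit_line(shadow, sunlit)
    for _ in range(n_iter):
        residuals = [y - (slope * x + intercept) for x, y in pairs]
        rms = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
        threshold = max(k * rms, 1e-9)  # floor guards against float noise
        kept = [p for p, r in zip(pairs, residuals) if abs(r) <= threshold]
        if len(kept) == len(pairs) or len(kept) < 2:
            break  # nothing left to trim, or too few points to refit
        pairs = kept
        slope, intercept = fit_line(*zip(*pairs))
    return slope, intercept


def restore(shadow_value, slope, intercept):
    """Map a shadow pixel value to its estimated non-shadow brightness."""
    return slope * shadow_value + intercept


# Hypothetical paired samples: shadow pixel value vs. nearby non-shadow value.
shadow = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
sunlit = [2 * x + 30 for x in shadow]
sunlit[4] = 300  # one contaminated pair that the trimming step should discard

slope, intercept = iterative_fit(shadow, sunlit)
restored = restore(25, slope, intercept)
```

In a real image the pairs would come from the detected shadow region and the non-shadow pixels selected by the morphological filter, typically one fit per spectral band.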


Bayesian Variable Selection in Linear Regression Models with Inequality Constraints on the Coefficients (제한조건이 있는 선형회귀 모형에서의 베이지안 변수선택)

  • Oh, Man-Suk
    • The Korean Journal of Applied Statistics / v.15 no.1 / pp.73-84 / 2002
  • Linear regression models with inequality constraints on the coefficients are frequently used in economic models because the coefficients are subject to sign or order constraints. In this paper, we propose a Bayesian approach to selecting significant explanatory variables in linear regression models with inequality constraints on the coefficients. Bayesian variable selection requires computing the posterior probability of each candidate model. We propose a method that computes all the necessary posterior model probabilities simultaneously. Specifically, we obtain posterior samples from the most general model via the Gibbs sampling algorithm (Gelfand and Smith, 1990) and compute the posterior probabilities from these samples. A real example is given to illustrate the method.

A Study on Predicting the demand for Public Shared Bikes using linear Regression

  • HAN, Dong Hun;JUNG, Sang Woo
    • Korean Journal of Artificial Intelligence / v.10 no.1 / pp.27-32 / 2022
  • As the need for eco-friendly transportation grows with the deepening climate crisis, many local governments in Korea are introducing shared bicycles. Because of anxiety about public transportation after COVID-19, bicycles have firmly established themselves as a pillar of daily transportation. The use of shared bicycles is spreading and demand at rental offices is increasing, but operation and management are difficult because the demand must be served under a limited budget. In addition, user behavior creates a spatial imbalance in the bike inventory over time. To simplify the maintenance of shared bicycles in Seoul, bicycles should therefore be supplied in large quantities when demand is high and withdrawn when demand is low. In this study, the linear regression algorithm and MS Azure ML are used to predict and analyze when demand is high. The analysis shows that demand for bicycles rose in 2018 compared with 2017 and that demand is lower in winter than in spring, summer, and fall. We judge that this linear regression-based prediction can reduce maintenance and management costs in a sharing society and increase user convenience. In a further study, we will focus on shared bike routes by using GPS tracking systems. The collected data will be used to analyze the routes used by most people and to derive the optimal route when installing bicycle-only roads.

Model selection algorithm in Gaussian process regression for computer experiments

  • Lee, Youngsaeng;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods / v.24 no.4 / pp.383-396 / 2017
  • The model in our approach assumes that computer responses are a realization of a Gaussian process superimposed on a regression model, called a Gaussian process regression model (GPRM). Selecting a subset of variables or building a good reduced model in classical regression is an important step in identifying the variables that influence the responses and in further analysis such as prediction or classification. One reason to select variables for prediction is to prevent over-fitting or under-fitting to the data. The same reasoning and approach are applicable to GPRM. However, only a few studies on variable selection in GPRM have been done. In this paper, we propose a new algorithm to build a good prediction model among candidate GPRMs. It is a post-processing step for the algorithm that includes the Welch method suggested by previous researchers. The proposed algorithms select non-zero regression coefficients (the $\beta$'s) using forward and backward methods along with a Lasso-guided approach. During this process, the covariance parameters (the $\theta$'s) pre-selected by the Welch algorithm are held fixed. We illustrate the superiority of our proposed models over the Welch method and non-selection models using four test functions and one real data example. Future extensions are also discussed.

A Fuzzy Linear Regression Algorithm of Load Forecasting for Holidays (퍼지 선형 회귀분석법을 기반으로 한 특수일 수요예측시스템 개발)

  • Cho, Hyun-Ho;Baek, Young-Sik;Hong, Dug-Hun;Song, Kyung-Bin
    • Proceedings of the KIEE Conference / 2000.07a / pp.298-300 / 2000
  • This paper proposes a fuzzy linear regression algorithm based on Tanaka's theory for holiday load forecasting. The load patterns of holidays are quite different from those of ordinary weekdays. It is difficult to forecast the holiday load accurately because far fewer load patterns are available for holidays than for ordinary weekdays. The test results show that the proposed method greatly improves the forecast accuracy for holidays.
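For readers unfamiliar with Tanaka-style fuzzy linear regression, a commonly cited linear-programming formulation for crisp observations is sketched below. It is given as background under stated assumptions, not as the exact model used in the paper: each coefficient is assumed to be a symmetric triangular fuzzy number with center $a_j$ and spread $c_j$, and $h$ is an assumed fitting-degree parameter.

```latex
% Assumed standard formulation; the paper's variant may differ.
\begin{aligned}
\min_{a,\,c}\quad & \sum_{i=1}^{n}\sum_{j=0}^{p} c_j\,\lvert x_{ij}\rvert \\
\text{s.t.}\quad
& \sum_{j=0}^{p} a_j x_{ij} + (1-h)\sum_{j=0}^{p} c_j\,\lvert x_{ij}\rvert \;\ge\; y_i, \\
& \sum_{j=0}^{p} a_j x_{ij} - (1-h)\sum_{j=0}^{p} c_j\,\lvert x_{ij}\rvert \;\le\; y_i,
  \qquad i = 1,\dots,n, \\
& c_j \ge 0, \qquad 0 \le h < 1 .
\end{aligned}
```

The objective minimizes the total spread of the fuzzy output, while the constraints require every observed load $y_i$ to lie inside the predicted fuzzy interval at level $h$.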


An Improved Algorithm of the Daily Peak Load Forecasting for the Holidays (특수일의 최대 전력수요예측 알고리즘 개선)

  • Song, Gyeong-Bin;Gu, Bon-Seok;Baek, Yeong-Sik
    • The Transactions of the Korean Institute of Electrical Engineers A / v.51 no.3 / pp.109-117 / 2002
  • Highly accurate load forecasting for power systems improves both the security of the power system and its generation cost. However, the forecasting problem is difficult to handle because of the nonlinear and random-like behavior of system loads, as well as weather conditions and variations in the economic environment. So far, many studies have tried to improve prediction accuracy using deterministic, stochastic, knowledge-based, and artificial neural network (ANN) methods. In the conventional load forecasting method, the maximum forecasting error occurred for holidays falling on Saturday and Monday. To reduce the forecasting error of the daily peak load for holidays on Saturday and Monday, the fuzzy concept and linear regression theory are applied to the load forecasting problem. The proposed algorithm shows good accuracy, with average percentage errors of 2.11% in 1996 and 2.84% in 1997.
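The abstract does not define "average percentage error". Assuming it means the mean absolute percentage error (MAPE) between actual and forecast peak loads, it could be computed as in this short sketch; the function name and the load values are hypothetical.

```python
def mean_absolute_percentage_error(actual, forecast):
    """MAPE in percent: the mean of |actual - forecast| / |actual| times 100."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical daily peak loads in MW; each forecast is off by 2% of the actual.
actual_peaks = [20000, 25000, 22000]
forecast_peaks = [19600, 25500, 22440]

mape = mean_absolute_percentage_error(actual_peaks, forecast_peaks)
```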

Fast robust variable selection using VIF regression in large datasets (대형 데이터에서 VIF회귀를 이용한 신속 강건 변수선택법)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.463-473 / 2018
  • Variable selection algorithms for linear regression models on large data are considered. Many algorithms have been proposed with a focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by the least-squares method. A robust criterion using a weighted estimator has been proposed to make the algorithm robust, and a robust VIF regression has been proposed for the same purpose. In this article, a fast and robust variable selection method is suggested that combines VIF regression with the detection and removal of potential outliers. A simulation study and an analysis of a dataset are conducted to compare the suggested method with other methods.

A New Forest Fire Detection Algorithm using Outlier Detection Method on Regression Analysis between Surface temperature and NDVI

  • Huh, Yong;Byun, Young-Gi;Son, Jeong-Hoon;Yu, Ki-Yun;Kim, Yong-Il
    • Proceedings of the KSRS Conference / v.2 / pp.574-577 / 2006
  • In this paper, we develop a forest fire detection algorithm that uses a regression function between NDVI and land surface temperature. Previous detection algorithms use the land surface temperature as the main factor for discriminating fire pixels from non-fire pixels. These algorithms assume that the surface temperatures of non-fire pixels are intrinsically similar and follow a Gaussian distribution, regardless of land surface type and condition. The temperature thresholds for detecting fire pixels are then derived heuristically from the statistical distribution of non-fire pixel temperatures. Under this assumption, the temperature distribution of non-fire pixels becomes very broad and sometimes overlaps slightly with that of fire pixels, so omission errors can occur for small fires. To mitigate this problem, we grouped non-fire pixels by land cover type using a clustering algorithm and, for each pixel under examination, calculated the residual between its observed temperature and the temperature estimated from the linear regression between surface temperature and NDVI. As a result, the algorithm adjusts the temperature threshold to land types and conditions and shows improved detection accuracy.
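The residual-based idea in this abstract can be illustrated with the sketch below. It is a simplified reading and not the authors' algorithm: land cover labels are assumed to be given (the paper derives them by clustering), and the flagging rule, a residual above both `k` robust standard deviations and a fixed floor `min_excess`, is an assumption. All names and sample values are hypothetical.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_fire_pixels(pixels, k=3.0, min_excess=5.0):
    """Flag pixels that are much hotter than their NDVI predicts.

    pixels: list of (land_cover_label, ndvi, temperature_kelvin).
    One regression of temperature on NDVI is fitted per land cover label.
    The threshold rule (k and min_excess) is assumed for this sketch.
    """
    by_class = {}
    for idx, (label, _, _) in enumerate(pixels):
        by_class.setdefault(label, []).append(idx)

    flags = [False] * len(pixels)
    for idxs in by_class.values():
        ndvi = [pixels[i][1] for i in idxs]
        temp = [pixels[i][2] for i in idxs]
        slope, intercept = fit_line(ndvi, temp)
        residuals = [t - (slope * n + intercept) for n, t in zip(ndvi, temp)]
        # Robust spread: median absolute deviation scaled to a normal sigma.
        center = median(residuals)
        sigma = 1.4826 * median(abs(r - center) for r in residuals)
        threshold = max(k * sigma, min_excess)
        for i, r in zip(idxs, residuals):
            flags[i] = r > threshold
    return flags


# Hypothetical scene: 8 normal forest pixels, 1 forest fire pixel, 5 bare-soil pixels.
pixels = [
    ("forest", 0.50, 300.2), ("forest", 0.55, 298.9), ("forest", 0.60, 298.1),
    ("forest", 0.65, 296.8), ("forest", 0.70, 296.0), ("forest", 0.75, 295.1),
    ("forest", 0.80, 293.9), ("forest", 0.85, 293.2),
    ("forest", 0.70, 345.0),  # about 49 K hotter than the forest trend
    ("bare", 0.10, 319.1), ("bare", 0.15, 318.4), ("bare", 0.20, 318.0),
    ("bare", 0.25, 317.6), ("bare", 0.30, 316.9),
]
flags = flag_fire_pixels(pixels)
```

Because the threshold is applied to the residual rather than to the raw temperature, a warm but normal bare-soil pixel is not penalized for being hotter than forest.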


A Fault Detection of Cyclic Signals Using Support Vector Machine-Regression (Support Vector Machine-Regression을 이용한 주기신호의 이상탐지)

  • Park, Seung-Hwan;Kim, Jun-Seok;Park, Cheong-Sool;Kim, Sung-Shick;Baek, Jun-Geol
    • Journal of Korean Society for Quality Management / v.38 no.3 / pp.354-362 / 2010
  • This paper presents a non-linear control chart based on support vector machine regression (SVM-R) to improve the accuracy of fault detection for cyclic signals. The proposed algorithm consists of two steps. First, the center line of the control chart is constructed using SVM-R. Second, the control limits are calculated from variances estimated using the perpendicular and normal lines of the center line. For performance evaluation, we apply the proposed algorithm to industrial data from a chemical vapor deposition process, which is one of the semiconductor manufacturing processes. The proposed method shows better fault detection performance than other existing methods.