• Title/Summary/Keyword: 국소가중회귀분석 (locally weighted regression analysis)


Correction of Erroneous Individual Vehicle Speed Data Using Locally Weighted Regression (LWR) (국소가중다항회귀분석을 이용한 이상치제거 및 자료보정기법 개발 (GPS를 이용한 개별차량 주행속도를 중심으로))

  • Im, Hui-Seop;O, Cheol;Park, Jun-Hyeong;Lee, Geon-U
    • Journal of Korean Society of Transportation / v.27 no.2 / pp.47-56 / 2009
  • Detecting and correcting outliers in raw traffic data collected in the field is of keen interest because reliable traffic information depends heavily on the quality of the raw data. Global positioning system (GPS) based traffic surveillance systems can produce individual vehicle speeds, which are invaluable for various traffic management and information strategies. This study proposes a locally weighted regression (LWR) based filtering method for individual vehicle speed data. An important feature of the study is a technique for generating synthetic outliers, which allows a more systematic evaluation of the proposed method. Performance evaluations showed that the proposed LWR-based method outperformed exponential smoothing. The proposed method is expected to be useful for filtering raw individual vehicle speed data.
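The abstract does not give the kernel, neighbourhood size, or outlier rule, so the following is only a minimal sketch of what an LWR-based speed filter could look like. It assumes a tricube kernel, a local linear fit, Cleveland-style bisquare robustness iterations, and a MAD-based threshold; the names `robust_lwr`, `filter_speeds`, `frac`, and `z` are hypothetical and not taken from the paper.

```python
import numpy as np

def robust_lwr(t, v, frac=0.2, n_iter=3):
    """Robust locally weighted linear regression evaluated at every t.

    Assumptions (not from the paper): tricube kernel over the k nearest
    points, bisquare robustness weights, distinct time stamps.
    """
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    n = len(t)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbours per local fit
    robust = np.ones(n)                          # robustness weights
    fitted = np.empty(n)
    for _ in range(n_iter):
        for i in range(n):
            d = np.abs(t - t[i])
            h = np.sort(d)[k - 1]                # distance to k-th nearest point
            w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3 * robust
            # Weighted least squares for a local line centred at t[i]
            X = np.column_stack([np.ones(n), t - t[i]])
            sw = np.sqrt(w)
            beta, *_ = np.linalg.lstsq(X * sw[:, None], v * sw, rcond=None)
            fitted[i] = beta[0]                  # intercept = fit at t[i]
        resid = v - fitted
        s = np.median(np.abs(resid))
        if s == 0:
            break
        # Bisquare weights: points with large residuals get weight 0
        robust = np.clip(1.0 - (resid / (6.0 * s)) ** 2, 0.0, None) ** 2
    return fitted

def filter_speeds(t, v, frac=0.2, z=4.0):
    """Flag speeds far from the robust fit and replace them with the fit."""
    v = np.asarray(v, dtype=float)
    fitted = robust_lwr(t, v, frac)
    resid = v - fitted
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # MAD scale
    flagged = np.abs(resid) > z * max(scale, 1e-9)
    return np.where(flagged, fitted, v), flagged
```

Calling `filter_speeds(t, v)` on a time-stamped speed series returns the corrected series and a boolean mask of the points treated as outliers. The robustness iterations are there so that a single bad reading does not drag the local fit toward itself and cause its neighbours to be flagged as well.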

A comparison study of inverse censoring probability weighting in censored regression (중도절단 회귀모형에서 역절단확률가중 방법 간의 비교연구)

  • Shin, Jungmin;Kim, Hyungwoo;Shin, Seung Jun
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.957-968 / 2021
  • Inverse censoring probability weighting (ICPW) is a popular technique in survival data analysis. In applications of ICPW such as censored regression, it is crucial to estimate the censoring probability accurately. This article reports a simulation study of how the censoring probability estimate influences model performance in censored regression under the ICPW scheme. We compare three censoring probability estimators: the Kaplan-Meier (KM) estimator, the Cox proportional hazards model estimator, and the local KM estimator. For the local KM estimator, we propose reducing the predictor dimension to avoid the curse of dimensionality and consider two popular dimension reduction tools: principal component analysis and sliced inverse regression. We found that the Cox proportional hazards model estimator performs best as a censoring probability estimator in both mean and median censored regression.
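To make the weighting scheme concrete, here is a minimal sketch of ICPW with the plain KM estimator of the censoring distribution, followed by a weighted least-squares mean regression. It is an illustration under stated assumptions, not the authors' implementation: the weights are taken as the event indicator divided by the KM estimate of P(C > t) just before each observed time, tied times are handled naively, and the function names are hypothetical.

```python
import numpy as np

def km_censoring_left_limit(time, event):
    """KM estimate of G(t-) = P(C >= t) at each observed time.

    Censoring (event == 0) is treated as the 'event' of interest.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    uniq = np.unique(time)
    at_risk = np.array([(time >= u).sum() for u in uniq])
    censored = np.array([((time == u) & (event == 0)).sum() for u in uniq])
    surv = np.cumprod(1.0 - censored / at_risk)       # G(u) at each unique time
    surv_left = np.concatenate([[1.0], surv[:-1]])    # G(u-) = product over earlier times
    return surv_left[np.searchsorted(uniq, time)]

def icpw_weights(time, event):
    """ICPW weights: 0 for censored cases, 1 / G(Y-) for observed events."""
    event = np.asarray(event, dtype=int)
    return event / km_censoring_left_limit(time, event)

def icpw_mean_regression(x, time, event):
    """Weighted least squares of time on x (with intercept) using ICPW weights."""
    w = icpw_weights(time, event)
    X = np.column_stack([np.ones(len(w)), x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(time, float) * sw, rcond=None)
    return beta
```

For example, with observed times `[1, 2, 3, 4]` and event indicators `[1, 0, 1, 1]`, the censored case at time 2 gets weight 0 and the two later events are up-weighted to 1.5 each, so the weights still sum to the sample size. The Cox and local KM estimators compared in the article would replace `km_censoring_left_limit` with a covariate-dependent estimate.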

Incremental Ensemble Learning for The Combination of Multiple Models of Locally Weighted Regression Using Genetic Algorithm (유전 알고리즘을 이용한 국소가중회귀의 다중모델 결합을 위한 점진적 앙상블 학습)

  • Kim, Sang Hun;Chung, Byung Hee;Lee, Gun Ho
    • KIPS Transactions on Software and Data Engineering / v.7 no.9 / pp.351-360 / 2018
  • Locally weighted regression (LWR), traditionally a lazy learning model, produces a prediction for a given input, the query point, by fitting a regression equation over a short interval in which observations closer to the query point receive higher weights. We study an incremental ensemble learning approach for LWR, a form of lazy, memory-based learning. The proposed method sequentially generates LWR models over time and integrates them using a genetic algorithm to obtain a solution at a specific query point. A weakness of existing LWR models is that multiple LWR models can be generated depending on the indicator function and the selection of data samples, and the quality of the predictions varies with the model chosen. However, no research has addressed the problem of selecting or combining multiple LWR models. In this study, after generating an initial LWR model from the indicator function and the sample data set, we iterate an evolutionary learning process to obtain a proper indicator function and assess the LWR models on other sample data sets to overcome data set bias. We adopt an eager learning method to generate and store LWR models gradually as data are generated for all sections. To obtain a prediction at a specific point in time, an LWR model is generated from newly generated data within a predetermined interval and then combined with the existing LWR models in the section using a genetic algorithm. The proposed method shows better results than combining multiple LWR models with the simple average method. The results are also compared with predictions from multiple regression analysis on real data such as hourly traffic volume in a specific area and hourly sales at a highway rest area.
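The idea of combining several LWR models with a genetic algorithm can be sketched as follows. This is a toy illustration only. The chromosome encoding (non-negative combination weights summing to one), the operators (truncation selection, blend crossover, Gaussian mutation), the validation-MSE fitness, and the Gaussian kernel are all assumptions, and the function names are hypothetical; the abstract does not specify the authors' indicator function or GA design.

```python
import numpy as np

def lwr_predict(xq, x, y, tau):
    """Locally weighted linear regression at query point xq (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x - xq) / tau) ** 2)      # closer points weigh more
    X = np.column_stack([np.ones_like(x), x - xq])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]                                 # intercept = prediction at xq

def normalise(w):
    """Project a vector onto non-negative weights that sum to one."""
    w = np.clip(w, 0.0, None)
    s = w.sum()
    return w / s if s > 0 else np.full(len(w), 1.0 / len(w))

def ga_combine(preds, target, pop_size=30, generations=40, seed=0):
    """Search combination weights for model predictions with a small GA.

    preds: array (n_models, n_points); target: array (n_points,).
    """
    rng = np.random.default_rng(seed)
    m = preds.shape[0]

    def mse(w):
        return float(np.mean((w @ preds - target) ** 2))

    # Seed the population with the simple average so the GA cannot do worse
    pop = [np.full(m, 1.0 / m)]
    pop += [normalise(rng.random(m)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=mse)
        parents = pop[: pop_size // 2]             # best half survives (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            a = rng.random()
            child = a * parents[i] + (1 - a) * parents[j]   # blend crossover
            child = child + rng.normal(0.0, 0.05, m)        # Gaussian mutation
            children.append(normalise(child))
        pop = parents + children
    return min(pop, key=mse)
```

In use, each row of `preds` would hold the predictions of one stored LWR model (for instance, `lwr_predict` run with a different bandwidth `tau` or on a different data window) at a set of validation points. Because the simple average is part of the initial population and the best individuals always survive, the returned weights are never worse than simple averaging on that validation set.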

Setting Criteria of Suitable Site for Southern-type Garlic Using Non-linear Regression Model (비선형회귀 분석을 통한 난지형 마늘의 적지기준 설정연구)

  • Choi, Won Jun;Kim, Yong Seok;Shim, Kyo Moon;Hur, Jina;Jo, Sera;Kang, Mingu
    • Korean Journal of Agricultural and Forest Meteorology / v.23 no.4 / pp.366-373 / 2021
  • This study attempted to establish a field-data-based standard for suitable-site analysis by analyzing field observation data for southern-type garlic, which are non-linear. Five regions were selected for analysis: Goheung, Namhae, Sinan, Changnyeong, and Haenam. Values for each observation station were extracted from the farmland temperature data in each region by inverse distance weighting. Southern-type garlic production and temperature data were collected for the 10 years from 2010 to 2019. Local (kernel) regression analysis of the data gave growth temperatures of 18.781℃, 18.930℃, 19.542℃, 20.165℃, and 21.042℃ at bandwidths of 0.8, 0.9, 1.0, 1.1, and 1.2, respectively. The analyzed optimum temperature and the growth temperature (4℃/25℃) were applied in a temperature response model to extract the growth temperature for each temperature. Regression and correlation analyses were then performed between the analyzed growth temperature and the production data. The coefficient of determination (R²) ranged from 0.325 to 0.438, and the correlation coefficient ranged from 0.57 to 0.66 at the 0.001 significance level. Overall, the coefficient of determination increased with the bandwidth. However, in all analyses except that with bandwidth 1.0, not all variables were used because of bias. Because the purpose of this study is to accommodate all of the data in a non-linear model, bandwidth 1.0, which combines a high coefficient of determination with a model that accepts the data as a whole, was judged the most suitable.
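The inverse distance weighting step mentioned in the abstract can be sketched as follows. The power parameter of 2, the planar coordinates, and the function name `idw_interpolate` are assumptions made for illustration; the abstract does not state which settings the authors used.

```python
import numpy as np

def idw_interpolate(station_xy, station_values, target_xy, power=2.0):
    """Inverse-distance-weighted estimate at target_xy from station observations."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    d = np.linalg.norm(station_xy - np.asarray(target_xy, dtype=float), axis=1)
    if np.any(d == 0):
        # Target coincides with a station: return that station's value
        return float(station_values[np.argmin(d)])
    w = 1.0 / d ** power              # nearer stations get larger weights
    return float(np.sum(w * station_values) / np.sum(w))
```

With two hypothetical stations at (0, 0) and (2, 0) reporting 10℃ and 20℃, the estimate midway between them is 15℃, and a point at (0.5, 0) is pulled toward the nearer station, giving 11℃.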