• Title/Abstract/Keywords: Additive regression models

67 search results (processing time 0.028 seconds)

Production of Agrometeorological Information in Onion Fields using Geostatistical Models

  • 임지은;윤상후
    • 한국환경과학회지
    • /
    • Vol. 27, No. 7
    • /
    • pp.509-518
    • /
    • 2018
  • Weather is the most influential factor in crop cultivation. Weather information for cultivated areas is necessary for forecasting the growth and production of agricultural crops. However, meteorological observations in cultivated areas are limited because weather equipment is not installed there. This study tested methods of predicting the daily mean temperature in onion fields using geostatistical models. Three models were considered: the inverse distance weighting method, the generalized additive model, and the Bayesian spatial linear model. Data were collected from the AWS (automatic weather system), ASOS (automated synoptic observing system), and an agricultural weather station between 2013 and 2016. To evaluate the prediction performance, data from AWS and ASOS were used as the modeling data, and data from the agricultural weather station were used as the validation data. The Bayesian spatial linear model performed better than the other models. Consequently, high-resolution maps of the daily mean temperature of Jeonnam were generated using all observed weather information.
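As a rough illustration of the simplest of the three models named in this abstract, inverse distance weighting can be sketched as below. This is a minimal sketch, not the authors' implementation: the function name `idw_predict`, the planar coordinates, the power of 2, and the two example stations are all assumptions made for illustration.

```python
import math


def idw_predict(stations, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    stations: list of (x, y, value) tuples (hypothetical planar coordinates).
    target:   (x, y) location to predict.
    power:    distance exponent; 2.0 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # Target coincides with a station: return its observation.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Illustrative (made-up) daily mean temperatures at two stations.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 14.0)]
midpoint_temp = idw_predict(stations, (1.0, 0.0))
```

Nearer stations receive larger weights, so a point equidistant from both example stations gets the plain average of their values.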

Generating high resolution of daily mean temperature using statistical models

  • 윤상후
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 27, No. 5
    • /
    • pp.1215-1224
    • /
    • 2016
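  • High-resolution gridded climate information is an important factor in explaining phenomena in diverse fields such as agriculture, tourism, ecology, and epidemiology. High-resolution climate information can be obtained from dynamical models and from statistical models. Because statistical models are computationally cheaper than dynamical models, they are mainly used to generate climate data with high spatiotemporal resolution. In this study, daily mean temperatures were generated with statistical models from daily mean temperature data observed in January from 2003 to 2012. The statistical models considered, all based on the linear model, were the general linear model, the generalized additive model, the spatial linear model, and the Bayesian spatial linear model. To evaluate prediction performance, daily mean temperatures observed at 60 surface weather stations were used as model-fitting data, and predictions were validated against daily mean temperatures from 352 automatic weather stations. In terms of mean squared error and correlation coefficient, the Bayesian spatial model predicted better than the other models. Finally, a map of daily mean temperature on a $1\,\mathrm{km} \times 1\,\mathrm{km}$ grid was generated.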

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • 응용통계연구
    • /
    • Vol. 24, No. 4
    • /
    • pp.587-595
    • /
    • 2011
  • Credit scoring is an objective and automatic system to assess the credit risk of each customer. The logistic regression model is one of the popular methods of credit scoring to predict the default probability; however, it may not detect possible nonlinear features of predictors despite the advantages of interpretability and low computation cost. In this paper, we propose to use a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble techniques such as bagging, boosting, and random forests. We compare these methods via a simulation study and illustrate them through a German credit dataset.

Assessment of Ammunition Companies Using the IDEA Model

  • 배영민;김재희;김승권
    • 산업공학
    • /
    • Vol. 19, No. 4
    • /
    • pp.291-299
    • /
    • 2006
  • In order to enhance sustainable war fighting capabilities, it is important to maintain a good ammunition support system. In this paper, we evaluate the performance of ammunition companies using Imprecise Data Envelopment Analysis (IDEA)-BCC and IDEA-Additive model, which can deal with imprecise data in DEA. The input variables of IDEA models were selected by stepwise multiple regression analysis. With the regression model, we could choose the number of soldiers, officers, and ammunition warehouses as input variables that have significant effects on the output performance. Then, we applied the IDEA-BCC model with the concept of potential efficiency. The results of the model indicate that 8 out of 16 ammunition companies are efficient, 7 are inefficient, and 1 is potentially efficient. We could also identify the possible input excesses and output shortfalls to reach the efficient frontier using the IDEA-Additive model.

A Study on Applying Shrinkage Method in Generalized Additive Model

  • 기승도;강기훈
    • 응용통계연구
    • /
    • Vol. 23, No. 1
    • /
    • pp.207-218
    • /
    • 2010
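  • The generalized additive model is a statistical model that resolves most of the problems of the conventional linear regression model, but overfitting can occur if no method is applied to reduce the number of meaningful independent variables. Research on applying variable shrinkage methods to the generalized additive model is therefore needed. Lasso-type approaches have recently been studied as variable shrinkage methods in regression analysis. In this study, we present a way to apply the Group Lasso and Elastic net, two Lasso-type models, to the widely applicable generalized additive model, and we propose a procedure for computing their solutions. We then compare the proposed methods through a simulation study and through an application to real data, namely automobile insurance data from fiscal year 2005. The results indicate that generalized additive models with variable shrinkage via the proposed Group Lasso and Elastic net give better results than the existing methods.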

Application of machine learning models for estimating house price

  • 이창로;박기호
    • 대한지리학회지
    • /
    • Vol. 51, No. 2
    • /
    • pp.219-233
    • /
    • 2016
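  • In social science research that uses mathematical or quantitative models, the focus of analysis has mainly been on identifying the relationship between the dependent variable and the explanatory variables, that is, on explanatory modeling. Analyses focused on improving predictive ability, by contrast, have been rare. This study examines prediction-oriented non-parametric models, which aim to reduce prediction error for new observations, rather than explanatory models that test theories and hypotheses or identify relationships between variables. Gangnam-gu, Seoul was selected as the case area, and house prices were estimated from the actual transaction prices of detached houses reported from 2011 to 2014. The non-parametric models applied were those proposed in the machine learning field: the generalized additive model, random forest, MARS (multivariate adaptive regression splines), and SVM (support vector machines). The relatively recently developed MARS and SVM were found to have superior predictive power. Finally, additionally incorporating spatial autocorrelation into these non-parametric models further improved price prediction. We hope this study will help extend and diversify real estate price estimation methodology, which has so far concentrated on parametric models, toward non-parametric models.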

Random Regression Models Are Suitable to Substitute the Traditional 305-Day Lactation Model in Genetic Evaluations of Holstein Cattle in Brazil

  • Padilha, Alessandro Haiduck;Cobuci, Jaime Araujo;Costa, Claudio Napolis;Neto, Jose Braccini
    • Asian-Australasian Journal of Animal Sciences
    • /
    • Vol. 29, No. 6
    • /
    • pp.759-767
    • /
    • 2016
  • The aim of this study was to compare two random regression models (RRM) fitted by fourth-order ($RRM_4$) and fifth-order ($RRM_5$) Legendre polynomials with a lactation model (LM) for evaluating Holstein cattle in Brazil. Two datasets with the same animals were prepared for this study. To apply test-day RRM and LMs, 262,426 test-day records and 30,228 lactation records covering 305 days were prepared, respectively. The lowest values of Akaike's information criterion, Bayesian information criterion, and estimates of the maximum of the likelihood function (-2LogL) were for $RRM_4$. Heritability for 305-day milk yield (305MY) was 0.23 ($RRM_4$), 0.24 ($RRM_5$), and 0.21 (LM). Heritability, additive genetic variance, and permanent environmental variance of test days on days in milk ranged from 0.16 to 0.27, from 3.76 to 6.88, and from 11.12 to 20.21, respectively. Additive genetic correlations between test days ranged from 0.20 to 0.99. Permanent environmental correlations between test days were between 0.07 and 0.99. Standard deviations of average estimated breeding values (EBVs) for 305MY from $RRM_4$ and $RRM_5$ were from 11% to 30% higher for bulls and around 28% higher for cows than those from LM. Rank correlations between RRM EBVs and LM EBVs were between 0.86 and 0.96 for bulls and between 0.80 and 0.87 for cows. Average percentage of gain in reliability of EBVs for 305-day yield increased from 4% to 17% for bulls and from 23% to 24% for cows when reliability of EBVs from RRM was compared to that from LM. The random regression model fitted by fourth-order Legendre polynomials is recommended for genetic evaluations of Brazilian Holstein cattle because of the higher reliability in the estimation of breeding values.

LACTATION CURVE OF HOLSTEIN FRIESIAN COWS IN THE KINGDOM OF SAUDI ARABIA

  • Ali, A.K.A.;Al-Jumaah, R.S.;Hayes, E.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • Vol. 9, No. 4
    • /
    • pp.439-447
    • /
    • 1996
  • Monthly test-day production data, comprising 12,020 records, were collected from six of the largest specialized dairy farms located in the central region of the Kingdom of Saudi Arabia. The records described lactating cows in four parities and two seasons of calving. Monthly test-day records were fitted using Wood's model $At^{b}e^{-ct}$ with multiplicative and additive error terms. Linear and non-linear regression models were used to estimate the parameters needed to draw the lactation curves. The lactation curves of the different parities showed that the third lactation had the highest peak (43.08 kg for the linear regression model and 42.08 kg for the non-linear regression model). The fourth lactation had the lowest peak (24.00 kg for the linear regression model and 25.64 kg for the non-linear regression model). Cows in second and third lactations reached the peak at 58 days for both linear and non-linear regression models. Cows in first lactation were more persistent and peaked late, at 68 and 67 days for the two models, respectively. Third-lactation cows, by contrast, were less persistent and peaked early, at 58 days for both models. Cows that calved in winter months had higher starting values (A), a higher ascending slope (b), and a higher descending slope (c). Least squares means of milk yield for the first four parities and for the overall data were 6,653, 7,659, 7,482, 6,988, and 7,614 kg, respectively. The corresponding lactation periods were 358, 367, 350, 363, and 364 days, respectively.
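To make the linear-regression route for Wood's model concrete, the sketch below fits $At^{b}e^{-ct}$ by ordinary least squares on its log-transformed form, $\ln y = \ln A + b \ln t - ct$, and computes the peak day as $b/c$. It is an illustration under assumptions, not the authors' procedure: that the abstract's linear regression model refers to this log-linear form is an assumption, the helper names (`wood`, `solve3`, `fit_wood_loglinear`) are hypothetical, and the parameter values are made up rather than taken from the paper.

```python
import math


def wood(t, A, b, c):
    """Wood's lactation curve: yield on day t."""
    return A * t ** b * math.exp(-c * t)


def solve3(M, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(M, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def fit_wood_loglinear(days, yields):
    """Least-squares fit of ln y = ln A + b ln t - c t via the normal equations."""
    rows = [(1.0, math.log(t), -t) for t in days]
    z = [math.log(y) for y in yields]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    log_A, b, c = solve3(XtX, Xtz)
    return math.exp(log_A), b, c


# Synthetic, noise-free test-day yields from made-up parameters (not from the paper).
days = [5.0 + 30.0 * k for k in range(11)]
yields = [wood(t, 20.0, 0.2, 0.004) for t in days]
A_hat, b_hat, c_hat = fit_wood_loglinear(days, yields)
peak_day = b_hat / c_hat  # the curve peaks where its derivative is zero, at t = b / c
```

Fitting on the log scale corresponds to a multiplicative error term, whereas fitting the curve directly by non-linear least squares corresponds to an additive one, which is one reason the two approaches can give different peak estimates.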

Quantitative Comparison of Probabilistic Multi-source Spatial Data Integration Models for Landslide Hazard Assessment

  • Park No-Wook;Chi Kwang-Hoon;Chung Chang-Jo F.;Kwon Byung-Doo
    • 대한원격탐사학회:학술대회논문집
    • /
    • 대한원격탐사학회, 2004, Proceedings of ISRS 2004
    • /
    • pp.622-625
    • /
    • 2004
  • This paper presents multi-source spatial data integration models based on probability theory for landslide hazard assessment. Four probabilistic models, namely empirical likelihood ratio estimation, logistic regression, generalized additive, and predictive discriminant models, are proposed and applied. The proposed models are theoretically based on statistical relationships between landslide occurrences and input spatial data sets. In particular, these models have the advantage of using continuous data directly, without any information loss. A case study from the Gangneung area, Korea was carried out to quantitatively assess the four models and to discuss operational issues.

Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP

  • 이상덕;김대규;김창수
    • 한국재난정보학회 논문집
    • /
    • Vol. 19, No. 4
    • /
    • pp.924-935
    • /
    • 2023
  • Purpose: The central aim of this study is to leverage machine learning techniques for the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables responsible for enhancing overall performance. Method: First, we classified 'R2L (Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect due to class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we applied SHapley Additive exPlanation (SHAP) to the two classification models that showed high performance, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to check the importance of the variables that affect classification in each model. Result: The 'service' variable was confirmed to be the most important variable for RF, and the 'dst_host_srv_count' variable for LGBM. These pivotal variables serve as key factors capable of enhancing classification performance for each respective model. Conclusion: This paper identifies the optimal models, RF and LGBM, for classifying 'R2L' and 'U2R' attacks, and elucidates the crucial variables associated with each selected model.