• Title/Abstract/Keywords: Out-of-Sample Prediction

Search results: 91 items (processing time: 0.022 s)

Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data

  • Lee, Sujee;Koo, Bonhyo;Jung, Kyu-Hwan
    • Industrial Engineering and Management Systems
    • /
    • Vol. 13, No. 4
    • /
    • pp.454-462
    • /
    • 2014
  • Retaining customers who are likely to churn is one of the most important issues in customer relationship management, so companies try to predict churning customers from their large-scale, high-dimensional data. This study focuses on handling large data sets by reducing their dimensionality. Six dimension reduction methods are compared: principal component analysis (PCA), factor analysis (FA), locally linear embedding (LLE), local tangent space alignment (LTSA), locality preserving projections (LPP), and a deep auto-encoder. Our experiments apply each method to the training data, build a classification model on the mapped data, and measure performance by hit rate. PCA performs well despite its simplicity, and the deep auto-encoder gives the best overall performance. These results can be explained by the characteristics of churn prediction data, which are highly correlated and overlap across classes. We also propose a simple out-of-sample extension method for the nonlinear dimension reduction methods LLE and LTSA that exploits these characteristics of the data.
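
The abstract above does not describe how its out-of-sample extension works, so the following is only a generic illustration of the idea, not the authors' method: a new point is placed in the low-dimensional space as the inverse-distance-weighted mean of the embeddings of its nearest training points. The function name `knn_out_of_sample`, the choice of `k`, and the toy data are all assumptions made for this sketch.

```python
import math

def knn_out_of_sample(x_new, train_points, train_embedding, k=3):
    """Embed x_new as the inverse-distance-weighted mean of the
    embeddings of its k nearest training points (illustrative only)."""
    nearest = sorted(
        (math.dist(x_new, p), i) for i, p in enumerate(train_points)
    )[:k]
    # If x_new coincides with a training point, reuse that embedding.
    if nearest[0][0] == 0.0:
        return list(train_embedding[nearest[0][1]])
    weights = [1.0 / d for d, _ in nearest]
    total = sum(weights)
    dim = len(train_embedding[0])
    return [
        sum(w * train_embedding[i][j] for w, (_, i) in zip(weights, nearest)) / total
        for j in range(dim)
    ]

# Hypothetical training data: 2-D inputs and a 1-D embedding that some
# nonlinear method (for example LLE) is assumed to have produced already.
train_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_embedding = [(0.0,), (1.0,), (-1.0,), (10.0,)]

print(knn_out_of_sample((0.5, 0.5), train_points, train_embedding))
```

Because the training embedding is reused as-is, test points never influence the fitted map, which keeps the evaluation on held-out data honest.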

Application of Near Infrared Spectroscopy for Nondestructive Evaluation of Nitrogen Content in Ginseng

  • Lin, Gou-lin;Sohn, Mi-Ryeong;Kim, Eun-Ok;Kwon, Young-Kil;Cho, Rae-Kwang
    • 한국근적외분광분석학회:학술대회논문집
    • /
    • 한국근적외분광분석학회 2001년도 NIR-2001
    • /
    • pp.1528-1528
    • /
    • 2001
  • Ginseng cultivated in different countries or under different growing conditions generally differs in components such as saponin and protein, which relate to its efficacy and action. Protein content is estimated from the nitrogen content of ginseng radix. Nitrogen content can be determined by chemical analysis such as the Kjeldahl or extraction methods. However, these methods require long analysis times, cause environmental pollution, and damage the sample. In this work we investigated the possibility of non-destructive determination of nitrogen content in ginseng radix using near-infrared (NIR) spectroscopy. Ginseng radix, the root of Panax ginseng C. A. Meyer, was studied. A total of 120 samples were used, consisting of 6 sample sets of 20 samples each: 4-, 5-, and 6-year-old Korean ginseng and 7-, 8-, and 9-year-old Chinese ginseng. Nitrogen content was measured by electronic analysis. NIR reflectance spectra were collected over the 1100 to 2500 nm spectral region at 2 nm intervals with an InfraAlyzer 500C (Bran+Luebbe, Germany) equipped with a halogen lamp and a PbS detector. Calibration models were developed by multiple linear regression (MLR) and partial least squares (PLS) analysis using IDAS and SESAME software. According to the electronic analysis, the mean nitrogen content of Korean ginseng differed from that of Chinese ginseng. Nitrogen content generally tended to decrease once the cultivation period exceeded 6 years. The MLR calibration model with 8 wavelengths built with IDAS software predicted nitrogen content with a correlation coefficient (R) of 0.985 and a standard error of prediction (SEP) of 0.855%. With SESAME software, the MLR calibration with 9 wavelengths was selected as the best calibration, with R and SEP of 0.972 and 0.596%, respectively. The PLSR calibration model gave an R of 0.969 and an RMSEP of 0.630. This study shows that NIR spectroscopy can be applied to determine the nitrogen content of ginseng radix with high accuracy.

  • PDF
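
The ginseng abstract above reports R, SEP, and RMSEP without defining them. The sketch below uses the definitions common in chemometrics, which is an assumption about what the authors computed: R is the Pearson correlation between reference and predicted values, RMSEP is the root mean squared prediction error, and SEP is the bias-corrected standard deviation of the prediction errors. The sample values are hypothetical.

```python
import math

def pearson_r(reference, predicted):
    """Pearson correlation between reference and predicted values."""
    n = len(reference)
    mean_r, mean_p = sum(reference) / n, sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_r * var_p)

def rmsep(reference, predicted):
    """Root mean squared error of prediction (includes any bias)."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)

def sep(reference, predicted):
    """Standard error of prediction: spread of the errors around their mean."""
    errors = [p - r for r, p in zip(reference, predicted)]
    bias = sum(errors) / len(errors)
    return math.sqrt(sum((e - bias) ** 2 for e in errors) / (len(errors) - 1))

# Hypothetical nitrogen contents (%) for a small validation set.
reference = [1.92, 2.10, 2.35, 2.48, 2.71]
predicted = [1.95, 2.05, 2.40, 2.45, 2.75]
print(pearson_r(reference, predicted), rmsep(reference, predicted), sep(reference, predicted))
```

RMSEP and SEP coincide only when the predictions are unbiased, which is one reason the two are reported separately for different models.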

Supply models for stability of supply-demand in the Korean pork market

  • Chunghyeon, Kim;Hyungwoo, Lee;Tongjoo, Suh
    • 농업과학연구
    • /
    • Vol. 49, No. 3
    • /
    • pp.679-690
    • /
    • 2022
  • As the supply and demand of pork has become a significant concern in Korea, controlling it has become a critical challenge for the industry. However, compared with the demand for pork, which is relatively stable, a stable supply is not easy to maintain. As preparing measures for supply-demand crisis response and supply control in the pig industry has emerged as an important task, it has become necessary to establish a stable supply model and create an appropriate manual. In this study, a pork supply prediction model is constructed using reported data from the pig traceability system. Based on the derived results, a statistical method for determining the supply-demand crisis stage is proposed. The analysis showed that working days, African swine fever, heat waves, and COVID-19 affect the number of pigs graded in the market. A test of model performance showed that both the in-sample and out-of-sample error rates were between 0.3% and 7.6%, indicating a high level of predictive power. Applying the forecast, the distribution of the confidence interval of the predicted value was established and the supply crisis stage was identified, allowing supply-demand conditions to be evaluated.
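
The difference between the in-sample and out-of-sample error rates mentioned above can be illustrated with a small sketch. It assumes the error rate is a mean absolute percentage error and uses a one-variable linear model on working days only; the abstract specifies neither, and all numbers below are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_abs_pct_error(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical monthly data: working days and pigs graded (thousands of head).
working_days = [20, 22, 21, 19, 23, 22, 20, 21, 22, 19]
pigs_graded = [1510, 1655, 1590, 1440, 1730, 1660, 1500, 1585, 1670, 1425]

split = 7  # the first 7 months fit the model, the last 3 are held out
intercept, slope = fit_line(working_days[:split], pigs_graded[:split])

def predict(days):
    return intercept + slope * days

in_sample = mean_abs_pct_error(
    pigs_graded[:split], [predict(d) for d in working_days[:split]])
out_of_sample = mean_abs_pct_error(
    pigs_graded[split:], [predict(d) for d in working_days[split:]])
print(f"in-sample: {in_sample:.2f}%  out-of-sample: {out_of_sample:.2f}%")
```

Only the out-of-sample figure uses months the model never saw during fitting, so it is the more informative guide to forecasting performance.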

Near Infrared Reflectance Spectroscopy(NIRS)에 의한 음식물 쓰레기 퇴비 분석에 관한 연구 (Analysis on Food Waste Compost by Near Infrared Reflectance Spectroscopy(NIRS))

  • 이효원;길동용
    • 한국유기농업학회지
    • /
    • Vol. 13, No. 3
    • /
    • pp.281-289
    • /
    • 2005
  • To find an alternative way of analyzing food waste compost, near infrared reflectance spectroscopy (NIRS) was used for compost assessment, because the technique is known to be non-destructive, cost-effective, and rapid. One hundred thirty-six compost samples were collected from the Incheon food waste compost factory at Namdong Industrial Complex. The samples were analyzed for nitrogen, organic matter (OM), ash, P, and K using the Kjeldahl method, the ignition method, and acid extraction with a spectrophotometer, respectively. The samples were scanned using a FOSS NIRSystem Model 6500 scanning monochromator over wavelengths from 400 to 2,400 nm at 2 nm intervals. Modified partial least squares (MPLS) regression was applied to develop the most reliable calibration model between the NIR spectra and sample components such as nitrogen, ash, OM, P, and K. The regression was validated using a validation set (n=30). The squared multiple correlation coefficient ($R^2$) and standard error of prediction (SEP) were, respectively, 0.87 and 0.06 for nitrogen, 0.72 and 1.07 for ash, 0.68 and 1.05 for organic matter, 0.89 and 0.31 for the OM/N ratio, 0.77 and 0.06 for P, and 0.64 and 0.07 for K. These results indicate that NIRS is a reliable analytical method for assessing some components of food waste compost, and suggest that its feasibility can be justified when samples are collected at various times throughout the year.

  • PDF

실험적 연구를 통한 비정형롤판재성형 예측 모델 개발 (Development of Prediction Model for Flexibly-reconfigurable Roll Forming based on Experimental Study)

  • 박지우;길민규;윤준석;강범수;이경훈
    • 소성∙가공
    • /
    • Vol. 26, No. 6
    • /
    • pp.341-347
    • /
    • 2017
  • Flexibly-reconfigurable roll forming (FRRF) is a novel sheet metal forming technology for producing multi-curvature surfaces by controlling the strain distribution along the longitudinal direction. Reconfigurable rollers can be arranged to act as a kind of punch and die set, and by using these rollers the desired curved surface can be formed. In the FRRF process, a three-dimensional surface is formed from a two-dimensional curve, so the forming result is difficult to predict. In this study, regression analysis is proposed to construct a predictive model for the longitudinal curvature in the FRRF process. The input parameters affecting the longitudinal curvature were taken to be the maximum compression value, the curvature radius in the transverse direction, and the initial blank width. A three-factor, three-level full factorial experimental design was used, and 27 experiments were performed with the FRRF apparatus to obtain sample data for the regression model. The model used for the regression analysis was a quadratic nonlinear regression model. The coefficient of determination and the root mean square error were calculated to confirm the conformity of the model, and the regression model was verified through a goodness-of-fit test.
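
The three-factor, three-level full factorial design described above can be enumerated in a few lines. This sketch assumes coded levels of -1, 0, and 1 and a full quadratic model with pairwise interaction terms; the abstract states neither the actual factor levels nor whether interactions were included, so both are illustrative choices.

```python
from itertools import combinations, product

def full_factorial(levels_per_factor):
    """All combinations of the given factor levels."""
    return list(product(*levels_per_factor))

def quadratic_terms(x):
    """Regressors of a full quadratic model: intercept, linear terms,
    squared terms, and pairwise interactions."""
    linear = list(x)
    squares = [v * v for v in x]
    interactions = [a * b for a, b in combinations(x, 2)]
    return [1] + linear + squares + interactions

# Coded levels (hypothetical) for maximum compression value,
# transverse curvature radius, and initial blank width.
coded_levels = [(-1, 0, 1)] * 3
runs = full_factorial(coded_levels)
design_matrix = [quadratic_terms(run) for run in runs]
print(len(runs), len(design_matrix[0]))  # 27 runs, 10 regressors per run
```

With 27 runs and 10 coefficients, the design leaves 17 residual degrees of freedom for judging goodness of fit.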

출구조사를 위한 투표소 확률추출 방법 (Probability Sampling to Select Polling Places in Exit Poll)

  • 김영원;엄윤희
    • 한국조사연구학회지:조사연구
    • /
    • Vol. 6, No. 2
    • /
    • pp.1-32
    • /
    • 2005
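  • In exit polls, the method used to select polling places is an important factor in determining the accuracy of the poll. This study proposes ordered systematic sampling as an alternative to representative-district sampling and analyzes its applicability and efficiency. It also considers how large the sampling error of the estimator is under the proposed ordered systematic sampling, and how to determine the sample size needed to meet a target error. Through an empirical analysis based on vote-count data from the 17th general election in 2004, we show that the proposed ordered systematic sampling is more efficient than the existing representative-district sampling in terms of mean prediction error, and we use the cluster effect to explain the errors that arise when sample size and estimation error are interpreted in existing exit polls. In addition, we derive the variance of the estimator obtained under the proposed ordered sampling and address the sample size determination problem using the concept of the design effect.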

  • PDF
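
As a rough illustration of the exit-poll sampling idea in the entry above, the sketch below sorts polling places by an auxiliary variable and then takes every k-th place from a random start, which spreads the sample across the whole range of that variable. The sort key (a past vote share), the helper names, and the data are hypothetical. The last function uses the standard relation that a design needs the design effect times the simple-random-sampling sample size for the same precision.

```python
import math
import random

def ordered_systematic_sample(units, key, n, rng=None):
    """Sort units by key, then pick n of them at a fixed interval
    from a random starting point."""
    rng = rng or random.Random()
    ordered = sorted(units, key=key)
    interval = len(ordered) / n
    start = rng.random() * interval
    last = len(ordered) - 1
    # min() guards against a floating-point index just past the end.
    return [ordered[min(int(start + i * interval), last)] for i in range(n)]

def sample_size_with_deff(n_srs, deff):
    """Sample size needed under a complex design with design effect deff."""
    return math.ceil(deff * n_srs)

# Hypothetical polling places with a past vote share for one party.
rng = random.Random(42)
places = [{"id": i, "past_share": rng.uniform(0.2, 0.8)} for i in range(200)]

sample = ordered_systematic_sample(places, key=lambda p: p["past_share"], n=20, rng=rng)
print([round(p["past_share"], 2) for p in sample])
print(sample_size_with_deff(n_srs=1000, deff=2.5))
```

Sorting before the systematic draw acts like implicit stratification on the sort variable, which is the usual argument for its efficiency relative to purposive selection.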

판별분석에 의한 기업부실예측력 평가: 서울지역 특1급 호텔 사례 분석 (Evaluation of Corporate Distress Prediction Power using the Discriminant Analysis: The Case of First-Class Hotels in Seoul)

  • 김시중
    • 한국산학기술학회논문지
    • /
    • Vol. 17, No. 10
    • /
    • pp.520-526
    • /
    • 2016
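  • This study uses the 2015 financial ratios of special first-class hotels in Seoul as variables to compute standard financial ratios, and aims to develop a distress prediction model based on multivariate discriminant analysis and to evaluate its predictive power. Fourteen financial ratios of 19 special first-class hotels located in Seoul were selected for empirical analysis, with the following results. First, seven financial ratios discriminated between sound and distressed firms: the current ratio, dependence on borrowings, interest coverage ratio relative to operating income, operating margin on sales, return on equity, operating cash flow ratio, and total asset turnover. Second, a discriminant function separating sound from distressed firms was estimated from the seven financial ratios by multivariate discriminant analysis; a test of whether the estimated function could classify firms into their actual and predicted groups gave a discriminant accuracy of 87.9%. Third, validation of the estimated discriminant function showed that the predictive power of the distress prediction model was 78.95%. These results suggest that hotel management should focus on managing the seven financial ratios that identify distressed hotel firms. They also suggest that hotel firms differ markedly from other industries in financial structure and in distress prediction indicators, so a credit rating system for hotel firms should reflect their financial characteristics.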

주제공원 이용자들의 선택행동 추정에 관한 연구 -Nested Logit Model의 적용 (A Study on Choice Behavior of Theme Park Visitors - Application of Nested Logit Model -)

  • 홍성권
    • 한국조경학회지
    • /
    • Vol. 24, No. 4
    • /
    • pp.96-111
    • /
    • 1997
  • This study was carried out to identify users' choice behavior for theme parks. Everland, Lotte World, Seoul Land, Dreamland, and Children's Grand Park were selected as study areas. The multinomial logit model (MNL), nested logit model (NMNL), and joint logit model were tested using a choice-based sample collected in the study areas. The Hausman-McFadden test showed that the MNL is not appropriate because the IIA assumption is violated. To avoid the problematic IIA assumption, the NMNL was tested. It splits similar alternatives into groups and nests separate decisions in a hierarchical order. Cluster analysis and discriminant analysis were conducted to find applicable nest structures. The inclusive value coefficient was 0.7788, which means that the sufficient condition of this model is met and that users' choice behavior can be better understood with the NMNL than with the MNL. The $\rho^2$ value and prediction accuracy of this model were 0.402 and 46.33%, respectively. Several suggestions were made for making the NMNL more reliable in future research on users' choice behavior for theme parks.

  • PDF
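
To make the role of the inclusive value coefficient in the theme park entry above concrete, the sketch below computes choice probabilities for a two-level nested logit model in its usual textbook form. The nest structure, utilities, and alternative names are hypothetical; only the coefficient 0.7788 comes from the abstract, and applying it to one nest here is an illustrative choice.

```python
import math

def nested_logit_probs(utilities, nests, lam):
    """Choice probabilities for a two-level nested logit model.

    utilities: alternative -> systematic utility V_j
    nests:     nest -> list of alternatives in that nest
    lam:       nest -> inclusive value coefficient (0 < lam <= 1)
    """
    # Inclusive value of each nest: log-sum of scaled utilities.
    inclusive = {
        m: math.log(sum(math.exp(utilities[j] / lam[m]) for j in alts))
        for m, alts in nests.items()
    }
    denom = sum(math.exp(lam[m] * inclusive[m]) for m in nests)
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(lam[m] * inclusive[m]) / denom
        within = sum(math.exp(utilities[j] / lam[m]) for j in alts)
        for j in alts:
            # P(j) = P(nest) * P(j | nest)
            probs[j] = p_nest * math.exp(utilities[j] / lam[m]) / within
    return probs

# Hypothetical utilities and nests for five parks.
utilities = {"park_A": 1.2, "park_B": 1.0, "park_C": 0.4, "park_D": 0.1, "park_E": 0.3}
nests = {"large": ["park_A", "park_B"], "small": ["park_C", "park_D", "park_E"]}
lam = {"large": 0.7788, "small": 1.0}

for park, p in nested_logit_probs(utilities, nests, lam).items():
    print(park, round(p, 4))
```

Setting every coefficient to 1 reduces the model to the plain multinomial logit, so a value between 0 and 1 is what lets alternatives in the same nest substitute for each other more strongly than IIA would allow.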

인공 신경망 모델을 활용한 조미니 곡선 예측 (Prediction of Jominy Curve using Artificial Neural Network)

  • 이운재;이석재
    • 열처리공학회지
    • /
    • Vol. 31, No. 1
    • /
    • pp.1-5
    • /
    • 2018
  • This work demonstrates the application of an artificial neural network model to predicting the Jominy hardness curve from 13 alloying elements in low alloy steels. End-quench Jominy tests were carried out on 1197 samples according to the ASTM A255 standard method. The hardness of each Jominy sample was measured at different points from the quenched end. The developed artificial neural network model predicted the Jominy curve with high accuracy ($R^2=0.9969$ for training and $R^2=0.9956$ for verification). In addition, the model was used to investigate the average sensitivity of hardness change to the input variables.

Neural network heterogeneous autoregressive models for realized volatility

  • Kim, Jaiyool;Baek, Changryong
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 25, No. 6
    • /
    • pp.659-671
    • /
    • 2018
  • In this study, we consider an extension of the heterogeneous autoregressive (HAR) model for realized volatility that incorporates a neural network (NN) structure. Since HAR is a linear model, we expect that adding a neural network term will explain the delicate nonlinearity of realized volatility. Three neural network-based HAR models, namely HAR-NN, HAR($\infty$)-NN, and HAR-AR(22)-NN, are considered, with performance measured by out-of-sample forecasting errors. The results show that HAR-NN provides a slightly wider interval than the traditional HAR and shows more peaks and valleys at the turning points. This implies that the HAR-NN model can capture sharper changes due to higher volatility than the traditional HAR model. The HAR-NN model is therefore recommended for prediction intervals that account for higher volatility in the stock market. An empirical analysis of the multinational realized volatility of stock indexes shows that the HAR-NN that adds daily, weekly, and monthly volatility averages to the neural network model exhibits the best performance.
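
For orientation, the linear HAR baseline that the entry above extends can be sketched as follows. It regresses next-day realized volatility on the latest daily value and on 5-day and 22-day averages, then scores the forecasts on a held-out tail of the series. The neural network term is not reproduced here, the window lengths follow the usual HAR convention rather than anything stated in the abstract, and the series is synthetic.

```python
import random

def har_features(rv, t):
    """Intercept plus daily, weekly (5-day), and monthly (22-day) averages at day t."""
    weekly = sum(rv[t - 4:t + 1]) / 5
    monthly = sum(rv[t - 21:t + 1]) / 22
    return [1.0, rv[t], weekly, monthly]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(X, y):
    """Least squares coefficients from the normal equations."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic realized-volatility series (hypothetical data).
rng = random.Random(0)
rv = [0.5]
for _ in range(299):
    rv.append(0.1 + 0.6 * rv[-1] + 0.2 * rng.random())

rows = [(har_features(rv, t), rv[t + 1]) for t in range(21, len(rv) - 1)]
split = int(0.8 * len(rows))  # earlier days for fitting, later days held out
train, test = rows[:split], rows[split:]
beta = fit_ols([x for x, _ in train], [y for _, y in train])
mse = sum((y - sum(b * v for b, v in zip(beta, x))) ** 2 for x, y in test) / len(test)
print("out-of-sample MSE:", mse)
```

The split is chronological rather than random, so every forecast is made for a day that lies after all of the data used to fit the coefficients.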