• Title/Abstract/Keywords: Out-of-Sample Prediction


Forecasting KOSPI Return Using a Modified Stochastic AdaBoosting

  • Bae, Sangil;Jeong, Minsoo
    • East Asian Economic Review / Vol. 25, No. 4 / pp.403-424 / 2021
  • AdaBoost adjusts the sample weights for each training set used in the iterative process; however, it has been demonstrated that it produces more correlated errors as the boosting iterations proceed if the models' accuracy is high enough. Therefore, in this study, we propose a novel way to improve the performance of the existing AdaBoost algorithm by employing heterogeneous models and a stochastic twist. The heterogeneous ensemble ensures that models with different initial assumptions about the data are used, which improves diversity. In addition, by using a stochastic algorithm with a decaying convergence rate, the model is designed to balance the trade-off between prediction performance and convergence. The results show that the stochastic algorithm with a decaying convergence rate improved performance and outperformed other existing boosting techniques.
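The sample reweighting that this abstract refers to can be illustrated with one round of the classical discrete AdaBoost update. This is only a sketch of the textbook algorithm, not the authors' modified stochastic variant, whose details are not given in the abstract; the function name `adaboost_reweight` and the toy labels are assumptions made for illustration.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One round of the classical discrete AdaBoost update (labels in {-1, +1})."""
    # Weighted error rate of the current weak learner.
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    # Clamp so that a perfect or useless learner does not cause log(0) or division by zero.
    err = min(max(err, 1e-12), 1 - 1e-12)
    # Learner weight: larger when the weighted error is smaller.
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones scaled down.
    new_w = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(new_w)
    return alpha, [w / norm for w in new_w]

# Toy example (hypothetical data): four samples, the last one misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]
alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
```

After the update, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to focus on it. When successive learners are all highly accurate, they tend to concentrate on the same few hard samples, which is one way to read the abstract's remark about correlated errors.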

A Prediction Model for TVOC and HCHO Emission of Paint Materials

  • 김형수;이경회
    • KIEAE Journal / Vol. 3, No. 1 / pp.13-20 / 2003
  • The need for protection against indoor air pollution is widely recognized as awareness of growing environmental pollution increases. For example, a person spends more than 80 percent of their time inside buildings. Concern about indoor decoration materials is therefore growing, since they cause pollution in the rooms of apartments as well as in offices. As indoor decoration materials become more diverse and luxurious, the effect of VOCs (Volatile Organic Compounds) and HCHO (Formaldehyde) is growing. Indoor decoration materials cause Sick Building Syndrome, with symptoms such as headaches, dizziness, or lack of concentration, which in turn cause serious deterioration in people's health. In this study, I examined the status of indoor air pollution and carried out an investigation and analysis of prevention techniques. In doing so, I performed experimental tests and an assessment of the indoor decoration materials of an apartment. I also examined the emitted substances and their emission. Finally, I examined the characteristics of the emissions under changing environmental conditions, such as temperature, humidity, and ventilation. For the VOC tests, I applied the solid-state adsorption method using an adsorption tube, based on the measurement methods of the American EPA TO-17, ASTM 5116-97, and the Japanese Wall Decoration Industrial Association. The tested sample was analyzed by High Performance Liquid Chromatography after solvent extraction. Paints were selected as the test subjects. The test proceeded as follows: first, I determined the emission characteristics by measuring the emitted concentrations of VOCs and HCHO from the indoor decoration materials of an apartment. Second, I built a small-scale chamber and conducted the test in it in order to propose the development of an environment-friendly prediction model.

Estimating the unconfined compression strength of low plastic clayey soils using gene-expression programming

  • Muhammad Naqeeb Nawaz;Song-Hun Chong;Muhammad Muneeb Nawaz;Safeer Haider;Waqas Hassan;Jin-Seop Kim
    • Geomechanics and Engineering / Vol. 33, No. 1 / pp.1-9 / 2023
  • The unconfined compression strength (UCS) of soils is commonly used either before or during the construction of geo-structures. In the pre-design stage, UCS as a mechanical property is obtained through a laboratory test that requires cumbersome procedures and incurs high costs from in-situ sampling and sample preparation. As an alternative, an empirical model established from a limited number of test cases is used to estimate the UCS economically. However, the many parameters affecting the 1D soil compression response hinder the use of traditional statistical analysis. In this study, gene expression programming (GEP) is adopted to develop a prediction model of UCS from the soil properties that commonly affect it. A total of 79 undisturbed soil samples are collected, of which 54 samples are used to generate the predictive model and 25 samples are used to validate it. Experimental studies are conducted to measure the unconfined compression strength and basic soil index properties. The performance of the prediction model is assessed using statistical checks including the correlation coefficient (R), the root mean square error (RMSE), the mean absolute error (MAE), the relative squared error (RSE), and external criteria checks. The prediction model achieves excellent accuracy, with values of R, RMSE, MAE, and RSE of 0.98, 10.01, 7.94, and 0.03, respectively, for the training data and 0.92, 19.82, 14.56, and 0.15, respectively, for the testing data. From the sensitivity analysis and parametric study, the liquid limit and fines content are found to be the most sensitive parameters, whereas the sand content is the least critical parameter.
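The four statistics named in this abstract can be computed as in the sketch below. It assumes the usual textbook definitions (in particular, RSE as the sum of squared residuals divided by the sum of squared deviations of the observations from their mean), since the abstract does not state the exact formulas; the function name `regression_metrics` and the sample values are hypothetical and are not data from the paper.

```python
import math

def regression_metrics(observed, predicted):
    """Return R, RMSE, MAE and RSE for paired observed/predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in residuals)             # sum of squared residuals
    ss_obs = sum((o - mean_o) ** 2 for o in observed)  # spread of the observations
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return {
        "R": cov / math.sqrt(ss_obs * ss_pred),  # Pearson correlation coefficient
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in residuals) / n,
        "RSE": ss_res / ss_obs,                  # assumed definition, see lead-in
    }

# Hypothetical values, for illustration only.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(observed, predicted)
```

Computing the same metrics separately on the training samples and on the held-out validation samples, as the abstract reports, shows how much accuracy is lost out of sample.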

Multi-dimensional Analysis and Prediction Model for Tourist Satisfaction

  • Shrestha, Deepanjal;Wenan, Tan;Gaudel, Bijay;Rajkarnikar, Neesha;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 2 / pp.480-502 / 2022
  • This work assesses the degree of satisfaction tourists receive as final recipients in a tourism destination, based on the fact that satisfied tourists can make a significant contribution to the growth and continuous improvement of a tourism business. The work takes Pokhara, the tourism capital of Nepal, as its study area. A stratified sampling methodology with open-ended survey questions is used as the primary source of data, with a sample size of 1019 covering both international and domestic tourists. The survey data are processed using a data mining tool to perform multi-dimensional analysis, discover information patterns, and visualize clusters. Further, the supervised machine learning algorithms kNN, Decision tree, Support vector machine, Random forest, Neural network, Naive Bayes, and Gradient boost are used to develop models for training and prediction on the survey data. To find the best model for prediction, different performance metrics are used to evaluate each model for performance, accuracy, and robustness. The best model is used to construct a learning-enabled model for classifying tourists as satisfied, neutral, or unsatisfied visitors. This work is important for tourism business personnel, government agencies, and tourism stakeholders seeking information on tourist satisfaction and the factors that influence it. Although this work was carried out for Pokhara city in Nepal, the study is equally relevant to any other tourism destination of a similar nature.

Measurement of lipid content of compost fermentation using near-infrared spectroscopy

  • Daisuke Masui;Suehara, Ken-ichiro;Yasuhisa Nakano;Takuo Yano
    • Near Infrared Analysis / Vol. 2, No. 1 / pp.37-42 / 2001
  • Near infrared spectroscopy (NIRS) was applied to the determination of the lipid content of compost during the compost fermentation of tofu (soybean-curd) refuse. The absorption of lipid was observed at 5 wavelengths, 1208, 1712, 1772, 2312 and 2352 nm, on the second derivative spectra. To formulate a calibration equation, a multiple linear regression analysis was carried out between the near-infrared spectral data and the lipid content in the calibration sample set (sample number, n=60) obtained using the Soxhlet extraction method. The value of the multiple correlation coefficient (R) was 0.975 when the wavelengths of 1208 and 1712 nm were used in the calibration equation. To validate the calibration equation obtained, the lipid content in the validation sample set (n=35), which was not used for formulating the calibration equation, was calculated using the calibration equation and compared with the value obtained using the Soxhlet extraction method. Good agreement was observed between the results of the Soxhlet extraction method and those of the NIRS method. The simple correlation coefficient (r) and standard error of prediction (SEP) were 0.964 and 0.815 %, respectively. The suitability of the lipid content as an indicator of the compost fermentation of tofu refuse was also studied. The decrease of the lipid content in the compost corresponded to the decrease of the total dry weight of the compost in the composter. The lipid content was a significant indicator of the compost fermentation. The NIRS method was applied to measure the time course of the lipid content in the compost fermentation, and good results were obtained. The study indicates that NIRS is a useful method for process management of the compost fermentation of tofu refuse.

The Weathering Index and Prediction of Uniaxial Compressive Strength for Chung-Ju Granite

  • 엄태욱;김학문;김찬국;장경준;표명렬
    • 한국지반공학회:학술대회논문집 / Korean Geotechnical Society 2008 Spring Conference, Invited Lectures and Proceedings / pp.863-874 / 2008
  • To design and construct rock structures safely and economically, the engineering properties of rock must be judged accurately. Among rock tests, the uniaxial compressive strength (UCS) test yields a very important parameter that is used in a variety of ways in the design and construction of underground structures and in rock slope and foundation analysis. However, the UCS test has the disadvantage that intact samples are difficult to prepare, because the sample must have a regular cylindrical, cubic, or rectangular shape. To address this problem, indirect measures such as the point load test, Schmidt hammer test, absorption test, and dry density are used to predict the UCS of rock. For these tests, samples are easy to prepare and the tests are convenient to carry out, so they are simple and cost less. The Schmidt hammer test is frequently used on construction sites because it is handy and easy to use, but there is a concern about misuse when the specification of each Schmidt hammer is not distinguished. This study therefore proposes estimation formulas for each Schmidt hammer specification, as well as for the point load test, absorption test, and dry density. We compared the estimation formulas and their R-squared values with previously proposed Schmidt rebound assessment methods. In addition, based on the tests, we present the range of the weathering index for each weathering grade.


Dynamic Nonlinear Prediction Model of Univariate Hydrologic Time Series Using the Support Vector Machine and State-Space Model

  • 권현한;문영일
    • 대한토목학회논문집 / Vol. 26, No. 3B / pp.279-289 / 2006
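  • Recently, research aimed at reconstructing low-dimensional nonlinear behavior from hydrologic time series has been actively pursued. From this perspective, this study uses the Support Vector Machine (SVM) to build a nonlinear prediction model with strong state-space reconstruction capability and applies it to the Great Salt Lake (GSL) volume. SVM is a machine learning methodology that uses a hypothesis space of linear functions in a high-dimensional feature space induced by a kernel function. In addition, by minimizing a generalized error rather than the mean squared error obtained from the training data, SVM can optimize a nonlinear function with relatively fewer parameters than existing methods while avoiding overfitting. The applicability of the SVM regression presented in this study was examined using the biweekly volume of the GSL in the United States. The SVM-based nonlinear prediction model was applied to GSL volume forecasts of 2 weeks (1-step), 8 weeks (4-step), and iterated prediction (121-step). To predict extreme events, that is, periods of rapid decrease and increase, the reliability of the model was evaluated by separating the training period from the prediction period. The results show that SVM could extract the dynamic behavior using a small number of observations from the training data, and statistical indices confirmed that its predictions were very close to the actual observations. The model is therefore judged to be applicable to short-term prediction of nonlinear hydrologic time series.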

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM

  • 박지영;홍태호
    • Asia pacific journal of information systems / Vol. 19, No. 2 / pp.139-155 / 2009
  • For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of such companies' competitive advantages over their rivals. Such venture companies have shown a tendency to give high returns to investors, generally by making the best use of information technology. For this reason, many venture companies are keen on attracting investors' attention. Investors generally make their investment decisions by carefully examining the evaluation criteria of the alternatives. To them, credit rating information provided by international rating agencies, such as Standard and Poor's, Moody's and Fitch, is a crucial source on such pivotal concerns as a company's stability, growth, and risk status. But this type of information is generated only for companies issuing corporate bonds, not for venture companies. Therefore, this study proposes a method for evaluating venture businesses by presenting our recent empirical results using financial data of Korean venture companies listed on the KOSDAQ market of the Korea Exchange. In addition, this paper uses multi-class SVM to predict the DEA-based efficiency rating of venture businesses derived from our proposed method. Our approach sheds light on ways to locate efficient companies generating high levels of profit. Above all, in determining effective ways to evaluate a venture firm's efficiency, it is important to understand the major contributing factors of such efficiency. Therefore, this paper is built on the following two ideas for classifying which venture companies are more efficient: i) making a DEA-based multi-class rating for sample companies and ii) developing a multi-class SVM-based efficiency prediction model for classifying all companies.
First, Data Envelopment Analysis (DEA) is a non-parametric multiple input-output efficiency technique that measures the relative efficiency of decision making units (DMUs) using a linear programming based model. It is non-parametric because it requires no assumption on the shape or parameters of the underlying production function. DEA has already been widely applied to evaluating the relative efficiency of DMUs. Recently, a number of DEA-based studies have evaluated the efficiency of various types of companies, such as internet companies and venture companies. It has also been applied to corporate credit ratings. In this study we used DEA to sort venture companies into efficiency-based ratings. The Support Vector Machine (SVM), on the other hand, is a popular technique for solving data classification problems. In this paper, we employed SVM to classify the efficiency ratings of IT venture companies according to the results of DEA. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is based on statistical theory. Thus far, the method has shown good performance, especially in its generalization capacity in classification tasks, resulting in numerous applications in many areas of business. SVM is basically an algorithm that finds the maximum margin hyperplane, which gives the maximum separation between classes. In this method, the support vectors are the points closest to the maximum margin hyperplane. If the classes cannot be separated linearly, we can use a kernel function. In the case of nonlinear class boundaries, the inputs are transformed into a high-dimensional feature space; that is, the original input space is mapped into a high-dimensional dot-product space. Many studies have applied SVM to the prediction of bankruptcy, the forecasting of financial time series, and the problem of estimating credit ratings. In this study we employed SVM to develop a data mining-based efficiency prediction model.
We used the Gaussian radial basis function as the kernel function of SVM. For multi-class SVM, we considered the one-against-one approach, which is built from binary classifiers, and two all-together methods proposed by Weston and Watkins (1999) and Crammer and Singer (2000), respectively. In this research, we used corporate information on 154 companies listed on the KOSDAQ market of the Korea Exchange. We obtained the companies' financial information for 2005 from the KIS (Korea Information Service, Inc.). Using these data, we built a multi-class rating from DEA efficiency and a data mining-based multi-class prediction model. Among the three multi-classification methods, the hit ratio of the Weston and Watkins method was the best on the test data set. In multi-class problems such as efficiency ratings of venture businesses, it is very useful for investors to know the class within a one-class error when it is difficult to determine the exact class in the actual market. We therefore presented accuracy results within 1-class errors, and the Weston and Watkins method showed 85.7% accuracy on our test samples. We conclude that the DEA-based multi-class approach for venture businesses generates more information than the binary classification problem, notwithstanding its efficiency level. We believe this model can help investors in decision making, as it provides a reliable tool for evaluating venture companies in the financial domain. For future research, we see a need to improve areas such as the variable selection process, the parameter selection of the kernel function, the generalization, and the sample size for each class.
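The "accuracy within 1-class errors" reported above can be read as a hit ratio that also counts predictions one rating step away from the true class. The sketch below illustrates that reading under the assumption that ratings are coded as consecutive integers; the function name `hit_ratio` and the sample ratings are hypothetical and do not reproduce the paper's data.

```python
def hit_ratio(actual, predicted, tolerance=0):
    """Share of predictions whose class differs from the true class by at most `tolerance`."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)

# Hypothetical efficiency ratings coded 1 (most efficient) to 4, for illustration only.
actual = [1, 2, 3, 4, 2, 3, 1, 4]
predicted = [1, 3, 3, 2, 2, 4, 1, 4]

exact = hit_ratio(actual, predicted)               # exact-class hit ratio
within_one = hit_ratio(actual, predicted, 1)       # hit ratio allowing a one-class error
```

Here five of eight predictions match exactly (0.625), and seven of eight fall within one class (0.875). The tolerance only makes sense when the classes are ordered, as efficiency ratings are.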

Predicting Economic Activity via the Yield Spread: Literature Survey and Empirical Evidence in Korea

  • 윤재호
    • 경제분석 / Vol. 26, No. 3 / pp.1-47 / 2020
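  • This study surveys the literature since the 1990s on the predictive power for economic activity of the yield spread and of its components, the expected spread and the term premium. It then uses Korean Treasury bond spot rate data to analyze empirically how well the yield spread and each component predict industrial production growth, consumer price inflation, the output gap, and related variables. First, the survey of prior studies, which mainly concern the U.S. economy, shows that the yield spread has significant predictive power for major economic variables, but that this predictive power has declined since the mid-1980s, in line with factors such as the trend toward stronger inflation targeting. Next, the analysis of Korean data on the predictive power of the yield spread and each component for industrial production growth, consumer price inflation, and the output gap shows that, among the components of the yield spread, the term premium in particular has significant predictive power. An out-of-sample analysis using the yield spread shows that the forecasting equation is structurally unstable and that decomposing the yield spread contributes significantly, particularly in forecasting the industrial production index.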

The Prediction of Blending Ratio of Cut Tobacco, Expanded Stem, and Expanded Cut Tobacco in Cigarettes using Near Infrared Spectroscopy

  • 김용옥;정한주;김기환
    • 한국연초학회지 / Vol. 22, No. 1 / pp.76-83 / 2000
  • This study was carried out to predict the blending ratio of cut tobacco (CT), expanded stem (ES), and expanded cut tobacco (ECT) in cigarettes. CT, ES, and ECT samples from A brand were ground and blended with reference to the blending ratio of A brand, and scanned by near infrared spectroscopy (NIRSystem Co., Model 6500). Calibration equations were developed, and the blending ratio was then determined by NIRS. For C factory samples, the standard error of calibration (SEC) and standard error of performance (SEP) between the NIRS values and the known blending ratios were 0.97% and 1.93% for CT, 0.50% and 1.12% for ES, and 0.68% and 1.10% for ECT, respectively. The SEP values of CT, ES, and ECT for B and D factory samples determined with the C factory calibration equation were less accurate than those for C factory samples determined with the same equation. These results were caused by differences in the CT, ES, and ECT spectra among factories. The SEP values of CT, ES, and ECT for B and D factories determined with calibration equations derived from each factory's own samples were more accurate than those determined with the calibration equation derived from C factory samples. For each factory, the SEP of CT, ES, and ECT determined with the calibration equation derived from all calibration samples (B+C+D factories) was similar to that determined with the calibration equation derived from that factory's own samples. To reduce the analytical inaccuracy caused by spectral differences, a specific calibration equation needs to be applied to each factory's samples. Developing specific calibrations between samples and NIRS spectra might provide a method for rapid determination of the blending ratio of CT, ES, and ECT.
