• Title/Summary/Keyword: mape (mean absolute percentage error)
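
For reference, MAPE is commonly defined as below. The notation is assumed here and not taken from any of the listed papers: $A_t$ is the actual value, $F_t$ the forecast, and $n$ the number of observations. Individual studies may differ in details such as whether the result is scaled to a percentage.

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|
```

Each error is divided by $A_t$, so MAPE is scale-free and can be compared across series. It is undefined when any $A_t = 0$, and it becomes very large when actual values are close to zero.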

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVMs, and genetic algorithms, have been widely used in stock market prediction. In particular, the k-nearest neighbor method, a form of case-based reasoning, is also widely used for stock price prediction. When a new problem occurs, case-based reasoning retrieves several similar cases from previous cases and combines their class labels to create a classification for the new problem. However, case-based reasoning has some problems. First, it tends to search for a fixed number of neighbors in the observation space, and it always selects the same number of neighbors for the target case rather than the most similar neighbors. As a result, it may take more cases into account even when fewer cases are applicable to the subject. Second, case-based reasoning may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor. It also compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two different learning datasets. To predict the next day's closing price, we used four variables: opening price, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for learning. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for learning. In both experiments, the test data covered January 1, 2018 to August 31, 2018. We compared the performance of k-NN with the random walk model using the two learning datasets. When the learning data was small, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. When the learning data was large, however, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that prediction power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. To produce better results, it is also recommended that k-nearest neighbor find nearest neighbors with a second-step filtering method that considers fundamental economic variables, and that it use a sufficient amount of learning data.
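
The comparison described above can be sketched in Python as follows. This is a minimal illustration and not the author's implementation. It assumes unscaled (open, high, low, close) features, Euclidean distance, and a plain average of the k neighbors' next-day closes. It also assumes a random walk forecast equal to today's close. The function names are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def make_dataset(ohlc):
    """Turn daily (open, high, low, close) rows into features and targets.

    Features: today's four prices (unscaled; an assumption).
    Target: the next day's closing price.
    """
    features = [list(day) for day in ohlc[:-1]]
    targets = [day[3] for day in ohlc[1:]]
    return features, targets


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training rows nearest to query (Euclidean)."""
    ranked = sorted((math.dist(query, row), y) for row, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def compare_with_random_walk(train_ohlc, test_ohlc, k):
    """Return (k-NN MAPE, random walk MAPE) on the same test period."""
    train_x, train_y = make_dataset(train_ohlc)
    test_x, test_y = make_dataset(test_ohlc)
    knn_forecasts = [knn_predict(train_x, train_y, q, k) for q in test_x]
    # Random walk: tomorrow's close is forecast as today's close,
    # so this forecast does not depend on the learning data at all.
    rw_forecasts = [q[3] for q in test_x]
    return mape(test_y, knn_forecasts), mape(test_y, rw_forecasts)
```

The random walk forecast ignores the learning data, so its MAPE stays the same when only the learning period changes. This matches the identical 1.3497 reported for both experiments. Only the k-NN figure responds to the size of the learning set.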

Prediction of Defect Size of Steam Generator Tube in Nuclear Power Plant Using Neural Network (신경회로망을 이용한 원전SG 세관 결함크기 예측)

  • Han, Ki-Won;Jo, Nam-Hoon;Lee, Hyang-Beom
    • Journal of the Korean Society for Nondestructive Testing / v.27 no.5 / pp.383-392 / 2007
  • In this paper, we use a neural network to predict the depth and width of defects in steam generator (SG) tubes in nuclear power plants. To this end, we first generate eddy current testing (ECT) signals for four defect patterns of SG tubes: I-In type, I-Out type, V-In type, and V-Out type. In particular, we use a numerical analysis program based on finite element modeling to generate 400 ECT signals with various widths and depths for each defect type. From the generated ECT signals, we extract new feature vectors for predicting defect size. These include the angle between the two points at which the maximum impedance and half the maximum impedance are reached. A multi-layer perceptron with one hidden layer then uses the extracted feature vectors to predict the size of defects. Computer simulation shows that the proposed method achieves satisfactory prediction performance in terms of maximum error and mean absolute percentage error (MAPE).

A new formulation for strength characteristics of steel slag aggregate concrete using an artificial intelligence-based approach

  • Awoyera, Paul O.;Mansouri, Iman;Abraham, Ajith;Viloria, Amelec
    • Computers and Concrete / v.27 no.4 / pp.333-341 / 2021
  • Steel slag, an industrial reject from the steel rolling process, has been identified as one of the suitable, environmentally friendly materials for concrete production. The coarse aggregate portion represents about 70% of concrete constituents, so replacing it with alternative materials such as steel slag offers an economical approach. Unfortunately, a standard framework for its application is still lacking. Therefore, this study used gene expression programming (GEP) to develop functional model equations for determining the strength properties (compressive and splitting tensile) of steel slag aggregate concrete (SSAC). In the experimental phase, steel slag was used as a partial replacement of crushed rock in steps of 20%, 40%, 60%, 80%, and 100%. The predictor variables included in the analysis were cement, sand, granite, steel slag, water/cement ratio, and curing regime (age). For model development, 60-75% of the dataset was used as the training set, and the remaining data were used for testing the model. Empirical results show that steel slag aggregate can replace up to 100% of conventional aggregate while yielding results comparable to the latter. The GEP-based functional relations were tested statistically. For compressive strength, the mean absolute percentage error (MAPE) and root mean square error (RMSE) were 6.9 and 1.4 for the training dataset and 12.52 and 0.91 for the test dataset, respectively. The results are consistent across the training and testing datasets, and the model shows a strong capacity to predict the strength properties of SSAC. The results show that the proposed model equations are suitable for estimating SSAC strength properties. The GEP-based formula is relatively simple and useful for pre-design applications.
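
The two error measures reported above behave differently. MAPE is a unit-free percentage, while RMSE is expressed in the units of the predicted strength. The minimal sketch below computes both on a training/test split. Several details are illustrative assumptions: the 70% split fraction (within the stated 60-75% range), the fixed shuffle seed, and the function names. The GEP model itself is not reproduced.

```python
import math
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into training and test portions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def report(model, rows):
    """Compute (MAPE, RMSE) for a model on rows of (features, measured_strength)."""
    actual = [y for _, y in rows]
    predicted = [model(x) for x, _ in rows]
    return mape(actual, predicted), rmse(actual, predicted)
```

Calling `report` once on the training portion and once on the test portion gives the four figures quoted in the abstract. This layout makes it easy to see how the error changes between the two portions in relative (MAPE) and absolute (RMSE) terms.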

Ensembles of neural network with stochastic optimization algorithms in predicting concrete tensile strength

  • Hu, Juan;Dong, Fenghui;Qiu, Yiqi;Xi, Lei;Majdi, Ali;Ali, H. Elhosiny
    • Steel and Composite Structures / v.45 no.2 / pp.205-218 / 2022
  • Concrete is widely used in the construction sector, so properly calculating its splitting tensile strength (STS) is a crucial task. Many recent studies have proposed various predictive models for this aim. Following them, this study suggests three hybrid models and tests how well they predict the STS from the characteristics of the mixture components. These characteristics include cement compressive strength, cement tensile strength, curing age, maximum size of the crushed stone, stone powder content, sand fineness modulus, water-to-binder ratio, and sand ratio. A multi-layer perceptron (MLP) neural network is combined with each of three of the newest optimization techniques: invasive weed optimization (IWO), the cuttlefish optimization algorithm (CFOA), and the electrostatic discharge algorithm (ESDA). A dataset from earlier literature is used for exploring and extrapolating the STS behavior. The results of several accuracy criteria demonstrated good learning capability for all three hybrid models, viz. IWO-MLP, CFOA-MLP, and ESDA-MLP. In the prediction phase, the predictions were also in promising agreement (above 88%) with the experimental results. However, a comparative look revealed ESDA-MLP as the most accurate predictor. In terms of the mean absolute percentage error (MAPE), the error of ESDA-MLP was 9.05%, while the corresponding values for IWO-MLP and CFOA-MLP were 9.17% and 13.97%, respectively. The combination of MLP and ESDA can be an effective tool for optimizing a concrete mixture toward a desirable STS. The last part of this study is therefore dedicated to extracting a predictive formula from this model.

A Model of Four Seasons Mixed Heat Demand Prediction Neural Network for Improving Forecast Rate (예측율 제고를 위한 사계절 혼합형 열수요 예측 신경망 모델)

  • Choi, Seungho;Lee, Jaebok;Kim, Wonho;Hong, Junhee
    • Journal of Energy Engineering / v.28 no.4 / pp.82-93 / 2019
  • In this study, a new model is proposed to address a known weakness of conventional heat demand forecasting systems: their forecast accuracy declines on particular dates, such as public holidays. The proposed model, the Four Season Mixed Heat Demand Prediction Neural Network Model, increased the heat demand forecast rate, especially for each type of forecast date (weekday/weekend/holiday). The proposed model was selected through the following process. For each season, a model whose error is even across the forecast-date types is selected, and these seasonal models together form the overall forecast model. To shorten learning time and avoid excessive learning (overfitting), four different structurally simplified models were each trained. The model with the best prediction error was then selected by testing various combinations. The model outputs the heat demand for each of the 24 hours of the forecast date, and their sum is the daily total heat demand. These forecasts enable efficient heat supply planning and allow output values to be selected and used according to their purpose. For the proposed model's daily heat demand forecasts, the overall MAPE improved from 5.3-6.1% for the individual models to 5.2%. The MAPE for holiday heat demand improved greatly, from 4.9-7.9% to 2.9%. This study used 34 months (January 2015 to October 2017) of heat demand data from a specific apartment complex, provided by the Korea District Heating Corp.
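
The sketch below covers the model's output structure, in which 24 hourly values sum to the daily total. It also shows the per-date-type evaluation described above. This is an assumption about the bookkeeping only, and the neural network itself is not shown. The record layout and function names are hypothetical.

```python
from collections import defaultdict


def daily_total(hourly):
    """Sum 24 hourly heat demand values into a daily total."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    return sum(hourly)


def mape_by_date_type(records):
    """Compute daily-total MAPE (percent) separately for each forecast-date type.

    records: iterable of (date_type, actual_hourly, forecast_hourly), where
    date_type is e.g. "weekday", "weekend" or "holiday".
    """
    errors = defaultdict(list)
    for date_type, actual_hourly, forecast_hourly in records:
        actual = daily_total(actual_hourly)
        forecast = daily_total(forecast_hourly)
        errors[date_type].append(abs(actual - forecast) / actual)
    return {t: 100.0 * sum(e) / len(e) for t, e in errors.items()}
```

Breaking the error down by date type exposes holiday-specific weaknesses that a single overall MAPE can hide.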

Performance Evaluation of Loss Functions and Composition Methods of Log-scale Train Data for Supervised Learning of Neural Network (신경 망의 지도 학습을 위한 로그 간격의 학습 자료 구성 방식과 손실 함수의 성능 평가)

  • Donggyu Song;Seheon Ko;Hyomin Lee
    • Korean Chemical Engineering Research / v.61 no.3 / pp.388-393 / 2023
  • Neural networks based on supervised learning have been used to analyze engineering data in various fields. Examples include optimizing chemical engineering processes, predicting particulate matter pollution concentrations, predicting thermodynamic phase equilibria, and predicting physical properties in transport phenomena systems. Supervised learning requires training data, and its performance is affected by the composition and configuration of the given training data. Engineering data are frequently given on a log scale, for example DNA length and analyte concentration. In this study, training data consisted of virtual 100×100 images with widely distributed log-scaled values. For these data, the available loss functions were quantitatively evaluated in terms of (i) the confusion matrix, (ii) the maximum relative error, and (iii) the mean relative error. As a result, mean absolute percentage error and mean squared logarithmic error were the optimal loss functions for the log-scaled training data. Furthermore, we found that uniformly selected training data lead to the best prediction performance. The optimal loss functions and training data composition method studied in this work could be applied to engineering problems. These include evaluating DNA length, analyzing biomolecules, and predicting the concentration of colloidal suspensions.
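
Percentage- and log-based losses suit log-scaled targets, as the minimal sketch below shows. When targets span several orders of magnitude, mean squared error is dominated by the largest values. MAPE and mean squared logarithmic error (MSLE), by contrast, weight relative errors comparably across scales. The definitions below are common ones: MSLE uses log(1 + y), and MAPE guards its denominator with a small epsilon. They are assumptions and not necessarily the exact forms used in the study.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred, eps=1e-7):
    """Mean absolute percentage error, in percent; eps avoids division by zero."""
    return 100.0 * sum(abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred)) / len(y_true)


def msle(y_true, y_pred):
    """Mean squared logarithmic error using log(1 + y); inputs must exceed -1."""
    return sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def per_sample_share(loss, y_true, y_pred):
    """Fraction of the summed per-sample losses contributed by each sample."""
    parts = [loss([t], [p]) for t, p in zip(y_true, y_pred)]
    total = sum(parts)
    return [part / total for part in parts]


if __name__ == "__main__":
    # Log-spaced targets, each predicted 10% too high.
    targets = [1.0, 10.0, 100.0, 1000.0]
    preds = [1.1 * t for t in targets]
    for name, loss in (("MSE", mse), ("MAPE", mape), ("MSLE", msle)):
        shares = ", ".join(f"{s:.3f}" for s in per_sample_share(loss, targets, preds))
        print(f"{name}: loss={loss(targets, preds):.4f}, shares=[{shares}]")
```

In this example the largest target accounts for about 99% of the MSE. MAPE splits the loss evenly, giving each sample a share of about 0.25. MSLE gives roughly comparable shares to every target except the smallest.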

A Study of Travel Time Prediction using K-Nearest Neighborhood Method (K 최대근접이웃 방법을 이용한 통행시간 예측에 대한 연구)

  • Lim, Sung-Han;Lee, Hyang-Mi;Park, Seong-Lyong;Heo, Tae-Young
    • The Korean Journal of Applied Statistics / v.26 no.5 / pp.835-845 / 2013
  • Travel time is considered the most typical and preferred traffic information for intelligent transportation systems (ITS). This paper proposes a real-time travel-time prediction method for a national highway using the K-nearest neighbor (KNN) method. KNN is a nonparametric method and needs no additional assumptions or parameter calibration, which makes it appropriate for a real-time traffic management system. The performance of various models is compared based on the mean absolute percentage error (MAPE) and the coefficient of variation (CV). Real traffic data collected from Korean national highways were analyzed. The results indicate that the proposed model outperforms other prediction models such as the historical average model and the Kalman filter model. The proposed model's travel times can be used flexibly together with travel times from interval detectors, which is expected to improve travel-time reliability.

Measuring and Modeling the Spectral Attenuation of Light in the Yellow Sea

  • Gallegos, Sonia-C.;Sandidge, Juanita;Chen, Xiaogang;Hahn, Sangbok-D.;Ahn, Yu-Hwan;Iturriaga, Rodolfo;Jeong, Hee-Dong;Suh, Young-Sang;Cho, Sung-Hwam
    • Journal of the Korean Society of Oceanography / v.39 no.1 / pp.46-56 / 2004
  • Spectral attenuation of light and upwelling radiance were measured off the western coast of Korea during four seasons. The measurements were made aboard the R/V Inchon 888 of the Korean National Fisheries Research and Development Institute (NFRDI). The goal was to determine the spatial and temporal distribution of the inherent and apparent optical properties of the water, and the factors that control their distribution. Offshore, our data indicate that stratification of the water column, phytoplankton, and wind stress determined the vertical distribution of the optical parameters. In coastal areas, tidal currents and sediment type controlled both the vertical and horizontal distribution. These findings led to the development of a model that estimates the spectral attenuation of light with respect to depth and time for the Yellow Sea. The model combines several inputs in a learning algorithm: water-leaving radiance from satellites, sediment types, current vectors, sigma-t, bathymetry, and in situ optical measurements. The algorithm can extract optical properties from knowledge of the environmental conditions of the Yellow Sea alone. The performance of the model decreases with increasing depth. Its mean absolute percentage error (MAPE) is 2% for the upper five meters, 8-10% between 6 and 50 meters, and 15% below 51 meters.

Predictive models of hardened mechanical properties of waste LCD glass concrete

  • Wang, Chien-Chih;Wang, Her-Yung;Huang, Chi
    • Computers and Concrete / v.14 no.5 / pp.577-597 / 2014
  • This paper aims to develop a prediction model for the hardened properties of concrete containing waste LCD glass. The model is based on a series of laboratory test results obtained in our previous study. We also summarized the test results for the hardened properties of a variety of waste LCD glass concretes. We then discussed how factors such as the water-binder ratio (w/b), waste glass content (G), and age (t) affect concrete compressive strength, flexural strength, and ultrasonic pulse velocity. This study also applied a hyperbolic function, an exponential function, and a power function in a multivariate non-linear regression analysis. From this analysis, we established a prediction model that accounts for the effects of w/b, G, and t on these three properties. Comparison with the test results gives the following coefficient of determination $R^2$ and mean absolute percentage error (MAPE): 0.93-0.96 and 5.4-8.4% for compressive strength, 0.83-0.89 and 8.9-12.2% for flexural strength, and 0.87-0.89 and 1.8-2.2% for ultrasonic pulse velocity. The proposed models are highly accurate in predicting the compressive strength, flexural strength, and ultrasonic pulse velocity of waste LCD glass concrete. However, the prediction models require further study for other ranges of mixture parameters.
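
The sketch below is a simplified illustration of the regression step. It fits a single power function $y = a\,t^{b}$ of age only, using ordinary least squares on the log-transformed relation $\ln y = \ln a + b \ln t$. It then scores the fit with $R^2$ and MAPE on the original scale. The study's actual models combine hyperbolic, exponential, and power terms in w/b, G, and t. The single-variable form, the log-space fitting method, and the function names here are assumptions.

```python
import math


def fit_power(t, y):
    """Fit y = a * t**b by least squares on ln(y) = ln(a) + b * ln(t).

    All t and y values must be positive.
    """
    lx = [math.log(v) for v in t]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def r_squared(observed, predicted):
    """Coefficient of determination on the original (untransformed) scale."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)
```

Fitting in log space minimizes relative rather than absolute errors. Its coefficients can therefore differ slightly from those of a direct non-linear least-squares fit on the original scale.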

Forecasting Daily Demand of Domestic City Gas with Selective Sampling (선별적 샘플링을 이용한 국내 도시가스 일별 수요예측 절차 개발)

  • Lee, Geun-Cheol;Han, Jung-Hee
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.10 / pp.6860-6868 / 2015
  • In this study, we consider the problem of forecasting daily city gas demand in Korea. Forecasting daily gas demand is a daily routine for gas providers, and demand must be forecast accurately to guarantee a secure gas supply. We analyze the time series of city gas demand in several ways. Data analysis shows that the primary factors affecting city gas demand include the previous day's demand, temperature, and day of the week. Incorporating these factors, we developed a multiple linear regression model. We also devised a sampling procedure that selectively collects past data according to the characteristics of city gas demand. On real data, the proposed method achieves a MAPE (mean absolute percentage error) of about 2.22%. This is a relative improvement of about 7% over the existing method in the literature.
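
The sketch below combines the two ingredients described above: selective sampling of past days, and a multiple linear regression on the previous day's demand and temperature. The selection rule used here keeps only past days on the same weekday as the target day. This rule is a hypothetical stand-in, because the abstract does not spell out the paper's actual criterion. The record fields and function names are also assumptions. The regression is solved with the normal equations so that the example has no dependencies.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via (X^T X) beta = X^T y."""
    X = [[1.0] + list(row) for row in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(p)]
    return solve(xtx, xty)


def select_samples(history, weekday):
    """Hypothetical selective sampling: keep past days with the same weekday."""
    return [day for day in history if day["weekday"] == weekday]


def forecast_demand(history, weekday, prev_demand, temperature):
    """Fit demand ~ prev_demand + temperature on the selected days, then forecast."""
    sample = select_samples(history, weekday)
    rows = [(d["prev_demand"], d["temperature"]) for d in sample]
    beta = fit_ols(rows, [d["demand"] for d in sample])
    return beta[0] + beta[1] * prev_demand + beta[2] * temperature
```

A relative improvement ratio is usually computed as (MAPE_old - MAPE_new) / MAPE_old. Under that reading, a 7% improvement that reaches 2.22% implies an existing-method MAPE of roughly 2.4%.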