• Title/Summary/Keyword: predictive accuracy

Enhancing Predictive Accuracy of Collaborative Filtering Algorithms using the Network Analysis of Trust Relationship among Users (사용자 간 신뢰관계 네트워크 분석을 활용한 협업 필터링 알고리즘의 예측 정확도 개선)

  • Choi, Seulbi;Kwahk, Kee-Young;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.113-127 / 2016
  • Among recommendation techniques, collaborative filtering (CF) is commonly recognized as the most effective for implementing recommender systems, and it has been widely studied and adopted in both academic and real-world applications. The basic idea of CF is to generate recommendations by finding correlations between users of a recommendation system. A CF system compares users based on how similar they are and recommends products to a user based on how other like-minded people evaluated each product. Computing the similarity of evaluations among users is therefore critical in CF, because recommendation quality depends on it. Typical CF uses users' explicit numeric ratings of items (i.e., quantitative information) when computing these similarities. In other words, numeric ratings have been the sole source of user preference information in traditional CF. However, user ratings sometimes fail to fully reflect users' actual preferences. According to several studies, users may more readily accept recommendations from others they consider reliable when purchasing goods. Trust relationships can thus be regarded as an informative source for identifying users' preferences accurately. Against this background, we propose a new hybrid recommender system that fuses CF and social network analysis (SNA). The proposed system adopts a recommendation algorithm that additionally reflects the results of SNA. In detail, our system is based on conventional memory-based CF, but it is designed to use both users' numeric ratings and the trust relationships between users when calculating user similarities. To this end, our system creates and uses not only a user-item rating matrix but also a user-to-user trust network.
We propose two alternative methods for calculating user similarity. The first calculates the degree of similarity between users by using in-degree and out-degree centrality, which are indices representing how central a user is in the social network. We call the two variants of this approach 'Trust CF - All' and 'Trust CF - Conditional'. The second weights a neighbor's score more heavily when the target user trusts the neighbor directly or indirectly. Direct and indirect trust relationships can be identified by searching the users' trust network. We call this approach 'Trust CF - Search'. To validate the applicability of the proposed system, we used experimental data provided by LibRec, crawled from the entire FilmTrust website. The data consist of movie ratings and a trust network indicating which users trust which. The experimental system was implemented using Microsoft Visual Basic for Applications (VBA) and UCINET 6. To examine the effectiveness of the proposed system, we compared its performance with that of a conventional CF system. Performance was evaluated using the average mean absolute error (MAE). The results showed that when the in-degree centrality index of the users' trust network was applied unconditionally (i.e., Trust CF - All), the accuracy (MAE = 0.565134) was lower than that of conventional CF (MAE = 0.564966). When the in-degree centrality index was applied only to users whose out-degree centrality exceeded a certain threshold (i.e., Trust CF - Conditional), the proposed system improved accuracy slightly (MAE = 0.564909) compared with traditional CF. The algorithm that searches the users' trust network (i.e., Trust CF - Search) showed the best performance (MAE = 0.564846).
A paired-samples t-test showed that Trust CF - Search outperformed conventional CF at the 10% significance level. Our study sheds light on the use of users' trust network information for facilitating electronic commerce by recommending suitable items to users.
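
The 'Trust CF - Search' idea described above (weight a neighbor more heavily when the target user trusts that neighbor directly or indirectly) can be sketched as follows. This is a minimal illustration, not the authors' VBA/UCINET implementation: the function names, the `boost` multiplier, and the `max_depth` search limit are all hypothetical choices that the abstract does not specify.

```python
from collections import deque

def reachable(trust, source, max_depth=2):
    """Users that `source` trusts directly or indirectly, found by
    breadth-first search. max_depth is an assumed cut-off."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the assumed depth limit
        for nxt in trust.get(user, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    seen.discard(source)
    return seen

def predict(ratings, sims, trust, target, item, boost=1.5):
    """Mean-centered weighted average of neighbors' ratings.
    Neighbors reachable in the trust network get weight sim * boost
    (boost is a hypothetical parameter)."""
    trusted = reachable(trust, target)
    mean_t = sum(ratings[target].values()) / len(ratings[target])
    num = den = 0.0
    for nb, sim in sims[target].items():
        if item not in ratings[nb]:
            continue
        w = sim * (boost if nb in trusted else 1.0)
        mean_nb = sum(ratings[nb].values()) / len(ratings[nb])
        num += w * (ratings[nb][item] - mean_nb)
        den += abs(w)
    return mean_t if den == 0 else mean_t + num / den

def mae(actual, predicted):
    """Mean absolute error, the evaluation metric named in the abstract."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Setting `boost=1.0` reduces `predict` to plain memory-based CF, which makes it easy to compare the two variants on the same data.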

Abnormal Water Temperature Prediction Model Near the Korean Peninsula Using LSTM (LSTM을 이용한 한반도 근해 이상수온 예측모델)

  • Choi, Hey Min;Kim, Min-Kyu;Yang, Hyun
    • Korean Journal of Remote Sensing / v.38 no.3 / pp.265-282 / 2022
  • Sea surface temperature (SST) greatly influences ocean circulation and ecosystems in the Earth system. As global warming changes the SST near the Korean Peninsula, abnormal water temperature phenomena (high and low water temperatures) occur, causing continuing damage to the marine ecosystem and the fishery industry. This study therefore proposes a methodology that predicts the SST near the Korean Peninsula and helps prevent damage by predicting abnormal water temperature phenomena. The study area was set near the Korean Peninsula, and ERA5 data from the European Centre for Medium-Range Weather Forecasts (ECMWF) were used as the SST data for the corresponding period. Because SST data are time series, the Long Short-Term Memory (LSTM) algorithm, a deep learning model specialized for time series prediction, was used. The model predicts the SST near the Korean Peninsula 1 to 7 days ahead and then predicts high or low water temperature phenomena. To evaluate the accuracy of SST prediction, the coefficient of determination ($R^2$), root mean squared error (RMSE), and mean absolute percentage error (MAPE) were used. The 1-day prediction results were $R^2$=0.996, RMSE=0.119℃, MAPE=0.352% in summer (JAS) and $R^2$=0.999, RMSE=0.063℃, MAPE=0.646% in winter (JFM). Using the predicted SST, the accuracy of abnormal water temperature prediction was evaluated with the F1 score (0.98 for high water temperature prediction in summer (2021/08/05) and 1.0 for low water temperature prediction in winter (2021/02/19)). As the prediction horizon increased, the model tended to underestimate the SST, which also reduced the accuracy of abnormal water temperature prediction.
Future work should therefore analyze the cause of the model's underestimation and investigate ways to improve prediction accuracy.
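
The three regression metrics named in this abstract have standard definitions, which the sketch below spells out in plain Python. The function names and the sample values are illustrative assumptions and are not taken from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Undefined when any true value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up SST values in degrees Celsius, for illustration only.
observed = [20.0, 22.0, 24.0]
predicted = [20.0, 22.0, 23.0]
print(rmse(observed, predicted), mape(observed, predicted), r_squared(observed, predicted))
```

Note that MAPE depends on the temperature scale, because it divides by the observed value; the same errors give a different percentage in Kelvin than in Celsius.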

Diagnostic Efficacy of FDG-PET Imaging in Solitary Pulmonary Nodule (고립성폐결절의 진단시 FDG-PET의 임상적 유용성에 관한 연구)

  • Cheon, Eun Mee;Kim, Byung-Tae;Kwon, O. Jung;Kim, Hojoong;Chung, Man Pyo;Rhee, Chong H.;Han, Yong Chol;Lee, Kyung Soo;Shim, Young Mog;Kim, Jhingook;Han, Jungho
    • Tuberculosis and Respiratory Diseases / v.43 no.6 / pp.882-893 / 1996
  • Background: Over one-third of solitary pulmonary nodules (SPNs) are malignant, but most malignant SPNs are in the early stages at diagnosis and can be cured by surgical removal. Early diagnosis of malignant SPN is therefore essential to saving the patient's life. The incidence of pulmonary tuberculosis in Korea is somewhat higher than in other countries, and a large number of SPNs are found to be tuberculomas. Most primary physicians tend to regard a newly detected solitary pulmonary nodule as a tuberculoma on the basis of noninvasive imaging such as CT alone, and they prefer clinical observation without further invasive procedures if the findings suggest benignancy. Many noninvasive procedures for confirmatory diagnosis have been introduced to differentiate malignant SPNs from benign ones, but none has been satisfactory. FDG-PET is a unique tool for imaging and quantifying the status of glucose metabolism. Because glucose metabolism is increased in malignantly transformed cells compared with normal cells, FDG-PET is considered a satisfactory noninvasive procedure for differentiating malignant SPNs from benign SPNs. We therefore performed FDG-PET in patients with a solitary pulmonary nodule and evaluated its accuracy in diagnosing malignant SPNs. Method: Thirty-four patients with a solitary pulmonary nodule less than 6 cm in diameter who visited Samsung Medical Center from September 1994 to September 1995 were evaluated prospectively. Simple chest roentgenography, chest computed tomography, and an FDG-PET scan were performed for all patients. The FDG-PET results were compared with the final diagnosis confirmed by sputum study, PCNA, fiberoptic bronchoscopy, or thoracotomy. Results: (1) There was no significant difference in nodule size between malignant (3.1±1.5 cm) and benign (2.8±1.0 cm) nodules (p>0.05).
(2) The peak SUV (standardized uptake value) of malignant nodules (6.9±3.7) was significantly higher than that of benign nodules (2.7±1.7), and time-activity curves showed a continuous increase in malignant nodules. (3) Three false-negative cases were found among the eighteen malignant nodules on FDG-PET imaging, and all three were nonmucinous bronchioloalveolar carcinomas less than 2 cm in diameter. (4) FDG-PET imaging yielded 83% sensitivity, 100% specificity, 100% positive predictive value, and 84% negative predictive value. Conclusion: FDG-PET imaging is a new noninvasive diagnostic method for solitary pulmonary nodules that has high accuracy in the differential diagnosis of malignant and benign nodules. It could be used for the differential diagnosis of SPNs that are not properly diagnosed with conventional methods before thoracotomy. Considering its high accuracy, FDG-PET imaging may play an important role in deciding whether to perform thoracotomy in difficult cases.
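
The four diagnostic figures reported in this abstract can be reproduced from a 2x2 confusion table. The abstract states 34 patients, 18 malignant nodules, and 3 false negatives; the counts of 16 benign nodules (34 minus 18) and 0 false positives (from the 100% specificity) are inferred here rather than stated, so treat the table as a reconstruction under those assumptions. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic accuracy measures from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),  # benign nodules correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# 18 malignant nodules with 3 false negatives -> 15 true positives.
# 16 benign nodules and 0 false positives are inferred (see lead-in).
metrics = diagnostic_metrics(tp=15, fn=3, tn=16, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# sensitivity: 83.3%
# specificity: 100.0%
# ppv: 100.0%
# npv: 84.2%
```

These values round to the 83%, 100%, 100%, and 84% given in the abstract, which suggests the inferred table is consistent with the reported results.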

The Usefulness According to the Incubation Time of PTH as Prediction Index of Hypocalcemia (저칼슘혈증 예측지표로서 부갑상선 호르몬 검사반응시간에 따른 유용성)

  • Au, Doo-Hee;Kim, Ji-Young;Seok, Jae-Dong
    • The Korean Journal of Nuclear Medicine Technology / v.14 no.1 / pp.138-142 / 2010
  • Purpose: The PTH (parathyroid hormone) level is a useful index for predicting hypocalcemia after thyroidectomy, and fast results are required for early diagnosis of hypocalcemia. In this study, we evaluated the change in PTH according to incubation time and investigated the usefulness of PTH results at early incubation times for diagnosing hypocalcemia. Materials and Methods: The subjects were 131 patients who took the PTH test from July to August 2009. All experiments used the IRMA method. PTH values were evaluated in terms of correlation, precision (10 repeated measurements), and recovery rate at incubation times of 0.5, 3, 6, and $18 \pm 2$ hours (hereafter overnight). The data were analyzed in terms of sensitivity, specificity, PPV (positive predictive value), and accuracy. Results: The correlation with overnight levels was time-dependent, reaching $R^2$=0.987 at 0.5 hours, $R^2$=0.993 at 3 hours, and $R^2$=0.996 at 6 hours. The precision (%CV $\pm$ SD) was $15.92 \pm 15.54$ at 0.5 hours, $6.91 \pm 7.38$ at 3 hours, $4.30 \pm 4.69$ at 6 hours, and $4.59 \pm 2.59$ overnight. The recovery rate (%Mean $\pm$ SD) was $96.8 \pm 5.44$ at 0.5 hours, $102.6 \pm 4.35$ at 3 hours, $100.7 \pm 2.56$ at 6 hours, and $102.2 \pm 5.98$ overnight. Using an overnight value of 15 pg/ml as the criterion, we measured the sensitivity, specificity, PPV, and accuracy at 0.5, 3, and 6 hours. The sensitivity was 97.5% at all times. The specificity was 96.0% at 0.5 hours, 100% at 3 hours, and 92.3% at 6 hours. The PPV was 86.6% at 0.5 hours, 100% at 3 hours, and 92.8% at 6 hours. The accuracy was 84.7% at 0.5 hours, 97.5% at 3 hours, and 90.6% at 6 hours. These results corresponded to the PTH values after overnight incubation, which correlated significantly with the early-time results.
Conclusion: PTH values at 3 hours showed a favorable concordance rate of 94.1% with the overnight incubation values and may be useful for predicting hypocalcemia. This method may therefore be an attractive alternative that enables timely treatment, such as administering a calcium agent, before symptoms appear.

Unbilled Revenue and Analysts' Earnings Forecasts (진행기준 수익인식 방법과 재무분석가 이익예측 - 미청구공사 계정을 중심으로 -)

  • Lee, Bo-Mi;Park, Bo-Young
    • Management & Information Systems Review / v.36 no.3 / pp.151-165 / 2017
  • This study investigates the effect of revenue recognition under the percentage-of-completion method on financial analysts' earnings forecasts in the order-based industry. Specifically, we examine how analysts' earnings forecast errors and biases differ according to whether a firm reports an unbilled revenue account balance and according to the level of that balance. The sample consists of 453 firm-years listed on the Korea Stock Exchange from 2010 to 2014, since information on unbilled revenue accounts became available after the adoption of K-IFRS. The results are as follows. First, firms with unbilled revenue account balances have lower analysts' earnings forecast accuracy than firms that do not report such balances. In addition, the accuracy of analysts' earnings forecasts decreases as the amount of unbilled revenue increases. Unbilled revenue account balances arise when the contractor's revenue recognition runs ahead of the client's. Managerial discretionary judgment and estimation may intervene when the contractor calculates the progress rate. The difference between the actual progress of construction and the progress recognized by the company lowers the predictive value of financial statements. Our results suggest that forecasting earnings may be more difficult for firms that report unbilled revenue balances because they apply the progress-based revenue recognition method. Second, firms reporting unbilled revenue account balances tend to have higher optimistic biases in analysts' earnings forecasts than firms that do not. Moreover, analysts' earnings forecast biases increase as the amount of unbilled revenue increases.
This study suggests that efforts are needed to reduce arbitrary adjustment and estimation in measuring progress and to introduce a progress measurement method that reflects actual progress. Investors are encouraged to analyze the characteristics of the accounting standards for the order-based industry before investing. In addition, the results of this study support the accounting transparency enhancement plan for the order-based industry proposed by the policy authorities.

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

  • Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.139-153 / 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Bankruptcies have increased through successive economic crises, and bankruptcy prediction models have become more and more important. Corporate bankruptcy has therefore been regarded as one of the major research topics in business management, and many studies are also in progress in industry. Previous studies used various methodologies, such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM), to improve bankruptcy prediction accuracy and to resolve the overfitting problem. These methods are based on statistics. More recently, researchers have used machine learning methodologies such as the Support Vector Machine (SVM) and the Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. As a result, many bankruptcy models have been developed and their performance has improved. In general, a company's financial and accounting information changes over time, as does the market situation, so it is difficult to predict bankruptcy using only information from a single point in time. However, although traditional research has the problem of ignoring this time effect, dynamic models have not been studied much. Ignoring the time effect biases the results, so a static model may not be suitable for predicting bankruptcy. A dynamic model may therefore improve bankruptcy prediction. In this paper, we propose a model based on the Recurrent Neural Network (RNN), one of the deep learning methodologies. An RNN learns from time series data and is known to perform well on it. For the experiment, we selected non-financial firms listed on the KOSPI, KOSDAQ, and KONEX markets from 2010 to 2016 to estimate the bankruptcy prediction model and compare forecasting performance.
To avoid predicting bankruptcy from financial information that already reflects the deterioration of a company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We defined bankruptcy as delisting due to sluggish earnings, and we confirmed delistings on KIND, a corporate stock information website. We then selected variables from previous papers. The first set consists of Z-score variables, which have become traditional variables in bankruptcy prediction. The second set consists of dynamic variables. For the first variable set, we selected 240 normal companies and 226 bankrupt companies; for the second, 229 normal companies and 226 bankrupt companies. We created a model that reflects dynamic changes in time-series financial data, and by comparing it with existing bankruptcy prediction models, we found that it could help improve the accuracy of bankruptcy predictions. We used financial data from KIS Value (a financial database) and selected Multivariate Discriminant Analysis (MDA), the Generalized Linear Model (GLM, here logistic regression), the Support Vector Machine (SVM), and the Artificial Neural Network (ANN) as benchmarks. The experiment showed that the RNN performed better than the comparison models. The accuracy of the RNN was high for both sets of variables, and its Area Under the Curve (AUC) value was also high. In the hit-ratio table, the proportion of poorly performing companies that the RNN predicted to be bankrupt was also higher than that of the comparison models. A limitation of this paper, however, is that overfitting occurs during RNN training.
We expect to be able to solve the overfitting problem by using more training data and selecting appropriate variables. Based on these results, we expect this research to contribute to the development of bankruptcy prediction by proposing a new dynamic model.
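
The Area Under the Curve (AUC) used to compare the models in this abstract can be computed directly from its probabilistic definition: the chance that a randomly chosen bankrupt firm receives a higher score than a randomly chosen normal firm, with ties counted as one half. The sketch below is a generic illustration with made-up scores; the function name and data are assumptions, and the paper does not say how its AUC was computed.

```python
def auc(labels, scores):
    """AUC by pairwise comparison of positive (1) and negative (0) cases.
    Runs in O(P * N) time, which is fine for samples of a few hundred firms."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = bankrupt, 0 = normal; scores are predicted bankruptcy risk.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why it complements plain accuracy when the two classes are of similar but not identical size, as in the 240/226 and 229/226 samples described above.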

The Intelligent Determination Model of Audience Emotion for Implementing Personalized Exhibition (개인화 전시 서비스 구현을 위한 지능형 관객 감정 판단 모형)

  • Jung, Min-Kyu;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.39-57 / 2012
  • With the recent introduction of high-tech equipment in interactive exhibits, much attention has focused on interactive exhibits, which can double the exhibition effect through interaction with the audience. An interactive exhibition also makes it possible to measure a variety of audience reactions. Among these reactions, this research uses changes in facial features, which can be collected in an interactive exhibition space. This research develops an artificial neural network-based model that predicts the audience's response by measuring the change in facial features when a stimulus is given to an audience member in a non-excited state. To represent the emotional state of the audience, this research uses the Valence-Arousal model. This research proposes an overall framework composed of the following six steps. The first step collects data for modeling. The data were collected from people who participated in the 2012 Seoul DMC Culture Open and were used for the experiments. The second step extracts 64 facial features from the collected data and compensates the facial feature values. The third step generates the independent and dependent variables of the artificial neural network model. The fourth step uses a statistical technique to extract the independent variables that affect the dependent variable. The fifth step builds the artificial neural network model and performs the learning process using a training set and a test set. The sixth and final step validates the prediction performance of the artificial neural network model using the validation data set. The proposed model was compared with a statistical predictive model to see whether it performed better. Although the data set in this experiment was very noisy, the proposed model showed better results than a multiple regression analysis model.
If the prediction model of audience reaction were used in a real exhibition, it would be possible to provide countermeasures and services appropriate to the audience's reaction to the exhibits. Specifically, if the audience's arousal toward an exhibit is low, action can be taken to increase it. For instance, other preferred content could be recommended to the audience, or light or sound could be used to focus attention on the exhibit. In other words, future exhibitions could be planned to satisfy various audience preferences, and the model is expected to foster a personalized environment in which visitors can concentrate on the exhibits. However, the proposed model still shows low prediction accuracy, for the following reasons. First, the data cover diverse visitors to a real exhibition, so it was difficult to control the experimental environment. The collected data are therefore very noisy, which results in lower accuracy. In further research, data will be collected in a more controlled experimental environment, and work to increase the model's prediction accuracy will be conducted. Second, changes in facial expression alone are thought to be insufficient for extracting audience emotions. Combining facial expression with other responses, such as sound and audience behavior, would produce better results.

Mathematical Transformation Influencing Accuracy of Near Infrared Spectroscopy (NIRS) Calibrations for the Prediction of Chemical Composition and Fermentation Parameters in Corn Silage (수 처리 방법이 근적외선분광법을 이용한 옥수수 사일리지의 화학적 조성분 및 발효품질의 예측 정확성에 미치는 영향)

  • Park, Hyung-Soo;Kim, Ji-Hye;Choi, Ki-Choon;Kim, Hyeon-Seop
    • Journal of The Korean Society of Grassland and Forage Science / v.36 no.1 / pp.50-57 / 2016
  • This study was conducted to determine the effect of mathematical transformation on near infrared spectroscopy (NIRS) calibrations for the prediction of chemical composition and fermentation parameters in corn silage. Corn silage samples (n=407) were collected from cattle farms and feed companies in Korea between 2014 and 2015. Samples of silage were scanned at 1 nm intervals over the wavelength range of 680~2,500 nm. The optical data were recorded as log 1/Reflectance (log 1/R) and scanned in intact fresh condition. The spectral data were regressed against a range of chemical parameters using partial least squares (PLS) multivariate analysis in conjunction with several spectral math treatments to reduce the effect of extraneous noise. The optimum calibrations were selected based on the highest coefficients of determination in cross validation ($R^2{_{cv}}$) and the lowest standard error of cross validation (SECV). Results of this study revealed that the NIRS method could be used to predict chemical constituents accurately (correlation coefficient of cross validation, $R^2{_{cv}}$, ranging from 0.77 to 0.91). The best mathematical treatment for moisture and crude protein (CP) was first-order derivatives (1, 16, 16, and 1, 4, 4), whereas the best mathematical treatment for neutral detergent fiber (NDF) and acid detergent fiber (ADF) was 2, 16, 16. The calibration models for fermentation parameters had lower predictive accuracy than chemical constituents. However, pH and lactic acids were predicted with considerable accuracy ($R^2{_{cv}}$ 0.74 to 0.77). The best mathematical treatment for them was 1, 8, 8 and 2, 16, 16, respectively. Results of this experiment demonstrate that it is possible to use NIRS method to predict the chemical composition and fermentation quality of fresh corn silages as a routine analysis method for feeding value evaluation to give advice to farmers.

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.237-262 / 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. 
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and the sentence classification task on the training set was performed after reading each sentence in person. Then, based on the training set, a Support Vector Machine model was utilized to predict the tone of sentences in the test set. Finally, the machine learning model calculated the percentages of positive, neutral, and negative sentences in each prospectus. To predict the price movement of an IPO stock, four different machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use quantitative variables using technical analysis and prospectus tone variables together show higher accuracy than models that use only quantitative variables. More specifically, the prediction accuracy was improved by 1.45% points in the Random Forest model, 4.34% points in the Artificial Neural Network model, and 5.07% points in the Support Vector Machine model. After testing the performance of these machine learning techniques, the Artificial Neural Network model using both quantitative variables and prospectus tone variables was the model with the highest prediction accuracy rate, which was 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to verify the statistically significant difference between the models. The model using only quantitative variables and the model using both the quantitative variables and the prospectus tone variables were compared, and it was confirmed that the predictive performance improved significantly at a 1% significance level.
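
The TF-IDF weighting mentioned in this abstract can be sketched in a few lines. Several TF-IDF variants exist; the one below (relative term frequency multiplied by the natural log of N over document frequency) is only one common form, and the abstract does not say which variant or tokenizer the authors used. The function name and the toy sentences are hypothetical. In the study, weights of this kind would feed the Support Vector Machine that classifies sentence tone; that classifier is not reproduced here.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.
    tf = count / document length; idf = ln(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy "sentences", already tokenized; purely illustrative.
sentences = [["risk", "market", "risk"], ["market", "growth"]]
for row in tf_idf(sentences):
    print(row)
```

A term that appears in every sentence (here "market") gets weight zero under this variant, which is the intended effect: words that do not help distinguish one sentence from another contribute nothing to the feature vector.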

Classification Algorithm-based Prediction Performance of Order Imbalance Information on Short-Term Stock Price (분류 알고리즘 기반 주문 불균형 정보의 단기 주가 예측 성과)

  • Kim, S.W.
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.157-177 / 2022
  • Investors are trading stocks by keeping a close watch on the order information submitted by domestic and foreign investors in real time through Limit Order Book information, so-called price current provided by securities firms. Will order information released in the Limit Order Book be useful in stock price prediction? This study analyzes whether it is significant as a predictor of future stock price up or down when order imbalances appear as investors' buying and selling orders are concentrated to one side during intra-day trading time. Using classification algorithms, this study improved the prediction accuracy of the order imbalance information on the short-term price up and down trend, that is the closing price up and down of the day. Day trading strategies are proposed using the predicted price trends of the classification algorithms and the trading performances are analyzed through empirical analysis. The 5-minute KOSPI200 Index Futures data were analyzed for 4,564 days from January 19, 2004 to June 30, 2022. The results of the empirical analysis are as follows. First, order imbalance information has a significant impact on the current stock prices. Second, the order imbalance information observed in the early morning has a significant forecasting power on the price trends from the early morning to the market closing time. Third, the Support Vector Machines algorithm showed the highest prediction accuracy on the day's closing price trends using the order imbalance information at 54.1%. Fourth, the order imbalance information measured at an early time of day had higher prediction accuracy than the order imbalance information measured at a later time of day. Fifth, the trading performances of the day trading strategies using the prediction results of the classification algorithms on the price up and down trends were higher than that of the benchmark trading strategy. 
Sixth, except for the K-Nearest Neighbor algorithm, all strategies using the classification algorithms showed higher average total profits than the benchmark strategy. Seventh, the trading strategies using the predictions of the Logistic Regression, Random Forest, Support Vector Machines, and XGBoost algorithms outperformed the benchmark strategy on the Sharpe Ratio, which evaluates both profitability and risk. This study differs academically from existing studies in that it documents the economic value of the total buy and sell order volume information in the Limit Order Book. The empirical results are also valuable to market participants from a trading perspective. Future studies should improve the performance of the trading strategy by obtaining more accurate price predictions from deep learning models, which have recently been actively studied for stock price prediction.
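
Two quantities from this abstract lend themselves to a short sketch: an order imbalance measure and the Sharpe Ratio. The abstract does not define its imbalance measure, so the normalized form below, (buy - sell) / (buy + sell), is an assumption based on a common convention. The Sharpe Ratio follows the usual definition of mean excess return divided by its sample standard deviation; the per-period risk-free rate and the absence of annualization are also assumptions, and all names are hypothetical.

```python
from statistics import mean, stdev

def order_imbalance(buy_volume, sell_volume):
    """Assumed measure: ranges from -1 (all sell orders) to +1 (all buy orders)."""
    total = buy_volume + sell_volume
    return 0.0 if total == 0 else (buy_volume - sell_volume) / total

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return per unit of sample standard deviation.
    Needs at least two returns that are not all identical."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)

# Made-up daily strategy returns, for illustration only.
print(order_imbalance(600, 400))
print(sharpe_ratio([0.01, 0.03, 0.02]))
```

Because the Sharpe Ratio divides by volatility, a strategy with higher total profit can still score lower than the benchmark, which is why the abstract reports total profit and the Sharpe Ratio as separate findings.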