The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)
Journal of Intelligence and Information Systems, v.25 no.3, pp.239-251, 2019
Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have long been used for stock market prediction, but they have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVMs, and genetic algorithms, have been widely used in stock market prediction. In particular, k-nearest neighbor (k-NN), a case-based reasoning method, is widely used for stock price prediction. When a new problem occurs, case-based reasoning retrieves several similar cases from previous cases and combines their class labels to classify the new problem. However, case-based reasoning has some problems. First, it searches for a fixed number of neighbors in the observation space, so it always selects the same number of neighbors rather than the most similar neighbors for the target case. As a result, it may take more cases into account even when fewer cases are applicable to the target. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for every target case, and predictability can be degraded when the selected neighbors deviate from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-NN, and compares the predictability of k-NN with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two learning datasets of different sizes. To predict the next day's closing price, we used four variables: opening price, daily high, daily low, and daily close. 
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data cover January 1, 2018 to August 31, 2018 in both experiments. We compared the performance of k-NN with that of the random walk model using the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN in the first experiment, when the learning data was small. However, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN in the second experiment, when the learning data was large. These results show that predictive power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally has better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. To produce better results, it is also recommended that k-NN find the nearest neighbors using a second-step filtering method that considers fundamental economic variables, together with a sufficient amount of learning data.
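The comparison described above can be illustrated with a minimal sketch. It assumes Euclidean distance over the four daily variables and an unweighted average of the neighbors' next-day closes; the abstract does not state which distance or combination rule the authors used, and the price rows below are hypothetical toy values, not Samsung Electronics data. The function names (`mape`, `knn_forecast`) are illustrative only.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def knn_forecast(train_rows, train_next_close, query_row, k):
    """Average the next-day closes of the k training days nearest to query_row.

    Assumption: Euclidean distance on (open, high, low, close).
    """
    ranked = sorted(
        (math.dist(row, query_row), nxt)
        for row, nxt in zip(train_rows, train_next_close)
    )
    nearest = ranked[:k]
    return sum(nxt for _, nxt in nearest) / len(nearest)

# Hypothetical toy series of (open, high, low, close) rows.
days = [
    (100.0, 102.0, 99.0, 101.0),
    (101.0, 103.0, 100.0, 102.0),
    (102.0, 104.0, 101.0, 101.0),
    (101.0, 102.0, 99.0, 100.0),
    (100.0, 102.0, 99.0, 101.0),
    (101.0, 103.0, 100.0, 102.0),
]

# Learning data: days 0..3, each paired with the following day's close.
train_rows = days[:4]
train_next_close = [d[3] for d in days[1:5]]

# Test case: predict the close of day 5 from the variables of day 4.
actual = [days[5][3]]
knn_pred = [knn_forecast(train_rows, train_next_close, days[4], k=1)]
rw_pred = [days[4][3]]  # random walk: tomorrow's close equals today's close

print("k-NN MAPE:", mape(actual, knn_pred))
print("random walk MAPE:", mape(actual, rw_pred))
```

The toy series is constructed so that an identical earlier day exists, which favors k-NN; it shows the mechanics of the comparison and says nothing about the reported MAPE values.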
Image matching is a crucial preprocessing step for the effective use of multi-temporal and multi-sensor very high resolution (VHR) satellite images. Deep learning (DL), which is attracting widespread interest, has proven to be an efficient approach for measuring the similarity between image pairs quickly and accurately by extracting complex and detailed features from satellite images. However, image matching of VHR satellite images remains challenging because the results of DL models depend on the quantity and quality of the training dataset, and because it is difficult to create a training dataset from VHR satellite images. Therefore, this study examines the feasibility of a DL-based method for matching pair extraction, which is the most time-consuming process in image registration. This paper also analyzes the factors that affect accuracy depending on the configuration of the training dataset, when the training dataset for DL-based image matching is built from an existing, biased multi-sensor VHR image database. For this purpose, the training dataset was composed of correct and incorrect matching pairs by assigning true and false labels to image pairs extracted with a grid-based Scale Invariant Feature Transform (SIFT) algorithm from a total of 12 multi-temporal and multi-sensor VHR images. The Siamese convolutional neural network (SCNN) proposed for matching pair extraction is trained on the constructed dataset and measures similarity by passing two images in parallel through two identical convolutional neural network structures. The results of this study confirm that data acquired from a VHR satellite image database can be used as a DL training dataset, and indicate the potential to improve the efficiency of the matching process through appropriate configuration of multi-sensor images. 
DL-based image matching techniques using multi-sensor VHR satellite images are expected to replace existing manual feature extraction methods owing to their stable performance, and to develop further into an integrated DL-based image registration framework.
Although the concept of light is important in the elementary school curriculum, substantial research suggests that students and teachers have difficulties in understanding it. Therefore, it is necessary to analyze whether these difficulties are due to the content itself or to the way the content is presented, structured, and expressed. The national curricula and textbooks of Korea, the US, China, and Japan were comparatively analyzed from the following perspectives: 1) key concepts of light, 2) structure of light units in the textbook, and 3) materials, light sources, and optical instruments used in light units. The analysis showed that the countries differed in which concepts of light they included in the curriculum. In particular, the Korean curriculum covers the concept of refraction by a convex lens, whereas the concepts of light, light source, and vision are not introduced. The countries also differed in how they structured the units. The Korean curriculum was presented segmentally by concept rather than structured according to core ideas or perspectives, and the connection between concepts was unclear. In addition, the countries differed in the materials, light sources, and optical instruments used to explain key concepts. Regarding the use of light, the approaches fell into two types: the US curriculum sets a purpose and uses light to achieve it, whereas China and Korea use materials to deepen understanding of the concept. Based on the results of this analysis, the implications for the elementary science curriculum in Korea were derived as follows. First, it is necessary to introduce concepts sequentially and organize them so that the connection between concepts is well expressed. Second, it is necessary to introduce light and light sources as the predominant concepts. Third, it is necessary to include the principle of seeing objects. 
Fourth, it is necessary to adjust the material and content level of the refraction concept included in the light and lens unit. Fifth, an integrated approach is required because light has a deep connection with various concepts included in the elementary science curriculum.
The Rural Revitalization Strategy (2018-2022), published by the Chinese State Council in 2018, represents a new period of rural development in China. Suburban areas are better placed than other rural areas for integrated urban-rural development but are under greater pressure from construction and industrial pollution. As a province with a high proportion of rural areas, Henan would benefit from gaining a comprehensive grasp of its rural human settlements while identifying problems and proposing solutions. The purpose of this study is to analyze satisfaction with a set of evaluation items based on the usage status and life perception of the residents of Tai Nan village, a suburb-type rural village in Henan province, and to propose improvement programs based on the evaluation results. In the study, 24 evaluation items were derived and divided into five categories: "Living Service Facilities", "Housing Environment", "Road Environment", "Health & Ecology Environment", and "Social & Cultural Environment". The Fuzzy Comprehensive Evaluation Method was used to find the overall satisfaction level with the human living environment in Tai Nan village, which was "average"; among the categories, "Living Service Facilities" was the most important and "Health & Ecology Environment" had the lowest satisfaction. Based on these results, an improvement plan is proposed in three stages. First, living services will be improved while the management of hygiene and ecological environment facilities is strengthened. Second, reasonable improvements will be made to housing and the road environment. Third, programs will be introduced to cultivate residents' capacity for self-reliance and to improve the social and cultural environment. This study provides basic data for the future improvement of rural settlements in the suburban areas of Henan province and is of great significance in gradually improving the residents' quality of life.
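A minimal sketch of how a fuzzy comprehensive evaluation arrives at an overall grade such as "average" is given here. It assumes the weighted-average operator (B = W · R) and the maximum-membership rule, which are common choices but are not confirmed by the abstract; the category weights and membership degrees are hypothetical, not the study's survey results.

```python
# Satisfaction grades, from lowest to highest (assumed five-level scale).
GRADES = ["very dissatisfied", "dissatisfied", "average", "satisfied", "very satisfied"]

def fuzzy_comprehensive_evaluation(weights, membership):
    """Weighted-average operator: B_j = sum_i w_i * r_ij."""
    n_grades = len(membership[0])
    return [
        sum(w * row[j] for w, row in zip(weights, membership))
        for j in range(n_grades)
    ]

# Hypothetical weights for the five categories (they sum to 1).
weights = [0.30, 0.20, 0.15, 0.15, 0.20]

# Hypothetical membership matrix R: one row per category, one column per grade.
# Each row is the share of respondents choosing each grade and sums to 1.
membership = [
    [0.05, 0.20, 0.45, 0.25, 0.05],  # Living Service Facilities
    [0.05, 0.15, 0.50, 0.25, 0.05],  # Housing Environment
    [0.10, 0.20, 0.40, 0.25, 0.05],  # Road Environment
    [0.15, 0.35, 0.35, 0.10, 0.05],  # Health & Ecology Environment
    [0.05, 0.20, 0.45, 0.20, 0.10],  # Social & Cultural Environment
]

result = fuzzy_comprehensive_evaluation(weights, membership)

# Maximum-membership rule: the overall grade is the one with the largest B_j.
overall = GRADES[result.index(max(result))]
print(result, overall)
```

With these illustrative inputs the largest membership falls on the middle grade; other operators (for example min-max composition) would be equally valid readings of the method name.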
The explosion of social media data has led researchers to apply text-mining techniques to analyze big social media data in a more rigorous manner. Although social media text analysis algorithms have improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common. Some studies have added grammatical factors to the feature sets used for training a classification model. The other approach applies semantic analysis to sentiment analysis, but it is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result of applying the Word2Vec algorithm is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that the related words extracted by the Word2Vec algorithm that express some emotion about the keyword are three times more numerous than those extracted by co-occurrence analysis. The difference between the two results comes from Word2Vec's vectorization of semantic features. Therefore, it is possible to say that the Word2Vec algorithm is able to catch hidden related words that have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjectives as "emotional words" in Korean. The emotional words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence. 
The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords: professor, prosecutor, and doctor, because these keywords carry rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate), Prosecutor (lewd behavior, sponsor). The text data amount to about 100,000 items (Professor: 25720, Doctor: 35110, Prosecutor: 43225), gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text data analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used in text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that the words extracted by Emotion Trigger processing have a significant causal relationship with the emotional word in a sentence. A future study will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger are from Twitter, so the data have a number of distinct features that we did not deal with in this study. 
These features will be considered in further study.
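The Emotion Trigger step described above (take an adjective as the emotional word, find its nearest words in vector space, keep only the nouns) can be sketched as follows. This is an illustration under stated assumptions, not the authors' Java implementation: the three-dimensional vectors and POS tags are hand-made toy values standing in for trained Word2Vec embeddings and a Korean POS tagger, and `emotion_triggers` is a hypothetical function name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def emotion_triggers(emotion_word, vectors, pos_tags, top_n=2):
    """Return the top_n nouns most similar to emotion_word."""
    target = vectors[emotion_word]
    scored = [
        (cosine(target, vec), word)
        for word, vec in vectors.items()
        if word != emotion_word and pos_tags.get(word) == "NOUN"
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:top_n]]

# Toy stand-ins for trained word vectors (hypothetical values).
vectors = {
    "angry": [0.9, 0.1, 0.0],
    "furious": [0.95, 0.05, 0.0],
    "rebate": [0.8, 0.2, 0.1],
    "hospital": [0.6, 0.5, 0.2],
    "weather": [0.0, 0.1, 0.9],
}

# Toy stand-ins for POS tagger output (hypothetical tags).
pos_tags = {
    "angry": "ADJ",
    "furious": "ADJ",
    "rebate": "NOUN",
    "hospital": "NOUN",
    "weather": "NOUN",
}

triggers = emotion_triggers("angry", vectors, pos_tags)
print(triggers)
```

Filtering by POS after the similarity search keeps near-synonymous adjectives such as "furious" out of the trigger list, which matches the abstract's choice of nouns as candidate causes of the emotion.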
The wall shear stress in the vicinity of end-to-end anastomoses under steady flow conditions was measured using a flush-mounted hot-film anemometer (FMHFA) probe. The experimental measurements were in good agreement with numerical results except in flow with low Reynolds numbers. The wall shear stress increased proximal to the anastomosis in flow from the Penrose tubing (simulating an artery) to the PTFE graft. In flow from the PTFE graft to the Penrose tubing, low wall shear stress was observed distal to the anastomosis. Abnormal distributions of wall shear stress in the vicinity of the anastomosis, resulting from the compliance mismatch between the graft and the host artery, might be an important factor in ANFH formation and graft failure. The present study suggests a correlation between regions of low wall shear stress and the development of anastomotic neointimal fibrous hyperplasia (ANFH) in end-to-end anastomoses.
Air pressure decay (APD) rate and ultrafiltration rate (UFR) tests were performed on new and saline-rinsed dialyzers as well as those reused in patients several times. C-DAK 4000 (Cordis Dow) and CF IS-11 (Baxter Travenol) reused dialyzers obtained from the dialysis clinic were used in the present study. The new dialyzers exhibited a relatively flat APD, whereas saline-rinsed and reused dialyzers showed a considerable amount of decay. C-DAK dialyzers had a larger APD(11.70
The purpose of this study was to identify the various problem behaviors of children who had not been diagnosed with any disability but nevertheless engaged in problem behaviors. This study also reviewed the difficulties of children with problem behaviors and their teachers' difficulties and needs, in order to suggest support for child care and education teachers. Semi-structured qualitative interviews were conducted with eight child care and education teachers. The interviews were transcribed and analyzed using content analysis. The results of this study are as follows. The problem behaviors of children described by teachers were classified into external and internal types. In addition, children with problem behaviors had experienced difficulties in maintaining relationships with their teachers, peers, and parents. Many teachers did not succeed in providing appropriate support for preschoolers who demonstrated problem behaviors in classrooms, while some teachers provided individualized support. Teachers adopted behavioral and psychological approaches to the problem behaviors of preschoolers. However, teachers reported difficulties with children with problem behaviors and raised the following issues in teaching them: managing troublesome incidents in class, difficulty in controlling their own emotions in response to problem behaviors, lack of time, integrated child care time without the teacher in charge of the child, interruption of activities, lack of a specific way to deal with problem behaviors, and difficulty in cooperating with families through parent-teacher counseling sessions. Teachers who counseled parents of a child with problem behaviors reported that parents reacted to the problem behaviors in various ways, such as embarrassment, acceptance, disregard, or avoidance. Most teachers received assistance and support for teaching children with problem behaviors from families, local communities, and in-service training. 
Lastly, teachers of preschoolers with problem behaviors needed the support of experts in managing behavior problems, assistant teaching personnel, education for parents and teachers, respect for teachers, psychological counseling or play therapy from professional service agencies, diagnostic services at the child care and education centers the children attended, and support networking with agencies. Teachers also called for family support in the form of medical diagnosis and psychological counseling, and for financial support from the government.
The primary goal of this research was to develop an optimized analytical procedure for agricultural soil analysis based on ion-selective microelectrodes, which can measure various ions in soil on site easily and rapidly. For simple and rapid on-site diagnosis, soil chemicals were analyzed using a multicomponent in-situ extractant, and the ion-selective microelectrodes were evaluated by regression correlation against a standard analytical approach widely employed in this area. Examination of sensor responses across various soil nutrient extractants revealed that 0.01M HCl and 1M LiCl provided the most ideal Nernstian response. However, 1M LiCl deteriorated the selective response for analytes because of the high concentration (1M) of lithium cation. Thus, either 0.1M HCl as an extractant followed by a 10-fold dilution, or 0.01M HCl as an extractant without further dilution, was chosen as the optimal extractant composition. A study of the regression correlation between results from ion-selective microelectrodes and those from the standard analytical procedure showed that analyses of
This study is fundamental research in the field of home economics education aimed at strengthening vocational competencies. It was carried out to examine recent economic and social environmental changes and the management system related to vocational training in the field of home economics education. It seeks a change in direction in relation to the National Competency Standard (NCS), based on revisions to the educational system. The study was conducted mainly through literature and data analysis, expert consultation, and a public hearing. The main research results are as follows. First, the main environmental change factors related to vocational training were consolidated into changes in population structure, gender-related economic activities, generation composition, communications technology, and innovation in living technology. These change factors foreshadow innovations in related industries, lifestyle changes, demand for manpower, and changes in the capabilities required for each specific profession. Second, according to the analysis of current home economics education training, vocational home economics high schools account for 9.4% of all specialized high schools, with 8 standard departments operating under 137 different department names. Despite differences among departments, the overall employment rate of graduates was 44.7%, which is above the entrance rate of 41.9%. These numbers show a great change since 2010 (overall employment rate 16.9%, entrance rate 75.2%), a meaningful outcome resulting from changes in policy from the previous employment-centered education system. Third, by examining the NCS-based revision of the high school vocational home economics education system and the change of direction in vocational home economics, this study attempts to provide a background for revision following the development of the NCS. 
It also provides proposals for restructuring the current classification and departments of home economics education, and suggestions for future research.