• Title/Summary/Keyword: Average nearest neighbor analysis

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction, but they have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVMs, and genetic algorithms, have been widely used for stock market prediction. In particular, the case-based reasoning method known as k-nearest neighbor (k-NN) is widely used for stock price prediction. When a new problem occurs, case-based reasoning retrieves several similar cases from previous cases and combines their class labels to classify the new problem. However, case-based reasoning has some problems. First, it searches for a fixed number of neighbors in the observation space, so it always selects the same number of neighbors rather than the most similar neighbors for the target case. As a result, it may have to take more cases into account even when fewer cases are applicable, depending on the subject. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can degrade because of deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor, and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two types of learning dataset. To predict the next day's closing price, we used four variables: opening value, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data run from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN in the first experiment, when the learning data was small. However, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN in the second experiment, when the learning data was large. These results show that prediction power is higher when more learning data are used than when less are used. This paper also shows that k-NN generally has better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. Also, to produce better results, it is recommended that k-nearest neighbor find nearest neighbors using a second-step filtering method that considers fundamental economic variables, together with a sufficient amount of learning data.
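
The comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the Euclidean distance, the unweighted neighbor average, the synthetic price series, and all function names are assumptions made here for the example.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

def knn_predict(train_x, train_y, query_x, k):
    """Average the targets of the k training rows closest to each query row."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

def random_walk_predict(query_x, close_col=3):
    """Random walk forecast: tomorrow's close equals today's close."""
    return np.asarray(query_x, dtype=float)[:, close_col]

# Synthetic stand-in for daily (open, high, low, close) rows; not real prices.
rng = np.random.default_rng(0)
close = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=300))
open_ = close + rng.normal(0.0, 0.5, size=300)
high = np.maximum(open_, close) + 0.5
low = np.minimum(open_, close) - 0.5
features = np.column_stack([open_, high, low, close])[:-1]  # values on day t
target = close[1:]                                          # close on day t+1

split = 250  # first 250 days for learning, the rest for testing
knn_mape = mape(target[split:],
                knn_predict(features[:split], target[:split], features[split:], k=5))
rw_mape = mape(target[split:], random_walk_predict(features[split:]))
print(f"k-NN MAPE: {knn_mape:.4f}  random walk MAPE: {rw_mape:.4f}")
```

Changing `split` and `k` reproduces the shape of the experiment (learning-set size versus number of neighbors), although the numbers from synthetic data say nothing about the reported results.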

Biometrics Based on Multi-View Features of Teeth Using Principal Component Analysis (주성분분석을 이용한 치아의 다면 특징 기반 생체식별)

  • Chang, Chan-Wuk;Kim, Myung-Su;Shin, Young-Suk
    • Korean Journal of Cognitive Science / v.18 no.4 / pp.445-455 / 2007
  • We present a new biometric identification system based on multi-view features of teeth using principal component analysis (PCA). The multi-view features of teeth consist of the frontal view, the left side view, and the right side view. In this paper, we try to lay the foundations of dental biometrics for secure access in real-life environments. We took pictures of the three views of the teeth in a specially designed experimental environment and developed 42 principal components as the features for individual identification. Classification for individual identification is based on the nearest neighbor (NN) algorithm, using the distance between the multi-view teeth and the rotated multi-view teeth. After rotating the test data by two degrees, the average identification performance is 95.2% on the left side view and 91.3% on the right side view.
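
As a rough illustration of the PCA plus nearest neighbor pipeline named in this abstract, the sketch below projects feature vectors onto a few principal axes and assigns the label of the closest training sample. The toy data, the use of SVD to obtain the axes, and every name here are assumptions for the example; the paper's image preprocessing and its 42 components are not reproduced.

```python
import numpy as np

def fit_pca(samples, n_components):
    """Return the sample mean and the top principal axes (as rows) via SVD."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    # Rows of vt are principal axes ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Coordinates of the samples in the principal-component space."""
    return (np.asarray(samples, dtype=float) - mean) @ axes.T

def nearest_neighbor_label(train_feats, train_labels, query_feat):
    """Label of the training sample with the smallest Euclidean distance."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    return train_labels[int(np.argmin(dists))]

# Toy data: 3 hypothetical subjects, 4 noisy 20-value "images" each.
rng = np.random.default_rng(1)
prototypes = rng.normal(0.0, 5.0, size=(3, 20))
train = np.vstack([p + rng.normal(0.0, 0.1, size=(4, 20)) for p in prototypes])
labels = np.repeat(np.arange(3), 4)

mean, axes = fit_pca(train, n_components=2)
train_feats = project(train, mean, axes)

probe = prototypes[2] + rng.normal(0.0, 0.1, size=20)  # new sample of subject 2
predicted = nearest_neighbor_label(train_feats, labels, project(probe, mean, axes))
print(predicted)
```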

A Comparative Study using Bibliometric Analysis Method on the Reformed Theology and Evangelicalism (개혁신학과 복음주의에 관한 계량서지학적 비교 연구)

  • Yoo, Yeong Jun;Lee, Jae Yun
    • Journal of the Korean BIBLIA Society for library and Information Science / v.29 no.3 / pp.41-63 / 2018
  • This study analyzed the journals, index terms, and authors of reformed theology, evangelicalism, and neutral theological positions using bibliometric methods: average linkage, neighbor centralities, and profile cosine similarities. In particular, when analyzing the relationships between authors, we interpreted the research topics by finding the key index terms the authors shared. In the journal analysis, 9 journals were largely grouped into two clusters, reformed theology and evangelicalism, but Presbyterian Theological Quarterly, which is considered a reformed journal, fell into the evangelical cluster. In the index term analysis, reformed theology and evangelicalism were the keywords representing the two clusters. The author analysis produced 9 clusters: 4 of Presbyterian theologians studying reformed theology and 5 of non-Presbyterian theologians. Thus, the two clusters of reformed theology and evangelicalism appeared consistently across the analyses of journals, index terms, and authors.

Fault Diagnosis of Ball Bearing using Correlation Dimension (상관차원에 의한 볼베어링 고장진단)

  • 김진수;최연선
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2004.05a / pp.979-984 / 2004
  • A ball bearing with faults generally shows nonlinear vibration characteristics, so nonlinear diagnostic methods can be used for effective fault diagnosis of ball bearings. In this paper, correlation dimension analysis based on nonlinear time series was applied to diagnose the faults of a ball bearing. The correlation dimension analysis reveals intrinsic information about the underlying dynamical system and clearly classifies the faults of the ball bearing.
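
A correlation dimension estimate of the kind mentioned here is commonly obtained from the slope of the log correlation sum against the log radius on a delay-embedded time series. The sketch below assumes that approach; the abstract does not state the exact procedure, and the embedding dimension, lag, radii, and test signal are all illustrative choices rather than values from the paper.

```python
import numpy as np

def delay_embed(series, dim, lag):
    """Stack lagged copies of a 1-D series into dim-dimensional points."""
    series = np.asarray(series, dtype=float)
    n = len(series) - (dim - 1) * lag
    return np.column_stack([series[i * lag : i * lag + n] for i in range(dim)])

def correlation_sum(points, radius):
    """Fraction of distinct point pairs that lie closer than radius."""
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    upper = dists[np.triu_indices(len(points), k=1)]
    return float(np.mean(upper < radius))

def correlation_dimension(points, radii):
    """Slope of log C(r) against log r, fitted over the given radii."""
    sums = np.array([correlation_sum(points, r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(sums), 1)
    return float(slope)

# A pure sine wave traces a closed curve, so the estimate should be near 1.
t = np.arange(1000) * 0.05
signal = np.sin(t)
points = delay_embed(signal, dim=2, lag=31)  # lag near a quarter period
radii = np.logspace(-1.0, -0.3, 8)
estimate = correlation_dimension(points, radii)
print(f"estimated correlation dimension: {estimate:.2f}")
```

For a measured vibration signal the scaling region of radii has to be chosen by inspection, and the estimate is only meaningful where log C(r) is close to linear in log r.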

Seasonal Prevalence of Mosquitoes and Ecological Characteristics of Anopheline Larval Occurrence in Gimpo, Gyeonggi Province, Republic of Korea (경기도 김포시 모기의 계절적 발생소장 및 얼룩날개모기 유충 발생원의 생태학적 특성)

  • Kim, Yong Ki;Lee, Cheol Min;Lee, Jeong Boon;Bae, Yang-Seop
    • Korean journal of applied entomology / v.51 no.4 / pp.305-312 / 2012
  • This study was conducted to elucidate the seasonal prevalence of adult mosquitoes and the occurrence of anopheline larvae in Gimpo, Gyeonggi Province, Republic of Korea. Adult mosquitoes were sampled 18 times from May to September 2008 using light traps at six sites. A total of 48,919 individuals belonging to 16 species and 8 genera were collected. Aedes vexans nipponii (43%) was the most frequently collected, followed by Anopheles sinensis (34%) and Culex tritaeniorhynchus (14%). The seasonal prevalence of anopheline mosquitoes differed from that of Cx. tritaeniorhynchus. Anopheline larvae were collected using a standard dipper (350 ml) at 203 point locations across eight habitat types. Anopheline larvae were confirmed at 138 points (63%). The average larval population per dipper was highest in the lotus plantation (6.9 individuals), followed by irrigation channel (4.5 ind.), dropwort field (3.4 ind.), fallow field (3.1 ind.), paddy field (1.8 ind.), swamp (1.4 ind.), and stream (0.2 ind.). No significant relationship was observed between larval density and DO, pH, salinity, or distance from blood meal hosts (cowshed, pigsty, chicken house). According to the average nearest neighbor analysis (ANNA), the full set of larval collection points was clustered, while points with over 4.0 larvae per dipper were randomly distributed. Within the study area of Haseong-myeon, points where over 3.0 larvae were collected showed a dispersed distribution.
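
The clustered, random, and dispersed labels above come from the average nearest neighbor ratio, which compares the observed mean nearest-neighbor distance with the value expected under complete spatial randomness. The sketch below uses the standard formulation (expected distance 0.5 / sqrt(n / A), standard error 0.26136 / sqrt(n² / A)); the sample points and study-area value are invented for illustration, and no edge correction is applied.

```python
import numpy as np

def average_nearest_neighbor(points, area):
    """Return (ratio, z_score) for 2-D points within a study area of given size.

    ratio < 1 suggests clustering, ratio near 1 randomness, ratio > 1 dispersion.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)         # ignore each point's distance to itself
    observed = dists.min(axis=1).mean()     # mean distance to the nearest neighbor
    expected = 0.5 / np.sqrt(n / area)      # expectation under spatial randomness
    std_error = 0.26136 / np.sqrt(n * n / area)
    return float(observed / expected), float((observed - expected) / std_error)

# Hypothetical layouts inside a 10 x 10 study area (area = 100).
grid = np.array([(x, y) for x in range(10) for y in range(10)], dtype=float)
rng = np.random.default_rng(2)
blob = rng.normal(5.0, 0.1, size=(100, 2))

grid_ratio, grid_z = average_nearest_neighbor(grid, area=100.0)
blob_ratio, blob_z = average_nearest_neighbor(blob, area=100.0)
print(f"grid: ratio={grid_ratio:.2f}, z={grid_z:.2f}")
print(f"blob: ratio={blob_ratio:.2f}, z={blob_z:.2f}")
```

The result is sensitive to the area value, so the same points can be classified differently depending on how the study area is delimited.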

Analysis of Bluetooth Indoor Localization Technologies and Experiment of Correlation between RSSI and Distance

  • Kim, Yang-Su;Jang, Beakcheol
    • Journal of the Korea Society of Computer and Information / v.21 no.10 / pp.55-62 / 2016
  • In this paper, we present indoor localization technologies using the Bluetooth signal, categorizing them into proximity-based, triangulation-based, and fingerprinting-based technologies. We then describe algorithms for improving localization accuracy, such as moving average, K-means, particle filter, and K-nearest neighbor. We define important performance issues for indoor localization technologies and analyze recent technologies according to those issues. Finally, we provide experimental results on the correlation between RSSI and distance. We believe that this paper provides a useful overview and the necessary information on recent localization technologies using the Bluetooth signal.

Classification of Korean Traditional Musical Instruments Using Feature Functions and k-nearest Neighbor Algorithm (특성함수 및 k-최근접이웃 알고리즘을 이용한 국악기 분류)

  • Kim Seok-Ho;Kwak Kyung-Sup;Kim Jae-Chun
    • Journal of Korea Multimedia Society / v.9 no.3 / pp.279-286 / 2006
  • The classification method used in this paper is applied to Korean traditional music for the first time. Among the frequency distribution vectors, the average peak value is proposed and shown to be effective compared with the previous classification success rate. Mean, variance, spectral centroid, average peak value, and zero-crossing rate (ZCR) are used to classify Korean traditional musical instruments. Spectral analysis is used to classify the instruments automatically, and various functions are introduced to extract spectral-domain features from the data files. The k-NN classification algorithm is applied in the experiments. Taegum, gayagum, and violin are classified with an accuracy of 94.44%, which is higher than the previous success rate of 87%.
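
Some of the feature functions listed in this abstract can be sketched as below, followed by a majority-vote k-NN. This is a simplified illustration: the average peak value feature is omitted because the abstract does not define it, features are left unscaled, and the synthetic tones and labels are assumptions rather than the paper's instrument recordings.

```python
import numpy as np
from collections import Counter

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(np.asarray(signal, dtype=float))
    return float(np.mean(signs[1:] != signs[:-1]))

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

def extract_features(signal, sample_rate):
    """Feature vector: mean, variance, spectral centroid, ZCR."""
    signal = np.asarray(signal, dtype=float)
    return np.array([signal.mean(), signal.var(),
                     spectral_centroid(signal, sample_rate),
                     zero_crossing_rate(signal)])

def knn_classify(train_feats, train_labels, query_feat, k=3):
    """Majority label among the k nearest training feature vectors."""
    dists = np.linalg.norm(np.asarray(train_feats) - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

def tone(freq_hz, sample_rate=8000, seconds=1.0):
    """Synthetic sine tone used as a stand-in for an instrument recording."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return np.sin(2.0 * np.pi * freq_hz * t)

# Hypothetical two-class training set built from low and high tones.
train_freqs = [200, 220, 240, 1000, 1100, 1200]
train_labels = ["low", "low", "low", "high", "high", "high"]
train_feats = [extract_features(tone(f), 8000) for f in train_freqs]
print(knn_classify(train_feats, train_labels, extract_features(tone(1050), 8000)))
```

Because the spectral centroid is in hertz while the other features are small numbers, it dominates the Euclidean distance here; a real pipeline would normally standardize the features first.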

Pattern Analysis for Urban Spatial Distribution of Traffic Accidents in Jinju (진주시 교통사고의 도시공간분포패턴 분석)

  • Sung, Byeong Jun;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science / v.22 no.3 / pp.99-105 / 2014
  • Since traffic accidents, along with fire, account for the highest proportion of man-made disasters in urban areas, more scientific analysis of their causes and various prevention measures are needed. This study selected Jinju-si, a small-to-medium-sized local city, as the research target and analyzed the temporal and spatial distribution of traffic accidents by associating 2013 traffic accident data with accident causes and with location information that includes occurrence time and seasonal features. It then examined the spatial correlation between traffic accidents and the characteristics of urban space development under land use plans. By accident type, side right-angle collisions (car versus car) and pedestrian-crossing accidents (car versus pedestrian) showed the highest clustering in the density analysis and the average nearest neighbor analysis. In particular, traffic accidents occurred most often on roads connecting urban central commercial areas, high-density residential areas, and industrial areas. In addition, the highest clustering was found for human injury among damage conditions, clear days among weather conditions, dry surfaces among road conditions, and three-way intersections among road types.

A Study on the Weight of W-KNN for WiFi Fingerprint Positioning (WiFi 핑거프린트 위치추정 방식에서 W-KNN의 가중치에 관한 연구)

  • Oh, Jongtaek
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.6 / pp.105-111 / 2017
  • In this paper, analysis results are presented for several weights of the Weighted K-Nearest Neighbor (W-KNN) method, which has recently been employed in the actively studied indoor positioning technologies using WiFi fingerprints. Despite its simplicity, the W-KNN method performs comparably to other methods using WiFi fingerprint technology, so it has been employed in existing indoor positioning systems. Its positioning error depends on data preprocessing and the weight factor, so analysis of the weight is very important. In this paper, based on actually measured WiFi fingerprint data, the estimation error is analyzed and performance is compared across data processing methods, weights based on average, variance, and distance, and the number K of positions averaged. These results could be practically useful for constructing real indoor positioning systems.
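
A W-KNN position estimate can be sketched as a weighted average of the K reference positions whose stored fingerprints are closest to the observed RSSI vector. The inverse-distance weight used below is only one common choice and is an assumption here, as are the function name and the toy fingerprint database; the paper compares several weights that the abstract does not specify in detail.

```python
import numpy as np

def wknn_position(fingerprints, positions, observed_rssi, k=3, eps=1e-6):
    """Estimate a position as the weighted mean of the k closest reference points.

    fingerprints: (n_refs, n_aps) stored RSSI values in dBm
    positions:    (n_refs, 2) coordinates of the reference points
    eps avoids division by zero when the observation matches a fingerprint exactly.
    """
    fingerprints = np.asarray(fingerprints, dtype=float)
    positions = np.asarray(positions, dtype=float)
    observed = np.asarray(observed_rssi, dtype=float)
    dists = np.linalg.norm(fingerprints - observed, axis=1)  # signal-space distance
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)                   # assumed weighting
    weights /= weights.sum()
    return weights @ positions[nearest]

# Hypothetical database: 3 reference points, 2 access points.
fingerprints = [[-40.0, -70.0], [-70.0, -40.0], [-55.0, -55.0]]
positions = [[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]]

print(wknn_position(fingerprints, positions, [-47.5, -62.5], k=2))
```

Setting `k=1` reduces this to plain nearest neighbor matching, and replacing the weights with equal values gives unweighted K-NN averaging.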

A novel approach of ship wakes target classification based on the LBP-IBPANN algorithm

  • Bo, Liu;Yan, Lin;Liang, Zhang
    • Ocean Systems Engineering / v.4 no.1 / pp.53-62 / 2014
  • The detection of ship wake images can reveal substantial information about a ship, such as its tonnage, type, direction, and speed of movement. Consequently, wake target recognition is a favorable way to identify ships. This paper proposes a Local Binary Pattern (LBP) approach to extract image features (wakes) for training an Improved Back Propagation Artificial Neural Network (IBPANN) to identify ship speed. The method is applied to sort and recognize ship wake images at five different speeds. The results show that the detection accuracy is satisfactory: the average correct recognition rate of wake targets at the five speeds exceeds 80%. Specifically, the lower the ship's speed, the higher the accuracy, which can sometimes approach 100%. In addition, one significant feature of this method is that it achieves a higher recognition rate than the nearest neighbor classification method.
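
The LBP feature extraction step can be illustrated with the basic 3 x 3 operator, where each interior pixel gets an 8-bit code from comparing its neighbors with the center value. Which LBP variant the paper uses is not stated in the abstract, so the basic operator, the bit ordering, and the function names below are assumptions; the IBPANN classifier itself is not sketched.

```python
import numpy as np

def lbp_codes(image):
    """Basic 8-neighbor LBP code for each interior pixel of a 2-D gray image."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    center = image[1:-1, 1:-1]
    # Neighbor offsets in clockwise order starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(offsets):
        neighbor = image[1 + dr : rows - 1 + dr, 1 + dc : cols - 1 + dc]
        # Set this bit wherever the neighbor is at least as bright as the center.
        codes |= (neighbor >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes, usable as a feature vector."""
    codes = lbp_codes(image)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Tiny synthetic image standing in for a wake image patch.
rng = np.random.default_rng(3)
patch = rng.integers(0, 256, size=(16, 16))
features = lbp_histogram(patch)
print(features.shape, features.sum())
```

The resulting 256-value histogram is the kind of fixed-length vector that could be fed to a neural network or, for comparison, to a nearest neighbor classifier.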