• Title/Summary/Keyword: K-Nearest Neighbor algorithm


SOMk-NN Search Algorithm for Content-Based Retrieval (내용기반 검색을 위한 SOMk-NN탐색 알고리즘)

  • O, Gun-Seok;Kim, Pan-Gu
    • Journal of KIISE:Databases / v.29 no.5 / pp.358-366 / 2002
  • Feature-based similarity retrieval has become an important research issue in image database systems. The features of image data are useful for discriminating between images. In this paper, we propose a high-speed k-Nearest Neighbor search algorithm based on Self-Organizing Maps. A Self-Organizing Map (SOM) provides a mapping from high-dimensional feature vectors onto a two-dimensional space and generates a topological feature map. A topological feature map preserves the mutual relations (similarities) in the feature space of the input data and clusters mutually similar feature vectors in neighboring nodes. Therefore, each node of the topological feature map holds a node vector and the similar images that are closest to that node vector. We implemented a k-NN search for similar image classification that (1) accesses the topological feature map and (2) applies a pruning strategy for high-speed search. We evaluated the performance of our algorithm using color feature vectors extracted from images. Promising results were obtained in the experiments.
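
The abstract does not spell out the pruning strategy, so the following is only a minimal sketch of one common way to prune a k-NN search over node vectors. It assumes the node vectors are already trained (SOM training is omitted), uses Euclidean distance, and skips a node when the triangle inequality shows it cannot hold a closer item. The names `build_index`, `knn_search`, and the per-node radius are illustrative assumptions, not the authors' method.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(node_vectors, items):
    """Assign each item to its closest node vector and record each node's radius."""
    buckets = [[] for _ in node_vectors]
    radii = [0.0] * len(node_vectors)
    for item in items:
        j = min(range(len(node_vectors)), key=lambda n: dist(item, node_vectors[n]))
        buckets[j].append(item)
        radii[j] = max(radii[j], dist(item, node_vectors[j]))
    return buckets, radii


def knn_search(query, k, node_vectors, buckets, radii):
    """Return the k nearest items as (distance, item) pairs, nearest first."""
    # Lower bound on the distance from the query to any item held by a node
    # (triangle inequality): d(query, node) - radius(node), clipped at zero.
    order = sorted(
        (max(0.0, dist(query, nv) - r), j)
        for j, (nv, r) in enumerate(zip(node_vectors, radii))
    )
    best = []  # max-heap via negated distances, holds at most k entries
    for lower_bound, j in order:
        if len(best) == k and lower_bound > -best[0][0]:
            break  # all remaining nodes have an even larger lower bound
        for item in buckets[j]:
            d = dist(query, item)
            if len(best) < k:
                heapq.heappush(best, (-d, item))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, item))
    return sorted((-nd, item) for nd, item in best)
```

Because nodes are visited in order of their lower bound, the search can stop at the first node whose bound exceeds the current k-th best distance, and the result is the same as an exhaustive scan.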

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVM, and genetic algorithms, have been widely used in stock market prediction. In particular, a case-based reasoning method known as k-nearest neighbor is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it searches for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the most similar neighbors for the target case. As a result, it may take more cases into account even when fewer cases are applicable to the subject. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for every target case, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor, and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two learning datasets of different sizes. To predict the next day's closing price, we used four variables: opening value, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data covers January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning datasets. When the learning data was small, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. When the learning data was large, however, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that prediction power is higher when more learning data are used than when less are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to opening price, low price, high price, and closing price. Also, to produce better results, it is recommended that k-nearest neighbor find its nearest neighbors using a second-step filtering method that considers fundamental economic variables, together with a sufficient amount of learning data.
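
The comparison described above can be sketched as follows. This is an illustration under stated assumptions: the abstract gives neither the distance metric nor the rule for combining neighbors, so Euclidean distance and a plain average are assumed here, MAPE is assumed to be expressed in percent, and the function names are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def random_walk_forecast(today_close):
    """Random walk: tomorrow's close is forecast as today's close."""
    return today_close


def knn_forecast(history, today, k):
    """Average the next-day closes that followed the k most similar past days.

    history: list of ((open, high, low, close), next_day_close) pairs.
    today:   (open, high, low, close) for the day being forecast from.
    """
    nearest = sorted(history, key=lambda pair: math.dist(pair[0], today))[:k]
    return sum(next_close for _, next_close in nearest) / len(nearest)


# Toy numbers for illustration only, not data from the paper.
history = [
    ((100, 102, 99, 101), 103),
    ((101, 104, 100, 103), 102),
    ((120, 121, 118, 119), 117),
]
today = (101, 103, 100, 102)
print(knn_forecast(history, today, k=2))  # averages the two closest days: 102.5
print(random_walk_forecast(today[3]))     # today's close: 102
```

Running both forecasters over a test period and passing the forecasts to `mape` gives the kind of side-by-side error figures the abstract reports.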

Automatic Document Categorization Using K-Nearest Neighbor Algorithm and Object-Oriented Thesaurus (K-NN과 객체 지향 시소러스를 이용한 웹 문서 자동 분류)

  • 방선이;양재동
    • Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.145-147 / 2001
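  • Many algorithms based on statistical and machine learning techniques are used for automatic document categorization. Document classification with statistical algorithms performs well, but its precision drops sharply when documents frequently belong to two or more categories. In this paper, we propose a new technique for improving the performance of automatic document categorization: a primary classification is first performed with the K-NN algorithm, and when assigning a document to a particular category is ambiguous, the ambiguity is reduced by using the generalization and association relationships of a thesaurus.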


Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems / v.14 no.2 / pp.418-434 / 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for text categorization. However, kNN suffers from limitations such as high computational cost when classifying new instances. Instance selection techniques have emerged as highly competitive methods for improving kNN through data reduction. However, previous works have evaluated those approaches only on structured datasets. In addition, their performance has not been examined in the text categorization domain, where the dimensionality and size of the datasets are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
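
To make "instance selection" concrete, here is a minimal sketch of one classical technique, Hart-style condensed nearest neighbor. The abstract does not name the methods it evaluates, so this is an assumed example rather than the paper's approach. It uses Euclidean distance on small numeric vectors instead of real text features, and the function names are hypothetical.

```python
import math


def nearest_label(point, store):
    """Label of the stored instance closest to point (1-NN)."""
    return min(store, key=lambda s: math.dist(s[0], point))[1]


def condense(training):
    """Keep only the instances that the current store misclassifies.

    training: list of (feature_vector, label) pairs.
    Returns a reduced list that still classifies consistent training data
    correctly with 1-NN.
    """
    store = [training[0]]
    changed = True
    while changed:
        changed = False
        for features, label in training:
            if (features, label) in store:
                continue  # already kept; also avoids looping on conflicting duplicates
            if nearest_label(features, store) != label:
                store.append((features, label))
                changed = True
    return store
```

The reduced store is then used in place of the full training set at classification time, which is where the savings in computation come from.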

An Online Forklift Dispatching Algorithm Based on Minimal Cost Assignment Approach (최소 비용할당 기반 온라인 지게차 운영 알고리즘)

  • Kwon, BoBae;Son, Jung-Ryoul;Ha, Byung-Hyun
    • Journal of the Korea Society for Simulation / v.27 no.2 / pp.71-81 / 2018
  • Forklifts in a shipyard lift and transport heavy objects. Tasks occur dynamically, and the rate of task occurrence changes over time; in particular, it is high immediately after morning and afternoon business hours. The weight of the objects varies by task, and each forklift has an allowable weight limit. In this study, we propose an online forklift dispatching algorithm based on the nearest-neighbor dispatching rule, using a minimal cost assignment approach to achieve efficient operations. The proposed algorithm considers various types of forklifts and multiple jobs at the same time to determine the dispatch plan. We generate dummy forklifts and dummy tasks to handle imbalance between the numbers of forklifts and tasks, taking their capacity limits and weights into account. In addition, a method of systematic forklift selection is devised that considers the condition of the forklift. The performance indicators are the total travel distance and the average task waiting time. We validate our approach against the priority rule-based method of a previous study by discrete-event simulation.
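
The idea of padding an unbalanced assignment problem with dummies can be sketched as below. This is a simplified illustration, not the authors' algorithm. Positions are one-dimensional, dummy rows and columns cost zero, an overweight pairing gets a large penalty cost, and the optimum is found by brute force over permutations, which is only practical for a handful of forklifts and tasks. All names are hypothetical.

```python
from itertools import permutations

INFEASIBLE = 10**9  # large cost standing in for "forklift cannot lift this task"


def dispatch(forklifts, tasks):
    """Match forklifts to tasks at minimal total travel distance.

    forklifts: list of (position, capacity); tasks: list of (position, weight).
    Returns a list of (forklift_index, task_index) pairs.
    """
    n = max(len(forklifts), len(tasks))
    # Square cost matrix; rows/columns beyond the real counts are dummies with cost 0.
    cost = [[0] * n for _ in range(n)]
    for i, (f_pos, capacity) in enumerate(forklifts):
        for j, (t_pos, weight) in enumerate(tasks):
            cost[i][j] = abs(f_pos - t_pos) if capacity >= weight else INFEASIBLE
    best = min(permutations(range(n)), key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    # Drop pairings that involve a dummy or an infeasible lift.
    return [
        (i, j)
        for i, j in enumerate(best)
        if i < len(forklifts) and j < len(tasks) and cost[i][j] < INFEASIBLE
    ]


forklifts = [(0, 5), (10, 20)]       # (position, capacity)
tasks = [(9, 3), (1, 15), (4, 2)]    # (position, weight)
print(dispatch(forklifts, tasks))    # [(0, 2), (1, 0)]; task 1 waits for the next round
```

A task matched to a dummy forklift simply stays in the queue. For realistic problem sizes, a polynomial-time assignment solver such as the Hungarian method would replace the permutation search.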

A Study on a Statistical Matching Method Using Clustering for Data Enrichment

  • Kim Soon Y.;Lee Ki H.;Chung Sung S.
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.509-520 / 2005
  • Data fusion is defined as the process of combining data and information from different sources to make more effective use of the information they contain. In this paper, we propose a data fusion algorithm that uses the k-means clustering method for data enrichment, in order to improve data quality in the knowledge discovery in databases (KDD) process. An empirical study comparing the proposed data fusion technique with existing techniques shows that the proposed clustering-based technique has a low MSE for continuous fusion variables.

Comparison of the performance of classification algorithms using cytotoxicity data (세포독성 자료를 이용한 분류 알고리즘 성능 비교)

  • Yoon, Yeochang;Jeung, Eui Bae;Jo, Na Rae;Ju, Su In;Lee, Sung Duck
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.417-426 / 2018
  • An alternative developmental toxicity test using mouse embryonic stem cell-derived embryoid bodies has been developed. This alternative method does not administer chemicals to animals but instead treats cells with chemicals. This study suggests the use of discriminant analysis, support vector machine, artificial neural network, and k-nearest neighbor. Algorithm performance was compared using accuracy and a weighted Cohen's kappa coefficient. In the application, these classification techniques were applied to cytotoxicity data to classify drug toxicity, and the results were compared.
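
For readers unfamiliar with the weighted kappa mentioned above, a minimal sketch follows. The abstract does not state which weighting scheme was used, so linear disagreement weights over ordered classes 0..n-1 are assumed here, and the function name is hypothetical.

```python
def weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with linear disagreement weights for ordered classes 0..n_classes-1."""
    total = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = abs(i - j) / (n_classes - 1)  # 0 on the diagonal, 1 for the farthest classes
            disagree_obs += w * observed[i][j]
            disagree_exp += w * row[i] * col[j] / total
    # Undefined (division by zero) when chance disagreement is zero,
    # e.g. when every label falls in a single class.
    return 1.0 - disagree_obs / disagree_exp


print(weighted_kappa([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2], 3))  # perfect agreement: 1.0
```

Unlike plain accuracy, this score penalizes a prediction more the farther it lies from the true class, which suits ordered toxicity grades.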

An Efficient KNN Query Processing Method in Sensor Networks (센서 네트워크에서 효율적인 KNN 질의처리 방법)

  • Son, In-Keun;Hyun, Dong-Joon;Chung, Yon-Dohn;Lee, Eun-Kyu;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.32 no.4 / pp.429-440 / 2005
  • As rapid improvements in electronic technology make sensor hardware more powerful and capable, the application range of sensor networks is broadening. The main purpose of sensor networks is to monitor phenomena in regions of interest (e.g., factory warehouses, disaster areas, wild fields) and return the required data. The k Nearest Neighbor (KNN) query, which finds the k objects geographically closest to a given point, is an important application in sensor networks. However, most previous approaches are either impractical or not energy-efficient in resource-limited sensor networks. In this paper, we propose an efficient KNN query processing method for sensor networks. In the proposed method, we dynamically enlarge the search boundary, if necessary, and traverse the nodes inside the boundary until the k nearest neighbors are found. Since only the representative sensor nodes are visited, our algorithm reduces the number of messages. We show through experiments that the proposed method performs better than the existing method in various network environments.
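
The expanding-boundary idea can be sketched in a few lines. This is a centralized simplification under stated assumptions: it ignores message routing and the selection of representative sensor nodes, and the initial radius and growth factor are hypothetical parameters not given in the abstract.

```python
import math


def knn_expanding(query, nodes, k, radius=1.0, growth=2.0):
    """Return the k nodes nearest to query, looking only inside a growing boundary."""
    if k > len(nodes):
        raise ValueError("k exceeds the number of nodes")
    if radius <= 0 or growth <= 1:
        raise ValueError("radius must be positive and growth must exceed 1")
    while True:
        inside = [n for n in nodes if math.dist(n, query) <= radius]
        if len(inside) >= k:
            # At least k nodes lie within the boundary, so the k nearest overall do too.
            return sorted(inside, key=lambda n: math.dist(n, query))[:k]
        radius *= growth  # too few candidates: enlarge the search boundary


sensors = [(0, 0), (3, 0), (0, 4), (10, 10)]
print(knn_expanding((0, 0), sensors, k=3))  # [(0, 0), (3, 0), (0, 4)]
```

Starting with a small boundary keeps the number of visited nodes low when the neighbors are close, at the cost of extra rounds when they are not.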

Neighborhood Selection with Intrinsic Partitions (데이터 분포에 기반한 유사 군집 선택법)

  • Kim, Kye-Hyeon;Choi, Seung-Jin
    • Proceedings of the Korean Information Science Society Conference / 2007.10c / pp.428-432 / 2007
  • We present a novel method for determining k nearest neighbors that accurately recognizes the underlying clusters in a data set. To this end, we introduce the "tiling neighborhood", which is constructed by tiling a number of small local circles rather than using a single circle, as existing neighborhood schemes do. We then formulate the problem of determining the tiling neighborhood as a minimax optimization, leading to an efficient message passing algorithm. On several real data sets, our method outperformed the k-nearest neighbor method. The results suggest that our method can be an alternative to existing methods for general classification tasks, especially for data sets that have many missing values.


A K-Nearest Neighbor Search Algorithm for DGR-Tree (DGR-Tree를 위한 KNN 검색 알고리즘)

  • Lee, Deuk-Woo;Kang, Hong-Koo;Han, Ki-Joon
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.799-800 / 2009
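  • In LBS for ubiquitous computing environments, fast KNN search over POIs, which are becoming increasingly large in volume and densely concentrated, is important. In this paper, we therefore present a KNN search algorithm that enables fast KNN search over POIs for the existing DGR-Tree, and demonstrate its superiority through a performance evaluation.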