• Title/Summary/Keyword: K-nearest neighbors (KNN)


On the use of weighted adaptive nearest neighbors for missing value imputation (가중 적응 최근접 이웃을 이용한 결측치 대치)

  • Yum, Yunjin;Kim, Dongjae
    • The Korean Journal of Applied Statistics, v.31 no.4, pp.507-516, 2018
  • Among the various single imputation methods, k-nearest neighbors (KNN) imputation is widely used because it remains robust even when a parametric model assumption, such as multivariate normality, is not satisfied. We propose a weighted adaptive nearest neighbors imputation method that combines two extensions of KNN imputation: the adaptive nearest neighbors imputation method, which accounts for the local features of the data, and the weighted k-nearest neighbors method, which is less sensitive to extreme values or outliers among the k nearest neighbors. We conducted a Monte Carlo simulation study to compare the performance of the proposed imputation method with that of previous imputation methods.
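
The methods in this listing all extend plain KNN imputation, which can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (Euclidean distance computed only on the columns observed in the incomplete row, an unweighted mean over the k nearest complete rows, `None` marking a missing value); the function name `knn_impute` is hypothetical, and this is not the authors' weighted adaptive method.

```python
import math


def knn_impute(rows, k=3):
    """Fill each None with the mean of that column over the k nearest complete rows.

    Assumptions (illustrative only): Euclidean distance on the columns observed
    in the incomplete row, and an unweighted average of the neighbours.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to act as a donor")
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Rank donor rows by distance on the observed coordinates only.
        neighbours = sorted(
            complete,
            key=lambda c: math.dist([row[j] for j in observed], [c[j] for j in observed]),
        )[:k]
        result.append([
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ])
    return result


data = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [1.05, None]]
print(knn_impute(data, k=2)[3])  # the gap is filled with the mean of 2.0 and 2.2
```

The same k is used for every incomplete row and every neighbour counts equally; those are exactly the two properties the adaptive and weighted variants below relax.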

Missing Value Imputation Using Adaptive Nearest Neighbors (Adaptive Nearest Neighbors를 활용한 결측치 대치)

  • 전명식;정형철
    • Proceedings of the Korean Statistical Society Conference, 2004.11a, pp.185-190, 2004
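  • The k-nearest neighbors (KNN) method, which is widely used as a nonparametric missing-value imputation method, has the drawback that it uses the same number of neighbors k for the entire data set without considering the local features of the data. In this study, we propose the adaptive nearest neighbors (ANN) method, which takes the local features of the data into account, as an alternative to KNN. We further compare the performance of KNN and ANN for missing-value imputation on microarray data.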
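
The abstract does not say how ANN adapts the number of neighbours, so the Python sketch below uses a rule of our own purely to illustrate a locally varying k: the radius is `ratio` times the distance to the `k_min`-th nearest donor, and every donor inside that radius is averaged, clamped to `[k_min, k_max]`. The names `adaptive_k` and `ann_impute_value` and the rule itself are hypothetical, not the published ANN procedure.

```python
import math


def adaptive_k(distances, k_min=2, k_max=10, ratio=1.5):
    """Hypothetical local rule for choosing k (not the published ANN rule).

    The radius is `ratio` times the distance to the k_min-th nearest donor;
    all donors inside the radius are used, clamped to [k_min, k_max].
    """
    d = sorted(distances)
    radius = ratio * d[min(k_min, len(d)) - 1]
    k = sum(1 for x in d if x <= radius)
    return max(k_min, min(k, k_max, len(d)))


def ann_impute_value(row, complete, target, **rule):
    """Impute row[target] (None) from complete donor rows using a locally chosen k."""
    observed = [j for j, v in enumerate(row) if v is not None]
    pairs = sorted(
        (math.dist([row[j] for j in observed], [c[j] for j in observed]), c[target])
        for c in complete
    )
    k = adaptive_k([d for d, _ in pairs], **rule)
    neighbours = pairs[:k]
    return sum(v for _, v in neighbours) / len(neighbours)


donors = [[0.0, 10.0], [1.0, 20.0], [1.2, 30.0], [9.0, 99.0]]
print(ann_impute_value([0.5, None], donors, target=1))  # 3 donors fall inside the radius here
```

In a dense region more donors fall inside the radius and k grows; in a sparse region the rule falls back towards `k_min`, which is the behaviour the abstract contrasts with a single global k.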


On the Use of Weighted k-Nearest Neighbors for Missing Value Imputation (Weighted k-Nearest Neighbors를 이용한 결측치 대치)

  • Lim, Chanhui;Kim, Dongjae
    • The Korean Journal of Applied Statistics, v.28 no.1, pp.23-31, 2015
  • For the missing-value problem in statistical analysis, the k-nearest neighbors (KNN) method is conventionally used as a simple imputation method. When one of the k nearest neighbors is an extreme value or outlier, however, the KNN method can introduce bias. In this paper, we propose a weighted k-nearest neighbors (WKNN) imputation method that compensates for this weakness of KNN. A Monte Carlo simulation study based on a real data set is also conducted to compare the WKNN and KNN methods.
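
A weighted variant replaces the plain mean with a weighted one. The abstract does not give the authors' weights, so this Python sketch assumes generic inverse-distance weighting, one common WKNN scheme; the function name `wknn_impute_value` and the `eps` guard are hypothetical.

```python
import math


def wknn_impute_value(row, complete, target, k=3, eps=1e-12):
    """Impute row[target] as an inverse-distance-weighted mean of k donors.

    Inverse-distance weighting is an assumed, generic WKNN choice; eps
    avoids division by zero when a donor matches the row exactly.
    """
    observed = [j for j, v in enumerate(row) if v is not None]
    nearest = sorted(
        (math.dist([row[j] for j in observed], [c[j] for j in observed]), c[target])
        for c in complete
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


donors = [[0.0, 10.0], [2.0, 20.0], [10.0, 500.0]]
# With k=2 the closer donor (distance 0.5) outweighs the farther one (1.5),
# so the estimate lies nearer 10 than the unweighted mean of 15.
print(wknn_impute_value([0.5, None], donors, target=1, k=2))
```

Distance weighting only shrinks the influence of far neighbours; a scheme aimed specifically at extreme neighbour values, as the abstract describes, may weight differently.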

Performance Analysis of Fingerprinting Method for LTE Positioning according to W-KNN Correlation Techniques in Urban Area (도심지역 LTE 측위를 위한 Fingerprinting 기법의 W-KNN Correlation 기술에 따른 성능 분석)

  • Kwon, Jae-Uk;Cho, Seong Yun
    • The Journal of the Korea Institute of Electronic Communication Sciences, v.16 no.6, pp.1059-1068, 2021
  • In urban areas, GPS (Global Positioning System)/GNSS (Global Navigation Satellite System) signals are blocked or distorted by structures such as buildings, which limits positioning. To compensate for this problem, this paper performs fingerprinting-based positioning using the RSRP (Reference Signal Received Power) information of LTE signals. The W-KNN (Weighted K-Nearest Neighbors) technique, which is widely used in the positioning step of fingerprinting, yields different positioning performance depending on the similarity-distance calculation method and the weighting method used in the correlation. In this paper, the fingerprinting positioning performance obtained with these different correlation techniques is compared and analyzed experimentally.
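
The W-KNN positioning step can be sketched as follows, assuming a fingerprint database of `((x, y), rsrp_vector)` reference points, a choice between Euclidean and Manhattan similarity distance, and inverse-distance weights. The function name `wknn_position`, the `metric` flag, and the toy RSRP values are hypothetical; the paper compares more distance and weighting variants than shown here.

```python
import math


def wknn_position(fingerprints, measured, k=3, metric="euclidean", eps=1e-9):
    """Estimate (x, y) from an RSRP fingerprint database with weighted KNN.

    fingerprints: list of ((x, y), [rsrp_1, ..., rsrp_n]) reference points.
    measured: RSRP vector observed at the unknown position.
    """
    def distance(a, b):
        if metric == "manhattan":
            return sum(abs(p - q) for p, q in zip(a, b))
        return math.dist(a, b)  # default: Euclidean

    nearest = sorted(fingerprints, key=lambda fp: distance(fp[1], measured))[:k]
    # Inverse-distance weights: more similar fingerprints pull the estimate harder.
    weights = [1.0 / (distance(rsrp, measured) + eps) for _, rsrp in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, nearest)) / total
    return x, y


# Toy database (made-up RSRP values in dBm at four reference points).
db = [((0, 0), [-80, -90]), ((10, 0), [-90, -80]), ((0, 10), [-85, -95]), ((10, 10), [-95, -85])]
print(wknn_position(db, [-85, -85], k=2))
print(wknn_position(db, [-85, -85], k=2, metric="manhattan"))
```

Swapping the `distance` function or the weight formula is where the correlation variants the abstract compares would plug in.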

On the Use of Sequential Adaptive Nearest Neighbors for Missing Value Imputation (순차 적응 최근접 이웃을 활용한 결측값 대치법)

  • Park, So-Hyun;Bang, Sung-Wan;Jhun, Myoung-Shic
    • The Korean Journal of Applied Statistics, v.24 no.6, pp.1249-1257, 2011
  • In this paper, we propose a Sequential Adaptive Nearest Neighbor (SANN) imputation method that combines the Adaptive Nearest Neighbor (ANN) method and the Sequential k-Nearest Neighbor (SKNN) method. When choosing the nearest neighbors of missing observations, the proposed SANN method not only takes the local features of the missing observations into account but also reuses the imputed observations in a sequential manner. Using a Monte Carlo study and a real data example, we demonstrate the characteristics of the SANN method and its potential performance.
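
The sequential reuse of imputed observations can be sketched in Python as below. We assume rows are processed in order of how many values they are missing, a fixed k, and a plain mean; SANN would replace the fixed k with an adaptive rule. The function name `sequential_knn_impute` is hypothetical.

```python
import math


def sequential_knn_impute(rows, k=2):
    """Sequential KNN sketch: impute the rows with the fewest gaps first and
    add each imputed row to the donor pool for the rows that follow."""
    result = [list(r) for r in rows]
    pool = [list(r) for r in rows if None not in r]
    order = sorted((i for i, r in enumerate(rows) if None in r),
                   key=lambda i: rows[i].count(None))
    for i in order:
        row = rows[i]
        observed = [j for j, v in enumerate(row) if v is not None]
        neighbours = sorted(
            pool,
            key=lambda c: math.dist([row[j] for j in observed], [c[j] for j in observed]),
        )[:k]
        filled = [v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
                  for j, v in enumerate(row)]
        result[i] = filled
        pool.append(filled)  # reuse the imputed row as a donor
    return result


rows = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [9.0, 9.0, None], [9.0, None, None]]
print(sequential_knn_impute(rows, k=1))
```

Here the last row is filled from the freshly imputed third row; a non-sequential KNN with the same k would instead borrow from `[10.0, 10.0, 10.0]`.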

Dynamic threshold location algorithm based on fingerprinting method

  • Ding, Xuxing;Wang, Bingbing;Wang, Zaijian
    • ETRI Journal, v.40 no.4, pp.531-536, 2018
  • Because the weighted K-nearest neighbor (WKNN) algorithm uses a fixed number of neighbors to estimate the position, its positioning accuracy is limited. In this paper, we propose a dynamic threshold location algorithm (DH-KNN) to improve positioning accuracy. The proposed algorithm uses a dynamic threshold to determine the number of neighbors and to filter out singular reference points (RPs). We compare its performance with that of the WKNN and Enhanced K-Nearest Neighbor (EKNN) algorithms in network test spaces of 20 m × 20 m, 30 m × 30 m, 40 m × 40 m, and 50 m × 50 m. Simulation results show that the maximum position accuracy of DH-KNN improves by 31.1% and its maximum position error decreases by 23.5%. The results demonstrate that the proposed method achieves better performance than other well-known algorithms.

Development of an Evaluation Index for Identifying Freeway Traffic Safety Based on Integrating RWIS and VDS Data (기상 및 교통 자료를 이용한 교통류 안전성 판단 지표 개발)

  • Park, Hyunjin;Joo, Shinhye;Oh, Cheol
    • Journal of Korean Society of Transportation, v.32 no.5, pp.441-451, 2014
  • This study proposes a novel performance measure, referred to as the Hazardous Spacing Index (HSI), for evaluating the safety of the traffic stream on freeways. The basic principle of the proposed methodology is to check, at every time step, whether drivers would have sufficient stopping sight distance (SSD) under limited-visibility conditions to avoid potential rear-end crashes. Road Weather Information System (RWIS) and Vehicle Detection System (VDS) data were used to derive the visibility distance (VD) and the SSD, respectively. Moreover, the K-Nearest Neighbors (KNN) method was adopted to predict both VD and SSD when estimating predictive HSIs, which could be used to trigger advance warning information that encourages safer driving. The outcome of this study is also expected to be useful for monitoring the safety of freeway traffic streams.
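
KNN prediction of a quantity such as VD or SSD can be sketched as nonparametric pattern matching: find the past states most similar to the current one and average what followed them. The state features, the toy numbers, and the hazard rule in `is_hazardous` are our assumptions (the abstract does not give the HSI formula), and both function names are hypothetical.

```python
import math


def knn_forecast(history, current, k=3):
    """Nonparametric KNN forecast.

    history: list of (state_vector, next_value) pairs from past time steps.
    The prediction is the mean next_value of the k past states most similar
    to `current` (Euclidean distance; features would usually be standardised first).
    """
    nearest = sorted(history, key=lambda h: math.dist(h[0], current))[:k]
    return sum(v for _, v in nearest) / len(nearest)


def is_hazardous(predicted_ssd, predicted_vd):
    """Hypothetical flag: drivers could not stop within the visible distance."""
    return predicted_ssd > predicted_vd


# Toy states: (mean speed in km/h, occupancy in %) -> next-step SSD in m (made-up values).
ssd_history = [((100.0, 1.0), 120.0), ((101.0, 1.0), 124.0), ((50.0, 3.0), 60.0), ((102.0, 1.0), 128.0)]
ssd_hat = knn_forecast(ssd_history, (101.0, 1.0), k=3)
print(ssd_hat, is_hazardous(ssd_hat, predicted_vd=100.0))
```

The same `knn_forecast` would be run on weather-derived states to predict VD, so both sides of the comparison come from KNN as the abstract describes.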

Optimized KNN/IFCM Algorithm for Efficient Indoor Location (효율적인 실내 측위를 위한 최적화된 KNN/IFCM 알고리즘)

  • Lee, Jang-Jae;Song, Lick-Ho;Kim, Jong-Hwa;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP, v.48 no.2, pp.125-133, 2011
  • For any pattern-matching-based algorithm in a WLAN environment, the signal-to-noise ratio (SNR) characteristics of multiple access points (APs) are used to build a database in the training phase; in the estimation phase, the actual two-dimensional coordinates of the mobile unit (MU) are estimated by comparing the newly recorded SNR with the fingerprints stored in the database. As a fingerprinting method, k-nearest neighbor (KNN) has been widely applied for indoor location in wireless local area networks (WLANs), but its performance is sensitive to the number of neighbors k and to the positions of the reference points (RPs). Therefore, the intuitive fuzzy c-means (IFCM) clustering algorithm is applied to improve KNN, yielding the KNN/IFCM hybrid algorithm presented in this paper. In the proposed algorithm, k RPs are first chosen through KNN based on SNR as the data samples for IFCM. The k RPs are then classified into different clusters through IFCM based on SNR. Experimental results indicate that the proposed KNN/IFCM hybrid algorithm generally outperforms the KNN, KNN/FCM, and KNN/PFCM algorithms when the location error is less than 2 m.
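
The two-stage structure (KNN pre-selection, then fuzzy clustering of the selected RPs) can be sketched in Python as below. This swaps in standard fuzzy c-means for IFCM, and the final step (averaging the positions of the largest cluster) is our assumption, since the abstract stops at the clustering stage. The function names and toy SNR values are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means; returns the membership matrix u[i][j]."""
    rng = random.Random(seed)
    u = []
    for _ in points:
        row = [rng.random() for _ in range(c)]
        u.append([v / sum(row) for v in row])
    dim = len(points[0])
    for _ in range(iters):
        # Centres are membership-weighted means (weights u^m).
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            centres.append([sum(wi * p[t] for wi, p in zip(w, points)) / sum(w)
                            for t in range(dim)])
        # Memberships from relative distances to every centre.
        for i, p in enumerate(points):
            d = [math.dist(p, ctr) for ctr in centres]
            if min(d) == 0.0:
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                u[i] = [h / sum(hits) for h in hits]
            else:
                u[i] = [1.0 / sum((d[j] / d[l]) ** (2 / (m - 1)) for l in range(c))
                        for j in range(c)]
    return u


def knn_fcm_position(fingerprints, measured, k=4, c=2):
    """KNN pre-selection of k RPs by SNR, then FCM on their SNR vectors."""
    nearest = sorted(fingerprints, key=lambda fp: math.dist(fp[1], measured))[:k]
    u = fuzzy_c_means([snr for _, snr in nearest], c=c)
    labels = [max(range(c), key=lambda j: row[j]) for row in u]
    # Hypothetical final step: average the positions of the largest cluster.
    best = max(range(c), key=labels.count)
    members = [pos for (pos, _), lab in zip(nearest, labels) if lab == best]
    return (sum(p[0] for p in members) / len(members),
            sum(p[1] for p in members) / len(members))


# Toy ((x, y), [snr_ap1, snr_ap2]) fingerprints in dB (made-up values).
rps = [((0, 0), [30, 20]), ((1, 0), [31, 20]), ((0, 1), [30, 21]),
       ((15, 15), [36, 26]), ((30, 30), [5, 5])]
print(knn_fcm_position(rps, [30, 20]))
```

KNN alone would average all four selected RPs, including the one at (15, 15); the clustering stage isolates it so the estimate stays with the coherent group.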

Classification of nuclear activity types for neighboring countries of South Korea using machine learning techniques with xenon isotopic activity ratios

  • Sang-Kyung Lee;Ser Gi Hong
    • Nuclear Engineering and Technology, v.56 no.4, pp.1372-1384, 2024
  • Discriminating the source of xenon gas releases can provide an important clue for detecting nuclear activities in neighboring countries. In this paper, three machine learning techniques, namely logistic regression, support vector machine (SVM), and k-nearest neighbors (KNN), were applied to develop predictive models for discriminating the source of xenon gas releases based on xenon isotopic activity ratio data, which were generated for the probable sources using the depletion codes ORIGEN in SCALE 6.2 and Serpent. The considered sources for the neighboring countries of South Korea include PWRs, CANDUs, IRT-2000, the Yongbyun 5 MWe reactor, and nuclear tests with plutonium and uranium. The results of the analysis showed that the overall prediction accuracies of the SVM and KNN models using six inputs all exceeded 90%. In particular, the SVM- and KNN-based models that used six or three xenon isotopic activity ratios with three classification categories, namely reactor, plutonium bomb, and uranium bomb, had accuracy levels greater than 88%. The prediction performance demonstrates the applicability of machine learning algorithms to predicting nuclear threats using xenon isotopic activity ratios.
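
The KNN classifier used as one of the three models can be sketched as a majority vote among the nearest training vectors. The three-dimensional feature vectors below are made-up stand-ins for isotopic activity ratios, not physical values, and the function name `knn_classify` is hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    Counter.most_common keeps first-seen order for equal counts, so a tied
    vote goes to the label that appears first among the nearest neighbours.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy training set: made-up feature vectors with the abstract's three categories.
train = [((0.1, 0.2, 0.3), "reactor"), ((0.12, 0.21, 0.29), "reactor"),
         ((2.0, 1.5, 0.1), "plutonium bomb"), ((2.1, 1.4, 0.2), "plutonium bomb"),
         ((5.0, 0.2, 3.0), "uranium bomb"), ((5.1, 0.3, 2.9), "uranium bomb")]
print(knn_classify(train, (2.05, 1.45, 0.15)))
```

Because ratios can span orders of magnitude, in practice one would typically log-transform or standardise the inputs before computing distances.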

A Classification Algorithm Based on Data Clustering and Data Reduction for Intrusion Detection System over Big Data

  • Wang, Qiuhua;Ouyang, Xiaoqin;Zhan, Jiacheng
    • KSII Transactions on Internet and Information Systems (TIIS), v.13 no.7, pp.3714-3732, 2019
  • With the rapid development of networks, intrusion detection systems (IDSs) play an increasingly important role in network applications, and many data mining algorithms are used to build them. However, the big data era has brought massive volumes of data, and when dealing with large-scale data sets, most data mining algorithms suffer from a high computational burden that makes an IDS much less efficient. To build an efficient IDS over big data, we propose a classification algorithm based on data clustering and data reduction. In the training stage, the training data are divided into clusters of similar size by the Mini Batch K-Means algorithm, and the center of each cluster is used as its index. We then select representative instances for each cluster to perform data reduction and use the clusters consisting of representative instances to build a K-Nearest Neighbor (KNN) detection model. In the detection stage, we sort the clusters by the distance between the test sample and each cluster index, obtain the k nearest clusters, and search them for the k nearest neighbors. Experimental results show that searching for neighbors via cluster indexes significantly reduces the computational complexity, and that classification with the reduced data of representative instances not only improves efficiency but also maintains high accuracy.
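
The cluster-indexed neighbour search can be sketched in Python as follows. Plain Lloyd's k-means stands in for Mini Batch K-Means, the representative-instance selection (data reduction) step is not modelled, and the function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, c, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in for Mini Batch K-Means)."""
    centres = [list(p) for p in random.Random(seed).sample(points, c)]
    for _ in range(iters):
        groups = [[] for _ in range(c)]
        for p in points:
            groups[min(range(c), key=lambda j: math.dist(p, centres[j]))].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centre if a cluster empties
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return centres


def build_index(train, c, seed=0):
    """Group labelled training vectors by nearest centre; centres act as the index."""
    centres = kmeans([x for x, _ in train], c, seed=seed)
    clusters = [[] for _ in centres]
    for x, label in train:
        clusters[min(range(c), key=lambda j: math.dist(x, centres[j]))].append((x, label))
    return centres, clusters


def classify(index, x, k=3, n_clusters=1):
    """Search only the n_clusters clusters whose centres are closest to x."""
    centres, clusters = index
    closest = sorted(range(len(centres)), key=lambda j: math.dist(x, centres[j]))[:n_clusters]
    candidates = [item for j in closest for item in clusters[j]]
    nearest = sorted(candidates, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy traffic features (made-up): two well-separated groups.
train = [((0, 0), "normal"), ((1, 0), "normal"), ((0, 1), "normal"), ((1, 1), "normal"),
         ((10, 10), "attack"), ((11, 10), "attack"), ((10, 11), "attack"), ((11, 11), "attack")]
index = build_index(train, c=2)
print(classify(index, (0.5, 0.5)), classify(index, (10.5, 10.5)))
```

Each query compares against the cluster centres first and then scans only the members of the chosen clusters, which is where the reduction in distance computations comes from.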