• Title/Summary/Keyword: K-Mean++ clustering

Search results: 83 (processing time: 0.028 seconds)

Image Clustering Using Machine Learning: Study of InceptionV3 with K-means Methods (머신 러닝을 사용한 이미지 클러스터링: K-means 방법을 사용한 InceptionV3 연구)

  • Nindam, Somsauwt;Lee, Hyo Jong
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.681-684 / 2021
  • In this paper, we study image clustering without labels using machine learning techniques. We propose an unsupervised machine learning technique to design an image clustering model that automatically categorizes images into groups. Our experiment focuses on the Inception convolutional neural network (Inception V3) combined with k-means to cluster images. For this, we collected the public datasets Food-K5, Flowers, Handwritten Digit, and Cats-dogs, our own Rice Germination dataset, and an owner-provided Palm print dataset. The experiment has three parts. First, all images are stripped of their labels and merged into a single dataset. Second, the dataset is fed to Inception V3 to extract image features, which are passed to k-means clustering to form six groups. Lastly, the model's accuracy is evaluated with a confusion matrix using precision, recall, and F1. With this method, we obtained the following results: 1) Handwritten Digit (precision = 1.000, recall = 1.000, F1 = 1.00), 2) Food-K5 (precision = 0.975, recall = 0.945, F1 = 0.96), 3) Palm print (precision = 1.000, recall = 0.999, F1 = 1.00), 4) Cats-dogs (precision = 0.997, recall = 0.475, F1 = 0.64), 5) Flowers (precision = 0.610, recall = 0.982, F1 = 0.75), and, for our own dataset, 6) Rice Germination (precision = 0.997, recall = 0.943, F1 = 0.97). The model reached an overall accuracy of 0.8908, indicating that it is strong enough to differentiate the images and group them into clusters.
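
The clustering stage described above can be sketched as follows. This is a minimal pure-Python sketch, not the authors' code: it assumes the Inception V3 feature vectors have already been extracted (that step is not reproduced), and it seeds k-means with k-means++, which the abstract does not specify. The function names `kmeans` and `kmeans_pp_init` are hypothetical, and the toy 2-D points stand in for real image features.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: the first center is drawn uniformly; each further
    center is drawn with probability proportional to D(x)^2, the squared
    distance from x to its nearest already-chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point coincides with an existing center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Toy stand-ins for image feature vectors (hypothetical data).
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(features, k=2)
print(labels)  # the first three points share one label, the last three the other
```

With the setup in the abstract, `k` would be 6 and each point would be an Inception V3 feature vector. k-means++ seeding spreads the initial centers apart, which usually lowers the chance of a poor local optimum compared with purely random initialization.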

A Study of using Emotional Features for Information Retrieval Systems (감정요소를 사용한 정보검색에 관한 연구)

  • Kim, Myung-Gwan;Park, Young-Tack
    • The KIPS Transactions:PartB / v.10B no.6 / pp.579-586 / 2003
  • In this paper, we propose a novel approach that employs emotional features in document retrieval systems. Five emotional features, HAPPY, SAD, ANGRY, FEAR, and DISGUST, are used to represent Korean documents, and users can use these features to retrieve documents. The retrieved documents are then classified with methods such as the cohesion factor, naive Bayes, and k-nearest neighbor approaches, and a voting method combines these approaches. In addition, k-means clustering was used in our experiments. Our approach achieved higher accuracy than the other methods and performed better on short texts than on large documents.
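
The abstract does not say which voting scheme was used; plain majority voting is a common choice and is assumed in this sketch. Each classifier (for example cohesion factor, naive Bayes, and k-NN) contributes one emotion label per document, and `combine_classifiers` (a hypothetical name) keeps the most frequent label.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label. Ties go to the label seen first,
    because Counter.most_common orders equal counts by first occurrence."""
    return Counter(votes).most_common(1)[0][0]


def combine_classifiers(*predictions):
    """Combine per-classifier prediction lists document by document."""
    return [majority_vote(doc_votes) for doc_votes in zip(*predictions)]


# Hypothetical predictions for three documents from three classifiers.
cohesion = ["HAPPY", "SAD", "FEAR"]
bayes = ["HAPPY", "ANGRY", "DISGUST"]
knn = ["SAD", "ANGRY", "HAPPY"]
print(combine_classifiers(cohesion, bayes, knn))  # → ['HAPPY', 'ANGRY', 'FEAR']
```

Ties are broken toward the earliest classifier's vote among the tied labels; a weighted vote (for instance by each classifier's validation accuracy) would be a natural variant.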

Clustering of Categorical Data using Rough Entropy (러프 엔트로피를 이용한 범주형 데이터의 클러스터링)

  • Park, Inkyoo
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.5 / pp.183-188 / 2013
  • Data mining relies on a variety of cluster analysis techniques to group objects with similar characteristics. However, these algorithms have great difficulty handling categorical data in databases. During clustering, the uncertainty in categorical data is handled imprecisely because existing methods rely only on the algebraic view of rough sets, which degrades stability and effectiveness. This paper proposes an information-theoretic rough entropy (RE) that takes attribute dependency into account, together with a technique called min-mean-mean roughness (MMMR) for selecting the clustering attribute. We analyze and compare the performance of the proposed technique against K-means, fuzzy techniques, and other standard-deviation roughness methods on the ZOO dataset. The results confirm that the proposed approach performs better.
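
Roughness-based attribute selection, including the MMMR technique proposed here, builds on Pawlak's rough-set approximations. The sketch below computes only that classical building block: the roughness of each value-set of one categorical attribute with respect to the partition induced by another attribute, averaged over the values. It does not reproduce the paper's rough entropy or its MMMR selection rule, and the toy rows (with ZOO-style attributes `hair` and `legs`) and function names are hypothetical.

```python
def partition(rows, attr):
    """Equivalence classes (sets of row indices) induced by one attribute."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(row[attr], set()).add(i)
    return list(classes.values())


def roughness(target, classes):
    """Pawlak roughness 1 - |lower| / |upper| of the index set `target`
    approximated by the equivalence classes `classes`."""
    lower = set().union(*(c for c in classes if c <= target))  # classes fully inside
    upper = set().union(*(c for c in classes if c & target))   # classes touching target
    return 1 - len(lower) / len(upper)


def mean_roughness(rows, attr_i, attr_j):
    """Average roughness of each value-set of attr_i with respect to attr_j."""
    classes_j = partition(rows, attr_j)
    targets = partition(rows, attr_i)
    return sum(roughness(x, classes_j) for x in targets) / len(targets)


# Hypothetical categorical records.
rows = [
    {"hair": 1, "legs": 4},
    {"hair": 1, "legs": 4},
    {"hair": 0, "legs": 2},
    {"hair": 0, "legs": 4},
]
print(mean_roughness(rows, "hair", "legs"))  # → 0.875
```

A lower mean roughness means the second attribute describes the first more crisply; selection rules such as MMMR then aggregate scores like these over attribute pairs.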

A Comprehensive Performance Evaluation in Collaborative Filtering (협업필터링에서 포괄적 성능평가 모델)

  • Yu, Seok-Jong
    • Journal of the Korea Society of Computer and Information / v.17 no.4 / pp.83-90 / 2012
  • In e-commerce systems that handle a large number of items, personalized recommendation is essential. Collaborative filtering, a successful recommendation algorithm, suffers from sparsity, cold-start, and scalability limitations. This work identifies an additional flaw of the algorithm: inconsistent recommendation performance. This flaw cannot be measured by the current MAE-based evaluation, which ignores the deviation of prediction error and is performed independently of precision and recall measurements. To evaluate collaborative filtering comprehensively, this work proposes an extended evaluation model that includes the existing criteria, MAE, precision, and recall, together with deviation, and applies it to cluster-based combined collaborative filtering.
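
The evaluation criteria listed above can be computed as follows. This is a sketch under stated assumptions: "deviation" is taken to mean the population standard deviation of the absolute prediction errors, and precision and recall are computed for a single recommendation list against a set of relevant items; the paper's exact definitions may differ. All function names are hypothetical.

```python
import math


def mae(predicted, actual):
    """Mean absolute error of rating predictions."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def error_deviation(predicted, actual):
    """Population standard deviation of the absolute prediction errors,
    used here as one way to quantify how consistent the predictions are."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    mean = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))


def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against relevant items."""
    hits = len(set(recommended) & set(relevant))
    return hits / len(recommended), hits / len(relevant)


print(mae([4, 3, 5], [5, 3, 3]))                   # → 1.0
print(precision_recall([1, 2, 3, 4], [2, 4, 5]))   # precision 0.5, recall 2/3
```

Two predictors can share the same MAE yet differ in consistency: absolute errors of [1, 1, 1, 1] and [0, 0, 0, 4] both give an MAE of 1.0, but deviations of 0.0 and about 1.73. An MAE-only evaluation cannot reveal this difference.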

VAD By Neural Network Under Wireless Communication Systems (Neural Network을 이용한 무선 통신시스템에서의 VAD)

  • Lee Hosun;Kim Sukyung;Park Sung-Kwon
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.12C / pp.1262-1267 / 2005
  • An elliptical basis function (EBF) neural network works stably under high levels of background noise and enables nonlinear processing. With a simple design, it can be adapted to real-time voice activity detection (VAD). This paper introduces a VAD implementation using EBF, and the experimental results show that the EBF VAD outperforms G.729 Annex B and radial basis function (RBF) neural networks. The best error rates achieved by the EBF networks improved by more than 70% for speech and 50% for silence compared with those achieved by G.729 Annex B and the RBF networks, respectively.
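
To illustrate the difference between the two network types compared above: an RBF hidden unit uses a single isotropic width, while an EBF unit weights each input direction differently, so its receptive field is elliptical. EBF units can in general use a full covariance matrix; this sketch assumes a diagonal one for brevity. Feature extraction and the speech/silence decision threshold used for VAD are not shown, and all names are hypothetical.

```python
import math


def rbf_unit(x, center, sigma):
    """Radial basis function: one isotropic width for every dimension."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2 * sigma ** 2))


def ebf_unit(x, center, variances):
    """Elliptical basis function with a diagonal covariance: each dimension
    has its own variance, so the receptive field is an axis-aligned ellipse."""
    q = sum((xi - ci) ** 2 / v for xi, ci, v in zip(x, center, variances))
    return math.exp(-0.5 * q)


def network_output(x, centers, variances, weights, bias=0.0):
    """Linear output layer over the EBF hidden units."""
    return bias + sum(w * ebf_unit(x, c, v)
                      for c, v, w in zip(centers, variances, weights))


# Wide along the first axis, narrow along the second (hypothetical values).
print(ebf_unit([1.0, 0.0], [0.0, 0.0], [4.0, 0.25]))  # larger response
print(ebf_unit([0.0, 1.0], [0.0, 0.0], [4.0, 0.25]))  # smaller response
```

With equal variances an EBF unit reduces to an RBF unit. In a VAD setting, `x` would be a per-frame feature vector and the network output would be compared with a threshold to label the frame as speech or silence; the abstract does not state which features the paper uses.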

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Based on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • In this article, we propose predicting natural gas (NG) leakage levels through feature selection based on factor analysis (FA) of data that integrates Korean Meteorological Agency data with natural gas leakage data, so that complex factors are taken into account. The method consists of three modules. First, we filled missing values in the integrated dataset using linear interpolation and selected essential features using FA with OrdinalEncoder (OE)-based normalization. Second, the dataset is labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean squared error (MSE). The test results indicate that the OrdinalEncoder-factor analysis (OE-F)-based classification method improved performance. Moreover, OE-F-based KNN (OE-F-KNN) performed best, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
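
The first module's gap-filling step can be sketched as below: each missing value is placed on the straight line between the nearest known values before and after it. This is a minimal stand-alone sketch, assuming missing readings are marked with `None` and samples are equally spaced; the function name is hypothetical. Libraries such as pandas provide linear interpolation through `Series.interpolate`.

```python
def linear_interpolate(values):
    """Fill interior gaps (None) by straight-line interpolation between the
    nearest known neighbours. Leading and trailing gaps are left as None,
    since there is no second point to interpolate from."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            # Multiply before dividing to limit rounding error.
            filled[i] = values[lo] + (values[hi] - values[lo]) * (i - lo) / (hi - lo)
    return filled


# Hypothetical hourly readings with gaps.
print(linear_interpolate([1.0, None, 3.0, None, None, 6.0]))
# → [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Leaving edge gaps unfilled is a deliberate choice here; filling them would require extrapolation or copying the nearest value, which is a separate modeling decision.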

Operation diagnostic based on PCA for wastewater treatment (PCA를 이용한 하폐수처리시설 운전상태진단)

  • Jun Byong-Hee;Park Jang-Hwan;Chun Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.3 / pp.383-388 / 2006
  • The sequencing batch reactor (SBR) is one of the most common sewage/wastewater treatment processes and is particularly advantageous for treating high-concentration wastewater such as sewage wastewater. We propose a kernel PCA (K-PCA)-based fault diagnosis system for the biological reactions in a full-scale wastewater treatment plant that uses only common biochemical sensors such as ORP (oxidation-reduction potential) and DO (dissolved oxygen). During SBR operation, the operating status can be divided into a normal state and abnormal states such as controller malfunction, influent disturbance, and instrument trouble. To classify and diagnose these states, a series of steps was performed: preprocessing, dimension reduction using PCA, LDA, and K-PCA, and feature reduction. Diagnosis using differential data was superior to diagnosis using raw data, and fused data gave better results than the other data. In addition, combining K-PCA with LDA outperformed both LDA alone and PCA+LDA. Finally, the maximum fault recognition rate was about 97.03% when using only ORP or DO, whereas the fusion method achieved a higher maximum of 98.02%.
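
The PCA step used for dimension reduction above can be sketched in plain Python: compute the sample covariance matrix, find its leading eigenvector by power iteration, and project each sample onto it. This is ordinary linear PCA only; the kernel PCA and LDA stages of the paper are not reproduced, and the toy data and function names are hypothetical.

```python
import math


def covariance_matrix(data):
    """Sample covariance matrix (n - 1 denominator) of row-vector data."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, means


def first_principal_component(data, iters=500, tol=1e-12):
    """Leading eigenvector and eigenvalue of the covariance matrix,
    found by power iteration (assumes non-zero variance)."""
    cov, means = covariance_matrix(data)
    d = len(cov)
    v = [1.0] * d  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v^T C v gives the variance along the component.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return v, eigenvalue, means


def project(data, component, means):
    """Score of each sample on the component (1-D dimension reduction)."""
    return [sum((x - m) * c for x, m, c in zip(row, means, component)) for row in data]


data = [[0, 0], [1, 2], [2, 4], [3, 6]]  # toy 2-D samples on a line
component, variance, means = first_principal_component(data)
print(component)  # approximately [0.447, 0.894], i.e. (1, 2) / sqrt(5)
print(project(data, component, means))
```

Kernel PCA applies the same eigen-decomposition idea to a centered kernel matrix instead of the covariance matrix, which lets it capture nonlinear structure.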

Improved FCM Algorithm using Entropy-based Weight and Intercluster (엔트로피 기반의 가중치와 분포크기를 이용한 향상된 FCM 알고리즘)

  • Kwak Hyun-Wook;Oh Jun-Taek;Sohn Young-Ho;Kim Wook-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SP / v.43 no.4 s.310 / pp.1-8 / 2006
  • This paper proposes an improved FCM (fuzzy C-means) algorithm for gray-level images that uses an intercluster measure and an entropy-based weight. Fuzzy clustering methods have been used extensively in image segmentation because they extract feature information about regions, and most of them use the FCM algorithm. However, FCM remains sensitive to noise because it does not incorporate spatial information. It also cannot correctly classify pixels according to the feature-based distributions of the clusters. To solve these problems, we add a weight and an intercluster term to the traditional FCM algorithm. The weight is obtained from entropy information based on the number of neighboring pixels belonging to each cluster, and the membership of each pixel is assigned using the feature-based intercluster information. Experiments confirmed that the proposed method is more tolerant to noise and superior to existing methods.
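
For reference, the standard FCM iteration that the paper modifies alternates two updates: each membership is u_kj = 1 / sum over i of (d_kj / d_ij)^(2/(m-1)), where d_kj is the distance from sample j to center k, and each center is the u^m-weighted mean of the samples. The sketch below applies it to scalar gray levels. It is plain FCM; the paper's entropy-based weight and intercluster term are not reproduced, and the deterministic initialization and the function name are assumptions.

```python
def fcm_1d(values, c=2, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy C-means on scalar gray levels (requires c >= 2).

    Returns (centers, memberships), where memberships[k][j] is the degree
    to which sample j belongs to cluster k, from the last iteration;
    each column sums to 1.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly between min and max.
    centers = [lo + (hi - lo) * k / (c - 1) for k in range(c)]
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        u = [[0.0] * len(values) for _ in range(c)]
        for j, x in enumerate(values):
            d = [abs(x - v) for v in centers]
            zero = [k for k in range(c) if d[k] == 0.0]
            if zero:  # sample sits exactly on a center: crisp membership
                for k in zero:
                    u[k][j] = 1.0 / len(zero)
            else:
                for k in range(c):
                    u[k][j] = 1.0 / sum((d[k] / d[i]) ** exponent for i in range(c))
        # Center update: weighted mean with weights u^m.
        new_centers = []
        for k in range(c):
            w = [u[k][j] ** m for j in range(len(values))]
            new_centers.append(sum(wj * x for wj, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Hypothetical gray levels: a dark group and a bright group.
centers, memberships = fcm_1d([10, 12, 11, 200, 205, 198])
print(centers)  # approximately 11 and 201
```

Because each pixel is treated independently, a noisy pixel gets the same memberships as any clean pixel with the same gray level; that is the weakness the paper's neighborhood-based weight is meant to address.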

Autonomous Battle Tank Detection and Aiming Point Search Using Imagery (영상정보에 기초한 전차 자율탐지 및 조준점탐색 연구)

  • Kim, Jong-Hwan;Jung, Chi-Jung;Heo, Mira
    • Journal of the Korea Society for Simulation / v.27 no.2 / pp.1-10 / 2018
  • This paper presents autonomous detection of a battle tank and computation of its aiming point using RGB images. The maximally stable extremal regions (MSER) algorithm was implemented to find features of the tank, which are matched against frames extracted from streaming video to identify the region of interest containing the tank. A median filter was applied to remove noise in the region of interest and to reduce the tank's camouflage effects. For tank segmentation, k-means clustering was used to separate the tank from its background automatically. Morphological erosion and dilation were then applied to extract a noise-free tank shape and to generate a binary image with 1 for the tank and 0 for the background. Sobel edge detection was then used to obtain the tank's outline, from which the aiming point at the center of the tank was calculated. Performance was measured with a confusion matrix: accuracy, precision, recall, and F-measure were 91.6%, 90.4%, 85.8%, and 88.1%, respectively.
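
The segmentation and aiming-point steps can be sketched on a grayscale image as follows: 1-D k-means with k = 2 splits the pixels into two intensity clusters to form a binary mask, and the aiming point is taken as the mask's centroid. This is a simplified sketch under explicit assumptions: the tank is assumed to be the darker cluster, the mask centroid stands in for the paper's outline-based center computation, and the MSER matching, median filtering, morphology, and Sobel steps are omitted. Function names are hypothetical.

```python
def segment_two_clusters(image, max_iter=50):
    """1-D k-means with k = 2 on gray levels. Returns a binary mask where
    1 marks the darker cluster (assumed here to be the tank)."""
    pixels = [p for row in image for p in row]
    dark, bright = float(min(pixels)), float(max(pixels))
    for _ in range(max_iter):
        dark_px = [p for p in pixels if abs(p - dark) <= abs(p - bright)]
        bright_px = [p for p in pixels if abs(p - dark) > abs(p - bright)]
        new_dark = sum(dark_px) / len(dark_px) if dark_px else dark
        new_bright = sum(bright_px) / len(bright_px) if bright_px else bright
        if (new_dark, new_bright) == (dark, bright):
            break  # cluster means no longer change
        dark, bright = new_dark, new_bright
    return [[1 if abs(p - dark) <= abs(p - bright) else 0 for p in row] for row in image]


def aiming_point(mask):
    """Centroid (row, col) of the foreground pixels, or None if empty."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    return (sum(r for r, _ in coords) / len(coords),
            sum(c for _, c in coords) / len(coords))


# Hypothetical 5x5 image: bright background with a dark 3x3 "tank".
image = [[200] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(2, 5):
        image[r][c] = 40
mask = segment_two_clusters(image)
print(aiming_point(mask))  # → (2.0, 3.0)
```

On real imagery the darker-cluster assumption may fail (camouflage, shadows), which is one reason the paper adds filtering and morphology around the clustering step.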

A Study on the Extraction of Slope Surface Orientation using LIDAR with respect to Triangulation Method and Sampling on the Point Cloud (LIDAR를 이용한 삼차원 점군 데이터의 삼각망 구성 방법 및 샘플링에 따른 암반 불연속면 방향 검출에 관한 연구)

  • Lee, Sudeuk;Jeon, Seokwon
    • Tunnel and Underground Space / v.26 no.1 / pp.46-58 / 2016
  • In this study, a LIDAR laser scanner was used to scan a rock slope near Mt. Gwanak and to produce a point cloud from which the orientation of rock joint surfaces was extracted. The point cloud was analyzed using two triangulation algorithms, Ball Pivoting and Wrap, and four sampling intervals: raw, 2, 5, and 10 cm. The results of fuzzy K-means clustering were analyzed on a stereonet. Both the Ball Pivoting and Wrap algorithms were found suitable for extracting rock surface orientation. At the 5 cm sampling interval, both triangulation algorithms extracted the largest number of patches and the largest patched area.
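
Each triangle produced by Ball Pivoting or Wrap is a small planar patch, and its orientation follows from its normal vector; these orientations are what the fuzzy K-means step clusters on the stereonet. The sketch below converts one triangle into dip and dip direction. It assumes east-north-up coordinates and is not the authors' pipeline; the clustering of the resulting orientations is not shown, and the function name is hypothetical.

```python
import math


def facet_orientation(p1, p2, p3):
    """Dip and dip direction (degrees) of the plane through three points.

    Coordinates are assumed to be (x = east, y = north, z = up).
    Dip is the angle from horizontal; dip direction is the azimuth,
    clockwise from north, toward which the plane slopes downward.
    """
    u = [b - a for a, b in zip(p1, p2)]
    v = [b - a for a, b in zip(p1, p3)]
    # Normal vector = cross product of two triangle edges.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("degenerate triangle: points are collinear")
    if n[2] < 0:  # use the upward-pointing normal
        n = [-c for c in n]
    dip = math.degrees(math.acos(min(1.0, n[2] / length)))
    # The horizontal part of the upward normal points down-dip.
    dip_direction = math.degrees(math.atan2(n[0], n[1])) % 360.0
    return dip, dip_direction


# Plane z = -x, which slopes down toward the east.
print(facet_orientation((0, 0, 0), (1, 0, -1), (0, 1, 0)))  # approximately (45.0, 90.0)
```

Using the upward normal makes the result independent of vertex order, so triangles from either meshing algorithm can be compared directly.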