• Title/Summary/Keyword: k-Means Clustering

Search Result 1,119, Processing Time 0.03 seconds

Design and Implementation of a Sound Classification System for Context-Aware Mobile Computing (상황 인식 모바일 컴퓨팅을 위한 사운드 분류 시스템의 설계 및 구현)

  • Kim, Joo-Hee;Lee, Seok-Jun;Kim, In-Cheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.2
    • /
    • pp.81-86
    • /
    • 2014
  • In this paper, we present an effective sound classification system for recognizing the real-time context of a smartphone user. Our system avoids unnecessary consumption of limited computational resource by filtering both silence and white noise out of input sound data in the pre-processing step. It also improves the classification performance on low energy-level sounds by amplifying them as pre-processing. Moreover, for efficient learning and application of HMM classification models, our system executes the dimension reduction and discretization on the feature vectors through k-means clustering. We collected a large amount of 8 different type sound data from daily life in a university research building and then conducted experiments using them. Through these experiments, our system showed high classification performance.

Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas (메탄 가스 기반 가스 누출 위험 예측을 위한 다변량 특이치 제거)

  • Dashdondov, Khongorzul;Kim, Mi-Hye
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.12
    • /
    • pp.23-30
    • /
    • 2020
  • In this study, the relationship between natural gas (NG) data and gas-related environmental elements was performed using machine learning algorithms to predict the level of gas leakage risk without directly measuring gas leakage data. The study was based on open data provided by the server using the IoT-based remote control Picarro gas sensor specification. The naturel gas leaks into the air, it is a big problem for air pollution, environment and the health. The proposed method is multivariate outlier removing method based Random Forest (RF) classification for predicting risk of NG leak. After, unsupervised k-means clustering, the experimental dataset has done imbalanced data. Therefore, we focusing our proposed models can predict medium and high risk so best. In this case, we compared the receiver operating characteristic (ROC) curve, accuracy, area under the ROC curve (AUC), and mean standard error (MSE) for each classification model. As a result of our experiments, the evaluation measurements include accuracy, area under the ROC curve (AUC), and MSE; 99.71%, 99.57%, and 0.0016 for MOL_RF respectively.

Class Imbalance Resolution Method and Classification Algorithm Suggesting Based on Dataset Type Segmentation (데이터셋 유형 분류를 통한 클래스 불균형 해소 방법 및 분류 알고리즘 추천)

  • Kim, Jeonghun;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.3
    • /
    • pp.23-43
    • /
    • 2022
  • In order to apply AI (Artificial Intelligence) in various industries, interest in algorithm selection is increasing. Algorithm selection is largely determined by the experience of a data scientist. However, in the case of an inexperienced data scientist, an algorithm is selected through meta-learning based on dataset characteristics. However, since the selection process is a black box, it was not possible to know on what basis the existing algorithm recommendation was derived. Accordingly, this study uses k-means cluster analysis to classify types according to data set characteristics, and to explore suitable classification algorithms and methods for resolving class imbalance. As a result of this study, four types were derived, and an appropriate class imbalance resolution method and classification algorithm were recommended according to the data set type.

Forecasting the Growth of Smartphone Market in Mongolia Using Bass Diffusion Model (Bass Diffusion 모델을 활용한 스마트폰 시장의 성장 규모 예측: 몽골 사례)

  • Anar Bataa;KwangSup Shin
    • The Journal of Bigdata
    • /
    • v.7 no.1
    • /
    • pp.193-212
    • /
    • 2022
  • The Bass Diffusion Model is one of the most successful models in marketing research, and management science in general. Since its publication in 1969, it has guided marketing research on diffusion. This paper illustrates the usage of the Bass diffusion model, using mobile cellular subscription diffusion as a context. We fit the bass diffusion model to three large developed markets, South Korea, Japan, and China, and the emerging markets of Vietnam, Thailand, Kazakhstan, and Mongolia. We estimate the parameters of the bass diffusion model using the nonlinear least square method. The diffusion of mobile cellular subscriptions does follow an S-curve in every case. After acquiring m, p, and q parameters we use k-Means Cluster Analysis for grouping countries into three groups. By clustering countries, we suggest that diffusion rates and patterns are similar, where countries with emerging markets can follow in the footsteps of countries with developed markets. The purpose was to predict the timing and the magnitude of the market maturity and to determine whether the data follow the typical diffusion curve of innovations from the Bass model.

Design of pRBFNNs Pattern Classifier-based Face Recognition System Using 2-Directional 2-Dimensional PCA Algorithm ((2D)2PCA 알고리즘을 이용한 pRBFNNs 패턴분류기 기반 얼굴인식 시스템 설계)

  • Oh, Sung-Kwun;Jin, Yong-Tak
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.1
    • /
    • pp.195-201
    • /
    • 2014
  • In this study, face recognition system was designed based on polynomial Radial Basis Function Neural Networks(pRBFNNs) pattern classifier using 2-directional 2-dimensional principal component analysis algorithm. Existing one dimensional PCA leads to the reduction of dimension of image expressed by the multiplication of rows and columns. However $(2D)^2PCA$(2-Directional 2-Dimensional Principal Components Analysis) is conducted to reduce dimension to each row and column of image. and then the proposed intelligent pattern classifier evaluates performance using reduced images. The proposed pRBFNNs consist of three functional modules such as the condition part, the conclusion part, and the inference part. In the condition part of fuzzy rules, input space is partitioned with the aid of fuzzy c-means clustering. In the conclusion part of rules. the connection weight of RBFNNs is represented as the linear type of polynomial. The essential design parameters (including the number of inputs and fuzzification coefficient) of the networks are optimized by means of Differential Evolution. Using Yale and AT&T dataset widely used in face recognition, the recognition rate is obtained and evaluated. Additionally IC&CI Lab dataset is experimented with for performance evaluation.

Alleviating Semantic Term Mismatches in Korean Information Retrieval (한국어 정보 검색에서 의미적 용어 불일치 완화 방안)

  • Yun, Bo-Hyun;Park, Sung-Jin;Kang, Hyun-Kyu
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.12
    • /
    • pp.3874-3884
    • /
    • 2000
  • An information retrieval system has to retrieve all and only documents which are relevant to a user query, even if index terms and query terms are not matched exactly. However, term mismatches between index terms and qucry terms have been a serious obstacle to the enhancement of retrieval performance. In this paper, we discuss automatic term normalization between words in text corpora and their application to a Korean information retrieval system. We perform two types of term normalizations to alleviate semantic term mismatches: equivalence class and co-occurrence cluster. First, transliterations, spelling errors, and synonyms are normalized into equivalence classes bv using contextual similarity. Second, context-based terms are normalized by using a combination of mutual information and word context to establish word similarities. Next, unsupervised clustering is done by using K-means algorithm and co-occurrence clusters are identified. In this paper, these normalized term products are used in the query expansion to alleviate semantic tem1 mismatches. In other words, we utilize two kinds of tcrm normalizations, equivalence class and co-occurrence cluster, to expand user's queries with new tcrms, in an attempt to make user's queries more comprehensive (adding transliterations) or more specific (adding spc'Cializationsl. For query expansion, we employ two complementary methods: term suggestion and term relevance feedback. The experimental results show that our proposed system can alleviatl' semantic term mismatches and can also provide the appropriate similarity measurements. As a result, we know that our system can improve the rctrieval efficiency of the information retrieval system.

  • PDF

Real-Time Face Recognition Based on Subspace and LVQ Classifier (부분공간과 LVQ 분류기에 기반한 실시간 얼굴 인식)

  • Kwon, Oh-Ryun;Min, Kyong-Pil;Chun, Jun-Chul
    • Journal of Internet Computing and Services
    • /
    • v.8 no.3
    • /
    • pp.19-32
    • /
    • 2007
  • This paper present a new face recognition method based on LVQ neural net to construct a real time face recognition system. The previous researches which used PCA, LDA combined neural net usually need much time in training neural net. The supervised LVQ neural net needs much less time in training and can maximize the separability between the classes. In this paper, the proposed method transforms the input face image by PCA and LDA sequentially into low-dimension feature vectors and recognizes the face through LVQ neural net. In order to make the system robust to external light variation, light compensation is performed on the detected face by max-min normalization method as preprocessing. PCA and LDA transformations are applied to the normalized face image to produce low-level feature vectors of the image. In order to determine the initial centers of LVQ and speed up the convergency of the LVQ neural net, the K-Means clustering algorithm is adopted. Subsequently, the class representative vectors can be produced by LVQ2 training using initial center vectors. The face recognition is achieved by using the euclidean distance measure between the center vector of classes and the feature vector of input image. From the experiments, we can prove that the proposed method is more effective in the recognition ratio for the cases of still images from ORL database and sequential images rather than using conventional PCA of a hybrid method with PCA and LDA.

  • PDF

Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences (기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구)

  • Kim, Seon-Wu;Ko, Gun-Woo;Choi, Won-Jun;Jeong, Hee-Seok;Yoon, Hwa-Mook;Choi, Sung-Pil
    • Journal of the Korean Society for information Management
    • /
    • v.35 no.4
    • /
    • pp.141-164
    • /
    • 2018
  • Recently, as the amount of academic literature has increased rapidly and complex researches have been actively conducted, researchers have difficulty in analyzing trends in previous research. In order to solve this problem, it is necessary to classify information in units of academic papers. However, in Korea, there is no academic database in which such information is provided. In this paper, we propose an automatic classification system that can classify domestic academic literature into multiple classes. To this end, first, academic documents in the technical science field described in Korean were collected and mapped according to class 600 of the DDC by using K-Means clustering technique to construct a learning set capable of multiple classification. As a result of the construction of the training set, 63,915 documents in the Korean technical science field were established except for the values in which metadata does not exist. Using this training set, we implemented and learned the automatic classification engine of academic documents based on deep learning. Experimental results obtained by hand-built experimental set-up showed 78.32% accuracy and 72.45% F1 performance for multiple classification.

Influence of Self-driving Data Set Partition on Detection Performance Using YOLOv4 Network (YOLOv4 네트워크를 이용한 자동운전 데이터 분할이 검출성능에 미치는 영향)

  • Wang, Xufei;Chen, Le;Li, Qiutan;Son, Jinku;Ding, Xilong;Song, Jeongyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.6
    • /
    • pp.157-165
    • /
    • 2020
  • Aiming at the development of neural network and self-driving data set, it is also an idea to improve the performance of network model to detect moving objects by dividing the data set. In Darknet network framework, the YOLOv4 (You Only Look Once v4) network model was used to train and test Udacity data set. According to 7 proportions of the Udacity data set, it was divided into three subsets including training set, validation set and test set. K-means++ algorithm was used to conduct dimensional clustering of object boxes in 7 groups. By adjusting the super parameters of YOLOv4 network for training, Optimal model parameters for 7 groups were obtained respectively. These model parameters were used to detect and compare 7 test sets respectively. The experimental results showed that YOLOv4 can effectively detect the large, medium and small moving objects represented by Truck, Car and Pedestrian in the Udacity data set. When the ratio of training set, validation set and test set is 7:1.5:1.5, the optimal model parameters of the YOLOv4 have highest detection performance. The values show mAP50 reaching 80.89%, mAP75 reaching 47.08%, and the detection speed reaching 10.56 FPS.

Analysis of Research Trends Related to drug Repositioning Based on Machine Learning (머신러닝 기반의 신약 재창출 관련 연구 동향 분석)

  • So Yeon Yoo;Gyoo Gun Lim
    • Information Systems Review
    • /
    • v.24 no.1
    • /
    • pp.21-37
    • /
    • 2022
  • Drug repositioning, one of the methods of developing new drugs, is a useful way to discover new indications by allowing drugs that have already been approved for use in people to be used for other purposes. Recently, with the development of machine learning technology, the case of analyzing vast amounts of biological information and using it to develop new drugs is increasing. The use of machine learning technology to drug repositioning will help quickly find effective treatments. Currently, the world is having a difficult time due to a new disease caused by coronavirus (COVID-19), a severe acute respiratory syndrome. Drug repositioning that repurposes drugsthat have already been clinically approved could be an alternative to therapeutics to treat COVID-19 patients. This study intends to examine research trends in the field of drug repositioning using machine learning techniques. In Pub Med, a total of 4,821 papers were collected with the keyword 'Drug Repositioning'using the web scraping technique. After data preprocessing, frequency analysis, LDA-based topic modeling, random forest classification analysis, and prediction performance evaluation were performed on 4,419 papers. Associated words were analyzed based on the Word2vec model, and after reducing the PCA dimension, K-Means clustered to generate labels, and then the structured organization of the literature was visualized using the t-SNE algorithm. Hierarchical clustering was applied to the LDA results and visualized as a heat map. This study identified the research topics related to drug repositioning, and presented a method to derive and visualize meaningful topics from a large amount of literature using a machine learning algorithm. It is expected that it will help to be used as basic data for establishing research or development strategies in the field of drug repositioning in the future.