• Title/Summary/Keyword: K-Means clustering algorithm

Search Result 548, Processing Time 0.023 seconds

Multiple Model Prediction System Based on Optimal TS Fuzzy Model and Its Applications to Time Series Forecasting (최적 TS 퍼지 모델 기반 다중 모델 예측 시스템의 구현과 시계열 예측 응용)

  • Bang, Young-Keun;Lee, Chul-Heui
    • Journal of Industrial Technology
    • /
    • v.28 no.B
    • /
    • pp.101-109
    • /
    • 2008
  • In general, non-stationary or chaos time series forecasting is very difficult since there exists a drift and/or nonlinearities in them. To overcome this situation, we suggest a new prediction method based on multiple model TS fuzzy predictors combined with preprocessing of time series data, where, instead of time series data, the differences of them are applied to predictors as input. In preprocessing procedure, the candidates of optimal difference interval are determined by using con-elation analysis and corresponding difference data are generated. And then, for each of them, TS fuzzy predictor is constructed by using k-means clustering algorithm and least squares method. Finally, the best predictor which minimizes the performance index is selected and it works on hereafter for prediction. Computer simulation is performed to show the effectiveness and usefulness of our method.

  • PDF

The use of support vector machines in semi-supervised classification

  • Bae, Hyunjoo;Kim, Hyungwoo;Shin, Seung Jun
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.2
    • /
    • pp.193-202
    • /
    • 2022
  • Semi-supervised learning has gained significant attention in recent applications. In this article, we provide a selective overview of popular semi-supervised methods and then propose a simple but effective algorithm for semi-supervised classification using support vector machines (SVM), one of the most popular binary classifiers in a machine learning community. The idea is simple as follows. First, we apply the dimension reduction to the unlabeled observations and cluster them to assign labels on the reduced space. SVM is then employed to the combined set of labeled and unlabeled observations to construct a classification rule. The use of SVM enables us to extend it to the nonlinear counterpart via kernel trick. Our numerical experiments under various scenarios demonstrate that the proposed method is promising in semi-supervised classification.

A Study for Load Profile Generation of Electric Power Customer using Clustering Algorithm (클러스터링 기법을 이용한 전력 고객의 대표 부하패턴 생성에 대한 연구)

  • Kim, Young-Il;Choi, Hoon
    • Annual Conference of KIPS
    • /
    • 2008.05a
    • /
    • pp.435-438
    • /
    • 2008
  • 한전에서는 연간 전력 사용량이 높은 고압 고객에 대하여 전자식 전력량계를 설치하여 15분 단위로 전력 사용량을 수집하는 자동검침시스템을 운영하고 있다. 본 연구에서는 자동검침시스템을 통해 수집된 데이터를 이용하여 배전선로에 대한 부하를 분석하기 위해 자동검침 고객의 부하 데이터를 이용하여 클러스터링 기법을 통해 대표 부하패턴을 생성하는 방식을 제안하였다. 기존에는 계약종별 코드가 동일한 고객들의 부하패턴을 이용하여 15분 단위의 평균 사용량을 계산하여 대표 부하패턴을 생성하는 방식을 사용하였으나, 같은 계약종별 코드를 갖는 고객이라 할지라도 부하패턴이 다른 경우가 많아서 부하분석의 정확도를 떨어뜨렸다. 본 연구에서는 동일한 계약종별 코드를 갖는 고객에 대하여 15분 단위 자동검침 데이터를 이용하여 k-means 기법을 통해 고객을 분류하고 각 그룹마다 대표 부하패턴을 생성하는 방식을 제안하였다.

Class Imbalance Resolution Method and Classification Algorithm Suggesting Based on Dataset Type Segmentation (데이터셋 유형 분류를 통한 클래스 불균형 해소 방법 및 분류 알고리즘 추천)

  • Kim, Jeonghun;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.3
    • /
    • pp.23-43
    • /
    • 2022
  • In order to apply AI (Artificial Intelligence) in various industries, interest in algorithm selection is increasing. Algorithm selection is largely determined by the experience of a data scientist. However, in the case of an inexperienced data scientist, an algorithm is selected through meta-learning based on dataset characteristics. However, since the selection process is a black box, it was not possible to know on what basis the existing algorithm recommendation was derived. Accordingly, this study uses k-means cluster analysis to classify types according to data set characteristics, and to explore suitable classification algorithms and methods for resolving class imbalance. As a result of this study, four types were derived, and an appropriate class imbalance resolution method and classification algorithm were recommended according to the data set type.

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.93-107
    • /
    • 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites by crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-news network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a quasi-network of user-issue. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with that of network analysis. The experimental dataset was from a web site ranking site, and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using Netminer. Our issue-clustering results applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.

Early Detection of Lung Cancer Risk Using Data Mining

  • Ahmed, Kawsar;Abdullah-Al-Emran, Abdullah-Al-Emran;Jesmin, Tasnuba;Mukti, Roushney Fatima;Rahman, Md. Zamilur;Ahmed, Farzana
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.14 no.1
    • /
    • pp.595-598
    • /
    • 2013
  • Background: Lung cancer is the leading cause of cancer death worldwide Therefore, identification of genetic as well as environmental factors is very important in developing novel methods of lung cancer prevention. However, this is a multi-layered problem. Therefore a lung cancer risk prediction system is here proposed which is easy, cost effective and time saving. Materials and Methods: Initially 400 cancer and non-cancer patients' data were collected from different diagnostic centres, pre-processed and clustered using a K-means clustering algorithm for identifying relevant and non-relevant data. Next significant frequent patterns are discovered using AprioriTid and a decision tree algorithm. Results: Finally using the significant pattern prediction tools for a lung cancer prediction system were developed. This lung cancer risk prediction system should prove helpful in detection of a person's predisposition for lung cancer. Conclusions: Most of people of Bangladesh do not even know they have lung cancer and the majority of cases are diagnosed at late stages when cure is impossible. Therefore early prediction of lung cancer should play a pivotal role in the diagnosis process and for an effective preventive strategy.

Automatic Detection of Dead Trees Based on Lightweight YOLOv4 and UAV Imagery

  • Yuanhang Jin;Maolin Xu;Jiayuan Zheng
    • Journal of Information Processing Systems
    • /
    • v.19 no.5
    • /
    • pp.614-630
    • /
    • 2023
  • Dead trees significantly impact forest production and the ecological environment and pose constraints to the sustainable development of forests. A lightweight YOLOv4 dead tree detection algorithm based on unmanned aerial vehicle images is proposed to address current limitations in dead tree detection that rely mainly on inefficient, unsafe and easy-to-miss manual inspections. An improved logarithmic transformation method was developed in data pre-processing to display tree features in the shadows. For the model structure, the original CSPDarkNet-53 backbone feature extraction network was replaced by MobileNetV3. Some of the standard convolutional blocks in the original extraction network were replaced by depthwise separable convolution blocks. The new ReLU6 activation function replaced the original LeakyReLU activation function to make the network more robust for low-precision computations. The K-means++ clustering method was also integrated to generate anchor boxes that are more suitable for the dataset. The experimental results show that the improved algorithm achieved an accuracy of 97.33%, higher than other methods. The detection speed of the proposed approach is higher than that of YOLOv4, improving the efficiency and accuracy of the detection process.

Sequence-based Similar Music Retrieval Scheme (시퀀스 기반의 유사 음악 검색 기법)

  • Jun, Sang-Hoon;Hwang, Een-Jun
    • Journal of IKEEE
    • /
    • v.13 no.2
    • /
    • pp.167-174
    • /
    • 2009
  • Music evokes human emotions or creates music moods through various low-level musical features. Typical music clip consists of one or more moods and this can be used as an important criteria for determining the similarity between music clips. In this paper, we propose a new music retrieval scheme based on the mood change patterns of music clips. For this, we first divide music clips into segments based on low level musical features. Then, we apply K-means clustering algorithm for grouping them into clusters with similar features. By assigning a unique mood symbol for each cluster, we can represent each music clip by a sequence of mood symbols. Finally, to estimate the similarity of music clips, we measure the similarity of their musical mood sequence using the Longest Common Subsequence (LCS) algorithm. To evaluate the performance of our scheme, we carried out various experiments and measured the user evaluation. We report some of the results.

  • PDF

Development of a Model for Dynamic Station Assignmentto Optimize Demand Responsive Transit Operation (수요대응형 모빌리티 최적 운영을 위한 동적정류장 배정 모형 개발)

  • Kim, Jinju;Bang, Soohyuk
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.21 no.1
    • /
    • pp.17-34
    • /
    • 2022
  • This paper develops a model for dynamic station assignment to optimize the Demand Responsive Transit (DRT) operation. In the process of optimization, we use the bus travel time as a variable for DRT management. In addition, walking time, waiting time, and delay due to detour to take other passengers (detour time) are added as optimization variables and entered for each DRT passenger. Based on a network around Anaheim, California, reserved origins and destinations of passengers are assigned to each demand responsive bus, using K-means clustering. We create a model for selecting the dynamic station and bus route and use Non-dominated Sorting Genetic Algorithm-III to analyze seven scenarios composed combination of the variables. The result of the study concluded that if the DRT operation is optimized for the DRT management, then the bus travel time and waiting time should be considered in the optimization. Moreover, it was concluded that the bus travel time, walking time, and detour time are required for the passenger.

Korean Phoneme Recognition Using Self-Organizing Feature Map (SOFM 신경회로망을 이용한 한국어 음소 인식)

  • Jeon, Yong-Koo;Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.2
    • /
    • pp.101-112
    • /
    • 1995
  • In order to construct a feature map-based phoneme classification system for speech recognition, two procedures are usually required. One is clustering and the other is labeling. In this paper, we present a phoneme classification system based on the Kohonen's Self-Organizing Feature Map (SOFM) for clusterer and labeler. It is known that the SOFM performs self-organizing process by which optimal local topographical mapping of the signal space and yields a reasonably high accuracy in recognition tasks. Consequently, SOFM can effectively be applied to the recognition of phonemes. Besides to improve the performance of the phoneme classification system, we propose the learning algorithm combined with the classical K-mans clustering algorithm in fine-tuning stage. In order to evaluate the performance of the proposed phoneme classification algorithm, we first use totaly 43 phonemes which construct six intra-class feature maps for six different phoneme classes. From the speaker-dependent phoneme classification tests using these six feature maps, we obtain recognition rate of $87.2\%$ and confirm that the proposed algorithm is an efficient method for improvement of recognition performance and convergence speed.

  • PDF