• Title/Summary/Keyword: Ensemble clustering

Review on Genetic Algorithms for Pattern Recognition

  • Oh, Il-Seok
    • The Journal of the Korea Contents Association / v.7 no.1 / pp.58-64 / 2007
  • In the pattern recognition field, there are many optimization problems with exponential search spaces. To solve them, sequential search algorithms that seek sub-optimal solutions have been used, but these algorithms tend to stop at local optima. Recently, many studies have attempted to solve these problems using genetic algorithms. This paper explains the huge search spaces of typical problems such as feature selection, classifier ensemble selection, neural network pruning, and clustering, and reviews the genetic algorithms used to solve them. Additionally, we present several subjects worth noting for future research.
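
The review's central idea, searching an exponentially large space of binary choices with a genetic algorithm instead of a sequential search, can be sketched for feature selection. This is a minimal illustration, not the paper's method: the operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), every parameter value, and the hypothetical `toy_fitness` function, which stands in for the classifier accuracy a real study would optimize, are all assumptions.

```python
import random
from typing import Callable, List, Optional, Tuple

Chromosome = List[int]  # bit i == 1 means feature i is selected


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Chromosome], float],
    pop_size: int = 30,
    generations: int = 50,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.02,
    seed: int = 0,
) -> Tuple[Chromosome, float]:
    """Minimal generational GA over binary feature masks (illustrative sketch)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best: Optional[Chromosome] = None
    best_fit = float("-inf")

    def tournament(scored: List[Tuple[Chromosome, float]]) -> Chromosome:
        # Binary tournament selection: keep the fitter of two random individuals.
        a, b = rng.sample(scored, 2)
        return a[0] if a[1] >= b[1] else b[0]

    for _ in range(generations):
        scored = [(ind, fitness(ind)) for ind in population]
        for ind, fit in scored:
            if fit > best_fit:
                best, best_fit = ind[:], fit
        next_pop = [best[:]]  # elitism: the best mask so far survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation toggles individual features on or off.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
    assert best is not None
    return best, best_fit


# Hypothetical toy fitness: features 0, 3 and 5 are "relevant" and each extra
# selected feature costs 0.5. A real study would score a classifier instead.
RELEVANT = {0, 3, 5}


def toy_fitness(mask: Chromosome) -> float:
    hits = sum(1 for i in RELEVANT if mask[i])
    extras = sum(mask) - hits
    return hits - 0.5 * extras


best_mask, best_score = genetic_feature_selection(10, toy_fitness)
print(best_mask, best_score)
```

Elitism keeps the best score from decreasing across generations, while crossover and mutation can help the search move past the local optima at which a sequential search may stop.
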

An Ensemble Model for Machine Failure Prediction

  • Cheon, Kang Min; Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.1 / pp.123-131 / 2020
  • Many past studies have addressed methods for predicting machine failure, and recently much research and many applications have emerged that diagnose the physical condition of machines and their parts and estimate their remaining life by various methods. Survival models are also used to predict plant failures based on past anomaly cycles. In particular, special machines that reflect the fluid-flow and process characteristics of chemical plants are connected to hundreds or thousands of sensors, so a considerable number of factors must be considered, including process and material data as well as derived variables. In this paper, to predict abnormalities in these special machines, the data were first preprocessed through time-series anomaly detection based on unsupervised learning. Next, clustering was applied to capture data characteristics and produce additional variables, and a training data set was created from the history of past facility abnormalities. Finally, a prediction method based on a supervised learning algorithm was applied, and model updates were confirmed to improve the accuracy of facility-failure prediction. By predicting machine abnormalities and extracting key factors, this approach is expected to improve the efficiency of facility operation by allowing maintenance timing and parts supply to be adjusted flexibly.

Context-Aware Fusion with Support Vector Machine

  • Heo, Gyeong-Yong; Kim, Seong-Hoon
    • Journal of the Korea Society of Computer and Information / v.19 no.6 / pp.19-26 / 2014
  • An ensemble classifier system is a widely used multi-classifier system that combines the results of its individual classifiers and, as a result, achieves better classification than any single classifier it uses. Several methods have been used to build ensemble classifiers, including boosting, a cascade method in which examples misclassified in the previous stage are used to boost performance in the current stage. Boosting, however, is a serial method that does not form a complete feedback loop. In this paper, we propose the context-sensitive SVM ensemble (CASE), which adopts the SVM, one of the best classifiers in terms of classification rate, as the base classifier, and uses a clustering method to divide the feature space into contexts. Because CASE divides the feature space and trains the SVMs simultaneously, the result from each component can be applied to the other, and CASE achieves better results than boosting. Experimental results demonstrate the usefulness of the proposed method.

Analysis and Application of Power Consumption Patterns for Changing the Power Consumption Behaviors

  • Jang, MinSeok; Nam, KwangWoo; Lee, YonSik
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.4 / pp.603-610 / 2021
  • In this paper, we extract the user's power consumption patterns and model optimal consumption patterns by taking into account the user's environment and emotions. Based on a comparative analysis of these two patterns, we present an efficient power consumption method that works by changing the user's power consumption behavior. To extract significant consumption patterns, vector standardization and binary data transformation are used; k-means clustering is then applied to learn ensembles of the consumption data, with a support factor applied according to the value of k. The optimal power consumption pattern model is generated by applying forced and emotion-based control, based on the learning results, to ensemble aggregates with relatively low average consumption. Through experiments that identify the correlation between the number of clusters and the consistency ratios, we validate that the method can be applied to a variety of windows by adjusting the number or size of clusters, enabling forced and emotion-based control according to the user's intentions.

Outlier Detection By Clustering-Based Ensemble Model Construction

  • Park, Cheong Hee; Kim, Taegong; Kim, Jiil; Choi, Semok; Lee, Gyeong-Hoon
    • KIPS Transactions on Software and Data Engineering / v.7 no.11 / pp.435-442 / 2018
  • Outlier detection means detecting data samples that deviate significantly from the distribution of normal data. Most outlier detection methods calculate an outlier score indicating the extent to which a data sample departs from the normal state, and declare the sample an outlier when its score exceeds a given threshold. However, because the range of outlier scores differs from dataset to dataset and outliers occur at a much smaller ratio than normal data, it is very difficult to set this threshold. Furthermore, in practice it is not easy to acquire training data that contain a sufficient number of outliers. In this paper, we propose a clustering-based outlier detection method that constructs a model of the normal data region using only normal data and performs binary classification of new data samples as outliers or normal data. We then divide the given normal data into chunks, construct a clustering model for each chunk, and extend the method to an ensemble that combines the decisions of these models, applying it to streaming data with dynamic changes. Experimental results on real and artificial data show that the proposed method achieves high performance.
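
The chunk-based ensemble described in the abstract above can be approximated as follows. This sketch assumes details the abstract does not give: each chunk is modeled by plain k-means, the normal region of a chunk is a hypothetical radius rule (`slack` times the largest nearest-centroid distance in that chunk), and the per-chunk binary decisions are combined by majority vote. Names such as `ChunkModel` and `build_ensemble` are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means returning k centroids."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


class ChunkModel:
    """One ensemble member: the centroids of one chunk plus a distance radius."""

    def __init__(self, chunk: Sequence[Point], k: int, slack: float = 1.1, seed: int = 0):
        self.centroids = kmeans(chunk, k, seed=seed)
        # Assumed rule: the normal region is everything within `slack` times the
        # largest nearest-centroid distance seen in this chunk.
        self.radius = slack * max(self.distance(p) for p in chunk)

    def distance(self, p: Point) -> float:
        return min(math.dist(p, c) for c in self.centroids)

    def is_outlier(self, p: Point) -> bool:
        return self.distance(p) > self.radius


def build_ensemble(normal: Sequence[Point], chunk_size: int, k: int) -> List[ChunkModel]:
    # Split the normal-only data into consecutive chunks, one model per chunk.
    chunks = [normal[i:i + chunk_size] for i in range(0, len(normal), chunk_size)]
    return [ChunkModel(c, k, seed=i) for i, c in enumerate(chunks) if len(c) >= k]


def ensemble_is_outlier(models: Sequence[ChunkModel], p: Point) -> bool:
    # Majority vote over the per-chunk binary decisions.
    votes = sum(m.is_outlier(p) for m in models)
    return votes > len(models) / 2


# Synthetic "normal" data: two Gaussian blobs, interleaved so each chunk sees both.
rng = random.Random(42)
normal_data = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0.0, 0.0), (5.0, 5.0)] * 150
]
models = build_ensemble(normal_data, chunk_size=100, k=2)
print(ensemble_is_outlier(models, (0.0, 0.0)), ensemble_is_outlier(models, (20.0, 20.0)))
```

For streaming data, one plausible (assumed) adaptation is to build a model from each newly arrived chunk and drop the oldest, so the ensemble follows distribution changes; the abstract does not specify this policy.
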

Comparative analysis of model performance for predicting the customer of cafeteria using unstructured data

  • Seungsik Kim; Nami Gu; Jeongin Moon; Keunwook Kim; Yeongeun Hwang; Kyeongjun Lee
    • Communications for Statistical Applications and Methods / v.30 no.5 / pp.485-499 / 2023
  • This study aimed to predict the number of meals served in a group cafeteria using machine learning. Menu features were created through Word2Vec and clustering, and a stacking ensemble model was constructed with Random Forest, Gradient Boosting, and CatBoost as sub-models. Results showed that CatBoost performed best, and the ensemble model improved performance by 8%. The study also found that the date variable had the greatest influence on the number of diners in the cafeteria, followed by menu characteristics and other variables. The implications of the study include the potential of machine learning to improve predictive performance and reduce food waste, as well as the removal of subjective elements from menu classification. Limitations include the limited number of data cases and a model structure that is weak when new menus or foreign words are not included in the training data. Future studies should aim to address these limitations.

Association-rule based ensemble clustering for adopting a prior knowledge

  • Go, Song; Kim, Dae-Won
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.67-70 / 2007
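  • This paper proposes a method for using prior information more effectively in clustering problems. When prior information is available, exploiting it can improve clustering performance, so to broaden its use we propose a semi-supervised clustering ensemble that applies the prior information in a variety of ways. As the mechanism for exploiting prior information, we incorporate the concept of association rules. Even when different numbers of clusters are used, patterns with high mutual similarity are more likely to belong to the same group. Running the clustering under various initializations diversifies how the prior information is used, and each run yields a clustering result that satisfies the prior information. Aggregating all of these results into an association matrix yields the similarity between patterns, and we finally present a method that clusters the data using this association matrix.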
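
The association-matrix idea in the abstract above resembles the co-association (evidence accumulation) approach to consensus clustering. A minimal sketch under stated assumptions: k-means runs with several cluster counts and seeds form the ensemble; prior information is modeled as hypothetical must-link and cannot-link index pairs, and runs that violate them are discarded; the final partition links pairs whose association exceeds a threshold and takes connected components. None of this is claimed to match the paper's association-rule formulation.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


def kmeans_labels(points: Sequence[Point], k: int, seed: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def satisfies(labels: List[int], must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> bool:
    return all(labels[i] == labels[j] for i, j in must_link) and all(
        labels[i] != labels[j] for i, j in cannot_link
    )


def association_matrix(
    points: Sequence[Point],
    ks: Sequence[int],
    runs_per_k: int = 5,
    must_link: Sequence[Pair] = (),
    cannot_link: Sequence[Pair] = (),
) -> List[List[float]]:
    """Fraction of accepted runs in which each pair of points shares a cluster."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    accepted = 0
    for k in ks:
        for r in range(runs_per_k):
            labels = kmeans_labels(points, k, seed=1000 * k + r)
            if not satisfies(labels, must_link, cannot_link):
                continue  # assumed: only runs consistent with prior info count
            accepted += 1
            for i in range(n):
                for j in range(n):
                    if labels[i] == labels[j]:
                        counts[i][j] += 1
    if accepted == 0:
        raise ValueError("no clustering run satisfied the prior information")
    return [[c / accepted for c in row] for row in counts]


def consensus_labels(assoc: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Link pairs with association above `threshold`; label connected components."""
    n = len(assoc)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if assoc[i][j] > threshold:
            parent[find(i)] = find(j)
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
assoc = association_matrix(points, ks=(2, 3), must_link=[(0, 1)], cannot_link=[(0, 3)])
labels = consensus_labels(assoc, threshold=0.4)
print(labels)
```

The threshold of 0.4 is an arbitrary choice for this toy data: pairs from the same group are co-clustered in every accepted k=2 run, which keeps their association at or above 0.5, while pairs from different groups are never co-clustered here.
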

Mapping the real-space distributions of galaxies in SDSS DR7

  • Shi, Feng
    • The Bulletin of The Korean Astronomical Society / v.44 no.1 / pp.78.1-78.1 / 2019
  • Using a method to correct redshift-space distortion (RSD) for individual galaxies, we mapped the real-space distributions of galaxies in the Sloan Digital Sky Survey (SDSS) Data Release 7 (DR7). We use an ensemble of mock catalogs to demonstrate the reliability of this method, showing that it allows an accurate recovery of the real-space correlation functions and galaxy biases. We also demonstrate that, using an iterative method applied to intermediate-scale clustering data, we can obtain an unbiased estimate of the growth rate of structure $f\sigma_8$, which is related to the clustering amplitude of matter, to an accuracy of $\sim 10\%$. Applying this method to SDSS DR7, we construct a real-space galaxy catalog spanning the redshift range $0.01 \leq z \leq 0.2$, which contains 584,473 galaxies in the North Galactic Cap (NGC). Using these data, we infer $f\sigma_8 = 0.376 \pm 0.038$ at a median redshift $z=0.1$, which is consistent with the WMAP9 cosmology at the $1\sigma$ level. By combining this measurement with the real-space clustering of galaxies and with galaxy-galaxy weak lensing measurements for the same sets of galaxies, we are able to break the degeneracy between $f$, $\sigma_8$ and $b$. From the SDSS DR7 data alone, we obtain the following cosmological constraints at redshift $z=0.1$ for galaxies.

Outlier detection of main engine data of a ship using ensemble method

  • KIM, Dong-Hyun; LEE, Ji-Hwan; LEE, Sang-Bong; JUNG, Bong-Kyu
    • Journal of the Korean Society of Fisheries and Ocean Technology / v.56 no.4 / pp.384-394 / 2020
  • This paper proposes a machine-learning-based outlier detection model that can diagnose the presence or absence of anomalies in major engine parts through unsupervised analysis of a ship's main-engine big data. Engine data from the ship were collected for more than seven months, and expert knowledge and correlation analysis were used to select features closely related to main-engine operation. For the unsupervised analysis, an ensemble model, in which many predictive models are strategically combined to increase performance, is used for anomaly detection. As a result, the proposed model successfully distinguished anomalous engine states from normal ones. To validate the approach, clustering analysis was conducted on the anomalous points to identify different patterns of anomalies. By examining the distribution of each cluster, we successfully identified these anomaly patterns.

Vacant Technology Forecasting using Ensemble Model

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.3 / pp.341-346 / 2011
  • Vacant technology forecasting is an important issue in the management of technology, because forecasting vacant technology contributes to the growth of nations and companies. To predict vacant technology, we need the results of technology development to date, and patents are an objective record of the results of technological research and development. In this paper, we study a predictive method for quantitatively forecasting vacant technology using patent data. Because no single model can be guaranteed to be optimal, we propose an ensemble model that votes over several clustering criteria, with the aim of building an objective and accurate forecasting model of vacant technology. The model combines statistical analysis methods with machine learning algorithms. To evaluate its performance objectively, we conduct experiments using patent documents from diverse technology fields.