• Title/Summary/Keyword: K means clustering

A Computational Intelligence Based Online Data Imputation Method: An Application For Banking

  • Nishanth, Kancherla Jonah;Ravi, Vadlamani
    • Journal of Information Processing Systems / v.9 no.4 / pp.633-650 / 2013
  • All imputation techniques proposed so far in the literature are offline techniques: they require a number of iterations to learn the characteristics of the data during training, and they consume a lot of computational time. Hence, these techniques are not suitable for applications that require imputation to be performed on demand and in near real time. The paper proposes a computational intelligence based architecture for online data imputation, as well as extended versions of an existing offline data imputation method. The proposed online imputation technique has two stages. In Stage 1, the Evolving Clustering Method (ECM) is used to replace the missing values with cluster centers, as part of the local learning strategy. Stage 2 refines the resultant approximate values using a General Regression Neural Network (GRNN), as part of the global approximation strategy. The offline imputation techniques employ K-Means or K-Medoids in Stage 1 and a Multi Layer Perceptron (MLP) or GRNN in Stage 2. Several experiments were conducted on 8 benchmark datasets and 4 bank-related datasets to assess the effectiveness of the proposed online and offline imputation techniques. In terms of Mean Absolute Percentage Error (MAPE), the results indicate that the difference between the best proposed offline imputation method, K-Medoids+GRNN, and the proposed online imputation method, ECM+GRNN, is statistically insignificant at the 1% level of significance. Consequently, the proposed online technique, being less expensive and faster, can be employed for imputation instead of the existing and proposed offline imputation techniques. This is the significant outcome of the study. Furthermore, GRNN in Stage 2 uniformly reduced MAPE values in both offline and online imputation methods on all datasets.

A Study of the Fuzzy Clustering Algorithm using a Growth Curve Model (성장곡선을 이용한 퍼지군집분석 기법의 연구)

  • 김응환;이석훈
    • The Korean Journal of Applied Statistics / v.14 no.2 / pp.439-448 / 2001
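  • This study proposes an algorithm that extends the fuzzy k-means clustering method to the analysis of longitudinal data. The core idea of the proposed method is to combine the fuzzy k-means clustering algorithm with the growth curve corresponding to each individual. The results show that the resulting clusters can be expressed as growth curve models and that the estimated model equations can also be used to classify new individuals. The method can further be applied to predict which cluster a young, not yet fully grown individual will belong to in the future, and to predict that individual's future growth. The proposed algorithm is illustrated with an application to maxillary sinus data from macaque monkeys.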

Bootstrapping of Hanwoo Chromosome17 Based on BMS1167 Microsatellite Locus

  • Lee, Jea-Young;Lee, Yong-Won;Yeo, Jung-Sou
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.175-184 / 2007
  • LOD scores and a permutation test for detecting and locating quantitative trait loci (QTL) for Hanwoo economic traits have been described, and we selected the major BMS1167 locus for further analysis. K-means clustering analysis was applied to mine the major DNA markers of the BMS1167 microsatellite locus on Hanwoo chromosome 17, and three cluster groups were formed from four traits. The three cluster groups are classified according to eight DNA marker sizes (bp). Finally, we employed the bootstrap test, a resampling method, to calculate confidence intervals and identify the major DNA markers. We conclude that the only major marker of the BMS1167 locus on Hanwoo chromosome 17 is the 100 bp DNA marker.

Design of Genetically Optimized Context-based RBFNN (진화론적으로 최적화된 Context-based RBF 뉴럴 네트워크 설계)

  • Park, Ho-Sung;Oh, Sung-Kwun;Kim, Hyun-Ki
    • Proceedings of the IEEK Conference / 2009.05a / pp.258-260 / 2009
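  • This paper introduces a comprehensive design methodology for a new type of RBF neural network, using a genetic algorithm as the optimization algorithm together with the context-based FCM clustering method. The proposed structure builds on clustering to construct a model that is effective for the characteristics of the data used. The genetic algorithm tunes the parameters that most strongly affect model optimization: the number of contexts in the hidden layer, the number of nodes covered by each context, and the input variables fed to each context. The design process of the proposed model consists of the following steps: 1) definition and design of the context fuzzy sets through K-means clustering, 2) application of the model to context-based fuzzy clustering and the resulting efficiency of model construction, and 3) optimization of the model parameters with the genetic algorithm. The parameters of the consequent polynomials of the constructed RBF neural network are adjusted by the Least Square Method to minimize the performance index. The paper describes a systematic design algorithm for the model in full and, further, aims to demonstrate the superiority of the proposed model by comparing its performance with that of other standard models.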

Design of the Optimal Fuzzy Prediction Systems using RCGKA (RCGKA를 이용한 최적 퍼지 예측 시스템 설계)

  • Bang, Young-Keun;Shim, Jae-Son;Lee, Chul-Heui
    • Journal of Industrial Technology / v.29 no.B / pp.9-15 / 2009
  • With the traditional binary encoding technique, convergence to the optimal solution takes a long time, and the encoding and decoding procedures add complexity to the system. Real-coded genetic algorithms (RCGAs), however, do not require these procedures, and the k-means clustering algorithm can avoid searching the global space. This paper therefore proposes a new approach that exploits both advantages. The proposed method constructs multiple predictors using the optimal differences, which better reveal the patterns and properties concealed in non-stationary time series; the k-means clustering algorithm is used to classify the data for each predictor, and the best predictor is then selected. After the best predictor is selected, its cluster centers are finely tuned via RCGKA in a secondary tuning procedure, which further enhances the predictor's performance. Finally, we verify the prediction performance of the proposed system by simulating typical time series examples.

A Study on the Real-Time Preference Prediction for Personalized Recommendation on the Mobile Device (모바일 기기에서 개인화 추천을 위한 실시간 선호도 예측 방법에 대한 연구)

  • Lee, Hak Min;Um, Jong Seok
    • Journal of Korea Multimedia Society / v.20 no.2 / pp.336-343 / 2017
  • We propose a real-time personalized recommendation algorithm for mobile devices. We use unified collaborative filtering with reduced data. Fuzzy C-means clustering is used to obtain the reduced data, and a Kohonen SOM is applied to obtain the initial values of the cluster centers. The proposed algorithm overcomes data sparsity because it extends the data to similar users and similar items. It also enables real-time service on a mobile device because data clustering reduces the computing time. Applying the suggested algorithm to the MovieLens data, we show that it performs reasonably in comparison with collaborative filtering. We developed an Android-based smartphone application that recommends restaurants with coupons and restaurant information.
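To make the clustering step mentioned in the abstract above more concrete, here is a minimal sketch of the standard fuzzy c-means iteration (alternating membership and center updates). It is not the authors' code. The function name `fcm`, the fuzzifier `m=2.0`, the toy one-dimensional data, and the hand-picked initial centers are assumptions; in the abstract, the initial centers come from a Kohonen SOM, which is not implemented here.

```python
import math

def fcm(points, centers, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy c-means. Returns (centers, memberships from the last update step)."""
    centers = [tuple(c) for c in centers]
    u = []
    for _ in range(iters):
        # Membership update: u[j][i] is the degree to which point j belongs to center i.
        u = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if 0.0 in d:
                # The point coincides with a center: give it full membership there.
                hit = d.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                inv = [di ** (-2.0 / (m - 1.0)) for di in d]
                total_inv = sum(inv)
                row = [v / total_inv for v in inv]
            u.append(row)
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wj * p[k] for wj, p in zip(w, points)) / total
                for k in range(len(points[0]))
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Toy data (hypothetical); the initial centers stand in for values a SOM would supply.
points = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (5.1,)]
centers, u = fcm(points, centers=[(1.0,), (4.0,)])
```

Each row of `u` sums to 1, so a point can belong partly to several clusters. The resulting centers can serve as a compact summary of the data, which is one way to read the "reduced data" in the abstract.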

A Dynamic Ontology-based Multi-Agent Context-Awareness User Profile Construction Method for Personalized Information Retrieval

  • Gao, Qian;Cho, Young Im
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.4 / pp.270-276 / 2012
  • With the increase in the amount of data and information available on the web, there is high demand for personalized information retrieval that provides context-aware services to web users. This paper proposes a novel dynamic, ontology-based, multi-agent context-awareness method for constructing user profiles, which incorporates concepts and properties to model the user profile. The method considers both the frequency and the specificity of a concept in a document and its corresponding domain ontology to construct the user profile. On this basis, a fuzzy c-means clustering method is adopted to cluster the user's interest domains, and a dynamic update policy is adopted to continuously track changes in the user's interests. The simulation results show that, as the user profile is gradually refined, the proposed system outperforms a traditional semantic-based retrieval system in terms of Recall Ratio and Precision Ratio.

Automatic Fuzzy Rule Generation Utilizing Genetic Algorithms

  • Hee, Soo-Hwang;Kwang, Bang-Woo
    • Journal of the Korean Institute of Intelligent Systems / v.2 no.3 / pp.40-49 / 1992
  • In this paper, an approach to identifying fuzzy rules is proposed. The optimal number of fuzzy rules is determined by means of fuzzy c-means clustering. The parameters of the fuzzy implications are identified using genetic algorithms. For efficient and fast parameter identification, a technique for reducing the search areas of the genetic algorithms is proposed. The feasibility of the proposed approach is evaluated by identifying a fuzzy model that describes the input-output relation of a gas furnace. Despite the simplicity of the proposed approach, the accuracy of the identified fuzzy model of the gas furnace is superior to that of other fuzzy models.

Acoustic Emission Studies on the Structural Integrity Test of Welded High Strength Steel using Pattern Recognition (패턴인식을 이용한 고장력강의 용접 구조건전성 평가에 대한 음향방출 사례연구)

  • Kim, Gil-Dong;Rhee, Zhang-Kyu
    • Proceedings of the Safety Management and Science Conference / 2008.04a / pp.185-196 / 2008
  • The objective of this study is to evaluate the mechanical behavior and structural integrity of high strength steel weldments using acoustic emission (AE) techniques. Simple tension and AE tests were conducted on three kinds of welded test specimens. To analyze weldability, joinability, and structural integrity, we used the K-means clustering method as an unsupervised learning pattern recognition algorithm on the multivariate AE data obtained, such as AE counts, energy, amplitude, hits, rise time, duration, counts to peak, and RMS signals. The effectiveness of the proposed method is discussed on the basis of the experimental results.

The Binarization of Text Regions in Natural Scene Images, based on Stroke Width Estimation (자연 영상에서 획 너비 추정 기반 텍스트 영역 이진화)

  • Zhang, Chengdong;Kim, Jung Hwan;Lee, Guee Sang
    • Smart Media Journal / v.1 no.4 / pp.27-34 / 2012
  • In this paper, a novel text binarization method is presented that can deal with complex conditions such as shadows, non-uniform illumination due to highlights or object projection, and messy backgrounds. To locate the target text region, a focus line is assumed to pass through the text region. Next, connected component analysis and stroke width estimation based on the location information of the focus line are used to locate the bounding box of the text region and the box of each connected component. A series of classifications is applied to identify whether each connected component (CC) is text or non-text. In addition, a modified K-means clustering method based on the HCL color space is applied to reduce the color dimension. A text binarization procedure based on the location of the text components and the seed color pixels is then used to generate the final result.
