• Title/Summary/Keyword: Hierarchical Clustering Model

Cluster-based Information Retrieval with Tolerance Rough Set Model

  • Ho, Tu-Bao;Kawasaki, Saori;Nguyen, Ngoc-Binh
    • International Journal of Fuzzy Logic and Intelligent Systems / v.2 no.1 / pp.26-32 / 2002
  • The objectives of this paper are twofold. The first is to introduce a model for representing documents with semantic relatedness using rough sets, but with tolerance relations instead of equivalence relations (the tolerance rough set model, TRSM). The second is to introduce two document clustering algorithms based on this model, one hierarchical and one nonhierarchical, and TRSM cluster-based information retrieval using these two algorithms. The experimental results show that TRSM offers an alternative approach to text clustering and information retrieval.
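The abstract above does not spell out the tolerance relation, so the following is only a minimal sketch. It assumes the co-occurrence formulation often used with TRSM: two terms are tolerant of each other when they appear together in at least `theta` documents. The function names, the threshold value, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents.
    (Assumed co-occurrence-based tolerance relation.)"""
    cooc = Counter()
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def lower_approx(doc, classes):
    """Terms whose whole tolerance class lies inside the document."""
    x = set(doc)
    return {t for t, cls in classes.items() if cls <= x}


def upper_approx(doc, classes):
    """Terms whose tolerance class overlaps the document."""
    x = set(doc)
    return {t for t, cls in classes.items() if cls & x}


# Toy corpus (made up for illustration).
docs = [
    ["rough", "set", "cluster"],
    ["rough", "set", "retrieval"],
    ["cluster", "retrieval", "document"],
]
classes = tolerance_classes(docs, theta=2)
query = ["rough", "cluster"]
print(sorted(lower_approx(query, classes)))
print(sorted(upper_approx(query, classes)))
```

Unlike an equivalence relation, the tolerance classes here may overlap, which is what lets the upper approximation enrich a document with related terms it does not literally contain.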

A Study on the Asia Container Ports Clustering Using Hierarchical Clustering(Single, Complete, Average, Centroid Linkages) Methods with Empirical Verification of Clustering Using the Silhouette Method and the Second Stage(Type II) Cross-Efficiency Matrix Clustering Model (계층적 군집분석(최단, 최장, 평균, 중앙연결)방법에 의한 아시아 컨테이너 항만의 클러스터링 측정 및 실루엣방법과 2단계(Type II) 교차효율성 메트릭스 군집모형을 이용한 실증적 검증에 관한 연구)

  • Park, Ro-Kyung
    • Journal of Korea Port Economic Association / v.37 no.1 / pp.31-70 / 2021
  • The purpose of this paper is to measure changes in clustering, analyze the empirical results, and choose clustering partners for Busan, Incheon, and Gwangyang ports by applying hierarchical clustering (single, complete, average, and centroid linkage), Silhouette, and 2SCE [second-stage (Type II) cross-efficiency] matrix clustering models to Asian container ports over the period 2009-2018. The models use number of cranes, depth, berth length, and total area as inputs and container TEU as output. The main empirical results are as follows. First, ranked by the increase in efficiency over the 10-year analysis period, the order is Silhouette (up 0.4052), hierarchical clustering (up 0.3097), and 2SCE (up 0.1057). Second, according to empirical verification with the Silhouette and 2SCE models, Busan Port should be clustered with Dubai, Hong Kong, and Tanjung Priok, while Incheon Port and Gwangyang Port should be clustered with most ports. Third, within ASEAN, suitable clusters are Busan Port (Singapore), Incheon Port (Tanjung Priok, Tanjung Perak, Manila, Tanjung Pelepas, Laem Chabang, and Bangkok), and Gwangyang Port (Tanjung Priok, Tanjung Perak, Port Klang, Tanjung Pelepas, Laem Chabang, and Bangkok). Fourth, Wilcoxon's signed-ranks test of the models shows that all P values are significant at an average level of 0.852, which means that the average efficiency figures and ranking orders of the models are consistent with each other. The policy implication is that port policy makers and port operation managers should select benchmarking ports by applying the models used in this study to port clustering, compare and analyze the development and operation plans of their own ports against them, and quickly adopt and implement the parts that require benchmarking.
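For readers unfamiliar with the silhouette method mentioned above, here is a minimal sketch of the standard silhouette coefficient in plain Python. It is not the paper's procedure: the port data and input variables are not reproduced, and the points and labelings below are made up for illustration.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    For each point, a = mean distance to the other members of its own
    cluster and b = smallest mean distance to any other cluster; the
    point's score is (b - a) / max(a, b). Singleton clusters score 0.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster contributes 0
        # dist(p, p) is 0, so summing over the whole cluster is safe.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scale = max(a, b)
        if scale > 0:
            total += (b - a) / scale
    return total / len(points)


# Hypothetical 2-D feature vectors: two well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # groups cut across it
print(good > bad)
```

Scores near 1 indicate compact, well-separated clusters, while negative scores suggest points assigned to the wrong cluster, which is why the measure can be used to compare alternative clusterings of the same data.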

Clustering properties and halo occupation of Lyman-break galaxies at z ~ 4

  • Park, Jaehong;Kim, Han-Seek;Wyithe, Stuart B.;Lacey, Cedric G.;Baugh, Carlton M.
    • The Bulletin of The Korean Astronomical Society / v.40 no.1 / pp.59.3-60 / 2015
  • We investigate the clustering properties of Lyman-break galaxies (LBGs) at z ~ 4. Using the hierarchical galaxy formation model GALFORM, we predict the angular correlation function (ACF) of LBGs and compare this with the measured ACF from combined survey fields consisting of the Hubble eXtreme Deep Field (XDF) and CANDELS. We find that the predicted ACF is in good agreement with the measured ACFs. However, when we divide the model LBGs into bright and faint subsets, the predicted ACFs are less consistent with observations. We quantify the dependence of clustering on luminosity and show that the fraction of satellite LBGs is important for determining the amplitude of the ACF at small scales. We find that central LBGs predominantly reside in $\sim 10^{11}\,h^{-1}M_{\odot}$ haloes and satellites reside in haloes of mass $\sim 10^{12}-10^{13}\,h^{-1}M_{\odot}$. The model predicts fewer bright satellite LBGs than is inferred from the observations. LBGs in the tails of the redshift distribution contribute a significant additional clustering signal, especially on small scales. This spurious clustering may affect the interpretation of the halo occupation distribution, including the minimum halo mass and the abundance of satellite LBGs.

i-LEACH : Head-node Constrained Clustering Algorithm for Randomly-Deployed WSN (i-LEACH : 랜덤배치 고정형 WSN에서 헤더수 고정 클러스터링 알고리즘)

  • Kim, Chang-Joon;Lee, Doo-Wan;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.1 / pp.198-204 / 2012
  • Clustering sensor nodes in a WSN is a useful mechanism for coping with scalability problems and, when combined with in-network data aggregation, may increase the energy efficiency of the network. Hierarchical clustering routing is a typical approach to improving overall network energy efficiency: it selects a cluster head that sends the aggregated data arriving from the nodes in its cluster to a base station. In this paper, we propose improved-LEACH (i-LEACH), which uses a comparatively simple and lightweight policy to select cluster-head nodes and thereby reduces the clustering overhead and the overall power consumption of the network. Simulation results using a fine-grained power model show that i-LEACH reduces clustering overhead compared with well-known previous work such as LEACH. In a comparison with LEACH, i-LEACH improved network power consumption by 25% and network traffic by 16%.
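The abstract does not describe i-LEACH's selection policy, so it is not sketched here. For context only, the baseline it is compared against, the original LEACH protocol, elects cluster heads with a rotating probability threshold T = p / (1 - p * (r mod 1/p)) for nodes that have not yet served in the current cycle. The sketch below illustrates that baseline rule; the function names, node IDs, and parameter values are hypothetical.

```python
import random


def leach_threshold(p, round_no):
    """Original LEACH election threshold for a still-eligible node.

    p is the desired fraction of cluster heads; the threshold rises
    over a cycle of 1/p rounds so every node serves once per cycle.
    """
    period = round(1 / p)
    return p / (1 - p * (round_no % period))


def elect_cluster_heads(node_ids, eligible, p, round_no, rng):
    """Each eligible node draws a uniform number and becomes a cluster
    head if the draw falls below the threshold; others cannot be elected."""
    t = leach_threshold(p, round_no)
    return [n for n in node_ids if n in eligible and rng.random() < t]


rng = random.Random(0)          # fixed seed so the example is repeatable
nodes = list(range(20))         # hypothetical node IDs
eligible = set(nodes)           # nobody has served as head yet
heads = elect_cluster_heads(nodes, eligible, p=0.1, round_no=0, rng=rng)
print(len(heads), "cluster heads elected in round 0")
```

Because the election is random, the number of heads per round varies under this rule, which is the kind of variability that head-count-constrained variants aim to remove.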

Active Learning based on Hierarchical Clustering (계층적 군집화를 이용한 능동적 학습)

  • Woo, Hoyoung;Park, Cheong Hee
    • KIPS Transactions on Software and Data Engineering / v.2 no.10 / pp.705-712 / 2013
  • Active learning aims to improve the performance of a classification model by repeatedly selecting the most helpful unlabeled data and adding it to the training set after labeling by an expert. In this paper, we propose an active learning method based on hierarchical agglomerative clustering with Ward's linkage. The proposed method actively constructs a training set that includes at least one sample from each cluster and reflects the overall data distribution by expanding the existing training set. While most existing active learning methods assume that an initial training set is given, the proposed method is applicable whether or not initial training data is available. Experimental results show the superiority of the proposed method.
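The idea of drawing at least one sample per cluster can be sketched as follows. This is a naive illustration under stated assumptions, not the authors' algorithm: it runs a simple O(n^3) agglomerative loop with Ward's merge cost, and the choice of the member nearest each cluster centroid as the sample to label is an assumption made here. All names and data are hypothetical.

```python
from math import dist


def centroid(pts):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))


def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's linkage.

    Repeatedly merges the pair of clusters whose merge increases the
    within-cluster sum of squares the least:
        cost(A, B) = |A||B| / (|A| + |B|) * ||c_A - c_B||^2
    Returns k clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                ca = centroid([points[x] for x in a])
                cb = centroid([points[x] for x in b])
                cost = len(a) * len(b) / (len(a) + len(b)) * dist(ca, cb) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def pick_queries(points, k):
    """Pick one index per cluster to send to the expert for labeling
    (assumed rule: the member closest to the cluster centroid)."""
    queries = []
    for members in ward_clusters(points, k):
        c = centroid([points[i] for i in members])
        queries.append(min(members, key=lambda i: dist(points[i], c)))
    return sorted(queries)


# Hypothetical unlabeled pool with three visible groups.
pool = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
print(pick_queries(pool, k=3))
```

Because no labels are needed to build the clusters, a selection step like this can run with or without an initial training set, which matches the applicability claim in the abstract.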

Online Recognition of Handwritten Korean and English Characters

  • Ma, Ming;Park, Dong-Won;Kim, Soo Kyun;An, Syungog
    • Journal of Information Processing Systems / v.8 no.4 / pp.653-668 / 2012
  • In this study, an improved HMM-based recognition model is proposed for online English and Korean handwritten characters. The pattern elements of the handwriting model are sub-character strokes and ligatures. To deal with variations in handwriting style, a modified hierarchical clustering approach is introduced to partition different writing styles into several classes. For each English letter and each primitive grapheme in Korean characters, one HMM that models the temporal and spatial variability of the handwriting is constructed for each class. The HMMs of Korean graphemes are then concatenated to form the Korean character models. Recognition of handwritten characters is implemented by a modified level-building algorithm, which incorporates the Korean character combination rules into an efficient network search procedure. To address the limitations of the HMM-based method, a post-processing procedure that takes global and structural features into account is proposed. Experiments showed that the proposed recognition system achieved a high writer-independent recognition rate on unconstrained samples of both English and Korean characters. A comparison with other HMM-based recognition schemes was also performed to evaluate the system.

Segmenting Inpatients by Mixture Model and Analytical Hierarchical Process(AHP) Approach In Medical Service (의료서비스에서 혼합모형(Mixture model) 및 분석적 계층과정(AHP)를 이용한 입원환자의 시장세분화에 관한 연구)

  • 백수경;곽영식
    • Health Policy and Management / v.12 no.2 / pp.1-22 / 2002
  • Since the early 1980s, scholars in various academic fields have applied latent structure and other types of finite mixture models. Although the merits of finite mixture models are well documented, attempts to apply them to medical services have been relatively rare. The researchers aim to fill this gap by introducing the finite mixture model and segmenting the inpatient database of one general hospital. In Section 2, finite mixture models are compared with clustering, chi-square analysis, and discriminant analysis based on the segmentation methodology schema of Wedel and Kamakura (2000). The mixture model yields the optimal number of segments and a fuzzy classification for each observation by means of the EM (expectation-maximization) algorithm. The purpose of the finite mixture model is to unmix the sample, to identify the groups, and to estimate the parameters of the density function underlying the observed data within each group. In Sections 3 and 4, we illustrate the results of segmenting data on 4,510 patients, including nominal and ratio scales. We then show that AHP can identify the attractiveness of each segment, so that the decision maker can select the best target segment.

Probability-based Deep Learning Clustering Model for the Collection of IoT Information (IoT 정보 수집을 위한 확률 기반의 딥러닝 클러스터링 모델)

  • Jeong, Yoon-Su
    • Journal of Digital Convergence / v.18 no.3 / pp.189-194 / 2020
  • Recently, various clustering techniques have been studied to efficiently handle data generated by heterogeneous IoT devices. However, existing clustering techniques are not suitable for mobile IoT devices because they focus on statically dividing networks. This paper proposes a probabilistic deep learning-based dynamic clustering model for collecting and analyzing information on IoT devices using edge networks. The proposed model establishes a subnet by applying the probabilistically collected frequencies of attribute values to deep learning. The established subnets are used to group information extracted from seeds into hierarchical structures and to improve the speed and accuracy of dynamic clustering for IoT devices. The performance evaluation showed that, compared with the existing model, the proposed model improved data processing time by 13.8% on average and lowered server overhead by 10.5% on average. The accuracy of extracting IoT information from servers improved by 8.7% on average over previous models.

Gene Screening and Clustering of Yeast Microarray Gene Expression Data (효모 마이크로어레이 유전자 발현 데이터에 대한 유전자 선별 및 군집분석)

  • Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1077-1094 / 2011
  • We perform clustering analyses of yeast cell cycle microarray expression data. To reflect the characteristics of time-course data, we screen the genes using test statistics based on Fourier coefficients together with an FDR procedure. We compare the results obtained on the yeast data by model-based clustering, K-means, PAM, SOM, the hierarchical Ward method, and the fuzzy method. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation based on GO analysis is also included.

Decision Tree Based Context Clustering with Cross Likelihood Ratio for HMM-based TTS (HMM 기반의 TTS를 위한 상호유사도 비율을 이용한 결정트리 기반의 문맥 군집화)

  • Jung, Chi-Sang;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.32 no.2 / pp.174-180 / 2013
  • This paper proposes a decision tree-based context clustering algorithm for HMM-based speech synthesis systems using the cross likelihood ratio with a hierarchical prior (CLRHP). Conventional algorithms tie context-dependent HMM states that have similar statistical characteristics, but they do not consider the statistical similarity of split child nodes and therefore do not guarantee a statistical difference between the final leaf nodes. The proposed CLRHP algorithm improves the reliability of model parameters by adopting a criterion that minimizes the statistical similarity of split child nodes. Experimental results verify the superiority of the proposed approach over conventional ones.