• Title/Summary/Keyword: over-clustering

A Comparison and Analysis on High-Dimensional Clustering Techniques for Data Mining (데이터 마이닝을 위한 고차원 클러스터링 기법에 관한 비교 분석 연구)

  • 김홍일;이혜명
    • Journal of the Korea Computer Industry Society / v.4 no.12 / pp.887-900 / 2003
  • Many applications require clustering of large amounts of high-dimensional data. Many automated clustering techniques have been developed, but most do not work effectively and/or efficiently on high-dimensional (numerical) data because of the so-called “curse of dimensionality”. Moreover, high-dimensional data often contain a significant amount of noise, which further degrades the effectiveness of these algorithms. It is therefore necessary to examine the structure and characteristics of high-dimensional data and to develop clustering algorithms suited to high-dimensional database applications. In this paper, we survey and classify existing high-dimensional clustering methods, analyzing and comparing the strengths and weaknesses of each for specific applications. In particular, we compare the traditional algorithms with CLIP, an algorithm we developed, in terms of efficiency and effectiveness. This study is intended to contribute to the development of algorithms more advanced than those currently available.

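
The abstract attributes the poor behavior of conventional clustering on high-dimensional data to the “curse of dimensionality”. One facet of that effect, distance concentration, can be illustrated with a short Python sketch that is not taken from the paper. For uniformly random points in the unit hypercube, the relative gap between a query point's farthest and nearest neighbors shrinks as the dimension grows, so distance-based cluster boundaries become less distinct. The function `relative_contrast` and its parameters are illustrative assumptions.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(d_max - d_min) / d_min for Euclidean distances from one random query
    point to n_points uniform random points in the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

The printed contrast drops sharply between 2 and 1000 dimensions. Exact values depend on the seed.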

Real-time Fault Detection and Classification of Reactive Ion Etching Using Neural Networks (Neural Networks을 이용한 Reactive Ion Etching 공정의 실시간 오류 검출에 관한 연구)

  • Ryu Kyung-Han;Lee Song-Jae;Soh Dea-Wha;Hong Sang-Jeen
    • Journal of the Korea Institute of Information and Communication Engineering / v.9 no.7 / pp.1588-1593 / 2005
  • For coagulant control in a water treatment plant, rule extraction, one category of data mining, was performed. Clustering methods were applied to extract control rules from the data. These control rules could be used to fully automate water treatment plants, replacing operators' knowledge in plant control. Fuzzy clustering requires several coefficients to be determined, and methods for determining them, such as clustering indices, have been studied for decades. In this study, statistical indices were used to determine the number of clusters, and seed points were found by hierarchical clustering. Because these statistical approaches provide information about the features of the clusters, they can reduce computing cost and increase clustering accuracy. The proposed algorithm can play an important role in data mining and knowledge discovery.
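
The abstract combines three ingredients: a statistical index that chooses the number of clusters, seed points from hierarchical clustering, and fuzzy clustering. It names neither the index nor the fuzzy algorithm. The Python sketch below therefore makes three assumptions: it uses fuzzy c-means (FCM) with fuzzifier `m = 2`, it takes the number of clusters `c` as already chosen, and it derives the seeds from centroid-linkage agglomerative clustering. Function names are hypothetical.

```python
import math

def centroid(cluster):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def agglomerative_seeds(points, c):
    """Centroid-linkage agglomerative clustering down to c clusters; the
    resulting centroids serve as seed points for fuzzy clustering."""
    clusters = [[p] for p in points]
    while len(clusters) > c:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return [centroid(cl) for cl in clusters]

def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-6):
    """Standard FCM iterations starting from the given seed centers.
    Returns (centers, memberships) with memberships[k][i] for point k, cluster i."""
    centers = [tuple(cc) for cc in centers]
    dim = len(points[0])
    for _ in range(iters):
        u = []
        for p in points:
            d = [math.dist(p, cc) for cc in centers]
            if min(d) == 0.0:  # point sits exactly on a center: crisp membership
                hits = [1.0 if di == 0.0 else 0.0 for di in d]
                u.append([h / sum(hits) for h in hits])
            else:
                u.append([1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in d)
                          for di in d])
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]  # fuzzified membership weights
            total = sum(w)
            new_centers.append(tuple(sum(wk * p[k] for wk, p in zip(w, points)) / total
                                     for k in range(dim)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Seeding from a hierarchical pass makes the FCM result deterministic. With random initialization, results can vary from run to run.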

Three Effective Top-Down Clustering Algorithms for Location Database Systems

  • Lee, Kwang-Jo;Yang, Sung-Bong
    • Journal of Computing Science and Engineering / v.4 no.2 / pp.173-187 / 2010
  • Recent technological advances in mobile communication systems have led to explosive growth in the number of mobile device users worldwide. One of the most important issues in designing a mobile computing system is user location management. Hierarchical systems have been proposed to solve the scalability problem in location management. This problem arises when there are too many users for a mobile system to handle. Delayed updates to the location databases can then make the system respond slowly or even go down. In this paper, we propose a top-down clustering algorithm for hierarchical location database systems in a wireless network. A hierarchical location database system employs a tree structure. The proposed algorithm takes a top-down approach and uses both the number of visits users make to each cell and the movement information between pairs of adjacent cells. We then present a modified algorithm that incorporates an exhaustive method when only a few levels of the tree remain to be processed. We also propose a capacity-constrained top-down clustering algorithm for more realistic environments in which each database has a capacity limit. By the capacity of a database we mean the maximum number of mobile device users in the cells that the database can handle. This algorithm reduces the number of databases used by the system and improves update performance. Experimental results show that the proposed top-down, modified top-down, and capacity-constrained top-down clustering algorithms reduce the update cost by 17.0%, 18.0%, and 24.1%, and the update time by about 43.0%, 39.0%, and 42.3%, respectively. The capacity-constrained algorithm reduces the average number of databases used by the system by 23.9% compared with the other algorithms.

GARCH Model with Conditional Return Distribution of Unbounded Johnson (Unbounded Johnson 분포를 이용한 GARCH 수익률 모형의 적용)

  • Jung, Seung-Hyun;Oh, Jung-Jun;Kim, Sung-Gon
    • The Korean Journal of Applied Statistics / v.25 no.1 / pp.29-43 / 2012
  • Financial data such as stock index returns and exchange rates have heavier tails and more asymmetry than the normal distribution. When VaR is estimated with a GARCH model that uses a normal conditional return distribution, VaR tends to be underestimated, and the losses that exceed the estimated VaR tend to cluster. In this paper, we argue that this problem can be resolved by adopting the unbounded Johnson distribution as the conditional return distribution. We also compare this model with GARCH models whose conditional return distributions are normal and Student-t. Using the losses that exceed the ex-ante VaR estimates, we check the validity of the GARCH models with the failure proportion test and the clustering test. We find that the GARCH model with an unbounded Johnson conditional return distribution provides an appropriate estimate of VaR and shows no clustering of violations.
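
The two backtests named in the abstract can be sketched in Python. The failure proportion test is commonly implemented as Kupiec's proportion-of-failures (POF) likelihood-ratio test. The abstract does not say which clustering test was used, so Christoffersen's first-order independence test is shown here as one common choice (an assumption). Both statistics are compared with 3.841, the 95% chi-square critical value with one degree of freedom. The input is a 0/1 sequence marking the days on which the loss exceeded the ex-ante VaR. Function names are hypothetical.

```python
import math

CHI2_1DF_95 = 3.841  # 95% critical value of chi-square with 1 degree of freedom

def _xlog(count, prob):
    """count * log(prob), using the convention 0 * log(0) = 0."""
    return 0.0 if count == 0 else count * math.log(prob)

def kupiec_pof(violations, p):
    """Kupiec proportion-of-failures LR statistic.
    violations: 0/1 flags (1 = loss exceeded the ex-ante VaR);
    p: nominal violation probability, e.g. 0.01 for a 99% VaR."""
    T = len(violations)
    x = sum(violations)
    phat = x / T
    ll_null = _xlog(T - x, 1 - p) + _xlog(x, p)
    ll_alt = _xlog(T - x, 1 - phat) + _xlog(x, phat)
    return -2.0 * (ll_null - ll_alt)

def christoffersen_independence(violations):
    """Christoffersen LR statistic for first-order independence of violations;
    large values indicate that violations cluster in time."""
    n = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for prev, cur in zip(violations, violations[1:]):
        n[(prev, cur)] += 1
    n00, n01, n10, n11 = n[(0, 0)], n[(0, 1)], n[(1, 0)], n[(1, 1)]
    pi0 = n01 / (n00 + n01) if n00 + n01 else 0.0  # P(violation | none yesterday)
    pi1 = n11 / (n10 + n11) if n10 + n11 else 0.0  # P(violation | one yesterday)
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_null = _xlog(n00 + n10, 1 - pi) + _xlog(n01 + n11, pi)
    ll_alt = (_xlog(n00, 1 - pi0) + _xlog(n01, pi0)
              + _xlog(n10, 1 - pi1) + _xlog(n11, pi1))
    return -2.0 * (ll_null - ll_alt)
```

A model passes a test at the 5% level when its statistic stays below `CHI2_1DF_95`. Five violations in 100 days give a POF statistic of zero at `p = 0.05`, but the independence test still rejects if those five days are consecutive.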

Temporospatial clustering analysis of foot-and-mouth disease transmission in South Korea, 2010~2011 (시공간 클러스터링 분석을 이용한 2010~2011 국내 발생 구제역 전파양상)

  • Bae, Sun-Hak;Shin, Yeun-Kyung;Kim, Byunghan;Pak, Son-Il
    • Korean Journal of Veterinary Research / v.53 no.1 / pp.49-54 / 2013
  • A space-time analysis was performed to investigate the geographical transmission pattern and temporal trends of the 2010~2011 foot-and-mouth disease (FMD) outbreaks in Korea, and to explore the temporal intervals at which FMD cases clustered spatially. It was based on a georeferenced database of 3,575 burial sites from 30 November 2010 to 23 February 2011. These cases represent approximately 98.1% of all infected farms (n = 3,644) during the same period. Descriptive maps of the spatial patterns of the outbreaks were generated with ArcGIS. The spatial scan statistic, as implemented in the SaTScan software, was applied to identify geographical clusters of FMD cases across the country. Overall, spatial heterogeneity was identified, and the transmission pattern differed by province. Compared with the swine population, cattle had more clusters, but the clusters were smaller. In addition, a spatiotemporal analysis compared clustering patterns between the first 7 days and days 8 to 14 of the outbreak. The strongest spatial clustering occurred at the 7-day interval, although clustering over the longer interval (8~14 days) was also observed. We further discuss the importance of the time elapsed between notification of a suspected FMD case and its confirmation, and emphasize the need for region-specific and species-specific control measures.
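
The spatial scan statistic that SaTScan implements (Kulldorff's) can be sketched in a few lines of Python. This version assumes the discrete Poisson model. It also assumes circular windows centered on each location, grown one neighbor at a time up to half of the total population. The abstract does not state which probability model or window settings were used. The Monte Carlo replication that SaTScan uses to assess significance is omitted. Function names are hypothetical.

```python
import math

def poisson_llr(c, expected, total):
    """Kulldorff's Poisson log-likelihood ratio for one candidate zone.
    c: observed cases inside, expected: cases expected inside under the null,
    total: all cases. Only zones with elevated risk (c > expected) score."""
    if c <= expected:
        return 0.0
    llr = c * math.log(c / expected)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - expected))
    return llr

def circular_scan(locations, cases, population, max_pop_frac=0.5):
    """Grow a circle around each location one neighbor at a time and return
    (best_llr, center_index, member_indices) for the most likely cluster."""
    total_cases, total_pop = sum(cases), sum(population)
    best = (0.0, None, [])
    for i, center in enumerate(locations):
        order = sorted(range(len(locations)),
                       key=lambda j: math.dist(center, locations[j]))
        c, pop, members = 0, 0, []
        for j in order:
            if pop + population[j] > max_pop_frac * total_pop:
                break  # window would exceed the maximum population share
            c += cases[j]
            pop += population[j]
            members.append(j)
            expected = total_cases * pop / total_pop
            llr = poisson_llr(c, expected, total_cases)
            if llr > best[0]:
                best = (llr, i, list(members))
    return best
```

To obtain a p-value, the maximum LLR would be ranked against the maxima from data sets simulated under the null hypothesis.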

Ant Colony Hierarchical Cluster Analysis (개미 군락 시스템을 이용한 계층적 클러스터 분석)

  • Kang, Mun-Su;Choi, Young-Sik
    • Journal of Internet Computing and Services / v.15 no.5 / pp.95-105 / 2014
  • In this paper, we present a novel ant-based hierarchical clustering algorithm in which ants repeatedly hop from node to node over a weighted, directed k-nearest-neighbor graph built from a given dataset. We introduce the notion of node pheromone, defined as the sum of the pheromone on a node's incoming arcs. Node pheromone can be regarded as a relative measure of density in a local region. After the ants have hopped a finite number of times, we remove nodes with small amounts of node pheromone from the directed graph and take the resulting strongly connected components as clusters. We repeat this removal process with the threshold increasing from a low value to a high value, yielding a hierarchy of clusters. We demonstrate the performance of the proposed algorithm on synthetic and real data sets, comparing it with traditional clustering methods. Experimental results show that the proposed method outperforms the traditional methods.
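
The node-pheromone and thresholding steps can be sketched in Python. The sketch takes the pheromone already deposited on each arc of the k-nearest-neighbor graph as input. The ant-hopping simulation itself, including how ants choose arcs and how much they deposit, is not specified in the abstract and is omitted. Node pheromone is computed as the sum over incoming arcs. For each threshold, the sketch keeps the nodes at or above it and returns the strongly connected components found with Kosaraju's algorithm. Keeping singleton components as clusters is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

def node_pheromone(arc_pheromone):
    """Node pheromone: sum of the pheromone on each node's incoming arcs."""
    total = defaultdict(float)
    for (u, v), amount in arc_pheromone.items():
        total[u] += 0.0  # register nodes that only have outgoing arcs
        total[v] += amount
    return dict(total)

def strongly_connected_components(nodes, arcs):
    """Kosaraju's algorithm on the subgraph induced by `nodes`
    (node labels are assumed never to be None)."""
    nodes = set(nodes)
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        if u in nodes and v in nodes:
            succ[u].append(v)
            pred[v].append(u)
    # Pass 1: iterative DFS recording nodes in order of completion.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(succ[start]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:
                stack.pop()
                order.append(node)
            elif child not in visited:
                visited.add(child)
                stack.append((child, iter(succ[child])))
    # Pass 2: DFS on the reversed graph in reverse completion order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        comp, stack = [], [start]
        while stack:
            node = stack.pop()
            comp.append(node)
            for parent in pred[node]:
                if parent not in assigned:
                    assigned.add(parent)
                    stack.append(parent)
        components.append(sorted(comp))
    return components

def pheromone_hierarchy(arc_pheromone, thresholds):
    """For each threshold (ascending), drop nodes whose node pheromone is below
    it and return the SCCs of the remaining graph as that level's clusters."""
    tau = node_pheromone(arc_pheromone)
    levels = []
    for t in sorted(thresholds):
        kept = [n for n, amount in tau.items() if amount >= t]
        levels.append((t, strongly_connected_components(kept, arc_pheromone.keys())))
    return levels
```

Consider two dense triangles joined through a low-pheromone node. At a threshold of zero they form one cluster. Once the threshold exceeds that node's pheromone, they split into two.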

Extreme value modeling of structural load effects with non-identical distribution using clustering

  • Zhou, Junyong;Ruan, Xin;Shi, Xuefei;Pan, Chudong
    • Structural Engineering and Mechanics / v.74 no.1 / pp.55-67 / 2020
  • The common practice for predicting characteristic structural load effects (LEs) over long reference periods is to employ extreme value theory (EVT) to build limit distributions. However, most applications ignore the fact that LEs are driven by multiple loading events. LEs are therefore not identically distributed, and identical distribution is a prerequisite for EVT. In this study, we propose a composite extreme value modeling approach that uses clustering in two steps. (a) The initial blended samples are clustered into a finite number of identically distributed subsamples using a finite mixture model, the expectation-maximization algorithm, and the Akaike information criterion. (b) The limit distributions of the subsamples are combined into a composite prediction equation using the generalized Pareto distribution with a joint threshold. The proposed approach was validated both through numerical examples with known solutions and through engineering applications to bridge traffic LEs on a long-span bridge. The results indicate that a joint threshold greatly benefits composite extreme value modeling, that many appropriate tail models can be used, and that the equation is simply the sum of the weighted models. In the numerical examples, the proposed approach generated extreme-value predictions for any reference period that were accurate when compared with the known solutions. By contrast, the common practice of applying EVT to the mixture data without clustering showed large deviations. Real-world bridge traffic LEs are driven by multiple events and exhibit multipeak distributions, and the proposed approach captures the tendency of the tail LEs better than the conventional approach. The proposed approach is expected to apply widely to general problems involving samples that are driven by multiple events and are not identically distributed.
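
The composite prediction equation described in the abstract, a weighted sum of subsample tail models above a joint threshold, can be sketched in Python. The sketch assumes that four quantities have already been estimated for each subsample: its mixture weight, its exceedance probability at the joint threshold `u`, and its generalized Pareto (GPD) scale and shape. It does not reproduce the paper's mixture-model/EM/AIC fitting, and the class and function names are hypothetical. The level for a target exceedance probability is found by bisection. This relies on the composite survival function decreasing in x and on the answer lying below `hi`.

```python
import math
from dataclasses import dataclass

@dataclass
class GPDTail:
    weight: float  # mixture proportion of this subsample
    zeta: float    # P(X_i > u) within the subsample
    sigma: float   # GPD scale
    xi: float      # GPD shape

    def survival(self, x, u):
        """P(X_i > x) for x >= u under a GPD tail above the joint threshold u."""
        y = (x - u) / self.sigma
        if self.xi == 0.0:  # exponential limit of the GPD
            return self.zeta * math.exp(-y)
        base = 1.0 + self.xi * y
        if base <= 0.0:  # beyond the finite upper endpoint when xi < 0
            return 0.0
        return self.zeta * base ** (-1.0 / self.xi)

def composite_survival(x, u, tails):
    """Composite exceedance probability: weighted sum of the subsample tails."""
    return sum(t.weight * t.survival(x, u) for t in tails)

def composite_return_level(q, u, tails, hi=1e9, iters=200):
    """Solve composite_survival(x) = q for x >= u by bisection on [u, hi]."""
    lo = u
    if composite_survival(lo, u, tails) < q:
        raise ValueError("q exceeds the composite exceedance probability at u")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if composite_survival(mid, u, tails) > q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a single exponential tail (shape 0) with exceedance probability 0.1 and scale 2 above u = 10, the level exceeded with probability 0.001 is 10 + 2 ln 100 ≈ 19.21.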

RAG-based Hierarchical Classification (RAG 기반 계층 분류 (2))

  • Lee, Sang-Hoon
    • Korean Journal of Remote Sensing / v.22 no.6 / pp.613-619 / 2006
  • This study proposes unsupervised image classification through the dendrogram of agglomerative clustering, as a higher stage of image segmentation in image processing. The proposed algorithm is a hierarchical clustering method that searches for a set of MCSNPs (Mutual Closest Spectral Neighbor Pairs). The search uses a RAG (Regional Adjacency Graph) defined on the spectral space together with a min-heap. The algorithm also employs a multi-window system in the spectral space to define spectral adjacency. When regions merge, the RAG is updated using the RNV (Regional Neighbor Vector). The proposed algorithm produces a dendrogram, a graphical representation of the data, in which the hierarchical relationships among clusters can be easily interpreted. In this study, the proposed algorithm was extensively evaluated on simulated images and applied to very large QuickBird imagery acquired over an area of the Korean Peninsula. The results demonstrate its potential for application to remotely sensed imagery.
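
The merging rule at the core of the method, merging mutual closest neighbor pairs, can be sketched in Python. This simplified version treats each region as a mean spectral vector. It finds every pair of clusters that are each other's nearest neighbor under Euclidean distance, merges all such pairs in one pass, and records each merge as a dendrogram entry. It deliberately omits several parts of the abstract's algorithm: the RAG, the multi-window definition of spectral adjacency, the min-heap, and the RNV bookkeeping. It illustrates the merging rule rather than implementing the paper's algorithm, and the function names are hypothetical.

```python
import math

def mutual_closest_pairs(means):
    """Pairs (a, b), a < b, in which each cluster is the other's nearest
    neighbor; ties are broken by the smaller cluster id."""
    nearest = {}
    for a in means:
        nearest[a] = min((b for b in means if b != a),
                         key=lambda b: (math.dist(means[a], means[b]), b))
    return sorted({tuple(sorted((a, b))) for a, b in nearest.items()
                   if nearest[b] == a})

def mutual_closest_agglomeration(vectors):
    """Merge every mutual closest pair in each pass until one cluster is left.
    Returns dendrogram entries (new_id, id_a, id_b, merge_distance)."""
    means = {i: tuple(v) for i, v in enumerate(vectors)}
    sizes = {i: 1 for i in means}
    next_id = len(vectors)
    merges = []
    while len(means) > 1:
        for a, b in mutual_closest_pairs(means):
            d = math.dist(means[a], means[b])
            na, nb = sizes.pop(a), sizes.pop(b)
            ma, mb = means.pop(a), means.pop(b)
            # the merged cluster's mean is the size-weighted mean of its parts
            means[next_id] = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ma, mb))
            sizes[next_id] = na + nb
            merges.append((next_id, a, b, d))
            next_id += 1
    return merges
```

Breaking ties by cluster id rules out nearest-neighbor cycles longer than two. As a result, every pass finds at least one mutual pair, and the loop terminates.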

A Method for Tree Image Segmentation Combined Adaptive Mean Shifting with Image Abstraction

  • Yang, Ting-ting;Zhou, Su-yin;Xu, Ai-jun;Yin, Jian-xin
    • Journal of Information Processing Systems / v.16 no.6 / pp.1424-1436 / 2020
  • Although huge progress has been made in image segmentation, there are still no efficient segmentation strategies for tree images taken in natural environments with complex backgrounds. To address this problem, we propose a tree image segmentation method that combines adaptive mean shifting with image abstraction. Our approach performs better than others because it focuses mainly on the image background and on the characteristics of the tree itself. First, we abstract the original tree image from multiple perspectives using bilateral filtering and an image pyramid, which reduces the influence of the background and of gaps in the tree canopy on clustering. Spatial location and gray-scale features are obtained by step detection and the insertion rule method, respectively. Bandwidths calculated from the spatial location and gray-scale features then determine the size of the Gaussian kernel function used in the mean shift clustering. Furthermore, the flood fill method is employed to fill the clustering results and highlight the region of interest. To demonstrate the effectiveness of tree image abstraction for image clustering, we compared different abstraction levels and obtained the optimal clustering results. For our algorithm, the average segmentation accuracy (SA), over-segmentation rate (OR), and under-segmentation rate (UR) of the crown are 91.21%, 3.54%, and 9.85%, respectively. The corresponding averages for the trunk are 92.78%, 8.16%, and 7.93%. Compared experimentally with other popular tree image segmentation methods, our method requires no human interaction and achieves a higher SA. This work also shows promising applications in visual reconstruction and in measuring tree factors.
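
The mean shift step can be sketched in Python using a product of two Gaussian kernels: one over the spatial coordinates with bandwidth `hs` and one over the gray level with bandwidth `hr`. In the paper both bandwidths are computed adaptively, by step detection and the insertion rule method. Here they are simply passed in, and the abstraction and flood-fill steps are omitted. Samples are `(x, y, gray)` tuples. Each sample is shifted to its density mode, and samples whose modes lie closer than `merge_dist` share a label. All names are hypothetical.

```python
import math

def kernel_weight(p, q, hs, hr):
    """Product of Gaussian kernels on the spatial (x, y) and gray-level parts."""
    ds2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    dr2 = sum((a - b) ** 2 for a, b in zip(p[2:], q[2:]))
    return math.exp(-ds2 / (2.0 * hs * hs) - dr2 / (2.0 * hr * hr))

def mean_shift(samples, hs, hr, iters=100, tol=1e-4):
    """Shift each sample uphill to a mode of the kernel density estimate."""
    modes = []
    for start in samples:
        x = tuple(float(v) for v in start)
        for _ in range(iters):
            w = [kernel_weight(x, s, hs, hr) for s in samples]
            total = sum(w)
            new = tuple(sum(wi * s[k] for wi, s in zip(w, samples)) / total
                        for k in range(len(x)))
            shift = math.dist(x, new)
            x = new
            if shift < tol:
                break
        modes.append(x)
    return modes

def label_modes(modes, merge_dist):
    """Assign the same label to samples whose modes lie within merge_dist."""
    centers, labels = [], []
    for mode in modes:
        for idx, center in enumerate(centers):
            if math.dist(mode, center) < merge_dist:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels
```

Because the range kernel separates gray levels, pixels at the same location but with very different intensities converge to different modes.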

Simultaneous Speaker and Environment Adaptation by Environment Clustering in Various Noise Environments (다양한 잡음 환경하에서 환경 군집화를 통한 화자 및 환경 동시 적응)

  • Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.566-571 / 2009
  • This paper proposes a noise-robust, fast speaker adaptation method based on the eigenvoice framework for various noisy environments. The proposed method focuses on de-noising and environment clustering. Because the de-noised adaptation DB still contains residual noise, environment clustering groups the noisy adaptation data into similar environments, using the cepstral mean of the non-speech segments as the feature vector. The adaptation data in each cluster are then used to build an environment-clustered speaker-adapted (SA) model. Multiple environment-clustered SA models similar to the test environment are selected, and speaker adaptation is then performed using an appropriate linear combination of these models. In our experiments, the proposed method reduced the error rate by 40~59% compared with the speaker-independent baseline model.
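
The environment-clustering step can be sketched in Python. Each adaptation utterance is summarized by the mean cepstral vector of its non-speech frames. Those vectors are clustered, and the clusters closest to a test utterance's vector are selected. The abstract does not name the clustering method, so plain k-means seeded with the first k vectors is assumed. The speech/non-speech flags are taken as given. The eigenvoice adaptation and the linear combination of SA models are not shown. Function names are hypothetical.

```python
import math

def nonspeech_cepstral_mean(frames, is_speech):
    """Average cepstral vector over the frames flagged as non-speech."""
    noise = [f for f, s in zip(frames, is_speech) if not s]
    return tuple(sum(col) / len(noise) for col in zip(*noise))

def kmeans(vectors, k, iters=50):
    """Plain Lloyd's k-means, seeded with the first k vectors."""
    centers = [tuple(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            idx = min(range(k), key=lambda c: math.dist(v, centers[c]))
            groups[idx].append(v)
        # keep the old center if a cluster ends up empty
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers

def closest_environments(test_vector, centers, m):
    """Indices of the m environment clusters nearest to the test utterance."""
    return sorted(range(len(centers)), key=lambda c: math.dist(test_vector, centers[c]))[:m]
```

The SA models of the clusters returned by `closest_environments` would then be the candidates for the linear combination.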