• Title/Summary/Keyword: hierarchy clustering

Search Result 129

Automatic e-mail Hierarchy Classification using Dynamic Category Hierarchy and Principal Component Analysis (PCA와 동적 분류체계를 사용한 자동 이메일 계층 분류)

  • Park, Sun
    • Journal of Advanced Navigation Technology / v.13 no.3 / pp.419-425 / 2009
  • The volume of incoming e-mail is increasing rapidly with the widespread use of the Internet, so incoming e-mails need to be classified efficiently and accurately. Current e-mail classification techniques focus on two-way classification, filtering spam from normal mail, mainly with Bayesian and rule-based methods. Clustering has been used for multi-way classification of e-mails, but it suffers from low classification accuracy and produces no category labels. Classification methods, in turn, require the user to train the classifier and define the category labels. In this paper, we propose a novel multi-way e-mail hierarchy classification method that uses PCA for automatic category generation and a dynamic category hierarchy for high classification accuracy. It classifies large volumes of incoming e-mail automatically, efficiently, and accurately.


MD-TIX: Multidimensional Type Inheritance Indexing for Efficient Execution of XML Queries (MD-TIX: XML 질의의 효율적 처리를 위한 다차원 타입상속 색인기법)

  • Lee, Jong-Hak
    • Journal of Korea Multimedia Society / v.10 no.9 / pp.1093-1105 / 2007
  • This paper presents a multidimensional type inheritance indexing technique (MD-TIX) for XML databases. We use a multidimensional file organization as the index structure. Conventional XML database indexing techniques that use one-dimensional index structures do not efficiently handle complex queries involving both nested elements and type inheritance hierarchies. We extend a two-dimensional type hierarchy indexing technique (2D-THI) to index the nested elements of XML databases. 2D-THI is an indexing scheme for a simple element in a type hierarchy; it deals with the problem of clustering elements in a two-dimensional domain space consisting of the key value domain and the type identifier domain. In our extended scheme, we handle the clustering of index entries in a multidimensional domain space consisting of a key value domain and multiple type identifier domains, one per type hierarchy on a path expression. This scheme efficiently supports queries that involve search conditions on a nested element represented by an extended path expression. An extended path expression is a path expression in which every type hierarchy on the path can be substituted by an individual type or a subtype hierarchy.


Research on Low-energy Adaptive Clustering Hierarchy Protocol based on Multi-objective Coupling Algorithm

  • Li, Wuzhao;Wang, Yechuang;Sun, Youqiang;Mao, Jie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1437-1459 / 2020
  • A wireless sensor network (WSN) is a distributed sensor network whose terminals are sensors that can sense and monitor the environment. Sensors are typically battery-powered and deployed where the batteries are difficult to replace. Therefore, using node energy efficiently and extending the network's life cycle are problems that must be addressed. The low-energy adaptive clustering hierarchy (LEACH) protocol is an adaptive clustering topology algorithm that lets the nodes in the network consume energy in a relatively balanced way and prolongs the network lifetime. In this paper, a novel multi-objective LEACH protocol is proposed. To solve it, we design a multi-objective coupling algorithm based on the bat algorithm (BA), the glowworm swarm optimization algorithm (GSO), and the bacterial foraging optimization algorithm (BFO). The multi-objective coupling algorithm (MBGF) inherits the advantages of BA, GSO, and BFO; it is tested on the ZDT and SCH benchmarks, and the results show that MBGF is superior. The multi-objective coupling algorithm is then applied to the multi-objective LEACH protocol, and experimental results show that the protocol can greatly reduce node energy consumption and prolong the network life cycle.
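
The abstract does not spell out how LEACH elects cluster heads. As a rough illustration only, the sketch below follows the threshold rule commonly given for the original LEACH protocol, T(n) = p / (1 - p (r mod 1/p)), where p is the desired fraction of cluster heads and r is the current round. It is not the multi-objective variant proposed in this paper; the function names and the `eligible` set (nodes that have not yet served as head in the current epoch) are assumptions made for the example.

```python
import random


def leach_threshold(p: float, round_no: int) -> float:
    """Election threshold for a node that has not been head in this epoch.

    p is the desired fraction of cluster heads; one epoch lasts 1/p rounds.
    """
    epoch_len = round(1 / p)
    return p / (1 - p * (round_no % epoch_len))


def elect_cluster_heads(node_ids, eligible, p, round_no, rng):
    """Each eligible node draws a number in [0, 1) and becomes a cluster
    head when the draw falls below the threshold for this round."""
    threshold = leach_threshold(p, round_no)
    return [n for n in node_ids if n in eligible and rng.random() < threshold]


if __name__ == "__main__":
    rng = random.Random(42)
    nodes = list(range(20))
    heads = elect_cluster_heads(nodes, set(nodes), p=0.1, round_no=0, rng=rng)
    print("cluster heads in round 0:", heads)
```

The threshold grows as the epoch progresses, so nodes that have not yet served become increasingly likely to be elected, which is what spreads the energy cost of being a cluster head across the network.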

Energy Efficient Cooperative LEACH Protocol for Wireless Sensor Networks

  • Asaduzzaman, Asaduzzaman;Kong, Hyung-Yun
    • Journal of Communications and Networks / v.12 no.4 / pp.358-365 / 2010
  • We develop a low-complexity cooperative diversity protocol for wireless sensor networks based on low energy adaptive clustering hierarchy (LEACH). A cross-layer approach is used to obtain spatial diversity in the physical layer. In this paper, a simple modification to the clustering algorithm of the LEACH protocol is proposed to exploit virtual multiple-input multiple-output (MIMO) based user cooperation. In lieu of selecting a single cluster-head at the network layer, we propose M cluster-heads in each cluster to obtain a diversity order of M in long-distance communication. Due to the broadcast nature of wireless transmission, the cluster-heads are able to receive data from sensor nodes at the same time. This ensures the synchronization required to implement a virtual MIMO based space time block code (STBC) in the transmission from the cluster-heads to the sink node. An analytical method to evaluate the energy consumption based on the BER curve is presented. Analysis and simulation results show that the proposed cooperative LEACH protocol can save a large amount of energy over the LEACH protocol with the same data rate, bit error rate, delay, and bandwidth requirements. Moreover, this proposal can achieve higher-order diversity with improved spectral efficiency compared to other virtual MIMO based protocols.

Selection of Cluster Hierarchy Depth in Hierarchical Clustering using K-Means Algorithm (K-means 알고리즘을 이용한 계층적 클러스터링에서의 클러스터 계층 깊이 선택)

  • Lee, Won-Hee;Lee, Shin-Won;Chung, Sung-Jong;An, Dong-Un
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.2 / pp.150-156 / 2008
  • Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means reduces the time complexity when the number of variables is large. Considering simplicity, quality, and efficiency, we combine the two approaches in a new system named CONDOR, which builds a hierarchical structure from document clustering using the K-means algorithm. We evaluate performance for different hierarchy depths and for initial centroid numbers that vary with the relative number of documents corresponding to the given queries. Compared with the regular method, in which the initial centroids are established in advance, our method improves performance considerably.
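
The abstract gives no implementation details for CONDOR, so the following is only a minimal sketch of the general idea of building a cluster hierarchy top-down with K-means and a depth limit. It is not the authors' system: the plain Lloyd-style `kmeans`, the `build_hierarchy` helper, and the dictionary layout are all assumptions chosen for illustration.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain K-means on tuples of floats; returns the non-empty groups."""
    centroids = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            groups[nearest].append(p)
        # Recompute centroids; an empty group keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def build_hierarchy(points, k, depth, rng):
    """Recursively split the points with K-means until the depth limit is
    reached or a node holds too few points to split into k groups."""
    if depth == 0 or len(points) <= k:
        return {"points": points, "children": []}
    children = [build_hierarchy(g, k, depth - 1, rng) for g in kmeans(points, k, rng)]
    return {"points": points, "children": children}


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    tree = build_hierarchy(data, k=2, depth=2, rng=random.Random(0))
    for child in tree["children"]:
        print(sorted(child["points"]))
```

Each level costs only a few linear passes over the data, which is the motivation the abstract gives for preferring K-means over quadratic-time agglomerative clustering; the `depth` argument is the parameter whose choice the paper studies.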

A Study on clustering method for Banlancing Energy Consumption in Hierarchical Sensor Network (계층적 센서 네트워크에서 균등한 에너지 소비를 위한 클러스터링 기법에 관한 연구)

  • Kim, Yo-Sup;Hong, Yeong-Pyo;Cho, Young-Il;Kim, Jin-Su;Eun, Jong-Won;Lee, Jong-Yong;Lee, Sang-Hun
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.9 / pp.3472-3480 / 2010
  • Clustering techniques for energy-efficient wireless sensor networks achieve energy efficiency by reducing the number of communications between sensor nodes and the sink node. In this paper, we first analyze the clustering techniques of the distributed clustering routing protocols LEACH (Low Energy Adaptive Clustering Hierarchy) and HEED (Hybrid, Energy-Efficient Distributed Clustering Approach). Based on this analysis, we propose a new energy-efficient clustering technique that delays the occurrence of dead nodes as long as possible and increases the lifetime of the network. In the proposed method, the node elected as cluster head is the most efficient one, chosen based on the residual energy of each member node and the location information between the sink node and the cluster nodes. The purpose of this study is to spread energy consumption evenly across individual nodes during data transfer to the sink node, and thereby to extend the lifetime of the entire network. We verify the performance of the proposed method through simulation and compare it with existing clustering techniques. The results confirm an improvement of approximately 5-10% in network life cycle over the existing methods.

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.111-125 / 2011
  • Clustering is a process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to the cluster. In this way, clustering facilitates fast and correct search for relevant documents by narrowing the search to the documents belonging to related clusters. Effective clustering requires techniques for identifying similar documents, grouping them into a cluster, and discovering the concept that is most relevant to the cluster. One problem that often appears in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and could not validate the semantic hierarchical relationship between a complex concept and each of the simple concepts. To solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional Agglomerative Hierarchical Clustering algorithm to allow overlapping clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not as a tree but as a lattice in order to detect complex concepts. We developed a system that employs the HOC algorithm for complex concept detection. This system operates in three phases: 1) preprocessing of documents, 2) clustering using the HOC algorithm, and 3) validation of the semantic hierarchical relationships among the concepts in the lattice obtained from clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of the terms appearing in the documents. First, it applies stopword removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight, and the x-y coordinate value of each document is determined by combining the TF-IDF values of its terms. The clustering phase uses the HOC algorithm, in which the similarity between documents is calculated using the Euclidean distance. Initially, a cluster is generated for each document by grouping the documents that are closest to it. Then, the distance between any two clusters is measured, and the closest clusters are grouped into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, feature selection is applied to validate the appropriateness of the cluster concepts built by the HOC algorithm, checking whether they have meaningful hierarchical relationships. Feature selection is a method of extracting key features from a document by identifying important and representative terms in the document and assigning weight values to them. To select key features correctly, a method is needed to determine how each term contributes to the class of the document. Among several methods that achieve this goal, this paper adopts the $\chi^2$ statistic, which measures the degree of dependency of a term t on a class c and represents the relationship between t and c as a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out using the well-known Reuters-21578 news collection. The results showed that the HOC algorithm greatly contributes to detecting and producing complex concepts by generating the concept hierarchy in a lattice structure.
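
The abstract names the chi-square statistic for term and class dependency but does not give a formula. A minimal sketch follows, assuming the 2x2 contingency-table form commonly used for feature selection in text categorization; the function and parameter names are hypothetical and the counts in the usage example are made up for illustration.

```python
def chi_square(in_with: int, out_with: int, in_without: int, out_without: int) -> float:
    """Chi-square dependency between a term t and a class c.

    in_with     : documents in class c that contain t
    out_with    : documents outside c that contain t
    in_without  : documents in class c that do not contain t
    out_without : documents outside c that do not contain t
    """
    n = in_with + out_with + in_without + out_without
    denominator = (
        (in_with + in_without)        # documents in class c
        * (out_with + out_without)    # documents outside class c
        * (in_with + out_with)        # documents containing t
        * (in_without + out_without)  # documents not containing t
    )
    if denominator == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (in_with * out_without - in_without * out_with) ** 2 / denominator


if __name__ == "__main__":
    # Illustrative counts only: the term appears mostly inside the class.
    print(chi_square(in_with=40, out_with=5, in_without=10, out_without=45))
```

A value of zero means the term occurs independently of the class; larger values indicate stronger dependency, so terms can be ranked per class by this score.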

Gene Sequences Clustering for the Prediction of Functional Domain (기능 도메인 예측을 위한 유전자 서열 클러스터링)

  • Han Sang-Il;Lee Sung-Gun;Hou Bo-Kyeng;Byun Yoon-Sup;Hwang Kyu-Suk
    • Journal of Institute of Control, Robotics and Systems / v.12 no.10 / pp.1044-1049 / 2006
  • Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment tools rely on pairwise alignment and the Smith-Waterman algorithm to generate an alignment hierarchy. Therefore, with existing multiple alignment methods, the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences at one time without pairwise alignment. However, cross-matching subsequences that trigger inexact matching may be produced among the common subsequences found, so this paper proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST and CDD (Conserved Domain Database) searches were combined with a clustering tool. Our clustering and annotation tool consists of constructing the suffix tree, overlapping common subsequences, clustering gene sequences, and annotating gene clusters by BLAST and CDD search. The system was successfully evaluated with 36 gene sequences in the pentose phosphate pathway, producing 10 clusters, finding representative common subsequences, and finally identifying functional domains by searching the CDD database.

2D-THI: Two-Dimensional Type Hierarchy Index for XML Databases (2D-THI: XML 데이테베이스를 위한 이차원 타입상속 계층색인)

  • Lee Jong-Hak
    • Journal of Korea Multimedia Society / v.9 no.3 / pp.265-278 / 2006
  • This paper presents a two-dimensional type inheritance hierarchy index (2D-THI) for XML databases. XML Schema is one of the schema models for XML documents that supports type inheritance. Conventional indexing techniques for XML databases cannot support XML queries on type inheritance hierarchies. We construct a two-dimensional index structure using multidimensional file organizations to support type inheritance hierarchies in XML queries. This indexing technique deals with the problem of clustering index entries in the two-dimensional domain space that consists of a key element domain and a type identifier domain, based on the user query pattern. The index enhances query performance by adjusting the degree of clustering between the two domains. For performance evaluation, we compared the proposed 2D-THI, through a cost model, with conventional class hierarchy indexing techniques for object-oriented databases such as the CH-index and the CG-tree. The evaluation verified that the proposed two-dimensional type inheritance indexing technique can efficiently support query processing in XML databases according to the query types.


Clustering-based Hierarchical Scene Structure Construction for Movie Videos (영화 비디오를 위한 클러스터링 기반의 계층적 장면 구조 구축)

  • Choi, Ick-Won;Byun, Hye-Ran
    • Journal of KIISE:Software and Applications / v.27 no.5 / pp.529-542 / 2000
  • In recent years, the use of multimedia information has been increasing rapidly, and video is the fastest-growing medium, integrating all other media into a single data stream. Although the availability of digital video has increased greatly, it is very difficult for users to access video effectively because of its length and unstructured format. Thus, minimal user interaction and an explicit definition of video structure are key requirements in recently developed image and video management systems. This paper defines the terms and the hierarchical video structure, and presents a system that constructs a clustering-based video hierarchy, which helps users browse a summary and access the video content randomly. Instead of using a single feature and domain-specific thresholds, we use multiple features that complement one another and clustering-based methods that use normalization, so that interaction with users is minimal. The shot boundary detection stage extracts multiple features, applies an adaptive filtering process to each feature to enhance performance by eliminating false factors, and performs k-means clustering with two classes. The resulting shot list is then represented as a video hierarchy by an intelligent unsupervised clustering technique. We experimented with static and dynamic movie videos that represent the characteristics of various video types. Shot boundary detection achieved about 95% or better performance, and the video hierarchy results were also good.
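
The abstract says shot boundary detection ends with k-means clustering into two classes, which replaces a hand-tuned threshold. The sketch below illustrates that idea on a single one-dimensional feature; the feature (a per-frame difference score), the function names, and the sample values are all assumptions, and the paper's multi-feature pipeline and adaptive filtering are not reproduced.

```python
def two_means_1d(values, iters=50):
    """K-means with k=2 on scalar values; returns (low_center, high_center).

    Centers start at the minimum and maximum of the data.
    """
    low_c, high_c = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - low_c) <= abs(v - high_c)]
        high = [v for v in values if abs(v - low_c) > abs(v - high_c)]
        if not high:
            break  # all values are identical, nothing to separate
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if (new_low, new_high) == (low_c, high_c):
            break  # assignments are stable
        low_c, high_c = new_low, new_high
    return low_c, high_c


def boundary_flags(frame_diffs):
    """Mark a frame transition as a shot boundary when its difference score
    lies closer to the high cluster center than to the low one."""
    low_c, high_c = two_means_1d(frame_diffs)
    return [abs(v - high_c) < abs(v - low_c) for v in frame_diffs]


if __name__ == "__main__":
    # Hypothetical normalized frame-difference scores.
    diffs = [0.10, 0.20, 0.10, 0.90, 0.15, 0.95]
    print(boundary_flags(diffs))
```

Because the split point is derived from the data itself, no domain-specific threshold has to be supplied, which matches the abstract's goal of minimal user interaction.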
