• Title/Summary/Keyword: clustering technique

Double monothetic clustering for histogram-valued data

  • Kim, Jaejik;Billard, L.
    • Communications for Statistical Applications and Methods / v.25 no.3 / pp.263-274 / 2018
  • One of the common issues in large dataset analyses is to detect and construct homogeneous groups of objects in those datasets. This is typically done by some form of clustering technique. In this study, we present a divisive hierarchical clustering method for two monothetic characteristics of histogram data. Unlike a classical data point, a histogram carries its own internal variation as well as location information. However, to find the optimal bipartition, existing divisive monothetic clustering methods for histogram data consider only location information as a monothetic characteristic, so they cannot distinguish histograms with the same location but different internal variations. Thus, a divisive clustering method considering both the location and the internal variation of histograms is proposed in this study. The method has an advantage in interpreting clustering outcomes because it provides a binary question for each split. The proposed clustering method is verified through a simulation study and applied to a large U.S. house property value dataset.
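
The idea of splitting on either location or internal variation can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each histogram is summarized by its mean (location) and standard deviation (internal variation) under a uniform-within-bin assumption, and it assumes the split criterion is the within-cluster sum of squares. The function names `histogram_mean_sd` and `best_binary_question` are hypothetical.

```python
from math import sqrt

def histogram_mean_sd(bins, probs):
    """Mean and SD of a histogram, assuming values are uniform within each bin.

    bins  : list of (lower, upper) bin edges
    probs : relative frequency of each bin (should sum to 1)
    """
    mean = sum(p * (a + b) / 2 for (a, b), p in zip(bins, probs))
    # E[X^2] of a uniform variable on [a, b] is (a^2 + a*b + b^2) / 3.
    second = sum(p * (a * a + a * b + b * b) / 3 for (a, b), p in zip(bins, probs))
    return mean, sqrt(max(second - mean * mean, 0.0))

def sse(points):
    """Sum of squared deviations from the centroid (assumed split criterion)."""
    if not points:
        return 0.0
    dims = len(points[0])
    centers = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    return sum((p[d] - centers[d]) ** 2 for p in points for d in range(dims))

def best_binary_question(features):
    """Try every cut on feature 0 (mean) and feature 1 (SD); keep the best.

    Returns (feature_index, cut, left_ids, right_ids), i.e. the binary
    question "is feature <= cut?", or None if no split is possible.
    """
    best = None
    for d in range(2):
        values = sorted(set(f[d] for f in features))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [i for i, f in enumerate(features) if f[d] <= cut]
            right = [i for i, f in enumerate(features) if f[d] > cut]
            cost = sse([features[i] for i in left]) + sse([features[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, d, cut, left, right)
    return best[1:] if best else None

# Four histograms with the same location but different spreads.
narrow = ([(4, 6)], [1.0])
wide = ([(0, 10)], [1.0])
features = [histogram_mean_sd(*h) for h in (narrow, narrow, wide, wide)]
question = best_binary_question(features)
```

In this toy input no cut on the mean exists because all four means coincide, so the only available binary question is on the standard deviation, which is the situation the abstract says location-only methods cannot handle.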

Enhancing Text Document Clustering Using Non-negative Matrix Factorization and WordNet

  • Kim, Chul-Won;Park, Sun
    • Journal of information and communication convergence engineering / v.11 no.4 / pp.241-246 / 2013
  • A classic document clustering technique may incorrectly classify documents into different clusters when documents that should belong to the same cluster do not have any shared terms. Recently, to overcome this problem, internal and external knowledge-based approaches have been used for text document clustering. However, the clustering results of these approaches are influenced by the inherent structure and the topical composition of the documents. Further, the organization of knowledge into an ontology is expensive. In this paper, we propose a new enhanced text document clustering method using non-negative matrix factorization (NMF) and WordNet. The semantic terms extracted as cluster labels by NMF can represent the inherent structure of a document cluster well. The proposed method can also improve the quality of document clustering that uses cluster labels and term weights based on term mutual information of WordNet. The experimental results demonstrate that the proposed method achieves better performance than the other text clustering methods.

Semantic Correspondence of Database Schema from Heterogeneous Databases using Self-Organizing Map

  • Dumlao, Menchita F.;Oh, Byung-Joo
    • Journal of IKEEE / v.12 no.4 / pp.217-224 / 2008
  • This paper provides a framework for semantic correspondence of heterogeneous databases using a self-organizing map. It solves the problem of overlap between different databases caused by their different schemas. A clustering technique using self-organizing maps (SOM) is tested and evaluated to assess its performance on different kinds of data. Before clustering, the databases are preprocessed using an edit distance algorithm, principal component analysis (PCA), and a normalization function to identify the features necessary for clustering.
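
As an illustration of the edit distance step, here is a minimal sketch of the standard Levenshtein distance. The abstract does not say how the distance is applied, so comparing schema field names and the helper `normalized_similarity` are assumptions made for this example only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical helper: map the distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

# Hypothetical field names from two schemas.
distance = edit_distance("cust_name", "customer_name")
```

A similarity of this kind could serve as one numeric feature per field pair before PCA and normalization, though the paper's actual feature construction is not described in the abstract.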

Document Clustering Technique by Domain Ontology (도메인 온톨로지에 의한 문서 군집화 기법)

  • Kim, Woosaeng;Guan, Xiang-Dong
    • Journal of Information Technology Applications and Management / v.23 no.2 / pp.143-152 / 2016
  • Document clustering lets us organize, manage, search, and process documents efficiently. In general, documents are clustered in a high-dimensional feature space because they consist of many terms. In this paper, we propose a new method that clusters documents efficiently in a low-dimensional feature space by finding the core concepts in a domain ontology corresponding to the subject area of the documents. The experiment shows that our clustering method performs well.

An Energy-Efficient Sensor Network Clustering Using the Hybrid Setup (하이브리드 셋업을 이용한 에너지 효율적 센서 네트워크 클러스터링)

  • Min, Hong-Ki
    • Journal of the Institute of Convergence Signal Processing / v.12 no.1 / pp.38-43 / 2011
  • Cluster-based routing suffers from high energy consumption at the cluster head nodes. A recent approach to resolving this problem is the dynamic clustering technique, which periodically re-selects cluster head nodes to distribute energy consumption across the sensor nodes. However, the dynamic clustering technique has the problem that repeatedly reconstructing the clusters consumes additional energy. This paper proposes a solution to these problems from the energy efficiency perspective. The round-robin cluster header (RRCH) technique, which fixes the initially structured clusters and selects cluster head nodes sequentially, is suggested to solve the energy consumption problem caused by repetitive cluster construction. Simulation results were compared with the performance of two of the most widely used conventional techniques, the LEACH (Low Energy Adaptive Clustering Hierarchy) and HEED (Hybrid, Energy Efficient, Distributed Clustering) algorithms, in terms of energy consumption, remaining energy for each node, and uniform distribution. The evaluation confirmed that, in terms of energy consumption, the proposed technique was 26.5% and 20% more efficient than LEACH and HEED, respectively.
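
The core idea, fixed clusters with sequentially rotating heads, can be sketched in a few lines. This is only an illustration of round-robin rotation under simplifying assumptions: it ignores node energy levels, node failures, and all protocol messaging, and the function name `round_robin_heads` and the data layout are hypothetical.

```python
from typing import Dict, List

def round_robin_heads(clusters: Dict[str, List[str]], round_no: int) -> Dict[str, str]:
    """Pick the head of each fixed cluster for a given round.

    clusters : cluster id -> ordered list of member node ids, built once
               at setup and never reconstructed afterwards
    round_no : zero-based round counter
    """
    return {
        cluster_id: members[round_no % len(members)]  # rotate through members
        for cluster_id, members in clusters.items()
        if members  # skip empty clusters
    }

clusters = {"c1": ["a", "b", "c"], "c2": ["x", "y"]}
schedule = [round_robin_heads(clusters, r) for r in range(4)]
```

Because membership never changes, the only per-round cost in this sketch is the index computation, which is the contrast with dynamic re-clustering that the abstract draws.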

Korean Language Clustering using Word2Vec (Word2Vec를 이용한 한국어 단어 군집화 기법)

  • Heu, Jee-Uk
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.5 / pp.25-30 / 2018
  • Recently, with the development of Internet technology, research areas such as data retrieval and extraction have become increasingly important for providing information efficiently and quickly. In particular, techniques for analyzing and finding words that are semantically similar to a given Korean word, such as a compound word or a newly coined word, are necessary because the meaning of such words is not easy to grasp. To handle this problem, word clustering is one technique that groups the words similar to a given word. In this paper, we propose a Korean language clustering technique that clusters similar words by embedding the words from the given documents using Word2Vec.

A Web Personalized Recommender System Using Clustering-based CBR (클러스터링 기반 사례기반추론을 이용한 웹 개인화 추천시스템)

  • Hong, Tae-Ho;Lee, Hee-Jung;Suh, Bo-Mil
    • Journal of Intelligence and Information Systems / v.11 no.1 / pp.107-121 / 2005
  • Recently, recommendation systems and collaborative filtering have been studied extensively in both research and practice. However, although product items may have multi-valued attributes, previous studies did not reflect them. To overcome this limitation, this paper proposes a new methodology for recommendation systems. The proposed methodology uses multi-valued attributes, based on a clustering technique for items, and applies collaborative filtering to provide accurate recommendations. In the proposed methodology, both user clustering-based CBR and item attribute clustering-based CBR techniques are applied to collaborative filtering so that item-to-item correlation is considered as well as user-to-user correlation. The multi-valued attribute-based clustering technique for items allows the characteristics of items to be identified clearly. Extensive experiments were performed with MovieLens data to validate the proposed methodology. The results show that the proposed methodology outperforms the benchmarked methodologies: Case Based Reasoning Collaborative Filtering (CBR_CF) and User Clustering Case Based Reasoning Collaborative Filtering (UC_CBR_CF).

The Clustering Threshold Image Processing Technique in fMRI (핵자기 뇌기능 영상에서 군집경계기법을 이용한 영상처리법)

  • Jeong, Sun-Cheol;No, Yong-Man;Jo, Jang-Hui
    • Journal of Biomedical Engineering Research / v.16 no.4 / pp.425-430 / 1995
  • The correlation technique has been widely used in fMRI data processing. The proposed CLT (clustering threshold) technique is a modified CCT (correlation coefficient threshold) technique and has many advantages over the conventional CCT technique. The CLT technique consists of the following two steps. First, the correlation coefficient map above the proper threshold (TH) value is obtained using the CCT technique; because this map is discrete and includes splash noise, the spurious pixels are rejected and the real neural activity pixels are extracted using an n×n matrix box. Second, a clustering operation is performed using two correction rules. With the proposed CLT technique, the truly activated neuronal pixels can be clustered and the spurious pixels can be suppressed. Used in fMRI post-processing, the proposed CLT technique has advantages over other existing techniques. In particular, it proved to be robust in noisy environments.
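
The first step, thresholding followed by rejection of isolated pixels inside an n×n box, might look roughly like the sketch below. The abstract does not state the rejection rule or the two correction rules, so the neighbor-count criterion (`min_neighbors`) and the function name `clustering_threshold` are assumptions for illustration, not the paper's method.

```python
def clustering_threshold(corr_map, th, n=3, min_neighbors=2):
    """Threshold a correlation map, then drop isolated supra-threshold pixels.

    corr_map      : 2-D list of correlation coefficients
    th            : correlation threshold (the CCT step)
    n             : odd side length of the n x n neighborhood box
    min_neighbors : assumed rule: a pixel survives only if at least this
                    many other pixels in its box are also above threshold
    """
    rows, cols = len(corr_map), len(corr_map[0])
    above = [[value > th for value in row] for row in corr_map]
    half = n // 2
    kept = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not above[r][c]:
                continue
            neighbors = 0
            # Scan the box around (r, c), clipped at the image border.
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    if (rr, cc) != (r, c) and above[rr][cc]:
                        neighbors += 1
            kept[r][c] = neighbors >= min_neighbors
    return kept

# Toy map: a small cluster of three pixels plus one isolated noise pixel.
corr = [[0.0] * 5 for _ in range(5)]
for r, c in [(1, 1), (1, 2), (2, 1), (4, 4)]:
    corr[r][c] = 0.9
mask = clustering_threshold(corr, th=0.5)
```

In the toy map the three adjacent pixels support each other and survive, while the lone pixel at the corner is treated as splash noise and removed.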

A New Approach to Spatial Pattern Clustering based on Longest Common Subsequence with application to a Grocery (공간적 패턴클러스터링을 위한 새로운 접근방법의 제안 : 슈퍼마켓고객의 동선분석)

  • Jung, In-Chul;Kwon, Young-S.
    • IE interfaces / v.24 no.4 / pp.447-456 / 2011
  • Identifying the major movement patterns of shoppers on the selling floor has been a longstanding issue in the retailing industry. With the advent of RFID technology, it has become easier to collect movement data for individual shoppers. Most previous studies used traditional clustering techniques to identify the major movement patterns of customers. However, because of spatial constraints (aisle layout or other physical obstructions in the store), standard clustering methods are not feasible for movement data: shopping paths must be adjusted before the analysis, which is time-consuming and causes data distortion. To alleviate these problems, we propose a new approach to spatial pattern clustering based on the longest common subsequence (LCSS). Experimental results using real data obtained from a grocery store in Seoul show that the proposed method performs well in finding hot spots and dead spots as well as the major path patterns of customer movements.
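
To make the longest common subsequence idea concrete, here is a minimal sketch that compares two shopping paths. It assumes each path has already been discretized into an ordered list of zone labels and that similarity is the LCS length divided by the shorter path length; both choices, along with the zone names, are assumptions and not details taken from the paper.

```python
def lcs_length(p, q):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(q) + 1)
    for x in p:
        curr = [0]
        for j, y in enumerate(q, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one element
        prev = curr
    return prev[-1]

def lcs_similarity(p, q):
    """Assumed normalization: share of the shorter path that is common."""
    shortest = min(len(p), len(q))
    return 0.0 if shortest == 0 else lcs_length(p, q) / shortest

# Hypothetical zone sequences for two shoppers.
path_a = ["entrance", "produce", "dairy", "bakery", "checkout"]
path_b = ["entrance", "dairy", "snacks", "checkout"]
similarity = lcs_similarity(path_a, path_b)
```

Because a subsequence only requires matching order, not adjacency, two shoppers who visit the same zones in the same order are scored as similar even when one takes detours, which suits paths constrained by aisle layout.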

Topical Clustering Techniques of Twitter Documents Using Korean Wikipedia (한글 위키피디아를 이용한 트위터 문서의 주제별 클러스터링 기법)

  • Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.5 / pp.189-196 / 2014
  • Recently, the need for document retrieval has been growing in SNS environments such as Twitter. To support Twitter search, a clustering technique that classifies the massive number of retrieved documents by topic is required. However, due to the nature of Twitter, there is a limit to applying previous simple techniques to the clustering of Twitter documents. To overcome this problem, we propose a new clustering technique suited to the Twitter environment. In the proposed method, we add new terms to the feature vectors representing the Twitter documents and recalculate the feature weights using Korean Wikipedia. In addition, we performed experiments with Korean Twitter documents and demonstrated the usability of the proposed method through a performance comparison with previous techniques.