• Title/Abstract/Keywords: and clustering

Search results: 5,641

Document Clustering Technique by Domain Ontology (도메인 온톨로지에 의한 문서 군집화 기법)

  • Kim, Woosaeng;Guan, Xiang-Dong
    • Journal of Information Technology Applications and Management / v.23 no.2 / pp.143-152 / 2016
  • Document clustering lets us organize, manage, search, and process documents efficiently. In general, documents are clustered in a high-dimensional feature space because they consist of many terms. In this paper, we propose a new method that clusters documents efficiently in a low-dimensional feature space by finding the core concepts in a domain ontology corresponding to the documents' subject area. The experiment shows that our clustering method performs well.
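
The idea of moving from a term space to a small concept space can be illustrated with a minimal sketch. The term-to-concept mapping, the concept list, and the function name `concept_vector` are hypothetical stand-ins for a real domain ontology; the abstract does not say how the authors select core concepts, so this only shows the dimensionality reduction step under those assumptions.

```python
from collections import Counter

# Hypothetical term-to-concept mapping standing in for a domain ontology.
TERM_TO_CONCEPT = {
    "router": "network", "switch": "network", "packet": "network",
    "query": "database", "index": "database", "table": "database",
}
# Assumed list of core concepts; each one becomes a feature dimension.
CONCEPTS = ["network", "database"]


def concept_vector(text):
    """Map a document to counts over the core concepts instead of raw terms."""
    counts = Counter(
        TERM_TO_CONCEPT[word]
        for word in text.lower().split()
        if word in TERM_TO_CONCEPT
    )
    return [counts[concept] for concept in CONCEPTS]


print(concept_vector("The router drops a packet"))
print(concept_vector("index the table before the query"))
```

Each document is now a vector with one dimension per concept rather than one per term, so any standard clustering algorithm can run on far fewer features.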

Neutron clustering in Monte Carlo iterated-source calculations

  • Sutton, Thomas M.;Mittal, Anudha
    • Nuclear Engineering and Technology / v.49 no.6 / pp.1211-1218 / 2017
  • Monte Carlo neutron transport codes generally use the method of successive generations to converge the fission source distribution to, and then maintain it at, the fundamental mode. Recently, a phenomenon called "clustering" has been noted, which produces fission distributions that are very far from the fundamental mode. In this study, a mathematical model of clustering in Monte Carlo has been developed. The model draws on previous work for continuous-time birth-death processes, as well as methods from the field of population genetics.

Genomic Tree of Gene Contents Based on Functional Groups of KEGG Orthology

  • Kim Jin-Sik;Lee Sang-Yup
    • Journal of Microbiology and Biotechnology / v.16 no.5 / pp.748-756 / 2006
  • We propose a genome-scale clustering approach to identify whole-genome relationships using the functional groups given by the Kyoto Encyclopedia of Genes and Genomes Orthology (KO) database. The metabolic capabilities of each organism were defined by the number of genes in each functional category. The archaeal, bacterial, and eukaryotic genomes were compared simultaneously by applying a two-step clustering method consisting of a self-organizing tree algorithm followed by unsupervised hierarchical clustering. The clustering results were consistent with various phenotypic characteristics of the organisms analyzed and, additionally, showed a different aspect of the relationships between genomes that have previously been established through rRNA-based comparisons. The proposed approach of collecting and clustering the metabolic functional capabilities of organisms should be a useful tool for predicting relationships among organisms.

Damage identification for high-speed railway truss arch bridge using fuzzy clustering analysis

  • Cao, Bao-Ya;Ding, You-Liang;Zhao, Han-Wei;Song, Yong-Sheng
    • Structural Monitoring and Maintenance / v.3 no.4 / pp.315-333 / 2016
  • This study performs damage identification for the Da-Sheng-Guan (DSG) high-speed railway truss arch bridge using fuzzy clustering analysis. First, a structural health monitoring (SHM) system is established for the DSG Bridge. Long-term field-monitored strain data from 8 different cases caused by high-speed trains are taken as the classification reference for other, unknown cases, and a finite element model (FEM) of the DSG Bridge is established to simulate damage cases. Then, the effectiveness of one fuzzy clustering analysis method, the transitive closure method, and the FEM results are verified using the monitored strain data. Three standardization methods for the first step of the transitive closure method are compared: the extreme difference method, the maximum method, and the non-standard method. Finally, the fuzzy clustering method is used to identify damage of different degrees and at different locations. The results show that the non-standard method is the best for data with the same dimension at the first step of fuzzy clustering analysis. The clustering result is best when 8-carriage and 16-carriage trains on the same line are placed in one category. For the DSG Bridge, damage is identified when the strain mode change caused by damage is more significant than that caused by different carriages. The corresponding critical damage degree, called the damage threshold, varies with damage location and decreases as the number of damage locations increases.
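
The transitive closure method named in this abstract can be sketched in its generic textbook form: a reflexive, symmetric fuzzy similarity matrix is repeatedly composed with itself under max-min composition until it stops changing, and a threshold (lambda-cut) then splits the samples into clusters. The similarity values below are made up for illustration, the function names are hypothetical, and the standardization step the paper compares is omitted.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relations."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(relation):
    """Square a reflexive, symmetric relation until it no longer changes."""
    while True:
        squared = max_min_compose(relation, relation)
        if squared == relation:
            return relation
        relation = squared


def lambda_cut_clusters(closure, lam):
    """Group indices whose closure similarity is at least lam."""
    seen = set()
    clusters = []
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        clusters.append(group)
    return clusters


# Illustrative similarity matrix for three samples (not data from the paper).
similarity = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
closure = transitive_closure(similarity)
print(lambda_cut_clusters(closure, 0.5))  # [[0, 1], [2]]
```

Lowering the threshold merges clusters: with this matrix, a cut at 0.4 puts all three samples in one group.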

A Non-linear Variant of Global Clustering Using Kernel Methods (커널을 이용한 전역 클러스터링의 비선형화)

  • Heo, Gyeong-Yong;Kim, Seong-Hoon;Woo, Young-Woon
    • Journal of the Korea Society of Computer and Information / v.15 no.4 / pp.11-18 / 2010
  • Fuzzy c-means (FCM) is a simple but efficient clustering algorithm based on the concept of a fuzzy set, and it has proved useful in many areas. There are, however, several well-known problems with FCM, such as sensitivity to initialization, sensitivity to outliers, and limitation to convex clusters. In this paper, global fuzzy c-means (G-FCM) and kernel fuzzy c-means (K-FCM) are combined to form a non-linear variant of G-FCM, called kernel global fuzzy c-means (KG-FCM). G-FCM is a variant of FCM that uses an incremental seed selection method and is effective in alleviating sensitivity to initialization. There are several approaches to reducing the influence of noise and accommodating non-convex clusters, and K-FCM is one of them. K-FCM is used in this paper because it can easily be extended with different kernels. By combining G-FCM and K-FCM, KG-FCM can resolve the shortcomings mentioned above. The usefulness of the proposed method is demonstrated by experiments using artificial and real-world data sets.
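
For orientation, here is a minimal sketch of plain FCM only, the baseline that the abstract builds on. It is not G-FCM, K-FCM, or the proposed KG-FCM. It assumes Euclidean distance, a fuzzifier of m = 2, caller-supplied initial centers, and a fixed iteration count; the function names are illustrative.

```python
import math


def memberships(points, centers, m=2.0):
    """Return u[j][i], the degree to which point j belongs to cluster i."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances away from zero so a point sitting exactly on a
        # center does not cause a division by zero.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        u.append([1.0 / sum((d_i / d_k) ** exponent for d_k in d) for d_i in d])
    return u


def update_centers(points, u, m=2.0):
    """Recompute each center as the membership-weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[t] for w, p in zip(weights, points)) / total
            for t in range(dim)
        ))
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(initial_centers)
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, u = fuzzy_c_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Because the result depends on `initial_centers`, this sketch also shows where the sensitivity to initialization described above enters the algorithm.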

A Study on Clustering and Identifying Gene Sequences using Suffix Tree Clustering Method and BLAST (서픽스트리 클러스터링 방법과 블라스트를 통합한 유전자 서열의 클러스터링과 기능검색에 관한 연구)

  • Han, Sang-Il;Lee, Sung-Gun;Kim, Kyung-Hoon;Lee, Ju-Yeong;Kim, Young-Han;Hwang, Kyu-Suk
    • Journal of Institute of Control, Robotics and Systems / v.11 no.10 / pp.851-856 / 2005
  • DNA and protein data from diverse species are discovered daily and deposited in public archives in established formats. Database systems in the public archives provide not only an easy-to-use, flexible public interface but also in silico analysis tools for unidentified sequence data. Among such tools, multiple sequence alignment [1] methods relying on pairwise alignment and the Smith-Waterman algorithm [2] enable us to identify unknown DNA or protein sequences, or the phylogenetic relations among several species. However, with the existing multiple alignment methods, the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences at one time without pairwise alignment. In addition, cross-matching subsequences that trigger inexact matching may be produced among the common subsequences found, so a cross-matching masking process is proposed in this paper. To identify the function of the clusters generated by suffix tree clustering, BLAST was combined with the clustering tool. Our clustering and annotation tool works in the following steps: (1) construction of the suffix tree; (2) masking of cross-matching pairs; (3) clustering of gene sequences; and (4) annotation of gene clusters by BLAST search. The system was successfully evaluated with 22 gene sequences in the pyruvate pathway of bacteria, producing 7 clusters and finding representative common subsequences for each cluster.

Online Clustering Algorithms for Semantic-Rich Network Trajectories

  • Roh, Gook-Pil;Hwang, Seung-Won
    • Journal of Computing Science and Engineering / v.5 no.4 / pp.346-353 / 2011
  • With the advent of ubiquitous computing, a massive amount of trajectory data has been published and shared on many websites. This type of computing also provides motivation for online mining of trajectory data, to fit user-specific preferences or context (e.g., time of day). While many trajectory clustering algorithms have been proposed, they have typically focused on offline mining and do not consider the restrictions of the underlying road network or the selection conditions representing user contexts. In clear contrast, we study an efficient clustering algorithm for Boolean + Clustering queries using a pre-materialized and summarized data structure. Our experimental results demonstrate the efficiency and effectiveness of our proposed method using real-life trajectory data.

Technology Clustering Using Textual Information of Reference Titles in Scientific Paper (과학기술 논문의 참고문헌 텍스트 정보를 활용한 기술의 군집화)

  • Park, Inchae;Kim, Songhee;Yoon, Byungun
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.2 / pp.25-32 / 2020
  • Data on patents and scientific papers are considered a useful source for analyzing technological information and have been widely utilized. Technology big data is analyzed in various ways to identify the latest technological trends and predict promising future technologies. Clustering is one way to discover new features by creating groups from technology big data. Patents include refined bibliographic information such as patent classification codes, whereas scientific papers lack bibliographic information suitable for clustering. This research proposes a new approach for clustering scientific paper data by utilizing the reference titles in each paper. In this approach, the reference titles are treated as textual information because each reference includes the title of the cited paper, which represents that paper's core content. We collected the scientific paper data, extracted the reference titles, and conducted clustering by measuring text-based similarity. The results of the proposed approach are compared with those of two existing methodologies: one that uses textual information from titles and abstracts, and a citation-based approach. The proposed approach shows a statistically significant difference from the existing approaches and better clustering performance. It is expected to be a useful method for clustering scientific papers.
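
The abstract does not specify which text-based similarity measure is used. As one plausible reading, the sketch below pools the words of a paper's reference titles into a term-frequency vector and compares two papers by cosine similarity. The function names and the sample reference titles are illustrative assumptions, not data or code from the paper.

```python
import math
from collections import Counter


def title_vector(reference_titles):
    """Pool the words of all reference titles of one paper into term counts."""
    words = []
    for title in reference_titles:
        words.extend(title.lower().split())
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up reference lists for three hypothetical papers.
paper_a = title_vector(["fuzzy clustering methods", "kernel clustering"])
paper_b = title_vector(["kernel fuzzy clustering"])
paper_c = title_vector(["neutron transport codes"])

print(cosine(paper_a, paper_b), cosine(paper_a, paper_c))
```

A pairwise similarity matrix built this way could then be passed to any standard clustering algorithm, for example hierarchical clustering.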

An Energy Efficient Algorithm Based on Clustering Formulation and Scheduling for Proportional Fairness in Wireless Sensor Networks

  • Cheng, Yongbo;You, Xing;Fu, Pengcheng;Wang, Zemei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.2 / pp.559-573 / 2016
  • In this paper, we investigate the problem of achieving proportional fairness in hierarchical wireless sensor networks. Combining clustering formulation and scheduling, we maximize the total bandwidth utility for proportional fairness while keeping power consumption to a minimum. This problem is decomposed into two sub-problems and solved in two stages: the Clustering Formulation Stage and the Scheduling Stage. The resulting algorithm, called CSPF_PC, runs in a network formulation sequence. In the Clustering Formulation Stage, the sensor nodes join the cluster head nodes by adjusting their transmit power using a greedy strategy; in the Scheduling Stage, proportional fairness is achieved by scheduling the time-slot resource. Simulation results verify the superior performance of our algorithm over the compared algorithms on the fairness index.

Bulk Insertion Method for R-tree using Seeded Clustering (R-tree에서 Seeded 클러스터링을 이용한 다량 삽입)

  • 이태원;문봉기;이석호
    • Journal of KIISE:Databases / v.31 no.1 / pp.30-38 / 2004
  • In many scientific and commercial applications, such as the Earth Observation System (EOSDIS) and mobile phone services tracking a large number of clients, it is a daunting task to archive and index the ever-increasing volume of complex data that is continuously added to databases. To efficiently manage multidimensional data in scientific and data warehousing environments, R-tree based index structures have been widely used. In this paper, we propose a scalable technique called seeded clustering that allows us to maintain R-tree indexes by bulk insertion while keeping pace with high data arrival rates. Our approach uses a seed tree, which is copied from the top k levels of a target R-tree, to classify input data objects into clusters. We then build an R-tree for each of the clusters and insert the input R-trees into the target R-tree in bulk, one at a time. We present detailed algorithms for seeded clustering and bulk insertion, as well as the results of our extensive experimental study. The experimental results show that bulk insertion by seeded clustering outperforms the previously known methods in terms of insertion cost and the quality of the target R-trees as measured by their query performance.