• Title/Summary/Keyword: cluster similarity analysis

A Table Integration Technique Using Query Similarity Analysis

  • Choi, Go-Bong;Woo, Yong-Tae
    • Journal of the Korea Society of Computer and Information / v.24 no.3 / pp.105-112 / 2019
  • In this paper, we propose a technique that analyzes the similarity between SQL queries to assist in integrating similar tables. First, table information is extracted from the SQL queries by a query structure analyzer, and the similarity between tables is measured using the Jaccard index. Then, clusters of similar tables are generated by hierarchical cluster analysis, and the co-occurrence probability of the tables used in each query is calculated. Using the table similarity and the co-occurrence probability, the clusters are classified into three groups: integrable clusters, clusters requiring expert review, and clusters with low integration possibility. Because the technique analyzes the SQL queries actually in use and assesses the possibility of table integration independently of the existing business, the existing schema can be reconstructed effectively without interrupting work or incurring additional cost.
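
The Jaccard step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which table features are compared, so the sketch assumes each table is represented by the set of columns referenced in the query log; the table names, column names, and the `table_columns` structure are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical input: columns referenced for each table across a query log.
table_columns = {
    "customer": {"id", "name", "email", "phone"},
    "client": {"id", "name", "email", "fax"},
    "orders": {"id", "customer_id", "total"},
}

# Pairwise similarity for every unordered pair of tables.
names = sorted(table_columns)
similarity = {
    (s, t): jaccard(table_columns[s], table_columns[t])
    for i, s in enumerate(names)
    for t in names[i + 1:]
}

for pair, score in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A matrix of `1 - similarity` values could then be passed to a hierarchical clustering routine to form candidate table clusters, as the abstract outlines.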

How to quantify the similarity of 2D distributions: Comparison of spatial distribution of Dark Matter and Intracluster light

  • Yoo, Jaewon;Ko, Jongwan;Sabiu, Cristiano G.;Chun, Kyungwon;Shin, Jihye;Hwang, Ho Seong;Smith, Rory;Kim, Hyowon
    • The Bulletin of The Korean Astronomical Society / v.46 no.2 / pp.67.4-68 / 2021
  • In studying the dynamical evolution of galaxy clusters, one intriguing approach is to compare the spatial distributions of various components, such as the dark matter, the member galaxies, the gas, and the intracluster light (ICL; the diffuse light from stars that are not bound to any individual cluster galaxy). If we find a visible component whose spatial distribution coincides with the dark matter distribution, then we could draw a dark matter map without requiring laborious weak lensing analysis. Furthermore, if the component traces the dark matter distribution better for more relaxed galaxy clusters, we could use the similarity as an estimator of the dynamical stage of a galaxy cluster. We present a novel methodology to quantify the similarity of two or more 2-dimensional spatial distributions. We apply the method to a sample of galaxy clusters at different dynamical stages simulated within N-cluster Run, which is an N-body simulation using the galaxy replacement technique. Among the various components (stellar particles, galaxies, ICL), the velocity-defined ICL + brightest cluster galaxy (BCG) component traces the dark matter best. Among the sample galaxy clusters, the relaxed clusters show a stronger similarity between the spatial distributions of the dark matter and ICL+BCG than the dynamically young clusters.

Comparison of Results using Average Taxonomic Distance and Correlation Coefficient Matrices for Cluster Analyses (Cluster Analyses에서 Average Taxonomic Distance와 Correlation Coefficient 행렬식들을 이용한 결과의 비교)

  • Koh, Hung-Sun
    • The Korean Journal of Zoology / v.24 no.2 / pp.91-98 / 1981
  • It has been confirmed that the two dendrograms resulting from two similarity matrices, the average taxonomic distance matrix and the correlation coefficient matrix, differ from each other when cluster analyses are performed on 571 adult deer mice, Peromyscus maniculatus, using 30 morphometric characters. To choose between the two similarity matrices when constructing a dendrogram that represents phenetic relationships among taxa, an objective method is suggested that uses the result of principal component analysis as a standard against which the two matrices are compared.
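
A small sketch may help show why the two matrices can lead to different dendrograms. It assumes the usual definitions: average taxonomic distance as the square root of the mean squared character difference, and the Pearson correlation coefficient. The three specimens and their measurements are invented for illustration, and the standardization of characters that normally precedes such an analysis is skipped.

```python
import math


def average_taxonomic_distance(x, y):
    """Square root of the mean squared difference over all characters."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def pearson(x, y):
    """Pearson correlation coefficient between two character vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical specimens measured on four characters.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]   # same shape as a, twice the size
c = [1.5, 1.5, 3.5, 3.5]   # similar size to a, different shape

for name, other in (("b", b), ("c", c)):
    print(name,
          round(average_taxonomic_distance(a, other), 3),
          round(pearson(a, other), 3))
```

In this toy case, distance places `a` closest to `c` while correlation places `a` closest to `b`, so a clustering built on one matrix can group specimens differently from a clustering built on the other.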

Random Amplified Polymorphic DNA Analysis of Genetic Relationships Among Acanthopanax Species

  • Park, Sang-Yong;Yook, Chang-Soo;Nohara, Toshihiro;Mizutani, Takayuki;Tanaka, Takayuki
    • Archives of Pharmacal Research / v.27 no.12 / pp.1270-1274 / 2004
  • Random amplified polymorphic DNA (RAPD) analysis was used to determine the genetic relationships among seventeen Acanthopanax species. DNA isolated from the leaves of the samples was used as the template in polymerase chain reaction (PCR) with twenty random decamer primers in order to distinguish plant subspecies at the genome level. The RAPD patterns were compared by calculating pairwise distances using the Dice similarity index, and a genetic similarity dendrogram was produced by unweighted pair-group method with arithmetic average (UPGMA) analysis, showing three groups: a major cluster (twelve species), a minor cluster (four species), and a single-species cluster. The RAPD results were compatible with the morphological classification as well as the chemotaxonomic classification of the Acanthopanax species. The species containing 3,4-seco-lupane type triterpene compounds in their leaves corresponded to the major cluster, the species having oleanane or normal lupane type constituents to the minor cluster, and the one species containing no triterpenoidal compound to the single-species cluster.
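
The Dice-plus-UPGMA pipeline named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the band profiles (1 = band present, 0 = absent) and the sample labels `A` to `D` are hypothetical, distance is taken as one minus the Dice coefficient, and the UPGMA routine is a naive pure-Python version.

```python
from itertools import combinations


def dice(x, y):
    """Dice coefficient 2a / (2a + b + c) for two binary band profiles."""
    shared = sum(1 for p, q in zip(x, y) if p and q)
    total = sum(x) + sum(y)
    return 2 * shared / total if total else 0.0


def upgma(labels, dist):
    """Naive UPGMA: repeatedly merge the two clusters with the smallest
    mean pairwise leaf distance. Returns a list of (left, right, height)."""
    def avg(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    clusters = [(name,) for name in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical RAPD band profiles for four samples.
bands = {
    "A": [1, 1, 1, 0, 1, 0],
    "B": [1, 1, 1, 0, 0, 0],
    "C": [0, 0, 1, 1, 0, 1],
    "D": [0, 0, 0, 1, 0, 1],
}

# Distance between two samples is 1 minus their Dice similarity.
dist = {
    frozenset((a, b)): 1.0 - dice(bands[a], bands[b])
    for a, b in combinations(bands, 2)
}

merges = upgma(list(bands), dist)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

The merge heights are distances, so a similarity threshold on the dendrogram corresponds to cutting at one minus that value.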

On the Categorical Variable Clustering

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.7 no.2 / pp.219-226 / 1996
  • The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, variable clustering has been based on similarity measures between variables that have binary characteristics. We propose a variable clustering method for variables that have more than two categories, ordered in some sense. We also consider some measures of association as similarities between variables. A numerical example is included.

Comparison of the Similarity Among the Plant Communities of the Grazing Pasture by the Cluster-Analysis (군집분석을 이용한 방목초지 식물군락의 유사성 비교)

  • Park, Geun-Je;Spatz, G.
    • Journal of The Korean Society of Grassland and Forage Science / v.24 no.4 / pp.293-300 / 2004
  • This study was carried out to investigate the ecological behaviour, forage value, and similarity among the plant communities of grazing pastures near Witzenhausen in the middle part of Germany. The sixteen plant communities of the different grazing pastures belonged mostly to the Molinio-Arrhenatheretea and Festuco-Brometea, and were named by their classes in plant sociological nomenclature. The ecological behaviour and forage value of the communities, except Mesobromion (half-dry grassland community), were relatively good for forage production. The correlation coefficient between plant community classes No. 14 and No. 12 was the highest, and the similarity among the communities was greatly affected by botanical composition. For cluster analysis by the complete-linkage method, Euclidean distance was a better resemblance measure of the similarity among plant communities than the other measures. The cluster analysis showed that communities of relatively similar botanical composition were closely grouped.

A Study of Similarity Measure Algorithms for Recommendation System about the Pet Food (반려동물 사료 추천시스템을 위한 유사성 측정 알고리즘에 대한 연구)

  • Kim, Sam-Taek
    • Journal of the Korea Convergence Society / v.10 no.11 / pp.159-164 / 2019
  • Recent developments in ICT have increased interest in the care and health of pets such as dogs and cats. In this paper, cluster analysis was performed on the component data of pet food so that the results can be used in various fields of the pet industry. For the cluster analysis, similarity was assessed by analyzing the correlation between the components of 300 dog and cat food products on the market. Clustering techniques such as hierarchical, K-means, partitioning around medoids (PAM), density-based, and mean-shift clustering are applied and analyzed. We also propose a personalized recommendation system for pets. The results of this paper can be used for personalized services such as a feed recommendation system for pets.

A Re-Ranking Retrieval Model based on Two-Level Similarity Relation Matrices (2단계 유사관계 행렬을 기반으로 한 순위 재조정 검색 모델)

  • 이기영;은희주;김용성
    • Journal of KIISE:Software and Applications / v.31 no.11 / pp.1519-1533 / 2004
  • When Web-based special retrieval systems for scientific fields severely restrict how users can express their information requests, the process of information content analysis and that of information acquisition become inconsistent. In this paper, we apply the fuzzy retrieval model and address the high time complexity of the retrieval system by constructing a reduced term set based on the relative importance of terms. Furthermore, we perform cluster retrieval to reflect the user's query exactly through a similarity relation matrix that satisfies the characteristics of the fuzzy compatibility relation. We demonstrate the performance of the proposed re-ranking model, which is based on the similarity union of the fuzzy retrieval model and the document cluster retrieval model.

Nonlinear damage detection using linear ARMA models with classification algorithms

  • Chen, Liujie;Yu, Ling;Fu, Jiyang;Ng, Ching-Tai
    • Smart Structures and Systems / v.26 no.1 / pp.23-33 / 2020
  • The majority of damage in engineering structures is nonlinear. Damage sensitive features (DSFs) extracted by traditional methods from linear time series models cannot effectively handle the nonlinearity induced by structural damage. A new DSF is proposed based on vector space cosine similarity (VSCS), which is combined with K-means cluster analysis and Bayesian discrimination to detect nonlinear structural damage. A reference autoregressive moving average (ARMA) model is built based on measured acceleration data. This study first considers an existing DSF, the residual standard deviation (RSD). The DSF is further advanced using the VSCS, and the advanced VSCS is then classified using K-means cluster analysis and Bayes discriminant analysis, respectively. The performance of the proposed approach is verified using experimental data from a three-story shear building structure and compared with the results of the existing RSD. It is demonstrated that combining the linear ARMA model and the advanced VSCS with cluster analysis and Bayes discriminant analysis, respectively, is an effective approach for detecting nonlinear damage. This approach improves the reliability and accuracy of nonlinear damage detection using the linear model and significantly reduces the computational cost. The results indicate that the proposed approach has the potential to be a promising damage detection technique.
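
The abstract does not spell out how the VSCS feature is built, so the following is only a generic sketch of cosine similarity between two feature vectors, the quantity the feature's name refers to. The vectors stand in for features derived from a reference state and from later measurements; their values and the names `reference`, `same_state`, and `changed_state` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


# Hypothetical feature vectors (for example, fitted model coefficients).
reference = [1.2, -0.5, 0.3]
same_state = [2.4, -1.0, 0.6]      # same direction as the reference
changed_state = [0.3, 0.9, -0.4]   # different direction

print(round(cosine_similarity(reference, same_state), 3))
print(round(cosine_similarity(reference, changed_state), 3))
```

Because cosine similarity depends only on direction, a vector that is a scaled copy of the reference scores close to 1, while a vector pointing elsewhere scores lower. Scores of this kind could then be passed to a classifier such as K-means.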

Genetic Diversity among the Genera Allium in Mongolia Based on Random Amplified Polymorphic DNA (RAPD) Analysis

  • Chun, Jong-Un;Bae, Chang-Hyu
    • Plant Resources / v.4 no.3 / pp.121-129 / 2001
  • Intraspecific genetic diversity of sixteen accessions of Mongolian Alliums, including fifteen species, was investigated using randomly amplified polymorphic DNA (RAPD) analysis. Twenty-three out of forty primers revealed scorable polymorphism. A total of 440 RAPD markers were generated on the 16 accessions of Mongolian Alliums. Among the 440 RAPDs assayed, 439 were polymorphic, with a mean polymorphic rate of 99.7%. Unweighted pair-group method using an arithmetic average (UPGMA) cluster analysis of the RAPD data separated the 16 Allium accessions into two broad groups at a similarity index of 0.70. The clustering of the species was closely related to the previous classification between A. altaicum and A. fistulosum. In addition, a high genetic similarity was shown between A. cepa and A. tagar.
