• Title/Summary/Keyword: dissimilarity


Geodesic Clustering for Covariance Matrices

  • Lee, Haesung;Ahn, Hyun-Jung;Kim, Kwang-Rae;Kim, Peter T.;Koo, Ja-Yong
    • Communications for Statistical Applications and Methods / v.22 no.4 / pp.321-331 / 2015
  • The K-means algorithm is a popular and widely used clustering method. This paper considers a geodesic clustering algorithm for data consisting of symmetric positive definite (SPD) matrices, such as covariance matrices, that combines the K-means framework with the Riemannian (non-Euclidean) geometric structure of SPD matrices. A K-means algorithm is divided into two main steps, which require a dissimilarity measure between two matrix data points and a way of computing centroids for the observations in each cluster. To exploit the Riemannian structure, we adopt the geodesic distance and the intrinsic mean for SPD matrices. We demonstrate the proposed method through simulations and an application to real financial data.
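
The dissimilarity step described in this abstract can be illustrated with a small sketch. The abstract does not say which Riemannian metric is used; the code below assumes the affine-invariant metric, a common choice for SPD matrices, under which the geodesic distance is the square root of the sum of squared logarithms of the eigenvalues of A^{-1}B. The function name `geodesic_distance_2x2` is hypothetical, and the sketch is limited to 2x2 matrices so that it needs only the standard library.

```python
import math


def geodesic_distance_2x2(a, b):
    """Affine-invariant geodesic distance between two 2x2 SPD matrices.

    Assumption: the affine-invariant metric (not confirmed by the abstract).
    Matrices are nested lists, e.g. [[2.0, 1.0], [1.0, 2.0]].
    """
    (a11, a12), (a21, a22) = a
    det_a = a11 * a22 - a12 * a21
    # Closed-form inverse of the 2x2 matrix a.
    inv_a = [[a22 / det_a, -a12 / det_a], [-a21 / det_a, a11 / det_a]]
    # m = inv(a) @ b; its eigenvalues are real and positive for SPD inputs.
    m = [[sum(inv_a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace = m[0][0] + m[1][1]
    det_m = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # Eigenvalues from the characteristic polynomial; clamp tiny negative
    # discriminants caused by rounding.
    disc = max(trace * trace - 4.0 * det_m, 0.0)
    lam1 = (trace + math.sqrt(disc)) / 2.0
    lam2 = det_m / lam1
    return math.sqrt(math.log(lam1) ** 2 + math.log(lam2) ** 2)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    scaled = [[math.e, 0.0], [0.0, math.e ** 2]]
    print(geodesic_distance_2x2(identity, scaled))
```

In a K-means-style loop this distance would replace the Euclidean distance in the assignment step, while the centroid step would use the intrinsic mean, which has no closed form in general and is usually computed iteratively.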

Numerical Taxonomy of the Tribe Pterostichini Sloane from Korea(II) (한국산 길쭉먼지벌레족의 수리분류(II))

  • Park, Jong Kyun;Kwon, Young Jung
    • Current Research on Agriculture and Life Sciences / v.14 / pp.1-14 / 1996
  • A numerical taxonomy of 59 Korean Pterostichini species, based on phenetic characters, was conducted to assess the effect of 7 different methods combined with 3 similarity or dissimilarity coefficients, using 87 morphological multistate characters.


Nearest Neighbor Based Prototype Classification Preserving Class Regions

  • Hwang, Doosung;Kim, Daewon
    • Journal of Information Processing Systems / v.13 no.5 / pp.1345-1357 / 2017
  • A prototype selection method chooses a small set of training points from the whole set of class data. As the data size increases, the selected prototypes play a significant role in covering class regions and learning a discriminant rule. This paper discusses methods for selecting prototypes in a classification framework. We formulate prototype selection as a set covering optimization problem in which the sets are constructed from a distance metric and the predefined classes. This formulation lets us focus only on the prototypes of each class, without considering points of the other classes. A training point becomes a prototype depending on its number of neighbors and whether it has already been selected. In this setting, we propose a greedy algorithm that chooses the most relevant points for preserving the class-dominant regions. The proposed method is simple to implement, has no parameters to tune, and achieves better or comparable results on both artificial and real-world problems.
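
The set covering idea in this abstract can be sketched as a generic greedy cover. This is not the authors' algorithm: the way each point's covering set is built here (same-class points closer than the nearest point of a different class) is an assumption made for illustration, and the name `greedy_prototypes` is hypothetical.

```python
import math


def greedy_prototypes(points, labels):
    """Greedy set-cover sketch for prototype selection.

    Assumption: point i covers every same-class point that lies closer to it
    than its nearest point from another class. Returns prototype indices.
    """
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    cover = []
    for i in range(n):
        # Distance to the nearest point of a different class.
        enemy = min((dist(i, j) for j in range(n) if labels[j] != labels[i]),
                    default=math.inf)
        covered = {j for j in range(n)
                   if labels[j] == labels[i] and dist(i, j) < enemy}
        covered.add(i)  # a point always covers itself, so the loop terminates
        cover.append(covered)

    uncovered = set(range(n))
    prototypes = []
    while uncovered:
        # Pick the point that covers the most still-uncovered points.
        best = max(range(n), key=lambda i: len(cover[i] & uncovered))
        prototypes.append(best)
        uncovered -= cover[best]
    return prototypes


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    lbl = [0, 0, 0, 1, 1]
    print(greedy_prototypes(pts, lbl))
```

The covering radius is derived from the data rather than set by hand, which is one way to obtain a method without tunable parameters.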

A Survey of Advances in Hierarchical Clustering Algorithms and Applications

  • Munshi, Amr
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.17-24 / 2022
  • Hierarchical clustering methods were proposed more than sixty years ago and are still used in various disciplines for observing relations and for clustering. In 1965, divisive hierarchical methods were proposed in the biological sciences, and they have since been used in various disciplines such as anthropology and ecology. More recently, hierarchical methods have been deployed in economic and energy studies. Unlike most clustering algorithms, which require the user to specify the number of clusters, hierarchical clustering is well suited to situations where the number of clusters is unknown. This paper presents an overview of the hierarchical clustering algorithm and discusses the dissimilarity measures that can be used with it. The paper also highlights the various recent disciplines in which hierarchical clustering algorithms are employed.
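
As a rough illustration of how a dissimilarity matrix drives hierarchical clustering, the sketch below performs agglomerative clustering with single linkage and records every merge, so no number of clusters has to be chosen in advance. Single linkage is only one of several linkage rules, picked here for brevity; the function name `single_linkage_merges` is hypothetical.

```python
def single_linkage_merges(dissim):
    """Agglomerative single-linkage clustering on a square dissimilarity matrix.

    Returns the merge history as (members_a, members_b, height) tuples,
    which is the information a dendrogram displays.
    """
    clusters = [[i] for i in range(len(dissim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: the closest pair of members across clusters.
                d = min(dissim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


if __name__ == "__main__":
    matrix = [[0, 1, 4],
              [1, 0, 3],
              [4, 3, 0]]
    for left, right, height in single_linkage_merges(matrix):
        print(left, right, height)
```

Cutting the merge history at a chosen height yields a flat clustering after the fact.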

Personalized Hybrid Outfit Recommendation Based on Image Dissimilarity (이미지 비유사도 기반의 개인화된 하이브리드 의류 추천 모델)

  • Jeong-Won Yang;Ji-Hye Baek;Hyon-Hee Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.459-460 / 2023
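  • Conventional recommender systems operate on the similarity between items or between users. This approach, however, cannot avoid the filter bubble problem, in which users become trapped among recommendations of similar items, or the data sparsity problem that chronically affects recommender systems. In this study, we therefore address these problems by building a hybrid recommendation model that combines collaborative-filtering-based deep learning recommendation, which predicts a user's ratings by reflecting the user's taste and body shape information, with content-based recommendation, which predicts a user's ratings by considering the dissimilarity between items. To evaluate the model, we compared its NDCG values with those of a hybrid recommendation model that uses similar images, on data from an online clothing store, and the model using images with low similarity performed better. This shows that, unlike with other products, consumers buying clothing are more likely to purchase items that are dissimilar to those they have already bought than items that are similar.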

How the Science Gifted Connect and Integrate Science Concepts in the Process of Problem Finding (과학영재들이 문제발견 과정에서 나타내는 과학개념 연결방식과 융합적 사고의 특징)

  • Park, Mi-jin;Seo, Hae-Ae
    • Journal of Science Education / v.42 no.2 / pp.256-271 / 2018
  • This study investigated how science-gifted students connect and integrate science concepts in the process of problem finding. Research subjects were sampled from 228 applicants to a science gifted education center affiliated with a university in 2015. A creative problem solving test (CPST) in science, which was administered as part of the admission process, was used as the reference for sampling two groups. Sixty-seven students from the top 30% of test scores were selected for the upper group and 64 students from the bottom 30% were selected for the lower group. The CPST, which was developed by the researchers, included one item asking how to connect two of eight science concepts (sound, electricity, weight, temperature, respiration, photosynthesis, weather, and earthquake) extracted from the elementary science curriculum. The results showed differences in the choice of two concepts among the four science major areas. The ways of connecting science concepts fell into three categories: relation-based, similarity-based, and dissimilarity-based. Relation-based connections were characterized by attributes, means, influences, predictions, and causes; similarity-based connections by attributes, objects, scientific principles, and phenomena; and dissimilarity-based connections by parallel, resource, and deletion. There were significant (p<.000) differences in the ways of connecting science concepts between the upper and lower groups. The upper group students preferred connecting science concepts across science subjects, while the lower group students preferred connecting science concepts within a single science subject. The upper group students tended to connect science concepts based on similarity. In contrast, the lower group students frequently connected science concepts based on dissimilarity; in particular, they simply paralleled science concepts.

Genomic analysis of Mycobacterium fortuitum by pulsed-field gel electrophoresis (Pulsed-field Gel Electrophoresis를 이용한 Mycobacterium fortuitum의 유전형 분석)

  • Lee, Tae-Yoon;Do, In-A;Kim, Sung-Kwang
    • Journal of Yeungnam Medical Science / v.12 no.2 / pp.366-385 / 1995
  • Epidemiological studies are important in both the prevention and treatment of mycobacterial infections. This study was initiated to establish the pulsed-field gel electrophoresis (PFGE) method, which has not yet been extensively studied. The most appropriate restriction endonucleases included DraI, AsnI, and XbaI. The optimal PFGE conditions differed according to the enzyme used. For DraI, a two-stage PFGE was performed: the first stage used an initial pulse of 10 seconds and a final pulse of 15 seconds, and the second stage used an initial pulse of 60 seconds and a final pulse of 70 seconds, with an electrophoresis time of 14 hours for each stage. For XbaI, electrophoresis was performed for 22 hours with an initial pulse of 3 seconds and a final pulse of 12 seconds. For AsnI, electrophoresis was performed for 22 hours with an initial pulse of 5 seconds and a final pulse of 25 seconds. In all cases the voltage was held constant at 200 V. Standard mycobacterial strains, which included Mycobacterium bovis BCG, M. tuberculosis, and M. fortuitum, could not be differentiated by PFGE analysis. PFGE analysis with AsnI was performed to differentiate 9 clinically isolated M. fortuitum strains. All M. fortuitum strains except 2 showed different genotypes. Cluster analysis divided the M. fortuitum strains into 2 large groups. PFGE analysis with XbaI was then performed to further differentiate the M. fortuitum isolates. The 2 undifferentiated M. fortuitum strains showed different PFGE patterns with XbaI. Cluster analysis of the XbaI-PFGE patterns showed more complex grouping than that of the AsnI-PFGE patterns, which indicated that XbaI-PFGE analysis was better than AsnI-PFGE for M. fortuitum genotyping. The top dissimilarity values of AsnI-PFGE and XbaI-PFGE were 0.74 and 0.75, respectively. These values were higher than that of arbitrarily primed polymerase chain reaction (AP-PCR) analysis and lower than that of restriction fragment length polymorphism (RFLP) analysis, which suggested that PFGE can be used as a supportive or alternative genotyping method to RFLP analysis.


Taxonomy of Korean Calanthe species and few of its mutants based on AFLP data (AFLP에 의한 한국산 새우난초속 식물과 그의 수종 돌연변이에 대한 분류학적 연구)

  • Srikanth, Krishnamoorthy;Koo, Ja Choon;Ku, Jajung;Choi, Kyung;Park, Kwang-Woo;So, Soonku;Choi, Yong-Gook;Whang, Sung Soo
    • Korean Journal of Plant Taxonomy / v.42 no.3 / pp.215-221 / 2012
  • Five Korean Calanthe species, C. discolor, C. bicolor, C. sieboldii, C. reflexa, and C. aristulifera, were studied using amplified fragment length polymorphism (AFLP) to assess their taxonomic and genetic relationships. Sixteen accessions belonging to the five native Calanthe spp. and to mutants with yellow tepals and a white lip (YW mutants) were studied. We identified 50 putative markers using AFLP analysis. AMOVA showed that genetic variance was higher between species than within species. Genetic dissimilarity relative to the rest of the species was lowest for individuals of the YW mutants and highest for individuals of C. reflexa. The mutants clustered outside the major group. Calanthe bicolor clustered with C. discolor, suggesting that its genetic composition is closer to that of C. discolor. Although C. bicolor is suggested to have originated through natural hybridization between C. sieboldii and C. discolor, introgression is likely to have occurred in the direction of C. discolor, based on the molecular marker, clustering, and genetic dissimilarity data. Calanthe reflexa and C. aristulifera were genetically the most diverse of the species studied. In conclusion, the results showed that there is genetic diversity in Korean Calanthe species, that C. bicolor introgressed in the direction of C. discolor, and that the YW mutants are genetically closer to C. sieboldii.

Taxonomic study of Viola albida complex based on RAPD data (RAPD 자료에 근거한 태백제비꽃군의 분류학적 연구)

  • Koo, Ja Choon;Tak, Hyo Jin;Whang, Sung Soo
    • Korean Journal of Plant Taxonomy / v.40 no.2 / pp.118-129 / 2010
  • A taxonomic study of the Viola albida complex, comprising representative individuals of three taxa (V. albida var. albida, V. albida var. chaerophylloides, and V. albida var. takahashii), was carried out based on RAPD data. A total of 476 loci were amplified with 68 universal primers on seven OTUs. Nei's genetic dissimilarity was relatively low among individuals of V. albida var. albida and V. albida var. chaerophylloides (0.118-0.171 and 0.051, respectively) but higher among individuals of V. albida var. takahashii (0.348). On the other hand, there was no specific trend in genetic dissimilarity among taxa, such as between individuals of V. albida var. albida and V. albida var. takahashii, or between those of V. albida var. albida and V. albida var. chaerophylloides. The similarity among the OTUs studied was high in the clustering analysis, a result compatible with the establishment of this complex. All OTUs clustered into two groups. The individuals of V. albida var. takahashii, however, clustered both with the group of V. albida var. albida and with the group of V. albida var. chaerophylloides, meaning that their genetic difference is high, which would be commensurate with their morphological variation.
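
For readers unfamiliar with how a band-based genetic dissimilarity is obtained from RAPD profiles, here is a minimal sketch. The abstract says only "Nei's genetic dissimilarity"; the code assumes the Nei and Li band-sharing coefficient, which is widely used with presence/absence marker data, and that assumption may not match the paper. The function name `nei_li_dissimilarity` and the example profiles are hypothetical.

```python
def nei_li_dissimilarity(bands_x, bands_y):
    """Band-sharing dissimilarity between two presence/absence profiles.

    Assumption: Nei and Li's coefficient, 1 - 2*n_xy / (n_x + n_y), where
    n_xy counts bands present in both profiles. Profiles are equal-length
    sequences of 0/1 values, one entry per scored locus.
    """
    if len(bands_x) != len(bands_y):
        raise ValueError("profiles must score the same loci")
    shared = sum(1 for x, y in zip(bands_x, bands_y) if x and y)
    total = sum(bands_x) + sum(bands_y)
    if total == 0:
        return 0.0  # no bands in either profile: treat as identical
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    otu_a = [1, 1, 0, 1]  # hypothetical band profiles, not data from the paper
    otu_b = [1, 0, 0, 1]
    print(nei_li_dissimilarity(otu_a, otu_b))
```

A matrix of such pairwise values is what a clustering analysis of the OTUs would take as input.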

Genetic Variation Based on Random Amplified Polymorphic DNA (RAPD) and Internal Transcribed Spacer (ITS) Region Sequences in Lepista nuda (RAPD와 ITS 영역에 의한 민자주방망이 버섯의 유전적 변이)

  • Lee, Yang Suk;Kim, Nam Woo;Kim, Jong Bong
    • Journal of Life Science / v.22 no.11 / pp.1470-1476 / 2012
  • Genetic variation in Lepista nuda and two other Lepista species (L. irina and L. sordida) was analyzed by random amplified polymorphic DNA (RAPD) and internal transcribed spacer (ITS) sequence analysis. In the RAPD analysis, 22 of 40 random primers amplified polymorphic RAPD fragment patterns, yielding 355 amplified bands with DNA fragment sizes of 200-400 bp. Intraspecific genetic dissimilarity among the 10 L. nuda strains was calculated to range from 0% to 21.60%, L. sordida from 16.93% to 24.82%, and L. irina from 20.62% to 25.54%, and the genetic dissimilarity between L. sordida and L. irina was 23.49%. In the analysis of the ITS I and II regions, 673 base pairs were sequenced; intraspecific genetic dissimilarity among six L. nuda strains ranged from 1.58% to 11.47%, that between L. nuda and L. sordida from 3.83% to 12.88%, that between L. nuda and L. irina from 7.11% to 15.61%, and the genetic variation between L. sordida and L. irina was 4.79%. The findings showed that RAPD and ITS sequencing could be used for developing molecular genetic markers and for screening unidentified Lepista species.