• Title/Summary/Keyword: Dissimilarity Distance Matrix

Evaluation on Development Performances of E-Commerce for 50 Major Cities in China (중국 주요 50개 도시의 전자상거래 발전성과에 대한 평가)

  • Jeong, Dong-Bin;Wang, Qiang
    • Journal of Distribution Science / v.14 no.1 / pp.67-74 / 2016
  • Purpose - This paper measures the degree of similarity and dissimilarity between pairs of 50 major cities in China on the basis of three evaluation variables (internet businessman index, internet shopping index and e-commerce development index). A dissimilarity distance matrix expresses the dissimilarity between each pair of cities as a distance, where a higher value signifies a higher degree of dissimilarity between two cities. Cluster analysis is used to classify the 50 cities into groups such that similar cities are placed in the same group. In addition, multidimensional scaling (MDS) provides a visual representation for exploring the pattern of proximities among the 50 cities based on the three development performance attributes. Research design, data, and methodology - This research is based on the 2013 report provided by AliResearch in China (1/1/2013~11/30/2013) and uses multivariate methods such as the dissimilarity distance matrix, cluster analysis and MDS, run with the CLUSTER, KMEANS, PROXIMITIES and ALSCAL procedures in SPSS 21.0. Results - This research applies two types of cluster analysis and MDS to the three development performance measures from the 2013 AliResearch report. The cities can be grouped into four clusters that share similar characteristics. MDS is used to position both the clusters and the 50 major cities belonging to each cluster. Because all the values for Shenzhen, Guangzhou and Hangzhou (which belong to cluster 1) are very large, these cities are superior to the other cities in all three evaluation attributes. Twelve cities (Beijing, ShangHai, Jinghua, ZhuHai, XiaMen, SuZhou, NanJing, DongWan, ZhangShan, JiaXing, NingBo and FoShan), which belong to cluster 3, are inferior to those of cluster 1 in all three attributes, but they can be expected to lead the next e-commerce revolution. In terms of the internet businessman index, on the other hand, Tainan, Taizhong and Gaoxiong (which belong to cluster 2) rank above the others; however, these three cities are inferior to the others on the internet shopping index. The remaining major cities, which belong to cluster 4, are relatively inferior in all three evaluation attributes, which, according to the authors, evokes creative innovation and entrepreneurship and so leads to e-commerce development in China as a whole. Conclusions - This study suggests implications that can help e-government officers and companies in both Korea and China form strategies. By surveying the development performance of 50 major cities, it is expected to provide useful information for understanding the recent state of e-commerce in China. The authors therefore recommend developing marketing, branding and communication aimed at online Chinese consumers. Such efforts include incentives like loyalty points and coupons that encourage consumers, and building in-house logistics networks.
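
The idea of "dissimilarity as distance" in the abstract above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance (the abstract does not state which measure was used in PROXIMITIES); the function name and the index values are hypothetical and are not taken from the AliResearch report.

```python
import math

def dissimilarity_matrix(rows):
    """Return the symmetric matrix of pairwise Euclidean distances between rows."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            d[i][j] = d[j][i] = dist  # a larger value means more dissimilar
    return d

# Hypothetical values of the three indices for three unnamed cities
cities = [
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 12.0],
]
D = dissimilarity_matrix(cities)
for row in D:
    print(row)
```

A matrix of this form is the kind of input that hierarchical clustering and MDS procedures operate on.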

A practical application of cluster analysis using SPSS

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1207-1212 / 2009
  • The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, clustering is based on a similarity (or dissimilarity) matrix or on the original input text data. Various measures of similarity (or dissimilarity) between objects (or variables) have been developed. We introduce a real application of the clustering procedure in SPSS for the case where only the distance matrix of the objects (or variables) is given as input data. This is particularly helpful for cluster analysis of huge data sets in which the size of the proximity matrix exceeds 1000. The SPSS syntax commands for clustering with matrix input data are given with numerical examples.
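
Clustering from a distance matrix alone, as described above, can be sketched outside SPSS as well. The following is a small, unoptimized illustration in Python, assuming average linkage; the function name and the 4 x 4 matrix are hypothetical, and this is not the SPSS syntax given in the paper.

```python
def average_linkage(dist, n_clusters):
    """Agglomerative clustering driven only by a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]  # start with singletons
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average of all pairwise distances between the two clusters
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical distance matrix for four objects
dist = [
    [0.0, 1.0, 9.0, 8.0],
    [1.0, 0.0, 7.0, 9.0],
    [9.0, 7.0, 0.0, 2.0],
    [8.0, 9.0, 2.0, 0.0],
]
groups = average_linkage(dist, 2)
print(groups)
```

No raw feature values are needed at any point; only the entries of `dist` are read.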

On Optimizing Dissimilarity-Based Classifications Using a DTW and Fusion Strategies (DTW와 퓨전기법을 이용한 비유사도 기반 분류법의 최적화)

  • Kim, Sang-Woon;Kim, Seung-Hwan
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.2 / pp.21-28 / 2010
  • This paper reports experimental results on optimizing dissimilarity-based classification (DBC) by simultaneously using dynamic time warping (DTW) and a multiple fusion strategy (MFS). DBC is a way of defining classifiers among classes; the classifiers are based not on the feature measurements of individual samples but on a suitable dissimilarity measure among the samples. In DTW, the dissimilarity is measured in two steps: first, the object samples are adjusted by finding the best warping path with a correlation coefficient-based DTW technique; then the dissimilarity distance between the adjusted objects is computed with conventional measures. In MFS, fusion strategies are used repeatedly, both in generating dissimilarity matrices and in designing classifiers: the dissimilarity matrices obtained with the DTW technique are first combined into a new matrix, and after some base classifiers are trained on the new matrix, their results are combined again. Experimental results on well-known benchmark databases demonstrate that the proposed mechanism improves classification accuracy compared with previous approaches. The method could therefore also be applied to other high-dimensional tasks, such as multimedia information retrieval.
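
As a rough illustration of how a DTW-based dissimilarity can be computed, the sketch below implements textbook DTW with an absolute-difference local cost. It is not the correlation coefficient-based variant or the two-step procedure described in the abstract; the function name and sample sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Plain dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Hypothetical samples; pairwise DTW costs form a dissimilarity matrix
samples = [[1, 2, 3], [1, 2, 2, 3], [0, 0]]
dtw_matrix = [[dtw_distance(s, t) for t in samples] for s in samples]
print(dtw_matrix)
```

The first two samples differ only by a repeated value, so warping aligns them at zero cost even though their lengths differ.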

Genetic variation and relationship of Artemisia capillaris Thunb.(Compositae) by RAPD analysis

  • Kim, Jung-Hyun;Kim, Dong-Kap;Kim, Joo-Hwan
    • Korean Journal of Plant Resources / v.22 no.3 / pp.242-247 / 2009
  • Randomly Amplified Polymorphic DNA (RAPD) analysis was performed to define the genetic variation and relationships of Artemisia capillaris. Fifteen populations, selected by distribution and habitat, were collected for RAPD analysis. RAPD markers were observed mainly between 300 bp and 1600 bp. A total of 72 scorable markers from 7 primers were used to generate the genetic matrix; 69 bands were polymorphic and only 3 were monomorphic. The genetic dissimilarity matrix based on Nei's genetic distance (1972) and a UPGMA phenogram were produced from the data matrix. Populations of Artemisia capillaris clustered with high genetic affinity, and cluster patterns correlated with distributional patterns. Two large groups formed: a southern area group and a middle area group. The closest OTUs were GW2 and GG1 in the middle area group, and GB1 from the southern area group clustered with OTUs in the middle area group. RAPD data were useful for defining the genetic variation and relationships of A. capillaris.
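
For reference, Nei's (1972) standard genetic distance is D = -ln(J_XY / sqrt(J_X * J_Y)), where the J terms are averages over loci of sums of allele-frequency products. The sketch below assumes allele frequencies per locus are already available; the abstract does not say how RAPD band scores were converted to frequencies, so the input format, function name and values here are hypothetical.

```python
import math

def nei_standard_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance.

    pop_x, pop_y: lists of loci; each locus is a list of allele frequencies
    (same loci and allele order in both populations).
    """
    loci = len(pop_x)
    jx = sum(sum(p * p for p in locus) for locus in pop_x) / loci
    jy = sum(sum(q * q for q in locus) for locus in pop_y) / loci
    jxy = sum(sum(p * q for p, q in zip(lx, ly))
              for lx, ly in zip(pop_x, pop_y)) / loci
    identity = jxy / math.sqrt(jx * jy)  # Nei's normalized genetic identity
    return -math.log(identity)

# Hypothetical one-locus, two-allele populations
pop_a = [[1.0, 0.0]]
pop_b = [[0.5, 0.5]]
d_ab = nei_standard_distance(pop_a, pop_b)
print(d_ab)
```

Computing this value for every pair of populations gives the dissimilarity matrix from which a UPGMA (average-linkage) tree can be built.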

Geodesic Clustering for Covariance Matrices

  • Lee, Haesung;Ahn, Hyun-Jung;Kim, Kwang-Rae;Kim, Peter T.;Koo, Ja-Yong
    • Communications for Statistical Applications and Methods / v.22 no.4 / pp.321-331 / 2015
  • The K-means clustering algorithm is a popular and widely used method for clustering. This paper considers a geodesic clustering algorithm for data consisting of symmetric positive definite (SPD) matrices, such as covariance matrices, that combines the K-means clustering framework with the Riemannian (non-Euclidean) geometric structure of SPD matrices. A K-means clustering algorithm is divided into two main steps, for which we need a dissimilarity measure between two matrix data points and a way of computing centroids for the observations in each cluster. In order to use the Riemannian structure, we adopt the geodesic distance and the intrinsic mean for symmetric positive definite matrices. We demonstrate the proposed method through simulations as well as an application to real financial data.
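
One common choice of geodesic distance on SPD matrices is the affine-invariant Riemannian distance, the square root of the sum of squared logarithms of the eigenvalues of A^{-1}B. The sketch below assumes that metric (the abstract does not name the one the authors use) and assumes NumPy is available; the function name and matrices are hypothetical.

```python
import numpy as np

def spd_geodesic_distance(a, b):
    """Affine-invariant Riemannian distance between SPD matrices a and b."""
    chol = np.linalg.cholesky(a)            # a = chol @ chol.T
    # Whiten b: chol^{-1} b chol^{-T} has the same eigenvalues as a^{-1} b
    tmp = np.linalg.solve(chol, b)
    white = np.linalg.solve(chol, tmp.T).T
    eigvals = np.linalg.eigvalsh((white + white.T) / 2)  # symmetrize for stability
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))

# Hypothetical 2 x 2 covariance matrices
a = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([[1.0, 0.2], [0.2, 3.0]])
d_ab = spd_geodesic_distance(a, b)
print(d_ab)
```

In a K-means style loop this function would play the role of the dissimilarity measure in the assignment step; the centroid step (the intrinsic mean) is not shown here.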

WAVELET-BASED FOREST AREAS CLASSIFICATION BY USING HIGH RESOLUTION IMAGERY

  • Yoon Bo-Yeol;Kim Choen
    • Proceedings of the KSRS Conference / 2005.10a / pp.698-701 / 2005
  • This paper examines the extraction of information on forest areas from high-resolution imagery using the wavelet transform. First, study areas where more than one species is distributed are selected with reference to a forest type map. Next, each study area is cut to 256 x 256 pixels because of the difficulty of processing large volumes of image data. Prior to the wavelet transform, five texture parameters (contrast, dissimilarity, entropy, homogeneity and Angular Second Moment (ASM)) are calculated using the Gray Level Co-occurrence Matrix (GLCM). The five texture images are generated with a 3x3 shifting window, a distance of 1 pixel and an angle of 45 degrees. The Daubechies 4 wavelet basis function is selected as the wavelet function. The results are summarized in three points. First, wavelet-transformed images derived from the contrast and dissimilarity texture parameters have an effect on edge element detection and could be used for forest road detection. Second, wavelet fusion images derived from the texture parameters and the original image can be applied to forest area classification because they cluster within homogeneous forest type structures. Third, for grading forest fire damaged areas, fusing established classification methods, the GLCM texture extraction concept and the wavelet transform technique should yield highly accurate results if applied effectively to forest areas (and other areas).
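
The five GLCM texture parameters named above can be computed for a single window roughly as follows. This sketch assumes a non-symmetric, normalized co-occurrence matrix, a 45-degree neighbor taken as one row up and one column right, and the natural logarithm for entropy; the paper does not state these conventions, and the function name and window values are hypothetical.

```python
import math

def glcm_features(window, levels, d_row=-1, d_col=1):
    """Texture features from a gray-level co-occurrence matrix of one window."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col  # neighbor at distance 1, 45 degrees
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    feats = {"contrast": 0.0, "dissimilarity": 0.0, "homogeneity": 0.0,
             "asm": 0.0, "entropy": 0.0}
    for i in range(levels):
        for j in range(levels):
            p = counts[i][j] / total
            if p == 0.0:
                continue
            feats["contrast"] += p * (i - j) ** 2
            feats["dissimilarity"] += p * abs(i - j)
            feats["homogeneity"] += p / (1 + (i - j) ** 2)
            feats["asm"] += p * p
            feats["entropy"] -= p * math.log(p)
    return feats

# Hypothetical 3 x 3 window quantized to three gray levels
window = [[0, 0, 2],
          [1, 0, 1],
          [0, 1, 0]]
features = glcm_features(window, levels=3)
print(features)
```

Sliding such a window over the image and writing one feature value per center pixel would produce a texture image for each parameter.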

Assessment of Educational Conditions for 28 National Universities in South Korea

  • Jeong, Dong-Bin
    • Asian Journal of Business Environment / v.7 no.1 / pp.25-29 / 2017
  • Purpose - In this paper, we categorize and segment the 28 national universities in South Korea and measure the degree of dissimilarity (or similarity) between pairs of universities, using cluster analysis and a dissimilarity distance matrix based on seven quantitative evaluations of educational conditions (percentage of small-scale courses, percentage of lectures by the faculty, collection of books per student, material purchase per student, percentage of building capacity, percentage of real estate capacity and rate of accommodation) in 2015. In addition, multidimensional scaling (MDS) provides a visual representation for exploring patterns of proximities among the 28 national universities based on the seven attributes of educational conditions. Research design, data, and methodology - This work is based on the 2015 Announcement of University Information, provided by the Ministry of Education in South Korea, and uses multivariate analyses with the CLUSTER, PROXIMITIES and ALSCAL modules in IBM SPSS 23.0. Results - The 28 national universities can be categorized into five clusters with similar traits by applying two-stage cluster analysis. MDS is used to position the clusters and the 28 national universities belonging to each cluster. Conclusions - The types and traits of each national university can be assessed in relative terms and used in practice to improve each university's competitiveness based on these results.

An Efficient Multidimensional Scaling Method based on CUDA and Divide-and-Conquer (CUDA 및 분할-정복 기반의 효율적인 다차원 척도법)

  • Park, Sung-In;Hwang, Kyu-Baek
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.427-431 / 2010
  • Multidimensional scaling (MDS) is a widely used method for dimensionality reduction whose purpose is to represent high-dimensional data in a low-dimensional space while preserving distances among objects as much as possible. MDS has mainly been applied to data visualization and feature selection. Among the various MDS methods, classical MDS is not readily applicable on normal desktop computers to data with large numbers of objects because of its computational complexity. More precisely, it needs to solve eigenpair problems on dissimilarity matrices based on Euclidean distance. Thus, the running time and required memory of classical MDS increase steeply as n (the number of objects) grows, restricting its use in large-scale domains. In this paper, we propose an efficient approximation algorithm for classical MDS based on divide-and-conquer and CUDA. Through a set of experiments, we show that our approach is highly efficient and effective for the analysis and visualization of data consisting of several thousand objects.
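
The eigenpair problem mentioned above is the core of classical MDS: the squared distance matrix is double-centered and the leading eigenvectors give the coordinates. The sketch below is a plain CPU baseline assuming NumPy, not the divide-and-conquer CUDA approximation proposed in the paper; the function name and test points are hypothetical.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed n objects in k dimensions from an n x n Euclidean distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering      # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)       # full eigendecomposition
    order = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical points on a 3 x 4 rectangle; only their distances are passed in
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords = classical_mds(dist, k=2)
print(coords)
```

The n x n matrices and the full eigendecomposition are what make memory and running time grow quickly with n, which is the bottleneck the abstract refers to.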

Evaluation of Shopping Items: Focused on Purchase of Foreign Tourists in South Korea

  • Jeong, Dong-Bin
    • East Asian Journal of Business Economics (EAJBE) / v.7 no.2 / pp.21-30 / 2019
  • Purpose - In this work, we categorize the 21 shopping items that foreign tourists purchase in South Korea and measure the level of dissimilarity (or similarity) between items by using a distance matrix together with hierarchical and k-means cluster analyses, based on several purpose-of-visit attributes in 2017. In addition, multidimensional scaling (MDS) is applied to visualize the proximities among shopping items based on the purpose-of-visit attributes. Research design and methodology - This study uses a face-to-face survey, conducted in 2017 by the Ministry of Culture, Sports and Tourism, of foreign tourists from 20 countries who purchased shopping items in South Korea. The CLUSTER, PROXIMITIES and ALSCAL modules in IBM SPSS 23.0 are used to perform this work. Results - The 21 shopping items can be classified into five groups with homogeneous traits through two-step cluster analysis. The clusters and the shopping items belonging to each cluster can then be positioned. Conclusions - The patterns and characteristics of each shopping item can be assessed in relative terms, yielding useful information for promoting shopping tourism based on foreign tourists' actual perceptions, and the results can be applied in practice across the tourism industry.
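
The k-means step mentioned above can be illustrated with a bare-bones version of the algorithm. This sketch uses a naive initialization (the first k points) purely to stay deterministic; it is not how SPSS chooses initial centers, and the function name and data are hypothetical.

```python
def kmeans(points, k, iterations=100):
    """Basic k-means on lists of numbers; returns (labels, centers)."""
    centers = [list(p) for p in points[:k]]  # naive init: first k points
    labels = None
    for _ in range(iterations):
        # assignment step: nearest center by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # update step: each center becomes the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical two-attribute profiles for four items
items = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
labels, centers = kmeans(items, k=2)
print(labels, centers)
```

Unlike hierarchical clustering, the number of groups k has to be fixed in advance, which is one reason the two methods are often run together.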

Relationships between Genetic Diversity and Fusarium Toxin Profiles of Winter Wheat Cultivars

  • Goral, Tomasz;Stuper-Szablewska, Kinga;Busko, Maciej;Boczkowska, Maja;Walentyn-Goral, Dorota;Wisniewska, Halina;Perkowski, Juliusz
    • The Plant Pathology Journal / v.31 no.3 / pp.226-244 / 2015
  • Fusarium head blight is one of the most important and most common diseases of winter wheat. To better understand this disease and to assess the correlations between different factors, 30 cultivars of this cereal were evaluated over a two-year period. Fusarium head blight resistance was evaluated and the concentration of trichothecene mycotoxins was analysed. Grain samples originated from plants inoculated with Fusarium culmorum and from plants naturally infected with Fusarium species. The genetic distance between the tested cultivars was determined and the data were analysed using multivariate data analysis methods. Genetic dissimilarity of wheat cultivars ranged between 0.06 and 0.78. Cluster analysis of genetic distance grouped the cultivars into three distinct groups. Wheat cultivars differed in resistance to spike and kernel infection and in resistance to the spread of Fusarium within a spike (type II). In grain samples from inoculated plots, only B trichothecenes (deoxynivalenol, 3-acetyldeoxynivalenol and nivalenol) produced by F. culmorum were present. In control samples, trichothecenes of groups A (H-2 toxin, T-2 toxin, T-2 tetraol, T-2 triol, scirpentriol, diacetoxyscirpenol) and B were detected. On the basis of the Fusarium head blight assessment and the analysis of trichothecene concentration in the grain, relationships between morphological characters, Fusarium head blight resistance and mycotoxins in the grain of wheat cultivars were examined. The results were used to create matrices of distance between cultivars for trichothecene concentration in inoculated and naturally infected grain as well as for FHB resistance. Correlations between genetic distance and resistance/mycotoxin profiles were calculated using the Mantel test. A highly significant correlation between genetic distance and mycotoxin distance was found for the samples inoculated with Fusarium culmorum. Significant but weak relationships were found between the genetic distance matrix and the matrices for FHB resistance or trichothecene concentration in naturally infected grain.
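
A Mantel test of the kind mentioned above correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the objects of one matrix. The sketch below is a simple one-sided permutation version using Pearson correlation; the function names, permutation count, seed and both matrices are hypothetical and do not reproduce the study's data or software settings.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def mantel_test(a, b, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value) for matrices a, b."""
    n = len(a)
    idx = list(range(n))

    def upper(m, order):
        # upper-triangle entries after relabeling objects by `order`
        return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

    xa = upper(a, idx)
    r_obs = pearson(xa, upper(b, idx))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = idx[:]
        rng.shuffle(perm)  # permute rows and columns of b together
        if pearson(xa, upper(b, perm)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)

# Hypothetical matrices: the second is an exact multiple of the first
positions = [0.0, 1.0, 3.0, 7.0, 15.0]
genetic = [[abs(p - q) for q in positions] for p in positions]
toxin = [[2.0 * v for v in row] for row in genetic]
r, p_value = mantel_test(genetic, toxin)
print(r, p_value)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is why an ordinary correlation test would not be appropriate here.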