• Title/Summary/Keyword: Hierarchical cluster analysis

Optimize TOD Time-Division with Dynamic Time Warping Distance-based Non-Hierarchical Cluster Analysis (동적 타임 워핑 거리 기반 비 계층적 군집분석을 활용한 TOD 시간분할 최적화)

  • Hwang, Jae-Yeon;Park, Minju;Kim, Yongho;Kang, Woojin
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.20 no.5 / pp.113-129 / 2021
  • Traffic congestion in cities keeps increasing as living areas expand around the metropolitan region and population concentrates in large cities. Rising land prices and limited sites in downtown areas have made new road construction impossible, so efficient data-based road operation is becoming increasingly important. Efficient road operation requires classifying appropriate scenarios according to changes in traffic conditions and operating the optimal signals for each scenario. In this study, Dynamic Time Warping, a model for cluster analysis of time series data, was applied to traffic volume and speed data collected at consecutive intersections to classify scenarios optimally. We propose a methodology for composing an optimal signal operation scenario by analyzing the characteristics of the scenarios obtained from each type of data used for classification.
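
The abstract above names Dynamic Time Warping (DTW) distance and non-hierarchical clustering but gives no algorithmic detail. The following is a minimal sketch of that combination, not the authors' implementation: it assumes a textbook DTW with absolute-difference cost, uses k-medoids as one possible non-hierarchical method, and initialises the medoids to the first k series. The function names and the toy volume profiles are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D series."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids(series, k, max_iter=100):
    """Partition series into k clusters around medoids, using DTW distance.

    Assumption: the first k series serve as initial medoids.
    """
    n = len(series)
    dist = [[dtw_distance(series[i], series[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster emptied out
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical traffic-volume profiles (not data from the paper).
profiles = [
    [0, 0, 5, 0, 0],  # single peak
    [3, 3, 3, 3, 3],  # flat
    [0, 5, 0, 0, 0],  # same peak, shifted earlier
    [3, 3, 4, 3, 3],  # nearly flat
]
labels, medoids = k_medoids(profiles, k=2)
print(labels)  # → [0, 1, 0, 1]
```

The shifted peak lands in the same cluster as the unshifted one because DTW aligns the two series before comparing them, which is the property that makes it attractive for time-of-day profiles.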

Re-evaluation of Obesity Syndrome Differentiation Questionnaire Based on Real-world Survey Data Using Data Mining (데이터 마이닝을 이용한 한의비만변증 설문지 재평가: 실제 임상에서 수집한 설문응답 기반으로)

  • Oh, Jihong;Wang, Jing-Hua;Choi, Sun-Mi;Kim, Hojun
    • Journal of Korean Medicine for Obesity Research / v.21 no.2 / pp.80-94 / 2021
  • Objectives: The purpose of this study is to re-evaluate the importance of the questions in the obesity syndrome differentiation (OSD) questionnaire based on a real-world survey and to explore the possibility of simplifying OSD types. Methods: The OSD frequency was identified, and variance threshold feature selection was performed to filter the questions. Filtered questions were clustered by K-means clustering and hierarchical clustering. After principal component analysis (PCA), the distribution patterns of the subjects were identified and the differences in the syndrome distribution were compared. Results: The frequency of OSD in spleen deficiency, phlegm (PH), and blood stasis (BS) was lower than in food retention (FR), liver qi stagnation (LS), and yang deficiency. We excluded 13 questions with low variance, 7 of which were related to BS. Filtered questions were clustered into 3 groups by K-means clustering: Cluster 1 (17 questions) mainly related to PH and BS syndromes; Cluster 2 (11 questions) related to swelling and indigestion; Cluster 3 (11 questions) related to overeating or emotional symptoms. After PCA, significantly different patterns of subjects were observed in the FR, LS, and other obesity syndromes. The questions that mainly affected the FR distribution concerned digestive symptoms, while emotional symptoms mainly affected the distribution of LS subjects. The other obesity syndromes were partially affected by both digestive and emotional symptoms, and also by symptoms related to poor circulation. Conclusions: In-depth data mining analysis identified questions of relatively low importance and the potential to simplify OSD types.
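
The filtering step in the Methods above, variance threshold feature selection, can be illustrated with a short sketch. This is an assumption-laden toy, not the study's code: the threshold value, the question names, and the answers are all hypothetical, and population variance is used because the abstract does not say which estimator was applied.

```python
from statistics import pvariance


def variance_filter(responses, threshold):
    """Keep only the questions whose answer variance exceeds the threshold.

    responses maps a question id to the list of answers given by all subjects.
    """
    return {q: answers for q, answers in responses.items() if pvariance(answers) > threshold}


# Hypothetical 5-point answers from five subjects (not data from the study).
responses = {
    "Q1": [1, 1, 1, 1, 2],  # variance 0.16: almost everyone answers the same
    "Q2": [1, 3, 5, 2, 4],  # variance 2.0: discriminates between subjects
    "Q3": [2, 2, 2, 2, 2],  # variance 0: carries no information
}
kept = variance_filter(responses, threshold=0.5)
print(list(kept))  # → ['Q2']
```

Questions that nearly every subject answers identically cannot separate syndromes, which is why they are dropped before clustering.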

Analysis of Vegetation Structure of Castanopsis sieboldii Forest in the Warm-temperate Zone, Korea

  • Lee, Sung-Je;Ohno, Keiichi;Song, Jong-Suk
    • Journal of Environmental Science International / v.21 no.2 / pp.135-144 / 2012
  • This study aims to classify and analyze the vegetation structure of the Castanopsis sieboldii forest, one of the evergreen broad-leaved forests found under the warm-temperate climate of Korea. The structure is also compared with that of the Castanopsis sieboldii forest in Japan, which most closely resembles the Korean forest, to identify vegetation structures unique to the Korean forest. The vegetation structure of the Korean Castanopsis sieboldii forest was divided into two community-level units: the Ardisia japonica-Castanopsis sieboldii community and the Ardisio-Castanopsietum sieboldii association. The association is of a similar type to that in the vegetation system of Japan, but its subunits were found to differ considerably from those in Japan. Hierarchical cluster analysis also gave results similar to those of the vegetation structure analysis.

Influences of Information Technology Structure Taxonomy on Business Performance - Moderating Effect of Organization Structure and Control System - (정보기술구조유형이 경영성과에 미치는 영향 - 조직구조와 통제시스템의 조절효과를 중심으로 -)

  • Kim, Moon-Shik
    • Asia pacific journal of information systems / v.9 no.1 / pp.17-38 / 1999
  • While the value of information technology has long been a hot issue, few solid results have been found as yet. This is partly due to methodological factors and model underspecification. This study empirically develops an ITS (information technology structure) taxonomy and investigates the relationships between the ITS taxonomy and business performance in Korean firms. Among the factors that affect business performance, organization structure and control system are selected and hypothesized to moderate the relationships between the ITS taxonomy and business performance. By surveying 91 manufacturing firms and applying hierarchical cluster analysis, four ITS types are identified: centralized, decentralized, centralized cooperative, and decentralized cooperative. ANOVA, correlation analysis, and cross-tabulation analysis indicate the presence of a moderating effect of organization structure and control system. Cooperative ITS shows the best business performance. Centralized ITS is related to the functional organizational form. Decentralized ITS is related to the product organizational form with decentralized decision making. Centralized cooperative ITS is related to the matrix organizational form. Decentralized cooperative ITS is related to the matrix organizational form with high integration. These findings have implications for the opportunities and challenges of matching information technology with organization structure and control system.

CLUSTERING DNA MICROARRAY DATA BY STOCHASTIC ALGORITHM

  • Shon, Ho-Sun;Kim, Sun-Shin;Wang, Ling;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.438-441 / 2007
  • Thanks to recent advances in molecular biology and engineering technology, DNA microarrays make it possible to observe thousands of genes and their state of variation in tissue samples from living organisms. With DNA microarrays, genes with similar expression patterns can be grouped, and the progress and variation of genes can be tracked. This paper applies cluster analysis, which aims to discover biological subgroups or classes from gene expression information. The purpose of this paper is therefore to predict new, unknown classes; open leukaemia data are used for the experiment, and the MCL (Markov CLustering) algorithm is applied as the analysis method. The MCL algorithm is based on probability and graph flow theory. MCL simulates random walks on a graph using Markov matrices to determine the transition probabilities among the nodes of the graph. In detail, the method first computes Euclidean distances and applies the MCL algorithm to them, then tunes the inflation and diagonal factors, which are the tuning parameters, and finally derives a threshold from the average of each column to distinguish one class from another. Using this threshold, the average of each column, improved the accuracy. Our experimental results show an average accuracy of about 70% against the previously known classes. For comparison with other algorithms, the proposed method was compared with the SOM (Self-Organizing Map) clustering algorithm, which is classified as a neural network method, and with hierarchical clustering. The method shows better results than hierarchical clustering. Further study should examine whether similar results are obtained when the inflation parameter obtained in our experiment is applied to other gene expression data. We are also working on a systematic method to improve the accuracy by adjusting the factors mentioned above.
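
The random-walk mechanism described above can be sketched as the generic MCL loop of expansion (squaring the Markov matrix) and inflation (element-wise powering followed by column re-normalisation). This is a minimal illustration under stated assumptions, not the paper's method: it takes a ready-made adjacency matrix instead of deriving one from Euclidean distances, adds unit self-loops, and omits the column-average threshold the authors describe. All function names and the example graph are hypothetical.

```python
def normalize_columns(m):
    """Scale every column to sum to 1 so the matrix is column-stochastic."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion: one squaring of the matrix, i.e. a two-step random walk."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then re-normalise the columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Return clusters (sorted lists of node indices) found by a basic MCL loop."""
    n = len(adjacency)
    # Assumption: unit self-loops are added before normalisation.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Every row that still carries flow defines a cluster: the columns it reaches.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical graph: two triangles joined by one weak edge between nodes 2 and 3.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.01, 0, 0],
    [0, 0, 0.01, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
found = mcl(graph)
print(found)
```

A larger inflation value generally yields finer clusters, which is why the abstract treats inflation as a parameter to tune.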

Top-down Hierarchical Clustering using Multidimensional Indexes (다차원 색인을 이용한 하향식 계층 클러스터링)

  • Hwang, Jae-Jun;Mun, Yang-Se;Hwang, Gyu-Yeong
    • Journal of KIISE:Databases / v.29 no.5 / pp.367-380 / 2002
  • Due to the recent increase in applications requiring huge amounts of data, such as spatial data analysis and image analysis, clustering on large databases has been actively studied. In a hierarchical clustering method, a tree representing a hierarchical decomposition of the database is first created and then used for efficient clustering. Existing hierarchical clustering methods have mainly adopted the bottom-up approach, which creates a tree from the bottom to the topmost level of the hierarchy. These bottom-up methods require at least one scan over the entire database to build the tree and need to search most nodes of the tree, since the clustering algorithm starts from the leaf level. In this paper, we propose a novel top-down hierarchical clustering method that uses multidimensional indexes that are already maintained in most database applications. Multidimensional indexes generally have a clustering property: they store similar objects in the same (or adjacent) data pages. Using this property, we can find adjacent objects without calculating distances among them. We first formally define a cluster based on the density of objects. For the definition, we propose the concept of the region contrast partition, which is based on the density of the region. To speed up the clustering algorithm, we use the branch-and-bound algorithm. We propose the bounds and formally prove their correctness. Experimental results show that the proposed method is at least as effective in clustering quality as BIRCH, a bottom-up hierarchical clustering method, while reducing the number of page accesses by a factor of 26 to 187 depending on the size of the database. As a result, we believe that the proposed method significantly improves clustering performance in large databases and is practically usable in various database applications.

A Study on the Classification of Jeokbyeok-ga's Version by the Computer Analysis Technique of Bibliographies (컴퓨터 문헌 분석 기법을 활용한 <적벽가> 이본의 계통 분류 연구)

  • Lee, Jin-O;Kim, Dong-Keon
    • The Journal of the Korea Contents Association / v.19 no.6 / pp.1-9 / 2019
  • The purpose of this study is to examine the lineage of the versions of Jeokbyeok-ga using computational bibliographic analysis techniques and to review the achievements of previous studies of these versions. First, to provide basic data for analysis, a raw corpus was constructed for 46 versions of Jeokbyeok-ga. From this corpus, the common narrative units of Jeokbyeok-ga were identified in 5 layers, yielding 146 individual paragraphs. Based on the encoded corpus, we measured the similarity and distance between versions. Next, we applied multidimensional scaling, hierarchical cluster analysis, and cladistic analysis to confirm the distribution of version groups, which made it possible to grasp visually the distances between versions and the lineage of the work. The analysis showed that the versions of Jeokbyeok-ga divide into a Wanpan (完板) series and a Changbon (唱本) series. It was also possible to examine the influence relationships between pansori traditions and their transmission.

The Difference Analysis between Maturity Stages of Venture Firms by Classification Techniques of Big Data (빅데이터 분류 기법에 따른 벤처 기업의 성장 단계별 차이 분석)

  • Jung, Byoungho
    • Journal of Korea Society of Digital Industry and Information Management / v.15 no.4 / pp.197-212 / 2019
  • The purpose of this study is to identify the maturity stages of venture firms through classification analysis, which is widely used as a big data technique. Venture companies should develop a competitive advantage in the market, and the maturity of a company can be classified into five stages. I analyze the difference in the growth stage of venture firms between survey responses and statistical classification methods. Firm growth was divided into five stages, ranging from start-up to decline. Popular big data classification methods include k-means cluster analysis, hierarchical cluster analysis, artificial neural networks, and decision tree analysis. The variables used were asset increase, capital increase, sales increase, operating profit increase, R&D investment increase, operation period, and number of retirements. The results show that group sample sizes differed greatly across the big data analysis techniques. In particular, the decision tree and neural network methods produced three groups rather than five. Group sizes differed for every classification method. Furthermore, results may differ depending on variable selection and sample size. Each classified group also showed a number of differences in competitiveness. The implication is that analysts need to interpret statistics through management theory in order to interpret big data classification results correctly. In addition, the choice of classification analysis should be determined by considering not only management theory but also practical experience. Finally, the growth of venture firms needs to be examined by time-series analysis and monitored closely for individual firms. Future research will need to include significant variables of a company's maturity stages.

Genetic distances of three venerid species identified by PCR analysis

  • Jeon, Jun-Hyub;Yoon, Jong-Man
    • The Korean Journal of Malacology / v.31 no.4 / pp.257-262 / 2015
  • The seven selected primers BION-13, BION-29, BION-61, BION-64, BION-68, BION-72 and BION-80 generated the total number of loci, average number of loci per lane and specific loci in Meretrix lusoria (ML), Saxidomus purpuratus (SP) and Cyclina sinensis (CS) species. The complexity of the banding patterns varied dramatically between the primers across the three venerid clam species. Larger fragments (> 1,000 bp) were observed much more often in the SP species. The primer BION-68 generated 21 loci unique to each species, which identified each species, at approximately 150 bp, 300 bp and 450 bp in the ML species. Remarkably, the primer BION-80 detected 7 loci shared by the three clam species, major and/or minor fragments of 500 bp that matched in all samples. As regards average bandsharing value (BS) results, individuals from the CS clam species (0.754) exhibited higher bandsharing values than did individuals from the SP clam species (0.607) (P < 0.05). In this study, the dendrogram obtained with the seven oligonucleotide primers indicates three genetic clusters: cluster 1 (LUSORIA01-LUSORIA07), cluster 2 (PURPURATUS08-PURPURATUS14), cluster 3 (SINENSIS15-SINENSIS21). Among the twenty-one venerid clams, the shortest genetic distance that displayed significant molecular differences was between individuals 18 and 20 from the CS species (genetic distance = 0.071), while the longest genetic distance that displayed significant molecular differences was between individuals LUSORIA no. 02 and PURPURATUS no. 09 (genetic distance = 0.778). Individuals of the SP species were relatively closely related to those of the CS species, as shown in the hierarchical dendrogram of genetic distances. Ultimately, the PCR fragments revealed in the present study may be useful as DNA markers for discriminating among the three venerid clam species.

Cluster Analysis Study based on Content Types of <Heungbu-jeon> versions (<흥부전> 이본의 내용 유형에 따른 군집 분석 연구)

  • Woonho Choi;Dong Gun Kim
    • Journal of Platform Technology / v.11 no.5 / pp.23-36 / 2023
  • This study aims to analyze the similarities and dissimilarities of various versions of <Heungbu-jeon> at both micro- and macro-levels using content analysis techniques and the Hamming distance metric. The 28 versions of <Heungbu-jeon> were segmented into 341 content units, and for each unit, the value of the content type was encoded. The dissimilarities between content types were compared across all versions, content unit by content unit. The (dis-)similarities based on the content types of the 28 versions were aggregated and transformed into a distance matrix. The matrix was interpreted by multi-dimensional scaling, resulting in two-dimensional coordinates. Visualizing the multi-dimensional scaling results confirmed that the versions of <Heungbu-jeon> can be broadly divided into two groups. Hierarchical clustering and phylogenetic analysis were applied to analyze the clusters of the 28 versions, using the same distance matrix. The results showed that there are five clusters based on the micro-level analysis of (dis-)similarities within the two major clusters. This study demonstrated the usefulness of applying digital humanities methods to encode the content of classical literary versions and analyze the data using clustering analysis techniques based on the (dis-)similarity of literary content.
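
The pipeline described above (encode each content unit, compare versions by Hamming distance, cluster the distance matrix hierarchically) can be sketched as follows. This is an illustrative toy, not the study's code: the version names and codes are invented, average linkage is an assumption because the abstract does not name a linkage method, and the multi-dimensional scaling and phylogenetic steps are left out.

```python
def hamming(a, b):
    """Number of content units whose encoded type differs between two versions."""
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(versions):
    """Pairwise Hamming distances; versions maps a name to its code string."""
    names = list(versions)
    return names, [[hamming(versions[p], versions[q]) for q in names] for p in names]


def agglomerate(dist, k):
    """Bottom-up hierarchical clustering, merging until k clusters remain.

    Assumption: average linkage (mean pairwise distance between clusters).
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge the second into the first.
        pairs = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical versions, each encoded as one content-type letter per content unit.
versions = {
    "V1": "AABACA",
    "V2": "AABACB",
    "V3": "BBCBAA",
    "V4": "BBCBAB",
}
names, dist = distance_matrix(versions)
groups = agglomerate(dist, k=2)
print([[names[i] for i in g] for g in groups])  # → [['V1', 'V2'], ['V3', 'V4']]
```

Stopping the merging at different values of k corresponds to cutting the dendrogram at different heights, which is how the same distance matrix can yield both two major groups and five finer clusters.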
