• Title/Summary/Keyword: Ward's Hierarchical Clustering Analysis

Customer Load Pattern Analysis using Clustering Techniques (클러스터링 기법을 이용한 수용가별 전력 데이터 패턴 분석)

  • Ryu, Seunghyoung; Kim, Hongseok; Oh, Doeun; No, Jaekoo
    • KEPCO Journal on Electric Power and Energy / v.2 no.1 / pp.61-69 / 2016
  • Understanding load patterns and classifying customers is a basic step in analyzing the behavior of electricity consumers. To that end, much research has addressed clustering customers' daily load data. The deployment of advanced metering infrastructure (AMI) and big-data technologies now makes it easier to study customers' load data. In this paper, we study load clustering from the viewpoint of yearly and daily load patterns. We compare four clustering methods: K-means clustering, hierarchical clustering (average linkage and Ward's method), and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). We also discuss the relationship between the clustering results and the Korean Standard Industrial Classification (KSIC), one possible label for customers' load data. We find that hierarchical clustering with Ward's method is suitable for clustering load data, and that KSIC is well characterized by the daily load pattern but not by the yearly load pattern.
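
As a rough illustration of the Ward's method mentioned in this abstract, the sketch below merges, at each step, the pair of clusters whose union increases the within-cluster sum of squares the least. It is a naive standard-library version written for readability, not the authors' implementation; the function names and the four toy load profiles are hypothetical.

```python
from itertools import combinations

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a), len(b)
    centroid_a = [sum(col) / na for col in zip(*a)]
    centroid_b = [sum(col) / nb for col in zip(*b)]
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return na * nb / (na + nb) * sq_dist

def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopped at k clusters."""
    clusters = [[tuple(p)] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        # Pick the pair (i < j) whose merge is cheapest under Ward's criterion.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical normalized daily load profiles (4 readings each), not data from the paper.
profiles = [
    (0.2, 0.9, 1.0, 0.3),  # daytime peak
    (0.3, 1.0, 0.9, 0.2),  # daytime peak
    (0.9, 0.2, 0.3, 1.0),  # night peak
    (1.0, 0.3, 0.2, 0.9),  # night peak
]
groups = ward_clusters(profiles, k=2)
```

With these toy profiles the two daytime-peak customers end up in one group and the two night-peak customers in the other. For real data sets a library routine would be far more efficient than this cubic-time loop.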

Gene Screening and Clustering of Yeast Microarray Gene Expression Data (효모 마이크로어레이 유전자 발현 데이터에 대한 유전자 선별 및 군집분석)

  • Lee, Kyung-A; Kim, Tae-Houn; Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1077-1094 / 2011
  • We perform clustering analyses of yeast cell cycle microarray expression data. To reflect the characteristics of time-course data, we screen the genes using test statistics based on Fourier coefficients together with an FDR procedure. We compare the results of model-based clustering, K-means, PAM, SOM, the hierarchical Ward method, and a fuzzy method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation based on GO analysis is also included.
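
One of the validity measures named in this abstract, the Dunn index, can be sketched as follows. This assumes the common variant that divides the smallest distance between points in different clusters by the largest within-cluster diameter; the abstract does not say which variant the authors used, and the toy clusters are hypothetical.

```python
from itertools import combinations, product
from math import dist

def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest within-cluster diameter.

    Assumes at least two clusters and at least one cluster with two or more points.
    """
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p, q in product(a, b)
    )
    max_diameter = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_separation / max_diameter

# Hypothetical 2-D points: compact, well-separated clusters versus looser ones.
tight = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
loose = [[(0, 0), (0, 3)], [(4, 0), (4, 3)]]
tight_score = dunn_index(tight)
loose_score = dunn_index(loose)
```

A larger value indicates more compact and better separated clusters, so the first partition scores higher than the second.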

A Comparative Study on Statistical Clustering Methods and Kohonen Self-Organizing Maps for Highway Characteristic Classification of National Highway (일반국도 도로특성분류를 위한 통계적 군집분석과 Kohonen Self-Organizing Maps의 비교연구)

  • Cho, Jun Han; Kim, Seong Ho
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.3D / pp.347-356 / 2009
  • This paper describes a cluster analysis for classifying highways by traffic characteristics, as a departure from existing highway functional classification methodologies. The research focuses on comparing the performance of clustering techniques based on total within-group error and on deriving the optimal number of clusters. It analyzes statistical clustering methods (the hierarchical Ward's minimum-variance method and the nonhierarchical K-means method) and the Kohonen self-organizing maps clustering method for highway characteristic classification. The outcomes of the clustering techniques are compared in terms of the number of samples and the traffic characteristics of the subsets derived with the optimal number of clusters. Overall, the K-means method outperforms the other methods when there are fewer than 12 clusters, while Kohonen self-organizing maps give the best result when there are more than 20 clusters. The main contribution of this research is expected to be the basic road attribute information produced by the highway characteristic classification.

Types of Train Delay of High-Speed Rail : Indicators and Criteria for Classification (고속철도 열차지연 유형의 구분지표 및 기준)

  • Kim, Hansoo; Kang, Joonghyuk; Bae, Yeong-Gyu
    • Journal of the Korean Operations Research and Management Science Society / v.38 no.3 / pp.37-50 / 2013
  • The purpose of this study is to determine the indicators and criteria for classifying types of train delay on high-speed rail in South Korea. Train delays are divided into chronic delays and knock-on delays. The indicators, selected on the basis of relevance, reliability, and comparability, are the rate of arrival delays over five minutes, the median arrival delay of the preceding and following trains, the rate of knock-on delays over five minutes, the correlation of delays between the preceding and following trains at intermediate and last stations, the average train headway, the average number of passengers per train, and the average seat usage. The types of train delay were separated using Ward's hierarchical cluster analysis, and the classification criteria were derived with Fisher's linear discriminant. The analysis of the situational characteristics of train delays shows the following. If the train headway at the last station is short, the probability of chronic delay is high. If the planned running time of a train is short, chronic delays are more serious. The important causes of train delays are short train headways, short planned running times, delays of the preceding train, and an excessive number of passengers per train.

Pattern Clustering of Symmetric Regional Cerebral Edema on Brain MRI in Patients with Hepatic Encephalopathy (간성뇌증 환자의 뇌 자기공명영상에서 대칭적인 지역 뇌부종 양상의 군집화)

  • Chun Geun Lim; Hui Joong Lee
    • Journal of the Korean Society of Radiology / v.85 no.2 / pp.381-393 / 2024
  • Purpose: Metabolic abnormalities in hepatic encephalopathy (HE) cause brain edema or demyelinating disease, resulting in symmetric regional cerebral edema (SRCE) on MRI. This study aimed to investigate the usefulness of the clustering analysis of SRCE in predicting the development of brain failure. Materials and Methods: MR findings and clinical data of 98 consecutive patients with HE were retrospectively analyzed. The correlation between the 12 regions of SRCE was calculated using the phi (φ) coefficient, and the pattern was classified using hierarchical clustering with the φ² distance measure and Ward's method. The classified patterns of SRCE were correlated with clinical parameters such as the model for end-stage liver disease (MELD) score and HE grade. Results: Significant associations were found between 22 pairs of regions of interest, including the red nucleus and corpus callosum (φ = 0.81, p < 0.001), crus cerebri and red nucleus (φ = 0.72, p < 0.001), and red nucleus and dentate nucleus (φ = 0.66, p < 0.001). After hierarchical clustering, 24 cases were classified into Group I, 35 into Group II, and 39 into Group III. Group III had a higher MELD score (p = 0.04) and HE grade (p = 0.002) than Group I. Conclusion: Our study demonstrates that the SRCE patterns can be useful in predicting hepatic preservation and the occurrence of cerebral failure in HE.
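
The phi coefficient used in this abstract measures the association between two binary variables, such as the presence or absence of edema in two brain regions across patients. The sketch below computes it from the 2×2 contingency counts; the function name and the eight-patient vectors are hypothetical, and the conversion to the distance measure used for clustering is left out because the abstract does not define it.

```python
from math import sqrt

def phi_coefficient(x, y):
    """Phi coefficient of two equal-length binary (0/1) sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("phi is undefined when a variable is constant")
    return (n11 * n00 - n10 * n01) / denominator

# Hypothetical findings for 8 patients (1 = edema present), not data from the study.
region_a = [1, 1, 1, 0, 0, 0, 1, 0]
region_b = [1, 1, 0, 0, 0, 0, 1, 1]
phi = phi_coefficient(region_a, region_b)
```

Here the two regions agree in six of eight patients, which gives φ = 0.5; identical vectors would give φ = 1.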

Classification of Daily Precipitation Patterns in South Korea using Multivariate Statistical Methods

  • Mika, Janos; Kim, Baek-Jo; Park, Jong-Kil
    • Journal of Environmental Science International / v.15 no.12 / pp.1125-1139 / 2006
  • The cluster analysis of diurnal precipitation patterns is performed using daily precipitation at 59 stations in South Korea from 1973 to 1996 in the four seasons of each year. The four seasons are shifted forward by 15 days compared to the conventional ones. The number of clusters is 15 in winter, 16 in spring and autumn, and 26 in summer. One of the classes in each season is the totally dry day, on which no precipitation is observed at any station; it is treated separately in this study. The distribution of days among the clusters is rather uneven, with low area-mean precipitation occurring most frequently. These 4 (seasons) × 2 (wet and dry days) classes represent more than half (59%) of all days of the year. On the other hand, even the smallest seasonal clusters have at least 5 to 9 members in the 24-year (1973-1996) classification period. To account for spatial correlation, the cluster analysis is performed directly on the major 5 to 8 uncorrelated coefficients of the diurnal precipitation patterns obtained by factor analysis. More specifically, hierarchical clustering based on Euclidean distance and Ward's method of agglomeration is applied. The relative variance explained by the clustering averages 63%, with better capability in spring (66%) and winter (69%) and lower than average in autumn (60%) and summer (59%). Applying weighted relative variances, i.e., dividing the squared deviations by the cluster averages, yields even better values, 78% on average, compared to the same index without clustering. This means that the highest variance remains in the clusters with more precipitation. Besides all statistics necessary for validating the final classification, 4 cluster centers are mapped for each season to illustrate the range of typical extremes, paired according to their area-mean precipitation or negative pattern correlation. Possible alternatives to the performed classification and the reasons for their rejection are also discussed, along with a wide spectrum of recommended applications.
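
The "relative variance explained by the clustering" can be illustrated with a small sketch. It assumes the usual definition, one minus the ratio of the within-cluster sum of squares to the total sum of squares; the abstract does not spell out the formula, so this is an assumption, and the function names and toy scores are hypothetical.

```python
def sum_of_squares(points):
    """Sum of squared deviations of the points from their centroid."""
    centroid = [sum(col) / len(points) for col in zip(*points)]
    return sum((x - c) ** 2 for p in points for x, c in zip(p, centroid))

def explained_variance(clusters):
    """Share of total variance removed by the clustering: 1 - within / total."""
    all_points = [p for cluster in clusters for p in cluster]
    total = sum_of_squares(all_points)
    within = sum(sum_of_squares(cluster) for cluster in clusters)
    return 1 - within / total

# Hypothetical factor scores for four days grouped into two clusters.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
ratio = explained_variance(clusters)
```

For this toy partition the within-cluster sum of squares is 4 and the total is 104, so about 96% of the variance is explained. The weighted variant described in the abstract is not reproduced here.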

A Classification of Luxury Fashion Brands' E-commerce Sites

  • Kim, Sunghee
    • Journal of Fashion Business / v.17 no.6 / pp.125-140 / 2013
  • The aim of this study was to analyze the e-commerce sites of luxury fashion brands in order to provide insights on how to enhance online site quality. Forty-eight components of thirty-one luxury fashion brands' e-commerce sites were investigated during October 2013. To cluster the e-commerce site components and segment the luxury brands' e-commerce sites, a hierarchical cluster analysis was applied using Ward's method and squared Euclidean distance for binary data. Further, Fisher's exact test was applied to distinguish the characteristics of the three groups of luxury e-commerce sites. These analyses were carried out with SPSS 21. The results indicated that the components of the e-commerce sites fell into three categories: basic elements, additional elements, and elements for building brand identity. These components were categorized by whether their functions were basic and essential or additional and advanced; the other criterion for categorization was related to brand identity. Furthermore, the luxury brands' e-commerce sites were segmented into three groups: a group endeavoring to promote goods, a group with undistinguished performance, and a group endeavoring to strengthen brand identity. In this segmentation, brand identity and promotional aspects were decisive. Overall, luxury brands were trying to convey their traditional strengths through their e-commerce sites, and brand identity and promotional aspects played an important role in achieving this.
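
For binary data such as the presence or absence of site components, the squared Euclidean distance between two sites equals the number of components on which they differ. The short sketch below checks this on two hypothetical five-component sites; the names and values are illustrative only and do not come from the study.

```python
def squared_euclidean(u, v):
    """Squared Euclidean distance between two equal-length numeric vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mismatches(u, v):
    """Number of positions at which two vectors differ."""
    return sum(a != b for a, b in zip(u, v))

# Hypothetical presence (1) / absence (0) of five components on two sites.
site_a = [1, 1, 0, 1, 0]
site_b = [1, 0, 0, 1, 1]
distance = squared_euclidean(site_a, site_b)
differing = mismatches(site_a, site_b)
```

Both quantities equal 2 here, because each differing 0/1 pair contributes exactly 1 to the squared distance.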

Lung Function Trajectory Types in Never-Smoking Adults With Asthma: Clinical Features and Inflammatory Patterns

  • Kim, Joo-Hee; Chang, Hun Soo; Shin, Seung Woo; Baek, Dong Gyu; Son, Ji-Hye; Park, Choon-Sik; Park, Jong-Sook
    • Allergy, Asthma & Immunology Research / v.10 no.6 / pp.614-627 / 2018
  • Purpose: Asthma is a heterogeneous disease that responds to medications to varying degrees. Cluster analyses have identified several phenotypes and variables related to fixed airway obstruction; however, few longitudinal studies of lung function have been performed on adult asthmatics. We investigated clinical, demographic, and inflammatory factors related to persistent airflow limitation based on lung function trajectories over 1 year. Methods: Serial post-bronchodilator forced expiratory volume (FEV) 1% values were obtained from 1,679 asthmatics who were followed up every 3 months for 1 year. First, a hierarchical cluster analysis was performed using Ward's method to generate a dendrogram for the optimum number of clusters using the complete post-FEV1 sets from 448 subjects. Then, a trajectory cluster analysis of serial post-FEV1 sets was performed using the k-means clustering for the longitudinal data trajectory method. Next, trajectory clustering for the serial post-FEV1 sets of a total of 1,679 asthmatics was performed after imputation of missing post-FEV1 values using regression methods. Results: Trajectories 1 and 2 were associated with normal lung function during the study period, and trajectory 3 was associated with a reversal to normal of the moderately decreased baseline FEV1 within 3 months. Trajectories 4 and 5 were associated with severe asthma with a marked reduction in baseline FEV1. However, the FEV1 associated with trajectory 4 was increased at 3 months, whereas the FEV1 associated with trajectory 5 was persistently disturbed over 1 year. Compared with trajectory 4, trajectory 5 was associated with older asthmatics with less atopy, a lower immunoglobulin E (IgE) level, sputum neutrophilia and higher dosages of oral steroids. In contrast, trajectory 4 was associated with higher sputum and blood eosinophil counts and more frequent exacerbations. Conclusions: Trajectory clustering analysis of FEV1 identified 5 distinct types, representing well-preserved to severely decreased FEV1. Persistent airflow obstruction may be related to non-atopy, a low IgE level, and older age accompanied by neutrophilic inflammation and low baseline FEV1 levels.