Title/Summary/Keyword: Data Partition Algorithm

Search results: 128

Multidimensional scaling of categorical data using the partition method (분할법을 활용한 범주형자료의 다차원척도법)

  • Shin, Sang Min; Chun, Sun-Kyung; Choi, Yong-Seok
    • The Korean Journal of Applied Statistics, v.31 no.1, pp.67-75, 2018
  • Multidimensional scaling (MDS) is an exploratory method for representing the dissimilarities among multivariate objects in a low-dimensional geometric space. However, a general MDS map shows only the objects, with no information about the variables. In this study, we use MDS based on the algorithm of Torgerson (Theory and Methods of Scaling, Wiley, 1958) to visualize clusters of objects in categorical data. To do so, we convert the given data into a multiple indicator matrix. We then add the levels of each categorical variable to the MDS map by applying the partition method of Shin et al. (Korean Journal of Applied Statistics, 28, 1171-1180, 2015). The proposed MDS map therefore reveals both the similarities among objects and the associations among categorical variables.
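
A minimal sketch of the classical (Torgerson) scaling step on a multiple indicator matrix, assuming a toy two-variable categorical data set; the partition-method overlay for variable levels is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical data; column names are illustrative only.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size":  ["S",   "M",    "M",   "L"],
})
Z = pd.get_dummies(df).to_numpy(dtype=float)   # multiple indicator matrix

# Squared Euclidean distances between objects (rows of Z).
sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)

# Torgerson double centering: B = -1/2 * J * D^2 * J.
n = sq.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ sq @ J

# Coordinates from the two largest eigenpairs of B.
vals, vecs = np.linalg.eigh(B)
idx = np.argsort(vals)[::-1][:2]
coords = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
print(coords)   # 2-D MDS configuration of the four objects
```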

A Cluster Validity Index Using Overlap and Separation Measures Between Fuzzy Clusters (클러스터간 중첩성과 분리성을 이용한 퍼지 분할의 평가 기법)

  • Kim, Dae-Won; Lee, Kwang-H.
    • Journal of the Korean Institute of Intelligent Systems, v.13 no.4, pp.455-460, 2003
  • A new cluster validity index is proposed that determines the optimal partition and the optimal number of clusters for fuzzy partitions obtained from the fuzzy c-means algorithm. The proposed index exploits an overlap measure and a separation measure between clusters. The overlap measure is obtained by computing the inter-cluster overlap; the separation measure is obtained by computing the distance between fuzzy clusters. A good fuzzy partition is expected to have a low degree of overlap and a large separation distance. Tests of the proposed index and nine previously formulated indexes on well-known data sets showed that the proposed index is more effective and reliable than the others.
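
A simplified sketch of an overlap/separation-style validity score for a fuzzy c-partition; the formulas below are stand-ins for illustration, not the exact measures defined in the paper.

```python
import numpy as np

def validity(U, centers):
    """U: (c, n) membership matrix; centers: (c, d) cluster centers.
    Low overlap and large separation -> larger (better) score."""
    c = U.shape[0]
    # Overlap: how strongly pairs of clusters share the same points.
    overlap = sum(np.minimum(U[i], U[j]).sum()
                  for i in range(c) for j in range(i + 1, c))
    # Separation: smallest distance between any two cluster centers.
    sep = min(np.linalg.norm(centers[i] - centers[j])
              for i in range(c) for j in range(i + 1, c))
    return sep / (overlap + 1e-12)

# Toy two-cluster partition of four points.
U = np.array([[0.9, 0.8, 0.2, 0.1],
              [0.1, 0.2, 0.8, 0.9]])
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
print(validity(U, centers))
```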

Efficient Sphere Partition Method for Finding the Maximum Intersection of Spherical Convex Polygons (구 볼록 다각형들의 최대 교차를 찾기 위한 효율적인 구 분할 방식)

  • 하종성
    • Korean Journal of Computational Design and Engineering, v.6 no.2, pp.101-110, 2001
  • The maximum intersection of spherical convex polygons is the problem of finding the spherical regions covered by the maximum number of polygons; it is applicable to feasibility decisions in manufacturing problems such as mould design and numerically controlled machining. In this paper, an efficient method for partitioning a sphere into faces by the polygons is presented for computing the maximum intersection. The maximum intersection is determined by examining the ownerships of the partitioned faces, i.e., how many polygons contain each face. We take an edge-based partition approach in which, rather than the ownerships of faces, those of their edges are maintained as the sphere is partitioned incrementally by each polygon. Finally, by gathering the split edges with the maximum ownership as discrete data, we approximately obtain the centroids of all solution faces without constructing their boundaries. Our approach has an efficient time complexity of O(nv), where n and v are the numbers of polygons and of all vertices, respectively. Furthermore, it is practical to implement, since it computes numerical values robustly and handles all degenerate cases.
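
A crude Monte-Carlo baseline shown only to make the "ownership" idea concrete; it is not the paper's O(nv) edge-based partition, and modeling each spherical convex polygon as an intersection of hemispheres (inward unit normals) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def contains(normals, p):
    """p is inside the spherical convex polygon iff it lies on the
    positive side of every bounding great circle."""
    return all(n @ p >= 0.0 for n in normals)

# Two hypothetical polygons, each a list of inward normals.
polys = [
    [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])],
    [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])],
]

# Uniform samples on the unit sphere.
samples = rng.normal(size=(20000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)

# Ownership = number of polygons containing each sample direction.
ownership = np.array([sum(contains(ns, p) for ns in polys) for p in samples])
best = samples[ownership == ownership.max()]
print(ownership.max(), best.mean(axis=0))  # max count, rough region centroid
```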


A Study of Decision Tree Modeling for Predicting the Prosody of Corpus-based Korean Text-To-Speech Synthesis (한국어 음성합성기의 운율 예측을 위한 의사결정트리 모델에 관한 연구)

  • Kang, Sun-Mee; Kwon, Oh-Il
    • Speech Sciences, v.14 no.2, pp.91-103, 2007
  • The purpose of this paper is to develop a model for predicting the prosody of Korean text-to-speech synthesis using the CART and SKES algorithms. Because CART tends to favor predictor variables with many instances, a partition method based on the F-test was applied to CART after the number of instances was reduced by grouping phonemes. Furthermore, the quality of the synthesized speech was evaluated after applying the SKES algorithm to the same data. For the evaluation, MOS tests were performed with 30 men and women in their twenties. The results showed that applying the SKES algorithm made the synthesized speech clearer and more natural.
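
A minimal sketch of a CART-style prosody predictor; the features, labels, phoneme grouping, F-test partitioning, and the SKES algorithm itself are not reproduced, so everything below is illustrative.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per syllable: (phoneme group id, position in
# phrase, phrase length); label: a prosodic break index (0/1/2).
X = [[0, 1, 5], [1, 5, 5], [2, 3, 7], [0, 7, 7], [1, 2, 4], [2, 4, 4]]
y = [0, 2, 1, 2, 0, 2]

# CART as implemented by scikit-learn's decision tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.predict([[1, 4, 4]]))   # predicted break index for a new syllable
```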


On Color Cluster Analysis with Three-dimensional Fuzzy Color Ball

  • Kim, Dae-Won
    • Journal of the Korean Institute of Intelligent Systems, v.18 no.2, pp.262-267, 2008
  • The focus of this paper is on devising an efficient clustering method for arbitrary color data. To tackle this problem, the inherent uncertainty and vagueness of color are represented by a fuzzy color model. By taking a fuzzy approach to color representation, the proposed model makes a soft decision for the vague regions between neighboring colors. A three-dimensional fuzzy color ball is defined, and the degree of membership of a color is computed with a distance measure between a fuzzy color and the color data. With this fuzzy color model, a novel fuzzy clustering algorithm for efficient partitioning of color data is developed.
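
A small sketch of one possible membership function for such a fuzzy color ball: full membership inside a core radius, decaying linearly to zero at an outer radius. The radii and the RGB prototype are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def ball_membership(color, center, r_core=20.0, r_outer=60.0):
    """color, center: RGB vectors; returns a membership degree in [0, 1]."""
    d = np.linalg.norm(np.asarray(color, float) - np.asarray(center, float))
    if d <= r_core:
        return 1.0
    if d >= r_outer:
        return 0.0
    return (r_outer - d) / (r_outer - r_core)   # linear shoulder

fuzzy_red = np.array([220.0, 40.0, 40.0])       # hypothetical fuzzy color
print(ball_membership([200, 60, 50], fuzzy_red))  # a soft degree, not 0/1
```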

A Restricted Partition Method to Detect Single Nucleotide Polymorphisms for a Carcass Trait in Hanwoo

  • Lee, Ji-Hong; Kim, Dong-Chul; Kim, Jong-Joo; Lee, Jea-Young
    • Asian-Australasian Journal of Animal Sciences, v.24 no.11, pp.1525-1528, 2011
  • The purpose of this study was to detect SNPs responsible for a carcass trait in Hanwoo populations. A non-parametric model applying the restricted partition method (RPM) was used, which exploits a partitioning algorithm with statistical criteria for multiple-comparison testing. Phenotypic and genotypic data were obtained from the Hanwoo Improvement Center, National Agricultural Cooperative Federation, Korea; the pedigree structure comprised 229 steers from 16 paternal half-sib proven sires, born at the Namwon or Daegwanryong livestock testing stations between spring 2002 and fall 2003. The carcass trait, longissimus dorsi muscle area, was measured for each steer after slaughter at approximately 722 days of age. Three SNPs (19_1, 18_4 and 28_2) near the microsatellite marker ILSTS035 on BTA6, around which quantitative trait loci (QTL) for meat quality had previously been detected, were used in this study. The RPM analyses found two significant interaction effects between SNPs (19_1 and 18_4, and 19_1 and 28_2) at the α = 0.05 level. Under a general linear (parametric) model, however, no interaction effect between any pair of the three SNPs was detected, and only one main effect, for SNP 19_1, was found for the trait. Under another non-parametric model using the multifactor dimensionality reduction (MDR) method, only one interaction effect, between SNPs 19_1 and 28_2, explained the trait significantly better than the parametric model with the main effect of SNP 19_1. Our results suggest that RPM is a good alternative model choice for finding associations between interaction effects of multiple SNPs and quantitative traits in livestock species.
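
A sketch of the starting point of a restricted-partition-style analysis: averaging the trait within each two-SNP genotype cell. The merging of cell means under multiple-comparison criteria that defines RPM proper is not shown, and the data below are made up.

```python
import pandas as pd

# Hypothetical genotypes for two SNPs and a quantitative trait
# (longissimus dorsi muscle area, "ldma").
df = pd.DataFrame({
    "snp_a": ["AA", "AG", "GG", "AA", "AG", "GG", "AA", "AG"],
    "snp_b": ["CC", "CC", "CT", "CT", "TT", "TT", "CC", "CT"],
    "ldma":  [80.1, 83.4, 85.0, 79.5, 88.2, 86.9, 81.0, 84.7],
})

# Cell means over two-SNP genotype combinations: the initial partition
# that RPM would then repeatedly merge under statistical criteria.
cell_means = df.groupby(["snp_a", "snp_b"])["ldma"].mean()
print(cell_means.sort_values())
```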

The Optimal Partition of Initial Input Space for Fuzzy Neural System : Measure of Fuzziness (퍼지뉴럴 시스템을 위한 초기 입력공간분할의 최적화 : Measure of Fuzziness)

  • Baek, Deok-Soo; Park, In-Kue
    • Journal of the Institute of Electronics Engineers of Korea TE, v.39 no.3, pp.97-104, 2002
  • In this paper we describe a method that optimizes the partition of the input space of a fuzzy neural network by means of a measure of fuzziness, covering the generation of fuzzy rules for each input subspace. The performance of the system is verified for various time intervals of the input. The method divides the input space into several fuzzy regions and, using the Shannon function and a fuzzy entropy function, assigns a degree to each rule generated for the partitioned subspaces, so that an optimal knowledge base is generated without irrelevant rules. The basic idea of the fuzzy neural network is to realize the fuzzy rule base and the reasoning process as a neural network and to adapt the parameters of the fuzzy control rules by the steepest-descent algorithm. For the chosen input intervals, the proposed inference procedure shows that the fast convergence of the root mean square error (RMSE) is due to the optimal partition of the input space.
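
A sketch of scoring one candidate input-space partition with a fuzzy entropy built from the Shannon function S(m) = -m ln m - (1-m) ln(1-m); the triangular membership functions and breakpoints are assumptions for illustration.

```python
import numpy as np

def shannon(m):
    """Shannon function applied elementwise to membership degrees."""
    m = np.clip(m, 1e-12, 1 - 1e-12)
    return -m * np.log(m) - (1 - m) * np.log(1 - m)

def tri(x, a, b, c):
    """Triangular membership rising on [a, b], falling on [b, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

x = np.linspace(0.0, 1.0, 201)   # sampled input domain
# One candidate partition into three overlapping fuzzy regions.
regions = [tri(x, -0.5, 0.0, 0.5),
           tri(x,  0.0, 0.5, 1.0),
           tri(x,  0.5, 1.0, 1.5)]

# Average fuzziness of the partition; compare across candidate partitions
# and keep the one with the lowest score (crisper regions, fewer vague rules).
score = np.mean([shannon(m).mean() for m in regions])
print(score)
```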

Performance Analysis on Declustering High-Dimensional Data by GRID Partitioning (그리드 분할에 의한 다차원 데이터 디클러스터링 성능 분석)

  • Kim, Hak-Cheol; Kim, Tae-Wan; Li, Ki-Joune
    • The KIPS Transactions: Part D, v.11D no.5, pp.1011-1020, 2004
  • A lot of work has been done to improve the I/O performance of systems that store and manage massive amounts of data by distributing the data across multiple disks and accessing them in parallel. Most of the previous work has focused on an efficient mapping from a grid cell, determined by the interval number in each dimension, to a disk number, on the assumption that each dimension is split into disjoint intervals so that the entire data space is partitioned like a grid. However, the effect of the grid partitioning scheme itself on declustering performance has been ignored. In this paper, we enhance the performance of mapping-function-based declustering algorithms by applying a good grid partitioning method. For this, we propose an estimation model for the number of grid cells intersected by a range query, and we apply the grid partitioning scheme that minimizes this count among the possible schemes. While binary partitioning of every dimension is common for high-dimensional data, we choose fewer dimensions than binary partitioning requires and split several times along those dimensions, so that the number of grid cells touched by a query is reduced. Experimental results show that the proposed estimation model is accurate to within a 0.5% error ratio regardless of query size and dimensionality. By applying an efficient grid partitioning scheme, we also improve the performance of the declustering algorithm based on the Kronecker-sequence mapping function, known to be the best mapping function for high-dimensional data, by up to 23 times.
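
A sketch of the kind of estimate such a model rests on: the expected number of grid cells a range query touches under each candidate split scheme. The expectation formula below (s·k + 1 intervals per dimension, capped at k) is a standard uniform-grid approximation used here as an assumption, not the paper's model.

```python
import math

def expected_cells(splits, query_sides):
    """splits: cells per dimension; query_sides: query extent per dimension
    in [0, 1] normalized space. Returns the expected touch count."""
    return math.prod(min(k, s * k + 1) for k, s in zip(splits, query_sides))

# Candidate schemes that all produce 64 cells in 6 dimensions.
candidates = [(2, 2, 2, 2, 2, 2), (8, 8, 1, 1, 1, 1), (4, 4, 4, 1, 1, 1)]
q = (0.4,) * 6   # hypothetical hypercube query with side 0.4
for c in candidates:
    print(c, round(expected_cells(c, q), 2))
# Splitting fewer dimensions more times touches fewer cells here,
# which is the effect the paper exploits.
```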

Top-down Hierarchical Clustering using Multidimensional Indexes (다차원 색인을 이용한 하향식 계층 클러스터링)

  • Hwang, Jae-Jun; Mun, Yang-Se; Hwang, Gyu-Yeong
    • Journal of KIISE: Databases, v.29 no.5, pp.367-380, 2002
  • Due to the recent increase in applications requiring huge amounts of data, such as spatial data analysis and image analysis, clustering on large databases has been actively studied. In a hierarchical clustering method, a tree representing the hierarchical decomposition of the database is first created and then used for efficient clustering. Existing hierarchical clustering methods have mainly adopted the bottom-up approach, which creates the tree from the bottom to the topmost level of the hierarchy. These bottom-up methods require at least one scan over the entire database in order to build the tree, and they need to search most nodes of the tree since the clustering algorithm starts from the leaf level. In this paper, we propose a novel top-down hierarchical clustering method that uses the multidimensional indexes already maintained in most database applications. Multidimensional indexes generally have the clustering property of storing similar objects in the same (or adjacent) data pages; using this property, we can find adjacent objects without calculating the distances among them. We first formally define a cluster based on the density of objects, and for this definition we propose the concept of the region contrast partition based on the density of a region. To speed up the clustering algorithm, we use a branch-and-bound algorithm; we propose the bounds and formally prove their correctness. Experimental results show that the proposed method is at least as effective in clustering quality as BIRCH, a bottom-up hierarchical clustering method, while reducing the number of page accesses by a factor of 26 to 187 depending on the size of the database. We therefore believe that the proposed method significantly improves clustering performance on large databases and is practically usable in various database applications.
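
A sketch of top-down, branch-and-bound pruning over a hierarchical index, where a count-based bound discards subtrees that cannot contain a dense region. The node layout, the minimum-page volume, and the density threshold are illustrative assumptions, not the paper's region contrast partition or proven bounds.

```python
from dataclasses import dataclass, field

MIN_REGION = 0.05   # smallest page/region volume, an assumption

@dataclass
class Node:
    count: int                           # points under this subtree
    volume: float                        # volume of the node's region
    children: list = field(default_factory=list)

def dense_regions(node, threshold, out):
    # Bound: even if every point crowded into one smallest page, density
    # could not exceed count / MIN_REGION, so such subtrees are pruned
    # without descending (the branch-and-bound step).
    if node.count / MIN_REGION < threshold:
        return
    if not node.children:
        if node.count / node.volume >= threshold:
            out.append(node)             # genuinely dense leaf region
        return
    for child in node.children:
        dense_regions(child, threshold, out)

root = Node(100, 1.0, [Node(90, 0.25), Node(10, 0.75)])
found = []
dense_regions(root, threshold=50.0, out=found)
print([(n.count, n.volume) for n in found])   # -> the dense (90, 0.25) region
```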

Fast Algorithm for 360-degree Videos Based on the Prediction of CU Depth Range and Fast Mode Decision

  • Zhang, Mengmeng; Zhang, Jing; Liu, Zhi; Mao, Fuqi; Yue, Wen
    • KSII Transactions on Internet and Information Systems (TIIS), v.13 no.6, pp.3165-3181, 2019
  • Spherical videos, also called 360-degree videos, have become increasingly popular due to the rapid development of virtual reality technology. However, the large amount of data in such videos is a huge challenge for existing transmission systems. To use the existing encoding framework, a spherical video must be converted to a 2D image plane using a specific projection format, e.g. the equirectangular projection (ERP) format. The existing High Efficiency Video Coding (HEVC) standard can compress video content effectively, but its enormous computational complexity makes the time spent compressing high-frame-rate, high-resolution 360-degree videos disproportionate to the benefits of compression. Focusing on the ERP-format characteristics of 360-degree videos, this work develops a fast algorithm that predicts the coding unit (CU) depth interval and adaptively decides the intra-prediction mode. The algorithm makes full use of the characteristics of ERP video by treating the polar and equatorial areas separately: it sets different reference blocks and decision conditions according to the degree of stretching, which reduces coding time while preserving quality. Compared with the original reference software HM-16.16, the proposed algorithm reduces time consumption by 39.3% in the all-intra configuration, while the BD-rate increases by only 0.84%.
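
A sketch of restricting the CU depth search range by image row for ERP content, on the premise that rows near the poles are heavily stretched and tend to be smooth, so large CUs (small depths) usually suffice there. The band boundaries and depth ranges are illustrative assumptions, not the paper's tuned conditions.

```python
def cu_depth_range(row, height):
    """Return the (min_depth, max_depth) to search for a CTU at this row."""
    lat = abs(row / (height - 1) - 0.5) * 2.0   # 0 at equator, 1 at poles
    if lat > 0.8:
        return (0, 1)      # polar band: mostly large, smooth CUs
    if lat > 0.5:
        return (0, 2)      # mid-latitude band
    return (0, 3)          # equatorial band: full HEVC depth range

# Top row (pole) vs. middle row (equator) of a 1080-row ERP frame.
print(cu_depth_range(0, 1080), cu_depth_range(540, 1080))
```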