• Title/Summary/Keyword: data partition


The Effect of Bias in Data Set for Conceptual Clustering Algorithms

  • Lee, Gye Sung
    • International journal of advanced smart convergence / v.8 no.3 / pp.46-53 / 2019
  • When a partitioned structure is derived from a data set using a clustering algorithm, it is not unusual to obtain a different set of outcomes when the algorithm runs on the data in a different order. This problem is known as the order bias problem. Many machine learning algorithms try to achieve an optimized result from the available training and test data. Optimization is determined by an evaluation function, which itself has a tendency toward a certain goal. Such a tendency in the evaluation function is inevitable, both for efficiency and for consistency in the result. But its preference for a specific goal may sometimes lead to unfavorable consequences in the final clustering result. To overcome these bias problems, a first clustering pass constructs an initial partition. The initial partition is expected to imply the possible range in the number of final clusters. We apply data-centric sorting to the data objects in the clusters of the partition to rearrange them in a new order. The same clustering procedure is then reapplied to the newly arranged data set to build a new partition. We have developed an algorithm that reduces the bias effect resulting from how data is fed into the algorithm. Experimental results show that the algorithm helps minimize the order bias effects. We also show that the current evaluation measure used for the clustering algorithm is biased toward favoring a smaller number of clusters and, as a result, larger clusters.

Performance Improvement of SAR Autofocus Based on Partition Processing (분할처리 기반 SAR 자동초점 기법의 성능 개선)

  • Shin, Hee-Sub;Ok, Jae-Woo;Kim, Jin-Woo;Lee, Jae-Min
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.28 no.7 / pp.580-583 / 2017
  • To compensate for the degradation of SAR images caused by the residual errors and spatially variant errors that remain after motion compensation in airborne SAR, we introduce an autofocus method based on partition processing. After performing spatial partitioning for spotlight SAR data and time partitioning for stripmap SAR data, we reconstruct subpatch images from the partitioned data. Then, we perform local autofocus with a suitability analysis of the phase errors estimated by the autofocus. Moreover, if the estimated phase errors are not properly compensated for in the subpatch images, we apply a phase compensation method that weights the estimated phase errors close to the degraded subpatch image to increase SAR image quality.

Fuzzy Partitioning with Fuzzy Equalization Given Two Points and Partition Cardinality (두 점과 분할 카디날리티가 주어진 퍼지 균등화조건을 갖는 퍼지분할)

  • Kim, Kyeong-Taek;Kim, Chong-Su;Kang, Sung-Yeol
    • Journal of Korean Society of Industrial and Systems Engineering / v.31 no.4 / pp.140-145 / 2008
  • A fuzzy partition is a conceptual vehicle that encapsulates data into information granules. Fuzzy equalization concerns a process of building information granules that are semantically and experimentally meaningful. A few algorithms that generate fuzzy partitions with fuzzy equalization have been suggested. Simulations and experiments have shown that a fuzzy partition representing more characteristics of the given input distribution usually produces meaningful results. In this paper, given two points and the cardinality of a fuzzy partition, we prove that there does not always exist a fuzzy partition with fuzzy equalization in which two of the peak points fall on the given two points. We then establish an algorithm that minimizes the maximum distance between the given two points and the adjacent peak points in the partition. A numerical example is presented to show the validity of the suggested algorithm.

Bayesian analysis of random partition models with Laplace distribution

  • Kyung, Minjung
    • Communications for Statistical Applications and Methods / v.24 no.5 / pp.457-480 / 2017
  • We develop a random partition procedure based on a Dirichlet process prior with Laplace distribution. Gibbs sampling of a Laplace mixture of linear mixed regressions with a Dirichlet process is implemented as a random partition model when the number of clusters is unknown. Unlike its counterparts, our approach provides simultaneous partitioning and parameter estimation together with the computation of classification probabilities. A full Gibbs-sampling algorithm is developed for efficient Markov chain Monte Carlo posterior computation. The proposed method is illustrated with simulated data and one real data set, the energy efficiency data of Tsanas and Xifara (Energy and Buildings, 49, 560-567, 2012).
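
A Dirichlet process prior induces a distribution over partitions that is commonly described as the Chinese restaurant process. The sketch below samples one such partition from the prior only. It is not the paper's Gibbs sampler and involves no Laplace likelihood; the function name `crp_partition`, the concentration parameter `alpha`, and the `seed` argument are illustrative assumptions.

```python
import random


def crp_partition(n, alpha, seed=0):
    """Sample a partition of items 0..n-1 from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (assumed name); larger values
    tend to produce more clusters.
    """
    rng = random.Random(seed)
    clusters = []
    for item in range(n):
        # Join an existing cluster with weight equal to its size,
        # or open a new cluster with weight alpha.
        weights = [len(c) for c in clusters] + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(clusters):
            clusters.append([item])
        else:
            clusters[choice].append(item)
    return clusters


partition = crp_partition(n=20, alpha=1.0, seed=42)
print(len(partition), "clusters:", partition)
```

The number of clusters is not fixed in advance, which is the property the abstract relies on when the number of clusters is unknown.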

Mining Quantitative Association Rules using Commercial Data Mining Tools (상용 데이타 마이닝 도구를 사용한 정량적 연관규칙 마이닝)

  • Kang, Gong-Mi;Moon, Yang-Sae;Choi, Hun-Young;Kim, Jin-Ho
    • Journal of KIISE:Databases / v.35 no.2 / pp.97-111 / 2008
  • Commercial data mining tools basically support only binary attributes in mining association rules; that is, they can mine binary association rules only. In general, however, transaction databases contain not only binary attributes but also quantitative attributes. Thus, in this paper we propose a systematic approach to mining quantitative association rules (association rules which contain quantitative attributes) using commercial mining tools. To achieve this goal, we first propose an overall working framework that mines quantitative association rules based on commercial mining tools. The proposed framework consists of two steps: 1) a pre-processing step which converts quantitative attributes into binary attributes and 2) a post-processing step which reconverts binary association rules into quantitative association rules. For the pre-processing step, we present the concept of domain partition, and based on it we formally redefine the previous bipartition techniques (mean-based and median-based) and multi-partition techniques (equi-width and equi-depth). These previous partition techniques, however, do not consider the distribution characteristics of attribute values. To solve this problem, we propose an intuitive partition technique named standard deviation minimization. In standard deviation minimization, adjacent attribute values are included in the same partition if the change in their standard deviation is small, but they are divided into different partitions if the change is large. We also propose a post-processing step that integrates binary association rules and reconverts them into the corresponding quantitative rules. Through extensive experiments, we show that our framework works correctly and that standard deviation minimization is superior to the other partition techniques. According to these results, we believe that our framework is practically applicable for naive users to mine quantitative association rules using commercial data mining tools.
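
The partition techniques named in this abstract can be illustrated with a minimal sketch. `equi_width` and `equi_depth` follow the standard definitions. `std_change_partition` is only a greedy reading of the one-sentence description of standard deviation minimization, with a hypothetical `threshold` parameter; the paper's actual procedure may differ.

```python
from statistics import pstdev


def equi_width(values, k):
    """Return the k-1 cut points that split the value range into k equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def equi_depth(values, k):
    """Split the sorted values into k groups holding nearly equal numbers of items."""
    ordered = sorted(values)
    n = len(ordered)
    bounds = [round(n * i / k) for i in range(k + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]


def std_change_partition(values, threshold):
    """Greedy sketch (assumption): start a new partition when adding the next
    sorted value would raise the group's standard deviation by more than
    `threshold`; otherwise keep the value in the current partition."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for v in ordered[1:]:
        current = groups[-1]
        change = pstdev(current + [v]) - pstdev(current)
        if change > threshold:
            groups.append([v])
        else:
            current.append(v)
    return groups


ages = [20, 21, 22, 40, 41, 42, 70, 71]
print(equi_width(ages, 3))
print(equi_depth(ages, 4))
print(std_change_partition(ages, threshold=2.0))
```

On this toy data, equi-depth places 22 and 40 in the same group because it only counts items, while the deviation-based grouping breaks at the large gaps in the values. This is the kind of distribution sensitivity the abstract argues for.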

Distortion Measurement based Dynamic Packet Scheduling of Video Stream over IEEE 802.11e WLANs

  • Wu, Minghu;Chen, Rui;Zhou, Shangli;Zhu, Xiuchang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.11 / pp.2793-2803 / 2013
  • In H.264, three different data partition types are used, which have unequal importance to the reconstructed video quality. To improve the performance of H.264 video streaming over IEEE 802.11e Wireless Local Area Networks, we propose a prioritization mechanism that assigns the different partition types to different priority classes according to the distortion calculated within one Group of Pictures. In the proposed scheme, video streams are encoded with the H.264 codec with data partitioning enabled. A dynamic scheduling scheme based on Enhanced Distributed Channel Access is configured to differentiate the data partitions according to their distortion impact and the queue utilization ratio. Simulation results show that the proposed scheme improves the received video quality by 1 dB in PSNR compared with the existing Enhanced Distributed Channel Access static mapping scheme.

Designing a Distribution Network for Faster Delivery of Online Retailing : A Case Study in Bangkok, Thailand

  • Amchang, Chompoonut;Song, Sang-Hwa
    • The Journal of Industrial Distribution & Business / v.9 no.5 / pp.25-35 / 2018
  • Purpose - The purpose of this paper is to partition a last-mile delivery network into zones and to determine the locations of last-mile delivery centers (LMDCs) in Bangkok, Thailand. Research design, data, and methodology - As online shopping has become popular, parcel companies need to make their delivery services as fast as possible. Network partitioning with the METIS algorithm is applied to identify suitable service areas, and a facility location problem is used to place LMDCs in each partitioned area. Clustering and mixed integer programming algorithms are applied to partition the network and to locate facilities in the network. Results - Network partitioning improves last-mile delivery service. The METIS algorithm divided the area into 25 partitions by minimizing the inter-network links. To serve short-haul deliveries, this paper located 96 LMDCs in compact partitions to satisfy customer demand. Conclusions - The computational results from the case study showed that the proposed two-phase algorithm with network partitioning and facility location can efficiently design a last-mile delivery network. It improves parcel delivery services when sending parcels to customers and reduces the overall delivery time. It is expected that the proposed two-phase approach can help parcel delivery companies minimize investment while providing faster delivery services.

A Network Partition Approach for MFD-Based Urban Transportation Network Model

  • Xu, Haitao;Zhang, Weiguo;Zhuo, Zuozhang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.11 / pp.4483-4501 / 2020
  • Recent findings indicate that the scatter and shape of the MFD (macroscopic fundamental diagram) are heavily influenced by the spatial distribution of link density in a road network. This implies that the concept of the MFD can be used to divide a heterogeneous road network with different degrees of congestion into multiple homogeneous subnetworks. Considering that actual traffic data is usually incomplete and inaccurate, while most traffic partition algorithms rely on the completeness of the data, we propose in this paper a three-step partitioning algorithm called Iso-MB (Isoperimetric algorithm - Merging - Boundary adjustment) that tolerates incomplete input data. The proposed algorithm was implemented and verified in a simulated urban transportation network. The existence of a well-defined MFD in each subnetwork is shown and discussed, and the selection of the stop parameter in the isoperimetric algorithm is explained. The effectiveness of the approach with missing input data is also demonstrated.

A Communication and Computation Overlapping Model through Loop Sub-partitioning and Dynamic Scheduling in Data Parallel Programs (데이타 병렬 프로그램에서 루프 세부 분할 및 동적 스케쥴링을 통한 통신과 계산의 중첩 모델)

  • Kim, Jung-Hwan;Han, Sang-Yong;Cho, Seung-Ho;Kim, Heung-Hwan
    • Journal of KIISE:Computer Systems and Theory / v.27 no.1 / pp.23-33 / 2000
  • We propose a model which overlaps communication with computation for efficient communication in the data-parallel programming paradigm. The overlapping model divides a given loop partition into several sub-partitions to obtain computation which can be overlapped with communication. A loop partition sometimes refers to other data partitions, but not all iterations in the loop partition require non-local data. So, a loop partition may be divided into a set of loop iterations which require non-local data and a set of loop iterations which do not. Each loop sub-partition is dynamically scheduled depending on the arrival of its associated messages. Experimental results for a few benchmarks on the IBM SP2 show enhanced performance with our overlapping model.
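
The division of a loop partition into iterations that need non-local data and iterations that do not can be sketched as follows. The example assumes a one-dimensional stencil loop that reads `a[i - halo]` through `a[i + halo]` over a block-distributed array. The function name `split_loop_partition` and its parameters are hypothetical, and the paper's dynamic message-driven scheduling is not modeled.

```python
def split_loop_partition(lo, hi, local_lo, local_hi, halo=1):
    """Split iterations [lo, hi) of a stencil loop into two sub-partitions.

    The processor owns array elements [local_lo, local_hi). An iteration i
    reads elements i - halo .. i + halo, so it is local-only when that whole
    window lies inside the owned range.
    """
    local_only, needs_remote = [], []
    for i in range(lo, hi):
        if i - halo >= local_lo and i + halo < local_hi:
            local_only.append(i)
        else:
            needs_remote.append(i)
    return local_only, needs_remote


# A processor that owns elements 100..199 and iterates over the same range.
local_only, needs_remote = split_loop_partition(100, 200, 100, 200)
print(len(local_only), "local iterations; remote-dependent:", needs_remote)
```

The local-only iterations can run while boundary messages are still in flight, and the remote-dependent iterations run once their data has arrived, which is the overlap the abstract describes.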


A Development Study of The VPT for the improvement of Hadoop performance (하둡 성능 향상을 위한 VPT 개발 연구)

  • Yang, Ill Deung;Kim, Seong Ryeol
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.9 / pp.2029-2036 / 2015
  • Hadoop MR (MapReduce) uses a partition function to pass the outputs of mappers to reducers. The partition function determines the target reducer by calculating a hash value from the key and taking it modulo the number of reducers. The legacy partition function does not divide the job effectively because it is very sensitive to the key distribution. If the job is not divided effectively, the total processing time of the job can suffer because some reducers need more time to process their share. This paper proposes the VPT (Virtual Partition Table) and tests it on skewed data. Applying the VPT reduced processing time by three seconds on average, and we expect a larger improvement as the data volume increases.
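
The legacy partition function described above, hash of the key modulo the number of reducers, and its sensitivity to key distribution can be illustrated with a small sketch. This is plain Python, not Hadoop code: `zlib.crc32` is a deterministic stand-in for the key's hash code, and the helper names are assumptions. The VPT itself is not sketched because the abstract does not describe its design.

```python
import zlib
from collections import Counter


def hash_partition(key: str, num_reducers: int) -> int:
    """Choose a reducer as hash(key) mod num_reducers.

    crc32 stands in for the key's hash code (assumption for illustration).
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_load(keys, num_reducers):
    """Count how many records each reducer would receive."""
    load = Counter(hash_partition(k, num_reducers) for k in keys)
    return [load.get(r, 0) for r in range(num_reducers)]


# Skewed input: one hot key accounts for 90 of 100 records.
keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]
print(reducer_load(keys, num_reducers=4))
```

Because every record with the same key goes to the same reducer, the reducer that receives the hot key gets at least 90 of the 100 records no matter how many reducers are configured.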