• Title/Summary/Keyword: histogram data


Hadoop Based Wavelet Histogram for Big Data in Cloud

  • Kim, Jeong-Joon
    • Journal of Information Processing Systems
    • /
    • v.13 no.4
    • /
    • pp.668-676
    • /
    • 2017
  • Recently, the importance of big data has been emphasized with the development of smartphones and web/SNS services. As a result, MapReduce, which can process big data efficiently, is receiving worldwide attention for its excellent scalability and stability. Because big data is large in volume, generated quickly, and varied in type, it is often more efficient to process summary information than the data itself. The wavelet histogram, a representative data-summarization technique, can generate optimal summary information without losing the information in the original data. Systems that apply MapReduce-based wavelet histogram generation have therefore been actively studied. However, existing approaches are slow because they build the wavelet histogram through multiple MapReduce jobs, and the error of the data restored from the wavelet histogram can become large. In contrast, the MapReduce-based wavelet histogram generation system developed in this paper builds the histogram in a single MapReduce job, greatly increasing generation speed. In addition, because the histogram is generated subject to a user-specified error bound, the error of the data restored from the histogram can be controlled. Finally, we verified the efficiency of the system developed in this paper through a performance evaluation.
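The core of any wavelet histogram is the Haar transform of the bin counts; the paper's single-job MapReduce system distributes this computation, but the underlying transform can be sketched in a few lines (a plain-Python illustration, not the paper's implementation):

```python
def haar_wavelet(counts):
    """Full Haar decomposition of a histogram whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    Dropping small detail coefficients yields the lossy summary.
    """
    assert len(counts) & (len(counts) - 1) == 0, "length must be a power of two"
    data = [float(c) for c in counts]
    details = []
    while len(data) > 1:
        averages = [(a + b) / 2 for a, b in zip(data[0::2], data[1::2])]
        # Detail coefficients record the halved differences needed to invert.
        details = [(a - b) / 2 for a, b in zip(data[0::2], data[1::2])] + details
        data = averages
    return data + details

def reconstruct(coeffs):
    """Invert the Haar transform (exact when no coefficients are dropped)."""
    data = coeffs[:1]
    i = 1
    while i < len(coeffs):
        dets = coeffs[i:i + len(data)]
        data = [v for a, d in zip(data, dets) for v in (a + d, a - d)]
        i += len(dets)
    return data
```

For example, the counts `[2, 4, 6, 8]` decompose into `[5.0, -2.0, -1.0, -1.0]`, and reconstruction recovers the original counts exactly.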

Piecewise Continuous Linear Density Estimator

  • Jang, Dae-Heung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.16 no.4
    • /
    • pp.959-968
    • /
    • 2005
  • The piecewise linear histogram can be used as a simple and efficient density estimator, but it is a discontinuous function. We propose the piecewise continuous linear histogram as a simple and efficient density estimator and as an alternative to the piecewise linear histogram.
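As a rough illustration of the idea (not the author's estimator), a piecewise continuous linear density can be obtained by linearly interpolating between histogram bin midpoints, as in the classical frequency polygon:

```python
def frequency_polygon(data, bins, low, high):
    """Piecewise continuous linear density estimate (frequency polygon).

    Builds an ordinary histogram, then linearly interpolates between bin
    midpoints so the estimate is continuous, unlike the step histogram.
    Beyond the outermost midpoints the estimate is held flat, so this
    sketch only integrates to 1 approximately near the boundaries.
    """
    width = (high - low) / bins
    counts = [0] * bins
    for x in data:
        i = min(int((x - low) / width), bins - 1)
        counts[i] += 1
    n = len(data)
    heights = [c / (n * width) for c in counts]        # histogram densities
    mids = [low + (i + 0.5) * width for i in range(bins)]

    def density(x):
        if x <= mids[0]:
            return heights[0]
        if x >= mids[-1]:
            return heights[-1]
        for i in range(bins - 1):
            if mids[i] <= x <= mids[i + 1]:
                t = (x - mids[i]) / width
                return (1 - t) * heights[i] + t * heights[i + 1]
    return density
```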


Reversible Data Hiding Scheme Based on Maximum Histogram Gap of Image Blocks

  • Arabzadeh, Mohammad;Rahimi, Mohammad Reza
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.8
    • /
    • pp.1964-1981
    • /
    • 2012
  • In this paper, a reversible data hiding scheme based on histogram shifting of host image blocks is presented. The method attempts to use the full available capacity for data embedding by dividing the image into non-overlapping blocks. Applying histogram shifting to each block requires extra information to be saved as overhead data for each block. This extra information (overhead or bookkeeping information) is used to extract the payload and to recover the block to its original state. A method to eliminate the need for this extra information is also introduced: it uses the maximum gap between histogram bins to find the pixel values that were used for embedding on the sender side. Experimental results show that the proposed method provides higher embedding capacity than the original reversible data hiding scheme based on histogram shifting and its improved versions in the current literature, while maintaining the quality of the marked image at an acceptable level.
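The histogram-shifting baseline that the paper improves on can be sketched as follows; `peak` and `zero` denote a frequent and an empty gray level, and the overhead the authors eliminate is precisely the need to transmit such side information (a simplified textbook sketch, not the proposed maximum-gap method):

```python
def hs_embed(pixels, bits, peak, zero):
    """Classic histogram-shifting embedding on grayscale values.

    Assumes peak < zero and that the `zero` gray level has no occurrences.
    Pixels strictly between peak and zero shift right by 1 to open a gap;
    each pixel equal to `peak` then carries one payload bit.
    """
    out, it = [], iter(bits)
    for p in pixels:
        if peak < p < zero:
            out.append(p + 1)          # shift to make room next to the peak
        elif p == peak:
            b = next(it, 0)            # embed a bit (0 if payload exhausted)
            out.append(p + b)
        else:
            out.append(p)
    return out

def hs_extract(marked, peak, zero):
    """Recover the payload bits and restore the original pixels exactly."""
    bits, restored = [], []
    for p in marked:
        if p == peak:
            bits.append(0); restored.append(peak)
        elif p == peak + 1:
            bits.append(1); restored.append(peak)
        elif peak + 1 < p <= zero:
            restored.append(p - 1)     # undo the shift
        else:
            restored.append(p)
    return bits, restored
```

A round trip with `pixels = [3, 3, 4, 5]`, `peak = 3`, `zero = 6` embeds and extracts the bits `[1, 0]` and restores the pixels exactly.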

A Novel Filtered Bi-Histogram Equalization Method

  • Sengee, Nyamlkhagva;Choi, Heung-Kook
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.6
    • /
    • pp.691-700
    • /
    • 2015
  • Here, we present a new framework for histogram equalization in which both local and global contrast are enhanced using neighborhood metrics. While neighborhood information is being checked, filters can simultaneously improve image quality; the filters are chosen according to image properties, for example for noise removal or smoothing. Our experimental results confirmed that this does not increase the computational cost, because the filtering is performed by our proposed arrangement of building the histogram while checking neighborhood metrics at the same time. If the two methods, histogram equalization and filtering, are performed sequentially, the first uses the original image data and the second uses data altered by the first; with combined histogram equalization and filtering, the original data can be used for both. The proposed method is fully automated, and any spatial neighborhood filter type and size can be used. Our experiments confirmed that the proposed method is more effective than similar techniques reported previously.
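For reference, the plain global histogram equalization that such methods extend maps each gray level through the scaled cumulative histogram (a textbook sketch, independent of the proposed filtered bi-histogram framework):

```python
def equalize(img, levels=256):
    """Global histogram equalization on a flat list of gray values.

    Builds the histogram, forms the cumulative distribution, and maps
    each gray level to the full output range via the scaled CDF.
    """
    hist = [0] * levels
    for v in img:
        hist[v] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)  # first nonzero CDF value
    n = len(img)
    # Standard mapping: scale the CDF to [0, levels-1]; constant images map to 0.
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if n > cdf_min else 0
           for c in cdf]
    return [lut[v] for v in img]
```

For example, `equalize([0, 0, 1, 1, 2, 2, 3, 3])` stretches the four crowded low levels to `[0, 0, 85, 85, 170, 170, 255, 255]`.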

Fuzzy histogram in estimating loss distributions for operational risk (운영 위험 관련 손실 분포 - 퍼지 히스토그램의 효과)

  • Pak, Ro-Jin
    • Journal of the Korean Data and Information Science Society
    • /
    • v.20 no.4
    • /
    • pp.705-712
    • /
    • 2009
  • The histogram is the oldest and most widely used density estimator for the presentation and exploration of observed univariate data. The shape of a histogram depends heavily on the number of bins and the bin width, so slight changes to the bins can produce a totally different histogram. To address this problem, the fuzzy histogram was introduced, with good results (Loquin and Strauss, 2008). Histograms have been widely used, in particular, for estimating the loss distribution related to operational risk. In this article, we use a fuzzy histogram instead of an ordinary histogram for estimating the loss distribution and show that the fuzzy histogram provides more stable results.
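A fuzzy histogram in the spirit of Loquin and Strauss replaces crisp bin counts with membership-weighted counts, so an observation near a bin edge contributes to both neighboring bins instead of falling entirely into one (a minimal sketch with triangular memberships; the bin centers and width are illustrative choices):

```python
def fuzzy_histogram(data, centers, width):
    """Fuzzy histogram accumulation with triangular membership functions.

    Each observation contributes a partial count to every bin whose center
    lies within `width` of it, which softens the sensitivity of an
    ordinary histogram to small shifts of the bin edges.
    """
    acc = [0.0] * len(centers)
    for x in data:
        for i, c in enumerate(centers):
            m = max(0.0, 1.0 - abs(x - c) / width)  # triangular membership
            acc[i] += m
    return acc
```

A point exactly between two bin centers splits its count evenly: `fuzzy_histogram([0.5], [0, 1], 1.0)` gives `[0.5, 0.5]`.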


Extensions of Histogram Construction Algorithms for Interval Data (구간 데이타에 대한 히스토그램 구축 알고리즘의 확장)

  • Lee, Ho-Seok;Shim, Kyu-Seok;Yi, Byoung-Kee
    • Journal of KIISE:Databases
    • /
    • v.34 no.4
    • /
    • pp.369-377
    • /
    • 2007
  • The histogram is one of the tools that efficiently summarize data, and it is widely used for selectivity estimation and approximate query answering. Existing histogram construction algorithms are applicable to point data represented by sets of values. However, interval data, such as daily temperatures and daily stock prices, are encountered as often as point data. In this paper, we therefore propose histogram construction algorithms for interval data by extending several methods used in existing histogram construction algorithms. Our experimental results on synthetic data show that our algorithms outperform naive extensions of the existing algorithms.
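One natural way to extend a histogram to interval data, suggested by the problem setting (though not necessarily the authors' algorithm), is to spread each interval's unit mass over the bins it overlaps, in proportion to the overlap length:

```python
def interval_histogram(intervals, edges):
    """Histogram for interval data given bin edges.

    Each (lo, hi) interval contributes a total count of 1, split across
    the bins it overlaps in proportion to the overlap length; degenerate
    intervals (lo == hi) count fully in their containing bin.
    """
    counts = [0.0] * (len(edges) - 1)
    for lo, hi in intervals:
        length = hi - lo
        for i in range(len(edges) - 1):
            overlap = max(0.0, min(hi, edges[i + 1]) - max(lo, edges[i]))
            if length > 0:
                counts[i] += overlap / length
            elif edges[i] <= lo < edges[i + 1]:
                counts[i] += 1.0
    return counts
```

An interval spanning two equal bins splits evenly: `interval_histogram([(0.0, 2.0)], [0.0, 1.0, 2.0])` gives `[0.5, 0.5]`.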

Double monothetic clustering for histogram-valued data

  • Kim, Jaejik;Billard, L.
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.3
    • /
    • pp.263-274
    • /
    • 2018
  • One of the common issues in large dataset analyses is detecting and constructing homogeneous groups of objects in those datasets, typically by some form of clustering technique. In this study, we present a divisive hierarchical clustering method based on two monothetic characteristics of histogram data. Unlike a classical data point, a histogram carries internal variation as well as location information. However, to find the optimal bipartition, existing divisive monothetic clustering methods for histogram data consider only location information as a monothetic characteristic, so they cannot distinguish histograms with the same location but different internal variations. Thus, a divisive clustering method considering both the location and the internal variation of histograms is proposed in this study. The method has the advantage of interpretable clustering outcomes, since it provides a binary question for each split. The proposed clustering method is verified through a simulation study and applied to a large U.S. house property value dataset.

Application of Zero-Inflated Poisson Distribution to Utilize Government Quality Assurance Activity Data (정부 품질보증활동 데이터 활용을 위한 Zero-Inflated 포아송 분포 적용)

  • Kim, JH;Lee, CW
    • Journal of Korean Society for Quality Management
    • /
    • v.46 no.3
    • /
    • pp.509-522
    • /
    • 2018
  • Purpose: The purpose of this study was to propose a more accurate mathematical model to represent the results of government quality assurance activity, especially corrective actions and flaws. Methods: The data collected during government quality assurance activity were represented as histograms. To find out which distribution (Poisson or Zero-Inflated Poisson) better represents each histogram, this study applied Pearson's correlation coefficient. Results: The histogram of corrective actions over the past 3 years and the Zero-Inflated Poisson distribution had a strong relationship, with correlation coefficients over 0.94. The flaw data could not be fitted with a Zero-Inflated Poisson distribution because the frequency of flaw occurrence was too small; however, the histogram of flaw data over the past 3 years and the Poisson distribution showed a strong relationship, with a correlation coefficient of 0.99. Conclusion: The Zero-Inflated Poisson distribution represented the corrective action histogram better than the Poisson distribution. In the case of the flaw data histogram, however, the Poisson distribution was more accurate than the Zero-Inflated Poisson distribution.
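The zero-inflated Poisson model used here mixes a point mass at zero with an ordinary Poisson distribution; its probability mass function is easy to state directly (a standard definition, not code from the paper):

```python
from math import exp, factorial

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson pmf.

    With probability `pi` the count is a structural zero; otherwise it is
    an ordinary Poisson(lam) draw. Structural zeros inflate P(K = 0),
    which is why ZIP fits count data with excess zeros better than Poisson.
    """
    poisson = exp(-lam) * lam ** k / factorial(k)
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson
```

With `pi = 0` the pmf reduces to the plain Poisson; with `pi = 1` all mass sits at zero.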

Histogram Equalization Using Background Speakers' Utterances for Speaker Identification (화자 식별에서의 배경화자데이터를 이용한 히스토그램 등화 기법)

  • Kim, Myung-Jae;Yang, Il-Ho;So, Byung-Min;Kim, Min-Seok;Yu, Ha-Jin
    • Phonetics and Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.79-86
    • /
    • 2012
  • In this paper, we propose a novel approach to improving histogram equalization for speaker identification. Our method collects all speech features of the UBM training data to form a reference distribution. The rank of each feature vector is computed in the sorted collection of the UBM training data and the test data, and these ranks are used to perform order-based histogram equalization. The proposed method improves the accuracy of the speaker recognition system on short utterances. We evaluate the proposed system on four speech databases and compare it with cepstral mean normalization (CMN), mean and variance normalization (MVN), and histogram equalization (HEQ). Our system reduced the relative error rate by 33.3% compared with the baseline system.
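The order-based equalization described above can be sketched roughly as follows: each test feature's rank in the pooled (reference plus test) sorted list picks out the reference value at the matching quantile. This is a simplified one-dimensional sketch of the idea, not the authors' system:

```python
def rank_heq(test_feats, reference):
    """Order-based histogram equalization sketch (one feature dimension).

    Sorts the pooled reference and test values, converts each test value's
    rank into a quantile, and maps it to the reference value at that
    quantile, so the equalized features follow the reference distribution.
    """
    ref = sorted(reference)
    pool = sorted(ref + test_feats)
    out = []
    for x in test_feats:
        r = pool.index(x) / (len(pool) - 1)          # rank as a quantile
        out.append(ref[round(r * (len(ref) - 1))])   # matching reference value
    return out
```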

Histogram-based Reversible Data Hiding Based on Pixel Differences with Prediction and Sorting

  • Chang, Ya-Fen;Tai, Wei-Liang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.12
    • /
    • pp.3100-3116
    • /
    • 2012
  • Reversible data hiding enables the embedding of messages in a host image without any loss of host content, which makes it suitable for image authentication: if the watermarked image is deemed authentic, it can be reverted to an exact copy of the original image from before the embedding occurred. In this paper, we present an improved histogram-based reversible data hiding scheme based on prediction and sorting. A rhombus prediction is employed to generate the prediction errors used for histogram-based embedding, and sorting the predictions has a good influence on the embedding capacity. Characteristics of the pixel differences are used to achieve a large hiding capacity while keeping distortion low. The proposed scheme exploits a two-stage embedding strategy to solve the problem of communicating the peak points. We also present a histogram shifting technique to prevent overflow and underflow. Performance comparisons with other existing reversible data hiding schemes are provided to demonstrate the superiority of the proposed scheme.
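The rhombus prediction mentioned above estimates each pixel from its four cross-shaped neighbors; the prediction errors then form the sharply peaked histogram that is shifted for embedding (a minimal sketch of the predictor only, not the full embedding scheme):

```python
def rhombus_predict(img, r, c):
    """Rhombus predictor for an interior pixel of a 2-D grayscale image.

    Averages the four cross neighbors (up, down, left, right); the
    prediction error img[r][c] - prediction is what gets embedded into.
    """
    n = [img[r - 1][c], img[r + 1][c], img[r][c - 1], img[r][c + 1]]
    return round(sum(n) / 4)
```

On smooth image regions the prediction error concentrates near zero, which is exactly what makes the error histogram a good embedding target.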