• Title/Summary/Keyword: Skewed Data

On the Estimation of Parameters in ALT under Generalized Exponential Distribution

  • Yoon, Sang-Chul
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.923-931 / 2005
  • The two-parameter generalized exponential distribution was recently introduced by Gupta and Kundu (1999). It has been observed that the generalized exponential distribution can be used quite effectively to analyze skewed data sets. This paper develops an accelerated life test (ALT) model using the generalized exponential distribution and considers maximum likelihood estimation of its parameters under the tampered random variable model. Simulations are performed to show the performance of the proposed maximum likelihood estimates, and an example using a real data set is given.
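As a minimal sketch of the estimation problem, assuming complete (uncensored) data with no acceleration or tampering, the code below fits the two-parameter generalized exponential distribution GE(alpha, lambda), with CDF F(x) = (1 - e^(-lambda x))^alpha, by maximum likelihood. For a fixed lambda the MLE of alpha has the closed form alpha_hat(lambda) = -n / sum(log(1 - e^(-lambda x_i))), so only a one-dimensional profile likelihood in lambda has to be maximized. The function names, the golden-section search, and the search bracket are illustrative choices, not the paper's ALT procedure, and the search assumes the profile likelihood is unimodal over the bracket.

```python
import math
import random


def sample_ge(alpha, lam, n, seed=0):
    """Draw n values from GE(alpha, lam) by inverting F(x) = (1 - exp(-lam*x))**alpha."""
    rng = random.Random(seed)
    xs = []
    while len(xs) < n:
        v = rng.random() ** (1.0 / alpha)
        if 0.0 < v < 1.0:  # skip endpoints where x would be 0 or infinite
            xs.append(-math.log1p(-v) / lam)
    return xs


def profile_loglik(lam, xs):
    """GE log-likelihood with alpha replaced by its closed-form MLE for this lam."""
    n = len(xs)
    s = sum(math.log(-math.expm1(-lam * x)) for x in xs)  # sum of log(1 - e^{-lam*x})
    if s >= 0.0:
        return -math.inf
    alpha = -n / s
    return n * math.log(alpha) + n * math.log(lam) + (alpha - 1.0) * s - lam * sum(xs)


def fit_ge(xs, iters=80):
    """Golden-section search over log(lam); assumes a unimodal profile in the bracket."""
    mean = sum(xs) / len(xs)
    a, b = math.log(0.01 / mean), math.log(100.0 / mean)  # hypothetical bracket
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile_loglik(math.exp(c), xs), profile_loglik(math.exp(d), xs)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_loglik(math.exp(c), xs)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_loglik(math.exp(d), xs)
    lam = math.exp((a + b) / 2.0)
    s = sum(math.log(-math.expm1(-lam * x)) for x in xs)
    return -len(xs) / s, lam  # (alpha_hat, lam_hat)


if __name__ == "__main__":
    data = sample_ge(alpha=2.0, lam=1.0, n=2000, seed=1)
    alpha_hat, lam_hat = fit_ge(data)
    print(f"alpha_hat={alpha_hat:.3f}, lam_hat={lam_hat:.3f}")
```

Using `expm1` and `log1p` keeps log(1 - e^(-lambda x)) accurate when lambda x is very small.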

J-Tree: An Efficient Index using User Searching Patterns for Large Scale Data (J-tree : 사용자의 검색패턴을 이용한 대용량 데이타를 위한 효율적인 색인)

  • Jang, Su-Min;Seo, Kwang-Seok;Yoo, Jae-Soo
    • Journal of KIISE:Databases / v.36 no.1 / pp.44-49 / 2009
  • In recent years, with the development of portable terminals, various search services over large data sets have been provided on portable terminals. To search large data sets, most information retrieval applications use indexes such as B-trees or R-trees. However, users access only a small portion of the data set, and the access frequencies of individual data items are not uniform. Existing indexes such as B-trees and R-trees do not take these skewed access patterns into account. A cache can keep frequently accessed data in memory for fast access, but the memory available to the cache is limited. In this paper, we propose a new disk-based index, called the J-tree, which considers users' search patterns. The proposed index is a balanced tree that guarantees uniform search time for all data while also supporting fast search on frequently accessed data. Our experiments show the effectiveness of the proposed index under various settings.

A Physical Design Method of Storage Structures for MOLAP Systems of Data Warehouse (데이터 웨어하우스의 다차원 온라인 분석처리 시스템을 위한 저장구조의 물리적 설계기법)

  • Lee Jong-Hak
    • Journal of Korea Multimedia Society / v.8 no.3 / pp.297-312 / 2005
  • Aggregation is an operation that plays a key role in multidimensional OLAP (MOLAP) systems of a data warehouse. Existing aggregation operations in MOLAP have been proposed for file structures such as multidimensional arrays, and these file structures do not work well with skewed distributions. This paper presents a physical design methodology for storage structures in MOLAP that use multidimensional file organizations adapting to a skewed distribution. For a uniform data distribution, we first show that the performance of multidimensional analytical processing is highly affected by the similarity of the shapes of query regions and page regions in the domain space of the multidimensional file organization. Then, for skewed distributions, we reflect the effect of the data distribution in the design by using the shapes of normalized query regions weighted by the data density of those regions. Finally, we demonstrate that the theoretically derived physical design methodology is indeed correct in real environments. For two-dimensional file organizations, the experimental results indicate that the proposed method improves performance by more than seven times over the conventional method, and we expect greater improvements when the dimensionality exceeds two. The results confirm that the proposed physical design methodology is useful in practice.
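The claim that page shape should match query shape can be made concrete with a simplified two-dimensional model. Assume (these assumptions are ours, not the paper's) uniform data, pages that tile the domain as a regular grid of p_1 × p_2 rectangles with fixed area P = p_1 p_2, and q_1 × q_2 query rectangles placed uniformly at random, ignoring boundary effects. A query then crosses on average q_i/p_i grid lines along axis i, so, using the AM-GM inequality q_1 p_2 + q_2 p_1 ≥ 2 sqrt(q_1 q_2 p_1 p_2), the expected number of pages it touches satisfies:

```latex
E[N] = \left(\frac{q_1}{p_1} + 1\right)\left(\frac{q_2}{p_2} + 1\right)
     = \frac{q_1 q_2 + P + q_1 p_2 + q_2 p_1}{P}
     \ge \frac{q_1 q_2 + P + 2\sqrt{q_1 q_2 P}}{P}
     = \frac{\left(\sqrt{q_1 q_2} + \sqrt{P}\right)^2}{P}
```

Equality holds exactly when q_1 p_2 = q_2 p_1, i.e. p_1/p_2 = q_1/q_2, so page regions with the same shape as the query regions minimize expected page accesses in this model. The density-weighted normalization the paper uses for skewed data is not reproduced here.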

Effective Parallel Hash Join Algorithm Based on Histogram Equalization in the Presence of Data Skew (데이터 편재 하에서 히스토그램 변환기법에 기초한 효율적인 병렬 해쉬 결합 알고리즘)

  • Park, Ung-Gyu;Choe, Hwang-Gyu;Kim, Tak-Gon
    • The Transactions of the Korea Information Processing Society / v.4 no.2 / pp.338-348 / 1997
  • In this paper, we first propose a data distribution framework to resolve load imbalance and bucket overflow in parallel hash joins. Using the histogram equalization technique, the framework transforms a histogram of skewed data into the desired uniform distribution that corresponds to the relative computing power of the node processors in the system. Next, we propose an efficient parallel hash join algorithm for handling skewed data based on the proposed data distribution methodology. To compare the performance of our algorithm with other hash join algorithms, we performed simulation experiments and actual executions on the COREDB database computer with an 8-node hypercube architecture. In these experiments, the skewed data distribution of the join attribute is modeled using a Zipf-like distribution. The performance studies indicate that our algorithm outperforms the other algorithms in the skewed cases.
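The abstract does not give the framework's details, so the following is only a sketch of the general idea under our own assumptions: each distinct join-attribute value is treated as a histogram bin, bin counts are generated from a Zipf-like distribution (as in the paper's experiments), and bins are mapped to nodes through the cumulative distribution, the way histogram equalization maps intensities, so that each node's share of tuples follows its relative computing power. The node powers and function names are hypothetical. Bins are never split, so a node's load can deviate from its target by at most the largest bin count; handling a single value heavier than a node's share would need splitting or replication, which this sketch omits.

```python
def zipf_counts(num_values, total, theta=1.0):
    """Zipf-like frequencies: value i (1-based) gets weight 1 / i**theta."""
    weights = [1.0 / (i ** theta) for i in range(1, num_values + 1)]
    z = sum(weights)
    return [round(total * w / z) for w in weights]


def equalize(counts, powers):
    """Assign each bin to a node so the cumulative tuple count tracks the
    cumulative share of computing power (a CDF mapping, as in histogram equalization)."""
    total = sum(counts)
    total_power = sum(powers)
    bounds, acc = [], 0.0
    for p in powers:
        acc += p
        bounds.append(total * acc / total_power)  # upper cumulative bound of each node
    assignment, node, cum = [], 0, 0
    for c in counts:
        mid = cum + c / 2.0  # position of this bin's midpoint in the CDF
        while node < len(powers) - 1 and mid > bounds[node]:
            node += 1
        assignment.append(node)
        cum += c
    return assignment


def node_loads(counts, assignment, num_nodes):
    """Total number of tuples routed to each node."""
    loads = [0] * num_nodes
    for c, n in zip(counts, assignment):
        loads[n] += c
    return loads


if __name__ == "__main__":
    counts = zipf_counts(num_values=1000, total=100_000)
    powers = [1, 1, 2]  # hypothetical: node 2 is twice as fast as nodes 0 and 1
    loads = node_loads(counts, equalize(counts, powers), len(powers))
    print(loads)  # roughly 1:1:2
```

Because bins are assigned in order, each node receives a contiguous range of values, which keeps the build and probe sides of a join consistent if both relations use the same mapping.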

The Approximate MLE in a Skew-Symmetric Laplace Distribution

  • Son, Hee-Ju;Woo, Jung-Soo
    • Journal of the Korean Data and Information Science Society / v.18 no.2 / pp.573-584 / 2007
  • We define a skew-symmetric Laplace distribution based on a symmetric Laplace distribution and evaluate its coefficient of skewness. We then derive an approximate maximum likelihood estimator (AME) and a moment estimator (MME) of the skew parameter in the skew-symmetric Laplace distribution and compare their simulated mean squared errors. We also compare the asymptotic mean squared errors of two defined estimators of reliability in two independent skew-symmetric distributions.
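The abstract does not state the exact construction, so as an assumption this sketch uses the common skew-symmetric form f(z) = 2 g(z) G(lambda z), where g and G are the standard Laplace density and CDF. For lambda >= 0, integrating z e^(-z) (1 - e^(-lambda z)) over z > 0 gives E[Z] = 1 - 1/(1 + lambda)^2, so a simple moment estimator inverts the sample mean. This only illustrates a moment estimator under that assumed form; it is not the paper's AME or MME, and the function names are hypothetical.

```python
import math
import random


def sample_skew_laplace(lam, n, seed=0):
    """Draws with density 2 g(z) G(lam*z), g and G the standard Laplace pdf and cdf.

    Uses Z = X if Y < lam*X else -X, with X, Y i.i.d. standard Laplace.
    """
    rng = random.Random(seed)

    def laplace():
        # the difference of two independent Exp(1) draws is standard Laplace
        return rng.expovariate(1.0) - rng.expovariate(1.0)

    out = []
    for _ in range(n):
        x, y = laplace(), laplace()
        out.append(x if y < lam * x else -x)
    return out


def moment_estimate(zs):
    """Invert E[Z] = 1 - 1/(1 + lam)**2 (valid for lam >= 0)."""
    m = sum(zs) / len(zs)
    if not 0.0 <= m < 1.0:
        raise ValueError("sample mean outside [0, 1); moment estimate undefined")
    return 1.0 / math.sqrt(1.0 - m) - 1.0


def sample_skewness(zs):
    """Moment coefficient of skewness m3 / m2**1.5."""
    n = len(zs)
    m = sum(zs) / n
    m2 = sum((z - m) ** 2 for z in zs) / n
    m3 = sum((z - m) ** 3 for z in zs) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    zs = sample_skew_laplace(lam=1.0, n=20000, seed=3)
    print(moment_estimate(zs), sample_skewness(zs))
```

With lambda = 1 the assumed model has mean 0.75 and a positive skewness coefficient, so the estimate should land near 1.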

Development of a Multiobjective Optimization Algorithm Using Data Distribution Characteristics (데이터 분포특성을 이용한 다목적함수 최적화 알고리즘 개발)

  • Hwang, In-Jin;Park, Gyung-Jin
    • Transactions of the Korean Society of Mechanical Engineers A / v.34 no.12 / pp.1793-1803 / 2010
  • The weighting method and goal programming require weighting factors or target values to obtain a Pareto optimal solution. However, it is difficult to define these parameters, and a Pareto solution is not guaranteed when the choice of the parameters is incorrect. Recently, the Mahalanobis Taguchi System (MTS) has been introduced to minimize the Mahalanobis distance (MD). However, the MTS method cannot obtain a Pareto optimal solution. We propose a function called the skewed Mahalanobis distance (SMD) to obtain a Pareto optimal solution while retaining the advantages of the MD. The SMD is a new distance scale that multiplies the skewed value of a design point by the MD. The weighting factors are automatically reflected when the SMD is calculated. The SMD always gives a unique Pareto optimal solution. To verify the efficiency of the SMD, we present two numerical examples and show that the SMD can obtain a unique Pareto optimal solution without any additional information.
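The SMD is built on the Mahalanobis distance D(x) = sqrt((x - mu)^T S^(-1) (x - mu)). The abstract does not define the "skewed value" factor, so the sketch below computes only the ordinary MD, restricted to two dimensions (the 2-D restriction and the function name are our choices). It shows how correlation makes two points at the same Euclidean distance from the mean differ in MD.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance sqrt((x - mean)^T cov^{-1} (x - mean)) for 2-D points."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("expected a positive-definite covariance matrix")
    inv = ((d / det, -b / det), (-c / det, a / det))  # explicit 2x2 inverse
    dx = (x[0] - mean[0], x[1] - mean[1])
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


if __name__ == "__main__":
    cov = ((1.0, 0.5), (0.5, 1.0))  # positively correlated variables
    print(mahalanobis_2d((1.0, 1.0), (0.0, 0.0), cov))   # along the correlation: smaller
    print(mahalanobis_2d((1.0, -1.0), (0.0, 0.0), cov))  # against it: larger
```

Both points lie at Euclidean distance sqrt(2) from the mean, but the one that goes against the correlation is farther in Mahalanobis terms; with an identity covariance the MD reduces to the Euclidean distance.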

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

  • Choi, Hyun-Hwa;Lee, Kyu-Chul
    • The KIPS Transactions:PartD / v.19D no.2 / pp.151-160 / 2012
  • Most distributed high-dimensional indexing structures provide reasonable search performance, especially when the data set is uniformly distributed. However, when the data set is clustered or skewed, search performance gradually degrades compared with a uniformly distributed data set. We propose a method of improving k-nearest neighbor search performance for the distributed vector approximation-tree (DVA-tree) on strongly clustered or skewed data sets. The basic idea is to compute the volumes of the leaf nodes of the top-tree of a distributed vector approximation-tree and to assign a different number of bits to each of them in order to ensure the identification performance of the vector approximation; in other words, more bits are assigned to high-density clusters. We conducted experiments comparing search performance with the distributed hybrid spill-tree and the distributed vector approximation-tree using synthetic and real data sets. The experimental results show that the proposed scheme consistently and significantly improves the performance of the distributed vector approximation-tree for strongly clustered or skewed data sets.

Semiparametric Bayesian Hierarchical Selection Models with Skewed Elliptical Distribution (왜도 타원형 분포를 이용한 준모수적 계층적 선택 모형)

  • 정윤식;장정훈
    • The Korean Journal of Applied Statistics / v.16 no.1 / pp.101-115 / 2003
  • Lately there has been much theoretical and applied interest in linear models with non-normal, heavy-tailed error distributions. Starting with Zellner (1976), many authors have explored the consequences of non-normal and heavy-tailed error distributions. For meta-analysis, we consider hierarchical models, including selection models, under a skewed heavy-tailed error distribution originally proposed by Chen, Dey and Shao (1999) and Branco and Dey (2001), with a Dirichlet process prior (Ferguson, 1973). A general class of skewed elliptical distributions is reviewed and developed. We also consider a detailed computational scheme under the skew normal and skew t distributions using MCMC methods. Finally, we apply the proposed methodology to a real data set from Johnson (1993).
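The skew normal and skew t distributions mentioned above can be simulated from standard stochastic representations. This sketch assumes the Azzalini-type skew normal with parameter delta in (-1, 1): Z = delta |U0| + sqrt(1 - delta^2) U1 with U0, U1 independent N(0, 1), and a skew t obtained by dividing Z by sqrt(V/nu) with V ~ chi-square(nu). The function names are hypothetical; the skewed elliptical class of Branco and Dey (2001) is more general, and the paper's hierarchical model and MCMC scheme are not reproduced.

```python
import math
import random


def skew_normal(delta, rng):
    """Standard skew-normal draw via Z = delta*|U0| + sqrt(1 - delta**2)*U1."""
    u0, u1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


def skew_t(delta, nu, rng):
    """Skew-t draw: a skew-normal divided by sqrt(V/nu), V ~ chi-square(nu)."""
    v = rng.gammavariate(nu / 2.0, 2.0)  # chi-square(nu) = Gamma(shape nu/2, scale 2)
    return skew_normal(delta, rng) / math.sqrt(v / nu)


if __name__ == "__main__":
    rng = random.Random(7)
    sn = [skew_normal(0.9, rng) for _ in range(50000)]
    st = [skew_t(0.9, 5.0, rng) for _ in range(50000)]
    print(sum(sn) / len(sn), sum(st) / len(st))
```

Under these representations the skew normal has mean delta * sqrt(2/pi), and for nu > 1 the skew t has mean delta * sqrt(nu/pi) * Gamma((nu - 1)/2) / Gamma(nu/2); the heavier tails of the skew t come from the chi-square scale mixing.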

Optimal Welding Condition for the Inclined and Skewed Fillet Joints in the Curved Block of a Ship (I) (선박 골블록의 경사 필렛 이음부의 적정 용접조건 (I))

  • PARK JU-YONG
    • Journal of Ocean Engineering and Technology / v.18 no.6 s.61 / pp.79-83 / 2004
  • The curved blocks that make up the bow and stern of a ship contain many skewed joints that are inclined horizontally and vertically. Most of these joints have a large fit-up error, change their form continuously, and are not easily accessible. The welding position and parameter values should be set appropriately according to the shape and inclination of the joints. Welding parameters such as current, voltage, travel speed, and melting rate are related to each other, and their values must lie within a specific limited range for sound welding. These correlations and ranges depend on the kind and size of wire, the shielding gas, and the joint shape and fit-up. To determine these relationships, extensive welding experiments were performed, and the experimental data were processed using several information processing technologies. A regression method was used to determine the relationship between current, voltage, and deposition rate. When a joint is inclined, the weld bead should be confined to a limited size in order to avoid undercut as well as overlap caused by molten metal flowing down under gravity. The dependency of this limited weld size, defined as the critical deposited area, on factors such as the horizontal and vertical inclination angles of the joint, the skew angle of the joint, upward or downward welding direction, and weaving was investigated through a number of welding experiments. On the basis of these results, an artificial neural network (ANN) system was developed to estimate the critical deposited area. The ANN has a four-layer structure and uses the error back-propagation learning algorithm. The values estimated by the ANN were validated against experimental values.

Estimation on Altitudinal Spectrum of Suitability for Four Species of the Mayfly Genus Ephemera (Ephemeroptera: Ephemeridae) Using Probability Distribution Models (확률분포모형을 이용한 하루살이속(Ephemera) 4종의 고도구배에 따른 서식처적합도 평가)

  • Dongsoo Kong;Bomi Kang
    • Journal of Korean Society on Water Environment / v.39 no.4 / pp.302-315 / 2023
  • The distribution characteristics along the altitudinal gradient of four species of the mayfly genus Ephemera (Order Ephemeroptera), namely E. strigata, E. separigata, and the E. orientalis-sachalinensis group, were analyzed with probability distribution models (exponential, normal, lognormal, logistic, Weibull, gamma, beta, and Gumbel). Data were collected from 23,846 sampling units at 6,787 sites in Korea from 2010 to 2021. Along the altitudinal gradient, the beta distribution model showed the best fit for the positively skewed E. orientalis-sachalinensis and the slightly skewed E. strigata, while the reversed lognormal distribution model showed the best fit for the negatively skewed E. separigata. E. orientalis-sachalinensis was distributed over an altitude range of 1~700 m (mean 251 m, median 226 m, mode 124 m, standard deviation 161 m), E. strigata over 5~871 m (mean 474 m, median 478 m, mode 492 m, standard deviation 200 m), and E. separigata over 7~846 m (mean 620 m, median 659 m, mode 760 m, standard deviation 181 m). The altitudinal habitat suitability ranges were estimated to be 42~257 m for E. orientalis-sachalinensis, 335~644 m for E. strigata, and 641~824 m for E. separigata. Based on the altitudinal spectrum of suitability and the analysis of altitude-related temperature, E. orientalis-sachalinensis was estimated to be thermophilic, E. strigata mesophilic, and E. separigata thermophobic. This is the first national-scale evaluation of the altitudinal distribution of Ephemera in Korea, and the results will be used in further research on the altitudinal shift of Ephemera species under climate change.
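The abstract does not describe how the reversed lognormal was fitted. As one plausible sketch, assuming a known upper bound c above every observation, a negatively skewed sample x can be modeled by letting c - x follow an ordinary lognormal distribution, whose MLE is the mean and (1/n) standard deviation of log(c - x). The fitted model implies mean < median < mode, the same ordering reported for E. separigata. The bound c, the function names, and the synthetic data in the usage example are assumptions for illustration only.

```python
import math
import random


def fit_reversed_lognormal(xs, c):
    """MLE when log(c - x) ~ Normal(mu, sigma^2); the upper bound c is assumed known."""
    if any(x >= c for x in xs):
        raise ValueError("upper bound c must exceed every observation")
    logs = [math.log(c - x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)  # MLE divides by n
    return mu, sigma


def reversed_lognormal_summary(mu, sigma, c):
    """Model-implied mean, median and mode of X = c - exp(N(mu, sigma^2))."""
    return {
        "mean": c - math.exp(mu + sigma ** 2 / 2.0),
        "median": c - math.exp(mu),
        "mode": c - math.exp(mu - sigma ** 2),
    }


def skewness(xs):
    """Moment coefficient of skewness; negative for a left-skewed sample."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    rng = random.Random(11)
    c = 900.0  # hypothetical upper bound
    xs = [c - rng.lognormvariate(5.0, 0.3) for _ in range(5000)]  # synthetic data
    mu, sigma = fit_reversed_lognormal(xs, c)
    print(skewness(xs), mu, sigma, reversed_lognormal_summary(mu, sigma, c))
```

Reflecting the data turns a left-skewed sample into a right-skewed one, so any standard right-skewed fit (here the lognormal) can be reused; the main modeling choice left open is how c is set.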