• Title/Abstract/Keywords: Skewed Data

Search results: 203 items (processing time: 0.023 seconds)

Spatial Prediction Based on the Bayesian Kriging with Box-Cox Transformation

  • Choi, Jung-Soon;Park, Man-Sik
    • Communications for Statistical Applications and Methods / Vol. 16, No. 5 / pp. 851-858 / 2009
  • In recent decades, there has been much interest in climate variability because changes in climate have dramatic effects on humanity. Precipitation data in particular are measured over space, and their spatial association is complicated, so this spatial dependency structure should be taken into account when analyzing the data. However, the data sets are often severely skewed, which is a problem for linear models. In this paper, we apply the Box-Cox transformation so that the data approximately satisfy normality prior to the analysis, and we employ a Bayesian hierarchical framework to investigate the spatial patterns. The data set considered is the monthly average precipitation for the third quarter of 2007, obtained from 347 automated monitoring stations in contiguous South Korea.
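
The Box-Cox step mentioned in this abstract can be sketched as follows. This is only an illustration of the transformation itself, not the authors' Bayesian kriging procedure: choosing lambda by a grid search over the normal profile log-likelihood is an assumption, and the `rainfall` values are made up for the example.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]  # limiting case lambda -> 0
    return [(v ** lam - 1.0) / lam for v in values]

def profile_log_likelihood(values, lam):
    """Normal profile log-likelihood of lambda, up to an additive constant."""
    n = len(values)
    z = box_cox(values, lam)
    mean_z = sum(z) / n
    var_z = sum((t - mean_z) ** 2 for t in z) / n
    log_jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var_z) + log_jacobian

def choose_lambda(values, grid=None):
    """Pick the lambda on a grid that maximizes the profile log-likelihood."""
    if grid is None:
        grid = [i / 10.0 for i in range(-20, 21)]  # assumed grid: -2.0 .. 2.0
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))

def skewness(values):
    """Moment-based sample skewness."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, right-skewed monthly precipitation values (mm), for illustration only.
rainfall = [12.0, 15.5, 18.2, 22.1, 30.4, 41.0, 58.3, 95.7, 160.2, 310.5]
best_lam = choose_lambda(rainfall)
transformed = box_cox(rainfall, best_lam)
print("lambda:", best_lam)
print("skewness before:", skewness(rainfall))
print("skewness after:", skewness(transformed))
```

In a spatial analysis the transformed values would then be passed to the dependence model, and predictions would be back-transformed to the original scale.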

ROBUST MEASURES OF LOCATION IN WATER-QUALITY DATA

  • Kim, Kyung-Sub;Kim, Bom-Chul;Kim, Jin-Hong
    • Water Engineering Research / Vol. 3, No. 3 / pp. 195-202 / 2002
  • The mean is generally used as a point estimator for water-quality data. Unfortunately, such data are often nonnormal and skewed, and in that case the mean is an inappropriate statistic to apply directly. Robust statistics such as L-, M-, and R-estimators are recommended because they are more efficient in this setting. The median (L-estimator), the biweight (M-estimator), and the Hodges-Lehmann method (R-estimator) are briefly introduced and applied in this paper. The analyses of actual data show that the median does not guarantee robustness for small data sets, and that robust measures of location, or the arithmetic mean computed without outliers, are highly recommended if the distribution has long tails or outliers. Care must be taken in measuring location because the reported water-quality level of a water body can change depending on the selected point estimator.
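
The three estimators named in this abstract can be compared with a short sketch. The tuning constant `c=6.0`, the MAD scale, and the iteratively reweighted scheme for the biweight are common textbook choices assumed here, not details taken from the paper, and the `samples` values are invented.

```python
import statistics

def hodges_lehmann(values):
    """Median of all pairwise (Walsh) averages, each value paired with itself too."""
    n = len(values)
    walsh = [(values[i] + values[j]) / 2.0 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)

def biweight_location(values, c=6.0, tol=1e-9, max_iter=100):
    """Tukey biweight M-estimate of location via iteratively reweighted means."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    if mad == 0:
        return center  # no spread around the median; nothing to reweight
    for _ in range(max_iter):
        weights = []
        for v in values:
            u = (v - center) / (c * mad)
            # Points farther than c * MAD from the center get zero weight.
            weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
        new_center = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center

# Invented concentrations (mg/L) with one high outlier, for illustration only.
samples = [2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.4, 9.8]
print("mean:", statistics.fmean(samples))
print("median (L-estimator):", statistics.median(samples))
print("biweight (M-estimator):", biweight_location(samples))
print("Hodges-Lehmann (R-estimator):", hodges_lehmann(samples))
```

With a single large outlier, the mean is pulled upward while the three robust estimates stay near the bulk of the data, which is the behavior the abstract describes.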

Black Hispanic and Black Non-Hispanic Breast Cancer Survival Data Analysis with Half-normal Model Application

  • Khan, Hafiz Mohammad Rafiqullah;Saxena, Anshul;Vera, Veronica;Abdool-Ghany, Faheema;Gabbidon, Kemesha;Perea, Nancy;Stewart, Tiffanie Shauna-Jeanne;Ramamoorthy, Venkataraghavan
    • Asian Pacific Journal of Cancer Prevention / Vol. 15, No. 21 / pp. 9453-9458 / 2014
  • Background: Breast cancer is the second leading cause of cancer death for women in the United States. Differences in breast cancer survival have been noted among racial and ethnic groups, but the reasons for these disparities remain unclear. This study presents the characteristics and the survival curves of two racial and ethnic groups and evaluates the effects of race on survival times using a half-normal model based on lifetime data. Materials and Methods: The distributions among racial and ethnic groups are compared using female breast cancer patients from nine states, all taken from the National Cancer Institute's Surveillance, Epidemiology, and End Results cancer registry. The main end points observed are age at diagnosis, survival time in months, and marital status. The right-skewed half-normal probability model is used to show the differences in survival times between black Hispanic (BH) and black non-Hispanic (BNH) female breast cancer patients. The Kaplan-Meier method and the Cox proportional hazard ratio are used to estimate and compare the relative risk of death in the two minority groups, BH and BNH. Results: A probability random sampling method was used to select representative samples from BNH and BH female breast cancer patients who were diagnosed during 1973-2009 in the United States. The sample contained 1,000 BNH and 298 BH female breast cancer patients. The median age at diagnosis was 57.75 years among BNH and 54.11 years among BH. The results of the half-normal model showed that the survival times formed positively skewed models with higher variability in BNH compared with BH. The Kaplan-Meier estimate was used to plot the survival curves for cancer patients; this test was positively skewed. The Kaplan-Meier and Cox proportional hazard ratio survival analyses showed that BNH had a significantly longer survival time than BH, which is consistent with the results of the half-normal model. Conclusions: The findings obtained with the proposed modeling strategy will help the healthcare field measure future outcomes for BH and BNH, given their past history and conditions. These findings may provide an improved outlook for the diagnosis and treatment of breast cancer patients in the United States.

Validation Comparison of Credit Rating Models Using Box-Cox Transformation

  • Hong, Chong-Sun;Choi, Jeong-Min
    • Journal of the Korean Data and Information Science Society / Vol. 19, No. 3 / pp. 789-800 / 2008
  • Current credit evaluation models based on financial data make use of smoothed estimated default ratios, which are transformed from each financial variable. In this work, we discuss some problems of the credit evaluation models developed by financial experts and propose improved credit evaluation models based on the stepwise variable selection method and on Box-Cox transformation of data whose distribution is strongly skewed to the right. After comparing goodness-of-fit tests for these models, we explain the validation of credit evaluation models that use statistical methods such as stepwise variable selection and the Box-Cox transformation.

An Estimation of VaR in Stock Markets Using Transformations

  • Yeo, In-Kwon;Jeong, Choo-Mi
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 3 / pp. 567-580 / 2005
  • It is usually assumed that asset returns in the stock market are normally distributed. However, analyses of real data show that the distribution tends to be skewed and to have heavier tails than the normal distribution. In this paper, we investigate a method of estimating the value at risk (VaR) of stock returns. The VaR is computed by a transformation and back-transformation method. The analysis of KOSPI and KOSDAQ data shows that the proposed estimator outperforms the one obtained under the normal assumption.
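
The transformation and back-transformation idea can be sketched as below. The abstract does not name the transformation, so the Yeo-Johnson family is assumed here because it accepts negative returns; the fixed `lam`, the 1% level, and the `returns` values are likewise illustrative assumptions rather than the paper's settings.

```python
import math
import statistics
from statistics import NormalDist

def yeo_johnson(x, lam):
    """Yeo-Johnson power transformation (assumed family; defined for all real x)."""
    if x >= 0:
        return math.log1p(x) if lam == 0 else ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)

def inverse_yeo_johnson(y, lam):
    """Inverse of yeo_johnson, mapping the transformed scale back to returns."""
    if y >= 0:
        return math.expm1(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam) - 1.0
    if lam == 2:
        return 1.0 - math.exp(-y)
    return 1.0 - (1.0 - (2.0 - lam) * y) ** (1.0 / (2.0 - lam))

def var_by_transformation(returns, lam, level=0.01):
    """VaR as a positive loss: normal quantile on the transformed scale, back-transformed."""
    z = [yeo_johnson(r, lam) for r in returns]
    mu = statistics.fmean(z)
    sigma = statistics.stdev(z)
    quantile = NormalDist(mu, sigma).inv_cdf(level)
    return -inverse_yeo_johnson(quantile, lam)

# Invented daily returns, for illustration only.
returns = [0.012, -0.008, 0.004, -0.031, 0.009, 0.015, -0.002,
           -0.019, 0.006, 0.001, -0.045, 0.011]
print("VaR, normal assumption (lam = 1):", var_by_transformation(returns, 1.0))
print("VaR, transformed (lam = 1.2, assumed):", var_by_transformation(returns, 1.2))
```

Setting `lam = 1` makes the transformation the identity, so the function reduces to the usual normal-assumption VaR; in practice `lam` would be estimated from the data rather than fixed.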

EWMA Chart Application Using the Transformation of the Exponential with Individual Observations

  • 지선수
    • Journal of Korean Society of Industrial and Systems Engineering (산업경영시스템학회지) / Vol. 22, No. 52 / pp. 337-345 / 1999
  • The long-tailed, positively skewed exponential distribution can be made almost symmetric by raising the data to a suitable exponent. For such data, using the traditional Shewhart control limits on an individuals chart would be impractical and inconvenient. The transformed data, which are approximately bell-shaped, can be plotted conveniently on the individuals chart and on the exponentially weighted moving average (EWMA) chart. In this paper, we give a method for constructing control charts using modified statistics based on the exponent-transformed data. The method of selecting the exponent for the individuals chart is evaluated. Considering that smaller weights are assigned to older data as time passes, we suggest properties of, and a method of choosing, the exponent ($\theta$) and the weighting factor ($\alpha$). Based on simulation results, we recommend a practical method for the EWMA chart.
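
A minimal sketch of an EWMA chart on transformed data follows, assuming the transformation is a power $x^{\theta}$ and using the standard asymptotic EWMA limits. The values `theta = 0.2777` (a figure often quoted for exponential data), `alpha = 0.2`, the limit width `width = 3`, and both data lists are illustrative assumptions, not the paper's recommendations.

```python
import math
import statistics

def power_transform(values, theta):
    """Raise each positive observation to the exponent theta."""
    return [v ** theta for v in values]

def ewma(values, alpha, start):
    """EWMA statistic z_t = alpha * x_t + (1 - alpha) * z_{t-1}, with z_0 = start."""
    z = start
    path = []
    for v in values:
        z = alpha * v + (1.0 - alpha) * z
        path.append(z)
    return path

def ewma_limits(mean, sigma, alpha, width=3.0):
    """Asymptotic control limits: mean +/- width * sigma * sqrt(alpha / (2 - alpha))."""
    half_width = width * sigma * math.sqrt(alpha / (2.0 - alpha))
    return mean - half_width, mean + half_width

theta, alpha = 0.2777, 0.2  # assumed exponent and weighting factor

# Invented in-control reference data (e.g. times between events), skewed to the right.
reference = [0.3, 1.9, 0.7, 4.2, 0.1, 1.1, 2.6, 0.5, 0.9, 3.3, 0.2, 1.4]
ref_t = power_transform(reference, theta)
center = statistics.fmean(ref_t)
lower, upper = ewma_limits(center, statistics.stdev(ref_t), alpha)

# Invented new observations to monitor.
new_obs = [0.8, 1.2, 0.4, 6.5, 7.1, 8.0]
for t, z in enumerate(ewma(power_transform(new_obs, theta), alpha, center), start=1):
    status = "in control" if lower <= z <= upper else "signal"
    print(t, round(z, 4), status)
```

Smaller values of `alpha` give older observations more influence and tighten the limits, which is the trade-off the abstract refers to when it discusses the weighting factor.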

A spatial heterogeneity mixed model with skew-elliptical distributions

  • Farzammehr, Mohadeseh Alsadat;McLachlan, Geoffrey J.
    • Communications for Statistical Applications and Methods / Vol. 29, No. 3 / pp. 373-391 / 2022
  • The distribution of observations in most econometric studies with spatial heterogeneity is skewed. Usually, a single transformation of the data is used to approximate normality, and the transformed data are modeled under a normal assumption. However, this assumption is not always appropriate because panel data often exhibit non-normal characteristics. In this work, the normality assumption is relaxed in spatial mixed models, allowing for spatial heterogeneity. An inference procedure based on Bayesian mixed modeling is carried out with a multivariate skew-elliptical distribution, which includes the skew-t, skew-normal, Student-t, and normal distributions as special cases. The methodology is illustrated through a simulation study. Following the empirical literature, we also fit our models to non-life insurance consumption observed between 1998 and 2002 across a spatial panel of 103 Italian provinces to identify its determinants. An analysis of the posterior distributions of some parameters and a comparison of various model comparison criteria indicate that the proposed model is superior to conventional ones.

The Development of Camera Detection System for the Measurement Road Traffic Data

  • 김희식;김진만
    • Journal of the Korean Society of Safety (한국안전학회지) / Vol. 18, No. 4 / pp. 23-27 / 2003
  • To improve road transportation safety, road traffic data are monitored with an image detection system, and road traffic safety is analysed using image processing techniques. For more accurate measurement, matching the coordinates of the real road to the image is one of the most essential parts of the image detection technique. The road image appears skewed on the input screen because the video camera is installed at the roadside. A fast and precise coordinate-matching algorithm is developed to convert image coordinates into road coordinates.

Linear regression under log-concave and Gaussian scale mixture errors: comparative study

  • Kim, Sunyul;Seo, Byungtae
    • Communications for Statistical Applications and Methods / Vol. 25, No. 6 / pp. 633-645 / 2018
  • Gaussian error distributions are a common choice in traditional regression models fitted by the maximum likelihood (ML) method. However, this distributional assumption is often questionable, especially when the error distribution is skewed or has heavy tails. In both cases, the ML method under normality can break down or lose efficiency. In this paper, we consider log-concave and Gaussian scale mixture distributions for the errors. For log-concave errors, we propose a smoothed maximum likelihood estimator for stable and faster computation. Based on this, we perform comparative simulation studies to assess the performance of the coefficient estimates under normal, Gaussian scale mixture, and log-concave errors. In addition, we consider real data analyses using the stack loss plant data and Korean labor and income panel data.

Efficient Continuous Skyline Query Processing Scheme over Large Dynamic Data Sets

  • Li, He;Yoo, Jaesoo
    • ETRI Journal / Vol. 38, No. 6 / pp. 1197-1206 / 2016
  • Performing continuous skyline queries over dynamic data sets is becoming more challenging as data sets grow and become more volatile due to the increase in dynamic updates. Although previous work proposed support for such queries, its efficiency was restricted to small or uniformly distributed data sets. In a production database with many concurrent queries, the execution of continuous skyline queries degrades query performance because updates must acquire exclusive locks, possibly blocking other query threads. Thus, the computational costs increase. To minimize computational requirements, we propose a method based on a multi-layer grid structure. First, the relational data objects that make up the initial data set are processed to obtain the corresponding multi-layer grid structure and the skyline influence regions over the data. Then, dynamic data are processed only when they fall within the skyline influence regions. Therefore, a large amount of computation can be pruned by adopting the proposed multi-layer grid structure. A performance evaluation on a variety of data sets confirms the efficiency of the proposed method.
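
The pruning idea can be illustrated with a much simpler stand-in than the paper's multi-layer grid. In the sketch below, an inserted point that is dominated by the current skyline is discarded without further work, which plays the role of "falling outside the influence region". The function names and the smaller-is-better convention are assumptions, and deletions are not handled.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one.

    Smaller values are treated as better (an assumed convention).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def insert_point(current_skyline, new_point):
    """Update the skyline for one inserted point.

    A point dominated by an existing skyline point cannot change the result,
    so it is pruned immediately; otherwise it joins the skyline and evicts
    the points it dominates.
    """
    if any(dominates(s, new_point) for s in current_skyline):
        return current_skyline
    survivors = [s for s in current_skyline if not dominates(new_point, s)]
    return survivors + [new_point]

# Invented two-dimensional data, for illustration only.
points = [(1, 9), (3, 3), (2, 8), (5, 5), (9, 1), (4, 4)]
sky = skyline(points)
print("initial skyline:", sky)
print("after inserting (6, 6):", insert_point(sky, (6, 6)))
print("after inserting (2, 2):", insert_point(sky, (2, 2)))
```

A grid-based scheme aims to reach the same decision for most updates by checking which cell the new point falls in, instead of comparing it against skyline points one by one.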