• Title/Summary/Keyword: data distributions

Power study for 2 × 2 factorial design in 4 × 4 latin square design (4 × 4 라틴방격모형 내 2 × 2 요인모형의 검정력 연구)

  • Choi, Young Hun
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1195-1205 / 2014
  • Compared with a single design, the power of the rank-transformed statistic for testing main and interaction effects of a $2{\times}2$ factorial in a $4{\times}4$ Latin square design increases rapidly as the effect size and the number of replications increase. In general, the rank-transformed statistic has superior power, regardless of the effect composition and the type of error distribution, when there are few nontesting factors and the effect sizes are small. The rank-transformed statistic is much more powerful than the parametric statistic under exponential and double exponential error distributions, and its power is very similar to that of the parametric statistic under normal and uniform distributions.
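
The rank transform idea in this abstract replaces the observations by their joint ranks and then applies the usual parametric F statistic to the ranks. The following is a minimal sketch under a simplifying assumption: it uses a balanced one-way layout rather than the paper's 2 × 2 factorial within a 4 × 4 Latin square, and all function names and data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def one_way_f(groups):
    """Classical one-way ANOVA F statistic for a list of groups."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand = sum(all_values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_transformed_f(groups):
    """Rank all observations jointly, then compute F on the ranks."""
    flat = [x for g in groups for x in g]
    ranks = average_ranks(flat)
    ranked_groups, start = [], 0
    for g in groups:
        ranked_groups.append(ranks[start:start + len(g)])
        start += len(g)
    return one_way_f(ranked_groups)


# Hypothetical heavy-tailed data: one extreme value inflates the raw
# within-group variance, while the ranks are unaffected by its magnitude.
groups = [[1, 2, 100], [200, 300, 4000]]
raw_f = one_way_f(groups)
rank_f = rank_transformed_f(groups)
```

With skewed or heavy-tailed errors, a single extreme value can dominate the raw within-group sum of squares, which is one intuition for why the rank-based statistic can be more powerful in such settings.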

GIS-Based Predictive Model for Measure of Environmental Pollutant (GIS를 이용한 환경오염의 예측 모델)

  • Lee, Ja-Won
    • Journal of the Korean association of regional geographers / v.14 no.2 / pp.114-125 / 2008
  • Colored dissolved organic matter (CDOM) is an important component of ocean color and an invaluable tool in water quality and ocean color studies. Because the largest source of coastal CDOM appears to be freshwater discharge into the ocean, coastal predictive models can refine our knowledge of the major processes that control CDOM distributions in coastal waters and provide better insight into the global carbon cycle. This study aims to develop a GIS-based, watershed-scale predictive model of CDOM distributions in the Neponset River watershed that can be used to assess our understanding of CDOM sources and distributions in coastal waters and to predict the response of CDOM concentration to changes in land use patterns. Weighting factors are developed for CDOM freshwater sources after extensive ground-truthing of various land use types in the watershed. The model uses a publicly available digital elevation model (DEM) as the base data for analysis. Stream networks, discharge, and land use data are taken from public repositories, while sub-watershed delineations, pour points, and land use parcels are generated using Spatial Analysis of ArcGIS 9.2 to estimate the CDOM loading from various sources to the lower tributaries of rivers. The Neponset Watershed in eastern Massachusetts is selected as the site for development of the model.

A Physical Design Method of Storage Structures for MOLAP Systems of Data Warehouse (데이터 웨어하우스의 다차원 온라인 분석처리 시스템을 위한 저장구조의 물리적 설계기법)

  • Lee Jong-Hak
    • Journal of Korea Multimedia Society / v.8 no.3 / pp.297-312 / 2005
  • Aggregation is an operation that plays a key role in multidimensional OLAP (MOLAP) systems of data warehouses. Existing aggregation operations in MOLAP have been proposed for file structures such as multidimensional arrays, but these file structures do not work well with skewed distributions. This paper presents a physical design methodology for storage structures in MOLAP that use multidimensional file organizations adapting to a skewed distribution. For uniform data distributions, we first show that the performance of multidimensional analytical processing is highly affected by the similarity between the shapes of query regions and page regions in the domain space of the multidimensional file organizations. Then, for skewed distributions, we reflect the effect of the data distribution on the design by using the shapes of normalized query regions weighted by the data density of those query regions. Finally, we demonstrate that the theoretically derived physical design methodology is indeed correct in real environments. For two-dimensional file organizations, the experimental results indicate that the proposed method improves performance by more than seven times over the conventional method. We expect the improvement to be greater when the dimensionality is more than two. The results confirm that the proposed physical design methodology is useful in practice.

Distribution of Photovoltaic Energy Including Topography Effect (지형 효과를 고려한 지표면 태양광 분포)

  • Jee, Joon-Bum;Zo, Il-Sung;Lee, Kyu-Tae;Choi, Young-Jean
    • Journal of the Korean earth science society / v.32 no.2 / pp.190-199 / 2011
  • A photovoltaic energy map of the Korean peninsula that includes the topography effect was developed using the Gangneung-Wonju National University (GWNU) solar radiation model. Satellite data (MODIS, OMI and MTSAT-1R) and output from the Regional Data Assimilation Prediction System (RDAPS) model of the Korea Meteorological Administration (KMA) were used as input data for the GWNU model. Photovoltaic energy distributions were calculated by applying a high-resolution Digital Elevation Model (DEM) to account for the topography effect. The distributions of monthly accumulated solar energy indicated that differences caused by the topography effect are more important in winter than in summer because of their dependence on the solar altitude angle. The topography effect on photovoltaic energy is two times larger at 1 km resolution than at 4 km resolution. Therefore, an accurate calculation of solar energy at the surface requires high-resolution topographic data as well as high-quality input data.
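
The dependence on the solar altitude angle mentioned in this abstract can be illustrated with the standard incidence-angle relation for a tilted surface. This is a generic geometric sketch and not the GWNU model: it considers direct-beam radiation only, the function and parameter names are hypothetical, and the solar altitudes of 30° and 75° are illustrative stand-ins for winter and summer noon.

```python
import math


def relative_direct_beam(slope_deg, aspect_deg, sun_alt_deg, sun_az_deg):
    """Direct-beam irradiance on a tilted cell relative to a flat cell.

    Angles are in degrees. Aspect (direction the slope faces) and solar
    azimuth are both measured clockwise from north (assumed convention).
    """
    slope = math.radians(slope_deg)
    alt = math.radians(sun_alt_deg)
    az_diff = math.radians(sun_az_deg - aspect_deg)
    # Cosine of the angle between the sun direction and the surface normal.
    cos_incidence = (math.cos(slope) * math.sin(alt)
                     + math.sin(slope) * math.cos(alt) * math.cos(az_diff))
    # A self-shaded surface receives no direct beam, hence the clamp at zero.
    return max(0.0, cos_incidence) / math.sin(alt)


# Hypothetical 20-degree north-facing slope with the sun due south.
winter_ratio = relative_direct_beam(20, 0, 30, 180)  # low sun
summer_ratio = relative_direct_beam(20, 0, 75, 180)  # high sun
flat_ratio = relative_direct_beam(0, 0, 30, 180)     # no topography effect
```

In this sketch the tilted cell departs further from the flat-surface value when the sun is low, which is consistent with the abstract's statement that the topography effect matters more in winter.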

Estimation of the joint conditional distribution for repeatedly measured bivariate cholesterol data using nonparametric copula (비모수적 코플라를 이용한 반복측정 이변량 자료의 조건부 결합 분포 추정)

  • Kwak, Minjung
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.689-700 / 2016
  • We study estimation and inference for the joint conditional distributions of bivariate longitudinal outcomes using regression models and copulas. For the marginal models, we consider a class of time-varying transformation models and combine the two marginal models using nonparametric empirical copulas. Regression parameters in the transformation model can be obtained as the solution of estimating equations, and our models and estimation method can be applied in many situations where conditional mean-based models are inadequate. Nonparametric copulas combined with time-varying transformation models may allow quite flexible modeling of the joint conditional distributions of bivariate longitudinal data. We apply our method to an epidemiological study of repeatedly measured bivariate cholesterol data.
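
As an illustration of the nonparametric empirical copula mentioned in this abstract, the sketch below computes the textbook estimator C_n(u, v) = (1/n) Σ 1{R_i/n ≤ u, S_i/n ≤ v} from the ranks R_i, S_i of two samples. It is an assumption-laden simplification: it ignores the covariates and the time-varying transformation models of the paper, assumes no ties, and uses hypothetical function names and data.

```python
def ranks(values):
    """1-based ranks of the values; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def empirical_copula(x, y):
    """Return a function C(u, v) estimating the copula of paired samples."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)

    def copula(u, v):
        # Fraction of pairs whose scaled ranks fall in [0, u] x [0, v].
        hits = sum(1 for a, b in zip(rx, ry) if a / n <= u and b / n <= v)
        return hits / n

    return copula


# Hypothetical paired measurements: perfectly increasing and perfectly
# decreasing dependence give the two extreme cases.
c_up = empirical_copula([1, 2, 3, 4], [10, 20, 30, 40])
c_down = empirical_copula([1, 2, 3, 4], [40, 30, 20, 10])
```

Because the estimator depends on the data only through ranks, it is unchanged by any strictly increasing transformation of either margin, which is what allows the margins and the dependence structure to be modeled separately.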

Estimation of Mean and Variance for $NH_3-N$ data of Puyeo Intake (부여 취수장의 $NH_3-N$자료에 대한 평균 및 분산추정)

  • Kim, Hyeong-Su;Jeong, Geon-Hui;Kim, Eung-Seok;Kim, Jung-Hun
    • Journal of Korea Water Resources Association / v.34 no.4 / pp.357-364 / 2001
  • Sometimes an observed value is too small to be distinguished from instrument noise; such a value can only be recorded as below the detection level (DL). Even though data below the detection level (BDL) are small and vague, they can lead to wrong estimates of the mean and variance. In practice in Korea, however, BDL data are generally discarded as N.D. (Not Detected) and are not recorded. This study investigates the distributions of ammonia concentration (NH$_3$-N) data at the Puyeo intake according to the data values. We also try to find the DL value that discriminates between the distributions and an appropriate method for estimating the mean and variance of BDL values. The DL is estimated by trial and error. An appropriate method for estimating the mean and variance of the above detection level (ADL) and BDL data sets is selected, and the mean and variance are estimated. As a result, the Bias Corrected Maximum Likelihood Estimator is found to be the most accurate method for NH$_3$-N at the Puyeo intake.
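
The point that discarding BDL values distorts the estimates can be shown with a small numerical sketch. It does not implement the Bias Corrected Maximum Likelihood Estimator selected in the paper; it only contrasts dropping BDL values with a simple substitution rule (DL/2, a common ad hoc choice assumed here). The concentrations, the detection level, and the function names are all hypothetical.

```python
from statistics import mean, variance


def drop_bdl(values, dl):
    """Keep only values at or above the detection level (N.D. discarded)."""
    return [v for v in values if v >= dl]


def substitute_bdl(values, dl, fraction=0.5):
    """Replace each value below the detection level by fraction * dl."""
    return [v if v >= dl else fraction * dl for v in values]


# Hypothetical "true" concentrations (mg/L) and detection level.
true_values = [0.02, 0.04, 0.06, 0.10, 0.20, 0.30]
detection_level = 0.05

dropped = drop_bdl(true_values, detection_level)
substituted = substitute_bdl(true_values, detection_level)

true_mean = mean(true_values)
dropped_mean = mean(dropped)          # biased upward: small values are gone
substituted_mean = mean(substituted)  # depends on the substitution rule
dropped_variance = variance(dropped)  # sample variance of detected values only
```

Dropping the non-detects removes the smallest observations, so the mean of the remaining values overstates the true mean; this is the kind of distortion that motivates censored-data estimators such as the one the paper selects.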

Statistical Analysis of Electrical Tree Inception Voltage, Breakdown Voltage and Tree Breakdown Time Data of Unsaturated Polyester Resin

  • Ahmad, Mohd Hafizi;Bashir, Nouruddeen;Ahmad, Hussein;Piah, Mohamed Afendi Mohamed;Abdul-Malek, Zulkurnain;Yusof, Fadhilah
    • Journal of Electrical Engineering and Technology / v.8 no.4 / pp.840-849 / 2013
  • This paper presents a statistical approach to analyzing the electrical tree inception voltage, electrical tree breakdown voltage and tree breakdown time of unsaturated polyester resin subjected to AC voltage. The aim of this work was to show that the Weibull and lognormal distributions may not be the most suitable distributions for the analysis of electrical treeing data. The statistical distributions of electrical tree inception voltage, electrical tree breakdown voltage and breakdown time data were investigated on 108 leaf-like specimen samples. The test results showed that the Johnson SB distribution is the best fit for the electrical tree inception voltage and tree breakdown time data, while the electrical tree breakdown voltage data are best fitted by the Wakeby distribution. The fit was assessed by means of the Anderson-Darling (AD) goodness-of-fit (GOF) test. For the tree inception voltage and tree breakdown time data, Johnson SB exhibits the lowest error value, and for the tree breakdown voltage data, Wakeby does, compared to Weibull and lognormal.
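
The Anderson-Darling statistic used for the comparison in this abstract can be computed directly from its textbook definition, A² = −n − (1/n) Σ (2i − 1)[ln F(x₍ᵢ₎) + ln(1 − F(x₍ₙ₊₁₋ᵢ₎))], where a smaller value indicates a closer fit. The sketch below assumes the candidate distribution's parameters are already given (the fitting step is omitted), and the sample values, parameter values, and function names are hypothetical.

```python
import math


def anderson_darling(sample, cdf):
    """Anderson-Darling statistic A^2 of a sample against a given CDF.

    The CDF must return values strictly between 0 and 1 for every
    observation, otherwise the logarithms are undefined.
    """
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        lower = math.log(cdf(xs[i - 1]))        # ln F(x_(i))
        upper = math.log(1.0 - cdf(xs[n - i]))  # ln(1 - F(x_(n+1-i)))
        total += (2 * i - 1) * (lower + upper)
    return -n - total / n


def weibull_cdf(x, shape, scale):
    """Two-parameter Weibull CDF."""
    return 1.0 - math.exp(-((x / scale) ** shape))


# Hypothetical breakdown-time data (arbitrary units) and two candidate
# parameter sets; the candidate with the lower A^2 fits better.
times = [3.1, 4.7, 5.2, 6.0, 6.8, 7.5, 8.9, 10.4]
a2_candidate_1 = anderson_darling(times, lambda x: weibull_cdf(x, 3.0, 7.5))
a2_candidate_2 = anderson_darling(times, lambda x: weibull_cdf(x, 1.0, 2.0))
```

Ranking candidate distributions by A² in this way mirrors the comparison described in the abstract, although the paper's candidates (Johnson SB, Wakeby) would need their own CDFs and fitted parameters.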

Comparison of Nano Particle Size Distributions by Different Measurement Techniques

  • Bae, Min-Suk;Oh, Joon-Seok
    • Journal of Korean Society for Atmospheric Environment / v.26 no.2 / pp.219-233 / 2010
  • Nano-sized particles are of great interest because of their chemical and physical properties, such as composition, size distribution, and number concentration. Accurate measurement of the size distributions and number concentrations of ultrafine particles is therefore increasingly required, but expected losses, such as diffusion losses in the instrument system between the ambient inlet and the detector, pose a significant challenge. In this study, data are examined using computed settling losses, impaction losses, and diffusion losses for the sampling lines (exploring different sampling line diameters, horizontal lengths, numbers of bends, line angles, and flow rates with and without a bypass), as well as diffusion losses for the Scanning Mobility Particle Sizers. As expected, the settling and impaction losses are very minor below 100 nm; however, diffusion loss corrections for the sampling lines and the sizing instrument make a large difference under any measurement conditions with high numbers of particles of smaller mobility size. Results with and without the loss corrections, which can affect size distributions and number concentrations, are described. First, 80% or more of the smallest particles (less than 10 nm) can be lost at a flow rate of 0.3 liters per minute and a sampling line length of 1.0 m. Second, the total number concentrations of the measurements are significantly affected, and the mode structure of the size distribution changes dramatically after the loss corrections are applied. A comparison of the different measurements shows statistically that diffusion loss correction is a required step in determining ambient particle concentrations. Based on the current study, the possibility of establishing direct revelation mechanisms is suggested.

A Non-linear Variant of Improved Robust Fuzzy PCA (잡음 민감성이 향상된 주성분 분석 기법의 비선형 변형)

  • Heo, Gyeong-Yong;Seo, Jin-Seok;Lee, Im-Geun
    • Journal of the Korea Society of Computer and Information / v.16 no.4 / pp.15-22 / 2011
  • Principal component analysis (PCA) is a well-known method for dimensionality reduction and feature extraction that retains most of the variation in the data. Although PCA has been applied successfully in many areas, it is sensitive to outliers and only valid for Gaussian distributions. Several variants of PCA have been proposed to resolve the noise sensitivity and, among them, improved robust fuzzy PCA (RF-PCA2) has demonstrated promising results. RF-PCA2, however, is still a linear algorithm that cannot accommodate non-Gaussian distributions. In this paper, a non-linear algorithm that combines RF-PCA2 and kernel PCA (K-PCA), called improved robust kernel fuzzy PCA (RKF-PCA2), is introduced. The kernel method enables it to accommodate non-Gaussian distributions. RKF-PCA2 inherits noise robustness from RF-PCA2 and non-linearity from K-PCA, and it outperforms previous methods in handling non-Gaussian distributions in a noise-robust way. Experimental results support this.

Applying Conventional and Saturated Generalized Gamma Distributions in Parametric Survival Analysis of Breast Cancer

  • Yavari, Parvin;Abadi, Alireza;Amanpour, Farzaneh;Bajdik, Chris
    • Asian Pacific Journal of Cancer Prevention / v.13 no.5 / pp.1829-1831 / 2012
  • Background: The generalized gamma (GG) distribution is an extensive family that contains nearly all of the most commonly used distributions, including the exponential, Weibull and lognormal. A saturated version of the model allows covariates to have effects through all the parameters of the survival time distribution, whereas accelerated failure-time (AFT) models assume that only one parameter of the distribution depends on the covariates. Methods: We fitted both the conventional GG model and the saturated form for each of its members, including the Weibull and lognormal distributions, and compared them using likelihood ratios. To compare the selected parametric distribution with the log-logistic distribution, which is widely used in survival analysis but is not included in the generalized gamma family, we used the Akaike information criterion (AIC; r=l(b)-2p). All models were fitted using data for 369 women aged 50 years or more, diagnosed with stage IV breast cancer in BC during 1990-1999 and followed to 2010. Results: In both the conventional and saturated parametric models, the lognormal was the best candidate among the GG family members; the lognormal also fitted better than the log-logistic distribution. In the conventional GG model, the variables "surgery", "radiotherapy", "hormone therapy", "erposneg" and the interaction between "hormone therapy" and "erposneg" are significant. In the AFT model, we estimated the relative time for these variables. In the saturated GG model, similar significant variables are selected. Estimating the relative times at different percentiles of the extended model illustrates the pattern in which the relative survival time changes over time. Conclusions: The advantage of using the generalized gamma distribution is that it facilitates estimating a model with improved fit over the standard Weibull or lognormal distributions. Alternatively, the generalized F family of distributions might be considered, of which the generalized gamma distribution is a member and which also includes the commonly used log-logistic distribution.
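
The two comparison tools named in this abstract can be sketched as follows: a likelihood ratio test for a member nested in the GG family, and the penalized criterion r = l(b) − 2p as printed in the abstract for the non-nested log-logistic comparison. The log-likelihood values and function names are hypothetical, and the p-value helper assumes the reduced model fixes exactly one parameter (one degree of freedom).

```python
import math


def likelihood_ratio_statistic(loglik_reduced, loglik_full):
    """Twice the log-likelihood gain of the full model over a nested one."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(x):
    """Upper tail probability of a chi-square variable with 1 df (x >= 0)."""
    return math.erfc(math.sqrt(x / 2.0))


def penalized_loglik(loglik, n_params):
    """Criterion r = l(b) - 2p as written in the abstract; larger is better."""
    return loglik - 2 * n_params


# Hypothetical maximized log-likelihoods (not taken from the paper).
loglik_gg = -518.0          # generalized gamma, 3 parameters
loglik_lognormal = -518.4   # nested in GG, 2 parameters
loglik_loglogistic = -521.0 # not nested in GG, 2 parameters

lr = likelihood_ratio_statistic(loglik_lognormal, loglik_gg)
p_value = chi2_sf_df1(lr)   # large p-value: the simpler model is adequate

r_lognormal = penalized_loglik(loglik_lognormal, 2)
r_loglogistic = penalized_loglik(loglik_loglogistic, 2)
```

The likelihood ratio test applies only to nested models, which is why a penalized criterion is needed for the log-logistic comparison.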