• Title/Summary/Keyword: multivariate normal distribution


Multivariate empirical distribution functions and descriptive methods (다변량 경험분포함수와 시각적인 표현방법)

  • Hong, Chong Sun;Park, Jun;Park, Yong Ho
    • Journal of the Korean Data and Information Science Society / v.28 no.1 / pp.87-98 / 2017
  • The multivariate empirical distribution function (MEDF) is defined in this work. The MEDF's expectation and variance are derived, and we show that the MEDF converges to its true distribution function. Based on random samples from the bivariate standard normal distribution with various correlation coefficients, we also obtain MEDFs and propose two kinds of graphical methods to visualize MEDFs on a two-dimensional plane. One is represented with at most n stairs, with arguments similar to those of the step function, and the other is described with at most n curves that resemble bivariate quantile vectors. Even though these two descriptive methods could be expressed in three-dimensional space, the two-dimensional representation is obtained with ease and is enough to explain the characteristics of bivariate distribution functions. Hence, it is possible to visualize trivariate empirical distribution functions with three-dimensional quantile vectors. With bivariate and four-variate illustrative examples, the proposed MEDF descriptive plots are obtained and explored.
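
As an illustration of the definition mentioned in the abstract above, the following is a minimal sketch of evaluating an empirical distribution function at a point. It assumes the usual definition (the fraction of observations that are less than or equal to the point in every coordinate); the function name `medf` and the toy data are illustrative assumptions and are not taken from the paper.

```python
def medf(sample, point):
    """Fraction of observations that are <= point in every coordinate.

    Assumed definition; `sample` is a list of equal-length tuples.
    """
    if not sample:
        raise ValueError("sample must contain at least one observation")
    hits = sum(
        all(x_j <= p_j for x_j, p_j in zip(obs, point))
        for obs in sample
    )
    return hits / len(sample)


# Toy bivariate sample (hypothetical values, for illustration only).
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]

value_at_2_2 = medf(data, (2.0, 2.0))       # three of the four points qualify
value_below_all = medf(data, (-1.0, -1.0))  # no point qualifies
```

Because the function counts coordinate-wise dominance, it works unchanged for three or more variables; only the plotting step would differ by dimension.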

Other approaches to bivariate ranked set sampling

  • Al-Saleh, Mohammad Fraiwan;Alshboul, Hadeel Mohammad
    • Communications for Statistical Applications and Methods / v.25 no.3 / pp.283-296 / 2018
  • Ranked set sampling, as introduced by McIntyre (Australian Journal of Agriculture Research, 3, 385-390, 1952), dealt with the estimation of the mean of one population. To deal with two or more variables, different forms of bivariate and multivariate ranked set sampling were suggested. For a technique to be useful, it should be easy to implement in practice. Bivariate ranked set sampling, as introduced by Al-Saleh and Zheng (Australian & New Zealand Journal of Statistics, 44, 221-232, 2002), is not easy to implement in practice, because it requires the judgment ranking of each combination of the order statistics of the two characteristics. This paper investigates two modifications that make the method easier to use. The first modification is based on ranking one variable and noting the rank of the other variable for one cycle, and doing the reverse for another cycle. The second approach is based on ranking one variable and giving the second variable the same rank (concomitant order statistic) for one cycle, and doing the reverse for the other cycle. The two procedures are investigated for estimating the means of some well-known distributions. It is shown that the suggested approaches can be used in practice and can be more efficient than simple random sampling (SRS). A real data set is used to illustrate the procedure.

Outlier Impact on the Power of Significance Test for Cronbach Alpha Reliability Coefficient

  • Yonghwan Um
    • Journal of the Korea Society of Computer and Information / v.28 no.5 / pp.179-187 / 2023
  • In this paper, we studied the impact of outliers on the power of significance tests for the Cronbach alpha reliability coefficient. Four variables were varied: sample size, the number of items, the number of outliers, and the population Cronbach alpha level. We simulated data using the multivariate normal distribution and used outliers sampled from the uniform distribution. To test the significance of Cronbach alpha reliability, a parametric approach (F statistic) and a permutation method were used. We observed that the power of the permutation test is equal to or greater than that of the F test under all conditions, that both the F test and the permutation test lose power as the number of outliers increases, and that these effects of outliers on power are stronger at higher population alpha levels.
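
For readers unfamiliar with the coefficient under test, here is a minimal sketch of the standard Cronbach alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It is not the paper's simulation or its tests; the function name `cronbach_alpha` and the toy score tables are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: every item gives the same score, so alpha is 1.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha_consistent = cronbach_alpha(consistent)

# Hypothetical data: two uncorrelated items, so alpha is 0.
unrelated = [[1, 2], [2, 1], [1, 1], [2, 2]]
alpha_unrelated = cronbach_alpha(unrelated)
```

The sketch fails with a division by zero if all total scores are identical; a production version would need to handle that case explicitly.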

Statistical Assessment on the Heavy Metal Variation in the Soils around Abandoned Mine(Case Study for the Samgwang Mine) (폐광산지역 토양 중금속원소들에 대한 통계학적 환경오염 특성평가)

  • Cho, Il-Hyoung;Chun, Suk-Young;Chang, Soon-Woong
    • Journal of Environmental Science International / v.16 no.12 / pp.1451-1462 / 2007
  • Heavy metal concentrations in the soil were investigated for the abandoned Samkwang metal mine, Cheongyang-Gun, Chungnam Province, Korea. The concentrations of heavy metals (As, Cd, Cu, Ni, Pb, Zn) were determined in mine soils collected at the abandoned mine sites to obtain a general classification and specification of the pollution in this highly polluted region. Results from the normality test and basic statistics on central tendency and variation showed that the distribution of heavy metal concentrations differed significantly across all locations. In the spatial distribution relating heavy metal concentration to pH, pH ranged from 4.8 to 8.8; by type of land use, heavy metal concentration was highest in forest land, and Ni and Zn also showed high concentrations in farm and rice fields. The distribution of heavy metal concentration by soil depth showed that metal concentrations in subsoil were higher than those in surface soil, while the concentrations of Cu and Ni showed no significant difference with depth. Results from the correlation analysis, using the data with extreme and unusual values excluded, reveal that Zn-Cd (r=0.867), Zn-As (r=0.797), Zn-Pb (r=0.764), Cu-Cd (r=0.673), Cu-As (r=0.614) and Zn-Ni (r=0.605) were the most important parameters in assessing variations of heavy metals in soil. To discriminate pattern differences and similarities among samples, principal factor analysis (PFA) and cluster analysis (CF) were performed using a correlation matrix. This study suggests that PFA and CF techniques are useful tools for identifying important heavy metals and parameters. This study presents the necessity and usefulness of multivariate statistical assessment of complex databases in order to obtain better information about soil quality, and it provides basic information for cleaning up the abandoned mine sites.

Knowledge Extraction from Affective Data using Rough Sets Model and Comparison between Rough Sets Theory and Statistical Method (러프집합이론을 중심으로 한 감성 지식 추출 및 통계분석과의 비교 연구)

  • Hong, Seung-Woo;Park, Jae-Kyu;Park, Sung-Joon;Jung, Eui-S.
    • Journal of the Ergonomics Society of Korea / v.29 no.4 / pp.631-637 / 2010
  • The aim of affective engineering is to develop a new product by translating customer affections into design factors. Affective data have so far been analyzed using multivariate statistical analysis, but affective data do not always have the linear features assumed under a normal distribution. The rough sets model is an effective method for knowledge discovery under uncertainty, imprecision and fuzziness, and it can deal with any type of data regardless of linearity. Therefore, this study uses the rough sets model to extract affective knowledge from affective data. Four types of scent alternatives and four types of sounds were designed, and an experiment was performed to examine affective differences in subjects' preferences for air conditioners. The purpose of this study is also to extract knowledge from affective data using the rough sets model and to identify the relationships between the rough sets based affective engineering method and the statistical one. The result of a case study shows that the proposed approach can effectively extract affective knowledge from affective data and is able to discover the relationships between customer affections and design factors. This study also shows similar results between the rough sets model and the statistical method, but it could be made more valuable by comparing fuzzy theory, neural networks and multivariate statistical methods.

Small Sample Characteristics of Generalized Estimating Equations for Categorical Repeated Measurements (범주형 반복측정자료를 위한 일반화 추정방정식의 소표본 특성)

  • 김동욱;김재직
    • The Korean Journal of Applied Statistics / v.15 no.2 / pp.297-310 / 2002
  • Liang and Zeger proposed generalized estimating equations (GEE) for analyzing repeated data that are discrete or continuous. The GEE model can be extended to a model for repeated categorical data, and its estimator has an asymptotic multivariate normal distribution in large samples. However, GEE is based on large-sample asymptotic theory. In this paper, we study the properties of GEE estimators for repeated ordinal data in small samples. We generate ordinal repeated measurements for two groups using two methods. Through Monte Carlo simulation studies we investigate the empirical type 1 error rates, powers, and relative efficiencies of the GEE estimators, the effect of unequal sample sizes of the two groups, and the performance of variance estimators for polytomous ordinal response variables, especially in small samples.

Empirical Evidence of Dynamic Conditional Correlation Between Asian Stock Markets and US Stock Indexes During COVID-19 Pandemic

  • TANTIPAIBOONWONG, Asidakarn;HONGSAKULVASU, Napon;SAIJAI, Worrawat
    • The Journal of Asian Finance, Economics and Business / v.8 no.9 / pp.143-154 / 2021
  • This study aims to explore the dynamic conditional correlation (DCC) between ten Asian stock indexes, the US stock index, and Bitcoin by using the dynamic conditional correlation model. The daily data span January 2015 to May 2021, for a total of 1,116 observations. DCC(1,1)-EGARCH(1,1) with multivariate t and normal distributions for the DCC and EGARCH models, respectively, outperforms other models by goodness-of-fit values. Except for Bitcoin, we found that the majority of the securities' volatilities have very high volatility persistence. Furthermore, negative shocks/news have more impact on the volatilities than positive shocks/news in most cases, except for the stock index of China and Bitcoin. Most of the correlation pairs exhibit higher correlation during the COVID-19 pandemic than before it, except Hong Kong-The US and Malaysia-Indonesia. Moreover, the correlation between Asian stock indexes during the COVID-19 pandemic is statistically higher than before the pandemic. However, there are a few instances where the Hong Kong stock index and a few countries are identical. The result of correlation size shows the connectedness between Asian stock markets, which are well-connected within the region, especially with South Korea, Singapore, and Hong Kong.

A Study on the Training Optimization Using Genetic Algorithm -In case of Statistical Classification considering Normal Distribution- (유전자 알고리즘을 이용한 트레이닝 최적화 기법 연구 - 정규분포를 고려한 통계적 영상분류의 경우 -)

  • 어양담;조봉환;이용웅;김용일
    • Korean Journal of Remote Sensing / v.15 no.3 / pp.195-208 / 1999
  • In the classification of satellite images, the representativeness of the training data for each class is a very important factor affecting classification accuracy. Hence, in order to improve classification accuracy, it is necessary to optimize the pre-classification stage, which determines classification parameters, rather than to develop classifiers alone. In this study, the normality of the training data is calculated at the pre-classification stage using SPOT XS and LANDSAT TM. A correlation coefficient of the multivariate Q-Q plot at the 5% significance level and the variance of the initial training data are used as the objective function of the genetic algorithm in the training normalization process. As a result of normalizing the training data using the genetic algorithm, it was shown that, for the study area, the mean and variance of each class shifted toward those of the population, and the result showed the possibility of predicting the distribution of each class.

Analysis of extreme wind speed and precipitation using copula (코플라함수를 이용한 극단치 강풍과 강수 분석)

  • Kwon, Taeyong;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society / v.28 no.4 / pp.797-810 / 2017
  • The Korean peninsula is exposed to typhoons every year. Typhoons cause huge socioeconomic damage because tropical cyclones tend to occur with strong winds and heavy precipitation. To capture the complex dependence structure between strong winds and heavy precipitation, a copula links a set of univariate distributions to a multivariate distribution; copulas have been actively studied in the field of hydrology. In this study, we analyzed wind speed and precipitation data collected from the weather stations in Busan and Jeju. Log-normal, Gamma, and Weibull distributions were considered for the marginal distributions of the copula. Kolmogorov-Smirnov, Cramer-von-Mises, and Anderson-Darling test statistics were employed to test the goodness-of-fit of the marginal distributions. Pseudo-observations were calculated through the inverse transformation method to establish the copula. Elliptical, Archimedean, and extreme-value copulas were considered to explain the dependence structure between strong winds and heavy precipitation. In selecting the best copula, we employed the Cramer-von-Mises test and cross-validation. In Busan, precipitation given average wind speed followed the t copula, and precipitation given maximum wind speed followed the Clayton copula. In Jeju, precipitation given maximum wind speed followed the Normal copula, average wind speed given precipitation followed the Frank copula, and maximum wind speed given precipitation followed the Husler-Reiss copula.
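
The step from fitted marginals to pseudo-observations can be sketched as follows. This is a minimal illustration that assumes the probability integral transform, u = F(x), applied with a Weibull marginal (one of the three candidate families named in the abstract); whether the paper proceeds exactly this way is an assumption. The function names, the shape and scale parameters, and the wind speed values are all hypothetical.

```python
import math


def weibull_cdf(x, shape, scale):
    """CDF of the two-parameter Weibull distribution; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def to_pseudo_observations(values, shape, scale):
    """Map raw observations to the unit interval via the assumed marginal CDF."""
    return [weibull_cdf(v, shape, scale) for v in values]


# Hypothetical wind speeds and hypothetical fitted Weibull parameters.
wind_speed = [2.0, 5.0, 9.0]
u = to_pseudo_observations(wind_speed, shape=2.0, scale=5.0)
```

Applying the same transform to each variable with its own fitted marginal yields pairs on the unit square, to which a copula family can then be fitted.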

The Risk Factors Related to Constipation in High School Students (고등학생 변비의 위험요인에 관한 연구)

  • Yoon, Yoon-Soo;Lee, Sok-Goo;Kim, Jeong-Yeon
    • Journal of agricultural medicine and community health / v.30 no.1 / pp.15-28 / 2005
  • Objectives: This cross-sectional study aimed to investigate the status of bowel health behaviors, the prevalence of constipation, and the risk factors related to self-reported constipation in high school students. Methods: The study subjects were 1,882 students from six high schools located in a metropolitan city, selected by accidental sampling from June to August 2002. We analyzed the data by frequency analysis, chi-square test, and multivariate logistic regression using SPSS ver. 10.0. Results: The results of this study are summarized as follows: 1. The prevalence of self-reported constipation was 25.2%; the rate was 13.4% in male students and 36.5% in female students. 2. With regard to therapeutic behavior, 52.1% of students with a change in bowel habit had not found a particular counsellor, and 38.9% had sought counselling from their parents. In the constipation group, 16.3% of students had taken laxatives to treat constipation. Anal pain during defecation was reported by 73.5% of students in the constipation group but 48.0% in the normal group, and rectal bleeding after defecation by 41.6% of the constipation group but 23.7% of the normal group. The distribution of constipation-related symptoms thus differed significantly between the two groups. 3. In the multivariate analysis of self-reported constipation, the risk factors related to constipation were sex (female), experience of dieting for weight reduction, skipping breakfast, and intake of vegetables more than 3 times per week. Conclusions: This study reconfirmed that eating habits should be improved to prevent and treat constipation. Interventions targeting female students, including abstaining from weight-reduction diets, regular meals, greater vegetable intake, and stress management, should be provided to prevent constipation, especially in Korean high school students. Further prospectively designed studies are needed to establish the causal relationships between these risk factors and constipation.
