• Title/Abstract/Keywords: Statistics data

Search results: 13,789 items; processing time: 0.032 seconds

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 28, No. 1
    • /
    • pp.89-97
    • /
    • 2021
  • Because of their boundedness and sum constraint, compositional data are often transformed by a logratio transformation, and the transformed data are then fed into traditional binary classification or discriminant analysis. However, applying traditional multivariate approaches directly to the transformed data may be problematic, because the class distributions are not Gaussian and the Bayes decision boundaries are not polynomial in the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification or discriminant analysis.
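
The logratio step mentioned in the abstract can be sketched as follows. The abstract does not say which logratio variant the authors use, so the centered logratio (CLR) is an assumption here, and the function name `clr` is purely illustrative. CLR maps each part to the log of its ratio to the geometric mean of all parts.

```python
import math

def clr(composition):
    """Centered logratio transform of a strictly positive composition (illustrative)."""
    if any(x <= 0 for x in composition):
        raise ValueError("all parts must be strictly positive")
    total = sum(composition)
    # Close the composition so the parts sum to 1; CLR is scale-invariant anyway.
    parts = [x / total for x in composition]
    log_parts = [math.log(p) for p in parts]
    mean_log = sum(log_parts) / len(log_parts)  # log of the geometric mean
    return [lp - mean_log for lp in log_parts]

z = clr([0.2, 0.3, 0.5])
```

The transformed coordinates sum to zero, so they lie in a subspace of dimension one less than the number of parts; a classifier is then trained on these coordinates instead of on the raw proportions.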

Application of EDA Techniques for Estimating Rainfall Quantiles

  • 박현근;오세정;유철상
    • 대한토목학회논문집
    • /
    • Vol. 29, No. 4B
    • /
    • pp.319-328
    • /
    • 2009
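  • In this study, we apply EDA techniques, statistical methods that exploit the structure of the data, to quantify the data, and we use the results to carry out a frequency analysis. Whereas the traditional approach based on the method of moments yields statistics that are sensitive to extreme values, EDA techniques have the advantage of yielding stable statistics with little variation. When EDA techniques are applied to frequency analysis, the original data must be transformed to normality and then back-transformed in order to reflect the skewness of the data. That is, the original data are transformed to normality, EDA techniques are applied to estimate the statistics of the transformed data, and these are back-transformed to determine the statistics of the original data. The statistics determined in this way are then used in a frequency analysis with a given probability density function. We analyzed annual maximum 1-hour rainfall data from the Seoul and Pohang stations. The results confirm that applying EDA techniques makes it possible to estimate stable rainfall quantiles that are less sensitive to extreme values. This methodology may be especially useful for frequency analysis at stations where the interannual variability of precipitation itself is very large, for example because of climate change.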

Mathematical Foundations and Educational Methodology of Data Mining

  • 이승우
    • 한국수학사학회지
    • /
    • Vol. 18, No. 2
    • /
    • pp.95-106
    • /
    • 2005
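  • This paper introduces the concepts and methodology of data selection, cleaning, integration, transformation, and reduction, the selection and application of data mining techniques, and model evaluation, following the mathematics-based procedure of knowledge discovery in databases, and it studies the role and application of statistics as a branch of mathematics. It also surveys and discusses the history and mathematical background of data mining, which is attracting attention today, the main data mining modeling techniques that use statistics and information technology, practical application areas and case studies, and the differences between data mining and statistics.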


Estimations of Parameters in Multi-component Series Systems Using Masked Data

  • Sarhan Ammar M.;Abouammoh A.M.;Al-Ameri Mansour
    • International Journal of Reliability and Applications
    • /
    • Vol. 7, No. 1
    • /
    • pp.41-53
    • /
    • 2006
  • In masked system lifetime data, the exact cause of the system's failure is often unknown. Such data contain two observable quantities, namely (i) the system's time to failure and (ii) a set of system components that contains the component that may have caused the system to fail. Our objective in this paper is to use the maximum likelihood procedure in the presence of masked data to make inferences about the reliability of the system's components. We assume a multi-component series system in which each component has a constant failure rate. Different cases that permit closed-form solutions for the point estimates are considered. The results obtained in this paper generalize other published results.
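
As a rough illustration of the setting, the sketch below covers only the unmasked special case, in which the failed component is known for every system. It is not the masked-data estimator of the paper. With constant failure rates, the maximum likelihood estimate of each rate is the number of failures attributed to that component divided by the total observed time, and the series-system reliability is the exponential of minus the summed rates times t. The function names and the toy data are assumptions.

```python
import math

def fit_unmasked_series(failures):
    """failures: list of (time_to_failure, index_of_failed_component).

    Returns the exponential-rate MLE per component for the unmasked case.
    """
    total_time = sum(t for t, _ in failures)
    counts = {}
    for _, cause in failures:
        counts[cause] = counts.get(cause, 0) + 1
    return {cause: n / total_time for cause, n in counts.items()}

def system_reliability(rates, t):
    """Reliability of a series system whose components have constant rates."""
    return math.exp(-sum(rates.values()) * t)

# Hypothetical data: four systems, two components.
data = [(2.0, 1), (1.0, 2), (3.0, 1), (4.0, 2)]
rates = fit_unmasked_series(data)
```

With masking, some observations identify only a set of candidate components, so the likelihood contains sums of rates over those sets and closed forms exist only in particular cases.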


Bayesian estimation in the generalized half logistic distribution under progressively type-II censoring

  • Kim, Yong-Ku;Kang, Suk-Bok;Se, Jung-In
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 22, No. 5
    • /
    • pp.977-989
    • /
    • 2011
  • The half logistic distribution has been used extensively in reliability and survival analysis, especially when the data are censored. In this paper, we provide Bayesian estimation of the shape parameter and reliability function of the generalized half logistic distribution based on progressively Type-II censored data under various loss functions. We consider a conjugate prior and a noninformative prior and obtain the corresponding posterior distributions. As an illustration, we examine the validity of our estimation using real and simulated data.

Efficient Training Data Construction Scheme for Prediction of Transferring Students

  • Lee, Ji-Young;Song, Gyu-Moon;Kim, Tae-Yoon
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 14, No. 3
    • /
    • pp.481-488
    • /
    • 2003
  • Kim et al. (2003) studied a prediction model for students likely to transfer. In their study they claim that the training data construction scheme that trains a neural network on data from the year immediately before the prediction year is better than other schemes. One problem with their claim is that it is based on a rather high prediction error rate. In this paper we establish a sounder comparison of various training data construction schemes and check the validity of their claim. It turns out that the favored scheme has sufficient advantages over other schemes.


Comprehensive comparison of normality tests: Empirical study using many different types of data

  • Lee, Chanmi;Park, Suhwi;Jeong, Jaesik
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 27, No. 5
    • /
    • pp.1399-1412
    • /
    • 2016
  • We compare many normality tests that draw on different sources of information extracted from the given data: the Anderson-Darling test, Kolmogorov-Smirnov test, Cramér-von Mises test, Shapiro-Wilk test, Shapiro-Francia test, Lilliefors test, Jarque-Bera test, D'Agostino's D, Doornik-Hansen test, Energy test, and Martinez-Iglewicz test. For the purpose of comparison, these tests are applied to various types of data generated from skewed distributions, asymmetric distributions, and distributions with different lengths of support. We then summarize the comparison results in terms of two criteria: type I error control and power. The best test depends on the shape of the distribution of the data, implying that no single test is the most powerful for all distributions.
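
To make one of the listed tests concrete, here is a minimal sketch of the Jarque-Bera statistic, which combines sample skewness and kurtosis. It uses the asymptotic chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x/2), and this approximation can be poor for small samples. The function name and the toy sample are assumptions, and the code is not the implementation used in the paper.

```python
import math

def jarque_bera(sample):
    """Return (statistic, asymptotic p-value) of the Jarque-Bera normality test."""
    n = len(sample)
    mean = sum(sample) / n
    # Biased central moments, as in the usual definition of the statistic.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    if m2 == 0:
        raise ValueError("sample has zero variance")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Chi-square with 2 degrees of freedom: P(X > x) = exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value

stat, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Because the statistic looks only at the third and fourth moments, it responds to different departures from normality than distance-based tests such as Kolmogorov-Smirnov, which is one reason no single test dominates.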

Variance Estimation for Imputed Survey Data using Balanced Repeated Replication Method

  • Lee, Jun-Suk;Hong, Tae-Kyong;Namkung, Pyong
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 12, No. 2
    • /
    • pp.365-379
    • /
    • 2005
  • Balanced Repeated Replication (BRR) is widely used to estimate the variance of linear or nonlinear estimators from complex sample surveys. Most survey data sets include imputed missing values and treat the imputed values as observed data. However, applying the standard BRR variance estimation formula to imputed data does not produce valid variance estimators. Shao, Chen and Chen (1998) proposed an adjusted BRR method that adjusts the imputed data to produce more accurate variance estimators. In this paper, another adjusted BRR method is proposed, with examples using real data.
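
For orientation, the standard (unadjusted) BRR estimator can be sketched as below for a design with two primary sampling units (PSUs) per stratum. This is the textbook procedure, not the adjusted method for imputed data proposed in the paper. The half-samples are chosen from a Sylvester-type Hadamard matrix, and the function names, data layout, and toy data are assumptions.

```python
def hadamard(order):
    """Sylvester Hadamard matrix; order must be a power of 2."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def brr_variance(strata, estimator):
    """strata: list of ((y1, w1), (y2, w2)), two PSUs per stratum.

    estimator: function mapping a list of (y, w) pairs to a number.
    """
    n_strata = len(strata)
    order = 1
    while order <= n_strata:  # need order > n_strata because column 0 is skipped
        order *= 2
    h = hadamard(order)
    full = estimator([psu for pair in strata for psu in pair])
    squared = 0.0
    for row in h:
        half = []
        for s, (a, b) in enumerate(strata):
            y, w = a if row[s + 1] == 1 else b  # pick one PSU per stratum
            half.append((y, 2.0 * w))           # double its weight
        squared += (estimator(half) - full) ** 2
    return squared / order

def weighted_total(units):
    return sum(w * y for y, w in units)

strata = [((1.0, 1.0), (3.0, 1.0)), ((2.0, 1.0), (5.0, 1.0))]
v = brr_variance(strata, weighted_total)
```

For a linear estimator such as the weighted total, the orthogonality of the Hadamard columns makes the cross-stratum terms cancel, so the result equals the sum over strata of the squared differences between the two weighted PSU values.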

A Study on Goodness of Fit Test in Accelerated Life Tests

  • 이우동;조건호
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 7, No. 1
    • /
    • pp.37-46
    • /
    • 1996
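  • Based on data obtained from step-stress accelerated life tests, we propose goodness-of-fit test procedures for the lifetime distribution assumed for statistical inference, using nonparametric test statistics such as the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling statistics, and we compare these statistics in terms of power.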
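
As a minimal sketch of one of the statistics named above, the code below computes the one-sample Kolmogorov-Smirnov distance between an empirical distribution and a fully specified lifetime distribution. It covers only the plain case of a single stress level, not the step-stress procedure proposed in the paper. The exponential model, the rate value, and the toy lifetimes are assumptions.

```python
import math

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of the sample and the given CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def exponential_cdf(rate):
    """CDF of an exponential lifetime with the given constant failure rate."""
    return lambda t: 1.0 - math.exp(-rate * t) if t > 0 else 0.0

# Hypothetical lifetimes tested against an exponential model with rate 0.5.
d_exp = ks_statistic([0.4, 1.1, 1.9, 3.2, 5.0], exponential_cdf(0.5))
```

The Cramér-von Mises and Anderson-Darling statistics replace the maximum gap with weighted integrals of the squared gap, which changes the alternatives each is most sensitive to.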


Credibility estimation via kernel mixed effects model

  • Shim, Joo-Yong;Kim, Tae-Yoon;Lee, Sang-Yeol;Hwa, Chang-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 20, No. 2
    • /
    • pp.445-452
    • /
    • 2009
  • Credibility models are actuarial tools for distributing premiums fairly among a heterogeneous group of policyholders. Many existing credibility models can be expressed as special cases of linear mixed effects models. In this paper we propose a nonlinear credibility regression model obtained by reformulating the linear mixed effects model through a kernel machine. The proposed model can be seen as a prediction method applicable in any setting where repeated measures are made for subjects with different risk levels. Experimental results are then presented that indicate the performance of the proposed estimating procedure.
