• Title/Summary/Keyword: Statistical data


Exploratory Methods for Joint Distribution Valued Data and Their Application

  • Igarashi, Kazuto;Minami, Hiroyuki;Mizuta, Masahiro
    • Communications for Statistical Applications and Methods / v.22 no.3 / pp.265-276 / 2015
  • In this paper, we propose hierarchical cluster analysis and multidimensional scaling for joint distribution valued data. Information technology is increasing the need for statistical methods for large and complex data, and Symbolic Data Analysis (SDA) is an attractive framework for such data. In SDA, target objects are typically represented by aggregated data, and most SDA methods deal with objects represented as intervals or histograms. Those representations, however, cannot capture information among variables, such as correlation, whereas objects represented as joint distributions can. We therefore focus on methods for joint distribution valued data, extending the two well-known exploratory methods with dissimilarities based on the Hall-type relative projection index between joint distribution valued data. We illustrate the proposed methods with a simulation study and a real-data example.
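
The pipeline the abstract describes (a dissimilarity matrix among distribution-valued objects, fed into hierarchical clustering and classical MDS) can be sketched as follows. This is a minimal illustration, not the paper's method: the Hall-type relative projection index is replaced by an energy-distance stand-in, and the six "objects" are simulated bivariate samples whose correlation differs between two groups.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

def energy_distance(x, y):
    # stand-in dissimilarity between two multivariate samples
    d = lambda a, b: np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).mean()
    return 2 * d(x, y) - d(x, x) - d(y, y)

# six objects, each a 2-D "joint distribution" represented by a sample;
# the sign of the correlation separates the two groups
objs = [rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=200)
        for r in (0.8, 0.85, 0.9, -0.8, -0.85, -0.9)]

n = len(objs)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = energy_distance(objs[i], objs[j])

# hierarchical clustering directly on the dissimilarities
labels = fcluster(linkage(squareform(D), method="average"),
                  t=2, criterion="maxclust")

# classical MDS via double centering and eigendecomposition
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
w, V = np.linalg.eigh(B)
coords = V[:, -2:] * np.sqrt(np.clip(w[-2:], 0, None))  # 2-D configuration
```

Because the dissimilarity compares whole joint distributions, the clustering separates the positively and negatively correlated groups even though every object has the same marginal means.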

Knowledge Extraction from Affective Data using Rough Sets Model and Comparison between Rough Sets Theory and Statistical Method (러프집합이론을 중심으로 한 감성 지식 추출 및 통계분석과의 비교 연구)

  • Hong, Seung-Woo;Park, Jae-Kyu;Park, Sung-Joon;Jung, Eui-S.
    • Journal of the Ergonomics Society of Korea / v.29 no.4 / pp.631-637 / 2010
  • The aim of affective engineering is to develop a new product by translating customer affections into design factors. Affective data have so far been analyzed using multivariate statistical analyses, but such data do not always have the linear features assumed under a normal distribution. The rough sets model is an effective method for knowledge discovery under uncertainty, imprecision, and fuzziness, and it can deal with any type of data regardless of linearity. Therefore, this study utilizes the rough sets model to extract affective knowledge from affective data. Four types of scent alternatives and four types of sounds were designed, and an experiment was performed to examine affective differences in subjects' preferences for an air conditioner. A further purpose of this study is to figure out the relationships between the rough-sets-based affective engineering method and the statistical one. The result of a case study shows that the proposed approach can effectively extract affective knowledge from affective data and is able to discover the relationships between customer affections and design factors. The rough sets model and the statistical method produced similar results; the comparison could be made more valuable by also considering fuzzy theory, neural networks, and multivariate statistical methods.
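
The core rough-set operations behind such knowledge extraction, lower and upper approximations of a decision class under an indiscernibility relation, can be sketched in a few lines. The attribute table below (scent, sound) and the preferred set are invented for illustration and loosely echo the experiment's design factors.

```python
# toy information table: object id -> (scent, sound); values are invented
table = {
    1: ("citrus", "soft"), 2: ("citrus", "soft"),
    3: ("floral", "soft"), 4: ("floral", "soft"),
    5: ("citrus", "loud"),
}
preferred = {1, 2, 3}   # decision class from (hypothetical) subject ratings

# indiscernibility classes: objects identical on all condition attributes
classes = {}
for obj, attrs in table.items():
    classes.setdefault(attrs, set()).add(obj)

# lower approximation: objects certainly preferred given the attributes;
# boundary: objects the attributes cannot classify either way
lower = set().union(*(c for c in classes.values() if c <= preferred))
upper = set().union(*(c for c in classes.values() if c & preferred))
boundary = upper - lower
```

Here objects 3 and 4 share identical attributes but different decisions, so they fall in the boundary region: exactly the kind of nonlinearity and imprecision the abstract argues a normal-theory analysis would gloss over.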

A GEE approach for the semiparametric accelerated lifetime model with multivariate interval-censored data

  • Maru Kim;Sangbum Choi
    • Communications for Statistical Applications and Methods / v.30 no.4 / pp.389-402 / 2023
  • Multivariate or clustered failure time data often occur in medical, epidemiological, and socio-economic studies when survival data are collected from several research centers. If the data are periodically observed, as in a longitudinal study, survival times are often subject to various types of interval-censoring, creating multivariate interval-censored data; the event times of interest may then be correlated among individuals from the same cluster. In this article, we propose a unified linear regression method for analyzing multivariate interval-censored data. We consider a semiparametric multivariate accelerated failure time model as the statistical analysis tool and develop a generalized Buckley-James method that makes inference by imputing interval-censored observations with their conditional mean values. Since the study population consists of several heterogeneous clusters, where subjects in the same cluster may be related, we propose a generalized estimating equations approach to accommodate potential dependence within clusters. Our simulation results confirm that the proposed estimator is robust to misspecification of the working covariance matrix and that statistical efficiency increases when the working covariance structure is close to the truth. The proposed method is applied to a dataset from a diabetic retinopathy study.
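
The Buckley-James-style step of replacing an interval-censored observation with its conditional mean can be illustrated under an assumed exponential working model; the rate and the censoring interval below are arbitrary, not taken from the paper.

```python
import numpy as np
from scipy import integrate

lam = 1.0  # assumed Exp(lam) working model for the event time T

def cond_mean(L, R):
    """E[T | L < T <= R]: the imputed value for an observation known
    only to lie in the censoring interval (L, R]."""
    num, _ = integrate.quad(lambda t: t * lam * np.exp(-lam * t), L, R)
    den = np.exp(-lam * L) - np.exp(-lam * R)   # P(L < T <= R)
    return num / den

imputed = cond_mean(0.5, 2.0)   # lies strictly inside the interval
```

In the actual method this expectation is taken under the estimated error distribution and the imputation is iterated with the regression fit; the one-shot version above only shows the conditional-mean idea.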

Development of a R function for visualizing statistical information on Google static maps (구글 지도에 통계정보를 표현하기 위한 R 함수 개발)

  • Han, Kyung-Soo;Park, Se-Jin;Ahn, Jeong-Yong
    • Journal of the Korean Data and Information Science Society / v.23 no.5 / pp.971-981 / 2012
  • Google Maps has become one of the most recognized and convenient means for presenting statistical information about geographically referenced data. In this article, we introduce R functions for embedding Google static map images in the R interface and develop a function for drawing statistical graphs, such as bar graphs, pie charts, and rectangle graphs, on those map images.
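
Although the paper's function is written in R, the two key ingredients, building a Static Maps request URL and projecting coordinates onto the returned image so graphs land at the right pixels, are language-agnostic. A Python sketch (the API key is a placeholder, and only the documented `center`, `zoom`, `size`, and `key` parameters are used):

```python
import math
from urllib.parse import urlencode

TILE = 256  # Google tile size in pixels

def lonlat_to_pixel(lon, lat, zoom):
    """Web Mercator projection to global pixel coordinates at a zoom level,
    needed to place statistical glyphs (bars, pies) on the fetched image."""
    scale = TILE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    siny = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * scale
    return x, y

def static_map_url(lat, lon, zoom=11, size=(640, 640), key="YOUR_API_KEY"):
    # request URL for the Google Static Maps API; the key is a placeholder
    q = urlencode({"center": f"{lat},{lon}", "zoom": zoom,
                   "size": f"{size[0]}x{size[1]}", "key": key})
    return "https://maps.googleapis.com/maps/api/staticmap?" + q
```

Subtracting the projected pixel position of the map center (and half the image size) from a data point's projected position gives its offset on the downloaded image, which is where a bar or pie glyph would be drawn.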

Statistical Analysis of Bivariate Recurrent Event Data with Incomplete Observation Gaps

  • Kim, Yang-Jin
    • Communications for Statistical Applications and Methods / v.20 no.4 / pp.283-290 / 2013
  • Subjects can experience two types of recurrent events in a longitudinal study. In addition, there may exist intermittent dropouts that result in repeated observation gaps during which no recurrent events are observed; these periods are therefore regarded as non-risk status. In this paper, we consider a special case where information on the observation gap is incomplete: the termination time of the observation gap is not available while the starting time is known. For statistical inference, the incomplete termination time is treated as interval-censored data and estimated with two approaches. A shared frailty effect is also employed to model the association between the two recurrent events. An EM algorithm is applied to recover the unknown termination times as well as the frailty effect. We apply the suggested method to young drivers' conviction data with several license suspensions.

Three-Parameter Gamma Distribution and Its Significance in Structural Reliability

  • Zhao, Yan-Gang;Alfredo H-S. Ang
    • Computational Structural Engineering : An International Journal / v.2 no.1 / pp.1-10 / 2002
  • Information on the distribution of the basic random variables is essential for the accurate evaluation of structural reliability. The usual method for determining the distribution is to fit a candidate distribution to the histogram of available statistical data on the variable and perform appropriate goodness-of-fit tests. Generally, such candidate distributions have two parameters that may be evaluated from the mean value and standard deviation of the statistical data. In the present paper, a three-parameter Gamma distribution, whose parameters can be defined directly in terms of the mean value, standard deviation, and skewness of the available data, is suggested. The flexibility and advantages of this distribution in fitting statistical data, and its significance in structural reliability evaluation, are identified and discussed. Numerical examples are presented to demonstrate these advantages.
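
The moment matching the abstract describes follows directly from the shifted gamma's moments: skewness 2/√k fixes the shape k, the standard deviation fixes the scale, and the mean fixes the location. A sketch assuming positive skewness (negative skewness would require a reflected gamma), with illustrative numbers:

```python
import numpy as np
from scipy import stats

def fit_gamma3(mean, sd, skew):
    """Moment-based three-parameter (shifted) gamma fit, positive skew assumed."""
    k = (2.0 / skew) ** 2          # skewness of a gamma is 2 / sqrt(k)
    theta = sd / np.sqrt(k)        # variance is k * theta**2
    loc = mean - k * theta         # mean is loc + k * theta
    return k, loc, theta

# round-trip check against scipy's shifted gamma
k, loc, theta = fit_gamma3(mean=10.0, sd=2.0, skew=0.8)
m, v, s = stats.gamma.stats(k, loc=loc, scale=theta, moments="mvs")
```

Unlike a two-parameter fit, this reproduces all three sample moments exactly, which is the flexibility the paper exploits for reliability evaluation.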


Radioactive waste sampling for characterisation - A Bayesian upgrade

  • Pyke, Caroline K.;Hiller, Peter J.;Koma, Yoshikazu;Ohki, Keiichi
    • Nuclear Engineering and Technology / v.54 no.1 / pp.414-422 / 2022
  • Presented in this paper is a methodology for combining a Bayesian statistical approach with Data Quality Objectives (a structured decision-making method) to provide increased confidence in analytical data when approaching a waste boundary. Sampling and analysis plans for the characterisation of radioactive waste often use a simple, one-pass statistical approach to underpin the sampling schedule. A Bayesian statistical approach introduces prior information, giving an adaptive sampling strategy based on previous knowledge. This aligns more closely with the iterative approach demanded by Data Quality Objectives, the most commonly used structured decision-making tool in this area, and offers the potential for a more fully underpinned justification than the traditional statistical approach. The approach described was developed in a UK regulatory context but is applied here to a waste stream from the Fukushima Daiichi Nuclear Power Station to demonstrate how the methodology can support decision making regarding the ultimate disposal option for radioactive waste in a more global context.
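
The "Bayesian upgrade" idea, carrying prior campaign knowledge into the next sampling round, can be illustrated with a conjugate Beta-Binomial update on the fraction of items exceeding an activity limit. All counts and the 10% decision threshold below are invented for illustration, not taken from the paper.

```python
from scipy import stats

# prior from earlier campaigns: roughly 10% of drums exceed the limit
a0, b0 = 2.0, 18.0
# new campaign assay results: 1 of 30 samples over the limit
exceed, n = 1, 30

# conjugate Beta-Binomial update
a1, b1 = a0 + exceed, b0 + n - exceed
post_mean = a1 / (a1 + b1)

# probability, given all evidence, that the exceedance fraction is below 10%
p_below = stats.beta.cdf(0.10, a1, b1)
```

Because the posterior from one campaign becomes the prior for the next, confidence accumulates across iterations instead of being recomputed from scratch, which is what aligns the method with the iterative Data Quality Objectives process.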

Review on statistical methods for protecting privacy and measuring risk of disclosure when releasing information for public use (정보공개 환경에서 개인정보 보호와 노출 위험의 측정에 대한 통계적 방법)

  • Lee, Yonghee
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.1029-1041 / 2013
  • Recently, along with the emergence of big data, there are increasing demands for releasing information and microdata for public use, so protecting privacy and measuring the risk of disclosure for released databases have become important issues in the government and business sectors as well as the academic community. This paper reviews statistical methods for protecting privacy and measuring the risk of disclosure when microdata or a data analysis server is released for public use.
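
One of the simplest disclosure-risk measures in this literature counts the sizes of equivalence classes over quasi-identifiers (k-anonymity): a record that is unique on its quasi-identifiers is the easiest to re-identify. A minimal sketch with fabricated records of the form (sex, birth year, postal code):

```python
from collections import Counter

# fabricated microdata; each tuple is a quasi-identifier combination
records = [
    ("F", "1985", "54321"), ("F", "1985", "54321"),
    ("M", "1990", "54322"), ("M", "1990", "54321"),
    ("F", "1992", "54322"),
]

# equivalence classes: records identical on all quasi-identifiers
sizes = Counter(records)
k = min(sizes.values())                           # k-anonymity of the release
unique = [r for r, c in sizes.items() if c == 1]  # highest-risk records
```

A release with k = 1 contains population-unique records; protection methods such as recoding, suppression, or perturbation aim to raise k (or a related risk measure) before release.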

Bayesian Inference for Predicting the Default Rate Using the Power Prior

  • Kim, Seong-W.;Son, Young-Sook;Choi, Sang-A
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.685-699 / 2006
  • Commercial banks and other related institutions have developed internal models to better quantify their financial risks. Since an appropriate credit risk model plays a very important role in risk management at financial institutions, a more accurate model that forecasts credit losses is needed, together with statistical inference on that model. In this paper, we propose a new method for estimating a default rate: a Bayesian approach using the power prior, which allows historical data to be incorporated into the estimation. Inference on current data can be more reliable if similar data from previous studies exist; Ibrahim and Chen (2000) utilize such data to characterize the power prior and thereby estimate the parameters in the models. We demonstrate our methodology with a real data set regarding SOHO data and also perform a simulation study.
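
For a binomial default count with a Beta initial prior, the power prior stays conjugate: the historical likelihood is raised to a power in [0, 1] before being combined with the current data, so historical observations count at a discount. A sketch with illustrative counts (all numbers invented; `delta` plays the role of the power parameter):

```python
# power-prior Beta-Binomial sketch for a default rate
a0, b0 = 1.0, 1.0            # vague initial prior
x_hist, n_hist = 30, 1000    # historical defaults / loans
x_cur, n_cur = 12, 400       # current defaults / loans
delta = 0.5                  # power parameter: weight on historical data

# raising the historical binomial likelihood to delta keeps conjugacy:
# each historical observation counts as delta of a current one
a_post = a0 + delta * x_hist + x_cur
b_post = b0 + delta * (n_hist - x_hist) + (n_cur - x_cur)
post_mean = a_post / (a_post + b_post)
```

Setting `delta = 0` discards the historical data entirely, while `delta = 1` pools the two samples; intermediate values trade off relevance of past data against sample size.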

Dual Generalized Maximum Entropy Estimation for Panel Data Regression Models

  • Lee, Jaejun;Cheon, Sooyoung
    • Communications for Statistical Applications and Methods / v.21 no.5 / pp.395-409 / 2014
  • Limited, partial, or incomplete data are known as ill-posed problems. If data with ill-posed problems are analyzed by traditional statistical methods, the results are unreliable and lead to erroneous interpretations. To overcome these problems, we propose a dual generalized maximum entropy (dual GME) estimator for panel data regression models based on an unconstrained dual Lagrange multiplier method. Monte Carlo simulations for panel data regression models with exogeneity, endogeneity, and/or collinearity show that the dual GME estimator outperforms several other estimators, such as least squares and instrumental variable estimators, even in small samples. We believe that our dual GME procedure for the panel data regression framework will be useful for analyzing ill-posed and endogenous data sets.