• Title/Abstract/Keywords: conditional independence assumption

Independence Condition in the Repeated Randomized Response Models

  • 이관제;국세정
    • Korean Statistical Society: Conference Proceedings
    • /
    • Proceedings of the Korean Statistical Society 2000 Autumn Conference
    • /
    • pp.33-38
    • /
    • 2000
  • Krishnamoorthy and Raghavarao (1993) developed exact binomial and asymptotically normal test procedures for truthful answering in the repeated randomized response models under the assumption that the two repeated response measures are independent. Under the same assumption, Lakshmi and Raghavarao (1992) suggested an asymptotic chi-square test for respondents' truthful answering in the same models. In this article we identify the factors and conditions under which the two response variables might be independent, and we find the condition for independence in the repeated randomized response models when untruthful answers are taken into account. However, the condition for independence renders the randomized response model meaningless. Under the assumption of conditional independence between the two response variables, we can apply the same reasoning as in Krishnamoorthy and Raghavarao (1993) to derive the tests for truthful answering in the repeated randomized response models.

Relation Based Bayesian Network for NBNN

  • Sun, Mingyang;Lee, YoonSeok;Yoon, Sung-eui
    • Journal of Computing Science and Engineering
    • /
    • Vol. 9, No. 4
    • /
    • pp.204-213
    • /
    • 2015
  • The Naive Bayes Nearest Neighbor (NBNN) classifier was recently proposed under the conditional independence assumption among local features; it performs classification without any training or quantization phases. While the original NBNN shows high classification accuracy without an explicit training phase, conditional independence among local features conflicts with the compositionality of objects, whereby different but related parts of an object appear together. As a result, the conditional independence assumption weakens the accuracy of classification techniques based on NBNN. In this work, we look into this issue and propose a novel Bayesian network for NBNN-based classification that considers the conditional dependence among features. To achieve our goal, we extract a high-level feature and its corresponding multiple low-level features for each image patch. We then represent them with a simple, two-level layered Bayesian network and design a classification function based on that network. To achieve a low memory requirement and fast query-time performance, we further optimize our representation and classification function, named the relation-based Bayesian network, by encoding the relationship between a high-level feature and its low-level features in a compact relation vector whose dimensionality equals the number of low-level features, e.g., four elements in our tests. We demonstrate the benefits of our method over the original NBNN and its recent improvement, local NBNN, on two different benchmarks. Our method improves accuracy by up to 27% over the tested methods. This high accuracy is mainly due to considering the conditional dependences between a high-level feature and its corresponding low-level features.
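
For orientation, the original NBNN decision rule that this abstract starts from can be sketched as below. This is a minimal illustration of the commonly described rule (sum, over an image's local descriptors, of the squared distance to the nearest descriptor of each class), not the relation-based method proposed in the paper. The function name, the toy descriptors, and the brute-force distance computation are all assumptions made for the example.

```python
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """Return the label with the smallest summed nearest-neighbor distance.

    query_descriptors: (n, d) array of local descriptors from one image.
    class_descriptors: dict mapping label -> (m, d) array of descriptors
        pooled from all reference images of that class.
    """
    best_label, best_cost = None, np.inf
    for label, refs in class_descriptors.items():
        # Squared Euclidean distance from every query descriptor
        # to every reference descriptor of this class.
        diffs = query_descriptors[:, None, :] - refs[None, :, :]
        sq_dists = np.sum(diffs ** 2, axis=2)
        # Conditional independence among descriptors is what lets the
        # per-descriptor nearest-neighbor terms simply be added up.
        cost = sq_dists.min(axis=1).sum()
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label

# Hypothetical toy data: two well-separated classes of 4-D descriptors.
rng = np.random.default_rng(0)
classes = {
    "a": rng.normal(0.0, 0.1, size=(50, 4)),
    "b": rng.normal(5.0, 0.1, size=(50, 4)),
}
query = rng.normal(5.0, 0.1, size=(8, 4))
predicted = nbnn_classify(query, classes)
```

Because each descriptor contributes its own additive term, dependence between related parts of an object is ignored, which is the weakness the abstract targets.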

Statistical micro matching using a multinomial logistic regression model for categorical data

  • Kim, Kangmin;Park, Mingue
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 26, No. 5
    • /
    • pp.507-517
    • /
    • 2019
  • Statistical matching is a method of combining multiple sources of data that are extracted or surveyed from the same population. It can be used in situations where the variables of interest are not jointly observed. It is a low-cost approach with a potentially high payoff, since it can create synthetic data from existing sources. In this paper, we propose several statistical micro matching methods using a multinomial logistic regression model for the case where all variables of interest are categorical or categorized, which is common in sample surveys. Under the conditional independence assumption (CIA), a mixed statistical matching method, which is useful when auxiliary information is not available, is proposed. We also propose a statistical matching method with auxiliary information that reduces the bias of the conventional matching methods suggested under the CIA. Through a simulation study, the proposed micro matching methods are compared with conventional ones. The simulation study shows that the suggested matching methods outperform the existing ones, especially when the CIA does not hold.
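
As a reminder of what the CIA asserts in this setting, the standard formulation can be written as follows. The symbols are assumptions introduced here, not notation from the abstract: $X$ denotes the variables common to both data sources, $Y$ the variables observed only in the first source, and $Z$ those observed only in the second.

```latex
% CIA: Y and Z are independent given the common variables X
f(y, z \mid x) = f(y \mid x)\, f(z \mid x)
% so the joint distribution factorizes into pieces that are
% each estimable from a single data source:
f(x, y, z) = f(x)\, f(y \mid x)\, f(z \mid x)
```

When $Y$ and $Z$ remain associated after conditioning on $X$, this factorization is wrong, which is the source of the bias that the auxiliary-information method is meant to reduce.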

Generalized methods of moments in marginal models for longitudinal data with time-dependent covariates

  • Cho, Gyo-Young;Dashnyam, Oyunchimeg
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 24, No. 4
    • /
    • pp.877-883
    • /
    • 2013
  • The quadratic inference functions (QIF) method proposed by Qu et al. (2000) and the generalized method of moments (GMM) for marginal regression analysis of longitudinal data with time-dependent covariates proposed by Lai and Small (2007) are both based on the generalized method of moments introduced by Hansen (1982), and both use generalized estimating equations (GEE). Lai and Small (2007) divided time-dependent covariates into three types: Type I, Type II and Type III. In this paper, we compare these methods for Type II and Type III covariates, for which the full covariates conditional mean (FCCM) assumption is violated, and examine whether they can improve on the results of GEE with independence working correlation. We show that in the marginal regression model with Type II time-dependent covariates, GMM Type II of Lai and Small (2007) provides more efficient results than QIF, and that for Type III time-dependent covariates, QIF with independence working correlation and the GMM Type III method provide the same results. Our simulation study showed the same results.

Quadratic inference functions in marginal models for longitudinal data with time-varying stochastic covariates

  • Cho, Gyo-Young;Dashnyam, Oyunchimeg
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 24, No. 3
    • /
    • pp.651-658
    • /
    • 2013
  • For the marginal model and the generalized estimating equations (GEE) method, there is an important full covariates conditional mean (FCCM) assumption, pointed out by Pepe and Anderson (1994). For longitudinal data with time-varying stochastic covariates, this assumption may not hold. If it is violated, biased estimates of the regression coefficients may result. But if a diagonal working correlation matrix is used, the resulting estimates are (nearly) unbiased irrespective of whether the assumption is violated (Pan et al., 2000). The quadratic inference functions (QIF) method proposed by Qu et al. (2000) is based on the generalized method of moments (GMM) using GEE. The QIF yields a substantial improvement in efficiency for the estimator of $\beta$ when the working correlation is misspecified, and equal efficiency to the GEE when the working correlation is correct (Qu et al., 2000). In this paper, we are interested in whether the QIF can improve on the results of the GEE method when FCCM is violated. We show that the QIF with an exchangeable or AR(1) working correlation matrix cannot be consistent and asymptotically normal in this case. It also may not be more efficient than GEE with independence working correlation. Our simulation studies verify the result.

BAYESIAN CLASSIFICATION AND FREQUENT PATTERN MINING FOR APPLYING INTRUSION DETECTION

  • Lee, Heon-Gyu;Noh, Ki-Yong;Ryu, Keun-Ho
    • Korean Society of Remote Sensing: Conference Proceedings
    • /
    • Korean Society of Remote Sensing, 2005, Proceedings of ISRS 2005
    • /
    • pp.713-716
    • /
    • 2005
  • In this paper, in order to identify and recognize attack patterns, we propose a Bayesian classification using frequent patterns. In theory, Bayesian classifiers guarantee the minimum error rate compared to all other classifiers. However, in practice this is not always the case, owing to inaccuracies in the unrealistic assumption (class conditional independence) made for its use. Our method addresses the problem of attribute dependence by discovering frequent patterns. It generates frequent patterns using an efficient FP-growth approach. Since the volume of patterns produced can be large, we propose a pruning technique for selecting only interesting patterns. This method also estimates the probability of a new case using different product approximations, where each product approximation assumes a different independence structure among the attributes. Our experiments show that the proposed classifier achieves higher accuracy and is more efficient than other classifiers.
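
To make the class conditional independence assumption concrete, here is a minimal sketch of a plain naive Bayes classifier for categorical attributes. It illustrates the baseline that the abstract criticizes, not the frequent-pattern method it proposes. The function names, the Laplace smoothing, and the toy connection records are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
            values_seen[j].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Pick the class maximizing log P(c) + sum_j log P(x_j | c)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        log_p = math.log(n_c / total)
        for j, value in enumerate(row):
            # Class conditional independence: each attribute contributes
            # its own factor P(x_j | c); Laplace smoothing avoids log(0).
            count = value_counts[(label, j)][value]
            log_p += math.log((count + 1) / (n_c + len(values_seen[j])))
        scores[label] = log_p
    return max(scores, key=scores.get)

# Hypothetical toy records: (protocol, flag) -> label.
rows = [("tcp", "syn"), ("tcp", "syn"), ("udp", "ack"), ("tcp", "ack"), ("udp", "ack")]
labels = ["attack", "attack", "normal", "normal", "normal"]
model = train_naive_bayes(rows, labels)
first = predict(model, ("tcp", "syn"))
second = predict(model, ("udp", "ack"))
```

The product over single attributes is exactly where dependence between attributes is lost; a product approximation built from frequent patterns replaces some of those single-attribute factors with factors over attribute sets.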

Statistical analysis and its application of bicycle accidents

  • 홍종선;김명진
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 21, No. 6
    • /
    • pp.1081-1090
    • /
    • 2010
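  • Governments around the world, including the Korean government, are working to achieve low-carbon green growth, and as one measure they are expanding bicycle lanes and bicycle facilities to raise the share of transportation carried by bicycles. As a result the number of bicycle users has grown considerably, while bicycle traffic accidents are also rising sharply. The only source of bicycle accident data is the yearbook published by the National Police Agency, but because it reports accident frequencies as two-way contingency tables, it is difficult for the general public to understand the nature and risk of accidents from it. In this study, we therefore express the bicycle accident data as probabilities and combine them with data on the cycling population and the number of registered automobiles to analyze and characterize bicycle accidents. We also propose a method for obtaining the probability of a bicycle accident in real-world situations under the assumption of conditional independence, and we use this method to discuss ways of reducing bicycle accidents.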

Upgraded quadratic inference functions for longitudinal data with type II time-dependent covariates

  • Cho, Gyo-Young;Dashnyam, Oyunchimeg
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 25, No. 1
    • /
    • pp.211-218
    • /
    • 2014
  • Qu et al. (2000) proposed the quadratic inference functions (QIF) method for marginal model analysis of longitudinal data to improve on the generalized estimating equations (GEE). It yields a substantial improvement in efficiency for the estimators of regression parameters when the working correlation is misspecified. But for longitudinal data with time-dependent covariates, when the implicit full covariates conditional mean (FCCM) assumption is violated, the QIF cannot provide a more consistent and efficient estimator than GEE (Cho and Dashnyam, 2013). Lai and Small (2007) divided time-dependent covariates into three types and proposed the generalized method of moments (GMM) for longitudinal data with time-dependent covariates. They showed that their GMM type II and GMM moment selection methods can be more efficient than GEE with independence working correlation (GEE-ind) in the case of type II time-dependent covariates. We develop an upgraded QIF method for type II time-dependent covariates. We show that this upgraded QIF method can provide substantial gains in efficiency over QIF and GEE-ind in the case of type II time-dependent covariates.

Frequent Pattern Bayesian Classification for ECG Pattern Diagnosis

  • 노기용;김원식;이헌규;이상태;류근호
    • The KIPS Transactions: Part D
    • /
    • Vol. 11D, No. 5
    • /
    • pp.1031-1040
    • /
    • 2004
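  • An electrocardiogram (ECG), which records the activity of the heart, provides valuable clinical information about the heart's condition. Many studies have addressed algorithms for diagnosing heart disease from ECGs, but because their diagnostic results are inaccurate, electrocardiographs use diagnostic algorithms developed abroad. This paper proposes a technique that covers ECG data collection, preprocessing, and data-mining-based classification of heart disease patterns. The pattern classification technique is the frequent pattern Bayesian classifier, which integrates the conventional naive Bayesian classifier with frequent pattern mining. The frequent pattern Bayesian classifier constructs a product approximation from the frequent patterns discovered in the training phase, and thereby overcomes the weakness of the naive Bayesian classifier, which relies on the class conditional independence assumption.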

Dependency-based Framework of Combining Multiple Experts for Recognizing Unconstrained Handwritten Numerals

  • 강희중;이성환
    • Journal of KIISE: Software and Applications
    • /
    • Vol. 27, No. 8
    • /
    • pp.855-863
    • /
    • 2000
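  • The BKS (Behavior-Knowledge Space) method, one of the methodologies for combining the K decisions observed from K classifiers, combines these decisions without any assumptions, but storing and managing the observed K decisions theoretically requires exponentially large storage space. That is, combining the decisions of K classifiers requires a (K+1)st-order probability distribution, and it is well known that storing or estimating that distribution is difficult even for small K. To overcome this problem, the high-order probability distribution must be decomposed into several component distributions and approximated by their product. One earlier approach applies the conditional independence assumption to the distribution; another, as in [1], approximates the distribution by a product of only tree dependencies, that is, second-order component distributions. In this paper, within the product-approximation approach, we consider component distributions of second and higher order, approximate the (K+1)st-order probability distribution by an optimal product based on dth-order ($1 \le d \le K$) dependencies, and propose a dependency-based framework that combines the decisions of multiple classifiers using the Bayesian method and this product. The framework was tested and evaluated on the standard CENPARMI database.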
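
The storage argument in this abstract can be illustrated with a small count of table entries. The sketch below assumes, for illustration only, that each of the K classifiers outputs exactly one of M labels (no reject option) and that the true class is also one of M labels. The function names are hypothetical, and the counts are raw table entries rather than free parameters.

```python
def full_joint_entries(num_classifiers, num_labels):
    """Entries in the full (K+1)st-order table over (class, e_1, ..., e_K)."""
    return num_labels ** (num_classifiers + 1)

def conditional_independence_entries(num_classifiers, num_labels):
    """Entries needed for P(class) plus one P(e_k | class) table per classifier."""
    return num_labels + num_classifiers * num_labels ** 2

# Hypothetical setting: 5 classifiers recognizing the 10 digits.
full = full_joint_entries(5, 10)
factored = conditional_independence_entries(5, 10)
```

With these assumed numbers the full table has 1,000,000 entries while the conditionally independent factorization needs 510. Product approximations with dth-order dependencies sit between these two extremes, trading storage for the ability to represent dependence among classifiers.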
