• Title/Summary/Keyword: Iterative proportional fitting

Rule-Based Classification Analysis Using Entropy Distribution (엔트로피 분포를 이용한 규칙기반 분류분석 연구)

  • Lee, Jung-Jin;Park, Hae-Ki
    • Communications for Statistical Applications and Methods / v.17 no.4 / pp.527-540 / 2010
  • Rule-based classification analysis is widely used for mining massive data sets because it is easy to understand and its algorithm is simple. In this type of analysis, rules are frequently combined by a majority vote or by a weighted combination based on their supports. In this paper, we propose a method that combines rules using the multinomial distribution. The iterative proportional fitting algorithm is used to estimate the multinomial distribution that maximizes entropy subject to constraints on the rules' supports. Simulation experiments show that this method can compete with other well-known classification models in the case of two similar populations.
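
As a rough illustration of the iterative proportional fitting step mentioned in this abstract, the sketch below rescales a two-way table until its row and column sums match given targets. It is not the authors' procedure: their constraints are rule supports, whereas this example assumes simple row and column marginals, and the function name `ipf_2d` and the numbers are hypothetical. Starting from a uniform seed, the fitted table is the maximum-entropy distribution with those marginals.

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale `seed` until its row and column sums match the targets.

    Assumes a two-way table with positive seed entries and targets
    that share the same grand total (hypothetical setup).
    """
    table = [row[:] for row in seed]  # work on a copy
    for _ in range(max_iter):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Column step: scale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly, so only the rows need checking.
        err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if err < tol:
            break
    return table


# Uniform seed: the fit is the maximum-entropy table with these marginals.
probs = ipf_2d([[1.0, 1.0], [1.0, 1.0]], [0.7, 0.3], [0.6, 0.4])
```

With a uniform seed and only marginal constraints, the result is the independence table (each cell is the product of its row and column targets); a non-uniform seed would instead preserve the seed's association structure.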

Estimating Missing Cells in Contingency Table with IPE (반복비율적합에 의한 다차원 분할표의 결측칸값 추정)

  • 최현집;신상준
    • The Korean Journal of Applied Statistics / v.13 no.1 / pp.197-206 / 2000
  • For estimating missing cells in a contingency table, we suggest an iterative method that extends the IPF (Iterative Proportional Fitting) method. The suggested method is not restricted by the number or location of the missing cells and does not distort the given quasi-independence.

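
To make the idea of estimating missing cells under quasi-independence more concrete, here is a minimal sketch for a two-way table. It is a generic textbook-style approach, not the extension proposed in the paper above: row and column factors are fitted by IPF-style updates over the observed cells only, and each missing cell is then filled with the product of its factors. The function names, the use of `None` as the missing marker, and the example table are all assumptions made for illustration.

```python
def fit_quasi_independence(table, n_iter=200):
    """Fit m[i][j] = a[i] * b[j] to the observed cells of `table`.

    `None` marks a missing cell. Assumes every row and column has at
    least one observed cell and that observed counts are positive.
    """
    n_rows, n_cols = len(table), len(table[0])
    a = [1.0] * n_rows
    b = [1.0] * n_cols
    for _ in range(n_iter):
        # Row update: match the observed row totals.
        for i in range(n_rows):
            seen = [j for j in range(n_cols) if table[i][j] is not None]
            a[i] = sum(table[i][j] for j in seen) / sum(b[j] for j in seen)
        # Column update: match the observed column totals.
        for j in range(n_cols):
            seen = [i for i in range(n_rows) if table[i][j] is not None]
            b[j] = sum(table[i][j] for i in seen) / sum(a[i] for i in seen)
    return a, b


def fill_missing(table, n_iter=200):
    """Return a copy of `table` with missing cells replaced by a[i] * b[j]."""
    a, b = fit_quasi_independence(table, n_iter)
    return [
        [a[i] * b[j] if v is None else v for j, v in enumerate(row)]
        for i, row in enumerate(table)
    ]


# Hypothetical table with exact product structure; cell (2, 2) is missing.
observed = [[10, 20, 30], [20, 40, 60], [30, 60, None]]
filled = fill_missing(observed)
```

Because the updates only ever use observed cells, the observed counts are left untouched and the fitted factors describe the quasi-independence structure of the incomplete table.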

Allocation in Multi-way Stratification by Linear Programing

  • NamKung, Pyong;Choi, Jae-Hyuk
    • Communications for Statistical Applications and Methods / v.13 no.2 / pp.327-341 / 2006
  • Winkler (1990, 2001), Sitter and Skinner (1994), and Wilson and Sitter (2002) present a method that applies linear programming to the design of surveys with multi-way stratification, primarily in situations where the desired sample size is less than or only slightly larger than the total number of stratification cells. A comparison is made with existing methods by illustrating the sampling schemes generated for specific examples, by evaluating the sample mean, variance estimation, and mean squared errors, and by simulating the sample mean for all methods. The computations required can, however, increase rapidly as the number of cells in the multi-way classification increases. In this article, their approach is applied to multi-way stratification using real data.

Identification of Multiple Outlying Cells in Multi-way Tables

  • Lee, Jong Cheol;Hong, Chong Sun
    • Communications for Statistical Applications and Methods / v.7 no.3 / pp.687-698 / 2000
  • An identification method is proposed to detect more than one outlying cell in multi-way contingency tables. The iterative proportional fitting method is applied to obtain expected values for several suspected outlying cells. Since the proposed method uses minimal sufficient statistics under quasi log-linear models, expected counts of outlying cells can be estimated under any hierarchical log-linear model. This method is an extension of the backwards-stepping method of Simonoff (1988) and requires fewer iterations to identify outlying cells.


A probabilistic information retrieval model by document ranking using term dependencies (용어간 종속성을 이용한 문서 순위 매기기에 의한 확률적 정보 검색)

  • You, Hyun-Jo;Lee, Jung-Jin
    • The Korean Journal of Applied Statistics / v.32 no.5 / pp.763-782 / 2019
  • This paper proposes a probabilistic document ranking model that incorporates term dependencies. Document ranking is a fundamental information retrieval task: sorting the documents in a collection according to their relevance to the user query (Qin et al., Information Retrieval Journal, 13, 346-374, 2010). A probabilistic model computes the conditional probability that each document is relevant given the query. Most of the widely used models assume term independence because it is challenging to compute the joint probabilities of multiple terms. Words in natural language texts, however, are highly correlated. In this paper, we assume a multinomial distribution model to calculate the relevance probability of a document by considering the dependency structure of words, and propose an information retrieval model that ranks documents by estimating this probability with the maximum entropy method. Ranking simulation experiments in various multinomial situations show better retrieval results than a model that assumes the independence of words. Document ranking experiments using the real-world LETOR OHSUMED dataset also show better retrieval results.