Search | Korea Science

Lee, Jung-Jin;Park, Hae-Ki
- Communications for Statistical Applications and Methods, v.17 no.4, pp.527-540, 2010
Rule-based classification is widely used for mining large data sets because its results are easy to interpret and its algorithms are simple. To combine rules, such classifiers commonly use a majority vote or a combination of rules weighted by their supports. In this paper, we propose combining rules through a multinomial distribution. The iterative proportional fitting algorithm is used to estimate the multinomial distribution that maximizes entropy subject to constraints given by the rules' supports. Simulation experiments show that this method is competitive with other well-known classification models in the case of two similar populations.
https://doi.org/10.5351/CKSS.2010.17.4.527
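
For readers unfamiliar with iterative proportional fitting (IPF), the following is a minimal sketch of the generic two-way version, not the authors' rule-combination method. It assumes a strictly positive seed table and target row and column sums with equal totals; the function name `ipf_2d` and its parameters are illustrative. Starting from a uniform seed, the fitted table is the maximum-entropy table with the given margins, which for two-way margins is the independence table.

```python
import numpy as np

def ipf_2d(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale `seed` until its row and column sums match the targets.

    Assumes every seed cell is strictly positive and that the two
    target vectors have the same total (illustrative assumptions).
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Column step: scale each column so that it sums to its target.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns match exactly after the column step, so check the rows.
        if np.allclose(table.sum(axis=1), row_targets, rtol=0, atol=tol):
            break
    return table

# Uniform seed: the result is the maximum-entropy table for these margins.
fitted = ipf_2d(np.ones((2, 3)), [0.4, 0.6], [0.2, 0.3, 0.5])
```

A non-uniform seed keeps its interaction structure (odds ratios) while the margins are adjusted, which is why IPF is also used to fit log-linear models to contingency tables.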

최현집;신상준
- The Korean Journal of Applied Statistics, v.13 no.1, pp.197-206, 2000
To estimate missing cells in a contingency table, we suggest an iterative method that extends the IPF (iterative proportional fitting) method. The suggested method is not restricted by the number or location of the missing cells and does not distort the given quasi-independence.

NamKung, Pyong;Choi, Jae-Hyuk
- Communications for Statistical Applications and Methods, v.13 no.2, pp.327-341, 2006
Winkler (1990, 2001), Sitter and Skinner (1994), and Wilson and Sitter (2002) present a method that applies linear programming to the design of surveys with multi-way stratification, primarily in situations where the desired sample size is less than or only slightly larger than the total number of stratification cells. A comparison with existing methods is made by illustrating the sampling schemes generated for specific examples, by evaluating the sample mean, variance estimation, and mean squared error, and by simulating the sample mean for all methods. The computations required can, however, increase rapidly as the number of cells in the multi-way classification increases. In this article, their approach is applied to multi-way stratification using real data.
https://doi.org/10.5351/CKSS.2006.13.2.327

Lee, Jong Cheol;Hong, Chong Sun
- Communications for Statistical Applications and Methods, v.7 no.3, pp.687-698, 2000
An identification method is proposed to detect more than one outlying cell in multi-way contingency tables. The iterative proportional fitting method is applied to obtain expected values for several suspected outlying cells. Because the proposed method uses minimal sufficient statistics under quasi-log-linear models, expected counts of outlying cells can be estimated under any hierarchical log-linear model. This method extends the backwards-stepping method of Simonoff (1988) and requires fewer iterations to identify outlying cells.

You, Hyun-Jo;Lee, Jung-Jin
- The Korean Journal of Applied Statistics, v.32 no.5, pp.763-782, 2019
This paper proposes a probabilistic document ranking model that incorporates term dependencies. Document ranking is a fundamental information retrieval task: sorting the documents in a collection according to their relevance to the user's query (Qin et al., Information Retrieval Journal, 13, 346-374, 2010). A probabilistic model computes the conditional probability that each document is relevant given the query. Most widely used models assume term independence because the joint probabilities of multiple terms are difficult to compute, even though words in natural language texts are highly correlated. In this paper, we assume a multinomial distribution model that calculates the relevance probability of a document by considering the dependency structure of words, and we propose an information retrieval model that ranks documents by estimating this probability with the maximum entropy method. Ranking simulation experiments in various multinomial situations show better retrieval results than a model that assumes word independence. Document ranking experiments on the real-world LETOR OHSUMED dataset also show better retrieval results.
https://doi.org/10.5351/KJAS.2019.32.5.763