Integrated Search | Korea Science

이정진;박해기
- Communications for Statistical Applications and Methods
- Vol. 17, No. 4
- pp. 527-540
- 2010
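Rule-based classification analysis is intuitive to understand and algorithmically simple, so it has recently been widely used in large-scale data mining. However, current rule-based analyses find multiple rules and then classify new data simply by majority vote or by a weighted sum of rule importance. In this study, we apply a classification technique for binary data based on the multinomial distribution to the problem of combining rules. To estimate the multinomial distribution, we use a modified iterative proportional fitting (IPF) algorithm to find the maximum entropy distribution. Simulation experiments show that this method yields reasonably meaningful classification results when the data from the two groups are similar to each other.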
https://doi.org/10.5351/CKSS.2010.17.4.527
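
The abstract above relies on iterative proportional fitting to find a maximum entropy distribution. As a rough illustration only, the following sketch shows the classical two-way IPF, not the modified variant used in the paper (which the abstract does not specify). The function name `ipf_2d` and the example margins are assumptions made for this illustration. Starting from a uniform seed, the fitted table is the maximum entropy distribution with the given margins, which for a two-way table is the independence table.

```python
import numpy as np

def ipf_2d(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Classical two-way IPF (illustrative, not the paper's modified version).

    Alternately rescales rows and columns of `seed` until both margins
    match the targets. Assumes all seed row and column sums are positive.
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Row step: make every row sum equal its target.
        table *= row_targets[:, None] / table.sum(axis=1, keepdims=True)
        # Column step: make every column sum equal its target.
        table *= col_targets[None, :] / table.sum(axis=0, keepdims=True)
        # Columns now match exactly; stop once rows match as well.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table

# Hypothetical margins for two binary variables; a uniform seed encodes
# "no prior structure", so the result is the maximum entropy table.
p = ipf_2d(np.ones((2, 2)), [0.3, 0.7], [0.6, 0.4])
print(p)
```

With more than two variables the same row/column step is repeated once per constrained margin in each cycle.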

최현집;신상준
- 응용통계연구 (The Korean Journal of Applied Statistics)
- Vol. 13, No. 1
- pp. 197-206
- 2000
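We propose an estimation method that extends iterative proportional fitting to obtain maximum likelihood estimates of the missing cells in incomplete multi-way contingency tables under the quasi-independence model. The proposed method can be applied to any incomplete contingency table whose marginal totals are nonzero, and it does not violate the structure of the given quasi-log-linear model. We also confirmed that it always converges regardless of the positions and number of missing cells.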
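
To make the idea of fitting a quasi-independence model to an incomplete table concrete, here is a minimal sketch for the two-way case. It is the textbook IPF restricted to observed cells, not the extended method proposed in the paper. The function name `quasi_independence_fit`, the mask convention, and the example table are assumptions for illustration. The fitted row and column effects are then multiplied to fill in the missing cell.

```python
import numpy as np

def quasi_independence_fit(counts, observed, tol=1e-10, max_iter=10000):
    """Fit m_ij = a_i * b_j on observed cells only (illustrative sketch).

    `observed` is a boolean mask; False marks a missing cell. Assumes every
    row and column has at least one observed cell with a nonzero total.
    Returns the full outer product a_i * b_j, including missing cells.
    """
    counts = np.asarray(counts, dtype=float)
    mask = np.asarray(observed, dtype=float)
    obs = counts * mask                      # ignore values in missing cells
    row_t, col_t = obs.sum(axis=1), obs.sum(axis=0)
    a = np.ones(counts.shape[0])
    b = np.ones(counts.shape[1])
    for _ in range(max_iter):
        a = row_t / (mask @ b)               # match observed row totals
        b = col_t / (mask.T @ a)             # match observed column totals
        fitted = np.outer(a, b) * mask
        if np.abs(fitted.sum(axis=1) - row_t).max() < tol:
            break
    return np.outer(a, b)

# Hypothetical example: an exactly multiplicative table with cell (0, 0) missing.
full = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
observed = np.ones((3, 3), dtype=bool)
observed[0, 0] = False
estimate = quasi_independence_fit(full, observed)
print(estimate[0, 0])  # estimate for the missing cell
```

Because the example table is exactly multiplicative, the estimate for the missing cell should recover the value that was hidden from the fit.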

NamKung, Pyong;Choi, Jae-Hyuk
- Communications for Statistical Applications and Methods
- Vol. 13, No. 2
- pp. 327-341
- 2006
Winkler (1990, 2001), Sitter and Skinner (1994), and Wilson and Sitter (2002) present a method that applies linear programming to designing surveys with multi-way stratification, primarily in situations where the desired sample size is less than or only slightly larger than the total number of stratification cells. A comparison is made with existing methods by illustrating the sampling schemes generated for specific examples, by evaluating the sample mean, variance estimation, and mean squared errors, and by simulating the sample mean for all methods. The computations required can, however, increase rapidly as the number of cells in the multi-way classification increases. In this article their approach is applied to multi-way stratification using real data.
https://doi.org/10.5351/CKSS.2006.13.2.327

Lee, Jong Cheol;Hong, Chong Sun
- Communications for Statistical Applications and Methods
- Vol. 7, No. 3
- pp. 687-698
- 2000
An identification method is proposed to detect more than one outlying cell in multi-way contingency tables. The iterative proportional fitting method is applied to get expected values of several suspected outlying cells. Since the proposed method uses minimal sufficient statistics under quasi-log-linear models, expected counts of outlying cells can be estimated under any hierarchical log-linear model. This method is an extension of the backwards-stepping method of Simonoff (1988) and requires fewer iterations to identify outlying cells.

유현조;이정진
- 응용통계연구 (The Korean Journal of Applied Statistics)
- Vol. 32, No. 5
- pp. 763-782
- 2019
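In information retrieval over a collection of text documents, the relevance probability of each document for a given query is computed, and documents are ranked from highest to lowest probability and presented to the user. The models most often used to compute this probability estimate it under the assumption that terms are stochastically independent. Such models are widely used because computing the joint probability of terms is difficult in practice, but they ignore the fact that query terms are usually related to one another. In this paper, to compute relevance probabilities while accounting for the dependence structure among term features, we propose an information retrieval model that assumes a multinomial distribution for the probabilities of joint term patterns, estimates these probabilities by the maximum entropy method, and ranks documents accordingly. Simulation experiments under various multinomial settings show better estimation results than a model that assumes independent variables. A document ranking experiment on the real LETOR OHSUMED data also yields better retrieval results.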
https://doi.org/10.5351/KJAS.2019.32.5.763
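
The abstract above estimates the probabilities of joint term patterns by maximum entropy rather than assuming independent terms. The sketch below illustrates one way such an estimate can be computed: it assumes, purely for illustration, that the constraints are pairwise co-occurrence margins of binary term indicators and that IPF is used to fit them. Neither assumption is confirmed by the abstract, and the name `maxent_from_pairwise` and the example probabilities are hypothetical.

```python
import numpy as np

def maxent_from_pairwise(pair_margins, n_vars, n_cycles=500):
    """Maximum entropy joint distribution of binary variables (sketch).

    `pair_margins` maps (i, j) with i < j to a 2x2 table of P(x_i, x_j).
    Starts from the uniform distribution and cycles IPF over all pairs.
    Assumes the margins are mutually consistent and strictly positive.
    """
    p = np.full((2,) * n_vars, 1.0 / 2 ** n_vars)
    for _ in range(n_cycles):
        for (i, j), target in pair_margins.items():
            others = tuple(k for k in range(n_vars) if k not in (i, j))
            current = p.sum(axis=others, keepdims=True)
            shape = [1] * n_vars
            shape[i] = shape[j] = 2          # broadcast target over other axes
            p = p * (np.asarray(target, dtype=float).reshape(shape) / current)
    return p

# Hypothetical joint distribution of three term indicators, used only to
# produce a consistent set of pairwise margins for the demonstration.
p_true = np.array([0.20, 0.05, 0.10, 0.15,
                   0.05, 0.10, 0.05, 0.30]).reshape(2, 2, 2)
margins = {
    (0, 1): p_true.sum(axis=2),
    (0, 2): p_true.sum(axis=1),
    (1, 2): p_true.sum(axis=0),
}
p_hat = maxent_from_pairwise(margins, n_vars=3)
print(p_hat.round(4))
```

The fitted distribution reproduces every pairwise margin, so dependence between pairs of terms is retained, while no higher-order interaction is introduced beyond what those margins imply.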