Sparse Web Data Analysis Using MCMC Missing Value Imputation and PCA Plot-based SOM

Jun, Sung-Hae; Oh, Kyung-Whan

doi:10.3745/KIPSTD.2003.10D.2.277

The KIPS Transactions:PartD (정보처리학회논문지D)

Volume 10D, Issue 2 / Pages 277-282 / 2003 / 1598-2866 (pISSN)

Korea Information Processing Society (한국정보처리학회)



Jun, Sung-Hae (Department of Statistics, Cheongju University);
Oh, Kyung-Whan (Department of Computer Science, Sogang University)

Published : 2003.04.01

https://doi.org/10.3745/KIPSTD.2003.10D.2.277


Abstract

Knowledge discovery from the web has been studied extensively. There are, however, difficulties in using web logs as training data for efficient predictive models. In this paper, we study a method that removes the sparseness of web log data and then clusters web users. Sparseness is removed by imputing missing values through Bayesian inference with Markov chain Monte Carlo (MCMC). Web users are then clustered with a self-organizing map (SOM) whose feature-map dimensions are determined from a 3-D scatter plot of the principal components. Compared with existing methods, the proposed technique provided better model accuracy and faster training. Experiments on KDD Cup data illustrate the problem-solving procedure and evaluate the performance of the proposed method.
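The three stages named in the abstract (MCMC imputation, principal-component projection, SOM clustering) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: it assumes a multivariate normal model with a noninformative prior for the imputation step, assumes every column has at least one observed value, and uses synthetic data in place of the KDD Cup logs. The function names (`mcmc_impute`, `pca_scores`, `train_som`, `assign_clusters`) and the 1 x 2 map size are hypothetical. In the paper the map dimensions are read from the principal-component plot; here they are simply passed in as arguments.

```python
import numpy as np

def _sym(M):
    """Symmetrize a matrix to suppress floating-point asymmetry."""
    return (M + M.T) / 2.0

def mcmc_impute(X, n_iter=60, burn_in=20, seed=0):
    """Data-augmentation Gibbs sampler under a multivariate normal model.

    NaN marks a missing entry. Returns the average of the completed data
    sets drawn after burn-in (an assumed summary, not the paper's choice).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    n, d = X.shape
    Z = np.where(miss, np.nanmean(X, axis=0), X)  # start at column means
    total = np.zeros_like(Z)
    for t in range(n_iter):
        # Draw (mu, Sigma) given the completed data (noninformative prior).
        zbar = Z.mean(axis=0)
        S = (Z - zbar).T @ (Z - zbar) + 1e-6 * np.eye(d)
        A = rng.multivariate_normal(np.zeros(d), _sym(np.linalg.inv(S)), size=n - 1)
        Sigma = _sym(np.linalg.inv(A.T @ A))  # inverse-Wishart draw
        mu = rng.multivariate_normal(zbar, Sigma / n)
        # Draw each row's missing entries given its observed entries.
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            c_mu, c_S = mu[m], Sigma[np.ix_(m, m)]
            if o.any():
                K = Sigma[np.ix_(m, o)] @ np.linalg.inv(Sigma[np.ix_(o, o)])
                c_mu = c_mu + K @ (Z[i, o] - mu[o])
                c_S = c_S - K @ Sigma[np.ix_(o, m)]
            Z[i, m] = rng.multivariate_normal(c_mu, _sym(c_S))
        if t >= burn_in:
            total += Z
    return total / (n_iter - burn_in)

def pca_scores(Z, k=3):
    """Project centered data onto its first k principal components."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:k].T

def train_som(data, rows, cols, n_epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns one weight vector per node."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = data[rng.integers(0, len(data), rows * cols)].copy()
    sigma0 = max(rows, cols) / 2.0
    n_steps, step = n_epochs * len(data), 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighborhood
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)
            step += 1
    return W

def assign_clusters(data, W):
    """Label each sample with the index of its nearest SOM node."""
    return ((data[:, None, :] - W[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

# Synthetic stand-in for a sparse user-by-feature matrix (two user groups).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0, 0.0, 0.0], [4.0, 4.0, 4.0, 4.0]])
X_full = centers[np.repeat([0, 1], 30)] + rng.normal(size=(60, 4))
X_sparse = X_full.copy()
X_sparse[rng.random(X_full.shape) < 0.2] = np.nan   # knock out about 20%

Z = mcmc_impute(X_sparse)
scores = pca_scores(Z, k=3)
W = train_som(scores, rows=1, cols=2)
labels = assign_clusters(scores, W)
```

Averaging the post-burn-in draws yields a single completed matrix, which keeps the sketch short. Multiple imputation would instead keep several completed data sets and analyze each one separately.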


