Reputation Analysis of Document Using Probabilistic Latent Semantic Analysis Based on Weighting Distinctions

Document Evaluation Analysis Using Weight-Based PLSA

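  • 조시원 (Department of Electrical Engineering, College of Engineering, Dongguk University)
  • 이동욱 (Department of Electrical Engineering, College of Engineering, Dongguk University)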
  • Published : 2009.03.01

Abstract

Probabilistic Latent Semantic Analysis (PLSA) has many applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. In this paper, we propose an algorithm that uses a weighted PLSA model to find contextual phrases and opinions in documents. Traditional keyword search cannot find the semantic relations between phrases; overcoming this limitation requires techniques for automatically classifying those relations. Through experiments, we show that the proposed algorithm discovers semantic relations between phrases effectively and represents them in the vector-space model. The proposed algorithm supports a variety of analyses, including document classification, online reputation analysis, and collaborative recommendation.
