Utilization of similarity measures by PIM with AMP as association rule thresholds

Using similarity measures based on the probabilistic interestingness measure with all marginal proportions as association rule evaluation criteria

  • Received : 2012.12.10
  • Accepted : 2013.01.07
  • Published : 2013.01.31

Abstract

Association rule mining is a data mining technique that quantifies the relationships among sets of items in a large database. It has been applied in fields such as internet shopping malls, healthcare, insurance, and education. The three primary interestingness measures for association rules are support, confidence, and lift. Confidence is the most important of these, and association rules are usually generated with it. However, confidence is an asymmetric measure that takes only nonnegative values, so it cannot show whether an association is positive or negative, which makes generating association rules difficult. In this paper we apply similarity measures based on the probabilistic interestingness measure (PIM) with all marginal proportions (AMP) to solve this problem. A numerical example compares support, confidence, lift, chi-square statistics, and several similarity measures by PIM with AMP. The results show that the similarity measures by PIM with AMP reveal the degree of association just as confidence does. Because their values carry a sign, they also show the direction of association, and the comparison allowed us to select the best similarity measure by PIM with AMP.
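The limitation described above can be seen with the standard textbook definitions of the basic measures. The following is a minimal sketch, assuming a 2×2 table of transaction counts for two items A and B; the class and function names and the two example tables are hypothetical and are not the paper's data. It shows that confidence differs for A → B and B → A, stays nonnegative, and that the chi-square statistic is unsigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Transaction counts for items A and B (hypothetical data)."""
    n11: int  # transactions containing both A and B
    n10: int  # A without B
    n01: int  # B without A
    n00: int  # neither A nor B

    @property
    def n(self) -> int:
        return self.n11 + self.n10 + self.n01 + self.n00


def support(t: Table2x2) -> float:
    """P(A and B): symmetric in A and B."""
    return t.n11 / t.n


def confidence(t: Table2x2) -> float:
    """P(B | A) for the rule A -> B: asymmetric and always in [0, 1]."""
    return t.n11 / (t.n11 + t.n10)


def lift(t: Table2x2) -> float:
    """P(A and B) / (P(A) P(B)): a value of 1 means independence."""
    p_a = (t.n11 + t.n10) / t.n
    p_b = (t.n11 + t.n01) / t.n
    return support(t) / (p_a * p_b)


def chi_square(t: Table2x2) -> float:
    """Pearson chi-square for the 2x2 table (no continuity correction)."""
    num = t.n * (t.n11 * t.n00 - t.n10 * t.n01) ** 2
    den = ((t.n11 + t.n10) * (t.n01 + t.n00)
           * (t.n11 + t.n01) * (t.n10 + t.n00))
    return num / den


def reverse(t: Table2x2) -> Table2x2:
    """The same data viewed for the reversed rule B -> A."""
    return Table2x2(t.n11, t.n01, t.n10, t.n00)


if __name__ == "__main__":
    pos = Table2x2(n11=30, n10=20, n01=10, n00=40)  # A and B co-occur often
    neg = Table2x2(n11=10, n10=40, n01=30, n00=20)  # A and B tend to exclude each other
    for name, t in (("positive", pos), ("negative", neg)):
        print(name, support(t), confidence(t), confidence(reverse(t)),
              lift(t), chi_square(t))
```

With these counts the chi-square statistic is identical (about 16.67) for the positively and the negatively associated table, and confidence stays positive in both (0.6 versus 0.2), while confidence for the reversed rule differs (0.75 versus 0.25). Only lift, read against its reference value of 1, hints at the direction.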

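Association rule mining identifies the relationships among items embedded in large databases and is widely applied in practice in areas such as shopping malls, health and medical care, and education. To generate association rules, evaluation criteria such as support, confidence, and lift are used. Among these, confidence is the most widely used criterion, but because it is an asymmetric measure that always takes positive values, generating association rules between items is difficult. To address this problem, this paper considers using similarity measures based on the probabilistic interestingness measure that include all marginal proportions as association evaluation criteria. Because these measures gauge the strength of association from all marginal proportions and every cell of the contingency table, they can be said to reflect faithfully all of the information available. A simulation study showed that most of the similarity measures based on the probabilistic interestingness measure with all marginal proportions identify the degree of association just as the existing criteria do, and because they carry a sign, they also reveal the direction of association.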
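The idea of a signed, symmetric measure built from every cell and all marginal proportions can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's definitions: we assume PIM is Piatetsky-Shapiro's rule interest in proportion form, P(A and B) − P(A)P(B), and we use the phi coefficient and Cohen's kappa as two familiar similarity measures that divide PIM by a function of all four marginal proportions. The exact set of measures and normalizations in the paper may differ, and the names `Props`, `from_counts`, `pim`, `phi`, and `kappa` and the counts are hypothetical.

```python
import math
from typing import NamedTuple


class Props(NamedTuple):
    """Proportions of a 2x2 table for items A and B."""
    p11: float  # P(A and B)
    p_a: float  # P(A)
    p_b: float  # P(B)

    @property
    def q_a(self) -> float:
        return 1.0 - self.p_a  # P(not A)

    @property
    def q_b(self) -> float:
        return 1.0 - self.p_b  # P(not B)


def from_counts(n11: int, n10: int, n01: int, n00: int) -> Props:
    """Build proportions from counts: A and B, A only, B only, neither."""
    n = n11 + n10 + n01 + n00
    return Props(n11 / n, (n11 + n10) / n, (n11 + n01) / n)


def pim(t: Props) -> float:
    """Assumed PIM: P(A and B) - P(A)P(B); zero under independence, signed otherwise."""
    return t.p11 - t.p_a * t.p_b


def phi(t: Props) -> float:
    """Phi coefficient: PIM scaled by all four marginal proportions."""
    return pim(t) / math.sqrt(t.p_a * t.q_a * t.p_b * t.q_b)


def kappa(t: Props) -> float:
    """Cohen's kappa for a 2x2 table: 2*PIM / (P(A)P(not B) + P(not A)P(B))."""
    return 2 * pim(t) / (t.p_a * t.q_b + t.q_a * t.p_b)


if __name__ == "__main__":
    pos = from_counts(30, 20, 10, 40)      # positive association
    neg = from_counts(10, 40, 30, 20)      # negative association
    swapped = from_counts(30, 10, 20, 40)  # pos with the roles of A and B exchanged
    for name, t in (("positive", pos), ("negative", neg), ("swapped", swapped)):
        print(name, round(pim(t), 4), round(phi(t), 4), round(kappa(t), 4))
```

For these hypothetical tables, phi is about 0.41 and kappa 0.4 for the positive table, both become negative for the negative table, and neither changes when A and B are swapped. Sign and symmetry are exactly the two properties that confidence lacks.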
