Exploration of PIM based similarity measures as association rule thresholds

  • Received: 2012.10.22
  • Reviewed: 2012.11.12
  • Published: 2012.11.30

Abstract

Association rule mining quantifies the relationships among sets of items in a large database, and it is one of the most widely used and well-studied data mining techniques. The primary thresholds for exploring association rules are support, confidence, and lift. Confidence, the most central of these, is an asymmetric measure and takes only non-negative values, so it cannot indicate the direction of an association; this causes several problems when generating association rules between items. To address these problems, in this paper we apply similarity measures based on the probabilistic interestingness measure (PIM), in particular those that do not take marginal proportions into account, as association rule thresholds. We compare support, the two confidences (one for each direction of a rule), lift, and several PIM-based similarity measures on a numerical example. Yule's and Michael's similarity coefficients and Pearson's phi coefficient reveal the degree of association in the same way as confidence, and because their values are signed, they also reveal its direction. By contrast, the measures based on the chi-square statistic take only non-negative values and change in a different pattern from confidence.
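To make the contrast in the abstract concrete, the sketch below computes the basic thresholds and several signed similarity measures from a 2×2 table of transaction counts for a rule X → Y. It assumes the standard textbook formulas (as catalogued, for example, in Warrens, 2008) and assumes the PIM is P(XY) − P(X)P(Y); the paper's exact definitions, and which chi-square-based measures it uses, may differ. Pearson's contingency coefficient is used here only as one illustrative chi-square-based measure, and the names `Table2x2` and `rule_measures` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Hypothetical container for the 2x2 transaction counts of a rule X -> Y."""

    a: int  # transactions containing both X and Y
    b: int  # transactions containing X but not Y
    c: int  # transactions containing Y but not X
    d: int  # transactions containing neither X nor Y

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the measure is undefined (zero denominator)."""
    return num / den if den != 0 else math.nan


def rule_measures(t: Table2x2) -> dict[str, float]:
    """Compute basic thresholds and signed similarity measures for X -> Y.

    The formulas are assumed standard forms, not taken from the paper itself.
    """
    a, b, c, d, n = t.a, t.b, t.c, t.d, t.n
    cross = a * d - b * c  # ad - bc; its sign is the direction of association
    phi = _ratio(cross, math.sqrt((a + b) * (a + c) * (b + d) * (c + d)))
    chi_square = n * phi ** 2  # Pearson chi-square for a 2x2 table, no continuity correction
    return {
        # Basic thresholds: all non-negative; confidence is asymmetric.
        "support": _ratio(a, n),
        "confidence_xy": _ratio(a, a + b),  # P(Y | X)
        "confidence_yx": _ratio(a, a + c),  # P(X | Y)
        "lift": _ratio(n * a, (a + b) * (a + c)),
        # Signed measures; the PIM is assumed to be P(XY) - P(X)P(Y) = (ad - bc) / n^2.
        "pim": _ratio(cross, n ** 2),
        "yule_q": _ratio(cross, a * d + b * c),
        "yule_y": _ratio(math.sqrt(a * d) - math.sqrt(b * c),
                         math.sqrt(a * d) + math.sqrt(b * c)),
        "michael": _ratio(4 * cross, (a + d) ** 2 + (b + c) ** 2),
        "phi": phi,
        # Chi-square-based measures: never negative, so the direction is lost.
        "chi_square": chi_square,
        "contingency_c": math.sqrt(_ratio(chi_square, chi_square + n)),
    }


if __name__ == "__main__":
    # Hypothetical counts with identical margins but opposite associations.
    tables = {
        "positive": Table2x2(a=30, b=10, c=20, d=40),
        "negative": Table2x2(a=10, b=30, c=40, d=20),
    }
    for label, table in tables.items():
        measures = rule_measures(table)
        print(label, {k: round(v, 3) for k, v in measures.items()})
```

For these two hypothetical tables, both confidences stay positive in each case and the chi-square statistic is identical, whereas the PIM, Yule's Q and Y, Michael's coefficient, and phi keep the same magnitude but flip sign. Every signed measure here has the same sign as ad − bc, which equals n² times P(XY) − P(X)P(Y), so its sign gives the direction of association.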

References

  1. Agrawal, R., Imielinski, T. and Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
  2. Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th VLDB Conference, 487-499.
  3. Bayardo, R. J. (1998). Efficiently mining long patterns from databases. Proceedings of ACM SIGMOD Conference on Management of Data, 85-93.
  4. Cai, C. H., Fu, A. W. C., Cheng, C. H. and Kwong, W. W. (1998). Mining association rules with weighted items. Proceedings of International Database Engineering and Applications Symposium, 68-77.
  5. Cho, K. H. and Park, H. C. (2011). Discovery of insignificant association rules using external variable. Journal of the Korean Data Analysis Society, 13, 1343-1352.
  6. Doolittle, M. H. (1885). The verification of predictions. Bulletin of the Philosophical Society of Washington, 7, 122-127.
  7. Han, J. and Fu, Y. (1995). Discovery of multiple-level association rules from large databases. Proceedings of the 21st VLDB Conference, 420-431.
  8. Han, J. and Fu, Y. (1999). Mining multiple-level association rules in large databases. IEEE Transactions on Knowledge and Data Engineering, 11, 68-77.
  9. Han, J., Pei, J. and Yin, Y. (2000). Mining frequent patterns without candidate generation. Proceedings of ACM SIGMOD Conference on Management of Data, 1-12.
  10. Imberman, S., Domanski, B. and Thompson, H. (2001). Boolean analyzer - An algorithm that uses a probabilistic interestingness measure to find dependency/association rules in a head trauma data. Proceedings of Americas Conference on Information Systems, 369-375.
  11. Lim, J., Lee, K. and Cho, Y. (2010). A study of association rule by considering the frequency. Journal of the Korean Data & Information Science Society, 21, 1061-1069.
  12. Liu, B., Hsu, W. and Ma, Y. (1999). Mining association rules with multiple minimum supports. Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, 337-341.
  13. Michael, E. L. (1920). Marine ecology and the coefficient of association. Journal of Ecology, 8, 54-59. https://doi.org/10.2307/2255213
  14. Montgomery, A. C. and Crittenden, K. S. (1977). Improving coding reliability for open-ended questions. Public Opinion Quarterly, 41, 235-243. https://doi.org/10.1086/268378
  15. Orchard, R. A. (1975). On the determination of relationships between computer system state variables. Bell Laboratories Technical Memorandum, Bell Laboratories, New Jersey.
  16. Park, H. C. (2010a). Weighted association rules considering item RFM scores. Journal of the Korean Data & Information Science Society, 21, 1147-1154.
  17. Park, H. C. (2010b). Standardization for basic association measures in association rule mining. Journal of the Korean Data & Information Science Society, 21, 891-899.
  18. Park, H. C. (2011a). Proposition of negatively pure association rule threshold. Journal of the Korean Data & Information Science Society, 22, 179-188.
  19. Park, H. C. (2011b). The proposition of attributably pure confidence in association rule mining. Journal of the Korean Data & Information Science Society, 22, 235-243.
  20. Park, H. C. (2011c). The application of some similarity measures to association rule thresholds. Journal of the Korean Data Analysis Society, 13, 1331-1342.
  21. Park, J. S., Chen, M. S. and Yu, P. S. (1995). An effective hash-based algorithm for mining association rules. Proceedings of ACM SIGMOD Conference on Management of Data, 175-186.
  22. Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. Proceedings of the 7th International Conference on Database Theory, 398-416.
  23. Pearson, K. (1926). On the coefficient of racial likeness. Biometrika, 18, 105-117.
  24. Pearson, K. and Heron, D. (1913). On theories of association. Biometrika, 9, 159-315. https://doi.org/10.1093/biomet/9.1-2.159
  25. Pei, J., Han, J. and Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 21-30.
  26. Piatetsky-Shapiro, G. (1991). Discovery, analysis and presentation of strong rules. In Knowledge Discovery in Databases, AAAI/MIT Press, 229-248.
  27. Srikant, R., Vu, Q. and Agrawal, R. (1997). Mining association rules with item constraints. Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, 67-73.
  28. Toivonen, H. (1996). Sampling large databases for association rules. Proceedings of the 22nd VLDB Conference, 134-145.
  29. Warrens, M. J. (2008). Similarity coefficients for binary data: properties of coefficients, coefficient matrices, multi-way metrics and multivariate coefficients. Doctoral dissertation, Leiden University, Netherlands.
  30. Yule, G. U. (1900). On the association of attributes in statistics. Philosophical Transactions of the Royal Society of London, Series A, 194, 257-319.
  31. Yule, G. U. (1912). On the methods of measuring the association between two attributes. Journal of the Royal Statistical Society, 75, 579-652. https://doi.org/10.2307/2340126

Cited by

  1. Utilization of similarity measures by PIM with AMP as association rule thresholds vol.24, pp.1, 2013, https://doi.org/10.7465/jkdi.2013.24.1.117
  2. Comparison of confidence measures useful for classification model building vol.25, pp.2, 2014, https://doi.org/10.7465/jkdi.2014.25.2.365
  3. Proposition of causally confirmed measures in association rule mining vol.25, pp.4, 2014, https://doi.org/10.7465/jkdi.2014.25.4.857
  4. A study on the ordering of similarity measures with negative matches vol.26, pp.1, 2015, https://doi.org/10.7465/jkdi.2015.26.1.89
  5. A study on the ordering of PIM family similarity measures without marginal probability vol.26, pp.2, 2015, https://doi.org/10.7465/jkdi.2015.26.2.367
  6. Signed Hellinger measure for directional association vol.27, pp.2, 2016, https://doi.org/10.7465/jkdi.2016.27.2.353
  7. Proposition of causal association rule thresholds vol.24, pp.6, 2013, https://doi.org/10.7465/jkdi.2013.24.6.1189
  8. Development of association rule threshold by balancing of relative rule accuracy vol.25, pp.6, 2014, https://doi.org/10.7465/jkdi.2014.25.6.1345
  9. Exploration of relationship between confirmation measures and association thresholds vol.24, pp.4, 2013, https://doi.org/10.7465/jkdi.2013.24.4.835
  10. The proposition of cosine net confidence in association rule mining vol.25, pp.1, 2014, https://doi.org/10.7465/jkdi.2014.25.1.97
  11. The development of symmetrically and attributably pure confidence in association rule mining vol.25, pp.3, 2014, https://doi.org/10.7465/jkdi.2014.25.3.601