The proposition of compared and attributably pure confidence in association rule mining

  • Received : 2013.04.15
  • Accepted : 2013.05.11
  • Published : 2013.05.31

Abstract

Data mining is the process of analyzing big data from different perspectives and summarizing it into useful information. The most widely used data mining technique is association rule generation, which finds the relevance between two items in a huge database. This technique finds relationships between sets of items based on interestingness measures such as support, confidence, and lift. Among the many interestingness measures, confidence is the most frequently used, but it has the drawback that it cannot determine the direction of the association. The attributably pure confidence and the compared confidence can determine the direction of the association, but their ranges are not [-1, +1], so the degree of association cannot be interpreted operationally from their values. This paper proposes a compared and attributably pure confidence to compensate for this drawback and describes some properties of the proposed measure. A numerical example compares confidence, compared confidence, attributably pure confidence, and the proposed measure. The results show that the compared and attributably pure confidence performs better than the other confidence measures.

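Data mining is a technique for discovering knowledge or patterns latent in big data, and association rule mining is one of its representative methods. This method is used to find relationships among items based on association evaluation criteria such as support, confidence, and lift. Various interestingness measures have been developed as criteria for evaluating association. Among them, confidence is the most widely used, but it has the drawback that it cannot indicate the direction of the association. Pure confidence was developed to compensate for this, but when the positive confidence and the negative confidence are equal, the measure takes the same value, so accurate association rules cannot be discovered. To overcome this drawback, the attributably pure confidence and the compared confidence were developed. Both are very desirable as interestingness measures except for a problem with the range of values they can take. To solve this problem, this paper proposes the compared and attributably pure confidence, which considers the magnitudes of the attributably pure confidence and the compared confidence simultaneously. An examination of its usefulness through examples confirmed that the compared and attributably pure confidence reveals the direction of an association rule through its sign and, because its values lie in [-1, +1], permits an operational interpretation.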
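
The basic measures named in the abstract (support, confidence, lift) can be illustrated with a minimal Python sketch. The transaction data, item names, and function names below are hypothetical and are not taken from the paper, and the sketch does not implement the proposed compared and attributably pure confidence, whose formula is not given in the abstract. It only shows why confidence alone cannot reveal the direction of an association: the rule bread -> milk has a fairly high confidence, yet milk is even more common overall, so the lift falls below 1.

```python
from fractions import Fraction


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return Fraction(hits, len(transactions))


def confidence(transactions, x, y):
    """Estimate of P(Y | X) as support(X and Y) / support(X).

    Raises ZeroDivisionError if X never occurs in the data.
    """
    return support(transactions, set(x) | set(y)) / support(transactions, x)


def lift(transactions, x, y):
    """confidence(X -> Y) / support(Y); a value below 1 suggests a negative association."""
    return confidence(transactions, x, y) / support(transactions, y)


# Hypothetical toy data (an assumption for illustration): 10 baskets.
transactions = (
    [{"bread", "milk"}] * 3  # contain both X and Y
    + [{"bread"}] * 1        # contain X only
    + [{"milk"}] * 6         # contain Y only
)

x, y = {"bread"}, {"milk"}
print("support(X and Y) =", support(transactions, x | y))  # 3/10
print("confidence(X->Y) =", confidence(transactions, x, y))  # 3/4
print("support(Y)       =", support(transactions, y))  # 9/10
print("lift(X->Y)       =", lift(transactions, x, y))  # 5/6
```

In this toy data the confidence of 3/4 looks strong in isolation, but it is lower than the baseline frequency of milk (9/10). A signed measure bounded in [-1, +1], which is what the paper aims for, would expose that direction directly.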
