A study on the relatively causal strength measures in a viewpoint of interestingness measure

  • Received : 2016.12.28
  • Accepted : 2017.01.10
  • Published : 2017.01.31

Abstract

Among the techniques for analyzing big data, association rule mining searches for relationships between items using various criteria for evaluating association. Depending on the direction in which a rule is generated, association rules are classified as positive, negative, or inverse. The purpose of this paper is to investigate, from the viewpoint of interestingness measures, to which types of association rules various relative causal strength (RCS) measures can be applied. We also clarify how these measures relate to the various types of confidence among the existing basic evaluation measures. The results show that if the proportion of occurrence of the consequent (posterior) item is 0.5 or more, the first measure proposed by Good (1961), $RCS_{IJ1}$, is preferable to the first measure proposed by Lewis (1986), $RCS_{LR1}$, because its values vary over a wider range; if the proportion is below 0.5, $RCS_{LR1}$ is preferable to $RCS_{IJ1}$.
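
To illustrate the rule directions mentioned above, the following Python sketch computes support and confidence for each direction from a 2×2 table of transaction counts. It is a minimal sketch, not the paper's procedure: it assumes the common convention that $A \rightarrow B$ is a positive rule, $A \rightarrow \neg B$ and $\neg A \rightarrow B$ are negative rules, and $\neg A \rightarrow \neg B$ is an inverse rule. The names `Table2x2`, `support`, and `rule_confidences`, as well as the counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Transaction counts for two items A and B (hypothetical example data)."""
    n11: int  # transactions containing both A and B
    n10: int  # A present, B absent
    n01: int  # A absent, B present
    n00: int  # neither A nor B

    @property
    def n(self) -> int:
        return self.n11 + self.n10 + self.n01 + self.n00


def support(t: Table2x2) -> float:
    """Support of the positive rule A -> B, i.e. P(A and B)."""
    return t.n11 / t.n


def rule_confidences(t: Table2x2) -> dict[str, float]:
    """Confidence P(consequent | antecedent) for each rule direction.

    Assumed labelling: A->B positive, A->~B and ~A->B negative, ~A->~B inverse.
    """
    n_a = t.n11 + t.n10      # transactions containing A
    n_not_a = t.n01 + t.n00  # transactions without A
    return {
        "A->B": t.n11 / n_a,
        "A->~B": t.n10 / n_a,
        "~A->B": t.n01 / n_not_a,
        "~A->~B": t.n00 / n_not_a,
    }


table = Table2x2(n11=40, n10=10, n01=20, n00=30)
print(support(table))           # 0.4
print(rule_confidences(table))  # {'A->B': 0.8, 'A->~B': 0.2, '~A->B': 0.4, '~A->~B': 0.6}
```

Because $P(B \mid A) + P(\neg B \mid A) = 1$, the confidences of $A \rightarrow B$ and $A \rightarrow \neg B$ always sum to 1, as do those of $\neg A \rightarrow B$ and $\neg A \rightarrow \neg B$; this is one simple way in which the different confidence types are tied together.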
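
The abstract does not state the formulas for $RCS_{IJ1}$ and $RCS_{LR1}$. As an illustration only, the Python sketch below assumes the forms catalogued by Fitelson and Hitchcock (2011), written for a candidate cause $A$ and effect $B$: Lewis's measure $1 - P(B \mid \neg A)/P(B \mid A)$, and a normalized Good-type measure $1 - P(\neg B \mid A)/P(\neg B \mid \neg A)$, which equals Cheng's (1997) causal power and is a monotone transform of Good's $\log[P(\neg B \mid \neg A)/P(\neg B \mid A)]$. The paper's own definitions may differ, and the function names `lewis_rcs`, `good_rcs`, and `good_log` are hypothetical.

```python
import math


def _conditionals(n11: int, n10: int, n01: int, n00: int) -> tuple[float, float]:
    """Return (P(B | A), P(B | ~A)) estimated from 2x2 transaction counts.

    n11: A and B, n10: A without B, n01: B without A, n00: neither.
    """
    return n11 / (n11 + n10), n01 / (n01 + n00)


def lewis_rcs(n11: int, n10: int, n01: int, n00: int) -> float:
    """Assumed Lewis-type relative strength: 1 - P(B | ~A) / P(B | A)."""
    p1, p0 = _conditionals(n11, n10, n01, n00)
    if p1 == 0.0:
        raise ValueError("undefined when P(B | A) = 0")
    return 1.0 - p0 / p1


def good_rcs(n11: int, n10: int, n01: int, n00: int) -> float:
    """Assumed normalized Good-type strength: 1 - P(~B | A) / P(~B | ~A).

    This equals Cheng's causal power and is a monotone transform of
    Good's log[P(~B | ~A) / P(~B | A)] (see good_log below).
    """
    p1, p0 = _conditionals(n11, n10, n01, n00)
    if p0 == 1.0:
        raise ValueError("undefined when P(~B | ~A) = 0")
    return 1.0 - (1.0 - p1) / (1.0 - p0)


def good_log(n11: int, n10: int, n01: int, n00: int) -> float:
    """Good's log-ratio form log[P(~B | ~A) / P(~B | A)]."""
    p1, p0 = _conditionals(n11, n10, n01, n00)
    if p1 == 1.0 or p0 == 1.0:
        raise ValueError("undefined when P(~B | A) = 0 or P(~B | ~A) = 0")
    return math.log((1.0 - p0) / (1.0 - p1))


# Hypothetical counts (n11, n10, n01, n00); here P(B) = 60 / 100 = 0.6.
counts = (40, 10, 20, 30)
print(round(lewis_rcs(*counts), 4))  # 0.5
print(round(good_rcs(*counts), 4))   # 0.6667
print(round(good_log(*counts), 4))   # 1.0986
```

Both measures are 0 when $P(B \mid A) = P(B \mid \neg A)$ and positive when $A$ raises the probability of $B$. Lewis's form has $P(B \mid A)$ in its denominator, while the Good/Cheng form has $P(\neg B \mid \neg A)$, so how common $B$ is affects the two differently; this is offered only as intuition for the 0.5 threshold in the abstract, not as a reproduction of the paper's analysis.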

References

  1. Ahn, K. and Kim, S. (2003). A new interestingness measure in association rules mining. Journal of the Korean Institute of Industrial Engineers, 29, 41-48.
  2. Liu, B., Hsu, W., Chen, S. and Ma, Y. (2000). Analyzing the subjective interestingness of association rules. IEEE Intelligent Systems, 15, 47-55. https://doi.org/10.1109/5254.889106
  3. Cheng, P. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367-405. https://doi.org/10.1037/0033-295X.104.2.367
  4. Eells, E. (1991). Probabilistic causality. Cambridge University Press, Cambridge, U.K.
  5. Fitelson, B. and Hitchcock, C. (2011). Probabilistic measures of causal strength. Causality in the sciences, Oxford University Press, Oxford, 600-627.
  6. Freitas, A. (1999). On rule interestingness measures. Knowledge-Based Systems, 12, 309-315. https://doi.org/10.1016/S0950-7051(99)00019-2
  7. Good, I. J. (1961). A causal calculus I. British Journal for the Philosophy of Science, 11, 305-318.
  8. Good, I. J. (1962). A causal calculus II. British Journal for the Philosophy of Science, 12, 43-51.
  9. Hilderman, R. J. and Hamilton, H. J. (2000). Applying objective interestingness measures in data mining systems. Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, Springer-Verlag, London, UK, 432-439.
  10. Jin, D. S., Kang, C., Kim, K. K. and Choi, S. B. (2011). CRM on travel agency using association rules. Journal of the Korean Data Analysis Society, 13, 2945-2952.
  11. Kang, M., Kim, S. and Park, S. (2012). Analysis and utilization of big data. Communications of the Korean Institute of Information Scientists and Engineers, 30, 25-32.
  12. Kim, H. and Lee, M. (2016). Big data and entertainment content: Case studies and prospects. Journal of Internet Computing and Services, 17, 109-118.
  13. Lewis, D. (1986). Postscripts to causation. Philosophical Papers, 2, 173-213.
  14. Park, H. C. (2014a). Comparison of cosine family similarity measures in the aspect of association rule. Journal of the Korean Data Analysis Society, 16, 729-737.
  15. Park, H. C. (2014b). Comparison of confidence measures useful for classification model building. Journal of the Korean Data & Information Science Society, 25, 1-7. https://doi.org/10.7465/jkdi.2014.25.1.1
  16. Park, H. C. (2015a). Proposition of balanced comparative confidence considering all available diagnostic tools. Journal of the Korean Data & Information Science Society, 26, 611-618. https://doi.org/10.7465/jkdi.2015.26.3.611
  17. Park, H. C. (2015b). Comparison study of symmetric confirmation measures and probabilistic interestingness measure. Journal of the Korean Data Analysis Society, 17, 749-758.
  18. Park, H. C. (2015c). A study on the bounds of PIM based similarity measures with AMP. Journal of the Korean Data Analysis Society, 17, 1839-1847.
  19. Park, H. C. (2016). Signed Hellinger measure for directional association. Journal of the Korean Data & Information Science Society, 27, 353-362. https://doi.org/10.7465/jkdi.2016.27.2.353
  20. Silberschatz, A. and Tuzhilin, A. (1996). What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering, 8, 970-974. https://doi.org/10.1109/69.553165
  21. Tan, P. N., Kumar, V. and Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, New York, USA, 32-41.