• Title/Summary/Keyword: association measure (연관성 측도)

Search Result 47, Processing Time 0.021 seconds

Non-linear regression model considering all association thresholds for decision of association rule numbers (기본적인 연관평가기준 전부를 고려한 비선형 회귀모형에 의한 연관성 규칙 수의 결정)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.2 / pp.267-275 / 2013
  • Among data mining techniques, the association rule is the most recently developed, and it finds the relevance between two items in a large database. It is applied directly in the field because it clearly quantifies the relationship between two or more items. To determine whether an association rule is meaningful, we use interestingness measures such as support, confidence, and lift. These measures are useful because they give a statistical or logical basis for pruning uninteresting rules. However, the thresholds for these measures are chosen by experience, and the number of useful rules is hard to estimate. If too many rules are generated, the useful ones cannot be extracted effectively. In this paper, we designed a variety of non-linear regression equations, considering all association thresholds, that relate the number of rules to the three interestingness measures. We then diagnosed multicollinearity and autocorrelation problems, and used analysis of variance results and adjusted coefficients of determination to select the best model through numerical experiments.
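
The three interestingness measures named in this abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of support, confidence, and lift, not the paper's regression models; the function name `rule_measures` and the toy transactions are assumptions made for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Standard definitions, with N = number of transactions:
      support    = count(A and B) / N
      confidence = count(A and B) / count(A)
      lift       = confidence / (count(B) / N)
    """
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_b = sum(1 for t in transactions if b <= set(t))
    n_ab = sum(1 for t in transactions if (a | b) <= set(t))
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift


# Hypothetical toy data: four market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_measures(transactions, {"bread"}, {"milk"})
print(support, confidence, lift)
```

A rule is usually kept only when each measure exceeds a user-chosen threshold, which is why the number of surviving rules depends so strongly on the thresholds.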

A Spatial Statistical Approach to Residential Differentiation (I): Developing a Spatial Separation Measure (거주지 분화에 대한 공간통계학적 접근 (I): 공간 분리성 측도의 개발)

  • Lee, Sang-Il
    • Journal of the Korean Geographical Society / v.42 no.4 / pp.616-631 / 2007
  • Residential differentiation is an academic theme that has received enormous attention in urban studies, because residential segregation can be seen as one of the best indicators of the socio-spatial dialectics occurring in urban space. Measuring how one population group is differentiated from another in terms of residential space has been a focal point of residential segregation studies. The index of dissimilarity has been the most extensively used measure. Despite its popularity, however, it has been criticized for its inability to capture the degree of spatial clustering that unevenly distributed population groups usually display. Further, the spatial indices of segregation that have been introduced to rectify the problems of the index of dissimilarity also have drawbacks: significance testing methods have never been provided, and recent advances in spatial statistics have not been extensively exploited. Thus, the main purpose of the research is to devise a spatial separation measure that gauges not only how unevenly two population groups are distributed over urban space, but also how strongly the uneven distributions are spatially clustered (spatial dependence). The main results are as follows. First, a new measure is developed by integrating spatial association measures and spatial chi-square statistics. A significance testing method based on the generalized randomization test is also provided. Second, a case study of residential differentiation among groups by educational attainment in major Korean metropolitan cities clearly shows the applicability of the analytical framework presented in the paper.
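
The criticism of the index of dissimilarity mentioned in this abstract can be seen in a short sketch. It uses the conventional formula D = 0.5 * sum(|a_i/A - b_i/B|) over areal units, not the spatial separation measure the paper develops; the function name and the counts are hypothetical.

```python
def dissimilarity_index(group_a, group_b):
    """Conventional index of dissimilarity between two groups.

    group_a[i] and group_b[i] are the counts of each group in areal unit i.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per areal unit")
    total_a = sum(group_a)
    total_b = sum(group_b)
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical counts for three areal units.
d_original = dissimilarity_index([10, 20, 70], [30, 30, 40])

# Rearranging the same units in space leaves the index unchanged,
# because the formula never looks at which units are neighbours.
d_rearranged = dissimilarity_index([70, 10, 20], [40, 30, 30])
print(d_original, d_rearranged)
```

Because the value depends only on the per-unit counts, two cities with very different spatial clustering can receive the same index, which is the gap a spatial measure is meant to close.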

Negatively attributable and pure confidence for generation of negative association rules (음의 연관성 규칙 생성을 위한 음의 기여 순수 신뢰도의 제안)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.23 no.5 / pp.939-948 / 2012
  • The most widely used data mining technique is the exploration of association rules. This technique has been used to find relationships between items in a massive database based on interestingness measures such as support, confidence, and lift. Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control. In general, the association rule technique generates rules of the form 'If A, then B', whereas the negative association rule technique generates rules of the form 'If A, then not B' or 'If not A, then B'. Only when negative association rules are added to the existing association rules can we decide whether to promote other products in addition to the product itself. In this paper, we proposed the negatively attributable and pure confidence to overcome the problems faced by the negative association rule technique, and then checked three conditions for an interestingness measure. Comparative studies of negative confidence, negatively pure confidence, and negatively attributable and pure confidence are shown through numerical examples. The results show that the negatively attributable and pure confidence is better than negative confidence and negatively pure confidence.
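
The rule forms in this abstract can be illustrated as plain conditional probabilities from a 2x2 contingency table. This is only a sketch of the ordinary confidence of each rule form; it is not the negatively attributable and pure confidence proposed in the paper, and the function name and counts are hypothetical.

```python
def conditional_confidences(n_ab, n_a_notb, n_nota_b, n_nota_notb):
    """Ordinary confidences of positive and negative rule forms.

    Arguments are the four cell counts of the 2x2 table for items A and B.
    """
    n_a = n_ab + n_a_notb            # transactions containing A
    n_nota = n_nota_b + n_nota_notb  # transactions not containing A
    return {
        "A -> B": n_ab / n_a,
        "A -> not B": n_a_notb / n_a,
        "not A -> B": n_nota_b / n_nota,
    }


# Hypothetical counts: 100 transactions in total.
conf = conditional_confidences(n_ab=20, n_a_notb=60, n_nota_b=15, n_nota_notb=5)
print(conf)
```

With these counts the negative rule 'If A, then not B' has much higher confidence than the positive rule, the kind of case where negative rules add information.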

Comparisons of several measures related to k-anonymity algorithms (k-익명성 알고리즘 관련 측도들에 대한 비교 분석)

  • Youn-kyoung Shin; Ju-Sung Kang
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.1405-1408 / 2008
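  • We compare and analyze various measures associated with the concept of k-anonymity, one of the techniques for maximizing data utility while minimizing the risk of exposing personal information. Given original data and transformed data, we examine the consistency and distinctiveness with which measures proposed in different k-anonymity algorithms, such as height, precision, loss metric, cost, and score, measure the accuracy of the data, and we compare the meaning and efficiency of these measures according to their characteristics.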
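
As background for the measures compared in this abstract, the sketch below checks the k-anonymity level of a generalized table: the size of the smallest group of records sharing the same quasi-identifier values. It does not implement the height, precision, loss metric, cost, or score measures; the function name, the column names, and the records are hypothetical.

```python
from collections import Counter


def k_anonymity_level(records, quasi_identifiers):
    """Return the largest k for which the table is k-anonymous.

    This is the size of the smallest equivalence class formed by the
    quasi-identifier columns.
    """
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Hypothetical generalized table: age and zip code are quasi-identifiers.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cold"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
    {"age": "40-49", "zip": "148**", "disease": "cold"},
]
k = k_anonymity_level(records, ["age", "zip"])
print(k)
```

Coarser generalization raises k but lowers data accuracy, which is the trade-off the compared measures try to quantify.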

Association rule ranking function by decreased lift influence (향상도 영향 감소화에 의한 연관성 순위결정함수)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.397-405 / 2010
  • Data mining is a method for finding useful information in large amounts of data in a database, and one of its important goals is to search for and decide the association among several variables. The task of association rule mining is to find certain association relationships among a set of data items in a database. There are three primary measures for association rules: support, confidence, and lift. In this paper, we developed an association rule ranking function by decreased lift influence to generate association rules for items satisfying at least one of the three criteria. We compared our function with the functions suggested by Park (2010) and Wu et al. (2004) using some numerical examples. As a result, we found that our decision function was better than the functions of Park and Wu et al. because our function had a value between -1 and 1 regardless of the range of the three association thresholds. Our function had the value 1 if all three association measures were greater than their thresholds and the value -1 if all three measures were smaller than their thresholds.

Association rule ranking function using conditional probability increment ratio (조건부 확률증분비를 이용한 연관성 순위 결정 함수)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.21 no.4 / pp.709-717 / 2010
  • The task of association rule mining is to find certain association relationships among a set of data items in a database. There are three primary measures for association rules: support, confidence, and lift. In this paper, we developed an association rule ranking function using the conditional probability increment ratio. We compared our function with several association rule ranking functions using some numerical examples. As a result, we found that our decision function was better than the existing functions. The reasons were that the reference value of the proposed function is not affected by any particular association threshold, and that our function had a value between -1 and 1 regardless of the range of the three association thresholds. We also found that the ranking function using the conditional probability increment ratio reflected very well the difference between each association rule measure and its minimum threshold.

The proposition of attributably pure confidence in association rule mining (연관 규칙 마이닝에서 기여 순수 신뢰도의 제안)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.22 no.2 / pp.235-243 / 2011
  • The most widely used data mining technique is the exploration of association rules. This technique has been used to find the relationship between sets of items based on association thresholds such as support, confidence, and lift. There are many interestingness measures that serve as criteria for evaluating association rules. Among them, confidence is the most frequently used, but it has the drawback that it cannot determine the direction of the association. The net confidence measure was developed to compensate for this drawback, but it is useless when the value of the positive confidence equals that of the negative confidence. This paper proposes an attributably pure confidence to evaluate association rules and then describes some properties of the proposed measure. Comparative studies of confidence, net confidence, and attributably pure confidence are shown through a numerical example. The results show that the attributably pure confidence is better than confidence or net confidence.

Generally non-linear regression model containing standardized lift for association number estimation (연관성 규칙 수의 추정을 위한 일반적인 비선형 회귀모형에서의 표준화 향상도 활용 방안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.629-638 / 2016
  • Among data mining techniques, the association rule is one of the most widely used in practice because it clearly displays the relationship between two or more items in large databases by quantifying that relationship. There are three primary quality measures for association rules: support, confidence, and lift. We evaluate association rules using these measures. Previous literature has estimated the number of association rules by either a determination function method or a regression modeling approach. In this paper, we proposed several non-linear regression equations useful for estimating the number of rules and evaluated the estimated association rules using the quality measures. Furthermore, we assessed their usefulness compared with conventional regression models using the values of the regression coefficients, F statistics, adjusted coefficients of determination, and variance inflation factors.

Proposition of balanced comparative confidence considering all available diagnostic tools (모든 가능한 진단도구를 활용한 균형비교신뢰도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.611-618 / 2015
  • According to Wikipedia, big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Data mining is the computational process of discovering patterns in huge data sets, involving methods at the intersection of association rules, decision trees, clustering, artificial intelligence, and machine learning. The association rule is a well-researched method for discovering interesting relationships between itemsets in huge databases and has been applied in various fields. There are positive, negative, and inverse association rules according to the direction of association. When setting the evaluation criteria for association rules, it may be desirable to consider the three types of association rules at the same time. To this end, we proposed a balanced comparative confidence that considers sensitivity, specificity, false positives, and false negatives, checked the conditions for an association threshold given by Piatetsky-Shapiro, and compared it with comparative confidence and inversely comparative confidence through a few experiments.

A clustering method with some side conditions on the cluster (群集間에 制約條件이 있는 경우의 群集方法에 대한 연구)

  • 김성주
    • The Korean Journal of Applied Statistics / v.1 no.1 / pp.45-56 / 1987
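  • This paper views electoral districting as a clustering method subject to two constraints: each district must form a connected whole, and districts must be equally proportioned. A new measure of the similarity between two regions is developed, defined as the number of times the two regions agree on several criteria considered important for forming a single electoral district. Based on this similarity measure, a new hierarchical clustering method for electoral districting is presented. The results obtained by applying the new similarity measure and hierarchical clustering method to the 29 cities and counties of Gyeonggi-do are compared with, and explained against, the province's current National Assembly electoral districts.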
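
The similarity measure described in this abstract, the number of criteria on which two regions agree, can be sketched as follows. The criteria names and region attributes are hypothetical placeholders, since the abstract does not list the criteria actually used, and the contiguity and proportionality constraints are not modeled here.

```python
def agreement_similarity(region_x, region_y, criteria):
    """Count the criteria on which two regions have the same value."""
    return sum(1 for c in criteria if region_x[c] == region_y[c])


# Hypothetical criteria and attribute values for three regions.
criteria = ["river_basin", "commuting_zone", "former_county", "school_district"]
region_a = {"river_basin": "north", "commuting_zone": 1,
            "former_county": "X", "school_district": 7}
region_b = {"river_basin": "north", "commuting_zone": 1,
            "former_county": "Y", "school_district": 7}
region_c = {"river_basin": "south", "commuting_zone": 2,
            "former_county": "Y", "school_district": 3}

sim_ab = agreement_similarity(region_a, region_b, criteria)
sim_ac = agreement_similarity(region_a, region_c, criteria)
print(sim_ab, sim_ac)
```

A hierarchical clustering procedure would merge the most similar pair first, here regions a and b, provided the merge also satisfies the side conditions.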