Similarity Measure Design on High Dimensional Data

  • Nipon, Theera-Umpon (Department of Electrical Engineering, Faculty of Engineering, Chiang Mai University)
  • Lee, Sanghyuk (Department of Electrical and Electronics Engineering, Xi'an Jiaotong-Liverpool University)
  • Received : 2013.03.12
  • Accepted : 2013.03.22
  • Published : 2013.03.31

Abstract

This paper designs a similarity measure for high-dimensional data. The similarity between high-dimensional data is obtained by analysing neighbor information with respect to the data sets. The result could be applied to big data, because big data has multiple characteristics compared with a simple data set; the analysis of high-dimensional data can therefore serve as a preliminary study for big data. The high-dimensional analysis is also compared with the conventional similarity measure. The traditional similarity measure on overlapped data is illustrated, and an application to non-overlapped data is carried out. The usefulness of the measure is established by mathematical proof and verified by calculating the similarity for an artificial data example.
