A Clustering Algorithm for Handling Missing Data

  • Jong-Chan Lee (Department of Internet, Chungwoon University)
  • Received : 2017.09.29
  • Accepted : 2017.11.20
  • Published : 2017.11.28

Abstract

In ubiquitous computing environments, data from many kinds of sensors must be transmitted over long distances. When data arriving from different locations are integrated, some records differ in their attribute values or have values missing, and these records must still be processed. This paper presents a cluster analysis method for such data. The core of the method is to define an objective function suited to the problem and to develop an algorithm that can optimize it. The objective function is a modified form of the OCS objective function. Mean Field Annealing (MFA), which previously could handle only binary-valued data, is extended so that it applies to continuous values; the extended algorithm, called CMFA, is used as the optimizer.
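The abstract does not give the modified OCS objective or the CMFA update rules, so the following is only a minimal sketch of the general idea: annealed soft clustering of continuous data with missing values. It is closer to generic deterministic annealing than to the paper's algorithm. The objective (a sum of membership-weighted squared distances), the partial-distance treatment of missing values (marked here as `None`), the cooling schedule, and all function names are assumptions made for illustration.

```python
import math

def partial_sq_dist(x, c):
    """Squared distance over observed features, rescaled by (total / observed)."""
    obs = [(xi - ci) ** 2 for xi, ci in zip(x, c) if xi is not None]
    if not obs:
        return 0.0  # nothing observed: the point is equally far from every center
    return len(x) * sum(obs) / len(obs)

def soft_assign(x, centers, temp):
    """Mean-field style (Boltzmann / softmax) memberships of one point."""
    d = [partial_sq_dist(x, c) for c in centers]
    d_min = min(d)  # shift so the largest weight is exactly 1 (numerical stability)
    w = [math.exp(-(di - d_min) / temp) for di in d]
    total = sum(w)
    return [wi / total for wi in w]

def update_centers(data, memberships, centers):
    """Membership-weighted mean per feature, using observed values only."""
    new_centers = []
    for k, old in enumerate(centers):
        center = []
        for j, old_j in enumerate(old):
            num = den = 0.0
            for x, u in zip(data, memberships):
                if x[j] is not None:
                    num += u[k] * x[j]
                    den += u[k]
            # Keep the previous coordinate if no weight fell on this feature.
            center.append(num / den if den > 1e-12 else old_j)
        new_centers.append(center)
    return new_centers

def anneal_cluster(data, init_centers, t_start=10.0, t_end=0.01,
                   cooling=0.9, sweeps=5):
    """Alternate soft assignment and center updates while lowering the temperature."""
    centers = [list(c) for c in init_centers]
    temp = t_start
    memberships = [soft_assign(x, centers, temp) for x in data]
    while temp > t_end:
        for _ in range(sweeps):
            memberships = [soft_assign(x, centers, temp) for x in data]
            centers = update_centers(data, memberships, centers)
        temp *= cooling  # geometric cooling (an assumed schedule)
    return centers, memberships

# Toy data: two well-separated groups, with two missing values (None).
data = [
    [0.0, 0.1], [0.2, None], [0.1, -0.1],
    [5.0, 5.1], [None, 4.9], [5.2, 5.0],
]
centers, memberships = anneal_cluster(data, init_centers=[[1.0, 1.0], [4.0, 4.0]])
labels = [max(range(len(u)), key=u.__getitem__) for u in memberships]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

At high temperature every point belongs partly to every cluster; as the temperature falls the memberships harden toward 0 or 1. The records with a missing value are still assigned, using only the feature that was observed.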
