ACCELERATION OF MACHINE LEARNING ALGORITHMS BY TCHEBYCHEV ITERATION TECHNIQUE

  • LEVIN, MIKHAIL P. (INSTITUTE OF SYSTEM PROGRAMMING OF RUSSIAN ACADEMY OF SCIENCES)
  • Received : 2017.05.11
  • Accepted : 2017.12.18
  • Published : 2018.03.25

Abstract

Recently, Machine Learning algorithms have been widely used to process Big Data in various applications, and many of these applications run in real time. Therefore, the speed of Machine Learning algorithms is a critical issue for such applications. However, most modern iterative Machine Learning algorithms use the successive iteration technique well known in Numerical Linear Algebra. This technique converges very slowly, requires many iterations to obtain a solution of the problems under consideration, and therefore consumes a lot of processing time even on modern multi-core computers and clusters. The Tchebychev iteration technique, also well known in Numerical Linear Algebra, is an attractive candidate for decreasing the number of iterations in iterative Machine Learning algorithms and thereby decreasing their running time, which is especially important in real-time applications. In this paper we consider the use of Tchebychev iterations to accelerate the well-known K-Means clustering and SVM (Support Vector Machine) algorithms in Machine Learning. Several examples of applying our approach on modern multi-core computers under the Apache Spark framework are considered and discussed.
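
As a brief illustration of the kind of acceleration meant here, the sketch below (written for this text, not code from the paper) compares plain successive (Richardson) iteration for a symmetric positive definite linear system with the same iteration driven by the classical cyclic set of Chebyshev parameters. The eigenvalue bounds lam_min and lam_max, the toy matrix, and all function names are assumptions for the example only; in the paper's setting, analogous parameters would have to be chosen for the iterative K-Means and SVM updates.

# Minimal sketch: Chebyshev-parameter acceleration of simple iteration
# (illustrative only; not the authors' implementation).
import numpy as np

def successive_iteration(A, b, tau, iters):
    # Plain successive iteration with one fixed parameter: x <- x + tau*(b - A x).
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x + tau * (b - A @ x)
    return x

def chebyshev_iteration(A, b, lam_min, lam_max, n_cycle):
    # Same iteration, but with the cyclic Chebyshev parameter set tau_k.
    x = np.zeros_like(b)
    tau0 = 2.0 / (lam_min + lam_max)
    rho0 = (lam_max - lam_min) / (lam_max + lam_min)
    for k in range(1, n_cycle + 1):
        t_k = np.cos((2 * k - 1) * np.pi / (2 * n_cycle))  # Chebyshev node
        tau_k = tau0 / (1.0 + rho0 * t_k)                   # varying step parameter
        x = x + tau_k * (b - A @ x)
    return x

# Toy SPD system: after the same number of steps, the Chebyshev parameters
# give a far smaller error than the single optimal parameter tau0.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)          # well-conditioned SPD matrix (assumed example)
b = rng.standard_normal(50)
lam = np.linalg.eigvalsh(A)            # eigenvalue bounds assumed known here
x_cheb = chebyshev_iteration(A, b, lam[0], lam[-1], n_cycle=30)
x_simple = successive_iteration(A, b, 2.0 / (lam[0] + lam[-1]), iters=30)
x_true = np.linalg.solve(A, b)
print(np.linalg.norm(x_cheb - x_true), np.linalg.norm(x_simple - x_true))

With the single optimal parameter the error shrinks by a fixed factor per step, whereas the cyclic Chebyshev parameters shrink it by the much smaller Chebyshev polynomial factor over each cycle; this reduction in the number of iterations is the effect exploited for the Machine Learning algorithms discussed in the paper.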
