Clustering Algorithm using the DFP-Tree based on the MapReduce

  • Seo, Young-Won (Department of IT Convergence and Application Engineering, Pukyong National University) ;
  • Kim, Chang-soo (Department of IT Convergence and Application Engineering, Pukyong National University)
  • Received : 2015.09.10
  • Accepted : 2015.11.10
  • Published : 2015.12.31

Abstract

As big data has drawn attention, many applications that operate on the results of data analysis have been developed. Typical examples are product recommendation services in e-commerce systems, search services in search engines, and friend recommendation services in social network services. In this paper, we propose a decision frequent pattern tree (DFP-Tree), which combines the frequent pattern tree, an existing data mining technique that mines similar patterns appearing in a data set, with the decision tree, which is grounded in computer science theory. The decision frequent pattern tree algorithm addresses a weakness of the frequent pattern tree: it generates so many patterns that the data become hard to analyze. We also propose a model of the algorithm for the MapReduce framework, a programming model that supports operation in a distributed environment.

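As big data has become a prominent issue, many applications that operate on the results of data analysis have been studied; representative examples include product recommendation in e-commerce systems, search services in search engines, and friend recommendation in social network services. This paper proposes a decision frequent tree algorithm that combines the frequent pattern tree, an existing data mining technique that mines similar patterns appearing in a data set, with the decision tree, which is based on computer science theory. The existing frequent pattern tree algorithm guarantees accurate pattern generation from the pattern tree, but for data that exhibit diverse patterns, such as social data, it generates a large number of patterns and makes analysis difficult. The proposed algorithm addresses this problem by transforming the tree into a model that judges whether it converges with its subtrees. The algorithm is also modeled in MapReduce to provide high-speed processing through distributed computation.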
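The abstract does not spell out the DFP-Tree construction, so the following is only a minimal sketch of the two standard building blocks it names: counting item support in a map/reduce style, then inserting transactions into an ordinary frequent pattern tree. The names `FPNode`, `map_count`, `reduce_counts`, `build_fp_tree`, and the sample transactions are hypothetical, and the decision-tree convergence step of the paper is not modeled here.

```python
from collections import Counter
from itertools import chain


class FPNode:
    """One node of a plain FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def map_count(transaction):
    # Map phase (simulated locally): emit (item, 1) once per distinct item.
    return [(item, 1) for item in set(transaction)]


def reduce_counts(pairs):
    # Reduce phase (simulated locally): sum the emitted values per item key.
    totals = Counter()
    for item, n in pairs:
        totals[item] += n
    return totals


def build_fp_tree(transactions, min_support):
    # Pass 1: item support via the map and reduce steps above.
    pairs = chain.from_iterable(map_count(t) for t in transactions)
    support = reduce_counts(pairs)
    frequent = {i: c for i, c in support.items() if c >= min_support}

    # Pass 2: insert each transaction, keeping frequent items only,
    # ordered by descending support (ties broken by item name).
    root = FPNode(None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


# Hypothetical sample data, not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer"],
    ["milk", "diaper", "beer"],
    ["bread", "milk", "diaper"],
    ["bread", "milk", "beer"],
]
root, frequent = build_fp_tree(transactions, min_support=3)
```

In a real MapReduce deployment the map and reduce functions would run on separate workers over partitions of the data; here they run in one process purely to show the shape of the computation. Shared prefixes such as `bread → milk` collapse into a single path whose counts accumulate, which is what keeps the tree compact.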

Cited by

  1. Development of Supervised Machine Learning based Catalog Entry Classification and Recommendation System vol.20, pp.1, 2015, https://doi.org/10.7472/jksii.2019.20.1.57