Two-Phase Clustering Method Considering Mobile App Trends

Heo, Jeong-Man;Park, So-Young;

doi:10.9708/jksci.2015.20.4.017

Journal of the Korea Society of Computer and Information

Vol. 20, No. 4 / pp. 17-23 / 2015 / 1598-849X (pISSN) / 2383-9945 (eISSN)

Korean Society of Computer Information

Heo, Jeong-Man (Dept. of Game Design & Development, SangMyung University) ;
Park, So-Young (Dept. of Game Design & Development, SangMyung University)

Received: 2014.11.13
Reviewed: 2015.03.16
Published: 2015.04.30

https://doi.org/10.9708/jksci.2015.20.4.017

Abstract

In this paper, we propose a method for clustering mobile apps using word clusters. Because mobile app trends change quickly, the proposed method does not rely on a predefined category system; instead, it applies a clustering algorithm to the set of mobile apps and groups semantically similar apps together. To alleviate the data sparseness problem caused by short mobile app descriptions, the proposed method uses not only the unigram of each word but also its bigram, trigram, and word cluster information. To cluster mobile apps accurately overall, the proposed method uses the word clusters to keep mobile app clusters from becoming excessively small or large. Experimental results show that using the word clusters improves overall accuracy by 22.18 percentage points, from 57.48% to 79.66%.
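
The role of word clusters in easing data sparseness can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy descriptions, and the `word_cluster` mapping are all hypothetical, and cosine similarity is used only as a convenient way to compare feature vectors. The sketch builds unigram, bigram, trigram, and word-cluster features for short descriptions and shows that two descriptions sharing no words can still overlap through a shared word cluster.

```python
from collections import Counter
from math import sqrt


def extract_features(tokens, word_cluster):
    """Count unigram, bigram, trigram, and word-cluster features for one description."""
    feats = Counter()
    # n-gram features for n = 1, 2, 3
    for n in (1, 2, 3):
        for i in range(len(tokens) - n + 1):
            feats[f"{n}gram:" + " ".join(tokens[i:i + n])] += 1
    # one cluster feature per token that has a known cluster ID
    for tok in tokens:
        if tok in word_cluster:
            feats[f"cluster:{word_cluster[tok]}"] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical word clusters (assumed to come from a separate word-clustering step).
word_cluster = {"photo": 0, "camera": 0, "filter": 0, "selfie": 0,
                "puzzle": 1, "game": 1}

# Toy app descriptions, already tokenized (illustrative data only).
app_a = extract_features(["photo", "filter"], word_cluster)
app_b = extract_features(["selfie", "camera"], word_cluster)
app_c = extract_features(["puzzle", "game"], word_cluster)

sim_ab = cosine(app_a, app_b)  # no shared words, but a shared word cluster
sim_ac = cosine(app_a, app_c)  # no shared words and no shared word cluster
print(sim_ab > sim_ac)
```

In this toy setting the first two descriptions have a nonzero similarity solely because their words fall into the same cluster, which is the kind of extra evidence short texts otherwise lack. How the paper actually weights these features and controls cluster sizes is not specified in the abstract.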
