Related Term Extraction with a Proximity Matrix for Query-Related Issue Detection Using Twitter

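  • 김제상 (School of Computer Engineering, Kumoh National Institute of Technology) ;
  • 조효근 (School of Computer Engineering, Kumoh National Institute of Technology) ;
  • 김동성 (School of Computer Engineering, Kumoh National Institute of Technology) ;
  • 김병만 (Department of Computer Software Engineering, Kumoh National Institute of Technology) ;
  • 이현아 (Department of Computer Software Engineering, Kumoh National Institute of Technology)
  • Received : 2013.08.12
  • Reviewed : 2013.11.14
  • Published : 2014.01.31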

Abstract

Social network services (SNS) such as Twitter and Facebook are a good knowledge source for detecting issues such as public interests and trends. This paper treats the issues and topics related to a search query as terms related to that query, and proposes a method for extracting them from Twitter. The method assumes that a word highly related to the query frequently occurs close to the query. It therefore computes the relatedness between two words in the retrieved documents as the sum of their proximities, where proximity is inversely proportional to the distance between the words and proportional to their co-occurrence frequency. Terms whose relatedness exceeds a threshold are extracted as related terms, and the related issues are presented as a connected network. By analyzing the properties of this network, in particular its single transitions, compound words can be detected easily.
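The relatedness computation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything specific here is an assumption made for illustration: proximity is taken to be 1/distance, the `window` cutoff, the function names `proximity_matrix` and `related_terms`, the sample tweets, and the threshold value are all hypothetical. Tokenization is also assumed to have been done already; Korean tweets would need a morphological analyzer first.

```python
from collections import defaultdict


def proximity_matrix(docs, window=5):
    """Sum an assumed proximity of 1/distance over all word pairs.

    Each co-occurrence within `window` tokens adds 1/distance, so the total
    grows with co-occurrence frequency and shrinks with distance.
    """
    matrix = defaultdict(float)
    for tokens in docs:
        for i, left in enumerate(tokens):
            # Look at the next `window` tokens to the right of position i.
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                right = tokens[j]
                if left == right:
                    continue
                pair = tuple(sorted((left, right)))  # unordered pair as key
                matrix[pair] += 1.0 / (j - i)
    return matrix


def related_terms(matrix, query, threshold):
    """Return (term, relatedness) pairs whose relatedness exceeds the threshold."""
    scores = {}
    for (a, b), score in matrix.items():
        if a == query:
            other = b
        elif b == query:
            other = a
        else:
            continue
        if score > threshold:
            scores[other] = score
    # Highest relatedness first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical, pre-tokenized tweets retrieved for the query "galaxy".
tweets = [
    ["galaxy", "note", "release", "date"],
    ["new", "galaxy", "note", "price"],
    ["galaxy", "camera", "review"],
]
matrix = proximity_matrix(tweets, window=3)
result = related_terms(matrix, "galaxy", threshold=0.9)
print(result)  # [('note', 2.0), ('camera', 1.0), ('new', 1.0)]
```

In this toy data, "note" appears directly next to "galaxy" in two tweets and so receives the highest relatedness, while words that appear only once at a distance of two or more fall below the threshold. The terms that survive could then be linked to the query as nodes in a network, which is the form in which the paper presents related issues.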
