
Analysis of Term Ambiguity based on Genetic Algorithm

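  • Jeong-Joon Kim (김정준; Department of Computer Engineering, Korea Polytechnic University);
  • Sung-Taek Chung (정성택; Department of Computer Engineering, Korea Polytechnic University);
  • Jeong-Min Park (박정민; Department of Computer Engineering, Korea Polytechnic University)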
  • Received : 2017.08.11
  • Accepted : 2017.10.13
  • Published : 2017.10.31

Abstract

Recently, with the development of Internet media, the number of documents on the web has grown exponentially. Most of these documents rely on their text to describe what they are about, and they are classified accordingly. However, the meaning of text is often open to ambiguous interpretation, and interpreting it correctly requires examining it from several angles. Conventional classification methods classified documents simply by which terms appear in the text. In this paper, we therefore analyze term ambiguity using a genetic algorithm together with topic extraction based on locality preserving techniques, and implement a clustering system that partitions the documents accordingly. Finally, we evaluate the performance of the proposed approach by comparing the implemented results with those of conventional methods.
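The abstract does not say how the genetic algorithm encodes solutions or what fitness it optimizes, so the following is only a generic sketch of genetic-algorithm document clustering under our own assumptions, not the authors' method. It assumes plain L2-normalised term-frequency (bag-of-words) vectors, i.e. the "term occurrence" view the abstract attributes to conventional methods; a chromosome that stores one cluster label per document; and the within-cluster sum of squared errors (SSE) as the quantity to minimise. The topic-extraction and locality-preserving steps are not modelled. The function names (`tf_vectors`, `sse`, `ga_cluster`), the toy documents, and all parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def tf_vectors(docs):
    """Bag-of-words term-frequency vectors, scaled to unit length."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
        vectors.append([x / norm for x in vec])
    return vectors


def sse(labels, vectors, k):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    total = 0.0
    for c in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(v[d] for v in members) / len(members) for d in range(dim)]
        total += sum(sum((v[d] - centroid[d]) ** 2 for d in range(dim)) for v in members)
    return total


def ga_cluster(vectors, k, pop_size=30, generations=100, mutation_rate=0.1, seed=0):
    """Evolve cluster-label assignments that minimise SSE (hypothetical settings)."""
    rng = random.Random(seed)
    n = len(vectors)
    # Each chromosome holds one cluster label per document.
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]

    def fitness(chrom):
        return -sse(chrom, vectors, k)  # lower SSE means a fitter chromosome

    def tournament():
        a, b = rng.sample(population, 2)  # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(population, key=fitness)
        next_gen = [best[:]]  # elitism: carry the best chromosome over unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each gene is copied from a randomly chosen parent.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Mutation: occasionally reassign a document to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


docs = [
    "apple banana fruit juice",
    "banana fruit smoothie juice",
    "apple iphone software update",
    "iphone software app update",
]
labels = ga_cluster(tf_vectors(docs), k=2)
print(labels)
```

In this toy corpus the ambiguous term "apple" occurs in both a fruit document and a technology document, so term occurrence alone links them; it is the co-occurring context words that separate the two senses into different clusters. Label encoding is the simplest GA representation but is redundant (swapping label values yields the same clustering); encoding cluster centroids instead is a common alternative.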
