Improving Embedding Model for Triple Knowledge Graph Using Neighborliness Vector

  • Cho, Sae-rom (School of Electrical and Computer Engineering, University of Seoul)
  • Kim, Han-joon (School of Electrical and Computer Engineering, University of Seoul)
  • Received: 2021.07.06
  • Reviewed: 2021.08.18
  • Published: 2021.08.31

Abstract

Node embedding techniques for graph representation learning play an important role in obtaining high-quality results in graph mining. Representative node embedding techniques have so far been studied for homogeneous graphs, which makes it difficult for them to learn knowledge graphs in which each edge carries its own meaning. To address this problem, the existing Triple2Vec technique builds an embedding model by learning a triple graph, in which each node represents a node pair and the connecting edge of the knowledge graph. However, the Triple2Vec embedding model is limited in how far its performance can improve because it estimates the relatedness between triple nodes with a simple measure. This paper therefore proposes a feature extraction technique based on a graph convolutional neural network to improve the Triple2Vec embedding model. The proposed method extracts the neighborliness vector of the triple graph and learns, for each node in the triple graph, the relationships among its neighboring nodes. Category classification experiments on the DBLP, DBpedia, and IMDB datasets show that the embedding model applying the proposed method outperforms the existing Triple2Vec model.
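To make the idea of a triple graph concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that two triple nodes are linked whenever their triples share an entity, and it applies one parameter-free propagation step in the style of a graph convolutional network (symmetric normalization with self-loops). The abstract does not define the neighborliness vector, so this sketch does not attempt to reproduce it. The function names `build_triple_graph` and `gcn_propagate` and the toy triples are hypothetical.

```python
from itertools import combinations
from math import sqrt


def build_triple_graph(triples):
    """Return adjacency sets over triple indices.

    Assumption: triple nodes i and j are linked if their triples
    share at least one entity (subject or object).
    """
    adj = {i: set() for i in range(len(triples))}
    for i, j in combinations(range(len(triples)), 2):
        s1, _, o1 = triples[i]
        s2, _, o2 = triples[j]
        if {s1, o1} & {s2, o2}:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def gcn_propagate(adj, features):
    """One GCN-style step without learned weights: D^-1/2 (A + I) D^-1/2 X."""
    deg = {i: len(nbrs) + 1 for i, nbrs in adj.items()}  # +1 for the self-loop
    dim = len(features[0])
    out = []
    for i in sorted(adj):
        row = [0.0] * dim
        for j in adj[i] | {i}:
            w = 1.0 / sqrt(deg[i] * deg[j])
            for k in range(dim):
                row[k] += w * features[j][k]
        out.append(row)
    return out


# Hypothetical toy knowledge graph (not taken from the paper's datasets).
triples = [
    ("alice", "authorOf", "paper1"),
    ("bob", "authorOf", "paper1"),
    ("paper1", "publishedIn", "venueA"),
    ("carol", "authorOf", "paper2"),
]
adj = build_triple_graph(triples)

# One-hot input features, one row per triple node.
n = len(triples)
features = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
hidden = gcn_propagate(adj, features)
```

In this toy example the first three triples all involve `paper1`, so their triple nodes are mutually connected and each one's propagated feature row mixes information from the other two, while the fourth triple stays isolated. A trained model would additionally multiply by a learned weight matrix and apply a nonlinearity.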

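Acknowledgments

This research was conducted under the Basic Science Research Program, supported by the National Research Foundation of Korea with funding from the Korean government (Ministry of Education) in 2018 (No. NRF-2018R1D1A1A02086148). It was also carried out as a result of the University ICT Research Center support program of the Ministry of Science and ICT and the Institute for Information & Communications Technology Promotion (IITP-2021-2018-0-01417).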
