Language-based Classification of Words using Deep Learning

Korean title (translated): A Technique for Classifying Words by Language Using Deep Learning

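  • 듀크 (Department of Computer Software Engineering, Hanyang University);
  • 다후다 (Department of Computer Software Engineering, Hanyang University);
  • 조인휘 (Department of Computer Software Engineering, Hanyang University)
  • Published: 2021.05.12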

Abstract

Deep learning has become a critical technology in the field of education today. It has been used especially in natural language processing (NLP), the field of artificial intelligence that builds systems and computational algorithms able to automatically understand, analyze, manipulate, and potentially generate human language, and in which word-representation vectors play a critical role. However, low-resource languages such as Swahili, which is spoken in East and Central Africa, lack such representations. Language modeling with neural networks requires adequate data to guarantee quality word representations, which are important for NLP tasks, and most African languages have no data for such processing. Having found that some African languages are poorly represented in language processing, to the point of being described as low-resource languages because of inadequate data for NLP, we decided to study Swahili. The main aim of this project is the classification of words in English, Swahili, and Korean, with particular emphasis on the low-resource Swahili language. To that end, we create our own dataset, process the data with Python scripts, formulate the syllabic alphabet, and develop an English, Swahili, and Korean word analogy dataset.

Keywords