• Title/Summary/Keyword: TnT Tagger

Search Result 1, Processing Time 0.013 seconds

Korean Part-Of-Speech Tagging by using Head-Tail Tokenization (Head-Tail 토큰화 기법을 이용한 한국어 품사 태깅)

  • Suh, Hyun-Jae;Kim, Jung-Min;Kang, Seung-Shik
    • Smart Media Journal
    • /
    • v.11 no.5
    • /
    • pp.17-25
    • /
    • 2022
  • Korean part-of-speech taggers decompose a compound morpheme into unit morphemes and attach part-of-speech tags. So, here is a disadvantage that part-of-speech for morphemes are over-classified in detail and complex word types are generated depending on the purpose of the taggers. When using the part-of-speech tagger for keyword extraction in deep learning based language processing, it is not required to decompose compound particles and verb-endings. In this study, the part-of-speech tagging problem is simplified by using a Head-Tail tokenization technique that divides only two types of tokens, a lexical morpheme part and a grammatical morpheme part that the problem of excessively decomposed morpheme was solved. Part-of-speech tagging was attempted with a statistical technique and a deep learning model on the Head-Tail tokenized corpus, and the accuracy of each model was evaluated. Part-of-speech tagging was implemented by TnT tagger, a statistical-based part-of-speech tagger, and Bi-LSTM tagger, a deep learning-based part-of-speech tagger. TnT tagger and Bi-LSTM tagger were trained on the Head-Tail tokenized corpus to measure the part-of-speech tagging accuracy. As a result, it showed that the Bi-LSTM tagger performs part-of-speech tagging with a high accuracy of 99.52% compared to 97.00% for the TnT tagger.