• Title/Abstract/Keywords: Khmer Language

3 search results (processing time: 0.015 s)

Bi-directional Maximal Matching Algorithm to Segment Khmer Words in Sentence

  • Mao, Makara;Peng, Sony;Yang, Yixuan;Park, Doo-Soon
    • Journal of Information Processing Systems / Vol. 18, No. 4 / pp.549-561 / 2022
  • Khmer script, the official script of Cambodia, is written from left to right without a space separator; it is complex and requires further analysis. Because there are no clear standard guidelines, spaces in Khmer are used inconsistently and informally to separate words in sentences. A segmentation method should therefore be developed together with future Khmer natural language processing (NLP) to define appropriate rules for Khmer sentences. NLP's capacity for analyzing large volumes of language data makes it necessary in this scenario. One of the essential components of Khmer language processing is splitting a sentence into a series of words and counting the words used in it. Currently, Microsoft Word cannot count Khmer words correctly. This study therefore presents a systematic library that segments Khmer phrases using the bi-directional maximal matching (BiMM) method. The BiMM algorithm combines forward maximal matching (FMM) and backward maximal matching (BMM) to improve word segmentation accuracy. A trie (also known as a digital tree or prefix tree) improves the segmentation procedure by finding the children of each word's parent node. The accuracy of BiMM is higher than that of FMM or BMM used independently; moreover, the proposed approach improves the dictionary structure and reduces the number of errors. The approach reduces the error by 8.57% compared to the FMM and BMM algorithms with 94,807 Khmer words.
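The matching scheme described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' library: the `Trie` class, the function names, the single-character fallback for unknown text, and the tie-breaking rule (fewer tokens, then fewer single-character tokens, then the backward result) are all assumptions, and a toy Latin-letter dictionary stands in for a Khmer wordlist.

```python
class Trie:
    """Prefix tree over characters; "$end" marks a complete word."""

    def __init__(self, words=()):
        self.root = {}
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = True

    def longest_prefix(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if "$end" in node:
                best = i - start + 1
        return best


def fmm(text, trie):
    """Forward maximal matching: greedily take the longest word from the left."""
    out, i = [], 0
    while i < len(text):
        n = trie.longest_prefix(text, i) or 1  # unknown character: emit it alone
        out.append(text[i:i + n])
        i += n
    return out


def bmm(text, reversed_trie):
    """Backward maximal matching: FMM on the reversed text, then flip back."""
    rev = fmm(text[::-1], reversed_trie)
    return [w[::-1] for w in reversed(rev)]


def bimm(text, words):
    """Run both directions and keep the result preferred by the heuristic."""
    words = list(words)
    f = fmm(text, Trie(words))
    b = bmm(text, Trie(w[::-1] for w in words))
    if f == b:
        return f
    # Assumed heuristic: fewer tokens, then fewer single-character tokens.
    score = lambda seg: (len(seg), sum(len(w) == 1 for w in seg))
    return min(b, f, key=score)  # on a full tie, the backward result wins


toy_words = ["ab", "abc", "cd", "d"]
print(fmm("abcd", Trie(toy_words)))  # -> ['abc', 'd']
print(bimm("abcd", toy_words))       # -> ['ab', 'cd']
```

The toy input shows why the two directions can disagree: forward matching commits to `abc` and strands `d`, while backward matching finds `ab` + `cd`. A real Khmer segmenter would likely also need to keep character clusters (base consonant plus subscripts and vowel signs) intact, which this sketch does not attempt.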

Ternary Decomposition and Dictionary Extension for Khmer Word Segmentation

  • Sung, Thaileang;Hwang, Insoo
    • Journal of Information Technology Applications and Management / Vol. 23, No. 2 / pp.11-28 / 2016
  • In this paper, we propose a dictionary extension and a ternary decomposition technique to improve the effectiveness of Khmer word segmentation. Most word segmentation approaches depend on a dictionary. However, the dictionary in use is not fully reliable and cannot cover all the words of the Khmer language, which causes the problem of unknown, or out-of-vocabulary, words. Our approach extends the original dictionary with new words to make it more reliable. In addition, we use ternary decomposition for the segmentation process. We also introduce the invisible space of Khmer Unicode (the zero-width space character, \u200B) to segment our training corpus. With our segmentation algorithm, based on ternary decomposition and the invisible space, we can extract new words from our training text and add them to the dictionary. To test on unannotated text, we used the extended wordlist and a segmentation algorithm that does not rely on the invisible space. Our results remarkably outperformed other approaches: we achieved precision, recall and F-measure of 88.8%, 91.8% and 90.6%, respectively.
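The dictionary-extension step and the reported metrics can be illustrated with a short sketch. It assumes the training text is already annotated with U+200B (ZERO WIDTH SPACE) between words; the function names, the sample sentence, and the span-based scoring are illustrative assumptions, and the paper's ternary decomposition itself is not reproduced here.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, used as an invisible word delimiter


def extract_words(annotated_text):
    """Split ZWSP-annotated text into word tokens, dropping empty pieces."""
    return [tok for tok in annotated_text.split(ZWSP) if tok.strip()]


def extend_dictionary(dictionary, annotated_text):
    """Return (extended dictionary, words that were not in the original)."""
    new_words = set(extract_words(annotated_text)) - set(dictionary)
    return set(dictionary) | new_words, new_words


def spans(words):
    """Map a segmentation to the set of (start, end) character spans."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def prf(predicted, gold):
    """Word-level precision, recall and F-measure via exact span matches."""
    p, g = spans(predicted), spans(gold)
    correct = len(p & g)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision, recall = correct / len(p), correct / len(g)
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotated sentence: four Khmer words joined by ZWSP.
sample = ZWSP.join(["ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"])
extended, added = extend_dictionary({"ភាសា"}, sample)
print(len(extended), len(added))  # -> 4 3
```

Scoring on character spans rather than on token strings means a word counts as correct only when both of its boundaries match the reference segmentation.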

Development of Foreign Astronomy Education Programs: Cambodia

  • 김상철;여아란;박창범;이정애;이강환;신용철;신나은;신지혜;최윤호;권순길;김태우;윤호섭;박순창;성언창;박수종
    • 천문학논총 (Publications of The Korean Astronomical Society) / Vol. 34, No. 2 / pp.17-28 / 2019
  • The Korean Astronomical Society (KAS) Education & Public Outreach Committee provided education services for children and school teachers in Cambodia over the three years from 2016 to 2018. In the first year, 2016, one KAS member visited Pusat to teach astronomy to about 50 children; in 2017 and 2018, three and six KAS members, respectively, ran education workshops for about 20 local school teachers each year in Sisophon. The program proved more effective when it combined teaching astronomical knowledge with experiments and observations. The language barrier was the main obstacle in conveying concepts and knowledge, and having a good interpreter was very important. Some languages, such as the Khmer of Cambodia, lack astronomical terminology, so lecturers and participants need to work together to create appropriate words. Hard-copy handouts of the education materials (presentation files, lecture/experiment summaries, terminologies, etc.) are extremely helpful for the participants. Assembling and using astronomical telescopes for night-sky observation was a once-in-a-lifetime experience for some of the participants, which might promote zeal for knowledge and education. It is hoped that these education services for developing countries like Cambodia can continue regularly in the future and be extended to other countries such as Laos and Myanmar.