• Title/Summary/Keyword: Linguistic adaptation


KR-WordRank : An Unsupervised Korean Word Extraction Method Based on WordRank (KR-WordRank : WordRank를 개선한 비지도학습 기반 한국어 단어 추출 방법)

  • Kim, Hyun-Joong; Cho, Sungzoon; Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.18-33 / 2014
  • A word is the smallest unit for text analysis, and the premise behind most text-mining algorithms is that the words in given documents can be perfectly recognized. However, newly coined words, spelling and spacing errors, and domain adaptation problems make it difficult to recognize words correctly. To make matters worse, obtaining a sufficient amount of training data that can be used in any situation is not only unrealistic but also inefficient. Therefore, an automatic word extraction method that does not require a training process is badly needed. WordRank, the most widely used unsupervised word extraction algorithm for Chinese and Japanese, performs poorly on Korean because of differences in language structure. In this paper, we first discuss why WordRank performs poorly on Korean, and then propose a customized WordRank algorithm for Korean, named KR-WordRank, that takes the linguistic characteristics of Korean into account and is more robust to noise in text documents. Experimental results show that KR-WordRank performs significantly better than the original WordRank on Korean. In addition, the proposed algorithm is found not only to extract proper words but also to identify candidate keywords for effective document summarization.

Exploring the Study Experiences of Southeast Asian Students at a Korean University in Seoul (서울 A대학 동남아시아 유학생의 학업 경험에 대한 탐색적 연구)

  • KIM, Jeehun
    • The Southeast Asian review / v.23 no.3 / pp.135-179 / 2013
  • This study explores the study experiences of Southeast Asian students at a reputable private Korean university in Seoul. In particular, it focuses on the difficulties and coping strategies of both non-native and native speakers of English who are working toward undergraduate or postgraduate degrees. Interviews with fourteen students from five Southeast Asian countries were collected and analyzed with NVivo 9. The thematic analysis shows that many students, particularly non-native speakers of English, had considerably more difficulty than their counterparts in the contemporary Korean university context, where strategies driven by internationalization indices include expanding the number of courses conducted in English. This study also observes and documents contrasting patterns in the degree of difficulty experienced by students, depending on their degree levels and majors. Undergraduate students in science and engineering majors had the greatest difficulty of all. In contrast, their graduate counterparts seem to have fewer difficulties. This might be related to the fact that graduate students in science and engineering majors mostly work with peers in their own labs, which provides institutional support. The students' coping strategies show that international students, facing unfavorable or unfriendly treatment from their Korean peers, developed innovative strategies, including using internet technology to catch up with classes that they could not fully understand. As a whole, the adaptation process of international students does not seem to be passive or one-way. This study also provides policy implications for international students, particularly those who can be categorized as linguistic and ethnic minorities.

End-to-end speech recognition models using limited training data (제한된 학습 데이터를 사용하는 End-to-End 음성 인식 모델)

  • Kim, June-Woo; Jung, Ho-Young
    • Phonetics and Speech Sciences / v.12 no.4 / pp.63-71 / 2020
  • Speech recognition is one of the areas in which deep learning and machine learning techniques are being actively commercialized. However, the majority of speech recognition systems on the market are developed on data with limited speaker diversity and tend to perform well on typical adult speakers only. This is because most speech recognition models are trained on speech databases obtained from adult males and females. This tends to cause problems in recognizing the speech of the elderly, children, and speakers of dialects. To solve these problems, it may be necessary to maintain a large database or to collect data for speaker adaptation. Instead, this paper proposes a new end-to-end speech recognition method consisting of an acoustic augmented recurrent encoder and a transformer decoder with linguistic prediction. The proposed method can deliver reliable performance from the acoustic and language models under limited data conditions. The proposed method was evaluated on recognizing the speech of Korean elderly speakers and children with a limited amount of training data, and it showed better performance than a conventional method.