• Title/Summary/Keyword: perplexity

Search Result 37, Processing Time 0.023 seconds

Enhancement of a language model using two separate corpora of distinct characteristics

  • Cho, Sehyeong;Chung, Tae-Sun
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.3
    • /
    • pp.357-362
    • /
    • 2004
  • Language models are essential in predicting the next word in a spoken sentence, thereby enhancing the speech recognition accuracy, among other things. However, spoken language domains are too numerous, and therefore developers suffer from the lack of corpora with sufficient sizes. This paper proposes a method of combining two n-gram language models, one constructed from a very small corpus of the right domain of interest, the other constructed from a large but less adequate corpus, resulting in a significantly enhanced language model. This method is based on the observation that a small corpus from the right domain has high quality n-grams but has serious sparseness problem, while a large corpus from a different domain has more n-gram statistics but incorrectly biased. With our approach, two n-gram statistics are combined by extending the idea of Katz's backoff and therefore is called a dual-source backoff. We ran experiments with 3-gram language models constructed from newspaper corpora of several million to tens of million words together with models from smaller broadcast news corpora. The target domain was broadcast news. We obtained significant improvement (30%) by incorporating a small corpus around one thirtieth size of the newspaper corpus.

Automatic Generation of Concatenate Morphemes for Korean LVCSR (대어휘 연속음성 인식을 위한 결합형태소 자동생성)

  • 박영희;정민화
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.407-414
    • /
    • 2002
  • In this paper, we present a method that automatically generates concatenate morpheme based language models to improve the performance of Korean large vocabulary continuous speech recognition. The focus was brought into improvement against recognition errors of monosyllable morphemes that occupy 54% of the training text corpus and more frequently mis-recognized. Knowledge-based method using POS patterns has disadvantages such as the difficulty in making rules and producing many low frequency concatenate morphemes. Proposed method automatically selects morpheme-pairs from training text data based on measures such as frequency, mutual information, and unigram log likelihood. Experiment was performed using 7M-morpheme text corpus and 20K-morpheme lexicon. The frequency measure with constraint on the number of morphemes used for concatenation produces the best result of reducing monosyllables from 54% to 30%, bigram perplexity from 117.9 to 97.3. and MER from 21.3% to 17.6%.

Part-Of-Speech Tagging using multiple sources of statistical data (이종의 통계정보를 이용한 품사 부착 기법)

  • Cho, Seh-Yeong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.4
    • /
    • pp.501-506
    • /
    • 2008
  • Statistical POS tagging is prone to error, because of the inherent limitations of statistical data, especially single source of data. Therefore it is widely agreed that the possibility of further enhancement lies in exploiting various knowledge sources. However these data sources are bound to be inconsistent to each other. This paper shows the possibility of using maximum entropy model to Korean language POS tagging. We use as the knowledge sources n-gram data and trigger pair data. We show how perplexity measure varies when two knowledge sources are combined using maximum entropy method. The experiment used a trigram model which produced 94.9% accuracy using Hidden Markov Model, and showed increase to 95.6% when combined with trigger pair data using Maximum Entropy method. This clearly shows possibility of further enhancement when various knowledge sources are developed and combined using ME method.

The phenomenological study on the psychological experience of mothers of sexually abused children (성폭력 피해 아동 어머니의 심리적 경험에 관한 현상학적 연구)

  • Yang, Sun-Wha
    • Korean Journal of Family Social Work
    • /
    • no.23
    • /
    • pp.183-224
    • /
    • 2008
  • This study is a phenomenological study on the experience of mothers of sexually abused children. The participants were 6 mothers whose children were sexually abused. The data was then analyzed according to Colaizzi's method. 253 sentences and phrases, 65 formulating meanings that were general, metaphysical, 27 theme, 13 theme clusters, 5 categories based on interview were selected. The following are the findings of study. 1. Shock : Perplexity, Confusion 2. Coping ; Settlement 3. Mental anguish: Anger, Anxiety, Suffering, Isolation 4. Conquest: Stabilization, Acceptance, Consolation, Change 5. Scar: Extant of anxiety, Concern The results show that the experience of the mother of sexually abused children changed over a period of time and they were willing to protect their children and resolve the issues within society. This research contributed to an understanding of the experience of mothers of sexually abused children, using phenomenological research method without the resource of previous study. This study suggests that social welfare services is needed for sexually abused children and their mothers.

Artist and History: Looking at the current problems of teaching art history in art school (미술가와 역사-미술사 교육의 한계와 전망)

  • Cho, Eun-Jung
    • The Journal of Art Theory & Practice
    • /
    • no.2
    • /
    • pp.49-74
    • /
    • 2004
  • It has been frequently pointed out that the established art history with the stylistic and iconographic interpretations and monographic analysis is fallen behind the currency of modern art. Among those who claimed the crisis in the discipline of art history, there is a suggestion that the art historical study should be fostered by other factors in the fields of the humanities. The so called New Art History or 'visual Culture Studies' insists that art history has to be restructured to integrate the broader study of culture and society, and by now, such an opinion is not a novelty at all. One of the most significant yet overlooked elements that induced the new currency of art history is properties of contemporary art that conflict the traditional claim of art historians. Although the idea that art is not purely aesthetic but that it has many other functions has been brought up by the art historians, it was the artists that provoked such a perception. When Arthur C. Danto and Hans Belting proclaimed the End of Art and Art History in the 1980s, the concept of art has been changed radically through the avant-garde tendency of Modernism and a new pluralism of Postmodernism. One dominant concern that strikes art historians is to find a new approach to art, since the traditional method and goal of analysis for past art and past art history seem unavailable. The perplexity arising from the situation is intensified in the field of teaching art, especially for those who teach art history in art school. Basically art history is a pursuit of learning of art in history, and its purpose is to reconcile the present with the past and the future as well. Since Modernism, as it is confusing sometimes because it implies the present state, somehow art became considered 'tradition-less'. It does not mean that a work of art stands aloof from the past attainments, hut modern art imposed itself on a task seeking after the new for its own sake, turning its back on the tradition. And now in the era of Postmodernism, an historians face the requirement to revaluate the whole history of art including modernism. The necessity of art history in art education is indisputable, but methods and contents in the academic courses should he reexamined now. Because artists' concept of history and past art has been altered, and art history as a humanistic discipline can only maintain its identity through incorporation with art itself. Academics teaching art history, or, strictly speaking, past works of art and history, to the student in art school, confront with the need to rethink the object of art history and its meaning to the artists.

  • PDF

A BERGPT-chatbot for mitigating negative emotions

  • Song, Yun-Gyeong;Jung, Kyung-Min;Lee, Hyun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.12
    • /
    • pp.53-59
    • /
    • 2021
  • In this paper, we propose a BERGPT-chatbot, a domestic AI chatbot that can alleviate negative emotions based on text input such as 'Replika'. We made BERGPT-chatbot into a chatbot capable of mitigating negative emotions by pipelined two models, KR-BERT and KoGPT2-chatbot. We applied a creative method of giving emotions to unrefined everyday datasets through KR-BERT, and learning additional datasets through KoGPT2-chatbot. The development background of BERGPT-chatbot is as follows. Currently, the number of people with depression is increasing all over the world. This phenomenon is emerging as a more serious problem due to COVID-19, which causes people to increase long-term indoor living or limit interpersonal relationships. Overseas artificial intelligence chatbots aimed at relieving negative emotions or taking care of mental health care, have increased in use due to the pandemic. In Korea, Psychological diagnosis chatbots similar to those of overseas cases are being operated. However, as the domestic chatbot is a system that outputs a button-based answer rather than a text input-based answer, when compared to overseas chatbots, domestic chatbots remain at a low level of diagnosing human psychology. Therefore, we proposed a chatbot that helps mitigating negative emotions through BERGPT-chatbot. Finally, we compared BERGPT-chatbot and KoGPT2-chatbot through 'Perplexity', an internal evaluation metric for evaluating language models, and showed the superity of BERGPT-chatbot.

Analyzing the discriminative characteristic of cover letters using text mining focused on Air Force applicants (텍스트 마이닝을 이용한 공군 부사관 지원자 자기소개서의 차별적 특성 분석)

  • Kwon, Hyeok;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.75-94
    • /
    • 2021
  • The low birth rate and shortened military service period are causing concerns about selecting excellent military officers. The Republic of Korea entered a low birth rate society in 1984 and an aged society in 2018 respectively, and is expected to be in a super-aged society in 2025. In addition, the troop-oriented military is changed as a state-of-the-art weapons-oriented military, and the reduction of the military service period was implemented in 2018 to ease the burden of military service for young people and play a role in the society early. Some observe that the application rate for military officers is falling due to a decrease of manpower resources and a preference for shortened mandatory military service over military officers. This requires further consideration of the policy of securing excellent military officers. Most of the related studies have used social scientists' methodologies, but this study applies the methodology of text mining suitable for large-scale documents analysis. This study extracts words of discriminative characteristics from the Republic of Korea Air Force Non-Commissioned Officer Applicant cover letters and analyzes the polarity of pass and fail. It consists of three steps in total. First, the application is divided into general and technical fields, and the words characterized in the cover letter are ordered according to the difference in the frequency ratio of each field. The greater the difference in the proportion of each application field, the field character is defined as 'more discriminative'. Based on this, we extract the top 50 words representing discriminative characteristics in general fields and the top 50 words representing discriminative characteristics in technology fields. Second, the number of appropriate topics in the overall cover letter is calculated through the LDA. It uses perplexity score and coherence score. Based on the appropriate number of topics, we then use LDA to generate topic and probability, and estimate which topic words of discriminative characteristic belong to. Subsequently, the keyword indicators of questions used to set the labeling candidate index, and the most appropriate index indicator is set as the label for the topic when considering the topic-specific word distribution. Third, using L-LDA, which sets the cover letter and label as pass and fail, we generate topics and probabilities for each field of pass and fail labels. Furthermore, we extract only words of discriminative characteristics that give labeled topics among generated topics and probabilities by pass and fail labels. Next, we extract the difference between the probability on the pass label and the probability on the fail label by word of the labeled discriminative characteristic. A positive figure can be seen as having the polarity of pass, and a negative figure can be seen as having the polarity of fail. This study is the first research to reflect the characteristics of cover letters of Republic of Korea Air Force non-commissioned officer applicants, not in the private sector. Moreover, these methodologies can apply text mining techniques for multiple documents, rather survey or interview methods, to reduce analysis time and increase reliability for the entire population. For this reason, the methodology proposed in the study is also applicable to other forms of multiple documents in the field of military personnel. This study shows that L-LDA is more suitable than LDA to extract discriminative characteristics of Republic of Korea Air Force Noncommissioned cover letters. Furthermore, this study proposes a methodology that uses a combination of LDA and L-LDA. Therefore, through the analysis of the results of the acquisition of non-commissioned Republic of Korea Air Force officers, we would like to provide information available for acquisition and promotional policies and propose a methodology available for research in the field of military manpower acquisition.