• Title/Summary/Keyword: Lexicon

Morpheme-based Korean broadcast news transcription (형태소 기반의 한국어 방송뉴스 인식)

  • Park Young-Hee;Ahn Dong-Hoon;Chung Minhwa
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.123-126
    • /
    • 2002
  • In this paper, we describe our LVCSR system for Korean broadcast news transcription. The main focus is to find the most suitable morpheme-based lexical model for Korean broadcast news recognition, one that can deal with the inflectional flexibility of Korean. There are trade-offs between lexicon size and lexical coverage, and between the length of the lexical unit and WER. In our system, we analyzed the training corpus to obtain a small 24k-morpheme lexicon with 98.8% coverage. The lexicon is then optimized by combining morphemes using statistics of the training corpus under a monosyllable constraint or a maximum-length constraint. In experiments, our system reduced the share of monosyllable morphemes from 52% to 29% of the lexicon and obtained a WER of 13.24% for anchors and 24.97% for reporters.
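
    The trade-off between lexicon size and lexical coverage mentioned in this abstract can be illustrated with a minimal sketch. It assumes that coverage means the fraction of running tokens in a corpus that fall within the N most frequent units; the function name `coverage` and the toy token stream are hypothetical and are not taken from the paper.

    ```python
    from collections import Counter


    def coverage(tokens, lexicon_size):
        """Fraction of running tokens covered by the `lexicon_size`
        most frequent units (an assumed definition of lexical coverage)."""
        if not tokens:
            return 0.0
        counts = Counter(tokens)
        covered = sum(count for _, count in counts.most_common(lexicon_size))
        return covered / len(tokens)


    # Toy, made-up token stream standing in for a morpheme-segmented corpus.
    tokens = ["a", "b", "a", "c", "a", "b", "d", "a"]
    for size in (1, 2, 4):
        print(size, coverage(tokens, size))
    ```

    A larger lexicon raises coverage but also enlarges the search space of the recognizer, which is the tension the abstract describes.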

A Study of the Interface between Korean Sentence Parsing and Lexical Information (한국어 문장분석과 어휘정보의 연결에 관한 연구)

  • 최병진
    • Language and Information
    • /
    • v.4 no.2
    • /
    • pp.55-68
    • /
    • 2000
  • The efficiency and stability of an NLP system depend crucially on how its lexicon is organized. The lexicon ought to encode linguistic generalizations and exceptions thereof. Nowadays many computational linguists tend to construct such lexical information in an inheritance hierarchy, and DATR is good for this purpose. In this research I construct a DATR lexicon in order to parse sentences in Korean using QPATR, which is implemented on the basis of a unification-based grammar developed in Düsseldorf. In this paper I want to show the interface between a syntactic parser (QPATR) and the DATR formalism representing lexical information. The QPATR parser can extract the lexical information from the DATR lexicon, which is organised hierarchically.

Building a Domain-Specific French-Korean Lexicon

  • N, Aesun-Yoo
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2002.02a
    • /
    • pp.465-474
    • /
    • 2002
  • The Korean government has adopted the French TGV as a high-speed transportation system, and the first service is scheduled for the end of 2003. TGV-relevant documents consist of huge volumes, of which more than 76% have been translated into English. A large part of the English version is, however, incomprehensible without referring to the original French version. The goal of this paper is to demonstrate how DiET 2.5, a lexicon builder, makes it possible to build with ease a domain-specific terminology lexicon that may contain multimedia and multilingual data with multi-layered logical information. We believe our work represents an important step in enlarging the language scope and the development of electronic lexica, and in providing the flexibility of defining any type of DTD and the interconnectivity among collaborators. As an application of DiET 2.5, we would like to build a TGV-relevant lexicon in the near future.

Text Mining and Sentiment Analysis for Predicting Box Office Success

  • Kim, Yoosin;Kang, Mingon;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.8
    • /
    • pp.4090-4102
    • /
    • 2018
  • Since the emergence of online communication, text mining and sentiment analysis have frequently been applied to analyzing electronic word-of-mouth. This study aims to develop a domain-specific sentiment analysis lexicon to predict box office success in the Korean film market and to validate the feasibility of the lexicon. Natural language processing, a machine learning algorithm, and a lexicon-based sentiment classification method are employed. To create a movie-domain sentiment lexicon, 233,631 reviews of 147 movies with popularity ratings were collected by an XML crawling package in R. We achieved 81.69% accuracy in sentiment classification with the Korean sentiment dictionary, which includes 706 negative words and 617 positive words. The results showed a stronger positive relationship between box office success and consumers' sentiment, as well as a significant positive effect in the linear regression for the prediction model. In addition, they reveal that emotion in user-generated content can be a more accurate clue for predicting business success.

Romanian-Lexicon-Based Sentiment Analysis for Assesing Teachers' Activity

  • Barila, Adina;Danubianu, Mirela;Gradinaru, Bogdanel
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.10
    • /
    • pp.43-50
    • /
    • 2022
  • Students' feedback is important for measuring and improving teaching performance. Many teacher performance evaluation systems are based on responses to closed questions, but free-text answers can contain useful information that should be explored. In this paper we present a lexicon-based sentiment analysis to explore students' text feedback. The data was collected from a system for the evaluation of teachers by students, developed and used in our university. The students' comments are in Romanian, so we built a Romanian sentiment word lexicon. We used this to categorize the feedback text as positive, negative or neutral. In addition, we added a new polarity - indifferent - in order to categorize blank and "I don't answer" responses.
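
    The four-way labelling described in this abstract (positive, negative, neutral, indifferent) can be sketched as below. This is only an illustration under stated assumptions: the word lists are tiny English stand-ins rather than the authors' Romanian lexicon, and the scoring rule (count of positive words minus count of negative words) is an assumed simplification, not the method reported in the paper.

    ```python
    import re

    # Hypothetical toy lexicon (English stand-ins, not the Romanian lexicon).
    POSITIVE = {"good", "clear", "helpful"}
    NEGATIVE = {"boring", "unclear", "late"}
    # Responses treated as "indifferent": blank or an explicit refusal.
    NO_ANSWER = {"", "i don't answer"}


    def classify(feedback):
        """Label one feedback string as positive/negative/neutral/indifferent."""
        text = feedback.strip().lower()
        if text in NO_ANSWER:
            return "indifferent"
        words = re.findall(r"\w+", text)
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"


    for sample in ["Clear and helpful lectures", "The course was boring", "", "Lectures on Monday"]:
        print(repr(sample), "->", classify(sample))
    ```

    Checking for blank and refusal responses before tokenizing keeps them out of the neutral class, which is the distinction the added "indifferent" polarity is meant to capture.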

A Study on the Analysis of Disaster Safety Lexicon Patterns in Social Media (소셜미디어를 통해 본 재난안전 분야 어휘 사용 양상 분석)

  • Kim, Tae-Young;Lee, Jung-Eun;Oh, Hyo-Jung
    • The Journal of the Korea Contents Association
    • /
    • v.17 no.10
    • /
    • pp.85-93
    • /
    • 2017
  • Standardization of the disaster safety lexicon is important as the most basic process for successful accident prevention and response. A lack of understanding of the disaster safety lexicon leads to a lack of communication and information sharing, which can hinder the communication of appropriate responses in case of a disaster. Currently, disaster and safety control agencies produce and manage heterogeneous information, and they also develop and use word dictionaries individually. To solve this problem, identifying differences in disaster safety lexicon patterns by user is essential for standardization. In this paper, we conducted a lexicon pattern analysis based on social media and revealed the characteristics of each pattern type. As a result, we proposed methods for standardizing and constructing a disaster safety word dictionary.

Method for Spatial Sentiment Lexicon Construction using Korean Place Reviews (한국어 장소 리뷰를 이용한 공간 감성어 사전 구축 방법)

  • Lee, Young Min;Kwon, Pil;Yu, Ki Yun;Kim, Ji Young
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.25 no.2
    • /
    • pp.3-12
    • /
    • 2017
  • Leaving positive or negative comments about visited places on location-based services is becoming common in daily life. Sentiment analysis of place reviews written by actual visitors can provide valuable information to potential consumers as well as business owners. To conduct sentiment analysis of a place, a spatial sentiment lexicon that can be used as a criterion is required; yet no lexicon of spatial sentiment words has been constructed. Therefore, this study suggests a method for constructing a spatial sentiment lexicon by analyzing place review data written by Korean internet users. Among several location categories, theme parks were chosen for this study. For this purpose, natural language processing and statistical techniques are used. Spatial sentiment words included in the lexicon carry information about sentiment polarity and probability score. The spatial sentiment lexicon constructed in this study consists of 3 tables (SSLex_SS, SSLex_single, SSLex_combi) that include 219 spatial sentiment words. In this study, sentiment analysis was conducted on texts about theme parks posted on Twitter. As the accuracy of the sentiment classification was calculated as 0.714, the validity of the lexicon was verified.

Wine Label Character Recognition in Mobile Phone Images using a Lexicon-Driven Post-Processing (사전기반 후처리를 이용한 모바일 폰 영상에서 와인 라벨 문자 인식)

  • Lim, Jun-Sik;Kim, Soo-Hyung;Lee, Chil-Woo;Lee, Guee-Sang;Yang, Hyung-Jung;Lee, Myung-Eun
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.5
    • /
    • pp.546-550
    • /
    • 2010
  • In this paper, we propose a method for the post-processing of cursive script recognition in wine label images. The proposed method consists mainly of three steps: combination matrix generation, character combination filtering, and string matching. First, the combination matrix generation step detects all possible combinations from the recognition result for each of the pieces. Second, unnecessary information in the combination matrix is removed by comparison with the bigrams of words in the lexicon. Finally, the string matching step decides the identity of the result as the best-matched word in the lexicon based on the Levenshtein distance. An experimental result shows that the recognition accuracy is 85.8%.
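
    The final string matching step in this abstract, choosing the lexicon word closest to the recognized string under the Levenshtein distance, can be sketched as follows. The function names and the three-word lexicon are hypothetical, and the earlier combination-matrix and bigram-filtering steps are not reproduced here.

    ```python
    def levenshtein(a, b):
        """Edit distance between strings a and b (insert, delete, substitute)."""
        prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
        for i, ca in enumerate(a, 1):
            curr = [i]  # distance from a[:i] to ""
            for j, cb in enumerate(b, 1):
                curr.append(min(
                    prev[j] + 1,               # delete ca
                    curr[j - 1] + 1,           # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, or match at no cost
                ))
            prev = curr
        return prev[-1]


    def best_match(recognized, lexicon):
        """Return the lexicon entry with the smallest edit distance."""
        return min(lexicon, key=lambda word: levenshtein(recognized, word))


    # Hypothetical lexicon and a noisy recognition result.
    lexicon = ["merlot", "malbec", "shiraz"]
    print(best_match("merlat", lexicon))
    ```

    Keeping only two rows of the distance table at a time limits memory to the length of the lexicon word, which matters when every recognized string is compared against a large lexicon.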

An Approach for Efficient Handwritten Word Recognition Using Dynamic Programming Matching (동적 프로그래밍 정합을 이용한 효율적인 필기 단어 인식 방법)

  • 김경환
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.4
    • /
    • pp.54-64
    • /
    • 1999
  • This paper proposes an efficient handwritten English word recognition scheme that can be applied to practical applications. To make effective use of the lexicon, which is available in most handwriting-related applications, the lexicon entries are introduced in the early stage of recognition. Dynamic programming is used for matching between over-segmented character segments and letters in the lexicon entries. Character segmentation statistics, which can be obtained during training, are used to adjust the matching window size. Also, the matching results between the character segments and the letters in the lexicon entries are cached to avoid repeating the same computation. In order to verify the effectiveness of the proposed methods, several experiments were performed using thousands of word images with various writing styles. The results show that the proposed methods significantly improve the matching speed as well as the accuracy.

News based Stock Market Sentiment Lexicon Acquisition Using Word2Vec (Word2Vec을 활용한 뉴스 기반 주가지수 방향성 예측용 감성 사전 구축)

  • Kim, Daye;Lee, Youngin
    • The Journal of Bigdata
    • /
    • v.3 no.1
    • /
    • pp.13-20
    • /
    • 2018
  • Stock market prediction has long been a dream for researchers as well as the public. Forecasting the ever-changing stock market, though, has proved a Herculean task. This study proposes a novel stock market sentiment lexicon acquisition system that can predict the growth (or decline) of a stock market index based on economic news. For this purpose, we collected three years of economic news, from January 2015 to December 2017, and adopted the Word2Vec model to consider the context of words. To evaluate the result, we performed sentiment analysis on the collected news data with the automatically constructed lexicon and compared the results with the closing values of the KOSPI (Korea Composite Stock Price Index), the South Korean stock market index.