• Title/Summary/Keyword: lexicons

Morphological Processing with LR Techniques (LR 테크닉을 이용한 형태소 분석)

  • 이강혁
    • Korean Journal of Cognitive Science / v.4 no.2 / pp.115-143 / 1994
  • In this paper, I present an extended two-level model using LR parsing techniques. The LR-based two-level model not only guarantees efficient morphological processing but also achieves a higher degree of descriptive adequacy than Koskenniemi's original model. The two-level model is augmented with an independent morphosyntactic module based on a feature-based CF word grammar. By adopting a CF word grammar, our model is capable of dealing with complex words with discontinuous dependencies without having duplicate lexicons. It is shown how LR predictions manifested in the parsing table can help the morphological processor to minimize the dictionary lookup process.

Lexical and Semantic Incongruities between the Lexicons of English and Korean

  • Lee, Yae-Sheik
    • Language and Information / v.5 no.2 / pp.21-37 / 2001
  • Pustejovsky (1995) rekindled debate on the dual problems of how to represent lexical meaning and what information is to be encoded in a lexicon. For natural language processing tasks such as machine translation, these are important issues. When a lexical-conceptual mismatch occurs in the translation of corresponding words from two different languages, the appropriate representation of their meanings is very important. This paper proposes a new formalism for representing lexical entries by first analysing observable mismatches in comparable pairs of nouns, verbs, and adjectives in English and Korean. Inherent misinterpretations and misreadings in each pair are identified. Then, concept theories such as those presented by Ganter and Wille (1996) and Priss (1998) are extended in order to reflect the cognitivist view that meaning resides in concept, and also to incorporate the propositions of the so-called ‘multiple inheritance’ system. An alternative to the formalisms of Pustejovsky (1995) and Pollard & Sag (1994) is then proposed. Finally, representative examples of lexical mismatches are analysed using the new model.

Rated Recall: Evaluation Method for Constructing Bilingual Lexicons (등급 재현율: 이중언어 사전 구축에 대한 평가 방법)

  • Seo, Hyeong-Won;Kwon, Hong-Seok;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology / 2013.10a / pp.146-151 / 2013
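  • Methods for evaluating bilingual lexicon construction include precision, recall, and MRR (Mean Reciprocal Rank). These measures focus on accurately finding the translation equivalents in the evaluation set. However, they do not consider at all how frequently a given translation equivalent is used. That is, a method that can quickly find frequently used translation equivalents can be said to be a good method. To address this problem, this paper proposes rated recall, a new evaluation method for bilingual lexicon construction. Rated recall is a recall measure that reflects the degree to which translation equivalents appear in the training corpus, and it is a good measure of how accurately frequently used translation equivalents are found. In this paper, we evaluate the performance of a bilingual lexicon construction system that uses context vectors and a pivot language, and compare and analyze it against existing methods.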
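
The contrast this abstract draws between rank-based measures and a frequency-aware recall can be illustrated with a minimal sketch. The abstract does not give the paper's formula for rated recall, so `frequency_weighted_recall` below is only an assumed stand-in that weights each gold translation by its corpus frequency. The function names, the toy lexicon, and the frequency counts are all hypothetical.

```python
from typing import Dict, List

# Gold lexicon: source word -> {translation: frequency in a training corpus}.
# All words and counts are made-up illustration data.
gold: Dict[str, Dict[str, int]] = {
    "바다": {"sea": 90, "ocean": 10},
    "집": {"house": 70, "home": 30},
}

# System output: source word -> ranked list of translation candidates.
ranked: Dict[str, List[str]] = {
    "바다": ["ocean", "sea", "lake"],
    "집": ["building", "home", "house"],
}


def mean_reciprocal_rank(ranked: Dict[str, List[str]],
                         gold: Dict[str, Dict[str, int]]) -> float:
    """Average of 1/rank of the first correct candidate (0 if none is found)."""
    total = 0.0
    for src, translations in gold.items():
        for rank, cand in enumerate(ranked.get(src, []), start=1):
            if cand in translations:
                total += 1.0 / rank
                break
    return total / len(gold)


def frequency_weighted_recall(ranked: Dict[str, List[str]],
                              gold: Dict[str, Dict[str, int]],
                              k: int) -> float:
    """Share of gold translation frequency mass recovered in the top-k candidates.

    Assumed definition for illustration; not taken from the paper.
    """
    found = 0
    possible = 0
    for src, translations in gold.items():
        top_k = set(ranked.get(src, [])[:k])
        possible += sum(translations.values())
        found += sum(freq for t, freq in translations.items() if t in top_k)
    return found / possible if possible else 0.0


print(mean_reciprocal_rank(ranked, gold))
print(frequency_weighted_recall(ranked, gold, k=1))
print(frequency_weighted_recall(ranked, gold, k=2))
```

In this toy data the system finds a correct translation early for both words, so MRR is fairly high, yet at k=1 it recovers only the rare translation "ocean", so the frequency-weighted score stays low. That is the kind of gap the abstract says the conventional measures do not capture.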

Study on the Standardization of Korean Distribution Terminology through its Usage Survey (유통분야 전문용어 사용실태 조사를 통한 용어 표준화 연구)

  • Han, Kyu-Chul;Lee, Sang-Youn
    • Journal of Distribution Science / v.13 no.4 / pp.77-87 / 2015
  • Purpose - This study aims to investigate the current state of distribution terminology usage by retailers and consumers nationwide, and to suggest a practical improvement plan for its standardization. The Korean distribution industry is closely related to consumers' daily lives. However, in reality, there exists a gap among producers, distributors, and consumers in terms of the definition, understanding, and perception of the terminology. Therefore, standardizing this terminology is essential for smoother communication. This paper argues for the necessity of comprehensive research and surveys on how Korean distribution terminology is actually used by organizations and in their respective management situations, and further, the necessity of examining the problem and its remedies in line with the objective and mission of the "Fundamental Law of the Korean Language." Research design, data, and methodology - This study's scope is limited to wholesale and retail, including some information systems. First, the study covers most written material, including lexicons and glossaries of distribution terminology, university textbooks and teaching material for national qualification certificates, and related laws and ordinances. Second, the survey covers retailers' management situations by store format. The retailers used as the sample for the survey include department stores, discount stores, SSM, and convenience stores. Altogether, 20 specialists were interviewed in their respective sectors or retail formats. Finally, the project team surveyed a sample of 1,300 consumers nationwide on 50 distribution terms mainly used by consumers, including questions about awareness, understanding, usage, and attitude. Results - In total, 1,249 terms were drawn through literature research, including distribution terminology used in the related literature, glossaries and lexicons, distribution terminology in textbooks, and legal terminology. 
The classification table comprises four major categories: general distribution, distribution marketing, distribution information, and merchandise. It was advised that the results of the three-step research (literature survey, field survey of retailers, and consumer survey) be screened by academia (retail associations, faculty, etc.), retailers (major retail management by store format), retail specialists and consultants, consumers, and Korean linguists. In total, 1,300 questionnaires covering 50 distribution terms closely associated with consumers were distributed to subjects nationwide. Conclusions - The desired and expected results of this study are summarized from three perspectives as follows. First, from the retailers' perspective, new concepts and newly coined terms in the distribution industry stem from advanced countries such as America and Europe. However, the original meaning and definition are diluted and distorted as the language users' situations and contexts change. This study provides basic guidelines for standardizing the distribution terms used among various retail formats in the daily life situations that consumers most often encounter. Second, from the nation's perspective, this study suggests optimal choices of distribution terminology in the context of the laws and ordinances of the Ministries concerned. Last, from the consumers' perspective, this paper enables consumers to understand and use distribution terms properly in their daily lives.

Speech Coarticulation Database of Korean and English (한·영 동시조음 데이터베이스의 구축)

  • ;Stephen A. Dyer;Dwight D. Day
    • The Journal of the Acoustical Society of Korea / v.18 no.3 / pp.17-26 / 1999
  • We present the first speech coarticulation database of Korean, English and Konglish, named "SORIDA", which is designed to cover the maximum number of representations of coarticulation in these languages [1]. SORIDA is a compact database designed to contain a maximum number of triphones in a minimum number of prompts. SORIDA contains all consonantal triphones and vowel allophones in 682 Korean prompts of word length and in 717 English prompt words, spoken five times by speakers balanced in gender, dialect and age. Korean prompts are synthesized lexicons which maximize their coarticulation variation disregarding any stress phenomena, while English prompts are natural words that fully reflect their stress effects with respect to the coarticulation variation. The prompts are designed differently because English phonology has stress while Korean does not. An intermediate language, Konglish, has also been modeled by two Korean speakers reading the 717 English prompt words. Recording was done in a controlled laboratory environment with an AKG Model C-100 microphone and a Fostex D-5 digital-audio-tape (DAT) recorder. The total recording time was four hours. The SORIDA CD-ROM is available as one disk at a 22.05 kHz sampling rate with a 16-bit sample size. SORIDA digital audio tapes are available as four 124-minute tapes at a 48 kHz sampling rate. SORIDA's list of phonetically rich words is also available in English and Korean.

A Method for User Sentiment Classification using Instagram Hashtags (인스타그램 해시태그를 이용한 사용자 감정 분류 방법)

  • Nam, Minji;Lee, EunJi;Shin, Juhyun
    • Journal of Korea Multimedia Society / v.18 no.11 / pp.1391-1399 / 2015
  • In recent times, studies on sentiment analysis have been actively conducted, applying natural language processing technologies to analyze subjective data such as the opinions and attitudes that users express on the Web, blogs, and social networking services (SNSs). Conventionally, to classify the sentiments in texts, most studies determine positive/negative/neutral sentiments by assigning polarity values to sentiment vocabulary using sentiment lexicons. In this study, however, sentiments are classified based on Thayer's model, which is psychologically defined, unlike the polarity classification used in opinion mining. As a method for classifying sentiments, this paper proposes sentiment categories built by extracting sentiment keywords for the major sentiments from hashtags, which are essential elements of Instagram. By applying the sentiment categories to user posts, sentiments can be determined by measuring the similarity between the sentiment adjective candidates and the sentiment keywords. The test results of the proposed method show that the average accuracy rate across all sentiment categories was 90.7%, which indicates good performance. If a large-capacity sentiment classification system is prepared using the proposed method, sentiment analysis is expected to become possible in various fields, such as determining social phenomena through SNSs.

Extended pivot-based approach for bilingual lexicon extraction

  • Seo, Hyeong-Won;Kwon, Hong-Seok;Kim, Jae-Hoon
    • Journal of Advanced Marine Engineering and Technology / v.38 no.5 / pp.557-565 / 2014
  • This paper describes the extended pivot-based approach for bilingual lexicon extraction. The basic features of the approach are as follows. First, the approach builds context vectors between the source language and a pivot language such as English, and likewise between the target language and the pivot language. This is the same as the standard pivot-based approach, which is useful for extracting bilingual lexicons between low-resource language pairs such as Korean-French. Second, unlike the standard pivot-based approach, the approach looks for similar context vectors in the source language. This helps extract translation candidates for polysemous words and also makes the translations more reliable. Third, the approach extracts translation candidates from target context vectors through the similarity between source and target context vectors. Based on these features, this paper describes the extended pivot-based approach and reports various experiments on one language pair, Korean-French (KR-FR). We have observed that the approach is useful for extracting the most appropriate translation candidate as well as for a low-resource language pair.
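
The first and third features above (context vectors expressed over a shared pivot vocabulary, then ranking by vector similarity) can be sketched roughly as follows. This is only an illustration of the standard pivot-based idea under assumed details: cosine similarity is an assumed choice of measure, the vectors are hand-made toy weights rather than corpus statistics, and the extended step of searching for similar source-language vectors is not shown. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # pivot-language (English) word -> association weight

# Toy context vectors over a shared English pivot vocabulary (made-up weights).
kr_vectors: Dict[str, Vector] = {
    "바다": {"sea": 0.9, "water": 0.6, "ship": 0.3},
}
fr_vectors: Dict[str, Vector] = {
    "mer": {"sea": 0.8, "water": 0.5, "ship": 0.4},
    "maison": {"house": 0.9, "home": 0.7},
    "eau": {"water": 0.9, "drink": 0.5},
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source_vec: Vector,
                    target_vecs: Dict[str, Vector],
                    top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank target-language words by similarity to one source context vector."""
    scored = [(word, cosine(source_vec, vec)) for word, vec in target_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


for word, score in rank_candidates(kr_vectors["바다"], fr_vectors):
    print(word, round(score, 3))
```

Because both vector sets live in the same pivot vocabulary, Korean and French words can be compared without any direct Korean-French resource, which is why the abstract describes the approach as suited to low-resource pairs.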

Evaluating English Loanwords and Their Usage for Professional Translation, Focusing on News Texts

  • Bokyung Noh
    • International Journal of Advanced Culture Technology / v.12 no.2 / pp.161-166 / 2024
  • As globalization has accelerated, the use of English loanwords has been increasing in South Korea. In this paper, we analyze news stories from four Korean quality newspapers (Chosun Ilbo, Dong-A Ilbo, KyungHyang Sinmun, and Chung-Ang Ilbo) to investigate the usage of English loanwords in news texts. Thirty-eight news stories on life, politics, business, and IT were collected from the four newspapers and then analyzed according to five types of loanwords (Direct, Mixed Code Combination, Clipping, Neologism, and Double Notation), partly following Lee's and Rudiger's classification. The analysis revealed the following: first, the Direct category dominated the others at 90%, indicating that English loanwords were not translated from their source language but were introduced into Korean directly with little modification; second, the use of English loanwords was significantly higher in the business and IT sections than in other sections, implying that English loanwords function much as a lingua franca does within those fields. Furthermore, these linguistic trends can provide a basic guide for translators making an informed decision between an English loanword and its translated Korean version in English-into-Korean translation.

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.219-240 / 2018
  • Sentiment analysis, which is one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, sentiment analysis methods have been widely used in many fields. For example, data-driven surveys are based on analyzing the subjectivity of text data posted by users, and market research is conducted by analyzing users' review posts to quantify a target product's reputation among users. The basic method of sentiment analysis is to use a sentiment dictionary (or lexicon), a list of sentiment vocabulary with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' carries a negative meaning in many domains but not in movies. In order to perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building such a sentiment lexicon is time-consuming, and many sentiment words are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed sentiment lexicons suited to a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences in the process of converting Korean words into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a sentiment lexicon for a specific domain. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. 
The proposed dictionary, which is a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, sentiment vocabulary is constructed by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) through the following procedure: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabulary items, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, one recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features for deep learning models to improve accuracy. 
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.

A Study on the Meaning and interpretation of Urban Landscape in Architecture of Robert Venturi and Aldo Rossi (로버트 벤투리와 알도 로시 건축에서 도시 경관의 의미와 해석에 관한 연구)

  • Park, Hyung-Jin;Lee, Jong-Suk;Lee, Sang-Yeon
    • Korean Institute of Interior Design Journal / v.21 no.2 / pp.23-34 / 2012
  • After the modern age, rapid urbanization had a major impact on the architecture of the time. R. Venturi and A. Rossi are two leading architects who developed architecture in cities in the US and Europe, respectively. This study sheds light on the tangible and intangible meaning and interpretation of urban landscapes through their architectural thought and works. The findings on the physical and intangible meaning and interpretation in the two architects' thought and works are as follows. Venturi understood the iconological landscapes along the roadsides of large cities as the essence of physical landscapes. To Venturi, the façades of roadside buildings are a kind of signage, like traffic lights and road signs, and those façades carry the meaning of symbolic systems beyond simple physical landscapes. To A. Rossi, building types, as physical townscapes, play a key role as the raw data for classification in architecture. Those types are also significant as basic data that shed light on the principles and history of cities. Among the intangible factors in R. Venturi's architecture, daily routine, function and use, time, a building's use, and others form complex architecture. Those factors also describe the shared values of a period, in metaphorical terms, as building façades, complex symbols, and formative lexicons. Among A. Rossi's intangible factors, 'collective memory' is embedded in the inhabitants of the city, and through it the city is a place of memory for its inhabitants. Moreover, cities' monuments carry intangible landscapes such as 'sustainability' and 'permanence'. With the many events that happen throughout cities, those monuments are the overall images of cities that give value to the urban buildings within them. Finally, R. Venturi's all-encompassing concept of complex architecture was extended to both the tangible and intangible aspects of townscapes. It was found that A. Rossi's tangible thought was formed against the background, in time and place, of the whole landscape of the historic cities of Italy at that time. Also, with types of urban buildings and 'collective memory', A. Rossi derived architectural norms and formats of unchangeable types.
