• Title/Summary/Keyword: Lexicon


Construction of Korean Linguistic Information for the Korean Generation on KANT (Kant 시스템에서의 한국어 생성을 위한 언어 정보의 구축)

  • Yoon, Deok-Ho
    • The Transactions of the Korea Information Processing Society / v.6 no.12 / pp.3539-3547 / 1999
  • Korean linguistic information was constructed for the generation module of the KANT (Knowledge-based Accurate Natural language Translation) system. Because KANT has a language-independent generation engine, constructing the Korean linguistic information amounts to developing the Korean generation module. The constructed information includes concept-based mapping rules, category-based mapping rules, a syntactic lexicon, template rules, grammar rules based on unification grammar, lexical rules, and rewriting rules for Korean. With this information, sentences were successfully and completely generated from the interlingua functional structures among the test set of 118 items prepared by the developers of the KANT system.

Korean speech recognition based on grapheme (문자소 기반의 한국어 음성인식)

  • Lee, Mun-hak;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.38 no.5 / pp.601-606 / 2019
  • This paper studies Korean speech recognition using grapheme units (Cho-sung [onset], Jung-sung [nucleus], Jong-sung [coda]). We build an ASR (automatic speech recognition) system without a G2P (grapheme-to-phoneme) process and show that deep-learning-based ASR systems can learn Korean pronunciation rules without it. The proposed model is shown to reduce the word error rate when sufficient training data is available.
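
As an illustration of the grapheme units mentioned in this abstract, the sketch below splits precomposed Hangul syllables into onset, nucleus, and coda using the standard Unicode arithmetic for the Hangul Syllables block (U+AC00 to U+D7A3). The function name `to_graphemes` is hypothetical, and this is not claimed to be the preprocessing used in the paper.

```python
# Conjoining jamo tables, in the order Unicode uses to compose Hangul syllables.
CHO = [chr(0x1100 + i) for i in range(19)]          # onsets (Cho-sung)
JUNG = [chr(0x1161 + i) for i in range(21)]         # nuclei (Jung-sung)
JONG = [""] + [chr(0x11A8 + i) for i in range(27)]  # codas (Jong-sung); index 0 = no coda

SYLLABLE_BASE = 0xAC00
SYLLABLE_COUNT = 19 * 21 * 28  # 11172 precomposed syllables


def to_graphemes(text):
    """Split each Hangul syllable into its jamo; pass other characters through."""
    out = []
    for ch in text:
        offset = ord(ch) - SYLLABLE_BASE
        if 0 <= offset < SYLLABLE_COUNT:
            cho, rest = divmod(offset, 21 * 28)
            jung, jong = divmod(rest, 28)
            out.append(CHO[cho])
            out.append(JUNG[jung])
            if jong:  # syllables without a coda yield only two units
                out.append(JONG[jong])
        else:
            out.append(ch)
    return out


print(to_graphemes("한국어"))
```

A grapheme-based recognizer would use sequences like these as its output vocabulary instead of phonemes produced by a G2P step.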

Extracting and Clustering of Story Events from a Story Corpus

  • Yu, Hye-Yeon;Cheong, Yun-Gyung;Bae, Byung-Chull
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3498-3512 / 2021
  • This article describes how events that make up text stories can be represented and extracted. We also address the results from our simple experiment on extracting and clustering events in terms of emotions, under the assumption that different emotional events can be associated with the classified clusters. Each emotion cluster is based on Plutchik's eight basic emotion model, and the attributes of the NLTK-VADER are used for the classification criterion. While comparisons of the results with human raters show less accuracy for certain emotion types, emotion types such as joy and sadness show relatively high accuracy. The evaluation results with NRC Word Emotion Association Lexicon (aka EmoLex) show high accuracy values (more than 90% accuracy in anger, disgust, fear, and surprise), though precision and recall values are relatively low.

Cyberbullying Detection by Sentiment Analysis of Tweets' Contents Written in Arabic in Saudi Arabia Society

  • Almutairi, Amjad Rasmi;Al-Hagery, Muhammad Abdullah
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.112-119 / 2021
  • Social media has become a global means of communication in people's lives. Many people use Twitter to communicate, and its inappropriate use has negative effects on people's lives. One of the most common misuses of Twitter is cyberbullying. Most cyberbullying is written in dialectal Arabic, for which resources are scarce. For this reason, the goal of this study is to detect and classify cyberbullying on Twitter in the Arabic context in Saudi Arabia. To detect and classify tweets, Pointwise Mutual Information (PMI) is used to generate a lexicon, and the Support Vector Machine (SVM) algorithm is used as a classifier. Both methods are evaluated in terms of the F1-score. The F1-score of the PMI approach is 50%, while that of the SVM applied to the resampled data is 82%. The analysis of the results shows that the SVM algorithm performs better.
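
The abstract does not give the exact PMI formulation, so the following is only a minimal sketch of one common way to build a polarity lexicon with PMI: each word is scored by PMI(word, bullying) minus PMI(word, normal), which simplifies to the log-ratio of the smoothed class-conditional word probabilities. The labels, the function names `pmi_lexicon` and `classify`, the add-one smoothing, and the toy English tokens are all assumptions for illustration.

```python
import math
from collections import Counter


def pmi_lexicon(docs, smoothing=1.0):
    """docs: iterable of (tokens, label) pairs, label in {"bully", "normal"}."""
    counts = {"bully": Counter(), "normal": Counter()}
    for tokens, label in docs:
        counts[label].update(tokens)
    vocab = set(counts["bully"]) | set(counts["normal"])
    # Smoothed denominators so unseen (word, class) pairs do not give log(0).
    denom = {c: sum(counts[c].values()) + smoothing * len(vocab) for c in counts}
    lexicon = {}
    for word in vocab:
        p_bully = (counts["bully"][word] + smoothing) / denom["bully"]
        p_normal = (counts["normal"][word] + smoothing) / denom["normal"]
        # PMI(w, bully) - PMI(w, normal) = log2 P(w | bully) / P(w | normal)
        lexicon[word] = math.log2(p_bully / p_normal)
    return lexicon


def classify(tokens, lexicon):
    """Sum word scores; words missing from the lexicon contribute 0."""
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    return "bully" if score > 0 else "normal"


# Hypothetical toy corpus, not data from the study.
toy_docs = [
    (["you", "are", "an", "idiot"], "bully"),
    (["shut", "up", "idiot"], "bully"),
    (["thanks", "for", "the", "help"], "normal"),
    (["you", "are", "kind", "thanks"], "normal"),
]
toy_lexicon = pmi_lexicon(toy_docs)
print(classify(["you", "idiot"], toy_lexicon))
```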

Clothing-Recommendation system based on emotion and weather information (감정과 날씨 정보에 따른 의상 추천 시스템)

  • Ugli, Sadriddinov Ilkhomjon Rovshan;Park, Doo-Soon
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.528-531 / 2021
  • Recommendation systems are now ubiquitous, and many of our decisions are made with their help. They appear in all areas of daily life, so research in this area remains active, and many papers on clothing recommendation have been published. In this paper, we propose a clothing-recommendation system based on user emotion and weather information. We used social media to analyze users' six basic emotions, following Paul Ekman's theory, and matched them to clothing colours. We also obtained weather information through the visualcrossing.com API to predict the type of clothing. For sentiment analysis, we used an Emotion Lexicon that was created using Mechanical Turk. Emotions and colours were matched by applying Hayashi's Quantification Method III.

Understanding a Mathematics Teacher Community through a Computational Text Analysis: Review of Changes in Mathematics Pedagogical Lexicons by Lee & Kim (2022)

  • Sunghwan Hwang;Eunhye Flavin
    • Research in Mathematical Education / v.26 no.1 / pp.31-38 / 2023
  • Mathematics educators have emphasized the importance of language use in mathematics education. However, previous studies have predominantly focused on the spoken language used in mathematics classrooms, which provides limited information on the written language used by mathematics teachers. The written language reflects the characteristics of the teacher community and social, cultural, and political contexts. Moreover, the written language affects teachers' instructional practices and their students' mathematics learning experiences. Therefore, this study aims to review a study conducted by Lee and Kim (2022) investigating changes in mathematics teachers' pedagogical lexicons.

Construction of Consumer Confidence index based on Sentiment analysis using News articles (뉴스기사를 이용한 소비자의 경기심리지수 생성)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.1-27 / 2017
  • The economic sentiment index and macroeconomic indicators are known to be closely related, because economic agents' judgments and forecasts of business conditions affect economic fluctuations. For this reason, consumer sentiment or confidence provides steady fodder for business and is treated as an important piece of economic information. In Korea, the consumer sentiment index is highly relevant to private consumption, which makes it a very important economic indicator for evaluating and forecasting the domestic economic situation. However, despite offering relevant insights into private consumption and GDP, the traditional survey-based approach to measuring consumer confidence has several limits. One weakness is that it takes considerable time to research, collect, and aggregate the data. If an urgent issue arises, timely information is not announced until the end of each month. In addition, the survey only contains information derived from questionnaire items, so it can be difficult to capture the direct effects of newly arising issues. The survey also faces potential declines in response rates and erroneous responses. Therefore, it is necessary to find a way to complement it. For this purpose, we construct and assess an index that measures consumer economic sentiment using sentiment analysis. Unlike survey-based measures, our index relies on textual analysis to extract sentiment from economic and financial news articles. In particular, text data such as news articles and SNS are timely and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. Of the two main approaches to the automatic extraction of sentiment from text, we apply the lexicon-based approach, using sentiment lexicon dictionaries of words annotated with their semantic orientations.
In creating the sentiment lexicon dictionaries, we enter the semantic orientation of individual words manually, though we do not attempt a full linguistic analysis (one that involves analysis of word senses or argument structure); this is a limitation of our research, and further work in that direction remains possible. In this study, we generate a time series index of economic sentiment in the news. The construction of the index consists of three broad steps: (1) collecting a large corpus of economic news articles from the web, (2) applying lexicon-based sentiment analysis to score each article in terms of sentiment orientation (positive, negative, or neutral), and (3) constructing an economic sentiment index of consumers by aggregating monthly time series for each sentiment word. In line with existing scholarly assessments of the relationship between the consumer confidence index and macroeconomic indicators, any new index should be assessed for its usefulness. We examine the new index's usefulness by comparing other economic indicators with the consumer sentiment index (CSI). Trend and cross-correlation analyses are carried out to examine the relations and lag structure. Finally, we analyze the forecasting power using one-step-ahead out-of-sample prediction. In almost all experiments, the news sentiment index correlates strongly with related contemporaneous key indicators. Furthermore, in most cases, news sentiment shocks predict future economic activity; in head-to-head comparisons, the news sentiment measures outperform survey-based sentiment indices such as the CSI. Policy makers want to understand consumer or public opinions about existing or proposed policies.
Monitoring various web media, SNS, or news articles for such opinions enables the relevant government decision-makers to respond quickly. Textual data such as news articles and social networks (Twitter, Facebook, and blogs) are generated at high speed and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. Although research using unstructured data in economic analysis is in its early stages, the use of such data is expected to increase greatly once its usefulness is confirmed.
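
The lexicon-based scoring and monthly aggregation described in this abstract can be sketched roughly as follows. The word lists, the whitespace tokenizer, and the net-balance formula (mean article orientation per month) are assumptions made for illustration; the abstract does not specify the authors' lexicon or aggregation formula.

```python
from collections import defaultdict

# Hypothetical, hand-entered sentiment lexicon (the study builds its own manually).
POSITIVE = {"growth", "recovery", "gain"}
NEGATIVE = {"recession", "layoffs", "decline"}


def article_orientation(text):
    """Return +1 (positive), -1 (negative), or 0 (neutral) for one article."""
    tokens = text.lower().split()
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return (score > 0) - (score < 0)


def monthly_index(articles):
    """articles: iterable of ("YYYY-MM-DD", text). Returns {"YYYY-MM": net balance}.

    The net balance is the mean orientation of that month's articles,
    so it lies between -1 (all negative) and +1 (all positive).
    """
    buckets = defaultdict(list)
    for date, text in articles:
        buckets[date[:7]].append(article_orientation(text))
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}


# Hypothetical headlines, not data from the study.
sample = [
    ("2017-01-03", "exports show strong growth"),
    ("2017-01-15", "layoffs and decline hit retail"),
    ("2017-01-20", "markets gain on recovery hopes"),
    ("2017-02-02", "recession fears deepen"),
]
print(monthly_index(sample))
```

The resulting monthly series is the kind of object that could then be compared with a survey-based index through trend and cross-correlation analysis.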

Bi-directional LSTM-CNN-CRF for Korean Named Entity Recognition System with Feature Augmentation (자질 보강과 양방향 LSTM-CNN-CRF 기반의 한국어 개체명 인식 모델)

  • Lee, DongYub;Yu, Wonhee;Lim, HeuiSeok
    • Journal of the Korea Convergence Society / v.8 no.12 / pp.55-62 / 2017
  • A named entity recognition system recognizes words or phrases in a document that denote named entities, such as personal names (PS), place names (LC), and group names (OG), and labels them with the corresponding entity type. Traditional approaches to named entity recognition include statistical models trained on hand-crafted features. Recently, deep-learning models such as Recurrent Neural Networks (RNN) and long short-term memory (LSTM) have been proposed to construct the features representing a sentence and thereby solve the sequence labeling problem. In this research, to improve the performance of a Korean named entity recognition system, we augment the sentence representation with hand-crafted features, part-of-speech tagging information, and pre-built lexicon information. Experimental results show that the proposed method improves the performance of the Korean named entity recognition system. The results of this study are released on GitHub for future collaborative research with researchers studying Korean Natural Language Processing (NLP) and named entity recognition.

The Lexical Access of Regular and Irregular Korean Verbs in the Mental Lexicon (한국어 규칙 동사와 불규칙 동사의 심성 어휘집 접근 과정)

  • Park, Hee-Jin;Koo, Min-Mo;Nam, Ki-Chun
    • Korean Journal of Cognitive Science / v.23 no.1 / pp.1-23 / 2012
  • This study investigated the lexical access processing of inflected Korean verbs in the mental lexicon. Korean verb inflections can be classified into two main types, regular and irregular, which can be further divided into three types of regular inflection and two types of irregular inflection. A masked priming lexical decision task was used and the priming effects were compared. Experiments were carried out using the five types of verbal inflection in Korean: (1) No change-regularity (regular verbs with no orthographical or phonological changes), (2) Phonological change-regularity (regular verbs with phonological changes to the stem only), (3) Orthographical change-regularity (regular verbs that only undergo orthographical changes), (4) Stem change-irregularity (the stem is omitted or alternated with the other phoneme of the stem in irregular verbs), (5) Ending change-irregularity (irregular verbs with changes in the endings by phoneme substitution). The first three types are regarded as regular verbal inflections, whereas the latter two are regarded as irregular verbal inflections. The infinitive forms of the verbs were presented as target words, and three different conditions were presented as prime words: regular verbal inflection, irregular verbal inflection, and a control condition in which morphologically and semantically unrelated primes were presented. In addition, stimulus onset asynchrony (SOA) was manipulated (43 ms, 72 ms, 230 ms) to examine the time frame of the morphological decomposition process in word recognition. The results revealed significant priming effects at all three SOAs across conditions, with no significant differences in priming effects between the regular and irregular verbal inflection conditions.
This may suggest that Korean verb processing does not adopt different processing routes for regular and irregular inflections, which can also be an indication of earlier morphological information processing for Korean verbs.

MR Findings of Papillary Neoplasms of the Breast (유두 종양의 자기공명 영상소견)

  • Jo, Yeseul;Kim, Sung Hun;Kang, Bong Joo;Choi, Byung Gil
    • Investigative Magnetic Resonance Imaging / v.18 no.1 / pp.43-51 / 2014
  • Purpose: To review the MR imaging findings of papillary lesions identified as additional suspicious lesions on MR images in women with biopsy-proven breast cancer and to evaluate upgrade rates after subsequent surgical histopathological diagnosis. Materials and Methods: Among 1729 preoperative MR images of women with biopsy-proven breast cancer, 22 papillary lesions from 21 patients, proven by US-guided core needle biopsy (CNB) and showing additional suspicious contrast enhancement apart from the index cancer on MR images, were included in the study. Some of these lesions underwent surgery, so the surgical histopathologic results could be compared with the results of US-guided CNB. The MR findings of these lesions were also analyzed retrospectively using the BI-RADS MRI lexicon. Results: On MR imaging, 8 mass lesions, 7 non-mass lesions, and 7 focus lesions were detected. All of the focus lesions (100%, 7/7) were diagnosed as benign and showed plateau and washout patterns on dynamic MR images. After excisional biopsy, one of 9 benign papillomas (11.1%), 3 of 3 papillary neoplasms with an atypia component (100%), and 3 of 5 papillary neoplasms (60%) were upgraded to malignancy such as ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), or invasive lobular carcinoma (ILC). Conclusion: The MR images of papillary lesions diagnosed by US-guided CNB show no significant differences between malignant and benign lesions. In addition, 41.2% of the lesions (7/17) were upgraded after subsequent surgery. Thus, all papillary lesions require excisional biopsy for a definite diagnosis, and MR imaging alone is not sufficient.