• Title/Summary/Keyword: Linguistic Text Analysis

Search Result 76, Processing Time 0.023 seconds

A Hybrid Approach for the Morpho-Lexical Disambiguation of Arabic

  • Bousmaha, Kheira Zineb;Rahmouni, Mustapha Kamel;Kouninef, Belkacem;Hadrich, Lamia Belguith
    • Journal of Information Processing Systems
    • /
    • v.12 no.3
    • /
    • pp.358-380
    • /
    • 2016
  • In order to considerably reduce the ambiguity rate, we propose in this article a disambiguation approach that is based on the selection of the right diacritics at different analysis levels. This hybrid approach combines a linguistic approach with a multi-criteria decision one and could be considered as an alternative choice to solve the morpho-lexical ambiguity problem regardless of the diacritics rate of the processed text. As to its evaluation, we tried the disambiguation on the online Alkhalil morphological analyzer (the proposed approach can be used on any morphological analyzer of the Arabic language) and obtained encouraging results with an F-measure of more than 80%.

A Structural Analysis of Dictionary Text for the Construction of Lexical Data Base (어휘정보구축을 위한 사전텍스트의 구조분석 및 변환)

  • 최병진
    • Language and Information
    • /
    • v.6 no.2
    • /
    • pp.33-55
    • /
    • 2002
  • This research aims at transforming the definition tort of an English-English-Korean Dictionary (EEKD) which is encoded in EST files for the purpose of publishing into a structured format for Lexical Data Base (LDB). The construction of LDB is very time-consuming and expensive work. In order to save time and efforts in building new lexical information, the present study tries to extract useful linguistic information from an existing printed dictionary. In this paper, the process of extraction and structuring of lexical information from a printed dictionary (EEKD) as a lexical resource is described. The extracted information is represented in XML format, which can be transformed into another representation for different application requirements.

  • PDF

Comparative Study of Sentiment Analysis Model based on Korean Linguistic Characteristics (한국어 언어학적 특성 기반 감성분석 모델 비교 분석)

  • Kim, Gyeong-Min;Park, Chanjun;Jo, Jaechoon;Lim, Heui-Seok
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.149-152
    • /
    • 2019
  • 감성분석이란 입력된 텍스트의 감성을 분류하는 자연어처리의 한 분야로, 최근 CNN, RNN, Transformer등의 딥러닝 기법을 적용한 다양한 연구가 있다. 한국어 감성분석을 진행하기 위해서는 형태소, 음절 등의 추가 자질을 활용하는 것이 효과적이며 성능 향상을 기대할 수 있는 방법이다. 모델 생성에 있어서 아키텍쳐 구성도 중요하지만 문맥에 따른 언어를 컴퓨터가 표현할 수 있는 지식 표현 체계 구성도 상당히 중요하다. 이러한 맥락에서 BERT모델은 문맥을 완전한 양방향으로 이해할 수있는 Language Representation 기반 모델이다. 본 논문에서는 최근 CNN, RNN이 융합된 모델과 Transformer 기반의 한국어 KoBERT 모델에 대해 감성분석 task에서 다양한 성능비교를 진행했다. 성능분석 결과 어절단위 한국어 KoBERT모델에서 90.50%의 성능을 보여주었다.

  • PDF

Detection of Protein Subcellular Localization based on Syntactic Dependency Paths (구문 의존 경로에 기반한 단백질의 세포 내 위치 인식)

  • Kim, Mi-Young
    • The KIPS Transactions:PartB
    • /
    • v.15B no.4
    • /
    • pp.375-382
    • /
    • 2008
  • A protein's subcellular localization is considered an essential part of the description of its associated biomolecular phenomena. As the volume of biomolecular reports has increased, there has been a great deal of research on text mining to detect protein subcellular localization information in documents. It has been argued that linguistic information, especially syntactic information, is useful for identifying the subcellular localizations of proteins of interest. However, previous systems for detecting protein subcellular localization information used only shallow syntactic parsers, and showed poor performance. Thus, there remains a need to use a full syntactic parser and to apply deep linguistic knowledge to the analysis of text for protein subcellular localization information. In addition, we have attempted to use semantic information from the WordNet thesaurus. To improve performance in detecting protein subcellular localization information, this paper proposes a three-step method based on a full syntactic dependency parser and WordNet thesaurus. In the first step, we constructed syntactic dependency paths from each protein to its location candidate, and then converted the syntactic dependency paths into dependency trees. In the second step, we retrieved root information of the syntactic dependency trees. In the final step, we extracted syn-semantic patterns of protein subtrees and location subtrees. From the root and subtree nodes, we extracted syntactic category and syntactic direction as syntactic information, and synset offset of the WordNet thesaurus as semantic information. According to the root information and syn-semantic patterns of subtrees from the training data, we extracted (protein, localization) pairs from the test sentences. Even with no biomolecular knowledge, our method showed reasonable performance in experimental results using Medline abstract data. Our proposed method gave an F-measure of 74.53% for training data and 58.90% for test data, significantly outperforming previous methods, by 12-25%.

Social Roles of Child Sexual Crime Faction Films: Text Mining Analysis of Audiences' Emotional Reactions (아동·청소년 대상 성범죄 팩션영화의 사회적 역할 탐색: 텍스트 마이닝 기법을 활용한 수용자 감정반응 분석)

  • Kim, Ho-Kyung;Kwon, Ki-Seok
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.18 no.6
    • /
    • pp.662-672
    • /
    • 2017
  • Child sexual crimes have increased, but there has been no effective plan to combat this. Films reporting problems, amplify the attentions and propose countermeasures, which leads to changes. The current study examined the audiences' reactions to child sexual crime faction films using text-mining. The analysis of Naver's 2,727 blogs showed realistic words while 3,000 review comments' analysis demonstrated emotional responses. The positive and negative emotional category and degree were also different. In , the higher degree of negative emotions, such as 'angry' and 'unpleasant' appeared frequently. In , only negative emotional worlds were used. On the other hand, 'sad' was the highest ranked word, and the negative level was weak. In , 'good' a positive emotional word solely emerged. The audiences perceived the accidents objectively before release while they expressed their emotions and feelings after watching the movies. caused explosive anger and organized the participating citizens for changes. This movie provided an opportunity to enforce a legislative bill intensifying heavy punishments. The present study is significant in scrutinizing the audiences' diverse emotional reactions and discusses the future direction of society prosecution movies. Based on the text analysis of the audiences' linguistic expressions, a future study will be needed to hierarchically classify the diverse emotional expressions.

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to apply text-mining techniques to analyze big social media data in a more rigorous manner. Even if social media text analysis algorithms were improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to feature sets for training classification model. The other approach adopts the semantic analysis method to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm which is an extension of the neural network algorithms to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between two approaches. The results show that the distribution related word extracted by Word2Vec algorithm in that the words represent some emotion about the keyword used are three times more than extracted by co-occurrence analysis. The reason of the difference between two results comes from Word2Vec's semantic features vectorization. Therefore, it is possible to say that Word2Vec algorithm is able to catch the hidden related words which have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjective as "emotional word" in Korean. In addition, the emotion words extracted from the text are converted into word vector by the Word2Vec algorithm to find related words. Among these related words, noun words are selected because each word of them would have causal relationship with "emotional word" in the sentence. The process of extracting these trigger factor of emotional word is named "Emotion Trigger" in this study. As a case study, the datasets used in the study are collected by searching using three keywords: professor, prosecutor, and doctor in that these keywords contain rich public emotion and opinion. Advanced data collecting was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data to be used in actual analysis are followed: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate) Prosecutor (lewd behavior, sponsor). The size of the text data is about to 100,000(Professor: 25720, Doctor: 35110, Prosecutor: 43225) and the data are gathered from news, blog, and twitter to reflect various level of public emotion into text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used and every program used in text processing and analysis are java coding. The contributions of this study are as follows: First, different approaches for sentiment analysis are integrated to overcome the limitations of existing approaches. Secondly, finding Emotion Trigger can detect the hidden connections to public emotion which existing method cannot detect. Finally, the approach used in this study could be generalized regardless of types of text data. The limitation of this study is that it is hard to say the word extracted by Emotion Trigger processing has significantly causal relationship with emotional word in a sentence. The future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing with the relationships manually tagged. Furthermore, the text data used in Emotion Trigger are twitter, so the data have a number of distinct features which we did not deal with in this study. These features will be considered in further study.

Effective Text Question Analysis for Goal-oriented Dialogue (목적 지향 대화를 위한 효율적 질의 의도 분석에 관한 연구)

  • Kim, Hakdong;Go, Myunghyun;Lim, Heonyeong;Lee, Yurim;Jee, Minkyu;Kim, Wonil
    • Journal of Broadcast Engineering
    • /
    • v.24 no.1
    • /
    • pp.48-57
    • /
    • 2019
  • The purpose of this study is to understand the intention of the inquirer from the single text type question in Goal-oriented dialogue. Goal-Oriented Dialogue system means a dialogue system that satisfies the user's specific needs via text or voice. The intention analysis process is a step of analysing the user's intention of inquiry prior to the answer generation, and has a great influence on the performance of the entire Goal-Oriented Dialogue system. The proposed model was used for a daily chemical products domain and Korean text data related to the domain was used. The analysis is divided into a speech-act which means independent on a specific field concept-sequence and which means depend on a specific field. We propose a classification method using the word embedding model and the CNN as a method for analyzing speech-act and concept-sequence. The semantic information of the word is abstracted through the word embedding model, and concept-sequence and speech-act classification are performed through the CNN based on the semantic information of the abstract word.

The Effect of Expert Reviews on Consumer Product Evaluations: A Text Mining Approach (전문가 제품 후기가 소비자 제품 평가에 미치는 영향: 텍스트마이닝 분석을 중심으로)

  • Kang, Taeyoung;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.63-82
    • /
    • 2016
  • Individuals gather information online to resolve problems in their daily lives and make various decisions about the purchase of products or services. With the revolutionary development of information technology, Web 2.0 has allowed more people to easily generate and use online reviews such that the volume of information is rapidly increasing, and the usefulness and significance of analyzing the unstructured data have also increased. This paper presents an analysis on the lexical features of expert product reviews to determine their influence on consumers' purchasing decisions. The focus was on how unstructured data can be organized and used in diverse contexts through text mining. In addition, diverse lexical features of expert reviews of contents provided by a third-party review site were extracted and defined. Expert reviews are defined as evaluations by people who have expert knowledge about specific products or services in newspapers or magazines; this type of review is also called a critic review. Consumers who purchased products before the widespread use of the Internet were able to access expert reviews through newspapers or magazines; thus, they were not able to access many of them. Recently, however, major media also now provide online services so that people can more easily and affordably access expert reviews compared to the past. The reason why diverse reviews from experts in several fields are important is that there is an information asymmetry where some information is not shared among consumers and sellers. The information asymmetry can be resolved with information provided by third parties with expertise to consumers. Then, consumers can read expert reviews and make purchasing decisions by considering the abundant information on products or services. Therefore, expert reviews play an important role in consumers' purchasing decisions and the performance of companies across diverse industries. If the influence of qualitative data such as reviews or assessment after the purchase of products can be separately identified from the quantitative data resources, such as the actual quality of products or price, it is possible to identify which aspects of product reviews hamper or promote product sales. Previous studies have focused on the characteristics of the experts themselves, such as the expertise and credibility of sources regarding expert reviews; however, these studies did not suggest the influence of the linguistic features of experts' product reviews on consumers' overall evaluation. However, this study focused on experts' recommendations and evaluations to reveal the lexical features of expert reviews and whether such features influence consumers' overall evaluations and purchasing decisions. Real expert product reviews were analyzed based on the suggested methodology, and five lexical features of expert reviews were ultimately determined. Specifically, the "review depth" (i.e., degree of detail of the expert's product analysis), and "lack of assurance" (i.e., degree of confidence that the expert has in the evaluation) have statistically significant effects on consumers' product evaluations. In contrast, the "positive polarity" (i.e., the degree of positivity of an expert's evaluations) has an insignificant effect, while the "negative polarity" (i.e., the degree of negativity of an expert's evaluations) has a significant negative effect on consumers' product evaluations. Finally, the "social orientation" (i.e., the degree of how many social expressions experts include in their reviews) does not have a significant effect on consumers' product evaluations. In summary, the lexical properties of the product reviews were defined according to each relevant factor. Then, the influence of each linguistic factor of expert reviews on the consumers' final evaluations was tested. In addition, a test was performed on whether each linguistic factor influencing consumers' product evaluations differs depending on the lexical features. The results of these analyses should provide guidelines on how individuals process massive volumes of unstructured data depending on lexical features in various contexts and how companies can use this mechanism from their perspective. This paper provides several theoretical and practical contributions, such as the proposal of a new methodology and its application to real data.

Preliminary Analysis of Language Styles between South and North Korean Broadcastings (남북한 방송언어의 차이에 대한 기초 분석)

  • Lee, Chang-H.;Kim, Kyung-Il;Park, Jong-Min
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.9
    • /
    • pp.3311-3317
    • /
    • 2010
  • This study compared South and North Korean broadcasting languages to measure the language differences due to the long segregation. This study would provide fundamental database on the language uses between South and North Korea. The KLIWC analyzed the text that was selected from news clips of South and North Korean broadcasting agencies. The results showed that North Korean languages were significantly different from South in terms of affective, cognitive, and social words. In addition, North Korean broadcasting used more person pronoun and a part of speech than South Korean broadcasting. Psychological interpretations were provided based on the language differences.

An analysis of the writing tasks in high school English textbooks: Focusing on genre, rhetorical structure, task types, and authenticity (고등학교 1학년 영어교과서 쓰기활동 과업 분석: 장르, 텍스트 전개구조, 활동 유형, 진정성을 중심으로)

  • Choi, Sunhee;Yu, Ho-Jung
    • English Language & Literature Teaching
    • /
    • v.16 no.4
    • /
    • pp.267-290
    • /
    • 2010
  • The purpose of this study is to analyze the writing tasks included in the newly developed high school English textbooks in the aspects of genre, rhetorical structure, task type, and authenticity in order to find out whether these tasks could contribute to improving Korean EFL students' writing skills. A total of nine textbooks were selected for the study and every writing task in each textbook was analyzed. The results show that various types of genres were incorporated in the tasks, but very few opportunities were provided for students to acquire characteristics of specific genres. In terms of rhetorical structure of text, narration, illustration, and transaction were required most, whereas not a single writing task asked students to use classification or cause and effect. Many of the writing tasks analyzed offered linguistic and/or content support through the use of models, which displays traces of the product-based approach to teaching writing. Lastly, most of the tasks lacked authenticity represented by explicit discussion of purpose and audience. Implications for L2 writing task development and writing instruction in the Korean EFL context are discussed.

  • PDF