• Title/Abstract/Keywords: short texts

57 search results (processing time: 0.018 seconds)

Use of Word Clustering to Improve Emotion Recognition from Short Text

  • Yuan, Shuai;Huang, Huan;Wu, Linjing
    • Journal of Computing Science and Engineering / Vol. 10, No. 4 / pp.103-110 / 2016
  • Emotion recognition is an important component of affective computing, and is significant in the implementation of natural and friendly human-computer interaction. An effective approach to recognizing emotion from text is based on a machine learning technique, which deals with emotion recognition as a classification problem. However, in emotion recognition, the texts involved are usually very short, leaving a very large, sparse feature space, which decreases the performance of emotion classification. This paper proposes to resolve the problem of feature sparseness, and largely improve the emotion recognition performance from short texts by doing the following: representing short texts with word cluster features, offering a novel word clustering algorithm, and using a new feature weighting scheme. Emotion classification experiments were performed with different features and weighting schemes on a publicly available dataset. The experimental results suggest that the word cluster features and the proposed weighting scheme can partly resolve problems with feature sparseness and emotion recognition performance.

Investigating Predictive Features for Authorship Verification of Arabic Tweets

  • Alqahtani, Fatimah;Dohler, Mischa
    • International Journal of Computer Science & Network Security / Vol. 22, No. 6 / pp.115-126 / 2022
  • The goal of this research is to investigate techniques for solving the problem of authorship verification for short Arabic texts. Despite the widespread use of Twitter among Arabs, short-text research has so far focused on authorship verification in languages other than Arabic, such as English, Spanish, and Greek. To the best of the researcher's knowledge, no study has looked into the task of verifying Arabic-language Twitter texts. This study explores the impact of stylometric and TF-IDF features of very brief texts (Arabic Twitter posts) on user verification. In addition, an analysis was carried out to see how tweet metadata, such as time and source, can help improve user verification. This research is significant for cyber security in Arabic countries.

A Study on the Applicability of 2-Poisson Model for Selecting Korean Subject Words (2-포아송 모형을 이용한 한글 주제어 선정에 관한 연구)

  • 정영미;최대식
    • Journal of the Korean Society for Information Management (정보관리학회지) / Vol. 17, No. 1 / pp.129-148 / 2000
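  • Using a recently constructed Korean test collection, this study measured the ability of the 2-Poisson model's Z value to identify subject words, and analyzed the correlation between inverse document frequency and the 2-Poisson model. To this end, Z and modified $\beta$ values, along with IDF and modified TF.IDF weights, were computed and compared for each test sub-collection. The experiments confirmed the subject-word selection ability of the Z value in only one of the three sub-collections, science and technology. In the correlation analysis between the Z value of the 2-Poisson model and inverse document frequency weights, the correlation was stronger in the science and technology and general society sub-collections, which consist of short texts such as abstracts and newspaper articles, than in the women's studies sub-collection, which consists of full texts.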

Research on Keyword-Overlap Similarity Algorithm Optimization in Short English Text Based on Lexical Chunk Theory

  • Na Li;Cheng Li;Honglie Zhang
    • Journal of Information Processing Systems / Vol. 19, No. 5 / pp.631-640 / 2023
  • Short-text similarity calculation is one of the hot issues in natural language processing research. Conventional keyword-overlap similarity algorithms consider only lexical item information and neglect the effect of word order. Some optimized variants incorporate word order, but their weights are hard to determine. Building on the keyword-overlap similarity algorithm, this paper proposes a short English text similarity algorithm based on lexical chunk theory (LC-SETSA), which introduces lexical chunk theory from cognitive psychology into short English text similarity calculation for the first time. Lexical chunks are used to segment short English texts, the segmentation results reflect the semantic connotation and fixed word order of the lexical chunks, and the overlap similarity of the lexical chunks is then calculated accordingly. Finally, comparative experiments are carried out, and the results show that the proposed algorithm is feasible, stable, and effective to a large extent.

The Shifts of Power in Gender Discourse: Approaching Bao Ninh's Short Stories and Svetlana Alexievich's Unwomanly Face of War from Feminist Narratology

  • Cao, Kim Lan
    • Suvannabhumi (수완나부미) / Vol. 14, No. 1 / pp.133-160 / 2022
  • This paper examines narratives of women's marginal position in Bao Ninh's Short Stories and Svetlana Alexievich's Unwomanly Face of War from a feminist narratological approach. In analyzing voices of marginalized women, direct and indirect descriptions of women's beauty and pain, and private-public narratives of women's love stories, this paper aims to identify presentations of women's real authority in the text written by a male author, Bao Ninh, and in the one by a female author. The paper argues that juxtaposing these texts reveals an overturn of the traditional conception of sexual and gender differences. Specifically, distinguishing between male and female discourse does not separate powerful from nonpowerful language, but recognizes the real authority of each type of discourse based on sexual differences. The writing also illustrates that masculine language becomes powerless and deficient in the women's world; meanwhile, in writing about herself, woman establishes a type of powerful feminine discourse, which blends the emotional, enthusiastic, and gossipy characteristics of female language with the direct, rational, and strong ones of male language. Thus, the feminists' radical segregation of male/female discourses to overturn masculine authority and create a language for women on a par with men has been clearly shifted when comparing the two writers' texts based on the juxtapositional model of comparative literature.

Predicting Stock Prices Based on Online News Content and Technical Indicators by Combinatorial Analysis Using CNN and LSTM with Self-attention

  • Sang Hyung Jung;Gyo Jung Gu;Dongsung Kim;Jong Woo Kim
    • Asia pacific journal of information systems / Vol. 30, No. 4 / pp.719-740 / 2020
  • The stock market changes continuously as new information emerges, affecting the judgments of investors. Online news articles are valued as a traditional window through which investors learn about information that affects the stock market. This paper proposes new ways to utilize online news articles together with technical indicators. The suggested hybrid model consists of three models. First, a self-attention-based convolutional neural network (CNN) model, considered to be better at interpreting the semantics of long texts, uses news content as inputs. Second, a self-attention-based bidirectional long short-term memory (bi-LSTM) neural network model for short texts uses news titles as inputs. Third, a bi-LSTM model, considered to be better at analyzing context information and time-series models, uses 19 technical indicators as inputs. We used news articles from the previous day and technical indicators from the past seven days to predict the share price of the next day. An experiment was performed with Korean stock market data and news articles from 33 top companies over three years. In this experiment, our proposed model showed better performance than previous approaches, which have mainly focused on news titles. This paper demonstrates that news titles and content should be treated in different ways for superior stock price prediction.

Understanding recurrent neural network for texts using English-Korean corpora

  • Lee, Hagyeong;Song, Jongwoo
    • Communications for Statistical Applications and Methods / Vol. 27, No. 3 / pp.313-326 / 2020
  • Deep Learning is the most important key to the development of Artificial Intelligence (AI). There are several distinguishable architectures of neural networks such as MLP, CNN, and RNN. Among them, we try to understand one of the main architectures called Recurrent Neural Network (RNN) that differs from other networks in handling sequential data, including time series and texts. As one of the main tasks recently in Natural Language Processing (NLP), we consider Neural Machine Translation (NMT) using RNNs. We also summarize fundamental structures of the recurrent networks, and some topics of representing natural words as reasonable numeric vectors. We organize topics to understand estimation procedures from representing input source sequences to predicting target translated sequences. In addition, we apply multiple translation models with Gated Recurrent Units (GRUs) in Keras on English-Korean sentences that contain about 26,000 pairwise sequences in total from two different corpora, colloquialism and news. We verified some crucial factors that influence the quality of training. We found that loss decreases with more recurrent dimensions and using bidirectional RNN in the encoder when dealing with short sequences. We also computed BLEU scores which are the main measures of the translation performance, and compared them with the score from Google Translate using the same test sentences. We sum up some difficulties when training a proper translation model as well as dealing with Korean language. The use of Keras in Python for overall tasks from processing raw texts to evaluating the translation model also allows us to include some useful functions and vocabulary libraries as well.

The Unfulfilled Journey of a Flâneur: Reading "The Man of the Crowd" through the Eyes of the City (미완의 만보자 -도시의 시선으로 『군중의 남자』 읽기)

  • 남수영
    • The Journal of English Language and Literature (영어영문학) / Vol. 56, No. 4 / pp.617-635 / 2010
  • This paper argues that what Edgar Allan Poe pursues in "The Man of the Crowd" (1840) is not a story that can be told but an active reading that must be mediated. It is not only because the subject of the pursuit, the secret of the flaneur, remains veiled until the end, but also because the story proves itself to be a reading of various kinds of other texts: that is, the contemporary urban texts as well as the city itself. Although the 'man of the crowd' and his double (i.e. the narrator) embrace the figure of a modern flaneur, it is highly questionable whether the image of flaneur in the story fully qualifies itself as that of an ideal stroller, who can represent the free spirit of a detached collector. Rather, the narrator's flaneur reflects a panoptic perspective, systematically hierarchizing the constituents of the city. Still, it should be noted that "The Man of the Crowd" raises questions about the idea of creation and appropriation, observation and originality, and reading and storytelling by ascertaining the impossibility of reading and through assimilating to the contemporary texts not without subtle acknowledgement. In short, this novella tries a new way of storytelling, whose meaning is not to be found in creation but to be mediated in modern experiences.

Analysis of the Pedagogical Perspectives Represented in the Movie Dangerous Minds: Based on the Constructivist Framework

  • Jeong, Kyeong-Ouk
    • International Journal of Contents / Vol. 9, No. 4 / pp.45-51 / 2013
  • The purpose of this paper is to analyze educational theories and practices represented in the movie Dangerous Minds. This paper begins by giving an overview of the movie. It then analyzes the pedagogical methods and practices used by the teacher in the movie, which can encourage students to achieve academic success and social mobility. The lives of students at risk are transformed through the teacher's beliefs and pedagogical practices based on constructivism, leading students on a path of self-discovery and self-empowerment. What is imperative in the students' lives here is their intrinsic motivation and self-efficacy toward the self and their educational system. By providing constructivist pedagogical paradigms and viewing these media texts within the context of an urban school, this paper intends to introduce educational theories and methods which can create a better educational environment for students. In short, this study explores teaching theories and methods represented in the movie based on constructivist perspectives, which are supposed to fully cultivate the potential of students.

Generation of Natural Referring Expressions by Syntactic Information and Cost-based Centering Model (구문 정보와 비용기반 중심화 이론에 기반한 자연스러운 지시어 생성)

  • 노지은;이종혁
    • Journal of KIISE: Software and Applications (한국정보과학회논문지:소프트웨어및응용) / Vol. 31, No. 12 / pp.1649-1659 / 2004
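  • Text generation is a branch of language processing that produces text from various non-linguistic knowledge sources, and it proceeds through several complex, staged processes. Among the processes required for natural text generation, this paper focuses on referring expression generation, which refers naturally to an entity that has already been mentioned, and in particular on pronominalization by the zero pronoun, which is prominent in Korean. To this end, based on syntactic information and the cost-based centering model, we identify various features that affect the generation of referring expressions suited to Korean, especially zero pronouns, and build a referring expression generation model through machine learning. The model was trained on a total of 95 texts from three genres (descriptive or expository texts, news, and short fables), and on these texts we show that the proposed features can be applied effectively to generating referring expressions, especially zero pronouns. We also compare the proposed model with existing approaches to referring expression generation and show a large improvement in performance, and a T-test confirms that the improvement is statistically significant at the 99.9% confidence level.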