• Title/Summary/Keyword: short texts

Use of Word Clustering to Improve Emotion Recognition from Short Text

  • Yuan, Shuai;Huang, Huan;Wu, Linjing
    • Journal of Computing Science and Engineering / v.10 no.4 / pp.103-110 / 2016
  • Emotion recognition is an important component of affective computing, and is significant in the implementation of natural and friendly human-computer interaction. An effective approach to recognizing emotion from text is based on a machine learning technique, which deals with emotion recognition as a classification problem. However, in emotion recognition, the texts involved are usually very short, leaving a very large, sparse feature space, which decreases the performance of emotion classification. This paper proposes to resolve the problem of feature sparseness, and largely improve the emotion recognition performance from short texts by doing the following: representing short texts with word cluster features, offering a novel word clustering algorithm, and using a new feature weighting scheme. Emotion classification experiments were performed with different features and weighting schemes on a publicly available dataset. The experimental results suggest that the word cluster features and the proposed weighting scheme can partly resolve problems with feature sparseness and emotion recognition performance.
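A minimal sketch of the word-cluster representation described in the abstract above, assuming a word-to-cluster mapping is already available. The mapping `WORD_TO_CLUSTER`, the cluster IDs, and the function `cluster_features` are hypothetical illustrations; the paper's own clustering algorithm and weighting scheme are not given in the abstract, so plain counts are used here.

```python
from collections import Counter

# Hypothetical word-to-cluster mapping (assumption); in practice a word
# clustering algorithm would produce it from a corpus.
WORD_TO_CLUSTER = {
    "happy": "c_joy", "glad": "c_joy", "delighted": "c_joy",
    "sad": "c_sorrow", "unhappy": "c_sorrow",
    "angry": "c_anger", "furious": "c_anger",
}

def cluster_features(text, word_to_cluster=WORD_TO_CLUSTER):
    """Count cluster IDs instead of individual words; unknown words are skipped."""
    tokens = text.lower().split()
    return Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)

# Two short texts with no word in common still share a cluster feature.
a = cluster_features("so glad and happy today")
b = cluster_features("delighted to see you")
shared = set(a) & set(b)
```

Because many words map to one cluster, the feature space shrinks and short texts overlap more often, which is the sparseness effect the abstract refers to.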

Investigating Predictive Features for Authorship Verification of Arabic Tweets

  • Alqahtani, Fatimah;Dohler, Mischa
    • International Journal of Computer Science & Network Security / v.22 no.6 / pp.115-126 / 2022
  • This research investigates techniques for authorship verification of short Arabic texts. Despite the widespread use of Twitter among Arabs, short-text research has so far focused on authorship verification in languages other than Arabic, such as English, Spanish, and Greek. To the best of the researcher's knowledge, no study has examined the task of verifying the authorship of Arabic-language Twitter texts. This study explored the impact of stylometric and TF-IDF features of very brief texts (Arabic Twitter posts) on user verification. In addition, an analysis was carried out to see how tweet metadata, such as time and source, can improve user verification. This research is significant for cyber security in Arabic countries.

A Study on the Applicability of 2-Poisson Model for Selecting Korean Subject Words (2-포아송 모형을 이용한 한글 주제어 선정에 관한 연구)

  • 정영미;최대식
    • Journal of the Korean Society for Information Management / v.17 no.1 / pp.129-148 / 2000
  • Experiments were performed on three subsets of a Korean test collection to determine whether the 2-Poisson model's Z value is a good measure for selecting subject words from a document to be indexed. Subject word selection based on the Z value was effective for only one subset with short texts, namely the Science and Technology subset. Correlation analyses between the 2-Poisson model's Z value and the TF.IDF weight for the three subsets showed that the correlation was relatively high for the two test subsets with short texts, namely the Science and Technology subset and the Newspaper subset.

Research on Keyword-Overlap Similarity Algorithm Optimization in Short English Text Based on Lexical Chunk Theory

  • Na Li;Cheng Li;Honglie Zhang
    • Journal of Information Processing Systems / v.19 no.5 / pp.631-640 / 2023
  • Short-text similarity calculation is one of the hot issues in natural language processing research. Conventional keyword-overlap similarity algorithms consider only lexical item information and neglect the effect of word order. Some optimized variants take word order into account, but their weights are hard to determine. Building on the keyword-overlap similarity algorithm, this paper proposes a short English text similarity algorithm based on lexical chunk theory (LC-SETSA), which introduces lexical chunk theory from cognitive psychology into short English text similarity calculation for the first time. Lexical chunks are used to segment short English texts, the segmentation results reflect the semantic connotation and fixed word order of the lexical chunks, and the overlap similarity of the lexical chunks is then calculated accordingly. Finally, comparative experiments are carried out, and the results show that the proposed algorithm is feasible, stable, and effective to a large extent.
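The contrast between word-level and chunk-level overlap can be sketched as follows. This is not LC-SETSA itself: the overlap measure is a plain Jaccard coefficient, the chunk lexicon `CHUNKS` is a hypothetical two-entry example, and `segment` uses a simple greedy longest-match rule chosen for illustration.

```python
def jaccard(a, b):
    """Overlap of two item collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical lexical chunk lexicon (assumption), stored as word tuples.
CHUNKS = {("as", "a", "matter", "of", "fact"), ("in", "fact")}

def segment(tokens, chunks=CHUNKS):
    """Greedy longest-match segmentation; words outside any chunk stay single."""
    max_len = max((len(c) for c in chunks), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            cand = tuple(tokens[i:i + n])
            if n == 1 or cand in chunks:
                out.append(" ".join(cand))
                i += n
                break
    return out

s1 = "in fact".split()
s2 = "fact in".split()
word_sim = jaccard(s1, s2)                     # word order is ignored
chunk_sim = jaccard(segment(s1), segment(s2))  # chunk keeps its fixed order
```

Word-level overlap treats the two inputs as identical, while chunk-level overlap does not, because only `in fact` matches a chunk with a fixed word order.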

The Shifts of Power in Gender Discourse: Approaching Bao Ninh's Short Stories and Svetlana Alexievich's Unwomanly Face of War from Feminist Narratology

  • Cao, Kim Lan
    • SUVANNABHUMI / v.14 no.1 / pp.133-160 / 2022
  • This paper examines narratives of women's marginal position in Bao Ninh's Short Stories and Svetlana Alexievich's Unwomanly Face of War from a feminist narratological approach. In analyzing voices of marginalized women, direct and indirect descriptions of women's beauty and pain, and private-public narratives of women's love stories, this paper aims to identify presentations of women's real authority in the text written by a male author, Bao Ninh, and in the one by a female author. The paper argues that juxtaposing these texts reveals an overturn of the traditional conception of sexual and gender differences. Specifically, distinguishing between male and female discourse does not show powerful/nonpowerful language, but recognizes the real authority of each type of discourse based on sexual differences. The writing also illustrates that masculine language becomes powerless and deficient in the women's world; meanwhile, in writing about herself, woman establishes a type of powerful feminine discourse, which blends the emotional, enthusiastic, and gossipy characteristics of female language with the direct, rational, and strong ones of male language. Thus, the feminists' radical segregation of male/female discourses to overturn masculine authority and create a language for women on a par with men has clearly shifted when the two writers' texts are compared using the juxtapositional model of comparative literature.

Predicting Stock Prices Based on Online News Content and Technical Indicators by Combinatorial Analysis Using CNN and LSTM with Self-attention

  • Sang Hyung Jung;Gyo Jung Gu;Dongsung Kim;Jong Woo Kim
    • Asia Pacific Journal of Information Systems / v.30 no.4 / pp.719-740 / 2020
  • The stock market changes continuously as new information emerges, affecting the judgments of investors. Online news articles are valued as a traditional window to inform investors about various information that affects the stock market. This paper proposed new ways to utilize online news articles with technical indicators. The suggested hybrid model consists of three models. First, a self-attention-based convolutional neural network (CNN) model, considered to be better at interpreting the semantics of long texts, uses news content as inputs. Second, a self-attention-based bidirectional long short-term memory (bi-LSTM) neural network model for short texts utilizes news titles as inputs. Third, a bi-LSTM model, considered to be better at analyzing context information and time-series models, uses 19 technical indicators as inputs. We used news articles from the previous day and technical indicators from the past seven days to predict the share price of the next day. An experiment was performed with Korean stock market data and news articles from 33 top companies over three years. In this experiment, our proposed model showed better performance than previous approaches, which have mainly focused on news titles. This paper demonstrated that news titles and content should be treated in different ways for superior stock price prediction.
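The input alignment described in the abstract above (seven past days of indicators plus the previous day's news, predicting the next day's price) can be sketched as a windowing step. The function `make_samples`, its argument layout, and the use of a closing price as the target are assumptions for illustration; the CNN and bi-LSTM models themselves are not shown.

```python
def make_samples(indicators, closes, news, window=7):
    """Build (indicator_window, previous_day_news, target_price) tuples.

    indicators[t]: list of indicator values for day t (assumed layout)
    closes[t]:     closing price for day t (assumed prediction target)
    news[t]:       news text for day t
    """
    samples = []
    for target in range(window, len(closes)):
        samples.append((
            indicators[target - window:target],  # the `window` days before the target day
            news[target - 1],                    # news from the previous day only
            closes[target],                      # price to predict
        ))
    return samples

# Toy data: 10 days, one indicator per day.
indicators = [[d] for d in range(10)]
closes = list(range(100, 110))
news = [f"n{d}" for d in range(10)]
samples = make_samples(indicators, closes, news)
```

With 10 days of data and a 7-day window, the first usable target is day 7, so three samples are produced.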

Understanding recurrent neural network for texts using English-Korean corpora

  • Lee, Hagyeong;Song, Jongwoo
    • Communications for Statistical Applications and Methods / v.27 no.3 / pp.313-326 / 2020
  • Deep Learning is the most important key to the development of Artificial Intelligence (AI). There are several distinguishable architectures of neural networks such as MLP, CNN, and RNN. Among them, we try to understand one of the main architectures, the Recurrent Neural Network (RNN), which differs from other networks in handling sequential data, including time series and texts. As one of the main recent tasks in Natural Language Processing (NLP), we consider Neural Machine Translation (NMT) using RNNs. We also summarize the fundamental structures of recurrent networks and some topics on representing natural words as reasonable numeric vectors. We organize topics to understand estimation procedures, from representing input source sequences to predicting target translated sequences. In addition, we apply multiple translation models with Gated Recurrent Units (GRUs) in Keras to English-Korean sentences comprising about 26,000 pairwise sequences in total from two different corpora, colloquialism and news. We verified some crucial factors that influence the quality of training. We found that loss decreases with more recurrent dimensions and with a bidirectional RNN in the encoder when dealing with short sequences. We also computed BLEU scores, which are the main measures of translation performance, and compared them with the score from Google Translate using the same test sentences. We sum up some difficulties in training a proper translation model as well as in dealing with the Korean language. The use of Keras in Python for all tasks, from processing raw texts to evaluating the translation model, also allows us to include some useful functions and vocabulary libraries.

The Unfulfilled Journey of a Flâneur: Reading "The Man of the Crowd" through the Eyes of the City (미완의 만보자 -도시의 시선으로 『군중의 남자』 읽기)

  • Nam, Soo-Young
    • Journal of English Language & Literature / v.56 no.4 / pp.617-635 / 2010
  • This paper argues that what Edgar Allan Poe pursues in "The Man of the Crowd" (1840) is not a story that can be told but an active reading that must be mediated. It is not only because the subject of the pursuit, the secret of the flâneur, remains veiled until the end, but also because the story proves itself to be a reading of various kinds of other texts: that is, the contemporary urban texts as well as the city itself. Although the 'man of the crowd' and his double (i.e. the narrator) embrace the figure of a modern flâneur, it is highly questionable whether the image of the flâneur in the story fully qualifies itself as that of an ideal stroller, who can represent the free spirit of a detached collector. Rather, the narrator's flâneur reflects a panoptic perspective, systematically hierarchizing the constituents of the city. Still, it should be noted that "The Man of the Crowd" raises questions about the idea of creation and appropriation, observation and originality, and reading and storytelling by ascertaining the impossibility of reading and through assimilating to the contemporary texts not without subtle acknowledgement. In short, this novella tries a new way of storytelling, of which meaning is not to be found in creation but to be mediated in modern experiences.

Analysis of the Pedagogical Perspectives Represented in the Movie Dangerous Minds: Based on the Constructivist Framework

  • Jeong, Kyeong-Ouk
    • International Journal of Contents / v.9 no.4 / pp.45-51 / 2013
  • The purpose of this paper is to analyze educational theories and practices represented in the movie Dangerous Minds. This paper begins by giving an overview of the movie. It then analyzes the pedagogical methods and practices used by the teacher in the movie, which can encourage students to achieve academic success and social mobility. The lives of students at risk are transformed through the teacher's beliefs and pedagogical practices based on constructivism, leading students on a path of self-discovery and self-empowerment. What is imperative in the students' lives here is their intrinsic motivation and self-efficacy toward the self and their educational system. By providing constructivist pedagogical paradigms and viewing these media texts within the context of an urban school, this paper intends to introduce educational theories and methods which can create a better educational environment for students. In short, this study explores teaching theories and methods represented in the movie based on constructivist perspectives, which are supposed to fully cultivate the potential of students.

Generation of Natural Referring Expressions by Syntactic Information and Cost-based Centering Model (구문 정보와 비용기반 중심화 이론에 기반한 자연스러운 지시어 생성)

  • Roh Ji-Eun;Lee Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.31 no.12 / pp.1649-1659 / 2004
  • Text Generation is a process of generating comprehensible texts in human languages from some underlying non-linguistic representation of information. Among the several sub-processes needed to generate coherent texts, this paper concerns referring expression generation, which produces different types of expressions to refer to previously mentioned things in a discourse. Specifically, we focus on pronominalization by zero pronouns, which frequently occur in Korean. To build a generation model of referring expressions for Korean, several features are identified based on grammatical information and a cost-based centering model, and these are applied to various machine learning techniques. We demonstrate that our proposed features are well defined to explain pronominalization, especially pronominalization by zero pronouns in Korean, through 95 texts from three genres: Descriptive texts, News, and Short Aesop's Fables. We also show that our model significantly outperforms previous ones at a 99.9% confidence level by a t-test.