Search | Korea Science

Use of Word Clustering to Improve Emotion Recognition from Short Text

Yuan, Shuai;Huang, Huan;Wu, Linjing
- Journal of Computing Science and Engineering / v.10 no.4 / pp.103-110 / 2016
Emotion recognition is an important component of affective computing and is important for natural, friendly human-computer interaction. An effective approach to recognizing emotion from text uses machine learning and treats emotion recognition as a classification problem. However, the texts involved in emotion recognition are usually very short, which leaves a very large, sparse feature space and degrades emotion classification performance. This paper proposes to resolve the problem of feature sparseness and substantially improve emotion recognition from short texts by representing short texts with word cluster features, proposing a novel word clustering algorithm, and using a new feature weighting scheme. Emotion classification experiments were performed with different features and weighting schemes on a publicly available dataset. The experimental results suggest that the word cluster features and the proposed weighting scheme can partly alleviate feature sparseness and improve emotion recognition performance.
https://doi.org/10.5626/JCSE.2016.10.4.103
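The abstract does not spell out the clustering algorithm or the weighting scheme, but the core idea of replacing word features with word cluster features can be sketched. The mapping `WORD_CLUSTERS` below is hypothetical and hand-written; in the paper it would come from the authors' own clustering algorithm. The sketch only shows how two short texts that share no words can still share a cluster feature, which is how cluster features reduce sparseness.

```python
from collections import Counter

# Hypothetical word-to-cluster mapping (an assumption for illustration);
# the paper's clusters are produced by its own word clustering algorithm.
WORD_CLUSTERS = {
    "happy": "C_joy", "glad": "C_joy", "delighted": "C_joy",
    "sad": "C_sorrow", "unhappy": "C_sorrow", "gloomy": "C_sorrow",
    "angry": "C_anger", "furious": "C_anger",
}

def word_features(text: str) -> Counter:
    """Bag-of-words counts: one feature dimension per distinct word."""
    return Counter(text.lower().split())

def cluster_features(text: str) -> Counter:
    """Bag-of-clusters counts: words in the same cluster share one dimension.
    Words with no cluster fall back to their surface form."""
    return Counter(WORD_CLUSTERS.get(w, w) for w in text.lower().split())

def overlap(a: Counter, b: Counter) -> int:
    """Number of feature dimensions two texts have in common."""
    return len(set(a) & set(b))

t1, t2 = "so glad today", "feeling delighted"
print(overlap(word_features(t1), word_features(t2)))        # 0
print(overlap(cluster_features(t1), cluster_features(t2)))  # 1
```

A classifier trained on the cluster counts sees the shared `C_joy` dimension, whereas the plain word vectors of these two texts are orthogonal.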

The Application way on Semiotic Structure of Knowledge Classification (지식 분류의 기호학적 체계 응용 방안)

Yoon, Jeng-Giy
- Journal of Korean Library and Information Science Society / v.43 no.2 / pp.273-292 / 2012
This study unpacks the semiotic character of knowledge classification and examines how the sign structure of classification affects canons, banned books, and the like. Because this impact stems structurally from the semiotic structure, the study discusses the co-identity between banned books and the Internet from the perspective of social and cultural structure, and proposes a way of understanding and interpreting texts such as mass media using structuralist theory.

Purchase Information Extraction Model From Scanned Invoice Document Image By Classification Of Invoice Table Header Texts (인보이스 서류 영상의 테이블 헤더 문자 분류를 통한 구매 정보 추출 모델)

Shin, Hyunkyung
- Journal of Digital Convergence / v.10 no.11 / pp.383-387 / 2012
Developing an automated document management system for scanned invoice images is constrained by rigorous accuracy requirements for extracting monetary data, which necessitate automatic validation of the extracted values against a generative invoice table model. A typical implementation uses internal constraints such as "amount = unit price times quantity". In this paper, we propose a novel invoice information extraction model with an improved auto-validation method that uses table header detection and column classification.
https://doi.org/10.14400/JDPM.2012.10.11.383
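The validation idea in the abstract (classify the header of each column, then check the constraint "amount = unit price times quantity" on every row) can be sketched as follows. This is not the paper's model: a keyword lookup stands in for the header classifier, and `HEADER_KEYWORDS`, `LineItem`, and the tolerance value are hypothetical names and settings chosen for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Optional

# Hypothetical header vocabulary; the paper's header classifier is not
# described in the abstract, so a keyword lookup stands in for it here.
HEADER_KEYWORDS: Dict[str, set] = {
    "quantity": {"qty", "quantity", "pcs"},
    "unit_price": {"unit price", "price", "u/price"},
    "amount": {"amount", "total", "line total"},
}

def classify_header(text: str) -> Optional[str]:
    """Map an OCR'd header cell to a column role, or None if unrecognized."""
    t = text.strip().lower()
    for role, keywords in HEADER_KEYWORDS.items():
        if t in keywords:
            return role
    return None

@dataclass
class LineItem:
    quantity: Decimal
    unit_price: Decimal
    amount: Decimal

def parse_row(headers: List[str], cells: List[str]) -> LineItem:
    """Assign cell values to roles using the classified headers."""
    values = {}
    for header, cell in zip(headers, cells):
        role = classify_header(header)
        if role is not None:  # columns such as "Description" are skipped
            values[role] = Decimal(cell.replace(",", ""))
    return LineItem(values["quantity"], values["unit_price"], values["amount"])

def is_consistent(item: LineItem, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check the internal constraint: amount = unit price x quantity."""
    return abs(item.quantity * item.unit_price - item.amount) <= tolerance

headers = ["Description", "Qty", "Unit Price", "Amount"]
good = parse_row(headers, ["Widget", "3", "12.50", "37.50"])
bad = parse_row(headers, ["Widget", "3", "12.50", "87.50"])  # e.g. "3" misread as "8"
print(is_consistent(good), is_consistent(bad))  # True False
```

`Decimal` is used instead of `float` so that monetary products such as 3 x 12.50 compare exactly.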

An Efficient Block Segmentation and Classification of a Document Image Using Edge Information (문서영상의 에지 정보를 이용한 효과적인 블록분할 및 유형분류)

박창준;전준형;최형문
- Journal of the Korean Institute of Telematics and Electronics B / v.33B no.10 / pp.120-129 / 1996
This paper presents an efficient block segmentation and classification method that uses the edge information of the document image. We extract four prominent features from the edge gradient and orientation; these features, and therefore the block classifications, are insensitive to background noise and to brightness variation in the image. Using these four features, we can efficiently classify a document image into seven block categories: small-size letters, large-size letters, tables, equations, flow charts, graphs, and photographs. The first five are character-recognizable text blocks, and the last two are non-character blocks. By incorporating the column interval and the text line interval of the document into the choice of run length for the CRLA (constrained run-length algorithm), we obtain efficient block segmentation with reduced memory usage. The simulation results show that the proposed algorithm reliably segments and classifies document blocks into the seven categories above, and that classification performance is high for all categories except graphs, which vary too much.
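The segmentation step rests on run-length smoothing: short white gaps between ink pixels are filled so that characters merge into lines and lines into blocks. The sketch below is a simplified smoothing (a horizontal pass followed by a vertical pass over its result), not the paper's CRLA. Following the abstract's idea only loosely, it assumes the horizontal gap threshold is set below the column interval (so columns stay apart) and the vertical threshold at or above the text line interval (so lines merge); the toy page and thresholds are hypothetical.

```python
from collections import deque
from typing import List

Grid = List[List[int]]  # 1 = black (ink), 0 = white background

def smooth_line(line: List[int], max_gap: int) -> List[int]:
    """Fill white runs of length <= max_gap that lie between two black pixels."""
    out = line[:]
    last_black = None
    for i, px in enumerate(line):
        if px == 1:
            gap = i - last_black - 1 if last_black is not None else 0
            if 0 < gap <= max_gap:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out

def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]

def segment(g: Grid, h_gap: int, v_gap: int) -> Grid:
    """Horizontal smoothing joins characters into text lines; vertical
    smoothing of that result joins neighbouring lines into blocks."""
    horiz = [smooth_line(row, h_gap) for row in g]
    return transpose([smooth_line(col, v_gap) for col in transpose(horiz)])

def count_blocks(g: Grid) -> int:
    """Count 4-connected black regions (candidate blocks) by breadth-first search."""
    rows, cols = len(g), len(g[0])
    seen = [[False] * cols for _ in range(rows)]
    blocks = 0
    for r in range(rows):
        for c in range(cols):
            if g[r][c] == 1 and not seen[r][c]:
                blocks += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and g[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return blocks

# Toy page: two text columns separated by a 4-pixel gap, two text lines each.
page = [
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 1],
]
blocks = segment(page, h_gap=2, v_gap=1)
print(count_blocks(blocks))  # 2
```

Each resulting block would then be passed to the feature extraction and classification stage, which this sketch does not cover.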

Implementation of Annotation and Thesaurus for Remote Sensing

Chae, Gee-Ju;Yun, Young-Bo;Park, Jong-Hyun
- Proceedings of the KSRS Conference / 2003.11a / pp.222-224 / 2003
Many users want to add their own information to data on the web or on a computer without modifying the data itself. In remote sensing, the results of image classification generally consist of an image and a text file. To overcome this inconvenience, we propose an annotation method using XML. The proposed annotation method is efficient and can be applied both on the web and when viewing image classification results made up of image and text files. A thesaurus is needed because search engines such as Empas, Naver, and Google lack information on remote sensing and GIS: a search engine cannot simultaneously retrieve information for a term that has many different names. We select remote sensing data from different sources and establish relations among the many terms involved. In this process, we analyze the meanings of different terms that have similar meanings.
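The thesaurus problem described above (one concept known by several names, so a search for one name misses documents using another) can be sketched as synonym-group query expansion. The synonym groups below are hypothetical examples, not the paper's term relations; the Korean entries 원격탐사, 토지피복, and 위성영상 mean "remote sensing", "land cover", and "satellite image".

```python
from typing import Dict, List, Set

# Hypothetical synonym groups (an assumption for illustration); the paper
# derives its term relations from several remote sensing data sources.
SYNONYM_GROUPS: List[Set[str]] = [
    {"remote sensing", "earth observation", "원격탐사"},
    {"land cover", "land-cover map", "토지피복"},
    {"satellite image", "satellite imagery", "위성영상"},
]

def build_thesaurus(groups: List[Set[str]]) -> Dict[str, Set[str]]:
    """Map each term to every term in its synonym group (including itself)."""
    thesaurus: Dict[str, Set[str]] = {}
    for group in groups:
        normalized = {t.lower() for t in group}
        for term in normalized:
            thesaurus[term] = normalized
    return thesaurus

def expand_query(query: str, thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Return every name a search should match for `query`; unknown terms
    are returned unchanged."""
    q = query.strip().lower()
    return thesaurus.get(q, {q})

thesaurus = build_thesaurus(SYNONYM_GROUPS)
print(sorted(expand_query("Earth Observation", thesaurus)))
```

A search front end could then issue one query per expanded name, or OR them together, so that documents using any of the names are found.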

Meme Analysis using Image Captioning Model and GPT-4

Marvin John Ignacio;Thanh Tin Nguyen;Jia Wang;Yong-Guk Kim
- Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.628-631 / 2023
We present a new approach to evaluating texts generated by Large Language Models (LLMs) for meme classification. Analyzing an image with embedded text, i.e., a meme, is challenging even for existing state-of-the-art computer vision models. By leveraging large image-to-text models, we can extract image descriptions that can be used in other tasks, such as classification. In our methodology, we first generate image captions using BLIP-2 models. We then use GPT-4 to evaluate the relationship between each caption and the meme text. The results show that OPT_6.7B provides a better rating than the other LLMs, suggesting that the proposed method has potential for meme classification.
https://doi.org/10.3745/PKIPS.y2023m11a.628
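The evaluation step (an LLM judge rates how well a generated caption relates to the meme text, and ratings are compared across captioning models) can be sketched without calling any model. The prompt wording, the 1 to 10 scale, and the helper names are hypothetical; the abstract does not give the prompt, and the actual GPT-4 request is deliberately left out.

```python
import re
from statistics import mean
from typing import List, Optional

def build_prompt(caption: str, meme_text: str) -> str:
    """Assemble a rating request for an LLM judge (hypothetical wording and scale)."""
    return (
        "Image caption: " + caption + "\n"
        "Text embedded in the meme: " + meme_text + "\n"
        "On a scale of 1 to 10, how closely does the caption relate to the meme text? "
        "Answer with a single integer."
    )

def parse_rating(reply: str, low: int = 1, high: int = 10) -> Optional[int]:
    """Extract the first integer within [low, high] from the judge's reply."""
    for match in re.findall(r"\d+", reply):
        value = int(match)
        if low <= value <= high:
            return value
    return None

def mean_rating(replies: List[str]) -> Optional[float]:
    """Average the parseable ratings for one captioning model; skip bad replies."""
    ratings = [r for r in (parse_rating(x) for x in replies) if r is not None]
    return mean(ratings) if ratings else None

print(build_prompt("a cat sitting at a desk", "when the code finally compiles"))
print(mean_rating(["Rating: 7/10", "9", "n/a"]))  # 8
```

Computing `mean_rating` per captioning model (for example, per BLIP-2 variant) gives the kind of comparison the abstract reports.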

An Improved Text Classification (향상된 텍스트 분류)

Wang, Guangxing;Shin, Seong-Yoon;Shin, Kwang-Weong;Lee, Hyun-Chang
- Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.125-126 / 2019
In this paper, we propose an improved kNN classification method. By improving the method and normalizing the data, we achieve higher classification accuracy. We then compare three classification algorithms with the improved algorithm on experimental data.
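The abstract does not say what the improvement to kNN is, so the sketch below shows only the plain kNN baseline and the data normalization step it mentions, using min-max scaling as an assumed choice. The toy data are hypothetical and chosen so that one feature has a much larger range than the other; without scaling, that feature dominates the Euclidean distance.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def min_max_fit(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum, computed on the training data only."""
    return [min(col) for col in zip(*X)], [max(col) for col in zip(*X)]

def min_max_apply(x: Sequence[float], lows: List[float], highs: List[float]) -> List[float]:
    """Scale each feature to [0, 1]; constant features map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(x, lows, highs)]

def knn_predict(train_X, train_y, x, k: int = 3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted((math.dist(a, x), label) for a, label in zip(train_X, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

X_train = [[0.0, 1000.0], [0.1, 1100.0], [1.0, 1050.0], [0.9, 1150.0]]
y_train = ["a", "a", "b", "b"]
query = [0.95, 1000.0]

raw = knn_predict(X_train, y_train, query, k=3)
lows, highs = min_max_fit(X_train)
X_norm = [min_max_apply(x, lows, highs) for x in X_train]
scaled = knn_predict(X_norm, y_train, min_max_apply(query, lows, highs), k=3)
print(raw, scaled)  # a b
```

The prediction flips once both features contribute on comparable scales, which is why normalization matters for distance-based classifiers such as kNN.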

The Adaptive SPAM Mail Detection System using Clustering based on Text Mining

Hong, Sung-Sam;Kong, Jong-Hwan;Han, Myung-Mook
- KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.6 / pp.2186-2196 / 2014
Spam mail is one of the most common mail dysfunctions and may cause psychological damage to internet users. As internet usage increases, the amount of spam mail has also gradually increased. Indiscriminate sending, in particular, occurs when spam mail is sent from smartphones or tablets connected to wireless networks. Spam mail accounts for approximately 68% of mail traffic, and the true share is believed to be considerably higher. To analyze and detect spam mail, we introduce a technique based on spam mail characteristics and text mining; in particular, spam mail is detected using features extracted through linguistic analysis and language processing. Existing spam mail is analyzed, and hidden spam signatures are extracted using text clustering. Our proposed method uses a text mining system to improve the detection and error detection rates for existing spam mail and to respond to new types of spam mail.
https://doi.org/10.3837/tiis.2014.06.022
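Extracting hidden spam signatures by text clustering can be illustrated with a deliberately simple stand-in: single-pass (leader) clustering on word-set Jaccard similarity, with each cluster's signature taken as the words shared by all its members. This is not the paper's clustering method; the similarity threshold of 0.3 and the toy messages are hypothetical.

```python
from typing import List, Set

def tokens(text: str) -> Set[str]:
    return set(text.lower().split())

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(messages: List[str], threshold: float = 0.3) -> List[List[Set[str]]]:
    """Leader clustering: each message joins the first cluster whose first
    member (leader) is similar enough; otherwise it starts a new cluster."""
    clusters: List[List[Set[str]]] = []
    for msg in messages:
        t = tokens(msg)
        for c in clusters:
            if jaccard(t, c[0]) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters

def signature(members: List[Set[str]]) -> Set[str]:
    """Words shared by every member of a cluster: a crude spam signature."""
    return set.intersection(*members)

def matches(text: str, sig: Set[str]) -> bool:
    """A new message matches if it contains every word of the signature."""
    return bool(sig) and sig <= tokens(text)

inbox = [
    "win a free prize now click link",
    "click link to claim your free prize",
    "meeting moved to friday afternoon",
]
clusters = cluster(inbox)
signatures = [signature(c) for c in clusters if len(c) >= 2]  # skip singletons
print(sorted(signatures[0]))  # ['click', 'free', 'link', 'prize']
print(matches("free prize inside click link below", signatures[0]))  # True
```

Because the signature is learned from groups of messages rather than from a fixed keyword list, newly clustered variants can yield new signatures, which is the adaptive behavior the abstract aims for.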

A Study on Fine-Tuning and Transfer Learning to Construct Binary Sentiment Classification Model in Korean Text (한글 텍스트 감정 이진 분류 모델 생성을 위한 미세 조정과 전이학습에 관한 연구)

JongSoo Kim
- Journal of Korea Society of Industrial Information Systems / v.28 no.5 / pp.15-30 / 2023
Recently, generative models based on the Transformer architecture, such as ChatGPT, have been gaining significant attention. The Transformer architecture has been applied to various neural network models, including Google's BERT (Bidirectional Encoder Representations from Transformers). In this paper, a method is proposed for creating a binary text classification model that determines whether a Korean movie review comment is positive or negative. To accomplish this, a pre-trained multilingual BERT model is fine-tuned and transfer-learned on a new Korean training dataset. Specifically, a pre-trained multilingual BERT-Base model covering 104 languages, with 12 layers, a hidden size of 768, 12 attention heads, and 110M parameters, is used. To turn the pre-trained BERT-Base model into a text classification model, the input and output layers were fine-tuned, resulting in a new model with 178 million parameters. Using the fine-tuned model with a maximum sequence length of 128, a batch size of 16, and 5 epochs, transfer learning is conducted with 10,000 training samples and 5,000 test samples. The resulting binary sentiment classification model for Korean movie reviews achieves an accuracy of 0.9582, a loss of 0.1177, and an F1 score of 0.81. Transfer learning with a dataset five times larger yields a model with an accuracy of 0.9562, a loss of 0.1202, and an F1 score of 0.86.
https://doi.org/10.9723/jksiis.2023.28.5.015
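A back-of-the-envelope parameter count shows where figures near 110M and 178M can come from. It assumes the standard BERT-Base layout (12 layers, hidden size 768, feed-forward size 3072, 512 positions, 2 token types, a pooler) plus a 2-way linear classification head, and it assumes the public bert-base-multilingual-cased checkpoint with a 119,547-token vocabulary; the abstract does not name the checkpoint, so that vocabulary size is an assumption. The widely quoted 110M figure corresponds to the 30,522-token English vocabulary.

```python
def bert_params(vocab: int, hidden: int = 768, layers: int = 12,
                intermediate: int = 3072, max_pos: int = 512,
                type_vocab: int = 2, num_labels: int = 2) -> int:
    """Count parameters of a BERT-style encoder plus a linear classification head."""

    def dense(n_in: int, n_out: int) -> int:
        return n_in * n_out + n_out  # weight matrix + bias

    layer_norm = 2 * hidden  # gamma + beta
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    per_layer = (4 * dense(hidden, hidden)        # query, key, value, attention output
                 + layer_norm
                 + dense(hidden, intermediate)    # feed-forward expansion
                 + dense(intermediate, hidden)    # feed-forward projection
                 + layer_norm)
    pooler = dense(hidden, hidden)
    head = dense(hidden, num_labels)              # 2-way sentiment classifier
    return embeddings + layers * per_layer + pooler + head

print(bert_params(vocab=119_547))  # 177854978
print(bert_params(vocab=30_522))   # 109483778
```

Under these assumptions the classification head adds only 1,538 parameters; almost all of the difference between the two totals comes from the embedding matrix, whose size scales with the vocabulary.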

A Study on Negation Handling and Term Weighting Schemes and Their Effects on Mood-based Text Classification (감정 기반 블로그 문서 분류를 위한 부정어 처리 및 단어 가중치 적용 기법의 효과에 대한 연구)

Jung, Yu-Chul;Choi, Yoon-Jung;Myaeng, Sung-Hyon
- Korean Journal of Cognitive Science / v.19 no.4 / pp.477-497 / 2008
Mood classification of blog text is an interesting problem with potential for a variety of Web services. This paper introduces an approach to improving mood classification through normalized negation n-grams, which contain mood clues, and corpus-specific term weighting (CSTW). We conducted experiments on blog texts with two different classification methods: Enhanced Mood Flow Analysis (EMFA) and Support Vector Machine based Mood Classification (SVMMC). The results show that the normalized negation n-gram method is quite effective in dealing with negations and gives gradual improvements in mood classification with EMFA. The experiments with CSTW show that choosing an appropriate weighting scheme is important for adequate mood classification performance, since CSTW outperforms TF*IDF and TF.
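The abstract does not define the normalized negation n-grams or CSTW, so neither is reproduced here. As a rough stand-in, the sketch below uses a widely used negation-marking heuristic (prefix each word that follows a negator with `NOT_` until the next punctuation mark) together with the TF*IDF baseline the abstract compares against. The negator list is a small hypothetical sample.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Small hypothetical negator list; a real system would use a fuller lexicon.
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't", "can't"}
CLAUSE_END = re.compile(r"[.,!?;]")

def mark_negation(text: str) -> List[str]:
    """Prefix tokens after a negator with NOT_ until clause punctuation;
    negators and punctuation themselves are dropped from the output."""
    out, negating = [], False
    for tok in re.findall(r"[\w']+|[.,!?;]", text.lower()):
        if CLAUSE_END.fullmatch(tok):
            negating = False
        elif tok in NEGATIONS:
            negating = True
        else:
            out.append("NOT_" + tok if negating else tok)
    return out

def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Raw term frequency times log(N / document frequency) for each document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

docs = [mark_negation("I am not happy today. Great!"), mark_negation("Happy today")]
print(docs[0])  # ['i', 'am', 'NOT_happy', 'NOT_today', 'great']
print(tf_idf(docs)[0])
```

Marking negation keeps "not happy" from contributing the same feature as "happy", which is the problem negation handling addresses; the weighting function is where a corpus-specific scheme such as CSTW would replace plain TF*IDF.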
