• Title/Abstract/Keywords: Semantic feature

Search results: 259 items (processing time: 0.022 s)

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

  • Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar
    • ETRI Journal / Vol. 44, No. 3 / pp. 413-425 / 2022
  • The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. Poetry usually has a broad and profound sense that makes it difficult to be interpreted even by humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For the task, we primarily used lexical features in a bag-of-words model trained using the support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embedding. To extract the rich feature vectors, we prepared a repository containing 37 717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset containing 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.
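As a rough illustration of the bag-of-words representation mentioned in this abstract, the sketch below ranks poems in a small repository by lexical similarity to a query. It is not the authors' method: it substitutes plain cosine similarity over raw term counts for the paper's SVM-based Rasa classifier, and the function names (`bag_of_words`, `cosine`, `most_similar`) and the toy English "poems" are assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Return lowercase, whitespace-tokenized term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, repository, top_k=1):
    """Rank repository texts by cosine similarity to the query."""
    q = bag_of_words(query)
    ranked = sorted(repository,
                    key=lambda p: cosine(q, bag_of_words(p)),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical toy repository (the paper works on Hindi poems).
poems = [
    "the brave warrior rides to battle",
    "tears fall for the lost beloved",
    "the warrior returns from battle in glory",
]
best = most_similar("a warrior in battle", poems, top_k=2)
```

In the approach the abstract describes, a trained classifier would first assign each poem a Rasa label, so retrieval could be restricted to poems sharing the query's class; that step is omitted here.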

Research on Community Knowledge Modeling of Readers Based on Interest Labels

  • Kai, Wang;Wei, Pan;Xingzhi, Chen
    • Journal of Information Processing Systems / Vol. 19, No. 1 / pp. 55-66 / 2023
  • Community portraits can deeply explore the characteristics of community structures and describe the personalized knowledge needs of community users, which is of great practical significance for improving community recommendation services, as well as the accuracy of resource push. The current community portraits generally have the problems of weak perception of interest characteristics and low degree of integration of topic information. To resolve this problem, the reader community portrait method based on the thematic and timeliness characteristics of interest labels (UIT) is proposed. First, community opinion leaders are identified based on multi-feature calculations, and then the topic features of their texts are identified based on the LDA topic model. On this basis, a semantic mapping including "reader community-opinion leader-text content" was established. Second, the readers' interest similarity of the labels was dynamically updated, and two kinds of tag parameters were integrated, namely, the intensity of interest labels and the stability of interest labels. Finally, the similarity distance between the opinion leader and the topic of interest was calculated to obtain the dynamic interest set of the opinion leaders. Experimental analysis was conducted on real data from the Douban reading community. The experimental results show that the UIT has the highest average F value (0.551) compared to the state-of-the-art approaches, which indicates that the UIT has better performance in the smooth time dimension.

A Targeted Counter-Forensics Method for SIFT-Based Copy-Move Forgery Detection

  • ;이경현
    • KIPS Transactions on Computer and Communication Systems / Vol. 3, No. 5 / pp. 163-172 / 2014
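  • The Scale Invariant Feature Transform (SIFT) is used in many applications for image feature matching because of its high matching ability and its stability under rotation and scaling, and for the same reasons it has drawn attention as a core algorithm for copy-move forgery detection. However, although SIFT is highly susceptible to anti-forensics that can conceal evidence of image manipulation, this has received little study. This paper therefore proposes a targeted counter-forensics method that impedes SIFT-based copy-move forgery detection by applying semantically admissible distortions. The proposed method lets an attacker deceive the similarity-matching procedure while also hindering tracing by modifying SIFT keypoints, thereby hiding evidence of image manipulation. In addition, under semantic constraints, the method maintains high fidelity between the processed image and the original image. Experiments on test images under various conditions confirmed the efficiency of the proposed method.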

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization

  • 김용환;정영미
    • Journal of the Korean Society for Information Management / Vol. 29, No. 2 / pp. 155-171 / 2012
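  • A common problem in text categorization is that a term central to representing a document is not selected as a classification feature if it does not appear in the training document set, and that synonyms with different forms are treated as different features. This study used Wikipedia to convert synonyms appearing in documents into a single classification feature and to replace terms of an input document that do not occur in the training set with the most similar training-set term, with the aim of improving categorization performance. The feature selection experiments combined three conditions: (1) whether category information is used when extracting untrained terms, (2) the method of measuring term similarity (title and body of Wikipedia articles, category information, link information), and (3) the similarity measure (simple co-occurrence frequency, normalized co-occurrence frequency). When untrained terms were replaced with the training term having the highest similarity at or above a similarity threshold and documents were classified with a kNN classifier, categorization performance improved by 0.35% to 1.85% under all combinations of conditions. Although the improvement was not large, the results confirmed that selecting classification features using Wikipedia is effective.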
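The term-replacement step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the similarity scores here come from a hypothetical hand-written table rather than from Wikipedia, the function name `replace_untrained_terms` and the threshold value are invented for the example, and leaving below-threshold terms unchanged is an assumption, since the abstract does not say how they are handled.

```python
def replace_untrained_terms(doc_terms, train_vocab, similarity, threshold=0.5):
    """Map each term absent from the training vocabulary to its most
    similar training term, if that similarity reaches the threshold."""
    candidates = sorted(train_vocab)  # fixed order keeps ties deterministic
    replaced = []
    for term in doc_terms:
        if term in train_vocab or not candidates:
            replaced.append(term)
            continue
        best = max(candidates, key=lambda t: similarity(term, t))
        if similarity(term, best) >= threshold:
            replaced.append(best)
        else:
            replaced.append(term)  # assumption: keep the term unchanged
    return replaced


# Hypothetical similarity table standing in for Wikipedia-derived scores.
TOY_SIMILARITY = {
    frozenset(("automobile", "car")): 0.9,
    frozenset(("physician", "doctor")): 0.8,
    frozenset(("zebra", "car")): 0.1,
}


def toy_similarity(a, b):
    """Symmetric lookup; unknown pairs score 0.0."""
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


train_vocab = {"car", "doctor", "engine"}
result = replace_untrained_terms(
    ["automobile", "engine", "physician", "zebra"],
    train_vocab,
    toy_similarity,
    threshold=0.5,
)
```

With this toy table, `automobile` and `physician` are mapped to `car` and `doctor`, while `zebra` stays as it is because its best match falls below the threshold. The rewritten term list would then be passed to the classifier (kNN in the study).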

Unsupervised feature learning for classification

  • Abdullaev, Mamur;Alikhanov, Jumabek;Ko, Seunghyun;Jo, Geun Sik
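    • Proceedings of the Korean Society of Computer Information Conference / Proceedings of the 54th Summer Conference of the Korean Society of Computer Information, 2016, Vol. 24, No. 2 / pp. 51-54 / 2016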
  • In computer vision, especially in image processing, it has become popular to apply deep convolutional networks for supervised learning. Convolutional networks have shown state-of-the-art results in classification, object recognition, and detection, as well as semantic segmentation. However, supervised learning has two major disadvantages: it requires a huge amount of labeled data to reach high accuracy, and training on so much data takes a long time. Unsupervised learning, on the other hand, can address these problems more cheaply. In this paper, we show an efficient way to learn features for classification in an unsupervised way. The network is trained layer-wise using backpropagation and learns features from unlabeled data. Our approach shows better results on the Caltech-256 and STL-10 datasets.

Generic Text Summarization Using Non-negative Matrix Factorization

  • 박선;이주홍;안찬민;박태수;김재우;김덕환
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2006 Spring Conference / pp. 469-472 / 2006
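  • This paper proposes a new method that summarizes a document by extracting sentences using non-negative matrix factorization (NMF). Because the semantic features used for sentence extraction have non-negative values, the proposed method summarizes the content of a document more accurately than latent semantic analysis. It also has the advantage that summary sentences can be extracted easily at low computational cost.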
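To make the idea of non-negative semantic features concrete, here is a small pure-Python sketch that factorizes a term-by-sentence matrix A into W (terms by semantic features) and H (semantic features by sentences). It uses the widely known multiplicative update rules for NMF; whether the paper uses these updates is not stated in the abstract. The sentence-selection rule in `pick_sentences` (take the highest-weighted sentence for each semantic feature) and the toy matrix are assumptions for illustration only.

```python
import random


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frobenius_error(A, W, H):
    """Squared Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2
               for row_a, row_b in zip(A, WH)
               for a, b in zip(row_a, row_b))


def nmf(A, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize A ~= W H with non-negative W and H (multiplicative updates)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(hr, pr, qr)]
             for hr, pr, qr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(wr, pr, qr)]
             for wr, pr, qr in zip(W, num, den)]
    return W, H


def pick_sentences(H):
    """Assumed rule: for each semantic feature, pick its top-weighted sentence."""
    chosen = []
    for row in H:
        idx = max(range(len(row)), key=row.__getitem__)
        if idx not in chosen:
            chosen.append(idx)
    return sorted(chosen)


# Hypothetical term-by-sentence counts (rows: terms, columns: sentences).
A = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]
W, H = nmf(A, rank=2)
picked = pick_sentences(H)
```

Because every update multiplies non-negative numbers and divides by a positive denominator, W and H stay non-negative throughout, which is the property the abstract contrasts with latent semantic analysis.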

A Method of Substantive Anaphora Resolution in Korean Intra-Sentential Contexts

  • 김정해;이상국;이상조
    • Journal of the Korean Institute of Telematics and Electronics B / Vol. 33B, No. 4 / pp. 183-190 / 1996
  • This paper presents a solution to the problem of anaphors occurring in Korean sentences by means of head-driven, unidirectionally activated chart parsing. Anaphora occurs frequently in natural-language conversation and must be handled when building natural language processing systems for practical use. To resolve anaphors in Korean, we computerize the definitions and management conditions needed for the semantic classification between an anaphor and its antecedent, and add indices to the feature structures in the lexicon. An algorithm is proposed for handling anaphors in the parser. The coverage of the parser is extended to resolve anaphors of the indeclinable parts of speech (substantives) in Korean in all sentences that the previously developed HPSG parser handles.

Korean Nominal Bank, Using Language Resources of Sejong Project

  • 김동성
    • Language and Information / Vol. 17, No. 2 / pp. 67-91 / 2013
  • This paper describes the Korean Nominal Bank, a project that provides argument structure for instances of predicative nouns in the Sejong parsed corpus. We use the language resources of the Sejong project so that the same set of data is annotated with successively more levels of annotation, since a new language resource project built separately could only add information from separate and isolated processing. We base the annotation scheme on the Sejong electronic dictionary, the semantically tagged corpus, and the syntactically analyzed corpus. Our work also involves deep linguistic knowledge of the syntax-semantics interface in general. We consider semantic theories including the Frame Semantics of Fillmore (1976), the argument structure of Grimshaw (1990), and the argument alternation of Levin (1993) and Levin and Rappaport Hovav (2005). Various syntactic theories are needed to explain various sentence types, including empty categories, raising, and left (or right) dislocation. We also need an account of idiosyncratic lexical features such as collocation.

Parsing the Wh-Interrogative Construction in Korean

  • Yang, Jaehyung;Kim, Jong-Bok
    • Language and Information / Vol. 17, No. 2 / pp. 51-66 / 2013
  • Korean is a wh-in-situ language where the wh-expression stays in situ with an obligatory Q-particle marking its interrogative scope. This paper briefly reviews some basic properties of the wh-question construction in Korean and shows how a typed feature structure grammar, HPSG (Pollard and Sag 1994, Sag et al. 2003), together with the notions of 'type hierarchy' and 'constructions', can provide a robust basis for parsing the wh-construction in the language. We show that this system induces robust syntactic structures as well as enriched semantic representations for real-time applications such as machine translation, which require deep processing of the phenomena concerned.
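Feature structure grammars such as HPSG combine linguistic information by unification. The sketch below shows unification over plain nested dictionaries, as a hedged illustration of the basic operation only: it has no type hierarchy and no structure sharing (reentrancy), both of which a real typed feature structure grammar relies on, and the feature names (`HEAD`, `POS`, `WH`, `INDEX`) and values are hypothetical rather than taken from the paper's grammar.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    Feature structures are nested dicts; atomic values unify only if equal.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # conflicting values under the same feature
                result[key] = merged
            else:
                result[key] = b_val
        return result
    return a if a == b else None


# Hypothetical feature structures for illustration.
noun = {"HEAD": {"POS": "noun"}}
wh_expr = {"HEAD": {"POS": "noun", "WH": "plus"}, "INDEX": "i"}
verb = {"HEAD": {"POS": "verb"}}

combined = unify(noun, wh_expr)
clash = unify(noun, verb)
```

Here `combined` carries the information of both inputs, while `clash` is `None` because the two structures disagree on `POS`.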

Symmetric and Asymmetric Properties in Korean Verbal Coordination: A Computational Implementation

  • Kim, Jong-Bok;Yang, Jae-Hyung
    • Language and Information / Vol. 15, No. 2 / pp. 1-21 / 2011
  • Of the coordination structures in Korean, the symmetric and asymmetric properties of verbal coordination have challenged both theoretical and computational approaches. This paper shows how a typed feature structure grammar, HPSG, together with the notions of 'type hierarchy' and 'constructions', can provide a robust basis for parsing (un)tensed verbal coordination as well as pseudo-coordination found in the language. We show that the analysis sketched here and computationally implemented in the existing resource grammar for Korean, Korean Resource Grammar (KRG), can yield proper syntactic structures as well as enriched semantic representations for real-time applications such as machine translation.
