• Title/Summary/Keyword: 자질 결합 (feature combination)

A Fast Text Classifier with Feature Value Voting and Document-Side Feature Selection (자질값투표 기법과 문서측 자질 선정을 이용한 고속 문서 분류기)

  • Lee, Jae-Yun
    • Proceedings of the Korean Society for Information Management Conference / 2005.08a / pp.71-78 / 2005
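  • For fast yet accurate automatic document classification, we propose combining the feature value voting technique with document-side feature selection. A feature value denotes the pre-learned association between a classification feature and a category; feature value voting sums, for each candidate category, the feature values of the features that appear in the document to be classified, and assigns the document to the category with the highest total. Document-side feature selection, unlike conventional classification feature selection, selects only a subset of the features of the document being classified, rather than of the training set, for use in classification. In the experimental setting, the combination was as simple and fast as a naive Bayes classifier while outperforming an SVM classifier.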
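
The two ideas in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the `FEATURE_VALUES` table and its numbers are made up, and ranking a document's features by their largest per-category value is an assumption, since the abstract does not say how feature values are learned or how the document-side subset is chosen.

```python
from collections import defaultdict

# Hypothetical pre-learned feature values: feature -> {category: association score}.
# The numbers are invented for illustration only.
FEATURE_VALUES = {
    "goal": {"sports": 0.9, "politics": 0.1},
    "vote": {"sports": 0.1, "politics": 0.8},
    "team": {"sports": 0.6, "politics": 0.2},
    "the": {"sports": 0.05, "politics": 0.05},
}


def select_document_features(doc_terms, feature_values, top_k):
    """Document-side selection: keep only top_k of the document's own features.

    Ranking by the largest per-category value is an assumption of this sketch.
    """
    known = {t for t in doc_terms if t in feature_values}
    ranked = sorted(known, key=lambda t: (-max(feature_values[t].values()), t))
    return ranked[:top_k]


def classify_by_voting(doc_terms, feature_values, top_k=2):
    """Sum the feature values of the selected features per category and
    return the category with the highest total (None if nothing is known)."""
    selected = select_document_features(doc_terms, feature_values, top_k)
    totals = defaultdict(float)
    for term in selected:
        for category, value in feature_values[term].items():
            totals[category] += value
    if not totals:
        return None
    # Iterate categories in sorted order so ties are broken deterministically.
    return max(sorted(totals), key=totals.get)


label = classify_by_voting(["the", "goal", "team", "vote"], FEATURE_VALUES, top_k=2)
```

Classification touches only the few selected features of the input document, which is one plausible reading of why the approach is described as fast.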

Feature Filtering Methods for Web Documents Clustering (웹 문서 클러스터링에서의 자질 필터링 방법)

  • Park Heum;Kwon Hyuk-Chul
    • The KIPS Transactions:PartB / v.13B no.4 s.107 / pp.489-498 / 2006
  • Clustering results vary with the dataset, and performance degrades even on web documents that have been manually processed by an indexer: statistical feature selection methods can yield representative clusters for a feature, but they do not eliminate irrelevant features (i.e., non-obvious features and those that appear in general documents). These irrelevant features should be eliminated to improve clustering performance. This paper therefore proposes three feature-filtering algorithms that consider feature values per document set, together with the distribution, frequency, and weights of features per document set: (1) the features filtering algorithm in a document (FFID), (2) the features filtering algorithm in a document matrix (FFIM), and (3) a hybrid method combining FFID and FFIM (HFF). We tested clustering performance with feature selection using term frequency and expanded co-link information, and with feature filtering using the FFID, FFIM, and HFF methods. In our experiments, HFF performed best, and FFIM performed better than FFID.

Noun Link Relation Research Of Verb '-Kata (가다)' for Korean Syntactic Analysis (한국어 구문 해석을 위한 동사 '가다'의 명사 결합 관계 연구)

  • Park, Keon-Sook
    • Annual Conference on Human and Language Technology / 1998.10c / pp.207-216 / 1998
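  • This paper proposes a method of syntactic analysis for Korean that builds verb-centered syntactic frame information and, further, organizes the combination relations with frequently co-occurring nouns into a single network. Verb-centered syntactic frames and the semantic features of nouns play a very important role in syntactic resolution and help determine whether a sentence is ill-formed. However, because the boundaries of nouns' semantic features are vague, they are often insufficient for judging the well-formedness of a construction. Using the combination relations between verbs and nouns therefore allows the semantic well-formedness of a construction to be judged more explicitly. Taking the verb '가다' (kata, 'to go'), a basic Korean verb that occurs very frequently in elementary school textbooks, we compile in concrete terms its syntactic frame information, the semantic features of the nouns it combines with, and their combination relations.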

Study on the Juvenile Hormone Binding Protein in the Hemolymph of the Silkworm Larva, Bombyx mori (누에 체액의 유약호르몬 결합단자질(Juvenile hormone binding protein)에 관한 연구)

  • 손흥대
    • Journal of Sericultural and Entomological Science / v.30 no.1 / pp.25-32 / 1988
  • To examine the physiological role of juvenile hormone (JH) binding proteins in the hemolymph of the silkworm larva, Bombyx mori, hemolymph of fifth-instar larvae incubated with [3H] JH I was separated by polyacrylamide gel electrophoresis, and the activity of the binding protein was analyzed using a charcoal binding assay. The results were as follows: 1. JH was bound by two protein fractions in the hemolymph of the fifth-instar larva: one was a JH binding lipoprotein (JH-LP), the other a JH-specific binding protein (JHBP). Their relative mobility values (Rm) were 0.3∼0.33 and 0.81∼0.84, respectively. These values showed no meaningful differences across developmental stages in either male or female silkworms. 2. The total protein content of the hemolymph increased gradually during the fifth instar and decreased at the prepupal stage. The maximum was observed at the spinning period, and the content in females was much higher than in males. 3. JH binding activity per ml of hemolymph was low in the early fifth instar, peaked at the spinning period, and decreased slightly at the prepupal stage. 4. The changes in JH binding activity per ml of hemolymph followed a pattern similar to the changes in the total protein content of the hemolymph. 5. JH binding activity per mg of hemolymph protein was high in the early fifth instar and was lowest from the 6th day of the fifth instar to the prepupal stage.

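Building a Korean Parser Using the LKB (Linguistic Knowledge Building) System: Focusing on the Computational Processing of Korean Verbal/Adjectival Noun Constructions (LKB (Linguistic Knowledge Building) 시스템을 이용한 한국어 구문분석기 구축 -한국어의 동사성/형용사성 명사 구문의 전산처리를 중심으로-)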

  • 류병래;은광희
    • Proceedings of the Korean Society for Language and Information Conference / 2003.06a / pp.79-99 / 2003
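  • Korean verbal nouns and adjectival nouns combine with a light verb to serve as the predicate of a sentence; in doing so, the noun passes its complement features to the light verb, and the resulting predicate complex functions as the predicate. This parsing-system study uses the LKB system to focus on the treatment of case particles that attach to nominals, the treatment of endings that attach to predicates, and the phenomenon in which verbal/adjectival nouns pass their complement features to a light verb to form a predicate complex.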

Conceptual Differences between the Relation-Based Approach and the Feature-Based Approach in Noun-Noun Conceptual Combination (개념결합 처리과정에 대한 관계 - 기반 접근과 차원- 기반 접근의 조망 차이)

  • Choi, Min-Gyung;Shin, Hyun-Jung
    • Korean Journal of Cognitive Science / v.21 no.1 / pp.199-231 / 2010
  • This study contrasts the relation-based and the dimension-based explanations of noun-noun conceptual combination and discusses their implications. In Experiment 1, we investigated whether the dimension-based approach and an intra-conceptual explanation can account for both thematic relational and property interpretations of conceptual combinations, based on the intrinsic and extrinsic features of the constituent concepts. We defined intrinsic (or extrinsic) concepts according to their degree of dependency on intrinsic (or extrinsic) features. Property interpretation was facilitated when the modifiers were intrinsic concepts. This result implies that the processing of conceptual combination can be influenced by the structure and information of the constituent concepts. In Experiment 2, the exocentricity of the concepts used in Gagne (2000) was examined in order to reanalyze her data according to the dimension-based approach. Exocentricity was higher when the concepts were combined through relational connections. The results of Experiments 1 and 2 suggest that the two approaches can be integrated through the diversity of information involved in interpreting conceptual combinations. Implications and future directions are discussed.

Effective Feature Selection for Patent Classification (특허 분류를 위한 효과적인 자질 선택)

  • Jung Ha-Yong;Huang Jin-Xia;Shin Sa-Im;Choi Key-Sun
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.670-672 / 2005
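  • Feature selection is becoming increasingly important in research on supervised machine learning that uses many features, such as document classification. Tasks such as patent classification in particular involve far more features and categories than conventional document classification, so it is essential to select and learn from an appropriate subset that captures the characteristics of the whole document collection. The traditional feature selection approach, the filter, is fast, but its threshold is hard to set. Wrappers, which have been widely studied recently, generally perform better than filters but take longer as the number of features grows. This study proposes a wrapper that automatically finds the optimal filter by combining filters and wrappers in a complementary way. Experiments confirmed that the proposed method selects feature sets effectively.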
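
One way to read "a wrapper that finds the filter" is a search over filter thresholds scored by held-out accuracy. The sketch below illustrates that reading only and is not the authors' method: the filter metric (spread of relative document frequency across categories), the toy classifier, the candidate thresholds, and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def class_doc_freq(docs):
    """Return per-category term document frequencies and per-category doc counts."""
    df, n = defaultdict(Counter), Counter()
    for terms, label in docs:
        n[label] += 1
        df[label].update(set(terms))
    return df, n


def filter_scores(docs):
    """Filter step: score each term by the spread of its relative document
    frequency across categories (an assumed stand-in for any filter metric)."""
    df, n = class_doc_freq(docs)
    vocab = set().union(*(c.keys() for c in df.values()))
    scores = {}
    for term in vocab:
        rates = [df[c][term] / n[c] for c in n]
        scores[term] = max(rates) - min(rates)
    return scores


def predict(terms, selected, df, n):
    """Toy classifier: sum relative document frequencies of the selected terms."""
    totals = {
        c: sum(df[c][t] / n[c] for t in set(terms) if t in selected)
        for c in sorted(n)
    }
    return max(totals, key=totals.get)


def accuracy(train, valid, selected):
    df, n = class_doc_freq(train)
    hits = sum(predict(terms, selected, df, n) == label for terms, label in valid)
    return hits / len(valid)


def wrapper_over_filter(train, valid, thresholds):
    """Wrapper step: evaluate each filter threshold on validation data.
    Thresholds are assumed ascending, so ties go to the smaller feature set."""
    scores = filter_scores(train)
    best = None
    for th in thresholds:
        selected = {t for t, s in scores.items() if s >= th}
        acc = accuracy(train, valid, selected)
        if best is None or acc >= best[1]:
            best = (th, acc, selected)
    return best


# Made-up sample data for illustration.
train = [
    (["patent", "claim", "circuit"], "electronics"),
    (["patent", "claim", "voltage"], "electronics"),
    (["patent", "claim", "enzyme"], "chemistry"),
    (["patent", "claim", "polymer"], "chemistry"),
]
valid = [
    (["patent", "circuit", "voltage"], "electronics"),
    (["claim", "enzyme"], "chemistry"),
]
best_threshold, best_accuracy, best_features = wrapper_over_filter(
    train, valid, thresholds=[0.0, 0.25, 0.9]
)
```

Because the wrapper evaluates a handful of thresholds rather than arbitrary feature subsets, its cost grows with the number of candidate thresholds instead of the number of features.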

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document are rarely selected as classification features if they do not occur in the training document set. In addition, synonymous terms that express the same concept are usually treated as different features. This study aims to improve text categorization performance by integrating synonyms into a single feature and by using Wikipedia to replace input terms that are absent from the training document set with the most similar term that occurs in the training documents. For the selection of classification features, experiments were performed in various settings composed of three conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The categorization performance of a kNN classifier improved by 0.35~1.85% in $F_1$ value in all the experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement is not as large as expected, several semantic and structural devices of Wikipedia could be used to select more effective classification features.
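
The replacement step described above can be sketched as follows. This is an illustration under stated assumptions: the `SIMILARITY` table stands in for a Wikipedia-based term-term similarity measure with made-up values, the threshold is arbitrary, and leaving an unmatched term unchanged is a choice of this sketch that the abstract does not specify.

```python
# Hypothetical term-term similarities standing in for a Wikipedia-based measure.
SIMILARITY = {
    ("car", "automobile"): 0.8,
    ("car", "bank"): 0.1,
    ("zebra", "automobile"): 0.05,
    ("zebra", "bank"): 0.02,
}


def similarity(term, candidate):
    """Look up the made-up similarity; unknown pairs count as 0.0."""
    return SIMILARITY.get((term, candidate), 0.0)


def replace_unseen_terms(doc_terms, training_vocab, threshold):
    """Replace each term missing from the training vocabulary with the most
    similar training term, but only if that similarity exceeds the threshold."""
    replaced = []
    for term in doc_terms:
        if term in training_vocab:
            replaced.append(term)
            continue
        # Score every training term; sorted() keeps the result deterministic.
        scored = [(similarity(term, v), v) for v in sorted(training_vocab)]
        score, best = max(scored, default=(0.0, None))
        if best is not None and score > threshold:
            replaced.append(best)
        else:
            replaced.append(term)  # assumption: keep the term as-is
    return replaced


mapped = replace_unseen_terms(["car", "bank", "zebra"], {"automobile", "bank"}, 0.5)
```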

Experimental Study for Effective Combination of Opinion Features (효과적인 의견 자질 결합을 위한 실험적 연구)

  • Han, Kyoung-Soo
    • Journal of the Korean Society for Information Management / v.27 no.3 / pp.227-239 / 2010
  • Opinion retrieval aims to retrieve items that are topically relevant to the user's information need and that express opinions about the topic. This paper seeks a method for representing the user's information need for effective opinion retrieval and analyzes methods for combining opinion features through various experiments. The experiments are carried out in the inference network framework using the Blogs06 collection and 100 TREC test topics. The results show that the proposed representation method, based on a hidden 'opinion' concept, is effective, and that a compact model with a very small opinion lexicon performs comparably to the previous model on the same test data set.

Construction of Research Fronts Using Factor Graph Model in the Biomedical Literature (팩터그래프 모델을 이용한 연구전선 구축: 생의학 분야 문헌을 기반으로)

  • Kim, Hea-Jin;Song, Min
    • Journal of the Korean Society for Information Management / v.34 no.1 / pp.177-195 / 2017
  • This study attempts to infer research fronts (RFs) using a factor graph (FG) model based on heterogeneous features. The proposed model infers research fronts that contain documents with the potential to be cited multiple times in the future. To this end, documents are represented by bibliographic, network, and content features. Bibliographic features contain bibliographic information such as the number of authors, the number of institutions to which the authors belong, proceedings, the number of keywords the authors provide, funds, the number of references, the number of pages, and the journal impact factor. Network features include degree centrality, betweenness, and closeness within the document network. Content features include keywords extracted from the title and abstract using keyphrase extraction techniques. The model learns these features of a publication and infers whether the document would be an RF using the sum-product and junction tree algorithms on a factor graph. We experimentally demonstrate that, when predicting RFs, the FG predicted more densely connected documents than those in RFs constructed using a traditional bibliometric approach. Our results also indicate that FG-predicted documents exhibit stronger degree centrality and betweenness among RFs.
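
Two of the network features named above can be computed as in the sketch below, which uses the standard definitions of degree and closeness centrality. It is only an illustration: the small `graph` is made up, links are treated as undirected, and betweenness is omitted for brevity. The abstract does not say how the authors computed these features.

```python
from collections import deque

# Hypothetical document network: node -> set of linked documents (undirected).
graph = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"C"},
}


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {v: 0.0 for v in graph}
    return {v: len(neighbours) / (n - 1) for v, neighbours in graph.items()}


def closeness_centrality(graph, source):
    """(Number of reachable nodes) / (sum of shortest-path distances to them),
    with distances found by breadth-first search; 0.0 for an isolated node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


degrees = degree_centrality(graph)
closeness_a = closeness_centrality(graph, "A")
```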