• Title/Summary/Keyword: feature selection (자질 선정)

A Semantic-Based Feature Expansion Approach for Improving the Effectiveness of Text Categorization by Using WordNet (문서범주화 성능 향상을 위한 의미기반 자질확장에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for Information Management / v.26 no.3 / pp.261-278 / 2009
  • Identifying optimal feature sets is crucial to improving the effectiveness of text categorization (TC). In this study, feature expansion experiments were conducted using author-provided keyword sets and article titles from typical scientific journal articles. The tool used to expand the feature sets was WordNet, a lexical database of English words. Given this data set and lexical tool, the study showed that feature expansion based on synonym relationships significantly improved TC results. The experimental results indicated that expanding feature sets with synonyms based on classifier names considerably improved TC effectiveness, regardless of word sense disambiguation.

Evaluation of the Feature Selection function of Latent Semantic Indexing(LSI) Using a kNN Classifier (잠재의미색인(LSI) 기법을 이용한 kNN 분류기의 자질 선정에 관한 연구)

  • Park, Boo-Young;Chung, Young-Mee
    • Proceedings of the Korean Society for Information Management Conference / 2004.08a / pp.163-166 / 2004
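  • In prior research on text categorization, the feature selection techniques that are frequently used and have shown good performance include document frequency and the chi-square statistic. However, these techniques cannot remove the ambiguity inherent in the words themselves. For categorization experiments using a kNN classifier, this study proposes latent semantic indexing as a feature selection method: because it automatically derives the interrelationships among words, it analyzes the concepts behind words rather than the words themselves.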

Emerging Agents Discovery based on Big Data Analysis (문헌 빅데이터 분석 기반의 유망주체 선정)

  • Kim, Jin-Hyung;Hwang, Myung-Gwon;Jeong, Do-Heon;Cho, Min-Hee;Jung, Han-Min
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.89-91 / 2012
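  • Selecting emerging agents is very important in corporate cooperation and competition, and is essential to establishing research plans, government policy, and corporate strategy, but the enormous volume of information makes it laborious and time-consuming. This paper therefore proposes an emerging-agent selection model based on statistical literature analysis, which objectively analyzes literature big data in order to select emerging agents. To select emerging agents, various feature values are analyzed to obtain integrated feature values for technologies and agents, and these are then used in the selection. In addition, three criteria (the agent's vision, execution capability, and activity) are analyzed statistically to make the final selection.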

An Experimental Study on Feature Ranking Schemes for Text Classification (텍스트 분류를 위한 자질 순위화 기법에 관한 연구)

  • Pan Jun Kim
    • Journal of the Korean Society for Information Management / v.40 no.1 / pp.1-21 / 2023
  • This study reviewed the performance of ranking schemes as an efficient feature selection method for text classification. Until now, feature ranking schemes have mostly been based on document frequency, and relatively few have used term frequency. Therefore, the performance of single ranking metrics using term frequency and document frequency individually was examined as a feature selection method for text classification, and then the performance of combination ranking schemes using both was reviewed. Specifically, a classification experiment was conducted using two data sets (Reuters-21578, 20NG) and five classifiers (SVM, NB, ROC, TRA, RNN), and 5-fold cross-validation and t-tests were applied to secure the reliability of the results. Among single ranking schemes, the document frequency-based metric (chi) showed good performance overall. In addition, there was no significant difference between the highest-performing single ranking scheme and the combination ranking schemes. Therefore, when sufficient training documents can be secured for text classification, it is more efficient to use a single ranking metric (chi) based on document frequency as the feature selection method.
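
A document frequency-based chi-square ranking of the kind this abstract refers to can be sketched as follows. The abstract does not give the formula, so this assumes the standard chi-square statistic on a 2x2 term/class contingency table; the function names `chi_square` and `rank_terms` and the toy documents are hypothetical.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_terms(docs, labels, target):
    """Rank terms by chi-square with respect to one target class.

    docs is a list of token lists; labels is the parallel list of classes.
    Only document frequencies are used, not term frequencies.
    """
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df = Counter()      # document frequency over all documents
    df_pos = Counter()  # document frequency within the target class
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            if y == target:
                df_pos[term] += 1
    scores = {}
    for term in df:
        n11 = df_pos[term]
        n10 = df[term] - n11
        n01 = n_pos - n11
        n00 = n_docs - n_pos - n10
        scores[term] = chi_square(n11, n10, n01, n00)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = [["oil", "price", "rise"], ["oil", "export"],
            ["wheat", "price"], ["wheat", "harvest"]]
    labels = ["crude", "crude", "grain", "grain"]
    for term, score in rank_terms(docs, labels, "crude"):
        print(f"{term}\t{score:.3f}")
```

Keeping only the top-ranked terms would then serve as the feature selection step; the cutoff is a tuning choice that the abstract does not specify.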

A Study of Research on Methods of Automated Biomedical Document Classification using Topic Modeling and Deep Learning (토픽모델링과 딥 러닝을 활용한 생의학 문헌 자동 분류 기법 연구)

  • Yuk, JeeHee;Song, Min
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.63-88 / 2018
  • This research evaluated differences in classification performance across feature selection methods (an LDA topic model and Doc2Vec, a deep learning-based word embedding method), feature corpus sizes, and classification algorithms. In addition, to find the feature corpus with the highest classification performance, experiments were conducted with feature corpora composed differently according to location within the document and with adjusted sizes. In the deep learning experiments, the study also evaluated training frequency and specifically considered information for context inference. The study constructed a biomedical document dataset, Disease-35083, consisting of biomedical scholarly documents provided by PMC and categorized by disease. The study verifies which type and size of feature corpus produces the highest performance, and suggests feature corpora that are extensible to specific features by demonstrating efficiency in training time. Additionally, it compares deep learning with existing methods and suggests an appropriate method for each classification environment.

Text Categorization Based on the Maximum Entropy Principle (최대 엔트로피 기반 문서 분류기의 학습)

  • 장정호;장병탁;김영택
    • Proceedings of the Korean Information Science Society Conference / 1999.10b / pp.57-59 / 1999
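  • This paper proposes training a text classifier based on the maximum entropy principle. The maximum entropy method is widely used in natural language processing for tasks such as language modeling and part-of-speech tagging. Feature selection is important to the efficiency of a maximum entropy model, so this paper experiments with the chi-square test, log-likelihood ratio, information gain, and mutual information as criteria for selecting the feature set, and compares the results with those obtained using all candidate features. Reuters-21578 was used as the data set, and binary classification experiments were performed for each class.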
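
As an illustration of one of the selection criteria named in this abstract, information gain for a single term can be sketched as below. The abstract does not state the exact formulation used in the paper, so this assumes the usual definition (class entropy minus the class entropy conditioned on term presence or absence); `entropy`, `information_gain`, and the toy data are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Information gain of a term's presence/absence about the class label.

    docs is a list of token lists; labels is the parallel list of classes.
    """
    classes = sorted(set(labels))
    with_term = [y for tokens, y in zip(docs, labels) if term in tokens]
    without_term = [y for tokens, y in zip(docs, labels) if term not in tokens]

    def class_counts(ys):
        return [ys.count(c) for c in classes]

    n = len(labels)
    h_prior = entropy(class_counts(labels))
    # Conditional entropy: weight each split by its share of the documents.
    h_cond = (len(with_term) / n) * entropy(class_counts(with_term)) \
        + (len(without_term) / n) * entropy(class_counts(without_term))
    return h_prior - h_cond


if __name__ == "__main__":
    docs = [["oil", "price", "rise"], ["oil", "export"],
            ["wheat", "price"], ["wheat", "harvest"]]
    labels = ["crude", "crude", "grain", "grain"]
    for term in ("oil", "price", "rise"):
        print(term, round(information_gain(docs, labels, term), 3))
```

A term that splits the classes perfectly scores the full class entropy, while a term spread evenly across classes scores zero, which is what makes the measure usable as a ranking criterion.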

An Experimental Study on the Automatic Classification of Korean Journal Articles through Feature Selection (자질선정을 통한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.39 no.1 / pp.69-90 / 2022
  • To provide basic data that can systematically support and evaluate R&D activities, and help set current and future research directions by identifying specific trends in domestic academic research, I sought efficient ways to assign standardized subject categories (controlled keywords) to individual journal papers. To this end, I conducted various experiments on the major factors affecting automatic classification performance, focusing on feature selection techniques, with the aim of automatically assigning categories from the National Research Foundation of Korea's Academic Research Classification Scheme to domestic journal papers. The results showed that, for domestic journal papers, which form an imbalanced real-world data set, a fairly good level of performance can be expected from simpler classifiers, feature selection techniques, and relatively small training sets.

Automatic Text Categorization Using Term Information of Anchor Text (Anchor Text의 단어 정보를 이용한 자동 문서 범주화)

  • Heo, Hee-keun;Han, Gi-deok;Jung, Sung-won;Lim, Sung-shin;Kwon, Hyuk-chul
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.665-668 / 2004
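  • Recent web documents are expressed not only as text but also in other forms such as images and sound, so the proportion of text is decreasing. For documents from which it is difficult to extract more than a certain number of words, conventional categorization methods that rely only on word information cannot be expected to perform well. This paper therefore proposes a new automatic text categorization model based on judging the feature suitability of anchor text word information. A Bayesian probabilistic model was used as the categorization model, and features were selected using the chi-square statistic. When the word features extracted from a document are judged insufficient to classify it, performance is improved by using the document's link information to incorporate the word features of the linked documents and of the anchor text.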

An Analysis of Teacher Training Programs focusing on the Reflect Qualities of teachers in Gifted Education (영재교육 담당교사의 자질 반영을 중심으로 한 교사 연수 프로그램 분석)

  • Cho, Kyu-Seong;Chung, Duk-Ho;Park, Kyeong-Jin;Kim, Hee-Jin;Park, Seon-Ok
    • Journal of Gifted/Talented Education / v.24 no.4 / pp.543-559 / 2014
  • The purpose of this study was to analyze teacher training programs in terms of how well they reflect the qualities required of teachers in gifted education. A total of 20 teacher training programs were collected from offices of education, university teacher training centers, and remote training centers, and were analyzed using semantic network analysis. The analysis showed that 'curriculum', 'teaching and learning', and 'development of curriculum' were emphasized in the programs, which indicates that they are operated with an emphasis on teachers' professional qualities. The analysis also revealed that many of the programs dealt with professional and teaching qualities more than affective qualities. It is therefore necessary to reorganize teacher training programs so that they are diversified and balanced. Furthermore, to improve teachers' qualities evenly, we suggest that a systematic training program be put in place.

Qualitative research director for the field of early childhood education (유아교육 현장의 원장에 대한 질적 연구)

  • Lim, Nan Joo
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.12 / pp.8243-8248 / 2015
  • The purpose of this study was to identify the qualities that a director in early childhood education should have. To this end, 10 directors with more than 10 years of experience working in child care centers were selected as study participants. In-depth discussions and interviews were conducted to explore their experience in the field, to examine the qualifications, roles, attitudes, and images required of a director, and to reveal how these affect teachers. The results of this study are as follows. First, among the qualities required of a director in the field of early childhood education, personality qualities were given priority over professional qualifications and leadership qualities. Second, a responsible attitude and a professional attitude toward training, aimed at raising teachers' capacity and invigorating education, were required as important elements of a director's qualities.