• Title/Summary/Keyword: 범주 (category)

Hierarchical Automatic Classification of News Articles based on Association Rules (연관규칙을 이용한 뉴스기사의 계층적 자동분류기법)

  • Joo, Kil-Hong;Shin, Eun-Young;Lee, Joo-Il;Lee, Won-Suk
    • Journal of Korea Multimedia Society / v.14 no.6 / pp.730-741 / 2011
  • With the development of the internet and computer technology, the amount of information available online is increasing rapidly, and most of it is managed in document form. Research into methods for managing large document collections effectively is therefore necessary. Conventional document categorization methods use only the keywords of related documents for classification. This paper instead proposes a keyword extraction method based on association rules. The method extracts a set of related keywords associated with a document's category and classifies representative keywords using the classification rule proposed in the paper. The paper also proposes a preprocessing method for efficient keyword creation and predicts the category of new documents. A classifier is designed and its performance measured experimentally in order to improve the profile's classification performance. When predicting a category, substituting all the classification rules one by one is the major cause of degraded processing performance in a profile. Finally, the paper suggests an automatic categorization scheme that extends from a simple category architecture to a hierarchical one.
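
The abstract does not spell out the rule format, so the following is only a minimal sketch of how association rules of the form "keyword set → category" could be mined. The restriction to keyword pairs, the `min_support` and `min_confidence` thresholds, and the sample documents are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def mine_rules(docs, min_support=2, min_confidence=0.8):
    """Mine rules {keyword pair} -> category from (keywords, category) pairs.

    Support is the number of documents of a category containing the pair;
    confidence is that count divided by all documents containing the pair.
    Both thresholds are hypothetical defaults.
    """
    pair_counts = Counter()
    pair_category_counts = Counter()
    for keywords, category in docs:
        for pair in combinations(sorted(set(keywords)), 2):
            pair_counts[pair] += 1
            pair_category_counts[(pair, category)] += 1

    rules = {}
    for (pair, category), count in pair_category_counts.items():
        confidence = count / pair_counts[pair]
        if count >= min_support and confidence >= min_confidence:
            rules[pair] = (category, confidence)
    return rules


# Hypothetical toy corpus: keyword lists with a known category.
docs = [
    (["election", "vote", "party"], "politics"),
    (["election", "vote", "poll"], "politics"),
    (["match", "goal", "vote"], "sports"),
    (["match", "goal", "team"], "sports"),
]
rules = mine_rules(docs)
```

With this toy corpus only the pairs that occur twice within a single category survive both thresholds; a new document containing such a pair could then be assigned the category on the right-hand side of the rule.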

Automatic Document Classification Based on k-NN Classifier and Object-Based Thesaurus (k-NN 분류 알고리즘과 객체 기반 시소러스를 이용한 자동 문서 분류)

  • Bang Sun-Iee;Yang Jae-Dong;Yang Hyung-Jeong
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1204-1217 / 2004
  • Numerous statistical and machine learning techniques have been studied for automatic text classification. However, because they train classifiers using only the feature vectors of documents, ambiguity between two possible categories significantly degrades classification precision. To remedy this drawback, we propose a new method that incorporates relationship information between categories into existing classifiers. We first perform document classification using the k-NN classifier, which is generally known for relatively good performance in spite of its simplicity. We then employ relationship information from an object-based thesaurus to reduce the ambiguity. By referencing the various relationships in the thesaurus that correspond to the structured categories, the ambiguity is removed and the precision of k-NN classification improves drastically. Experimental results show that the method improves precision by up to 13.86% over plain k-NN classification while preserving its recall.
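
For readers unfamiliar with the baseline, here is a minimal sketch of a plain k-NN text classifier over term-frequency vectors. It covers only the k-NN step; the thesaurus-based disambiguation is not described in enough detail in the abstract to reproduce. The cosine similarity measure, the value `k=3`, and the toy training set are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k=3):
    """Return the majority label among the k most similar training documents."""
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training documents as (term frequencies, category).
training = [
    ({"goal": 2, "match": 1}, "sports"),
    ({"team": 1, "match": 2}, "sports"),
    ({"stock": 2, "market": 1}, "economy"),
    ({"market": 2, "bank": 1}, "economy"),
]
predicted = knn_classify({"match": 1, "team": 1}, training, k=3)
```

When the vote among the nearest neighbours is close, the predicted category is ambiguous; that is the situation the abstract says the thesaurus relationships are meant to resolve.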

Analysis of Subject Category on Artificial Intelligence Discourse in Newspaper Articles (신문기사에 나타난 인공지능 담론에 대한 주제범주 분석)

  • Lee, Soo-Sang
    • Journal of Korean Library and Information Science Society / v.48 no.4 / pp.21-47 / 2017
  • This study analyzes the features of topics concerning artificial intelligence (AI), which has been attracting massive attention. Newspaper articles published from 2016 to June 2017 were selected for the analysis of key subjects. This period was chosen because public attention to AI grew from 2016, when AlphaGo appeared and caused a shock. In total, 1,210 main messages were coded from 525 newspaper articles. The messages were classified into three levels of subject categories: seven major categories, 62 middle categories, and minor categories. The seven major categories are AI research, AI application, AI business, AI era, AI argument, AlphaGo, and other topics. The first feature of the AI issues found in the major subject categories is that they are varied and complex. Second, social and policy-level issues related to AI, such as job losses, misuse, and errors, must be addressed in order to use AI safely. Last, issues concerning the role of humans and the revolution of the education system in the AI era emerged as subjects that are important but hard to discuss.

Ensemble Learning for Solving Data Imbalance in Bankruptcy Prediction (기업부실 예측 데이터의 불균형 문제 해결을 위한 앙상블 학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.15 no.3 / pp.1-15 / 2009
  • In a classification problem, data imbalance occurs when the instances of one class greatly outnumber those of the other class. Such data sets often lead to a default classifier because of a skewed decision boundary, which reduces classification accuracy. This paper proposes Geometric Mean-based Boosting (GM-Boost) to resolve the data imbalance problem. Because GM-Boost introduces the notion of the geometric mean, its learning process can consider both the majority and minority classes and reinforce learning on misclassified data. An empirical study of bankruptcy prediction for Korean companies shows that GM-Boost achieves higher classification accuracy than previous methods used for imbalanced data, including under-sampling, over-sampling, and AdaBoost, and that its learning performance is robust regardless of the degree of data imbalance.
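
To illustrate why a geometric mean is useful under imbalance, the sketch below compares plain accuracy with the geometric mean of the two per-class accuracies for a default classifier that always predicts the majority class. It shows the metric only, not the GM-Boost algorithm itself, whose details the abstract does not give; the 90/10 class split is invented for illustration.

```python
import math


def geometric_mean_accuracy(y_true, y_pred, positive=1):
    """Geometric mean of accuracy on the positive and the negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical data: 90 healthy firms (0) and 10 bankrupt firms (1).
y_true = [0] * 90 + [1] * 10
default_pred = [0] * 100  # default classifier: always "healthy"

plain_accuracy = sum(t == p for t, p in zip(y_true, default_pred)) / len(y_true)
gmean = geometric_mean_accuracy(y_true, default_pred)
```

The default classifier looks strong on plain accuracy because it is right about every majority-class firm, yet its geometric mean is zero because it never detects a bankrupt firm.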

A Study of the Elements Analysis of Metadata for Electronic Resource Management (전자자원 관리용 메타데이터의 요소 분석에 관한 연구)

  • Nam, Young-Joon;Jang, Bo-Seong
    • Journal of the Korean Society for information Management / v.23 no.3 s.61 / pp.241-264 / 2006
  • This study proposes the essential metadata elements for electronic resource management (ERM) so that libraries can manage electronic resources effectively. To this end, it analyzes the data elements of the DLF ERMI's ERMS data structure and of three foreign universities. The data elements were verified by domestic ERM specialists. As a result, the study identifies 12 elements in the trial category, 15 in the consortium category, 33 in the license category, 21 in the electronic resource information category, 20 in the access/administrative information category, 13 in the usage statistics category, 14 in the workflow category, and 18 in the contact information category.

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan;Chung, Young-Mee
    • Journal of the Korean Society for information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document are hardly ever selected as classification features if they do not occur in the training document set. In addition, synonymous terms with the same concept are usually treated as different features. This study aims to improve text categorization performance by integrating synonyms into a single feature and by using Wikipedia to replace input terms that are not in the training document set with the most similar term that does occur in the training documents. For the selection of classification features, experiments were performed in various settings composed of three conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The categorization performance of a kNN classifier improved by 0.35 to 1.85% in $F_1$ value in all the experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement ratio is not as high as expected, several semantic as well as structural devices of Wikipedia could be used for selecting more effective classification features.
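
The replacement step described above can be sketched as follows. How the Wikipedia-based similarity is computed is not reproduced here; the `similarity` table, its scores, and the `threshold` value are hypothetical stand-ins. Leaving unmatched terms unchanged is also an assumption, since the abstract does not say what happens to them.

```python
def replace_unseen_terms(doc_terms, training_vocab, similarity, threshold=0.5):
    """Replace terms absent from the training vocabulary with the most
    similar training term, if its similarity exceeds the threshold.

    `similarity` maps (input_term, training_term) pairs to a score; in the
    study such scores are derived from Wikipedia, here they are given.
    """
    result = []
    for term in doc_terms:
        if term in training_vocab:
            result.append(term)
            continue
        best_term, best_score = None, threshold
        for candidate in sorted(training_vocab):
            score = similarity.get((term, candidate), 0.0)
            if score > best_score:
                best_term, best_score = candidate, score
        # Assumption: terms without a sufficiently similar match are kept as-is.
        result.append(best_term if best_term is not None else term)
    return result


# Hypothetical vocabulary and similarity scores.
training_vocab = {"car", "engine", "road"}
similarity = {
    ("automobile", "car"): 0.9,
    ("automobile", "engine"): 0.6,
    ("zebra", "car"): 0.1,
}
replaced = replace_unseen_terms(["automobile", "road", "zebra"], training_vocab, similarity)
```

Here "automobile" maps to "car", its highest-scoring training term above the threshold, while "zebra" has no sufficiently similar training term and passes through unchanged.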

A Grounded Theory Approach to the Process of Conflict between Early Childhood Teacher and Parent on the Perspectives of Teachers (유아교사의 관점에서 본 교사와 학부모의 갈등과정 : 근거이론적 접근)

  • Kim, Young Ju;Lee, Kyeong Hwa
    • Korean Journal of Childcare and Education / v.11 no.5 / pp.237-260 / 2015
  • This study sought to explain the process of conflict between early childhood teachers and parents (T-P conflict) and was guided by the following three questions: (a) how does a T-P conflict begin? (b) how does a T-P conflict develop over time? and (c) how does a T-P conflict end? One hundred cases were provided by private kindergarten teachers with experience of T-P conflict. A qualitative grounded theory design was used to analyze the data. Open coding and axial coding resulted in six categories: (a) "causes of conflict", (b) "conditional context of conflict", (c) "state of conflict", (d) "amplification of conflict", (e) "problem solving strategies of conflict", and (f) "cease of conflict". The selective coding stage drew out three core categories: (a) "prelude with tuneless instruments", (b) "duet for discords and concords", and (c) "splendid finale vs. unplanned intermission". In addition, the study raises doubts about current early childhood education policies based on neo-liberalism and their impact on relationships between teachers and parents.

Categorical time series clustering: Case study of Korean pro-baseball data (범주형 시계열 자료의 군집화: 프로야구 자료의 사례 연구)

  • Pak, Ro Jin
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.621-627 / 2016
  • A professional baseball team sometimes tends to be very weak against one particular opponent. For example, team S, the strongest team in Korea, is relatively weak against team H. In this paper, we cluster the Korean baseball teams based on their records against team S to investigate whether the pattern of team H's record differs from those of the other teams. The technique employed is time series clustering or, more specifically, categorical time series clustering. Three methods are considered: (i) a distance-based method, (ii) a genetic sequencing method, and (iii) a periodogram method. Each method has its own advantages and disadvantages in handling categorical time series, so it is recommended to draw conclusions by considering the results of all three methods together.
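
As a rough illustration of the distance-based idea, the sketch below compares win/loss sequences with a normalized Hamming distance and picks the team whose record is, on average, farthest from the others. The abstract does not state which distance the paper uses, so the Hamming distance is an assumption, and the team records are invented.

```python
def hamming(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distance_to_others(name, records):
    """Average distance from one team's record to every other team's record."""
    others = [other for other in records if other != name]
    return sum(hamming(records[name], records[o]) for o in others) / len(others)


# Hypothetical game-by-game results against team S (W = win, L = loss).
records = {
    "H": "WWLWWWLW",
    "A": "LLWLLLWL",
    "B": "LLLLLWLL",
    "C": "LWLLLLWL",
}
outlier = max(records, key=lambda name: mean_distance_to_others(name, records))
```

In this toy data team H wins most of its games while the other teams mostly lose, so H has the largest mean distance and would end up in its own cluster under any reasonable distance-based grouping.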

Factor Analysis of the Korean-Child Behavior Checklist in Children with Autism Spectrum Disorders (자폐 범주성 장애 아동에서 아동·청소년 행동평가척도의 요인분석)

  • Park, Eun-Young
    • The Journal of the Korea Contents Association / v.11 no.8 / pp.221-230 / 2011
  • The purpose of this study was to examine the validity of the Korean-Child Behavior Checklist (K-CBCL) as a measure of emotional and behavioral problems in children with autism spectrum disorders. The factor structure of the K-CBCL was investigated using data from 248 children with autism spectrum disorders, with a mean age of 11.17. The two-factor model of Internalizing problems (Withdrawn, Somatic Complaints, Anxious/Depressed) and Externalizing problems (Delinquent Behavior, Aggressive Behavior) was examined by confirmatory factor analysis. The two-factor model of the K-CBCL was adequate for children with autism spectrum disorders. The inter-item consistency of the K-CBCL sub-factors demonstrated adequate reliability of the measure. Although the inter-item consistency of Withdrawn, Social Problems, and Delinquent Behavior was not acceptable, that of Internalizing, Externalizing, and total problems was good. These results support the validity and reliability of the K-CBCL and suggest that it can be used to assess emotional and behavioral problems in children with autism spectrum disorders.

Perception and Attitude about Risk from Science & Technology-Focused on Risk from Electromagnetic Wave- (과학기술 위험에 대한 인지 및 태도 -전자파 위험을 중심으로-)

  • Song, Hae-Ryong;Kim, Won-Je;Jung, Se-Il
    • The Journal of the Korea Contents Association / v.10 no.5 / pp.436-445 / 2010
  • The purpose of this research is to identify which factors affect people's risk perception and attitudes in risk communication. The findings show that the determinant factors of risk perception are the possibility to control the risk, benefits of recognition, the specialty of risk management, and the usefulness of information about the risk. The results also show that the determinant factors of risk attitudes are the possibility to control the risk, the understanding of science and technology, the familiarity with the risk, the usefulness of information about the risk, the accuracy of information, and the initiative in protecting citizens from the risk. As these results indicate, the common determinant factors are the usefulness of information about the risk and the possibility to control the risk. Both factors, which affect risk perception and attitudes toward electromagnetic waves, are important in risk communication research. This study therefore shows which factors should be considered important in the risk communication process concerning the risk of electromagnetic waves.