• Title/Summary/Keyword: 문헌분류 (library classification)

Search Result 1,229, Processing Time 0.025 seconds

A Comparative Study of Notes in KDC and DDC (한국십진분류법과 듀이십진분류법에 나타난 주기의 다양성에 관한 비교 연구)

  • Chung, Yeon-Kyoung
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.19 no.2 / pp.129-146 / 2008
  • Notes in library classification systems are indispensable tools for building classification numbers. This study aims to improve the notes in the Korean Decimal Classification (KDC) by analyzing and comparing notes in another library classification system, so that the most appropriate classification numbers can be assigned on the basis of better notes. To this end, the notes in the Dewey Decimal Classification (DDC) and KDC were analyzed, and the notes used in 000 Computer science, information, general works in DDC and KDC were compared. Based on this analysis, additional notes and their various forms are suggested.

Feature Selection for Bio Named Entity Recognition from Biological Literature (바이오 문헌에서의 단백질, 유전자 객체 인식을 위한 특징 추출)

  • Kim, Tae-Wook;Li, Meijing;Tsendsuren, Munkhdalai;Ryu, Keun-Ho
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.166-168 / 2012
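  • Extracting meaningful entities and their interaction relations is an essential step in obtaining useful information from the large body of biological literature. In particular, accurately recognizing bio entities such as gene or protein names remains a challenging task, because new entities are hard to recognize and the feature patterns used to find entities are highly diverse. In this paper, 12 meaningful attributes were selected from preprocessed literature data. An attribute selection technique from data mining was then applied to these attributes to extract those that are meaningful for classifying entities. To evaluate how the feature selection method and the classification algorithm affect classification performance, the accuracy of each method was compared; the attributes selected by Gain Ratio Attribute Evaluation and Symmetrical Uncertainty Attribute Evaluation yielded the most accurate classification.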
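
The two attribute evaluators named in the abstract above are commonly defined in terms of entropy and information gain. The sketch below follows those textbook definitions for discrete features; it is not the authors' implementation, and the function names are chosen here for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += (count / total) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def symmetrical_uncertainty(feature, labels):
    """2 * IG / (H(feature) + H(labels)), which lies between 0 and 1."""
    denom = entropy(feature) + entropy(labels)
    return 2.0 * information_gain(feature, labels) / denom if denom > 0 else 0.0
```

Both scores rank a feature higher the more it reduces uncertainty about the class label, while penalizing features that simply take many distinct values.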

Optimization of Number of Training Documents in Text Categorization (문헌범주화에서 학습문헌수 최적화에 관한 연구)

  • Shim, Kyung
    • Journal of the Korean Society for Information Management / v.23 no.4 s.62 / pp.277-294 / 2006
  • This paper examines the level of categorization performance on a real-life collection of article abstracts in science and technology, and tests the optimal number of documents per category in a training set using a kNN classifier. The corpus is built by first choosing categories that hold more than 2,556 documents and then randomly selecting 2,556 documents per category. It is further divided into eight subsets with different numbers of training documents: each set is randomly selected to build training sets ranging from 20 documents (Tr-20) to 2,000 documents (Tr-2000) per category. The categorization performance of the eight subsets is compared. The average performance of the eight subsets is 30% in $F_1$ measure, which is relatively poor compared to the findings of previous studies. The experimental results suggest that, among the eight subsets, Tr-100 is the optimal size for training a kNN classifier. In addition, the correctness of the subject categories assigned to the training sets is probed by manually reclassifying the training sets, in order to support the above conclusion by establishing a relation between that correctness and categorization performance.
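
The experimental design described above (sample a fixed number of training documents per category, classify with kNN, score with $F_1$) can be sketched as follows. This is a minimal illustration, not the paper's setup: the bag-of-words cosine similarity, the default `k=3`, the macro-averaged $F_1$, and all function names are assumptions made here.

```python
import math
import random
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term-frequency vector as a Counter."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sample_per_category(docs, n, seed=0):
    """Randomly pick up to n (text, label) pairs from each category."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in docs:
        by_label[label].append((text, label))
    sample = []
    for label in sorted(by_label):
        items = by_label[label]
        sample.extend(rng.sample(items, min(n, len(items))))
    return sample


def knn_predict(train, text, k=3):
    """Majority label among the k training documents most similar to text."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda d: cosine(query, vectorize(d[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f1(gold, predicted):
    """Unweighted mean of per-category F1 scores (macro-averaging is assumed)."""
    scores = []
    for label in set(gold):
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)
```

Calling `sample_per_category` with increasing values of `n` and scoring each resulting training set on a held-out test set would reproduce the shape of the Tr-20 to Tr-2000 comparison.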

A Study on the Developing Standard Classification of the National Knowledge and Information Resources (국가지식정보 자원 분류 체계 표준화 연구)

  • Ko Young-Man;Seo Tae-Sul;Cho Sun-Yeong
    • Journal of the Korean Society for Library and Information Science / v.40 no.3 / pp.151-173 / 2006
  • The purpose of this study is to draft a standard classification for the National Knowledge and Information Resources. As a result of the study, a standard classification system for national knowledge and information resources, named "Knowledge Classification" (KC), is suggested. KC consists of three classification systems: classification by subject, by type of resource, and by type of media. The classification by subject has 12 main classes, and each main class has divisions. Each main class consists of a major discipline or a group of related disciplines. The type of resource is classified into 10 types of content, numbered 0-9, and the media of knowledge are classified into 8 types, likewise numbered 0-7. In practice, the notation always consists of 2 characters and 2 digits. The first character designates the main class and the second character designates the division. The first digit designates the type of resource and the second digit designates the type of media.
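
The notation described above (two characters followed by two digits) lends itself to a small parser. The following is a sketch under stated assumptions: the abstract does not list the actual class symbols, so uppercase Latin letters are assumed, and `KCNotation` and `parse_kc` are names invented here.

```python
import re
from dataclasses import dataclass

# Two letters, then a resource-type digit (0-9) and a media-type digit (0-7).
# Assumption: the letters are uppercase A-Z; the abstract does not specify them.
KC_PATTERN = re.compile(r"([A-Z])([A-Z])([0-9])([0-7])")


@dataclass(frozen=True)
class KCNotation:
    main_class: str      # first character
    division: str        # second character
    resource_type: int   # first digit, 0-9
    media_type: int      # second digit, 0-7


def parse_kc(notation: str) -> KCNotation:
    """Split a four-symbol KC notation into its four components."""
    match = KC_PATTERN.fullmatch(notation)
    if match is None:
        raise ValueError(f"not a valid KC notation: {notation!r}")
    main_class, division, resource, media = match.groups()
    return KCNotation(main_class, division, int(resource), int(media))
```

A string such as `"AB37"` (a made-up value) would parse into main class `A`, division `B`, resource type 3, and media type 7, while `"AB38"` would be rejected because the media digit exceeds 7.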

A Study on Patent Literature Classification Using Distributed Representation of Technical Terms (기술용어 분산표현을 활용한 특허문헌 분류에 관한 연구)

  • Choi, Yunsoo;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science / v.53 no.2 / pp.179-199 / 2019
  • In this paper, we examine various feature extraction methods, machine learning models, and deep learning models to identify the best-performing methodology for classifying patent literature, and we verify it through experiments. We compared the traditional BoW method with a distributed representation method (word embedding vectors) for feature extraction, and compared morphological analysis with multi-gram as methods of constructing the document collection. In addition, classification performance was verified using traditional machine learning models and deep learning models. Experimental results show that the best performance is achieved when the deep learning model is applied with distributed representation and feature extraction based on morphological analysis. In the Section, Class, and Subclass classification experiments, we improved performance by 5.71%, 18.84%, and 21.53%, respectively, compared with traditional classification methods.
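
To make the contrast between the two feature types concrete, the sketch below builds a sparse BoW count vector and a dense document vector from the same tokens. Averaging word vectors is only one simple way to use a distributed representation and is an assumption here; the abstract does not say how the embeddings were fed to the models. The embedding values are invented for illustration.

```python
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Term-count vector with one dimension per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Average the dense vectors of the tokens that have an embedding."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical 2-dimensional embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "battery": [1.0, 0.0],
    "cell": [0.8, 0.2],
    "antenna": [0.0, 1.0],
}
```

The BoW vector grows with the vocabulary and treats every term as unrelated to every other, whereas the dense vector has a fixed size and places related terms close together.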

A Comparative Study on the Bacon's Knowledge Classification and SAGOJEONSEO Classification (지식분류에 대한 동서양의 비교 - 베이컨의 분류와 사고전서를 중심으로 -)

  • 이명규
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.11 no.2 / pp.25-38 / 2000
  • Knowledge classification is based on different types of knowledge systems, which depend on the objects classified, purposes, times, regions, and scholars. The classified contents, however, do not differ significantly across times or regions, though there are differences in the methods of representation and in the priority of arrangement of the knowledge represented. Knowledge or library classification reflects the structure of contemporary society and is determined by the social philosophy of the time. The basic structure of the knowledge system was formed in ancient times and has developed continuously since then. In the course of this process, the development of studies has generated new branches of study, playing a significant role in changing the whole system of studies. This kind of development will continue, and many new branches of information will appear and take their places as categories in knowledge classification.


Developing an Automatic Classification System for Botanical Literatures (식물학문헌을 위한 자동분류시스템의 개발)

  • 김정현;이경호
    • Journal of Korean Library and Information Science Society / v.32 no.4 / pp.99-117 / 2001
  • This paper reports on the development of an automatic book classification system using the faceted classification principles of CC (Colon Classification). To conduct this study, some 670 words in the botanical field were selected, analyzed in terms of the [P], [M], [E], [S], [T] categories employed in CC 7, and included in a database for classification. The principle of the automatic classification system is to create classification numbers automatically through automatic subject recognition and processing of keywords in titles, using the facet combination method of CC. In particular, a classification database was designed along with a matrix principle specifying the subject field for each word, which makes automatic subject recognition possible.
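
The facet combination idea described above can be sketched as a keyword lookup followed by assembly in [P], [M], [E], [S], [T] order. This is an illustrative sketch, not the system reported in the paper: the connecting symbols follow the convention usually given for CC (comma, semicolon, colon, dot, apostrophe) and should be treated as an assumption, the keyword database and its isolate numbers are invented, and the main-class symbol is supplied by the caller.

```python
# Citation order of the five fundamental categories and their connecting symbols.
# Assumption: the conventional CC connectors; verify against the edition in use.
FACET_ORDER = ["P", "M", "E", "S", "T"]
CONNECTORS = {"P": ",", "M": ";", "E": ":", "S": ".", "T": "'"}

# Hypothetical keyword database: word -> (facet, isolate number).
# The isolate numbers are placeholders, not real CC numbers.
KEYWORD_DB = {
    "fern": ("P", "p1"),
    "disease": ("E", "e1"),
    "korea": ("S", "s1"),
}


def build_class_number(title, main_class, keyword_db):
    """Look up title words and join the facets found in PMEST order."""
    found = {}
    for word in title.lower().split():
        entry = keyword_db.get(word.strip(".,:;()"))
        # Keep only the first keyword recognized for each facet.
        if entry and entry[0] not in found:
            found[entry[0]] = entry[1]
    facets = "".join(CONNECTORS[f] + found[f] for f in FACET_ORDER if f in found)
    return main_class + facets
```

Words that are not in the database are simply ignored, so the quality of the resulting number depends entirely on the coverage of the keyword database, which is why the paper's database of some 670 botanical words is central to its approach.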


A Study on the Features of the <Classification-Search Term Dictionary>, the Library Classification Scheme in North Korea (북한 문헌분류표 <분류-검색어사전>의 특징 분석)

  • Jae-Hwang Choi
    • Journal of Korean Library and Information Science Society / v.53 no.4 / pp.123-142 / 2022
  • In 2000, North Korea developed and published the two-volume <Classification-Search Term Dictionary>, which is currently used throughout North Korea. The purpose of this study is to examine the development of North Korea's classification schemes after liberation and to understand the contents, composition, and principles of the <Classification-Search Term Dictionary>, published in 2000 and revised in 2014. Until now, all studies of North Korean classification schemes have dealt with the <Book Classification Scheme> published in North Korea in 1964, and there has been no discussion of North Korea's classification schemes since then. The first volume of the <Classification-Search Term Dictionary> consists of 'classification symbols - search terms', and the second volume consists of 'search terms - classification symbols'. Volume 1 is based on the <Books and Bibliography Classification Scheme (1996)> and has a total of 41 main classes in five categories. Volume 1 allocates 1 main class (11/19) to 'revolutionary ideas and theories', 8 main classes (20~27) to 'natural sciences', 19 main classes (30~69) to 'engineering technology and applied sciences', 12 main classes (70~85) to 'social sciences', and 1 main class (90) to 'total sciences'. Volume 2 is similar to a list of subject headings. North Korea's <Classification-Search Term Dictionary> is the first classification scheme introduced in South Korea and is expected to be the starting point for future studies on establishing a standard unified classification scheme.

A Study on the Library Classification System of North Korean (북한의 군중도서관용 '도서분류표' 연구)

  • Nam, Tae-Woo
    • Journal of the Korean Society for Library and Information Science / v.34 no.1 / pp.71-92 / 2000
  • This study analyzes the content of the library classification system of North Korea. It also analyzes and gives an overview of the system's conceptual framework, notational system, and principle of hierarchy. Libraries usually arrange their collections according to the systematic structure of the library classification. A decimal point follows the third digit, after which division by ten continues to the degree of specificity needed. The system is based on socialist and communist thought. Libraries in the South and the North have different concepts, goals, information resources, and classification systems, and use them in different ways. Considering the practical aspects of libraries and the reasons for their existence, they must build a system of mutual cooperation so as to minimize the shock of the social changes brought by so-called national unification.


A Study on the Feature Selection for Automatic Document Categorization (자동문헌분류를 위한 대표색인어 추출에 관한 연구)

  • 황재영;이응봉
    • Proceedings of the Korean Society for Information Management Conference / 2003.08a / pp.55-64 / 2003
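  • As Internet-based scholarly information resources increase rapidly, interest in and the need for automatic document categorization are also growing. Experiments on automatic document categorization can be divided into the extraction of representative index terms, which is a preprocessing step, and the evaluation of the classification performance of the extracted terms. This study first conducted experiments that evaluate index-term performance under various representative index term (feature) extraction methods, as well as experiments to determine the optimal number of representative index terms.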
