Search | Korea Science

A Study on Applicability of Machine Learning for Book Classification of Public Libraries: Focusing on Social Science and Arts (공공도서관 도서 분류를 위한 머신러닝 적용 가능성 연구 - 사회과학과 예술분야를 중심으로 -)

Kwak, Chul Wan
- Journal of the Korean BIBLIA Society for library and Information Science / v.32 no.1 / pp.133-150 / 2021
The purpose of this study is to assess whether machine learning applied to book titles can be used for classifying books in public libraries. Data analysis was performed with Python's scikit-learn library in a Jupyter Notebook on the Anaconda platform. The KoNLPy analyzer and its Okt class were used for Hangul morpheme analysis. The units of analysis were 2,000 title fields and KDC classification class numbers (300 and 600) extracted from the KORMARC records of public libraries. Analysis of the data with six machine learning models indicated that machine learning can be applied to book classification. Among the models used, the neural network model had the highest title classification accuracy. The study points to the need to improve the accuracy of title classification and the need for further research on book titles, title tokenization, and stop words.
https://doi.org/10.14699/kbiblia.2021.32.1.133

The Design of Index System for Encyclopedia Database (백과사전 데이타베이스를 위한 색인시스템 설계)

추윤미;최석두
- Proceedings of the Korean Society for Information Management Conference / 1994.12a / pp.37-40 / 1994
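An index system was designed for effective retrieval from an encyclopedia database. An index headword file is created that contains both the attribute information and the body text information for each entry. Instead of keeping separate cross-reference entries for each entry, the system uses the BT, NT, RT, and UF relations of a thesaurus file to refer to the entries related to that entry. The thesaurus file is generated automatically from the hierarchical structure of the subject classification numbers (DDC or KDC) assigned to each index headword and is then completed through manual work by an indexer. This index system allows varied access using all the information contained in the encyclopedia, and because related entries can be browsed through the thesaurus, comprehensive searching is possible.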
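The automatic step mentioned in this abstract, deriving thesaurus relations from the hierarchy of classification numbers, could be sketched roughly as follows. This is only an illustration and not the authors' system: the function names `notation_key` and `build_thesaurus`, the sample headwords, and the rule that a broader term is the headword whose notation is the longest proper prefix are all assumptions made for the example. RT and UF are left empty, standing in for the indexer's manual pass.

```python
def notation_key(code):
    """Normalize a DDC-like notation for prefix comparison.

    Simplifying assumption: drop the decimal point and trailing zeros,
    so "500" -> "5", "580" -> "58", "582.13" -> "58213".
    """
    return code.replace(".", "").rstrip("0")


def build_thesaurus(headwords):
    """Derive BT/NT links from classification notations.

    headwords: dict mapping an index headword to its notation.
    Assumes (hypothetically) at most one headword per notation.
    """
    by_key = {notation_key(code): term for term, code in headwords.items()}
    thesaurus = {
        term: {"BT": [], "NT": [], "RT": [], "UF": []} for term in headwords
    }
    for term, code in headwords.items():
        key = notation_key(code)
        # Walk from the longest proper prefix down to one digit and stop
        # at the first prefix that belongs to another headword.
        for cut in range(len(key) - 1, 0, -1):
            parent = by_key.get(key[:cut])
            if parent is not None:
                thesaurus[term]["BT"].append(parent)
                thesaurus[parent]["NT"].append(term)
                break
    return thesaurus


# Hypothetical sample data, not taken from the paper.
headwords = {
    "Science": "500",
    "Botany": "580",
    "Flowering plants": "582.13",
    "Zoology": "590",
}
thesaurus = build_thesaurus(headwords)
print(thesaurus["Botany"])
```

With this sample, "Botany" receives "Science" as its broader term and "Flowering plants" as its narrower term; an indexer would then add RT and UF links by hand.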

The Computational Extraction of Semantic Hierarchies for Korean Adjectives (한국어 형용사 의미계층의 전산적 추출)

Song, Sang-Houn;Choe, Jae-Woong
- Annual Conference on Human and Language Technology / 2006.10e / pp.109-116 / 2006
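The words of a natural language are related to one another and exist in a hierarchical, multidimensional model. Semantic hierarchies are a representative line of research that starts from this premise. This paper aims to extract a semantic hierarchy for Korean adjectives; it establishes a formal, objective methodology and introduces computational processing that yields results relatively quickly and accurately. First, the procedure needed for the overall construction was laid out, and the methods and heuristics needed at each step were organized. On this basis, the work was carried out semi-automatically using dictionary definitions, with partial use of a corpus. A top-down approach was adopted as the final algorithm. The extracted semantic hierarchy of Korean adjectives starts from 226 top-level terms and covers 3,792 headwords in total. To compensate for the limitations that can arise when only vertical (paradigmatic) relations are specified, horizontal semantic relations such as synonymy and antonymy and syntagmatic relations such as co-occurring nouns are described as well. In addition, headwords were classified by sense using the co-occurring nouns in their definitions, and a separate semantic hierarchy was established for each class.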

DB Lecture (1) - An Overview of Metadata

An, Gye-Seong
- Digital Contents / no.9 s.64 / pp.63-69 / 1998
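Metadata generally means data about data, that is, data describing the attributes of a resource. In this sense, traditional library catalog records, which include the title, author, subject headings, and classification numbers, as well as database records produced by abstracting and indexing, can be called metadata.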

Design and Implementation of Educational Newspaper Information Gathering Agent for NIE (NIE를 위한 교육 정보 수집 에이전트의 설계 및 구현)

Lee, Chul-Hwan;Han, Sun-Gwan
- The Journal of Korean Association of Computer Education / v.3 no.1 / pp.169-176 / 2000
This paper presents the ENIG Agent, which gathers educational newspaper information distributed across the web and provides it to teachers and students for NIE (Newspaper in Education). The ENIG Agent collects newspaper headlines from appropriate educational news portal sites so that the information can be provided in real time. Headline extraction is optimized through preprocessing of the educational news sites, information noise filtering, and pattern matching. The headline information obtained through this process is shown to students in a web browser. To increase use of the information, intelligent education methods and visualized classification techniques are used. The performance of the ENIG Agent was evaluated experimentally.

Developing an Automatic Classification System for Botanical Literatures (식물학문헌을 위한 자동분류시스템의 개발)

김정현;이경호
- Journal of Korean Library and Information Science Society / v.32 no.4 / pp.99-117 / 2001
This paper reports on the development of an automatic book classification system using the faceted classification principles of CC (Colon Classification). For this study, some 670 words in the botanical field were selected, analyzed in terms of the facets [P], [M], [E], [S], and [T] employed in CC 7, and included in a classification database. The system creates classification numbers automatically by recognizing subjects and processing keywords in titles through the facet combination method of CC. In particular, the classification database was designed along with a matrix principle specifying the subject field for each word, which makes automatic subject recognition possible.

Database Construction and Retrieval Elements for Image Information

안용남
- Journal of the Korean Society for information Management / v.8 no.2 / pp.108-124 / 1991
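Image information with a large data volume, such as photographs, can be stored on high-capacity optical discs to build a database, which can then be searched at high speed by computer. The way a photograph database is built can vary with the number of photographs, the purpose of construction, the intended users, and the method of use. Its retrieval elements are of four kinds: the act of shooting, shooting conditions, title, and subject. The subject element, regarded as the most important, comprises sensory information, subject classification, and keywords.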

A Study on the Utility of Relevance/Non-relevance Information in Homogeneous Documents (유사문헌집단에서 적합/부적합정보의 유용성에 관한 연구)

Moon, Sung-Been
- Journal of the Korean Society for information Management / v.32 no.3 / pp.277-293 / 2015
This study examined the relative retrieval effectiveness after relevance feedback of two systems (Title/Abstract and Full-text) using four different sets of relevance judgments. Four relevance levels (not relevant, marginally relevant, relevant, highly relevant) were used, each determined by referees assigning a relevance score to documents. The study also investigated how much average precision improved after relevance feedback when "marginally relevant" documents were included in the relevant class, for both the Title/Abstract system and the Full-text retrieval system. The Title/Abstract system was found to benefit from relevance feedback that included the marginally relevant documents: its percentage of improvement was consistently higher when the marginally relevant documents were included in the relevant class, whereas the opposite held for the Full-text retrieval system. This implies that marginally relevant documents in the relevant class introduced noise into the Full-text retrieval system.
https://doi.org/10.3743/KOSIM.2015.32.3.277
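To make concrete how the treatment of "marginally relevant" documents changes average precision, here is a minimal sketch. It is not the study's evaluation code: the numeric grades (0 = not relevant through 3 = highly relevant), the function name `average_precision`, and the sample ranking are assumptions for illustration, and the ranked list is assumed to contain every judged document.

```python
def average_precision(ranked_grades, min_relevant_grade):
    """Average precision over one ranked list of graded judgments.

    ranked_grades: relevance grades in rank order (assumed scale 0..3).
    min_relevant_grade: lowest grade counted as relevant.
    Assumes the list holds all relevant documents, so the number of
    hits equals the total number of relevant documents.
    """
    hits = 0
    precision_sum = 0.0
    for rank, grade in enumerate(ranked_grades, start=1):
        if grade >= min_relevant_grade:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking: grades of the documents at ranks 1..5.
grades = [3, 1, 0, 2, 1]

strict = average_precision(grades, min_relevant_grade=2)   # marginal excluded
lenient = average_precision(grades, min_relevant_grade=1)  # marginal included
print(strict, lenient)
```

For this made-up ranking the strict setting gives 0.75 and the lenient setting 0.8875; comparing such values before and after feedback, under each setting, is the kind of contrast the abstract describes.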

A Study on Jo Bok-seong's Insect-related Books Published in 1948: Focused on Story of Insects and About Insects (1948년에 출간된 조복성의 곤충 관련 저작에 관한 연구 - 『곤충이야기』와 『곤충기』를 중심으로 -)

Jin, Na-Young
- Journal of the Korean Society for Library and Information Science / v.53 no.2 / pp.267-294 / 2019
This study analyzed the form and content of Story of Insects (Gonchung Iyagi) and About Insects (Gonchung-gi), two works by the biologist Jo Bok-seong published in 1948, in order to examine and compare their characteristics. Story of Insects is laid out as front cover, title page, foreword, table of contents, main text, copyright page, advertisement, and back cover, in A5 format. Its content is divided into nine groups according to the characteristics of 65 insect species, which it describes. About Insects, by contrast, is laid out as cover, title page, foreword, table of contents, main text, copyright page, publication message of Eulyoo Mungo, advertisement, and back cover, in A6 format. Its content is divided into 11 groups of the author's own devising according to the characteristics of 56 insect species, which it describes. About Insects was published by Eulyoo Publishing Co. and Story of Insects by the Association of Joseon Children's Culture (abbreviated as Ahyeop), a sister company of Eulyoo Publishing Co., so the two publishers share the same basis.
https://doi.org/10.4275/KSLIS.2019.53.2.267

Tag Information Search based on Ontology (온톨로지 기반의 태그 정보 검색)

Ki-Dong Han;Chang-Hun Lee
- Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.757-759 / 2008
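Whereas earlier web services were passive and centered on one-way communication, current web services have become increasingly active and are oriented toward two-way communication. This shift in web services is called Web 2.0. Users in the Web 2.0 era are exposed to a flood of information more varied than before. Breaking away from an environment in which they received one-sided, limited information, they have begun to produce valuable information themselves, and that information is exchanged with other users over the Internet to create still more valuable information. In this process, technology for sharing ever-growing information more quickly and accurately has become necessary, and tags and ontologies, the latter represented by the Semantic Web, have emerged as useful technologies for meeting this need. A tag is a word indicating the subject or title of a piece of information; sites that provide the content use tags as units of information classification, which enables faster information sharing. The Semantic Web is a technology that expresses information about diverse resources, such as those on today's Internet, together with the relational and semantic information between resources, in an ontology form that machines (computers) can process, and has automated machines process it. This paper proposes forming semantic relations between tags using ontology techniques, which can be regarded as a representative Web 2.0 technology, in order to improve the information-classification efficiency of existing tags.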
https://doi.org/10.3745/PKIPS.y2008m011a.757

