• Title/Summary/Keyword: Text Collection

Search Result 300, Processing Time 0.032 seconds

A Keyword Analysis of Collection Development Policies of University and Public Libraries Using Text Mining (텍스트 마이닝을 활용한 대학도서관과 공공도서관의 장서개발 정책 키워드 분석)

  • Da-Hyeon Lee;Dong-Hee Shin
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.58 no.1
    • /
    • pp.285-302
    • /
    • 2024
  • For this article, we conducted frequency analysis, topic modeling, and network analysis on eleven texts related to collection development policy found in the National Library of Korea. We deduced the main keywords related to collection development policies and analyzed the relationship between them. We subsequently conducted a pie coefficient analysis to identify the characteristics of collection development policies of university libraries and public libraries by category. The results showed that keywords such as "material," "library," "collection development," "user," and "collection" were the main keywords in frequency analysis and network centrality. Meanwhile, the pie coefficient analysis revealed that keywords such as "university," "construction," "student," "target," and "cost" were prevalent in university libraries, indicating that the academic needs of users and the discussion of digital resources were primary issues, while keywords related to the information needs of various user groups-including "adults," "survey," "feature," and "religion" -appeared in public libraries.

A Preliminary Study on Clinical Decision Support System based on Classification Learning of Electronic Medical Records

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.4
    • /
    • pp.817-824
    • /
    • 2003
  • We employed a hierarchical document classification method to classify a massive collection of electronic medical records(EMR) written in both Korean and English. Our experimental system has been learned from 5,000 records of EMR text data and predicted a newly given set of EMR text data over 68% correctly. We expect the accuracy rate can be improved greatly provided a dictionary of medical terms or a suitable medical thesaurus. The classification system might play a key role in some clinical decision support systems and various interpretation systems for clinical data.

  • PDF

A Korean Text Summarization System Using Aggregate Similarity (도합유사도를 이용한 한국어 문서요약 시스템)

  • 김재훈;김준홍
    • Korean Journal of Cognitive Science
    • /
    • v.12 no.1_2
    • /
    • pp.35-42
    • /
    • 2001
  • In this paper. a document is represented as a weighted graph called a text relationship map. In the graph. a node represents a vector of nouns in a sentence, an edge completely connects other nodes. and a weight on the edge is a value of the similarity between two nodes. The similarity is based on the word overlap between the corresponding nodes. The importance of a node. called an aggregate similarity in this paper. is defined as the sum of weights on the links connecting it to other nodes on the map. In this paper. we present a Korean text summarization system using the aggregate similarity. To evaluate our system, we used two test collection, one collection (PAPER-InCon) consists of 100 papers in the field of computer science: the other collection (NEWS) is composed of 105 articles in the newspapers and had built by KOROlC. Under the compression rate of 20%. we achieved the recall of 46.6% (PAPER-InCon) and 30.5% (NEWS) and the precision of 76.9% (PAPER-InCon) and 42.3% (NEWS).

  • PDF

Future and Directions for Research in Full Text Databases (본문 데이타베이스 연구에 관한 고찰과 그 전망)

  • Ro Jung Soon
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.17
    • /
    • pp.49-83
    • /
    • 1989
  • A Full text retrieval system is a natural language document retrieval system in which the full text of all documents in a collection is stored on a computer so that every word in every sentence of every document can be located by the machine. This kind of IR System is recently becoming rapidly available online in the field of legal, newspaper, journal and reference book indexing. Increased research interest has been in this field. In this paper, research on full text databases and retrieval systems are reviewed, directions for research in this field are speculated, questions in the field that need answering are considered, and variables affecting online full text retrieval and various role that variables play in a research study are described. Two obvious research questions in full text retrieval have been how full text retrieval performs and how to improve the retrieval performance of full text databases. Research to improve the retrieval performance has been incorporated with ranking or weighting algorithms based on word occurrences, combined menu-driven and query-driven systems, and improvement of computer architectures and record structure for databases. Recent increase in the number of full text databases with various sizes, forms and subject matters, and recent development in computer architecture artificial intelligence, and videodisc technology promise new direction of its research and scholarly growth. Studies on the interrelationship between every elements of the full text retrieval situation and the relationship between each elements and retrieval performance may give a professional view in theory and practice of full text retrieval.

  • PDF

HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research

  • Kim, Jin-Suk;Choe, Ho-Seop;You, Beom-Jong;Seo, Jeong-Hyun;Lee, Suk-Hoon;Ra, Dong-Yul
    • Journal of Computing Science and Engineering
    • /
    • v.3 no.3
    • /
    • pp.165-180
    • /
    • 2009
  • The HKIB, or Hankookilbo, test collections are two archives of Korean newswire stories manually categorized with semi-hierarchical or hierarchical category taxonomies. The base newswire stories were made available by the Hankook Ilbo (The Korea Daily) for research purposes. At first, Chungnam National University and KISTI collaborated to manually tag 40,075 news stories with categories by semi-hierarchical and balanced three-level classification scheme, where each news story has only one level-3 category (single-labeling). We refer to this original data set as HKIB-40075 test collection. And then Yonsei University and KISTI collaborated to select 20,000 newswire stories from the HKIB-40075 test collection, to rearrange the classification scheme to be fully hierarchical but unbalanced, and to assign one or more categories to each news story (multi-labeling). We refer to this modified data set as HKIB-20000 test collection. We benchmark a k-NN categorization algorithm both on HKIB-20000 and on HKIB-40075, illustrating properties of the collections, providing baseline results for future studies, and suggesting new directions for further research on Korean text categorization problem.