Search | Korea Science

Optimization of Number of Training Documents in Text Categorization (문헌범주화에서 학습문헌수 최적화에 관한 연구)

Shim, Kyung
- Journal of the Korean Society for Information Management / v.23 no.4 s.62 / pp.277-294 / 2006
This paper examines the level of categorization performance achievable in a real-life collection of abstracts in science and technology, and tests the optimal number of training documents per category using a kNN classifier. The corpus is built by first choosing categories that hold more than 2,556 documents and then randomly selecting 2,556 documents from each. It is further divided into eight subsets with different numbers of training documents: each subset is randomly sampled to provide between 20 documents (Tr-20) and 2,000 documents (Tr-2000) per category. The categorization performance of the eight subsets is compared. Their average performance is 30% in $F_1$ measure, which is relatively poor compared with the findings of previous studies. The experimental results suggest that, among the eight subsets, Tr-100 is the optimal size for training a kNN classifier. In addition, the correctness of the subject categories assigned to the training sets is probed by manually reclassifying them, in order to support the above conclusion by establishing a relation between label correctness and categorization performance.
https://doi.org/10.3743/KOSIM.2006.23.4.277

A Study on Design Of Cataloging Expert System Using Pattern Recognition Techniques (패턴인식기법을 이용한 편목전문가시스템 설계에 관한 연구)

김현희;곽병희
- Journal of the Korean Society for Information Management / v.11 no.2 / pp.131-164 / 1994
This study presents the design and implementation of a cataloging expert system using pattern recognition techniques. The system attempts to demonstrate the feasibility of cataloging in KORMARC format from the title page and copyright page without human intervention. The prototype was implemented as a rule-based system in Turbo C. To demonstrate the function and capability of the system, an experimental document group and a control document group were analyzed. The hit ratio of the experimental group is 94%, while that of the control group is 93%, slightly lower.

The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance (학습문헌집합에 기 부여된 범주의 정확성과 문헌 범주화 성능)

Shim, Kyung;Chung, Young-Mee
- Journal of the Korean Society for Information Management / v.23 no.2 / pp.265-285 / 2006
In text categorization, a certain level of correctness of the labels assigned to training documents is assumed, without solid knowledge of the label quality of real-world collections. Our research explores the quality of pre-assigned subject categories in a real-world collection and identifies the relationship between the quality of category assignment in the training set and text categorization performance. In particular, we are interested in the extent to which performance can be improved by enhancing the quality (i.e., correctness) of category assignment in training documents. A collection of 1,150 abstracts in computer science is re-classified by an expert group and divided into 907 training documents and 227 test documents (15 duplicates are removed). The performance before and after re-classification, on the Initial set and the Recat-1/Recat-2 sets respectively, is compared using a kNN classifier. The average correctness of subject categories in the Initial set is 16%, and categorization performance with the Initial set is 17% in $F_1$ value. The Recat-1 set, on the other hand, scores an $F_1$ value of 61%, which is 3.6 times that of the Initial set.
https://doi.org/10.3743/KOSIM.2006.23.2.265
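
For reference, the $F_1$ values quoted in these abstracts are the harmonic mean of precision and recall. The sketch below is a minimal illustration and not the authors' evaluation code; the function name `f1_score` and the sample precision and recall values are hypothetical. It also recomputes the ratio between the two $F_1$ values reported in the abstract above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision/recall values, chosen only to illustrate the formula.
example_f1 = f1_score(0.5, 0.25)

# Ratio of the F1 values reported in the abstract (Recat-1: 61%, Initial: 17%).
reported_ratio = 0.61 / 0.17

print(round(example_f1, 3), round(reported_ratio, 1))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier cannot reach a high $F_1$ by maximizing only precision or only recall.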

Classification Performance Analysis of Cross-Language Text Categorization using Machine Translation (기계번역을 이용한 교차언어 문서 범주화의 분류 성능 분석)

Lee, Yong-Gu
- Journal of the Korean Society for Library and Information Science / v.43 no.1 / pp.313-332 / 2009
Cross-language text categorization (CLTC) can classify documents automatically using a training set from another language. In this study, collections appropriate for CLTC were extracted from KTSET. The classification performance of various CLTC methods was compared using an SVM classifier and machine translation. The results showed that classification performance ranked in the order of the poly-lingual training method, training-set translation, and test-set translation. However, training-set translation could be regarded as the most useful CLTC method, because it was efficient in terms of machine translation and easily adapted to a general environment. The low performance, on the other hand, was shown to be due to feature reduction or to features with no subject characteristics, both of which arose during the machine translation step of CLTC.
https://doi.org/10.4275/KSLIS.2009.43.1.313

A Study on Effect of Reading Guidance Program based on Enneagram of Personality (에니어그램 성격유형을 적용한 독서지도의 효과 연구)

Paek, Jin-Hwan;Han, Yoon-Ok
- Journal of the Korean Society for Library and Information Science / v.48 no.2 / pp.45-64 / 2014
The present study examines the effect of a reading guidance program that applied the Enneagram of Personality to a group of 6th-year elementary school students. To verify the effect of the program, the three variables with the highest correlation (self-encouragement, self-efficacy, and sociality) were selected as assessment tools for this experimental study. With 6th-year elementary students as participants, reading guidance applying the Enneagram of Personality was given, and to verify its effect the students were divided into two groups, an experimental group and a control group.
https://doi.org/10.4275/KSLIS.2014.48.2.045

A Study on the Improvement of Retrieval Performance Query Expansion in Passage-based Retrieval (질의확장에 의한 단락검색의 성능 향상에 관한 연구)

박지연;정영미
- Proceedings of the Korean Society for Information Management Conference / 2001.08a / pp.143-148 / 2001
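This study proposes a way to improve passage retrieval performance through query expansion that uses co-occurrence-based query-term similarity. Through experiments, we compared global query expansion, based on co-occurrence information for terms appearing in the entire document collection, with local query expansion, based on co-occurrence information for terms appearing in the top 10 documents of the initial retrieval result without user feedback, and explored ways to improve the performance of each. Finally, we proposed a method that uses the global and local information of the document collection together and evaluated its performance.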

A Case Study on the Developmental Bibliotherapy for Self-Actualization (자아실현을 위한 발달적 독서치료의 사례연구)

Nam Tae Woo;Lee Wone Jee
- Journal of the Korean Society for Library and Information Science / v.39 no.2 / pp.321-346 / 2005
This research is a case study in which developmental bibliotherapy was practiced with 5th-grade elementary students and its influence on self-actualization was examined. The subjects were 20 5th-grade students at C elementary school in Gyeonggi Province (experimental group: 10 students; control group: 10 students), and 12 sessions were held over 6 weeks. The experimental group showed meaningful effects on the scales of inner-directedness, self-actualizing value, feeling reactivity, spontaneity, self-regard, and self-acceptance. A follow-up measurement showed that the effect persisted after 3 weeks. We therefore conclude that developmental bibliotherapy can have a positive influence on elementary students and help them form a proper notion of themselves as human beings.
https://doi.org/10.4275/KSLIS.2005.39.2.321

Hierarchic Document Clustering in OPAC (OPAC에서 자동분류 열람을 위한 계층 클러스터링 연구)

노정순
- Journal of the Korean Society for Information Management / v.21 no.1 / pp.93-117 / 2004
This study develops a hierarchic clustering model for document classification and browsing in OPAC systems. Two automatic indexing techniques (with and without controlled terms), two term weighting methods (based on term frequency and binary weight), five similarity coefficients (Dice, Jaccard, Pearson, Cosine, and Squared Euclidean), and three hierarchic clustering algorithms (Between Average Linkage, Within Average Linkage, and Complete Linkage) were tested on a document collection of 175 books and theses on library and information science. The best document clusters resulted from the Between Average Linkage or Complete Linkage method with the Jaccard or Dice coefficient, applied to automatic indexing with controlled terms in binary vectors. The clusters from Between Average Linkage with the Jaccard coefficient more closely resemble a decimal classification structure.
https://doi.org/10.3743/KOSIM.2004.21.1.093
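
To make the two best-performing coefficients in the abstract above concrete, here is a minimal sketch of the Jaccard and Dice coefficients on binary term vectors, represented as Python sets of index terms. It uses the textbook definitions and is not the study's implementation; the function names and the two sample term sets are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: shared terms divided by all distinct terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    """Dice coefficient: twice the shared terms divided by the summed set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical binary index-term sets for two documents.
doc1 = {"library", "catalog", "classification"}
doc2 = {"library", "classification", "retrieval", "indexing"}

print(jaccard(doc1, doc2), round(dice(doc1, doc2), 4))
```

The two coefficients are monotonically related (Dice = 2J / (1 + J)), so they always rank document pairs in the same order.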

A Study on the Effect of Book-Trailers As a After Reading Activity (독후활동으로써 북트레일러의 효과 연구)

Choi, Yong-hoon;Cho, Hyun-Yang
- Journal of the Korean Society for Library and Information Science / v.49 no.3 / pp.15-36 / 2015
This research carries out an after-reading activity using book trailers with teenagers and verifies its effects. To investigate its effectiveness, 6 first-grade classes in a middle school were divided into an experimental group of 3 classes (104 students) and a comparison group of the other 3 classes (100 students), and a pre-test and a post-test were conducted using measurement tools for creativity and reading attitudes. As a result, the creativity, reading attitudes, and understanding of the selected books of the experimental group were found to be better than those of the comparison group.
https://doi.org/10.4275/KSLIS.2015.49.3.015

Inverse Document Frequency Weighting Revisited (역문헌빈도 가중치의 재검토)

이재윤
- Proceedings of the Korean Society for Information Management Conference / 2003.08a / pp.253-261 / 2003
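Inverse document frequency (IDF) weighting rests on the assumption that the lower an index term's frequency of occurrence in a document collection, the more important the term is. This study questions that assumption and proposes a new document frequency weighting formula to complement it. The proposed formula rests on the assumption that mid-frequency terms, rather than low-frequency terms, are more important, and it is likewise a function of document frequency. Experiments were conducted separately for the case where the document-frequency weight is assigned to the index terms of documents and the case where it is assigned to query terms, and the differences between the two cases are discussed.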
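
As background for the assumption this abstract questions, the following minimal sketch computes the textbook form of inverse document frequency, log(N / df). It is an illustration only: the abstract does not give the proposed mid-frequency formula, so that formula is not reproduced here, and the function name `idf` and the collection size are hypothetical.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Textbook inverse document frequency: log(N / df)."""
    if doc_freq <= 0 or doc_freq > num_docs:
        raise ValueError("doc_freq must be between 1 and num_docs")
    return math.log(num_docs / doc_freq)


# Hypothetical collection of 1,000 documents.
N = 1000
for df in (1, 10, 100, 1000):
    # The weight falls steadily as document frequency rises.
    print(df, round(idf(N, df), 3))
```

Under this weighting the rarest terms always receive the largest weight, which is the behavior the study argues should instead favor mid-frequency terms.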

Search Result 83, Processing Time 0.022 seconds
