A Study on Classification of Medical Information Documents using Word Correlation

Lim, Hyeong-Geon;Jang, Duk-Sung;

The KIPS Transactions:PartB (정보처리학회논문지B)

Volume 8B, Issue 5 / Pages 469-476 / 2001 / 1598-284X (pISSN)

Korea Information Processing Society (한국정보처리학회)

색인어 연관성을 이용한 의료정보문서 분류에 관한 연구

Lim, Hyeong-Geon (임형근; (주)아이큐패트) ;
Jang, Duk-Sung (장덕성; Dept. of Computer Engineering, Keimyung University)

Published : 2001.10.01

Abstract

As web-based information services expand, hospitals receive many questions and consultation requests through their home pages and e-mail. Handling them places a burden on administrators and delays the answers. In this paper, we study document classification methods as preliminary research toward an automatic answering system. The experiments use 1200 patient question documents, 66% as training documents and 34% as test documents, and classify them with three methods: NBC (Naive Bayes Classifier), common index terms, and a correlation coefficient. The two methods proposed in this paper, common index terms and the correlation coefficient, scored about 3% and 5% higher, respectively, than the basic NBC method. This result suggests that the correlation between index terms and categories is more effective for document classification than index term frequency.
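The baseline setup described in the abstract (a 66%/34% split followed by Naive Bayes classification on index term frequencies) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the add-one (Laplace) smoothing, the category labels, and the English toy tokens are all assumptions made for illustration, and the paper's common-index-term and correlation-coefficient methods are not reproduced here because the abstract does not define them.

```python
import math
from collections import Counter, defaultdict


def split_documents(docs, train_fraction=0.66):
    """Split (tokens, label) pairs into training and test parts.

    The 0.66 default mirrors the 66%/34% split in the abstract;
    taking the first part as training data is an assumption.
    """
    cut = int(len(docs) * train_fraction)
    return docs[:cut], docs[cut:]


def train_nbc(train_docs):
    """Collect document counts per class and term counts per class."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in train_docs:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, term_counts, vocabulary


def classify(tokens, model):
    """Return the class with the highest log posterior score.

    Uses add-one smoothing (an assumption, not stated in the source)
    and skips terms that never occurred in the training documents.
    """
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_terms = sum(term_counts[label].values())
        for tok in tokens:
            if tok not in vocabulary:
                continue
            likelihood = (term_counts[label][tok] + 1) / (total_terms + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: tokens and category names are invented.
toy_docs = [
    (["fever", "cough"], "internal_medicine"),
    (["fracture", "knee"], "orthopedics"),
]
model = train_nbc(toy_docs)
predicted = classify(["cough", "fever"], model)
```

In this sketch, accuracy on the test part would be the fraction of test documents for which `classify` returns the stored label; the abstract does not state which evaluation measure the paper uses.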

