Improving the Performance of Statistical Automatic Text Categorization by using Phrasal Patterns and Keyword Sets

Han, Jeong-Gi; Park, Min-Gyu; Jo, Gwang-Je; Kim, Jun-Tae

The Transactions of the Korea Information Processing Society

Vol. 7, No. 4 / pp. 1150-1159 / 2000 / ISSN 1226-9190 (print)

Korea Information Processing Society

Han, Jeong-Gi (Graduate School, Dept. of Computer Engineering, Dongguk University);
Park, Min-Gyu (Development Office, Web Pattern Technology);
Jo, Gwang-Je (DTP Business Division, Seoul System);
Kim, Jun-Tae (Dept. of Computer Engineering, Dongguk University)

Published: 2000-04-01

Abstract

This paper presents an automatic text categorization model that improves accuracy by combining statistical and knowledge-based categorization methods. In our model, the knowledge-based method is applied first, and the statistical method is then applied to the texts that the knowledge-based method leaves uncategorized. This combination improves categorization accuracy while still categorizing every text without failure. For statistical categorization, the vector model with Inverted Category Frequency (ICF) weighting is used. For knowledge-based categorization, Phrasal Patterns and Keyword Sets are introduced to represent sentence patterns, and categorization is performed by pattern matching. Experimental results on news articles show that combining the two categorization methods improves accuracy.
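
The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the details, so several parts are assumptions: the rule format (a category paired with keywords that must all occur) is a simplified stand-in for Phrasal Patterns and Keyword Sets, the ICF weight is assumed to be log(N / cf) by analogy with inverse document frequency, cosine similarity is assumed as the vector-model score, and all names (`KEYWORD_RULES`, `train_icf`, `categorize`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical knowledge base: (category, keywords that must all appear).
# A simplified stand-in for the paper's Phrasal Patterns and Keyword Sets.
KEYWORD_RULES = [
    ("sports", {"goal", "match"}),
    ("finance", {"stock", "market"}),
]


def knowledge_based(tokens):
    """Return the category of the first matching rule, or None if no rule matches."""
    present = set(tokens)
    for category, keywords in KEYWORD_RULES:
        if keywords <= present:
            return category
    return None


def train_icf(labeled_docs):
    """Build per-category term counts and ICF weights from (tokens, category) pairs.

    Assumed form: icf(t) = log(N / cf(t)), with N the number of categories and
    cf(t) the number of categories whose training text contains t.
    """
    tf = defaultdict(Counter)
    for tokens, category in labeled_docs:
        tf[category].update(tokens)
    cf = Counter(term for counts in tf.values() for term in counts)
    icf = {term: math.log(len(tf) / count) for term, count in cf.items()}
    return dict(tf), icf


def _weighted(counts, icf):
    """Scale raw term counts by ICF; unseen terms get weight 0."""
    return {t: c * icf.get(t, 0.0) for t, c in counts.items()}


def _norm(vec):
    return math.sqrt(sum(w * w for w in vec.values()))


def statistical(tokens, tf, icf):
    """Return the category whose ICF-weighted vector is most cosine-similar."""
    doc = _weighted(Counter(tokens), icf)
    best, best_score = None, -1.0
    for category in sorted(tf):  # sorted for a deterministic tie-break
        cat = _weighted(tf[category], icf)
        denom = _norm(doc) * _norm(cat)
        dot = sum(w * cat.get(t, 0.0) for t, w in doc.items())
        score = dot / denom if denom else 0.0
        if score > best_score:
            best, best_score = category, score
    return best


def categorize(tokens, tf, icf):
    """Knowledge-based stage first; statistical stage only as the fallback."""
    return knowledge_based(tokens) or statistical(tokens, tf, icf)


if __name__ == "__main__":
    training = [
        (["team", "goal", "coach"], "sports"),
        (["stock", "bank", "rate"], "finance"),
    ]
    tf, icf = train_icf(training)
    print(categorize(["goal", "match", "today"], tf, icf))  # rule-based stage
    print(categorize(["bank", "rate"], tf, icf))            # statistical fallback
```

Because the statistical stage always returns the best-scoring category, every text receives a label even when no rule matches, which mirrors the "without failure" property claimed in the abstract.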
