Boolean Query Formulation From Korean Natural Language Queries using Syntactic Analysis

Park, Mi-Hwa; Won, Hyeong-Seok; Lee, Geun-Bae

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 26, Issue 10 / Pages 1219-1229 / 1999 / 1229-6848 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

구문분석에 기반한 한글 자연어 질의로부터의 불리언 질의 생성

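Park, Mi-Hwa (Graduate School of Information and Communication, Pohang University of Science and Technology);
Won, Hyeong-Seok (Department of Computer Science, Pohang University of Science and Technology);
Lee, Geun-Bae (Department of Computer Science, Pohang University of Science and Technology)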

Published : 1999.10.01

Abstract

There is considerable evidence that trained searchers achieve high retrieval effectiveness with Boolean queries, because a structured query built from operators such as AND, OR, and NOT represents a user's information need more precisely. Ordinary users, however, are not accustomed to expressing the information they want in Boolean form. In this paper, we propose a method that automatically transforms a user's natural language query into an extended Boolean query, aiming at both retrieval effectiveness and user convenience. First, the natural language query is syntactically analyzed with a KCCG (Korean Combinatory Categorial Grammar) parser, and the resulting syntactic trees are simplified by extracting operator and keyword information, so that the logical relationships between keywords are captured. Next, plausible noun phrases are identified in the simplified tree and added to it as additional keywords, and weights are assigned to the keywords. Finally, the simplified tree is converted into a Boolean query using mapping rules and linguistic heuristics, and the query is used for retrieval. To limit the loss in retrieval performance caused by parsing errors, we also propose an N-BEST average method, which generates a Boolean query from each of the top N syntactic trees and retrieves with each of them, rather than relying on a single, possibly incorrect, top-ranked tree.
In experiments on KTSET2.0, a test collection for Korean information retrieval, the proposed method outperformed a natural language query system based on the traditional vector space model by 23% and, surprisingly, manually constructed Boolean queries by 8%.
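
The last two stages of the pipeline described above (turning a simplified tree into a Boolean query, then combining the top N trees) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Term`/`Op` tree representation and all keyword weights are hypothetical, the operators are scored with the p-norm formulas of the extended Boolean model, which the abstract does not confirm is the scoring used, and "N-BEST average" is read here as the mean of the per-document scores obtained from each candidate tree.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Term:
    """Keyword leaf of a simplified tree (weight is illustrative only)."""
    word: str
    weight: float = 1.0


@dataclass
class Op:
    """Operator node: 'AND', 'OR', or 'NOT' over child nodes."""
    name: str
    children: List["Node"]
    weight: float = 1.0


Node = Union[Term, Op]


def to_query(node: Node) -> str:
    """Render a simplified tree as a Boolean query string."""
    if isinstance(node, Term):
        return node.word
    if node.name == "NOT":
        return f"NOT {to_query(node.children[0])}"
    inner = f" {node.name} ".join(to_query(c) for c in node.children)
    return f"({inner})"


def score(node: Node, doc: Dict[str, float], p: float = 2.0) -> float:
    """Score a document (term -> weight in [0, 1]) with p-norm operators.

    Assumption: p-norm extended Boolean scoring; the abstract does not
    state which extended Boolean model the system uses.
    """
    if isinstance(node, Term):
        return doc.get(node.word, 0.0)
    if node.name == "NOT":
        return 1.0 - score(node.children[0], doc, p)
    total = sum(c.weight ** p for c in node.children)
    if node.name == "OR":
        acc = sum((c.weight * score(c, doc, p)) ** p for c in node.children)
        return (acc / total) ** (1.0 / p)
    if node.name == "AND":
        acc = sum((c.weight * (1.0 - score(c, doc, p))) ** p
                  for c in node.children)
        return 1.0 - (acc / total) ** (1.0 / p)
    raise ValueError(f"unknown operator: {node.name}")


def n_best_average(trees: List[Node], doc: Dict[str, float],
                   p: float = 2.0) -> float:
    """Assumed reading of N-BEST average: mean score over the top-N trees."""
    return sum(score(t, doc, p) for t in trees) / len(trees)


# Two hypothetical candidate trees for the same natural language query.
best = Op("AND", [Term("information"),
                  Op("OR", [Term("retrieval"), Term("search")])])
second = Op("OR", [Term("information"), Term("retrieval")])

print(to_query(best))  # (information AND (retrieval OR search))
print(n_best_average([best, second], {"information": 1.0, "retrieval": 1.0}))
```

Averaging over several candidate trees means that a document ranked poorly by one misparsed tree can still score well overall if the other candidates agree on it, which is the motivation the abstract gives for using more than the single top-ranked tree.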

