• Title/Summary/Keyword: Query categorization

A Study on Automatic Text Categorization of Web-Based Query Using Synonymy List (유사어 사전을 이용한 웹기반 질의문의 자동 범주화에 관한 연구)

  • Nam, Young-Joon; Kim, Gyu-Hwan
    • Journal of Information Management / v.35 no.4 / pp.81-105 / 2004
  • This study implemented automatic text categorization of web-based queries. χ² methods based on the Support Vector Machine were used to test the efficiency of text categorization on queries. The test was carried out with a model that uses a Synonymy List; 713 synonyms were extracted manually from the tested documents. Comparing the runs with and without synonym assignment, the precision ratio changed by -0.01% and the recall ratio by 8.53%. The F1 measure increased by 4.58%, and the standard deviation between the recall and precision ratios improved by 18.39%.
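
The role of a synonymy list and the F1 measure mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `SYNONYMS` entries, the function names, and the whitespace tokenization are all assumptions made for the example.

```python
def normalize_query(query, synonyms):
    """Lowercase the query, split on whitespace, and map each token
    to its canonical form when the synonymy list contains it."""
    return [synonyms.get(token, token) for token in query.lower().split()]


def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical synonymy list; the paper's 713 synonyms are not reproduced here.
SYNONYMS = {"automobile": "car", "auto": "car"}

tokens = normalize_query("Auto repair", SYNONYMS)
score = f1_measure(0.5, 1.0)
```

Normalizing synonyms before classification lets different surface forms of a query contribute to the same feature, which is one plausible reason recall can rise while precision stays nearly flat.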

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment (빅 데이터 환경에서 계층적 문서 유형 분류를 위한 클러스터링 기반 다중 SVM 모델)

  • Kim, Young Soo; Lee, Byoung Yup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.600-608 / 2017
  • Data volumes have recently been growing exponentially with the rapid expansion of the internet. Because users need only part of the available information, they carry a heavy workload in examining and discovering the content they need. Information retrieval must therefore provide hierarchical class information and an examination priority by evaluating the similarity between queries and documents. In this paper we propose a clustering-based multi-class support vector machine model for hierarchical document categorization that makes semantic search possible by considering word co-occurrence measures. Combining hierarchical document categorization with an SVM classifier gives high performance in the analytical classification of web documents, whose number increases exponentially as the document hierarchy is extended. We expect more information retrieval systems to adopt the proposed model in their development and to deliver an accurate and rapid information retrieval service.

Web Search Behavior Analysis Based on the Self-bundling Query Method (웹검색 행태 연구 - 사용자가 스스로 쿼리를 뭉치는 방법으로 -)

  • Lee, Joong-Seek
    • Journal of the Korean Society for Library and Information Science / v.45 no.2 / pp.209-228 / 2011
  • Web search behavior has evolved. People now search using many diverse information devices in various situations. To monitor these scattered and shifting search patterns, an improved way of learning and analysis is needed. Traditional web search studies relied on server transaction logs and the analysis of single query instances. Since people use multiple smart devices and search intermittently throughout the day, bundled-query research can examine the whole context as well as the underlying search needs. To observe and analyze bundled queries, we developed a proprietary research software set comprising a log catcher, a query bundling tool, and a bundle monitoring tool. In this system, users' daily search logs are sent to our analytic server; each night the users log on to our bundling tool to package their queries; a built-in web survey collects additional data; and our researcher conducts in-depth interviews on a weekly basis. Across the 90 participants in the study, a normal user generated on average 4.75 query bundles a day, and each bundle contained 2.75 queries. Query bundles were categorized as Query refinement vs. Topic refinement, with 9 sub-categories.

A Study of Designing the Intelligent Information Retrieval System by Automatic Classification Algorithm (자동분류 알고리즘을 이용한 지능형 정보검색시스템 구축에 관한 연구)

  • Seo, Whee
    • Journal of Korean Library and Information Science Society / v.39 no.4 / pp.283-304 / 2008
  • This study develops an Intelligent Retrieval System that, through a learning function, automatically presents category terms for the initial query (association terms connected with the knowledge structure of the relevant terminology), automatically reformulates the search, and runs it with the association terms. To this end, this theoretical study of an Intelligent Automatic Indexing System derives experts' index terms through learning and clustering algorithms for automatic classification, text mining (categorization), and document category representation. The system also performs well in terms of expense, time, recall ratio, and precision ratio.

A Hybrid Query Disambiguation Adaptive Approach for Web Information Retrieval

  • Ibrahim, Roliana; Kamal, Shahid; Ghani, Imran; Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.7 / pp.2468-2487 / 2015
  • In web searching, trustworthy and precise results are greatly affected by the inherent uncertainty in the input queries. Queries submitted to search engines are by nature ambiguous, and ambiguous queries constitute a significant proportion of the instances given to web search engines. They pose real challenges for web search engines because of the versatility of information. Temporal-based approaches reduce the uncertainty in queries to some extent but still fail to provide results that match users' aspirations. Researchers in web search have therefore become interested in incorporating contextual information to resolve the uncertainty in search results. In this paper, we propose a hybrid Adaptive Disambiguation Approach (ADA) that uses both temporal and contextual information to improve the user experience. The proposed hybrid approach presents search results to users based on their location and temporal information. A Java-based prototype of the system is developed and evaluated on a standard dataset to determine its efficacy in terms of precision, accuracy, recall, and F1-measure. The experimental results show that ADA outperforms temporal-based approaches on all of these measures.

An Analytic Study on the Categorization of Query through Automatic Term Classification (용어 자동분류를 사용한 검색어 범주화의 분석적 고찰)

  • Lee, Tae-Seok; Jeong, Do-Heon; Moon, Young-Su; Park, Min-Soo; Hyun, Mi-Hwan
    • The KIPS Transactions:PartD / v.19D no.2 / pp.133-138 / 2012
  • Queries entered in a search box result from users actively seeking information, so search logs are important data that represent users' information needs. The purpose of this study is to examine whether there is a relationship between the automatically classified categories of queries and the categories of the documents accessed. Search sessions were identified in the 2009 NDSL (National Discovery for Science Leaders) log dataset of KISTI (Korea Institute of Science and Technology Information). Queries and items used were extracted by session. The queries were processed using an automatic classifier, and the classified queries were then compared with the subject categories of the items used. The average similarity was 58.8% for the automatic classification of the top 100 queries. Interestingly, this is lower than the 76.8% obtained when the search results were evaluated by experts. The difference is attributed to query terms that are newly emerging as topics of concern in other fields of research.
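
The comparison between a query's predicted category and the categories of the items used in the same session can be illustrated with a simple agreement score. The abstract does not state how its similarity figure was computed, so the measure below (the per-session fraction of accessed items whose category matches the prediction, averaged over sessions) and the name `category_agreement` are assumptions for illustration only.

```python
def category_agreement(sessions):
    """Average, over sessions, of the fraction of accessed items whose
    subject category equals the category predicted for the query.

    sessions: list of (predicted_category, [item_category, ...]) pairs.
    Sessions with no accessed items are skipped.
    """
    scores = []
    for predicted, item_categories in sessions:
        if not item_categories:
            continue
        matches = sum(1 for cat in item_categories if cat == predicted)
        scores.append(matches / len(item_categories))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sessions, not taken from the NDSL logs.
sessions = [
    ("physics", ["physics", "chemistry"]),
    ("biology", ["biology"]),
]
agreement = category_agreement(sessions)
```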

Image Retrieval using Adaptable Weighting Scheme on Relevance Feedback (사용자 피드백 기반의 적응적 가중치를 이용한 정지영상 검색)

  • 이진수; 김현준; 윤경로; 이희연
    • Journal of Broadcast Engineering / v.5 no.1 / pp.61-67 / 2000
  • Generally, relevance feedback reflecting the user's intention has been used to refine the query conditions in image retrieval. In this paper, however, the use of relevance feedback is extended to image database categorization so that it can support user-independent image retrieval. In our approach, to guarantee satisfactory performance for users, the descriptors and the descriptor elements corresponding to the unique features associated with each image are weighted using relevance feedback, in which experts have more influence than beginners. We propose an image description scheme consisting of global information, local information, descriptor weights, and element weights based on color and texture descriptors. In addition, we introduce a learning method based on a reliability scheme that prevents incorrect learning from abusive feedback.

Image Retrieval by Important Feature Weighting for Each Class (영상 클레스별 중요 특징 가중에 의한 영상 검색 방법)

  • Yoo, Donggeun; Park, Chaehoon; Choi, Yukyung; Kweon, In So
    • Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.382-385 / 2012
  • This paper proposes a method that, when describing images for image retrieval and image categorization, weights different important features for each image class. Previously studied approaches to weighting image feature vectors apply the same weights to every image class, so they cannot exploit the property that different features are important for different classes. To exploit this property, we learned a different weight vector over the feature vector for each image class. Then, when a query image is input, an existing image retrieval framework is used to build image subsets from the database corresponding to a predefined number of sub-classes. The weight vectors learned for each class are applied to the feature vectors of these subsets, the distances between feature vectors are recomputed, and the results are re-ranked. We applied the method to the UKBench Dataset in experiments, and it showed higher accuracy than before weighting.
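
The re-ranking step described in this abstract can be sketched as below. The abstract does not specify the distance function or how the weights are learned, so the weighted Euclidean distance, the `rerank` function, and the example weight vectors are assumptions chosen only to show the idea of applying a per-class weight vector before recomputing distances.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between feature vectors a and b."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def rerank(query_vec, candidates, class_weights):
    """Re-rank retrieved candidates by class-specific weighted distance.

    candidates: list of (image_id, class_label, feature_vector) returned
    by a first-stage retrieval step.
    class_weights: dict mapping class_label -> weight vector (assumed to
    have been learned beforehand).
    """
    scored = [
        (weighted_distance(query_vec, vec, class_weights[label]), image_id)
        for image_id, label, vec in candidates
    ]
    return [image_id for _, image_id in sorted(scored)]


# Hypothetical data: both candidates are equally far from the query
# before weighting, but the class weights separate them.
query = [0.0, 0.0]
candidates = [("a", "cat", [1.0, 0.0]), ("b", "dog", [0.0, 1.0])]
class_weights = {"cat": [4.0, 1.0], "dog": [1.0, 0.25]}
ranking = rerank(query, candidates, class_weights)
```

In this toy case candidate "a" gets distance 2.0 and candidate "b" gets 0.5, so the weighting breaks a tie that an unweighted distance would leave unresolved.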