• Title/Summary/Keyword: word search

Search Result 560, Processing Time 0.026 seconds

The Design and Implementation of Automatic Query Term Refiner for Term Expansion/Restriction in Information Retrieval (정보검색에서 질의 용어 확장/한정을 위한 자동 질의 용어 정련기의 설계 및 구현)

  • Kang, Hyun-Su;Kang, Hyun-Kyu;Lee, Yong-Seok;Kim, Young-Sum
    • Annual Conference on Human and Language Technology / 1998.10c / pp.65-72 / 1998
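  • In Internet information retrieval, users mostly issue short queries of two or three terms, and they sometimes use terms that have homonyms. The usual way to handle short queries is query expansion using a thesaurus [8] or WordNet [1]. However, knowledge bases such as thesauri and WordNet are not easy to build, and they have the drawbacks of being domain dependent and of struggling to overcome the word sparseness problem. In addition, homonymous terms lower retrieval accuracy. Meanwhile, a careful look at user queries shows that related-term classification information can be extracted from the query itself. Based on the observation that the terms of a user's query are organically related through this related-term classification information, this paper proposes an automatic query term refiner that automatically expands and restricts terms according to that information and assigns appropriate term weights. Through term expansion, restriction, and weighting, the automatic query term refiner clarifies the user's information need and improves retrieval accuracy.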

A Relation Analysis between NDSL User Queries and Technical Terms (NDSL 검색 질의어와 기술용어간의 관계에 대한 분석적 연구)

  • Kang, Nam-Gyu;Cho, Min-Hee;Kwon, Oh-Seok
    • Journal of Information Management / v.39 no.3 / pp.163-177 / 2008
  • In this paper, we analyze the relationship between the query keywords that users submit to NDSL and the technical terms extracted from NDSL journals. For the analysis, we extracted about 833,000 query keywords from nearly 17 months of NDSL search logs and approximately 41,000,000 technical terms from NDSL, INSPEC, and FSTA journals. We kept only the English noun phrases among the extracted items and then performed an equality analysis, a relationship analysis, and a frequency analysis.

Experimental Analysis of Correct Answer Characteristics in Question Answering Systems (질의응답시스템에서 정답 특징에 관한 실험적 분석)

  • Han, Kyoung-Soo
    • Journal of Digital Contents Society / v.19 no.5 / pp.927-933 / 2018
  • One of the factors with the greatest influence on the errors of a question answering system, which finds and provides answers to natural language questions, is the step that retrieves documents or passages containing correct answers. To improve retrieval performance, it is necessary to understand the characteristics of documents and passages that contain correct answers. Using a corpus composed of questions, documents with correct answers, and documents without correct answers, this paper experimentally analyzes how many question words appear in the correct-answer documents, how the locations of the question words are distributed, and how similar the topics of the question and the correct-answer document are. The study explains the causes of previous retrieval results for question answering systems and discusses the elements needed for an effective retrieval step.

A Two-Phase On-Device Analysis for Gender Prediction of Mobile Users Using Discriminative and Popular Wordsets (모바일 사용자의 성별 예측을 위한 식별 및 인기 단어 집합 기반 2단계 기기 내 분석)

  • Choi, Yerim;Park, Kyuyon;Kim, Solee;Park, Jonghun
    • The Journal of Society for e-Business Studies / v.21 no.1 / pp.65-77 / 2016
  • As respect for privacy becomes an important issue in mobile device data analysis, on-device analysis, in which the data are analyzed inside the mobile device without being sent outside, is getting attention. One possible application of on-device analysis is gender prediction using text data on mobile devices, such as text messages, search keywords, website bookmarks, and contacts, all of which are highly private. The limited computing power of mobile devices can be addressed with the word comparison method, in which words are selected beforehand and delivered to a user's mobile device, and the user's gender is determined by matching the mobile text data against the selected words. Moreover, it is known that performing prediction after filtering instances using definite evidence increases accuracy and reduces computational complexity. In this regard, we propose a two-phase approach to on-device gender prediction in which the discriminability and the popularity of a word are considered sequentially. The proposed method first makes predictions for all instances using a few highly discriminative words and then uses popular words for the instances left unclassified by the first phase. In experiments conducted on a real-world dataset, the proposed method outperformed the compared methods.
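The two-phase flow described in this abstract can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `predict_two_phase`, the majority-vote rule, the tie handling, and the placeholder wordsets are all assumptions, since the abstract does not give those details.

```python
from collections import Counter


def predict_two_phase(tokens, discriminative, popular):
    """Return a label, or None if neither phase can decide.

    discriminative / popular: hypothetical dicts mapping a word to the
    label it votes for. Phase 1 uses the discriminative words only;
    phase 2 (popular words) runs only when phase 1 leaves the instance
    unclassified.
    """
    unique_tokens = set(tokens)

    def vote(wordset):
        # Count one vote per distinct matching word (assumed rule).
        counts = Counter(wordset[t] for t in unique_tokens if t in wordset)
        if not counts:
            return None
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            return None  # a tie counts as "unclassified"
        return ranked[0][0]

    label = vote(discriminative)
    if label is not None:
        return label
    return vote(popular)


# Placeholder wordsets, for illustration only.
DISCRIMINATIVE = {"w1": "F", "w2": "M"}
POPULAR = {"p1": "F", "p2": "M", "p3": "M"}

print(predict_two_phase(["w2", "p1"], DISCRIMINATIVE, POPULAR))
print(predict_two_phase(["p1", "p2", "p3"], DISCRIMINATIVE, POPULAR))
```

Because only word matching and counting run on the device, the cost per instance is linear in the number of distinct tokens, which fits the limited computing power the abstract mentions.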

The Trends of Artiodactyla Researches in Korea, China and Japan using Text-mining and Co-occurrence Analysis of Words (텍스트마이닝과 동시출현단어분석을 이용한 한국, 중국, 일본의 우제목 연구 동향 분석)

  • Lee, Byeong-Ju;Kim, Baek-Jun;Lee, Jae Min;Eo, Soo Hyung
    • Korean Journal of Environment and Ecology / v.33 no.1 / pp.9-15 / 2019
  • Artiodactyla, the order of even-toed mammals, is distributed worldwide. In recent years, wild Artiodactyla species have attracted public attention because of the rapid increase in crop damage and road-kill caused by species such as water deer and wild boar and the decline of species such as long-tailed goral and musk deer. Despite this attention, there have been few studies on Artiodactyla in Korea, and none have focused on trend analysis of Artiodactyla research, which makes it difficult to understand the actual problems. Many recent trend studies use text-mining and co-occurrence analysis to increase objectivity in classifying research subjects, by extracting keywords that appear in the literature and quantifying the relevance between words. In this study, we analyzed texts from research articles of three countries (Korea, China, and Japan) through text-mining and co-occurrence analysis and compared the research subjects in each country. We extracted 199 words from 665 articles related to Artiodactyla in the three countries through text-mining. Co-occurrence analysis of the extracted words produced three word clusters. We determined that cluster1 was related to "habitat condition and ecology", cluster2 to "disease", and cluster3 to "conservation genetics and molecular ecology". Comparing the rates of occurrence of each word cluster by country showed that they were relatively even in China and Japan, whereas in Korea cluster2, related to "disease", was predominant (69%). In the regression analysis of the number of words per year in each cluster, the number of words in China and Japan increased evenly by year in every cluster, while in Korea the rate of increase of cluster2 was five times that of the other clusters. The results indicate that Korean research on Artiodactyla has tended to focus on diseases more than research in China and Japan, and that few researchers have considered other subjects such as habitat characteristics, behavior, and molecular ecology. To control the damage caused by Artiodactyla and to establish a reasonable policy for protecting endangered species, it is necessary to accumulate basic ecological data by conducting more research on wild Artiodactyla.
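The co-occurrence step mentioned in this abstract can be illustrated with a small counting sketch. It assumes each article has already been reduced to a list of keywords; the function name `cooccurrence` and the sample keywords are hypothetical, and the clustering the authors apply afterwards is not shown because the abstract does not specify it.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count, for every unordered keyword pair, the number of
    documents in which both keywords appear."""
    pairs = Counter()
    for keywords in docs:
        # Deduplicate and sort so each pair has one canonical form.
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical keyword lists, one per article.
docs = [
    ["deer", "habitat", "disease"],
    ["deer", "disease"],
    ["boar", "disease"],
]
counts = cooccurrence(docs)
print(counts.most_common(3))
```

The resulting pair counts form a weighted word graph, which is the usual input to whatever clustering method is chosen to group words into research subjects.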

Sentence Similarity Measurement Method Using a Set-based POI Data Search (집합 기반 POI 검색을 이용한 문장 유사도 측정 기법)

  • Ko, EunByul;Lee, JongWoo
    • KIISE Transactions on Computing Practices / v.20 no.12 / pp.711-716 / 2014
  • With growing interest in plagiarism detection and intelligent file content search, the demand for measuring the similarity between two sentences is increasing. Sentence similarity measurement has been studied from various directions, such as n-gram, edit distance, and LSA, and each of these methods has its own advantages and disadvantages. In this paper, we propose a new sentence similarity measurement method that approaches the problem from another direction. The proposed method uses the set-based POI data search, which improves search performance over the existing hard matching method when the data include the inversion, omission, insertion, and revision of characters. Using this method, we are able to measure the similarity between two sentences more accurately and more quickly. We modified the data loading and text search algorithms of the set-based POI data search. We also added a word operation algorithm and a measure that expresses the similarity between two sentences as a percentage. The experimental results show that our sentence similarity measurement method performs better than n-gram and the set-based POI data search.
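For orientation, the n-gram baseline named in this abstract can be sketched as follows. This is not the authors' set-based method, whose details the abstract does not give; the choice of character bigrams, Jaccard overlap, and whitespace removal are assumptions made for illustration.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of text, ignoring case
    and whitespace."""
    s = "".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard overlap of character n-gram sets, as a percentage."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 100.0  # two empty inputs are treated as identical
    return 100.0 * len(ga & gb) / len(ga | gb)


print(ngram_similarity("abcd", "abce"))
```

A single changed character removes every n-gram that overlaps it, which is one reason such baselines react strongly to the omissions and insertions the abstract mentions.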

Accuracy Improvement Methods for String Similarity Measurement in POI(Point Of Interest) Data Retrieval (POI(Point Of Interest) 데이터 검색에서 문자열 유사도 측정 정확도 향상 기법)

  • Ko, EunByul;Lee, JongWoo
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.498-506 / 2014
  • With the development of smart transportation, people increasingly find their routes using navigation and map applications. However, existing retrieval systems cannot return correct results when the query is inaccurate. To remedy this problem, a set-based POI search algorithm was proposed. Subsequently, a method for measuring POI name similarity and a POI search algorithm that supports classifying duplicate characters were proposed. These algorithms tried to compensate for the shortcomings of the set-based POI search algorithm. In this paper, we propose accuracy improvement methods for measuring string similarity in a POI data retrieval system. Through formulization, the similarity measurement scheme is systematized and generalized. As a result, the accuracy of the retrieval results improves. The experimental results show that our accuracy improvement methods perform better than the previous algorithms.

A minimal pair searching tool based on dictionary (사전 기반 최소대립쌍 검색 도구)

  • Kim, Tae-Hoon;Lee, Jae-Ho;Chang, Moon-Soo
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.2 / pp.117-122 / 2014
  • A minimal pair is a pair of words whose sound sequences are identical except for a single sound, and that single difference distinguishes the two lexical items. This paper proposes a minimal pair search tool to make phonological research with minimal pairs more efficient. By comparing it with other programs, we suggest guidelines for developing Korean minimal pair search programs. The proposed tool has a user-friendly interface that minimizes key input, for linguists who are not fluent with computer programs. It also provides a function that classifies the words in the dictionary for detailed research. For efficiency, it speeds up dictionary loading by separating syllables through Unicode analysis and optimizes the dictionary structure for search. The search algorithm gains speed from a hashing algorithm that uses syllable counts. Compared with the earlier version, our tool is about 5 times faster at converting the dictionary and about 3 times faster at searching.
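The two ideas named in this abstract, separating syllables through Unicode analysis and hashing by syllable count, can be sketched as below. This is an illustrative simplification rather than the tool's actual algorithm: it works on spelling instead of pronunciation, compares only words with the same number of syllables, and treats an absent final consonant as a value of its own. The function names are hypothetical. The decomposition arithmetic itself follows the Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3, with 21 medials and 28 final slots per initial).

```python
from collections import defaultdict
from itertools import combinations

HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3


def decompose(word):
    """Map each precomposed Hangul syllable to its
    (initial, medial, final) indices, flattened into one tuple."""
    slots = []
    for ch in word:
        cp = ord(ch)
        if not HANGUL_BASE <= cp <= HANGUL_LAST:
            raise ValueError(f"not a Hangul syllable: {ch!r}")
        index = cp - HANGUL_BASE
        slots.extend((index // 588, (index % 588) // 28, index % 28))
    return tuple(slots)


def minimal_pairs(words):
    """Return word pairs that differ in exactly one sound slot."""
    # Bucket by syllable count so only same-length words are compared.
    buckets = defaultdict(list)
    for w in set(words):
        buckets[len(w)].append((w, decompose(w)))
    pairs = set()
    for entries in buckets.values():
        for (w1, s1), (w2, s2) in combinations(entries, 2):
            if sum(a != b for a, b in zip(s1, s2)) == 1:
                pairs.add(tuple(sorted((w1, w2))))
    return pairs


print(sorted(minimal_pairs(["달", "말", "물", "발"])))
```

Bucketing by syllable count keeps the pairwise comparison inside small groups, which is the kind of saving the abstract attributes to its hashing step.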

Extended Semantic Web Services Retrieval Model for the Intelligent Web Services (지능형 웹 서비스를 위한 확장된 시맨틱 웹서비스 검색 모델)

  • Choi, Ok-Kyung;Han, Sang-Yong;Lee, Zoon-Ky
    • The KIPS Transactions:PartD / v.13D no.5 s.108 / pp.725-730 / 2006
  • Recently, Web services have become a key technology that is indispensable for e-business. Because they can provide the desired information or service regardless of time and place, the integration of application systems within a single business or across multiple businesses is being realized with standardized technologies over open networks and the Internet. However, current Web services retrieval systems, which are based on text-oriented search, cannot provide reliable search results because they do not perceive the similarity or interrelation between terms. Currently there are no Web services retrieval models with such Semantic Web functions. This research addresses these problems by designing and implementing an extended Semantic Web Services Retrieval Model that can search general web documents, UDDI, and Semantic Web documents. The paper presents execution results and uses them to verify the model's efficiency and accuracy.

An Efficient Two-Level Hybrid Signature File Method for Large Text Databases (대용량 텍스트 데이터베이스를 위한 효율적인 2단계 합성 요약 화일 방법)

  • Yoo, Jae-Soo;Gang, Hyeong-Il
    • The Transactions of the Korea Information Processing Society / v.4 no.4 / pp.923-932 / 1997
  • In this paper, we propose a two-level hybrid signature file method (THM) that uses a term discrimination concept to deal efficiently with large text databases. In addition, we apply Yoo's clustering scheme to the two-level hybrid signature file method. The clustering scheme groups similar signatures together according to the similarity of the highly discriminatory terms so that we may achieve better retrieval performance. The space-time analytical model of the proposed two-level hybrid method is provided. Based on the analytical model and experiments, we compare it with the existing methods, i.e., the bit-sliced method (BM), the two-level method (TM), and the hybrid method (HM). As a result, we show that THM achieves the best retrieval performance in a large database with 100,000 records when the number of matching records is less than 160.
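As background for this abstract, the basic signature file idea (superimposed coding) can be sketched as below. This is a generic single-level illustration, not THM and not any of the compared methods; the signature width, the number of bits per term, the SHA-256 based hashing, and all function names are assumptions chosen for the example.

```python
import hashlib

SIG_BITS = 64        # assumed signature width
BITS_PER_TERM = 3    # assumed number of bits set per term


def term_signature(term):
    """Set BITS_PER_TERM hashed bit positions for one term."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def record_signature(terms):
    """Superimpose (bitwise OR) the signatures of all terms."""
    sig = 0
    for term in terms:
        sig |= term_signature(term)
    return sig


def search(records, query_terms):
    """Filter by signature, then verify against the actual terms
    to discard false drops."""
    index = {rid: record_signature(terms) for rid, terms in records.items()}
    query_sig = record_signature(query_terms)
    hits = []
    for rid, sig in index.items():
        if (sig & query_sig) == query_sig:              # candidate record
            if set(query_terms) <= set(records[rid]):   # verification
                hits.append(rid)
    return hits


records = {
    1: ["signature", "file"],
    2: ["text", "database"],
    3: ["signature", "text"],
}
print(search(records, ["signature"]))
```

The signature test can pass records that do not contain the query terms (false drops) but never rejects one that does, so the verification step only has to examine the surviving candidates.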
