• Title/Summary/Keyword: Text-search

Interplay of Text Mining and Data Mining for Classifying Web Contents (웹 컨텐츠의 분류를 위한 텍스트마이닝과 데이터마이닝의 통합 방법 연구)

  • 최윤정;박승수
    • Korean Journal of Cognitive Science / v.13 no.3 / pp.33-46 / 2002
  • Recently, unstructured data such as website logs, texts, and tables have been flooding the internet. Among these unstructured data are potentially very useful items, such as bulletin boards and e-mails used for customer service and the output of search engines. Various text mining tools have been introduced to deal with such data, but most of them lack accuracy compared to traditional data mining tools that deal with structured data. Hence, ways have been sought to apply data mining techniques to text data. In this paper, we propose a text mining system that can incorporate existing data mining methods. We use text mining as a preprocessing tool to generate formatted data that serve as input to the data mining system. The output of the data mining system is fed back to the text mining stage to guide further categorization. This feedback cycle can improve the accuracy of the text mining. We apply this method to categorize web sites containing adult content as well as illegal content. The results show improved categorization performance for previously ambiguous data.

Question and Answering System through Search Result Summarization of Q&A Documents (Q&A 문서의 검색 결과 요약을 활용한 질의응답 시스템)

  • Yoo, Dong Hyun;Lee, Hyun Ah
    • KIPS Transactions on Software and Data Engineering / v.3 no.4 / pp.149-154 / 2014
  • Users of a user-participation question answering community such as Knowledge-iN must pick out relevant answers themselves from various search results. If refined answers were provided automatically, the usability of such communities would improve. This paper divides questions in Q&A documents into four types (word, list, graph, and text) and proposes a summarization method for each question type using document statistics. Summarized answers for the word, list, and text types are obtained by clustering questions and scoring words by frequency, proximity, and answer confidence. Answers for the graph type are shown by extracting user opinions from the answers.

A Survey on User Interface Design of University Webzines (대학 웹진의 사용자 인터페이스 디자인 조사)

  • Lee, Joo-Hee
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.303-308 / 2014
  • This paper examines the interface design of university webzines found by searching the internet portal site Naver. The following conclusions were obtained. First, university webzines use hypertext links on images, text, and movies. Second, they mainly use a block grid, a module grid, and a transformed two-tier grid layout. Third, the webzines of Seoul Women's University, Kyungpook National University, and Korea Maritime University were notable for their layout, color, and user-friendly access structure. Fourth, the webzines used text or image links, a search function, a site map, icons, favorites, quick menus, navigation bars, and rollover menus. Last, university webzines were shown to contribute more to the enhancement of their value as a promotional medium.

A Study in the Preference of e-Learning Contents Delivery Types on Web Information Search Literacy in the case of Agricultural High School (농업계 고등학교 학생들의 정보검색 능력에 따른 이러닝 콘텐츠 유형 선호도 연구)

  • Yu, Byeong-Min;Kim, Su-Wook;Park, Sung-Youl;Choi, Jun-Sik
    • Journal of Agricultural Extension & Community Development / v.16 no.2 / pp.463-486 / 2009
  • The purpose of this study was to identify differences in preferred e-Learning content delivery types according to the information search ability of agricultural high school students. The content delivery types were limited to three kinds: HTML, video, and text. The results are summarized as follows. Preferences for e-Learning content delivery types differed according to information search ability. The group with high information search ability mostly preferred the text delivery type, whereas the group with low information search ability preferred the video delivery type. The results support our belief that preferences for e-Learning delivery types can differ with students' information search abilities. We suggest that e-Learning delivery types should be chosen based on the students, not on designers and developers.

An Image Retrieving Scheme Using Salient Features and Annotation Watermarking

  • Wang, Jenq-Haur;Liu, Chuan-Ming;Syu, Jhih-Siang;Chen, Yen-Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.1 / pp.213-231 / 2014
  • Existing image search systems allow users to search images by keywords, or by example images through content-based image retrieval (CBIR). On the other hand, users might learn more relevant textual information about an image from its text captions or surrounding context within documents or Web pages. Without such context, it is difficult to extract a semantic description directly from the image content. In this paper, we propose an annotation watermarking system that lets users embed text descriptions and retrieve more relevant textual information from similar images. First, tags associated with an image are converted into a two-dimensional code and embedded into the image using the discrete wavelet transform (DWT). Next, for images without annotations, similar images can be obtained by CBIR techniques and their embedded annotations can be extracted. Specifically, we use global features such as color ratios and dominant sub-image colors for preliminary filtering. Then, local features such as Scale-Invariant Feature Transform (SIFT) descriptors are extracted for similarity matching. This design can achieve good effectiveness with reasonable processing time in practical systems. Our experimental results showed good accuracy in retrieving similar images and extracting relevant tags from them.

Information Searching on STN Web (STN Easy & ChemPort) (인터넷 웹에서의 STN 검색)

  • Yoo, Sun-Hi
    • Journal of Information Management / v.30 no.1 / pp.11-28 / 1999
  • STN (The Scientific & Technical Information Network) is a fee-based, comprehensive online search service that provides accurate, up-to-date information from over 200 scientific, technical, business, and patent databases. STN Easy (http://stneasy.cas.org) provides point-and-click access to 59 selected key STN databases on the web, and it offers drawings and 3-dimensional chemical structures as well as citation and abstract information. Information searchers can now also access full-text documents from key scientific publishers and patent offices through STN Easy via the ChemPort (http://www.chemport.org) connection.

Survey of Automatic Query Expansion for Arabic Text Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah
    • Journal of Information Science Theory and Practice / v.8 no.4 / pp.67-86 / 2020
  • An information need is one of the main motivations for a person to use a search engine. Queries can represent very different information needs. Ironically, a query can be a poor representation of the information need because the user may find it difficult to express that need. Query Expansion (QE) is widely used to address this limitation. While QE can be considered a language-independent technique, recent findings have shown that in certain cases language plays an important role. Arabic is a language with a particularly large vocabulary, rich in words with synonymous shades of meaning, and it has high morphological complexity. This paper therefore provides a review of QE for Arabic information retrieval, with the intention of identifying the current state of the art in this burgeoning area. In this review, we primarily discuss statistical QE approaches, which include document analysis, search and browse log analyses, and web knowledge analyses, in addition to semantic QE approaches, which use semantic knowledge structures to extract meaningful word relationships. Finally, we conclude that QE for the Arabic language requires additional investigation and research due to the intricate nature of the language.

Selecting a key issue through association analysis of realtime search words (실시간 검색어 연관 분석을 통한 핵심 이슈 선정)

  • Chong, Min-Yeong
    • Journal of Digital Convergence / v.13 no.12 / pp.161-169 / 2015
  • Realtime search words on typical portal sites are refreshed every few seconds, in descending order of search frequency, to show issues in which interest is rising rapidly. However, because realtime search words are reordered within too short a time, they can pass over the key issues of the day. This paper proposes a method for deriving a key issue through association analysis of realtime search words. The proposed method first scores realtime search words according to their ranking and relative interest, and derives the top 10 search words through descriptive statistics for groups. Then, it extracts association rules according to 'support' and 'confidence', and chooses the key issue based on a graph visualizing the resulting rules. The experimental results show that the key issue derived through association rules is more meaningful than the first-ranked realtime search word.
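
The 'support' and 'confidence' measures named in this abstract are the standard association-rule metrics. The following is a minimal sketch of how they can be computed, not the paper's implementation. It assumes each transaction is the set of search words observed together in one time window; the sample words and the helper names `support`, `confidence`, and `pair_rules` are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for word pairs."""
    words = sorted(set().union(*transactions)) if transactions else []
    rules = []
    for a, b in combinations(words, 2):
        s = support(transactions, {a, b})
        if s < min_support:
            continue  # the pair co-occurs too rarely to be considered
        for x, y in ((a, b), (b, a)):
            c = confidence(transactions, {x}, {y})
            if c >= min_confidence:
                rules.append((x, y, s, c))
    return rules


# Hypothetical time windows of co-occurring realtime search words.
windows = [
    {"election", "debate", "poll"},
    {"election", "debate"},
    {"typhoon", "flight"},
    {"election", "poll"},
]

rules = pair_rules(windows, min_support=0.5, min_confidence=0.9)
```

With this sample data, the only rules that pass both thresholds point from "debate" and from "poll" to "election", which is the kind of signal that could mark "election" as a candidate key issue.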

Dataset Search System Using Metadata-Based Ranking Algorithm (메타데이터 기반 순위 알고리즘을 활용한 데이터셋 검색 시스템)

  • Choi, Wooyoung;Chun, Jonghoon
    • Journal of Broadcast Engineering / v.27 no.4 / pp.581-592 / 2022
  • Recently, as the demand for using big data has increased, interest in the dataset search technology needed for data analysis has also been growing. Unlike conventional text search, dataset search needs to make active use of metadata, yet research on such dataset search systems has not been actively carried out. In this paper, we propose a new dataset-tailored search system that indexes the metadata of datasets and performs dataset search based on the metadata indices. The ranking of dataset search results is produced by a newly devised algorithm that reflects the unique characteristics of datasets. The system can also search for additional datasets that correlate with the dataset found by the user-submitted query, so that multiple datasets needed for analysis can be found at once.

Legal search method using S-BERT

  • Park, Gil-sik;Kim, Jun-tae
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.57-66 / 2022
  • In this paper, we propose a legal document search method that uses the Sentence-BERT model. Members of the general public who want to use a legal search service have difficulty finding relevant precedents due to a lack of understanding of legal terms and structures. In addition, existing keyword-based and text mining-based legal search methods are limited in yielding quality search results for two reasons: they lack information on the context of the judgment, and they fail to distinguish homonyms and polysemous words. As a result, the accuracy of legal document search results is often unsatisfactory or questionable. This paper therefore aims to improve the efficacy of the general public's legal search in the Supreme Court precedent and Legal Aid Counseling case databases. The Sentence-BERT model embeds contextual information on precedents and counseling data, which better preserves the integrity of relevant meaning in phrases or sentences. Our initial research has shown that the Sentence-BERT search method yields higher accuracy than the Doc2Vec or TF-IDF search methods.
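
For context on the TF-IDF baseline mentioned in this abstract, the sketch below ranks documents against a query by TF-IDF cosine similarity using only the Python standard library. It is an illustration of the general technique, not the authors' system and not the Sentence-BERT method. The sample documents, the whitespace tokenizer, the smoothed IDF formula, and all function names are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_idf(docs_tokens):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return document indices ordered from most to least similar."""
    docs_tokens = [tokenize(d) for d in documents]
    idf = build_idf(docs_tokens)
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q_vec, tfidf_vector(tokens, idf)), i)
              for i, tokens in enumerate(docs_tokens)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical counseling-style snippets, for illustration only.
documents = [
    "tenant deposit return dispute after lease ends",
    "traffic accident compensation claim",
    "landlord refuses to return deposit",
]

order = rank("deposit return", documents)
```

Because this baseline matches surface tokens only, a query phrased with different words for the same legal concept would score zero against a relevant document, which is the limitation that sentence embeddings are meant to address.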