• Title/Abstract/Keywords: Text Retrieval

344 results found (processing time: 0.027 seconds)

Enabling a fast annotation process with the Table2Annotation tool

  • Larmande, Pierre;Jibril, Kazim Muhammed
    • Genomics & Informatics
    • /
    • Vol. 18, No. 2
    • /
    • pp.19.1-19.6
    • /
    • 2020
  • In semantic annotation, semantic concepts are linked to natural language. Semantic annotation helps in boosting the ability to search and access resources and can be used in information retrieval systems to augment the queries from the user. In the research described in this paper, we aimed to identify ontological concepts in scientific text contained in spreadsheets. We developed a tool that can handle various types of spreadsheets. Furthermore, we used the NCBO Annotator API provided by BioPortal to enhance the semantic annotation functionality to cover spreadsheet data. Table2Annotation has strengths in certain criteria such as speed, error handling, and complex concept matching.

Web Document Retrieval based on Conceptual Distance and Density

  • 황희철;최창;김판구
    • Korea Information Processing Society: Conference Proceedings
    • /
    • Korea Information Processing Society 2006 Spring Conference
    • /
    • pp.817-820
    • /
    • 2006
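  • With the recent rapid advance of Internet technology, a large amount of information now exists on the web, and many people search for and use it. However, existing search methods rely on simple text matching, which makes it difficult to find the desired material among so many documents. This paper therefore proposes a web document retrieval system that, given information about a target document, retrieves documents similar to it. To this end, the system supports semantic search by using conceptual density and word-to-word similarity measures based on U-WIN, the University of Ulsan's lexical intelligent network.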
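
The abstract above does not give its formulas, so the following is only a minimal sketch of one common way to measure conceptual distance: counting the edges between two concepts through their nearest shared ancestor in a hypernym tree. The toy hierarchy, the function names, and the `1 / (1 + distance)` similarity are all assumptions for illustration; they are not taken from U-WIN or from the paper.

```python
def ancestors(concept, parent):
    """Return the chain from a concept up to its root (concept first)."""
    chain = [concept]
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


def conceptual_distance(a, b, parent):
    """Edge count between a and b via their nearest shared ancestor.

    Returns None when the two concepts share no ancestor.
    """
    steps_from_b = {c: i for i, c in enumerate(ancestors(b, parent))}
    for steps_from_a, concept in enumerate(ancestors(a, parent)):
        if concept in steps_from_b:
            return steps_from_a + steps_from_b[concept]
    return None


def similarity(a, b, parent):
    """Map a distance to a score in (0, 1]; unrelated concepts score 0."""
    distance = conceptual_distance(a, b, parent)
    return 0.0 if distance is None else 1.0 / (1.0 + distance)


# Hypothetical child -> parent hierarchy, used only for this example.
toy_parent = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}

print(conceptual_distance("dog", "cat", toy_parent))
print(similarity("dog", "sparrow", toy_parent))
```

A retrieval system along these lines could rank documents by aggregating such scores between query terms and document terms, which is what lets it go beyond plain text matching.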

User-centered multimedia design: The application of discount usability engineering

  • 임치환
    • Journal of the Society of Korea Industrial and Systems Engineering
    • /
    • Vol. 20, No. 41
    • /
    • pp.189-196
    • /
    • 1997
  • Multimedia systems present information through various media, for example video, sound, music, animation, and movies, in addition to the text that has long been used to convey information. However, using several media may confuse users, and a poorly designed user interface often aggravates the situation. Hypermedia systems allow the retrieval and representation of multimedia information using navigation and browsing mechanisms. Typically, hypermedia has two major navigation problems compared with an ordinary user interface: disorientation and cognitive overload. In this study, a multimedia system was examined from the viewpoint of usability. Practical usability evaluation needs cost-effective, low-skill, and low-investment methods. The 'discount usability engineering' method, one such method, is based on the following techniques: scenarios, simplified thinking aloud, and heuristic evaluation. The discount usability engineering method was applied to the usability evaluation of a multimedia CD-ROM title.

Validity Study of Kohonen Self-Organizing Maps

  • Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 10, No. 2
    • /
    • pp.507-517
    • /
    • 2003
  • The self-organizing map (SOM) has been developed mainly by T. Kohonen and his colleagues as an unsupervised learning neural network. Because of its topological ordering property, the SOM is known to be very useful in pattern recognition and text information retrieval. Recently, data miners have frequently used Kohonen's mapping method in exploratory analyses of large data sets. One problem facing SOM builders is that there is no sensible criterion for evaluating the goodness of fit of the map at hand. In this short communication, we propose valid evaluation procedures for a Kohonen SOM of any size. The methods can be used to select the best map among several candidates.

Word Sense Disambiguation in Query Translation of CLTR

  • 강인수;이종혁;이근배
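    • Korean Institute of Information Scientists and Engineers, Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing)
    • /
    • Korean Institute of Information Scientists and Engineers, Language Engineering Research Group, 1997 9th Conference on Hangul and Korean Information Processing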
    • /
    • pp.52-58
    • /
    • 1997
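  • In information retrieval, queries and documents are converted into a common representation so that their relevance can be compared. In cross-language text retrieval (CLTR), where the query and the documents are in different languages, this conversion involves translation between languages. Previous work on CLTR includes methods based on dictionaries, corpora, and machine translation. Translation between languages inevitably introduces semantic ambiguity, yet previous dictionary-based studies do not consider sense disambiguation of polysemous words. This study proposes a query term disambiguation method for query translation, based on a Korean-Japanese bilingual dictionary and the Kadokawa (角川) thesaurus, together with a method that assigns extra weight to documents containing co-occurring translation equivalents. The proposed methods were tested on Japanese patent documents and yielded a 5% improvement in accuracy.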

Finding approximate occurrence of a pattern that contains gaps by the bit-vector approach

  • Lee, In-Bok;Park, Kun-Soo
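    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, Proceedings of the 2003 2nd Annual Conference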
    • /
    • pp.193-199
    • /
    • 2003
  • Applications of finding occurrences of a pattern that contains gaps include information retrieval, data mining, and computational biology. Because biological sequences may contain errors, it is important to find not only the exact occurrences of a pattern but also approximate ones. In this paper we present an $O(mnk_{\max}/w)$ time algorithm for the approximate gapped pattern matching problem, where $m$ is the length of the text, $n$ is the length of the pattern, $w$ is the word size of the target machine, and $k_{\max}$ is the greatest error bound for subpatterns.
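
The abstract does not spell out the gapped algorithm, so the sketch below is not the authors' method. It illustrates only the underlying bit-vector idea, using the classic Shift-And scheme extended to allow up to `k` mismatches for a single pattern without gaps. Each bit of a state integer records whether a pattern prefix currently matches, so one machine word updates many prefixes at once. The function name and parameters are assumptions made for this example.

```python
def find_with_mismatches(text, pattern, k):
    """Return end indices in text where pattern matches with <= k mismatches.

    Bit i of state[d] is set when pattern[:i + 1] matches the text ending at
    the current position with at most d mismatched characters.
    """
    length = len(pattern)
    if length == 0:
        raise ValueError("pattern must not be empty")
    full = (1 << length) - 1          # keep only `length` bits
    accept = 1 << (length - 1)        # bit for the whole pattern

    # masks[ch] has bit i set when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    state = [0] * (k + 1)
    hits = []
    for position, ch in enumerate(text):
        char_mask = masks.get(ch, 0)
        previous_old = 0              # state[d - 1] before this step
        for d in range(k + 1):
            old = state[d]
            # Extend every prefix by one matching character.
            new = ((old << 1) | 1) & char_mask
            if d > 0:
                # Or extend a prefix with d - 1 errors by one mismatch.
                new |= (previous_old << 1) | 1
            state[d] = new & full
            previous_old = old
        if state[k] & accept:
            hits.append(position)
    return hits


print(find_with_mismatches("abcabd", "abd", 1))
```

Python integers are unbounded, so the word size does not appear explicitly here; in a fixed-width language the same update would run over pattern-length / word-size machine words per text character.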

Text Classification for Patents: Experiments with Unigrams, Bigrams and Different Weighting Methods

  • Im, ChanJong;Kim, DoWan;Mandl, Thomas
    • International Journal of Contents
    • /
    • Vol. 13, No. 2
    • /
    • pp.66-74
    • /
    • 2017
  • Patent classification is becoming more critical as patent filings have increased over the years. Despite comprehensive studies in the area, several issues remain in classifying patents at the IPC hierarchical levels. Both the structural complexity and the shortage of patents at the lower levels of the hierarchy cause classification performance to decline. Therefore, we propose a new classification method based on different criteria: categories defined by domain experts in trend analysis reports, i.e., the Patent Landscape Report (PLR). Several experiments were conducted to identify the types of features and weighting methods that lead to the best classification performance with a Support Vector Machine (SVM). Two types of features (nouns and noun phrases) and five weighting schemes (TF-idf, TF-rf, TF-icf, TF-icf-based, and TF-idcef-based) were tested.
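
As a rough illustration of the feature weighting the abstract refers to, the sketch below computes the textbook TF-idf weight, raw term count times `log(N / df)`, over unigram or bigram features. It covers only this one scheme; the TF-rf, TF-icf, and TF-idcef variants are not reproduced because the abstract does not define them. Whitespace tokenization and the function names are assumptions made for this example, and the paper's noun and noun-phrase extraction is not modeled.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Join each run of n consecutive tokens into one feature string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {feature: weight} dict per document.

    weight = term count in the document * log(total docs / docs with term)
    """
    feature_lists = [ngrams(doc.lower().split(), n) for doc in docs]

    # Document frequency: in how many documents each feature appears.
    doc_freq = Counter()
    for features in feature_lists:
        doc_freq.update(set(features))

    total = len(docs)
    weighted = []
    for features in feature_lists:
        counts = Counter(features)
        weighted.append(
            {f: c * math.log(total / doc_freq[f]) for f, c in counts.items()}
        )
    return weighted


# Hypothetical mini-corpus, used only for this example.
sample_docs = [
    "battery cell electrode",
    "battery pack cooling",
    "cooling fan",
]
print(tfidf(sample_docs, n=1)[0])
print(tfidf(sample_docs, n=2)[0])
```

The resulting dictionaries could then be turned into sparse vectors and passed to a classifier such as an SVM; a feature that occurs in every document receives weight 0 under this formula.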

Reputation Analysis of Document Using Probabilistic Latent Semantic Analysis Based on Weighting Distinctions

  • 조시원;이동욱
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • Vol. 58, No. 3
    • /
    • pp.632-638
    • /
    • 2009
  • Probabilistic Latent Semantic Analysis (PLSA) has many applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. In this paper, we propose an algorithm that uses a weighted PLSA model to find contextual phrases and opinions in documents. Traditional keyword search is unable to find the semantic relations of phrases. Overcoming this obstacle requires techniques for automatically classifying the semantic relations of phrases. Through experiments, we show that the proposed algorithm works well in discovering semantic relations of phrases and presents these relations to the vector-space model. The proposed algorithm can perform a variety of analyses, including document classification, online reputation analysis, and collaborative recommendation.

Graph based KNN for Optimizing Index of News Articles

  • Jo, Taeho
    • Journal of Multimedia Information System
    • /
    • Vol. 3, No. 3
    • /
    • pp.53-61
    • /
    • 2016
  • This research frames index optimization as a classification task and applies a graph-based KNN to it. Index optimization is an important task for maximizing information retrieval performance. We try to solve the problems of encoding words as numerical vectors, such as huge dimensionality and sparse distribution, by encoding them as graphs instead. In this research, index optimization is viewed as a classification task, a similarity measure between graphs is defined, the KNN is modified into a graph-based version built on that similarity measure, and this version is applied to the index optimization task. As benefits of modifying the KNN in this way, we expect improved classification performance, more graphical representations of words, which are inherent in graphs, and the ability to trace the results of classifying words more easily. We validate the proposed version empirically by optimizing the index on two text collections: NewsPage.com and 20NewsGroups.
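
The abstract does not define its graph encoding or similarity measure, so the sketch below substitutes simple stand-ins to show the general shape of a KNN that works on graphs. It assumes each word is encoded as a set of undirected edges and uses the Jaccard overlap of edge sets as the similarity. The `edge` helper, the class labels "keep" and "drop", and the training data are all hypothetical.

```python
from collections import Counter


def edge(a, b):
    """An undirected edge, so edge(a, b) == edge(b, a)."""
    return frozenset((a, b))


def graph_similarity(g1, g2):
    """Jaccard overlap of two edge sets (an assumed stand-in measure)."""
    if not g1 and not g2:
        return 0.0
    return len(g1 & g2) / len(g1 | g2)


def knn_classify(query, training, k=3):
    """Label a query graph by majority vote among its k most similar graphs.

    `training` is a list of (edge_set, label) pairs.
    """
    ranked = sorted(
        training,
        key=lambda item: graph_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled word graphs, used only for this example.
training_graphs = [
    ({edge("stock", "market"), edge("market", "price")}, "keep"),
    ({edge("stock", "price"), edge("price", "index")}, "keep"),
    ({edge("the", "of"), edge("of", "and")}, "drop"),
]
query_graph = {edge("stock", "market"), edge("stock", "price")}
print(knn_classify(query_graph, training_graphs, k=3))
```

Because the neighbours are explicit graphs, the shared edges behind each vote can be inspected directly, which is one way to read the traceability benefit mentioned in the abstract.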

Progress on the development of FMC control software for CIM

  • 이경휘;김의석;정무영;서석환;고병철
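    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • Institute of Control, Robotics and Systems, Proceedings of the 1991 Korea Automatic Control Conference (Domestic Session); KOEX, Seoul; 22-24 Oct. 1991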
    • /
    • pp.821-825
    • /
    • 1991
  • This paper presents an architecture and control logic for a Flexible Manufacturing Cell (FMC), one of the important elements of a Computer Integrated Manufacturing (CIM) environment. To implement an FMC, it is very important to develop software that can control and monitor the overall system in an integrated environment. Our primary concern in this research is not to develop individual systems but to integrate them at the hierarchical control level. We report progress on integrating CAD/CAM, Process Planning, Off-line Robot Programming, and Simulation modules into the FMC control system. The FMC hardware used here comprises an Automated Storage & Retrieval System (AS/RS), a conveyor system, a transfer robot, a CNC milling machine, a bar-code system, and an IBM PC/AT as the Cell Control System (CCS). To demonstrate the operational result, name plates (text-carved aluminium plates) were manufactured by this system.
