• Title/Summary/Keyword: Document information retrieval

Search Result 410, Processing Time 0.025 seconds

Semantic Search System using Ontology-based Inference (온톨로지기반 추론을 이용한 시맨틱 검색 시스템)

  • Ha Sang-Bum;Park Yong-Tack
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.3
    • /
    • pp.202-214
    • /
    • 2005
  • The semantic web is a web paradigm that represents not just general links between documents but the semantics of documents and the relations among them. It also enables software agents to understand the semantics of documents. We propose a semantic search based on inference with ontologies, which has the following characteristics. First, because our search engine reasons with explicit ontologies, it can retrieve documents even when the search keyword differs from the keywords in the documents. Second, even when the concepts of two ontologies do not match exactly, similar results can be found through a rule-based translator and ontological reasoning. Third, our approach increases accuracy and precision by using explicit ontologies to reason about the meanings of documents rather than guessing them from keywords alone. Fourth, the domain ontology lets users issue more detailed queries through an ontology-based automated query generator whose search scope and accuracy are similar to those of NLP. Fifth, it enables agents to search automatically not only for documents matching a keyword but also for user-preferred information and knowledge from ontologies. It can search more accurately than current retrieval systems that rely on database queries or keyword matching. We demonstrate that our system, which uses ontologies and inference based on explicit ontologies, performs better than a keyword-matching approach.

A Study on the Efficiency & Limitation of 3D Animation Production Management Using Production Management Tool - Focusing on Shotgun Software & Ftrack (3D 애니메이션 제작 관리를 위한 제작관리도구(Tool)의 효율성 및 한계 - 샷건(Shotgun)과 Ftrack(에프트랙)을 중심으로)

  • Lee, Esther Kkotsongyi
    • Cartoon and Animation Studies
    • /
    • s.49
    • /
    • pp.1-23
    • /
    • 2017
  • 3D animation production holds a pivotal position in the current animation industry, and the need for a professional management tool for 3D animation production has grown because of its sophisticated pipeline, which results from advances in technology and the trend toward global production partnerships. Among existing management tools, Shotgun and Ftrack provide the most appropriate toolsets for 3D animation management, and their efficiency is identified by comparison with the traditional document-oriented management style. The biggest strength of production management using Shotgun is that all production staff can participate directly in communication on the tools, so they can share information on Shotgun and Ftrack in real time without constraints of time and location. Moreover, the entire production process and the history of discussion on particular production issues accrue systematically in the tool, so the production history can be tracked easily. Finally, tool-based production management helps the studio's production management team collect and analyse production information. However, Shotgun and Ftrack use a metadata-based retrieval method, which requires a great deal of manual annotation effort and has limited accuracy. In addition, studios must first have technical professionals in order to introduce the tools, which is the practical difficulty Korean studios face when they want to use management tools for their projects. This paper therefore suggests adopting a content-based retrieval system in the tools and expanding the tools' technical service for studios as solutions to the identified issues.

A Research on Enhancement of Text Categorization Performance by using Okapi BM25 Word Weight Method (Okapi BM25 단어 가중치법 적용을 통한 문서 범주화의 성능 향상)

  • Lee, Yong-Hun;Lee, Sang-Bum
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.12
    • /
    • pp.5089-5096
    • /
    • 2010
  • Text categorization, which classifies documents according to given criteria, is an important feature of information retrieval systems. The general approach classifies the target documents by extracting important index words and assigning weights to them. The effectiveness of the weighting algorithm therefore matters greatly, since the performance and correctness of text categorization depend on it. This paper introduces an enhanced text categorization method based on an improved word weighting technique. Okapi BM25 has proved effective in several information retrieval engines. We applied Okapi BM25 and showed that it performs well in categorization. It is compared with other word weighting methods: TF-IDF, TF-ICF and TF-ISF. The experiments use the Reuters-21578 collection with SVM and KNN classifiers. The modified Okapi BM25 shows the best performance.
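
The term weighting described in this abstract can be illustrated with a minimal sketch of the standard Okapi BM25 weight, used as a document feature vector. The abstract does not say how the authors modified BM25, so this shows only the textbook form; the function name `bm25_weights`, the parameter defaults `k1=1.2` and `b=0.75`, the `+1` inside the logarithm, and the toy documents are all assumptions for illustration.

```python
import math
from collections import Counter

def bm25_weights(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are damped.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        w = {}
        for term, f in tf.items():
            # The +1 keeps the idf positive even for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            w[term] = idf * f * (k1 + 1) / (f + norm)
        weights.append(w)
    return weights

# Toy corpus (made up); each document is a list of tokens.
docs = [
    "oil prices rise as crude supply falls".split(),
    "wheat and corn exports rise".split(),
    "crude oil exports fall".split(),
]
weights = bm25_weights(docs)
```

Each resulting dictionary could be turned into a sparse feature vector for a classifier such as SVM or KNN; terms that occur in fewer documents receive a larger weight at equal term frequency.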

User Profile based Personalized Web Agent (사용자 프로파일 기반 개인 웹 에이전트)

  • So, Young-Jun;Park, Young-Tack
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.3
    • /
    • pp.248-256
    • /
    • 2000
  • This paper presents a personalized web agent that constructs a user profile consisting of the user's preferences on the web and recommends relevant information to the user. The personalized web agent consists of a monitor agent, a user profile construction agent, and a user profile refinement agent. The monitor agent lets the user describe his/her preferences directly, creates a database of preferred documents, and then performs several keyword extraction steps to increase the accuracy of the database. The user profile construction agent transforms the extracted keywords into a user profile that the user can confirm and edit. The refinement agent refines the user profile by recursively learning from and processing user feedback. In this paper, we describe several keyword weighting and inductive learning techniques in detail. Finally, we describe the adaptive web retrieval and push agents that provide adaptive services to the user.


A Study on the Musical Theme Clustering for Searching Note Sequences (음렬 탐색을 위한 주제소절 자동분류에 관한 연구)

  • 심지영;김태수
    • Journal of the Korean Society for information Management
    • /
    • v.19 no.3
    • /
    • pp.5-30
    • /
    • 2002
  • In this paper, note sequence patterns are selected as the classification feature, with a focus on musical content. Similarity between note sequences is measured and clusters of similar note sequences are constructed, which makes searching easier for users by showing similar note sequences together with the search result in a CBMR system. The experimental document was "A Dictionary of Musical Themes", an index of theme bars focused on classical music, obtained as kern-type files. Humdrum Toolkit version 1.0 was used as the tool for processing note sequences. Hierarchical clustering was performed in stages on four types of similarity matrices, defined by whether the note sequences are segmented and where the starting point is. To measure the results, the WACS standard is used where a manual classification is available, and for note sequences starting from any point in the note sequence, the distribution of common feature patterns in the clusters obtained from the clustering result is used. According to the results, clustering with segmented features, regardless of the starting point, performs distinctly better than clustering with non-segmented features.

Developing of Text Plagiarism Detection Model using Korean Corpus Data (한글 말뭉치를 이용한 한글 표절 탐색 모델 개발)

  • Ryu, Chang-Keon;Kim, Hyong-Jun;Cho, Hwan-Gue
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.2
    • /
    • pp.231-235
    • /
    • 2008
  • Recently we have witnessed several plagiarism scandals involving academic papers and novels, and plagiarism of documents is becoming more frequent. Although plagiarism in English has been studied for a long time, systematic and complete studies on plagiarism in Korean documents are hard to find. Since the linguistic features of Korean are quite different from those of English, English-based methods cannot be applied to Korean documents directly. In this paper, we propose a new plagiarism detection method for Korean, and we thoroughly tested our algorithm with a benchmark Korean text corpus. The proposed method is based on "k-mer" and "local alignment", which locate the plagiarized regions of document pairs quickly and accurately. Using a Korean corpus that contains more than 10 million words, we establish a probability model for the local alignment score (random similarity by chance). The experiment has shown that our system was quite successful in detecting plagiarized documents.
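
The two building blocks named in this abstract, k-mers and local alignment, can be sketched as follows. This is not the authors' algorithm: it pairs a character k-mer overlap (used here as a cheap pre-filter) with a score-only Smith-Waterman local alignment. The function names, the scoring values (`match=2`, `mismatch=-1`, `gap=-1`), and `k=3` are assumptions chosen for illustration.

```python
def kmers(text, k=3):
    """Set of all length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def shared_kmer_ratio(a, b, k=3):
    """Jaccard overlap of the two k-mer sets (hypothetical pre-filter)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score, keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

# Made-up sentence pair; Python strings are Unicode, so Korean text works too.
suspect = "the quick brown fox jumps"
source = "a quick brown fox sleeps"
overlap = shared_kmer_ratio(suspect, source)
score = local_alignment_score(suspect, source)
```

In a pipeline of this shape, document pairs with a low k-mer overlap would be skipped, and the alignment score of the remaining pairs would be compared against a score distribution for unrelated text, such as the chance-similarity model the abstract mentions.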

Effective Searchable Symmetric Encryption System using Conjunctive Keyword on Remote Storage Environment (원격 저장소 환경에서 다중 키워드를 이용한 효율적인 검색 가능한 대칭키 암호 시스템)

  • Lee, Sun-Ho;Lee, Im-Yeong
    • The KIPS Transactions:PartC
    • /
    • v.18C no.4
    • /
    • pp.199-206
    • /
    • 2011
  • Removable storage offers excellent portability, being light and small enough to fit in one's hand, and many users have recently turned their attention to high-capacity products. However, because it is so easy to carry, removable storage is frequently lost or stolen, which causes problems such as the leaking of private information to the public. The advent of remote storage services, where data is stored over the network, has allowed an increasing number of users to access data. The important data of many users is stored together on remote storage, which exposes it to disclosure by an unethical administrator or attacker. To solve this problem, the data stored on the server must be encrypted, and a searchable encryption system is needed for efficient retrieval of encrypted data. However, existing searchable encryption systems suffer from low efficiency in document insert/delete operations and multi-keyword search. In this paper, an efficient searchable encryption system is proposed.
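
The abstract does not describe the proposed construction, so the following is only a generic, simplified sketch of the idea behind keyword search over data the server cannot read. The client derives a keyed HMAC token for each keyword, the server indexes tokens rather than keywords, and a conjunctive (multi-keyword) query is the intersection of posting sets. The names `token` and `TokenIndex` are hypothetical, and this toy index leaks search and access patterns that a real searchable encryption scheme would need to address.

```python
import hashlib
import hmac

def token(key: bytes, keyword: str) -> str:
    """Client side: deterministic keyed token that hides the keyword itself."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()

class TokenIndex:
    """Server side: sees only tokens and document ids, never keywords."""

    def __init__(self):
        self._postings = {}  # token -> set of document ids

    def insert(self, doc_id, tokens):
        for t in tokens:
            self._postings.setdefault(t, set()).add(doc_id)

    def delete(self, doc_id):
        for ids in self._postings.values():
            ids.discard(doc_id)

    def search_all(self, tokens):
        """Conjunctive query: ids of documents that match every token."""
        sets = [self._postings.get(t, set()) for t in tokens]
        return set.intersection(*sets) if sets else set()

# Made-up usage: the key stays with the client.
key = b"client-secret-key"
index = TokenIndex()
index.insert("doc1", [token(key, w) for w in ("budget", "2011")])
index.insert("doc2", [token(key, w) for w in ("budget",)])
hits = index.search_all([token(key, "budget"), token(key, "2011")])
```

The document bodies would be encrypted separately with a symmetric cipher before upload; only the index lookup is shown here.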

PIRS : Personalized Information Retrieval System using Adaptive User Profiling and Real-time Filtering for Search Results (적응형 사용자 프로파일기법과 검색 결과에 대한 실시간 필터링을 이용한 개인화 정보검색 시스템)

  • Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.21-41
    • /
    • 2010
  • This paper proposes a system that serves users with appropriate search results through real-time filtering. We implemented a personalized information retrieval system (PIRS) based on adaptive user profiling that uses users' implicit feedback, in order to address the problem that existing search systems such as Google or MSN do not satisfy users' varied personal search needs. One reason existing search systems find it hard to satisfy these needs is that users' search intentions are uncertain and therefore not easy to recognize. The uncertainty of search intentions means that users may want different search results for the same query. For example, a user who enters the query "java" may want results about Java as a computer programming language, as a coffee, or as an island of Indonesia. In other words, this uncertainty is due to the ambiguity of search queries. Moreover, the fewer words a query contains, the greater this uncertainty becomes. Real-time filtering of search results returns only those results that belong to the user-selected domain for a given query. Although it looks similar to a general directory search, it differs in that the search is executed over all web documents rather than sites, and each document in the search results is classified into the given domain in real time. By applying information filtering based on real-time directory classification of search results to personalization, the number of results delivered to users is effectively decreased and satisfaction with the results is improved. In this paper, a user preference profile has a hierarchical structure and consists of domains, used queries, and selected documents. Because the hierarchical structure of the user preference profile can reflect the context in which users performed a search, it can deal with the uncertainty of user intentions: for the same query, the intention may differ according to context such as time or place. Furthermore, this structure can track a user's web document search behavior for each domain more effectively and recognize changes in user intentions in a timely manner. The IP address of each device was used to identify each user, and the user preference profile is continuously updated based on the observed user behavior on search results. We also measured user satisfaction with search results by observing user behavior on the selected search result. Our proposed system automatically recognizes user preferences by using implicit feedback from users, such as the time spent on the selected search result and the exit condition from the page, and dynamically updates their preferences. Whenever a user performs a search, our system looks up the user preference profile for the given IP address; if the file does not exist, a new user preference profile is created on the server, and otherwise the file is updated with the transmitted information. If the file does not exist on the server, the system provides Google's results to users, and the reflection value is increased or decreased whenever the user searches. We carried out experiments to evaluate the performance of the adaptive user preference profile technique and real-time filtering, and the results are satisfactory. According to our experimental results, participants were satisfied with an average of 4.7 documents in the top 10 search results when using the adaptive user preference profile technique with real-time filtering, which shows that our method outperforms Google's by 23.2%.
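
The hierarchical profile described in this abstract (domains, then queries, then selected documents) and its update from implicit feedback can be sketched roughly as below. The abstract gives no update rule, so the class name `PreferenceProfile`, the dwell-time threshold `min_dwell`, and the fixed `step` are purely hypothetical stand-ins for the "reflection value" it mentions.

```python
from collections import defaultdict

class PreferenceProfile:
    """Hypothetical per-user profile: domain -> query -> document -> score."""

    def __init__(self):
        self.scores = defaultdict(
            lambda: defaultdict(lambda: defaultdict(float))
        )

    def record(self, domain, query, doc, dwell_seconds,
               min_dwell=30.0, step=1.0):
        """Raise the score after a long visit and lower it after a short one."""
        delta = step if dwell_seconds >= min_dwell else -step
        self.scores[domain][query][doc] += delta

    def rerank(self, domain, query, results):
        """Order results by stored preference; unseen documents score 0."""
        prefs = self.scores[domain][query]
        return sorted(results, key=lambda d: prefs.get(d, 0.0), reverse=True)

# Made-up usage for the ambiguous query "java" in a programming domain.
profile = PreferenceProfile()
profile.record("programming", "java", "docs.example/java-tutorial", 120)
profile.record("programming", "java", "coffee.example/java-beans", 4)
ranked = profile.rerank(
    "programming", "java",
    ["coffee.example/java-beans", "travel.example/java-island",
     "docs.example/java-tutorial"],
)
```

Keying the scores by domain first is what lets the same query carry a different intention in a different context, which is the point the abstract makes about the hierarchy.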

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.43-61
    • /
    • 2019
  • The development of artificial intelligence technologies has accelerated with the Fourth Industrial Revolution, and AI-related research has been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. The field of artificial intelligence has achieved more technological advances than ever, owing to recent interest in the technology and research on various algorithms. The knowledge-based system is a sub-domain of artificial intelligence that aims to enable AI agents to make decisions by using machine-readable and processable knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it has been used together with statistical artificial intelligence such as machine learning. More recently, the purpose of a knowledge base has been to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. Such knowledge bases are used for intelligent processing in various fields of artificial intelligence, such as the question answering systems of smart speakers. However, building a useful knowledge base is time-consuming and still requires a great deal of expert effort. In recent years, much research and technology in knowledge-based artificial intelligence has used DBpedia, one of the largest knowledge bases, which aims to extract structured content from the various information in Wikipedia. DBpedia contains various information extracted from Wikipedia, such as titles, categories, and links, but the most useful knowledge comes from Wikipedia infoboxes, which present a user-created summary of some unifying aspect of an article. This knowledge is created by mapping rules between infobox structures and the DBpedia ontology schema, defined in the DBpedia Extraction Framework. In this way, DBpedia can expect high reliability in terms of the accuracy of knowledge, because it generates knowledge from semi-structured infobox data created by users. However, since only about 50% of all wiki pages in Korean Wikipedia contain an infobox, DBpedia has limitations in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. To demonstrate the appropriateness of this method, we describe a knowledge extraction model that follows the DBpedia ontology schema and is trained on Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classification of documents into ontology classes, classification of the sentences suitable for extracting triples, and value selection and transformation into the RDF triple structure. The structures of Wikipedia infoboxes are defined as infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these infobox templates. Based on these mapping relations, we classify the input document according to infobox categories, which correspond to ontology classes. After determining the class of the input document, we classify the appropriate sentences according to the attributes belonging to that class. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples. To train the models, we generated a training data set from a Wikipedia dump by adding BIO tags to sentences, and trained models for about 200 classes and about 2,500 relations. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process. Through the proposed process, structured knowledge can be utilized by extracting knowledge from text documents according to the ontology schema. In addition, this methodology can significantly reduce the expert effort needed to construct instances according to the ontology schema.
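
The last step of the pipeline in this abstract, turning BIO-tagged sentences into triples, can be sketched as follows. The tagger itself (CRF or Bi-LSTM-CRF) is out of scope here and its output is given by hand. The function names and the `country` label are hypothetical, and plain Python tuples stand in for RDF triples.

```python
def bio_spans(tokens, tags):
    """Collect (label, text) spans from parallel token and BIO tag lists."""
    spans, label, words = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span begins; close any span that is still open.
            if label:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [tok]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(tok)
        else:
            # "O" or an inconsistent I- tag ends the current span.
            if label:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label:
        spans.append((label, " ".join(words)))
    return spans

def to_triples(subject, spans):
    """Pair each extracted value with the article subject and its property."""
    return [(subject, prop, value) for prop, value in spans]

# Hand-written tagger output for one made-up sentence.
tokens = ["Seoul", "is", "the", "capital", "of", "South", "Korea"]
tags = ["O", "O", "O", "O", "O", "B-country", "I-country"]
triples = to_triples("Seoul", bio_spans(tokens, tags))
# triples == [("Seoul", "country", "South Korea")]
```

Under this sketch the tag label doubles as the property name, which reflects the abstract's idea that sentences are classified by attribute before values are selected from them.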

A Design and Implementation of XML DTDs for Integrated Medical Information System (통합의료정보 시스템을 위한 XML DTD 설계 및 구현)

  • 안철범;나연묵
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.40 no.6
    • /
    • pp.106-117
    • /
    • 2003
  • Advanced medical information systems usually consist of loosely coupled, independent systems such as HIS/RIS and PACS. To make information exchange easier between these systems and between hospitals, and to support new types of medical service such as teleradiology, it has become essential to integrate separate medical information and allow it to be exchanged and retrieved over the internet. This thesis proposes an integrated medical information system using XML. We analyzed the HL7 and DICOM standard formats and designed an integrated XML DTD. We extracted information from HL7 messages and DICOM files and generated XML document instances and XSL stylesheets based on the proposed XML DTD. We implemented a web interface for the integrated medical information system, which supports data sharing, information exchange, and retrieval across the two standard formats. The proposed XML-based integrated medical information system will help solve the problems of current medical information systems by enabling the integration of separate medical information and by allowing data exchange and sharing over the internet. The proposed XML-based system is more robust than web-based medical information systems developed with HTML, because XML itself provides more flexibility and extensibility than HTML.