• Title/Summary/Keyword: 용어추출 (term extraction)


Extraction of Relationships between Scientific Terms based on Composite Kernels (혼합 커널을 활용한 과학기술분야 용어간 관계 추출)

  • Choi, Sung-Pil;Choi, Yun-Soo;Jeong, Chang-Hoo;Myaeng, Sung-Hyon
    • Journal of KIISE:Computing Practices and Letters / v.15 no.12 / pp.988-992 / 2009
  • In this paper, we extract binary relations between terms using composite kernels that combine convolution parse tree kernels with WordNet verb synset vector kernels, which capture the semantic relationship between two entities in a sentence. To evaluate the performance of our system, we used three domain-specific test collections. The experimental results demonstrate the superiority of our system on all of the targeted collections. In particular, the 8% increase in F1 on KREC 2008 shows that the core contexts around the entities play an important role in boosting the overall performance of relation extraction.
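
As a rough illustration of the composite-kernel idea in the abstract above, the sketch below combines two similarity values with a weighted sum. The linear combination, the `alpha` weight, and the function names are assumptions for illustration only; the paper's convolution parse tree kernel and WordNet verb synset vectors are not reproduced here, so the tree-kernel value is passed in as a precomputed, normalized number.

```python
import math

def cosine_kernel(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def composite_kernel(tree_sim, vec_u, vec_v, alpha=0.5):
    """Weighted sum of a precomputed tree-kernel value and a vector kernel.

    alpha is a hypothetical mixing weight in [0, 1], not a value from the paper.
    """
    return alpha * tree_sim + (1 - alpha) * cosine_kernel(vec_u, vec_v)
```

A weighted sum of two valid kernels with non-negative weights is itself a valid kernel, which is why this form is a common starting point for kernel combination.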

An XML Tag Indexing Method Using on Lexical Similarity (XML 태그를 분류에 따른 가중치 결정)

  • Jeong, Hye-Jin;Kim, Yong-Sung
    • The KIPS Transactions:PartB / v.16B no.1 / pp.71-78 / 2009
  • For more effective index extraction and index weight determination, studies on index extraction use document structure as well as document content. However, most studies concentrate on calculating the importance of content rather than that of XML tags, and these conventional studies determine tag importance from common sense rather than verifying it through objective experiments. This paper addresses automatic indexing using the tag information of XML documents, which have become the standard for web document management. It classifies the major tags that make up a paper according to their importance and first calculates the weights of terms extracted from low-weight tags. Using the weights obtained, it proposes a method of calculating the final weight by updating the weights of terms extracted from high-weight tags. To determine weights more objectively, the paper tests which tags users consider important and classifies tag importance according to the results. Finally, it verifies the effectiveness of the proposed index weights by comparing search performance against index weights calculated with an existing method of determining tag importance.
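
The tag-based weighting described above can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the tag names, the numeric tag weights, and the simple "tag weight times term frequency" update are all assumptions, whereas the paper derives tag importance from user tests.

```python
from collections import Counter, defaultdict

# Hypothetical tag importance values, chosen only for this example.
TAG_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}

def term_weights(tagged_text, tag_weights=TAG_WEIGHTS):
    """Accumulate term weights tag by tag, from the least to the most important tag."""
    weights = defaultdict(float)
    for tag in sorted(tagged_text, key=lambda t: tag_weights.get(t, 0.0)):
        counts = Counter(tagged_text[tag].lower().split())
        for term, freq in counts.items():
            # Update the running weight with this tag's contribution.
            weights[term] += tag_weights.get(tag, 0.0) * freq
    return dict(weights)

doc = {"title": "XML indexing", "body": "indexing of XML tags and XML terms"}
weights = term_weights(doc)
```

In this toy document, a term that appears in the title ends up with a higher weight than a term that appears only in the body.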

A Query Expansion Technique using Query Patterns in QA systems (QA 시스템에서 질의 패턴을 이용한 질의 확장 기법)

  • Kim, Hea-Jung;Bu, Ki-Dong
    • Journal of Korea Society of Industrial Information Systems / v.12 no.1 / pp.1-8 / 2007
  • When confronted with a query, question answering systems endeavor to extract the most exact answers possible by determining the answer type that fits the key terms used in the query. However, the efficacy of such systems is limited by the fact that the terms used in a query may appear in a syntactic form different from that of the same words in a document. In this paper, we present an efficient semantic query expansion methodology based on query patterns in a question category concept list composed of terms that are semantically close to the terms used in a query. The proposed system first constructs a concept list for each question type and then builds the concept list for each question category using a learning algorithm. The experimental results suggest that the proposed method is promising.

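
A minimal sketch of pattern-based query expansion in the spirit of the abstract above. The concept lists and query patterns here are hand-written, hypothetical examples; in the paper the concept lists are built with a learning algorithm, which is not reproduced.

```python
# Hypothetical concept lists keyed by question category.
CONCEPT_LISTS = {
    "person": ["inventor", "founder", "author"],
    "location": ["city", "country", "region"],
}

# Hypothetical query patterns: a leading question word maps to a category.
QUERY_PATTERNS = {"who": "person", "where": "location"}

def expand_query(query, max_terms=2):
    """Append up to max_terms concept-list terms for the query's question category."""
    tokens = query.lower().rstrip("?").split()
    category = QUERY_PATTERNS.get(tokens[0]) if tokens else None
    extra = [t for t in CONCEPT_LISTS.get(category, []) if t not in tokens]
    return tokens + extra[:max_terms]
```

For example, `expand_query("Who invented the telephone?")` keeps the original tokens and appends two terms from the assumed "person" concept list; a query that matches no pattern is returned unchanged.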

Detection of Porno Sites on the Web using Fuzzy Inference (퍼지추론을 적용한 웹 음란문서 검출)

  • 김병만;최상필;노순억;김종완
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.5 / pp.419-425 / 2001
  • A method for detecting pornographic documents on the internet is presented in this paper. The proposed method applies a fuzzy inference mechanism to conventional information retrieval techniques. First, users provide several example pornographic sites, and candidate words representing pornographic documents are extracted from these documents. In this process, lexical analysis and stemming are performed. Then, several values, namely the term frequency (TF), the document frequency (DF), and the heuristic information (HI), are computed for each candidate word. Finally, fuzzy inference is performed on these three values to weight the candidate words. The weights of the candidate words are used to determine whether a given site is sexual or not. Experiments on a small test collection showed that the proposed method is useful for detecting sexual sites automatically.

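
The weighting step above can be sketched as follows, under loud assumptions: the abstract does not define HI or the fuzzy rule base, so HI is treated here as a user-supplied score in [0, 1], the membership function is a simple linear ramp, and a single rule with `min` as the fuzzy AND stands in for the paper's inference mechanism.

```python
from collections import Counter

def tf_df(documents):
    """Return (term frequency, document frequency) counters over tokenized documents."""
    tf, df = Counter(), Counter()
    for tokens in documents:
        tf.update(tokens)        # every occurrence counts toward TF
        df.update(set(tokens))   # each document counts once toward DF
    return tf, df

def high(value, maximum):
    """Membership in the fuzzy set 'high': a linear ramp from 0 to 1 (assumed shape)."""
    return value / maximum if maximum else 0.0

def fuzzy_weight(term, tf, df, hi, n_docs):
    """Single assumed rule: IF TF is high AND DF is high AND HI is high THEN weight is high."""
    return min(
        high(tf[term], max(tf.values(), default=0)),
        high(df[term], n_docs),
        hi.get(term, 0.0),
    )

docs = [["a", "b", "a"], ["a", "c"]]
tf, df = tf_df(docs)
weight_a = fuzzy_weight("a", tf, df, {"a": 0.8}, len(docs))
```

Using `min` as the conjunction means a candidate word scores high only when all three signals agree, which is one common reading of a fuzzy AND rule.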

Investigations on Public Perception of Science Articles in the Mass Media and Understanding of Scientific Terms Used in High Frequency in Science Articles (대중매체의 과학기사에 대한 대중들의 인식과 고빈도로 사용되는 과학용어에 대한 이해도 조사)

  • Yun, Eunjeong;Park, Yunebae
    • Journal of The Korean Association For Science Education / v.39 no.4 / pp.535-544 / 2019
  • To find out whether the traditional mass media in our society function sufficiently as a vehicle for providing scientific information to the public outside school education, we examined public perception of science articles in the mass media and public understanding of the scientific terms used most frequently in those articles. To investigate public perception of science articles, a questionnaire was constructed covering the usefulness, importance, access frequency, and understanding of science articles. The survey was conducted in areas with heavy foot traffic, such as train stations and subway stations. A total of 425 responses were used for analysis. To extract high-frequency scientific terms, two television companies and two newspapers were designated as target media, and the texts of their science articles from the last 17 years were collected to count how often scientific terms were used. Based on these frequencies, we conducted a self-report comprehension test for the top 100 scientific terms. The results show that the public has a relatively high perception of the importance and usefulness of science articles, but that reading and understanding the articles seems to be somewhat difficult. In addition, comprehension of the scientific terminology used in science articles was higher among respondents with higher education, natural science majors, and men. The scientific terms with a high degree of understanding were also characterized according to gender, age, educational background, and field of major.

Method of Document Retrieval Using Word Embeddings and Disease-Centered Document Clusters (단어 의미 표현과 질병 중심 의학 문서 클러스터 기반 의학 문서 검색 기법)

  • Jo, Seung-Hyeon;Lee, Kyung-Soon
    • 한국어정보학회:학술대회논문집 (Korean Language Information Society: Conference Proceedings) / 2016.10a / pp.51-55 / 2016
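  • This paper proposes a method for clinical decision support that extracts knowledge from UMLS and Wikipedia and uses disease-centered document clusters and word embeddings to expand queries and re-rank documents. The query consists of the symptoms that the patient is experiencing. Using UMLS and Wikipedia, we extract disease names together with the symptoms, examination methods, and treatment methods related to each disease, and build medical causal relations. In addition, for the medical terms that appear in Wikipedia, we build semantic representation vectors for the disease vocabulary using a technique for efficient estimation of word representations, and we build disease-centered document clusters using the clinical causal relations. The extracted medical information is used to identify the disease names related to the query. Expansion terms are then selected using those disease names and the word embeddings, and documents are re-ranked using the disease-centered document clusters. To verify the effectiveness of the proposed method, we carry out a comparative evaluation on the TREC Clinical Decision Support (CDS) 2014 and 2015 test collections.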


A Korean Morphological Analyzer CBKMA and A Index Word Extractor CBKMA/IX (한국어 형태소 분석기 CBKMA와 색인어 추출기 CBKMA/IX)

  • Kim, Nam-Churl;Seo, Young-Hoon
    • Annual Conference on Human and Language Technology / 1999.10d / pp.50-59 / 1999
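  • This paper introduces the Korean morphological analyzer CBKMA and the index word extractor CBKMA/IX, which is built on CBKMA, and describes the features of each. CBKMA is a morphological analyzer that uses an analysis algorithm based on syllable information together with an efficient dictionary organization, and it improves processing speed by reducing the generation of excessive analysis candidates. At run time it requires only about 4Mb of main memory, so it can run even on small-scale systems. CBKMA/IX is an automatic index word extractor that uses the morphological analysis function of CBKMA, and it uses only coarse-grained part-of-speech tags to improve processing speed. For index word extraction, it also extends CBKMA's analysis with stopword dictionary and user keyword dictionary handling, analysis of compound nouns and unregistered words, and strengthened processing of Sino-Korean words, Japanese, and the like. It performs particularly well when analyzing nonfiction material.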


A Study on the Korean-English Semantic Thesaurus Construction for Knowledge Management System (지식관리시스템을 위한 의미형 한영 시소러스 구축에 관한 연구)

  • 남영준
    • Journal of Korean Library and Information Science Society / v.32 no.4 / pp.77-98 / 2001
  • As the role of the library has changed to that of an integrated knowledge management system, the library needs new information retrieval tools. The purpose of this study is to propose a method and principles for constructing a Korean-English semantic thesaurus for a knowledge management system. The method and principles are as follows: 1) in collecting terminology, I included not only internal documents but also external documents on the web as sources for descriptor extraction; 2) conceptual descriptors are needed more than semantic ones, and I also proposed authority files as a necessary complement; 3) I proposed 15,000 as the appropriate number of descriptors in a thesaurus; and 4) I proposed a hybrid method that uses both manual and automatic processes in establishing relationships.


A Study of Designing the Intelligent Information Retrieval System by Automatic Classification Algorithm (자동분류 알고리즘을 이용한 지능형 정보검색시스템 구축에 관한 연구)

  • Seo, Whee
    • Journal of Korean Library and Information Science Society / v.39 no.4 / pp.283-304 / 2008
  • This study develops an intelligent retrieval system that, through a learning function, automatically presents category terms for the initial query (association terms connected with the knowledge structure of the relevant terminology), automatically reformulates the search, and runs it with those association terms. To that end, this theoretical study of an intelligent automatic indexing system extracts experts' index terms through learning and clustering algorithms for automatic classification, text mining (categorization), and document category representation. The system also performs well in terms of cost, time, recall, and precision.


Deep Analysis on Index Terms Using Bayesian Inference Network (베이지안 추론망 기반 색인어의 심층 분석 방법)

  • Song, Sa-Kwang;Lee, Seungwoo;Jung, Hanmin
    • Annual Conference on Human and Language Technology / 2012.10a / pp.84-87 / 2012
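  • In most search engines, how index terms are extracted and weighted is a very important research topic and has a large impact on search engine performance. In general, performance is improved either by using a stopword list to remove index terms that do not contribute positively, or by emphasizing relatively important index terms such as keywords and technical terms. However, no technique has yet been presented that analyzes the effect of individual index terms on the search engine across its stepwise processing (word segmentation, morphological analysis, stopword handling, and so on) and uses that analysis to improve performance. This study therefore introduces a methodology that quantifies the effect of the index terms generated at each processing step and classifies them as positive or negative, and, building on this, a method that can also improve search engine performance by adjusting index term weights.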
