• Title/Summary/Keyword: 용어추출 (term extraction)

Search Result 365, Processing Time 0.033 seconds

Comparative study of legal document summary method based on pre-trained model (사전학습 기반의 법률문서 요약 방법 비교연구)

  • Kim, EuiSoon;Lim, HeuiSeok
    • Annual Conference of KIPS / 2021.11a / pp.614-617 / 2021
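  • Legal documents are written in terminology that general users find hard to understand, and many of them are long, so even practitioners in the legal system struggle to read large volumes of them. This study compares document summarization methods, namely extractive and abstractive summarization with deep-learning-based pre-trained models and the key-sentence extraction methods that predate deep learning, and evaluates their summarization performance on legal terminology. As future work, the authors aim to build a summarization model specialized for legal documents.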

A Method of Descriptor Extraction for Automatic Document Clustering (자동 문서 클러스터링을 위한 디스크립터 추출 방안)

  • Yun, Bo-Hyun;Kang, Hyun-Kyu;Ko, Hyung-Dae
    • Annual Conference of KIPS / 2000.04a / pp.230-233 / 2000
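  • Existing search engines list results in order of relevance, which makes it difficult for users to find the documents they want. As a solution, the search-result documents are clustered automatically so that documents with similar content fall into the same cluster. This paper proposes a descriptor extraction method needed for clustering search-result documents. To extract descriptors within each cluster, it uses the term-weighting method used in the indexing process of information retrieval.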
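
The abstract above does not say which term-weighting formula is used. As a minimal sketch, assuming a plain TF-IDF weight in which each cluster is treated as one pseudo-document, descriptor extraction could look like the following. The function name `extract_descriptors`, the `top_n` parameter, and the toy clusters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def extract_descriptors(clusters, top_n=3):
    """Return the top_n highest-weighted terms for each cluster.

    clusters: list of clusters; each cluster is a list of documents;
    each document is a list of tokens.
    Assumption: weight = tf * log(N / cf), where tf is the term count inside
    the cluster, N is the number of clusters, and cf is the number of
    clusters that contain the term.
    """
    n_clusters = len(clusters)
    # Term counts per cluster (each cluster acts as one pseudo-document).
    cluster_counts = [Counter(tok for doc in cluster for tok in doc)
                      for cluster in clusters]
    # Number of clusters in which each term occurs.
    cluster_freq = Counter()
    for counts in cluster_counts:
        cluster_freq.update(counts.keys())

    descriptors = []
    for counts in cluster_counts:
        weights = {term: tf * math.log(n_clusters / cluster_freq[term])
                   for term, tf in counts.items()}
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(weights, key=lambda term: (-weights[term], term))
        descriptors.append(ranked[:top_n])
    return descriptors

clusters = [
    [["jaguar", "car", "engine"], ["car", "dealer", "jaguar"]],
    [["jaguar", "cat", "jungle"], ["cat", "habitat", "jaguar"]],
]
print(extract_descriptors(clusters, top_n=1))  # [['car'], ['cat']]
```

In this toy input "jaguar" occurs in both clusters, so its weight is zero and it is never chosen as a descriptor; terms specific to one cluster rank first.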


An Alignment Model for Extracting English-Korean Translations of Term Constituents (영-한 조어단위 대역쌍 추출을 위한 조어단위 정렬 모델)

  • Oh Jong-Hoon;Huang Jin-Xia;Choi Key-Sun
    • Journal of KIISE:Software and Applications / v.32 no.4 / pp.300-311 / 2005
  • Terms are linguistic realizations of technical concepts. Term constituents are important elements used for representing the concept. Since many new terms are created by modifying or combining existing constituents, analyzing term constituents is important for understanding the concept of a term; that is, term constituents offer clues to the concept. However, matching concept units to term constituents is difficult because of mismatches between a term constituent and a concept unit, homonymy of term constituents, and synonymy of term constituents. To solve these problems, it is necessary to recognize the concept units of term constituents. In this paper, we define an English term constituent as the concept unit and use an alignment algorithm between English-Korean term constituents in order to recognize concept units of term constituents. With our alignment algorithm, we recognize Korean term constituents corresponding to an English term constituent with about 93% precision.

Analysis of Scientific Item Networks from Science and Biology Textbooks (고등학교 과학 및 생물교과서 과학용어 네트워크 분석)

  • Park, Byeol-Na;Lee, Yoon-Kyeong;Ku, Ja-Eul;Hong, Young-Soo;Kim, Hak-Yong
    • The Journal of the Korea Contents Association / v.10 no.5 / pp.427-435 / 2010
  • We extracted core terms by constructing scientific item networks from textbooks, analyzing their structures, and investigating the connected information and their relationships. For this research, we chose three high-school textbooks from different publishers for each of three subjects, i.e., Science, Biology I, and Biology II, and constructed networks by linking the scientific items in each sentence, with the items used as nodes. Scientific item networks from all textbooks showed scale-free character. When core networks were established by applying the k-core algorithm, a commonly used method for removing lesser weighted nodes and links from a complex network, they showed a modular structure. Science textbooks formed four main modules of physics, chemistry, biology, and earth science, while Biology I and Biology II textbooks revealed core networks composed of more detailed, specific items in each field. These findings demonstrate the structural characteristics of networks in textbooks and suggest core scientific items helpful for students' understanding of concepts in Science and Biology.
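
The network construction and k-core pruning described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the network is treated as undirected and unweighted, items that co-occur in a sentence are fully linked, and the function names `build_network` and `k_core` and the sample sentences are hypothetical.

```python
from itertools import combinations

def build_network(sentences):
    """Link every pair of items that co-occur in the same sentence.

    sentences: list of item lists. Returns an adjacency dict {item: set(neighbours)}.
    """
    adj = {}
    for items in sentences:
        for item in items:
            adj.setdefault(item, set())
        for a, b in combinations(set(items), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def k_core(adj, k):
    """Repeatedly drop nodes with fewer than k neighbours.

    Returns the adjacency dict of the surviving subgraph (the k-core).
    """
    core = {node: set(nbrs) for node, nbrs in adj.items()}
    queue = [node for node, nbrs in core.items() if len(nbrs) < k]
    while queue:
        node = queue.pop()
        if node not in core:
            continue  # already removed through an earlier queue entry
        for nbr in core.pop(node):
            core[nbr].discard(node)
            if len(core[nbr]) < k:
                queue.append(nbr)
    return core

sentences = [
    ["cell", "nucleus", "dna"],
    ["cell", "dna", "protein"],
    ["nucleus", "protein"],
    ["protein", "enzyme"],
]
network = build_network(sentences)
print(sorted(k_core(network, 2)))  # ['cell', 'dna', 'nucleus', 'protein']
```

Here "enzyme" has a single link and drops out of the 2-core, while the four densely connected items remain as the core network.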

Text mining on internet-news regarding climate change and food (기후변화 및 식품 관련 뉴스기사의 텍스트 마이닝)

  • Hyun, Yoonjin;Kim, Jeong Seon;Jeong, Jin-Wook;Yun, Simon;Lee, Moon-Soo
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.419-427 / 2015
  • Despite the correlation between climate change and food-related information, it is still not easy for many users to access the information they are interested in. This study investigated how strongly climate change and food-related information are correlated and how often they are exposed, through frequency and correlation analysis of news articles on internet portals. By analyzing the frequency of climate change and food-related news articles, this study determined how often the two are exposed at the same time by internet news portals. In addition, a total of 59 correlation rules between climate change and food-related vocabularies were derived from these news articles using climate change and food-related glossaries. Then, the correlation between specific climate change-related and food-related words was analyzed in order to package the related words.
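
The abstract does not define how its "correlation rules" are computed. One plausible reading, assumed here, is association rules of the form climate term -> food term, scored by support and confidence over the articles. The sketch below follows that assumption; the function name `cross_glossary_rules`, the thresholds, and the sample articles and glossaries are all hypothetical.

```python
from itertools import product

def cross_glossary_rules(articles, climate_terms, food_terms,
                         min_support=0.2, min_confidence=0.5):
    """Return (climate_term, food_term, support, confidence) tuples.

    support    = share of all articles containing both terms
    confidence = share of articles containing the climate term
                 that also contain the food term
    """
    docs = [set(article) for article in articles]
    n_docs = len(docs)
    rules = []
    for c_term, f_term in product(sorted(climate_terms), sorted(food_terms)):
        n_climate = sum(1 for doc in docs if c_term in doc)
        if n_climate == 0:
            continue  # the climate term never occurs, so no rule can be formed
        n_both = sum(1 for doc in docs if c_term in doc and f_term in doc)
        support = n_both / n_docs
        confidence = n_both / n_climate
        if support >= min_support and confidence >= min_confidence:
            rules.append((c_term, f_term, support, confidence))
    return rules

articles = [
    ["drought", "rice", "price"],
    ["drought", "rice"],
    ["heatwave", "vegetable"],
    ["drought", "vegetable"],
    ["rice", "export"],
]
rules = cross_glossary_rules(articles, {"drought", "heatwave"}, {"rice", "vegetable"})
for c_term, f_term, support, confidence in rules:
    print(f"{c_term} -> {f_term}: support={support:.2f}, confidence={confidence:.2f}")
```

Restricting candidate pairs to one term from each glossary keeps the rule set small and directly answers which food words tend to appear alongside which climate words.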

A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so a refining process is required before analysis. The data becomes structured, analyzable data through a heuristic pre-processing refining step and a machine post-processing refining step. In this study, the machine post-processing step uses the Korean dictionary and the stopword dictionary to extract vocabulary for the frequency analysis behind word cloud analysis. In this process, "user-defined stopwords" are used to efficiently remove stopwords that were not removed. We propose a methodology for applying the "thesaurus" and examine the pros and cons of the proposed refining method through a case analysis in which the "user-defined stopword thesaurus" technique, proposed to complement the problems of the existing "stopword dictionary" method, is applied with R's word cloud technique. We present a comparative verification and suggest that the proposed methodology is effective in practical application.
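
The post-processing step above can be illustrated with a short sketch. The study itself works with R's word cloud tooling; the Python below is only an illustration, and it assumes that the "thesaurus" maps variant spellings to one representative form before stopwords are filtered. The function name `word_frequencies` and the sample tokens are hypothetical.

```python
from collections import Counter

def word_frequencies(tokens, base_stopwords, user_stopwords=(), thesaurus=None):
    """Count tokens after thesaurus normalization and stopword removal.

    base_stopwords: the general stopword dictionary
    user_stopwords: extra, user-defined stopwords for this corpus
    thesaurus:      assumed mapping {variant: representative form}
    """
    thesaurus = thesaurus or {}
    blocked = set(base_stopwords) | set(user_stopwords)
    # Normalize variants first so that the filter sees representative forms.
    normalized = (thesaurus.get(token, token) for token in tokens)
    return Counter(token for token in normalized if token not in blocked)

tokens = ["data", "the", "analysis", "datas", "click", "analysis", "of"]
freq = word_frequencies(
    tokens,
    base_stopwords={"the", "of"},
    user_stopwords={"click"},
    thesaurus={"datas": "data"},
)
print(freq.most_common())
```

The resulting counts are what a word cloud would be drawn from; without the user-defined list, the scraping residue "click" would survive the base stopword dictionary.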

A Study on Constructing Theological Thesaurus (신학 시소러스 구축에 관한 연구)

  • Yoo, Yeong-Jun
    • Journal of the Korean Society for information Management / v.27 no.3 / pp.207-225 / 2010
  • Terms collected from English theological dictionaries and the Scripture are used to construct the conceptual relationships of a theological thesaurus. Using these terms, equivalence, hierarchical, and associative relationships are constructed as the basic relationships of the thesaurus. In equivalence relationships, Hebrew, Greek, and Latin terms are included as descriptors, and in hierarchical relationships, generic, instance, whole-part, and polyhierarchical relationships are constructed. There is no big difference in the kinds of conceptual relationships between this theological thesaurus and the thesauri of other subjects. Examples from Biblical Theology are shown, because Biblical Theology has the strength of viewing the Scripture and Protestantism from a comprehensive perspective. In this context, one of the main features of the theological thesaurus is that it contains many allegorical terms. Typology, which is the core structure, causes this result.

Research of Topic Analysis for Extracting the Relationship between Science Data (과학기술용어 간 관계 도출을 위한 토픽 분석 연구)

  • Kim, Mucheol
    • The Journal of Society for e-Business Studies / v.21 no.1 / pp.119-129 / 2016
  • With the development of the web, large amounts of information are generated on the social web, and many researchers have focused on extracting and analyzing social issues from various social data. The proposed approach gathers science data and analyzes it with the LDA algorithm. It generates clusters that represent the social topics related to 'health'. As a result, we could deduce the relationship between science data and social issues.

Feature Term Based Retrieval Method for Image Retrieval (이미지 검색을 위한 특징용어 기반 검색 기법)

  • Park, Sung-Hee;Hur, Jeung;Kim, Hyun-Jin;Jang, Myung-Gil
    • Proceedings of the Korean Information Science Society Conference / 2003.04c / pp.576-578 / 2003
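  • This paper presents a new retrieval method for image retrieval. In existing feature-based and annotation-based retrieval methods, the index form and the query form were the same for features or annotations. The proposed method instead combines these two typical approaches: when a text query is given, query processing extracts the feature terms contained in the text, converts them into features that images inherently have (color, shape, texture), and then uses those features as the query for feature-based retrieval. This method preserves the text queries that users are already familiar with, and it could be used even more effectively once a voice query interface based on speech recognition is applied.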


Logical representation of ontological terminologies in biomedical domain (생물의료분야의 온톨로지 용어의 논리적 표현 기법)

  • Kim, Jung-Jae;Lee, Jin-Bok;Min, Hye-Jin;Jung, Ji-Yong;Park, Jong-C.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.79-85 / 2003
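  • This paper proposes a method that automatically recognizes protein names in large volumes of biomedical documents, automatically identifies the properties of each protein from the documents, and links them to an existing ontology. Because ontology terms appear in documents in various forms, they were automatically converted into logical representations, and sentences describing protein properties were extracted from the documents, analyzed, and compared with the logical representations of the ontology terms. For recognizing protein properties in documents, the paper proposes using natural language processing techniques such as abbreviation handling and anaphora resolution.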
