• Title/Summary/Keyword: 텍스트 연구 (text research)


Recognition of Korean Implicit Citation Sentences Using Machine Learning with Lexical Features (어휘 자질 기반 기계 학습을 사용한 한국어 암묵 인용문 인식)

  • Kang, In-Su
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.8 / pp.5565-5570 / 2015
  • Implicit citation sentence recognition aims to locate citation sentences that lack explicit citation markers in the full text of articles. State-of-the-art approaches exploit word n-grams, clue words, researchers' surnames, mentions of previous methods, and distance relative to the nearest explicit citation sentences, reaching over 50% performance. However, most previous work has been conducted on English. For Korean, a rule-based method using positive/negative clue patterns was reported to attain a performance of 42%, leaving room for improvement. This study attempted to learn to recognize implicit citation sentences in the full text of Korean literature using Korean lexical features. Different lexical feature units, such as Eojeol, morpheme, and Eumjeol, were evaluated to determine proper lexical features for Korean implicit citation sentence recognition. In addition, lexical features were combined with position features representing backward/forward proximity to explicit citation sentences, improving performance to over 50%.
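
The feature types named in this abstract can be illustrated with a minimal sketch. It assumes that an Eojeol is a space-delimited word unit and an Eumjeol is a single syllable character; morpheme features are omitted because they need a morphological analyzer. The function names and the feature encoding (with -1 meaning "no explicit citation on that side") are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def eojeol_units(sentence: str) -> List[str]:
    """Split a sentence into Eojeol (space-delimited word units)."""
    return sentence.split()


def eumjeol_ngrams(sentence: str, n: int = 2) -> List[str]:
    """Return syllable (Eumjeol) n-grams taken inside each Eojeol."""
    grams: List[str] = []
    for eojeol in eojeol_units(sentence):
        grams.extend(eojeol[i:i + n] for i in range(len(eojeol) - n + 1))
    return grams


def proximity_features(index: int, explicit: List[int]) -> Dict[str, int]:
    """Distance in sentences from position `index` to the nearest explicit
    citation sentence before and after it; -1 means none on that side."""
    before = [index - e for e in explicit if e < index]
    after = [e - index for e in explicit if e > index]
    return {
        "dist_backward": min(before) if before else -1,
        "dist_forward": min(after) if after else -1,
    }
```

In a classifier pipeline, the lexical units would typically become bag-of-features counts, with the two distances appended as extra features; how the paper actually encodes them is not stated in the abstract.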

A Study of Metadata Elements for Digital Image Records Management (디지털이미지 기록관리를 위한 메타데이터 요소 연구)

  • Lee, Ji-Young;Kim, Hee-Jung
    • Journal of Information Management / v.40 no.4 / pp.49-71 / 2009
  • As the importance and proportion of electronic records increase in the public sector, the need to manage diverse types of records has grown. The records management metadata standard provided by the National Archives of Korea in 2007 focused mainly on text-centered records. An extension of its elements is therefore needed to represent diverse types of electronic records. This study suggests metadata elements focused on image records. To this end, the characteristics of image records are investigated, and the recently revised Australian government recordkeeping metadata standard and PREMIS data dictionary are analyzed. Based on this analysis, four elements (format, significant properties, environment, and coverage) are suggested to strengthen the current records management standard.

Analysis of the Relations between Social Issues and Prices Using Text Mining - Avian Influenza and Egg Prices - (뉴스기사 분석을 통한 사회이슈와 가격에 관한 연구 - 조류인플루엔자와 달걀가격 중심으로 -)

  • Han, Mu Moung Cho;Kim, Yangsok;Lee, Choong Kwon
    • Smart Media Journal / v.7 no.1 / pp.45-51 / 2018
  • Avian influenza (AI) is notorious for its rapid infection rate and has a serious impact on consumers and producers alike, especially poultry farms. The AI outbreak that occurred nationwide at the end of 2016 devastated the livestock farming industry. As a result, the prices of eggs and egg products skyrocketed, and the event was heavily covered by the media. The purpose of this study was to investigate the correlation between egg price fluctuations and keyword changes in online news articles reflecting social issues. To this end, we analyzed 682 AI-related online news articles published in South Korea over fourteen weeks from November 2016. The results of this study are expected to contribute to understanding the relationship between the actual price of eggs and the keywords in news articles related to social issues.

A Study on the Conceptualization of Information Resources for Localities Based on the FRBRoo/CIDOC CRM (FRBRoo/CIDOC CRM 기반의 로컬리티 정보자원 구조화 연구)

  • Hyun, Moonsoo
    • Journal of the Korean BIBLIA Society for library and Information Science / v.25 no.4 / pp.265-290 / 2014
  • The aim of the study is to examine the applicability of FRBRoo / CIDOC CRM to the conceptualization of information resources for localities. It attempts to establish the conceptual structure of these resources and the relationships among them, and seeks ways to apply the model. For this purpose, almost 30 articles specializing in locality research were analyzed, and categories of information resources for localities were identified. After examining conceptual models in the cultural information management sectors (library, museum, archive), six cases of conceptualization were attempted based on FRBRoo / CIDOC CRM. In conclusion, the study showed that FRBRoo / CIDOC CRM can be applied to various types of information resources for localities, and that such conceptualization makes it possible to represent information resources based on a particular space (place, locality).

Usability of the National Science and Technology Information System (웹 사용성 개선에 관한 연구 - 국가과학기술정보시스템을 중심으로 -)

  • Park, Min-Soo;Hyun, Mi-Hwan
    • Journal of the Korean BIBLIA Society for library and Information Science / v.22 no.4 / pp.5-19 / 2011
  • The purpose of this study is to assess the usability of a science and technology information site, identify needs for system improvement, and reflect them in the operation and development of the system. A variety of data collection techniques were used, including search logs, interviews, and think-alouds. The search log data were processed to quantify four evaluation aspects: effectiveness, efficiency, satisfaction, and errors. The verbal data collected through think-alouds and post-interviews were analyzed qualitatively to identify needs for enhancement. A comparison of usability before and after the system enhancement revealed an increase of 15 points in effectiveness, a decrease of 35 seconds for efficiency, an increase of 5 points in satisfaction, and a decrease of 1.1 errors, implying an overall improvement in the usability of the current system.

Technology Forecasting using Bayesian Discrete Model (베이지안 이산모형을 이용한 기술예측)

  • Jun, Sunghae
    • Journal of the Korean Institute of Intelligent Systems / v.27 no.2 / pp.179-186 / 2017
  • Technology forecasting is the prediction of the future trends and state of a technology by analyzing the results of its development so far. In general, a patent contains novel information about the results of a developed technology, because patent law protects the exclusive right to the technology in the patent for a fixed period. Many studies have therefore performed technology forecasting through patent data analysis. The patent keyword data widely used in patent analysis consist of keyword occurrence frequencies. Most previous research applied continuous data analyses, such as regression or Box-Jenkins models, to patent keyword data. However, because keyword data are discrete, analytical methods for discrete data should be applied to patent keyword analysis. To solve this problem, we propose a patent analysis methodology using a Bayesian Poisson discrete model. To verify the performance of our approach, we carry out a case study analyzing the patent documents filed by Apple to date.
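
The abstract does not specify the form of the Bayesian Poisson model. As a minimal sketch of the general idea, the example below assumes the textbook conjugate setup: keyword counts follow a Poisson distribution with an unknown rate, and the rate has a Gamma(shape, rate) prior. The function names and default prior values are hypothetical, and this is not presented as the paper's method.

```python
from typing import Sequence, Tuple


def gamma_poisson_posterior(counts: Sequence[int],
                            prior_shape: float = 1.0,
                            prior_rate: float = 1.0) -> Tuple[float, float]:
    """Update a Gamma(shape, rate) prior on a Poisson rate with observed counts.

    With a Gamma prior and Poisson likelihood, the posterior is
    Gamma(shape + sum(counts), rate + number of observations).
    """
    if any(c < 0 for c in counts):
        raise ValueError("keyword counts must be non-negative")
    return prior_shape + sum(counts), prior_rate + len(counts)


def posterior_mean(shape: float, rate: float) -> float:
    """Mean of a Gamma(shape, rate) distribution: the expected keyword rate."""
    return shape / rate
```

For example, yearly counts of 3, 5, and 4 for one keyword under a Gamma(1, 1) prior give a Gamma(13, 4) posterior, whose mean of 3.25 serves as a point estimate of the keyword's occurrence rate.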

A Study on the Development of a Story Database Based on English Literature: Focus on Motif Extracting (영문학 작품을 기반으로 둔 스토리 DB의 필요성 연구: 모티프 추출 방안을 중심으로)

  • Kim, Eun-Jung;Shin, Dong-il;Hwang, Su-Kyung
    • Journal of Digital Convergence / v.13 no.9 / pp.463-472 / 2015
  • The purpose of this study is to suggest a development model for an English literature database that can be widely used for narrative creation and editing in a digital environment. The database is intended to support the effective reuse of the various motifs found in existing literary works. This paper suggests how to build a story database of English literature by demonstrating a motif abstraction model with William Shakespeare's Hamlet. It is hoped that this study will contribute to the production of quality storytelling content and give English literature experts opportunities to collaborate in the development of digitalized content.

A study of the vitalization strategy for public sports facility through big-data (빅데이터 분석을 활용한 기금지원 체육시설 활성화 방안)

  • Kim, Mi-ok;Ko, Jin-soo;Noh, Seung-Chul;Chung, Jae-Hoon
    • Journal of Digital Convergence / v.15 no.2 / pp.527-535 / 2017
  • As interest in health promotion through sports increases, demand for public sports facilities is steadily growing. However, research on the operation and management of public sports facilities is lacking compared with research on supply planning. In this context, the aim of this study is to address problems in the management of public sports centers and suggest strategies for vitalizing the facilities through big data. The data were collected from web sources such as news, blogs, and online cafes over one year, 2015. The big data show that users of the national sports centers and the open gyms behaved similarly but had different needs. Both types of facility were used as sports and leisure areas and had a high percentage of visitors with other purposes, such as walking and picnics. However, while the national sports facilities were used for more specialized programs, the open sports centers were used as leisure space.

A Study on the Arabic numeral reading rules in Modern Korean (현대 한국어에서 아라비안 숫자의 읽기 규칙 연구)

  • Jung, Young-Im;Kim, Jeong-Se;Kim, Sang-Hoon;Lee, Young-Jik;Yoon, Ae-Sun
    • Annual Conference on Human and Language Technology / 2002.10e / pp.16-23 / 2002
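  • This paper aims to establish preprocessing rules that automatically convert numerals into text according to the form of the numeral, the classifier, and the context in which the numeral appears, so that text containing Arabic numerals can be synthesized into speech. We first review prior studies to examine the scope and limitations of reading rules for numerals and numeral expressions, and then set out rules for converting Arabic numerals into text for speech synthesis. In modern Korean, Arabic numerals are read in two main ways, the native Korean system and the Sino-Korean system, and English is sometimes used for single digits. Moreover, even within the Sino-Korean system, a number may be read with place-value units attached or digit by digit, so the conversion is too ambiguous to automate with simple rules. This study examined how Arabic numeral expressions are converted into text according to (1) the pre-numeral, (2) the format and magnitude of the digit string, including symbols, (3) the unit expression, (4) the post-numeral, (5) the classifier, (6) the post-classifier, and (7) the context before and after the numeral expression. The corpus analyzed consists of about 63,000 numeral expressions drawn from all 1,400 articles published in newspaper C from January to April 2000. We identified 12 patterned, unambiguous structures as well as the types of ambiguous structures, and sought to offer clues for resolving the ambiguity through the combination relations with post-classifiers and the left and right context information.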

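
The ambiguity described in this abstract, between reading a Sino-Korean number with place-value units and reading it digit by digit, can be shown with a minimal sketch. The function names are hypothetical. The conventions assumed here (dropping 일 before 십/백/천 and before a leading 만, and reading zero as 공 in digit-by-digit mode) are common usage, not the rule set established in the paper. The native Korean system and the choice between readings from context are not covered.

```python
DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]  # ones, tens, hundreds, thousands
BIG_UNITS = ["", "만", "억"]          # one unit per group of four digits


def _read_group(value: int) -> str:
    """Read 1..9999 with place-value units, dropping 일 before 십/백/천."""
    parts = []
    for power in (3, 2, 1, 0):
        digit = (value // 10 ** power) % 10
        if digit == 0:
            continue
        if digit == 1 and power > 0:
            parts.append(SMALL_UNITS[power])
        else:
            parts.append(DIGITS[digit] + SMALL_UNITS[power])
    return "".join(parts)


def read_with_units(number: int) -> str:
    """Sino-Korean reading with place-value units, for 0 <= number < 10**12."""
    if not 0 <= number < 10 ** 12:
        raise ValueError("number out of supported range")
    if number == 0:
        return "영"
    parts = []
    group_index = 0
    while number > 0:
        number, group = divmod(number, 10_000)
        if group:
            text = _read_group(group)
            # Assumed convention: 10,000 is read 만 rather than 일만.
            if group == 1 and group_index == 1:
                text = ""
            parts.append(text + BIG_UNITS[group_index])
        group_index += 1
    return "".join(reversed(parts))


def read_digit_by_digit(digits: str) -> str:
    """Read a string of ASCII digits one digit at a time (zero as 공)."""
    names = {"0": "공", **{str(i): DIGITS[i] for i in range(1, 10)}}
    return "".join(names[ch] for ch in digits)
```

The same digit string "2002" yields 이천이 from `read_with_units(2002)` and 이공공이 from `read_digit_by_digit("2002")`. Choosing between such readings is the kind of decision the contextual cues listed in the abstract are meant to support.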

A Study on Keyword Extraction From a Single Document Using Term Clustering (용어 클러스터링을 이용한 단일문서 키워드 추출에 관한 연구)

  • Han, Seung-Hee
    • Journal of the Korean Society for Library and Information Science / v.44 no.3 / pp.155-173 / 2010
  • In this study, a new keyword extraction algorithm based on term clustering is applied to a single document. A single document is divided into multiple passages, and two ways of calculating the similarity between two terms are investigated: first-order similarity and second-order distributional similarity. In the first experiment, the best clustering performance is achieved with 50-term passages and second-order distributional similarity. Based on these results, second-order distributional similarity was also applied to various keyword extraction methods that use statistical information about terms. In the second experiment, pf (paragraph frequency) and $tf \times ipf$ (term frequency by inverse paragraph frequency) were found to improve the overall performance of keyword extraction. These results suggest that the algorithm satisfies the conditions that good keywords should meet.
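
The abstract names $tf \times ipf$ without giving its formula. The sketch below assumes a definition by analogy with tf-idf: tf is a term's total frequency in the document, pf is the number of passages containing the term, and ipf is log(N / pf) for N passages. The function name and the logarithmic form are assumptions, not the paper's definition.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_ipf(passages: List[List[str]]) -> Dict[str, float]:
    """Score each term by tf * log(N / pf) over the passages of one document."""
    n_passages = len(passages)
    # tf: total occurrences of each term across the whole document.
    tf = Counter(term for passage in passages for term in passage)
    # pf: number of passages in which each term occurs at least once.
    pf = Counter(term for passage in passages for term in set(passage))
    return {term: tf[term] * math.log(n_passages / pf[term]) for term in tf}
```

Under this assumed weighting, a term that appears in every passage scores zero, while a term concentrated in a few passages scores higher.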