• Title/Summary/Keyword: Semantic Library

An Experimental Study on the Relation Extraction from Biomedical Abstracts using Machine Learning (기계 학습을 이용한 바이오 분야 학술 문헌에서의 관계 추출에 대한 실험적 연구)

  • Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science / v.50 no.2 / pp.309-336 / 2016
  • This paper introduces a relation extraction system that identifies and classifies semantic relations between biomedical entities in scientific texts using machine learning methods such as Support Vector Machines (SVM). The proposed system provides functions for extracting various linguistic features from sentences containing a pair of biomedical entities and applying them to the training of relation extraction models in order to maximize their performance. Three globally representative biomedical collections were used in the experiments, which demonstrate the system's strong performance across biomedical domains. The intensive experimental study conducted in this paper is expected to provide a meaningful foundation for research on machine learning-based bio-text analysis.
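
As a rough illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of SVM-based relation classification, assuming scikit-learn; the toy sentences, labels, and n-gram features are placeholders rather than the paper's actual linguistic feature set.

```python
# A minimal sketch of SVM-based relation classification in the spirit of the
# system described above, assuming scikit-learn; the feature set and labels
# here are hypothetical placeholders, not the paper's actual pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Each training instance is a sentence with an entity pair, reduced to a simple
# lexical context string; a real system would add syntactic and semantic features.
sentences = [
    "PROT1 phosphorylates PROT2 in vitro",
    "PROT1 was detected near PROT2 but no binding was observed",
]
labels = ["interaction", "no-interaction"]

model = Pipeline([
    ("features", CountVectorizer(ngram_range=(1, 2))),  # word and bigram features
    ("svm", LinearSVC()),                                # linear-kernel SVM classifier
])
model.fit(sentences, labels)

print(model.predict(["PROT3 binds PROT4 directly"]))
```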

A Study on Analyzing the Features of 2019 Revised RDA (2019 개정 RDA 특징 분석에 관한 연구)

  • Lee, Mihwa
    • Journal of Korean Library and Information Science Society / v.50 no.3 / pp.97-116 / 2019
  • This study analyzes the characteristics of the 2019 revised RDA and, based on a literature review, suggests points to consider from a cataloging perspective. Three suggestions follow from the analysis of the revised RDA. First, high-quality data, such as supplemented cataloging data and vocabulary encoding schemes, are needed to transform bibliographic data into linked data for the semantic web. Second, MARC should be extended to accommodate the new concepts of LRM and linked data reflected in the revised RDA, because MARC remains the sole encoding format until MARC data is transformed into linked data. Third, policy statements and application profiles are needed to describe resources consistently, because each entity and element has its own conditions and options, and the revised RDA provides different elements for applying its rules. Building on this study, further RDA-related research should be carried out, such as extending BIBFRAME as well as MARC to accept the new concepts in the revised RDA, and reflecting the revised RDA rules and registries in libraries and countries that face revising their own cataloging rules.

A Study on the maDMP (machine-actionable DMP) Implementation Cases and its Application Method (maDMP 구현 사례와 적용방안에 관한 연구)

  • Kim, Juseop;Kim, Suntae;Han, Yeonjung;Youe, Won-Jae
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.32 no.4 / pp.111-134 / 2021
  • Recently, the preparation and submission of data management plans (DMPs) has gradually become compulsory, particularly at domestic government-funded research institutes. However, because DMPs are written as free text, research data management is often not properly described, owing to the lack of standardization and insufficient preparation in terms of standards, formats, and management. This study therefore examined cases of machine-actionable DMPs (maDMPs) that can be automatically generated and maintained by machines, and proposed a method for applying them. The maDMP implementations investigated include RDCS, Argos, Haplo Repository, and DMap. Possible ways to apply maDMPs include the use of persistent identifiers, controlled vocabularies, and semantic technologies such as ontologies.
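
Because a maDMP replaces free text with structured, machine-readable records, it is typically serialized as JSON. Below is a minimal Python sketch of emitting such a record, loosely modeled on the RDA DMP Common Standard; the field names and identifiers are illustrative assumptions, not a complete or authoritative schema.

```python
# Minimal sketch: serializing a DMP as structured JSON instead of free text,
# loosely modeled on the RDA DMP Common Standard. Field names and identifiers
# below are illustrative placeholders, not the full standard.
import json
from datetime import date

madmp = {
    "dmp": {
        "title": "Example project data management plan",
        "created": date.today().isoformat(),
        "dmp_id": {"identifier": "https://doi.org/10.xxxx/example", "type": "doi"},
        "dataset": [
            {
                "title": "Survey responses",
                "dataset_id": {"identifier": "https://hdl.handle.net/xxxx", "type": "handle"},
                "keyword": ["survey", "social science"],  # controlled vocabulary terms
            }
        ],
    }
}

print(json.dumps(madmp, indent=2, ensure_ascii=False))
```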

A Study on the Linking Structure for Authorized Access Point for Manifestation Based on the Current Bibliographic Trends in South Korea (국내 서지동향을 반영한 구현형의 전거형 접근점 연계 구조)

  • Mideum Park;Seungmin Lee
    • Journal of Korean Library and Information Science Society / v.55 no.2 / pp.109-132 / 2024
  • As the bibliographic environment has evolved towards linked data and the semantic web, a revision of KCR5 based on RDA is in progress in Korea. In this evolving environment, authorized access points play an important role in identifying and linking resources. However, the original RDA on which KCR5 is based does not provide authorized access points for all entities. Based on an analysis of the authorized access point for manifestation in RDA 2020, this research identified its properties and proposed a linking structure for the authorized access point that can be applied to the revision of KCR5. An authorized access point for manifestation considers both the intellectual and physical aspects of a resource and can serve as the foundation for linking and identifying actual resources. The proposed structure is expected to serve as a linking structure for authority records optimized to the current bibliographic environment, supporting the practical application of the authorized access point for manifestation.
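
To make the idea of a linking structure concrete, the following is a minimal sketch of expressing an authorized access point for a manifestation as linked data, assuming the rdflib package; the namespace, property names, and URIs are hypothetical illustrations, not the structure actually proposed in the paper.

```python
# A minimal sketch of how an authorized access point for a manifestation could
# be expressed and linked as RDF, assuming rdflib. The namespace, property
# names, and URIs are hypothetical, not the paper's proposed structure.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/bib/")

g = Graph()
g.bind("ex", EX)

manifestation = URIRef("http://example.org/bib/manifestation/1")
work = URIRef("http://example.org/bib/work/1")

# The access point string combines intellectual (title) and physical (publisher,
# date, carrier) aspects of the manifestation.
g.add((manifestation, EX.authorizedAccessPoint,
       Literal("Example title. Seoul : Example Press, 2024. Print")))
g.add((manifestation, EX.manifestationOf, work))   # link to the related work entity

print(g.serialize(format="turtle"))
```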

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led to the application of text-mining techniques to analyze large volumes of social media data in a more rigorous manner. Even as social media text analysis algorithms have improved, previous approaches still have limitations. In sentiment analysis of Korean social media, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common; some studies add grammatical factors to the feature sets used to train classification models. The other applies semantic analysis to sentiment analysis, but it has mainly been used for English texts. To overcome these limitations, this study applies the Word2Vec algorithm, a neural network-based method, to capture the broader semantic features that have been underestimated in existing sentiment analysis. The results of the Word2Vec analysis are compared with those of co-occurrence analysis to identify the difference between the two approaches. The comparison shows that Word2Vec extracts about three times more related words expressing emotion about the target keyword than co-occurrence analysis does. This difference stems from Word2Vec's vectorization of semantic features, so the Word2Vec algorithm can capture hidden related words that traditional analysis does not find. In addition, Part-Of-Speech (POS) tagging for Korean is used to detect adjectives as "emotional words." The emotional words extracted from the text are converted into word vectors by the Word2Vec algorithm to find related words, among which nouns are selected because they are likely to have a causal relationship with the emotional word in the sentence. The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets are collected by searching with three keywords, professor, prosecutor, and doctor, because these keywords attract rich public emotion and opinion. A preliminary data collection was conducted to select secondary keywords for each primary keyword: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin Hae-chul Sky Hospital, drinking and plastic surgery, rebate), Prosecutor (lewd behavior, sponsor). The text data comprise about 100,000 documents (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225) gathered from news, blogs, and Twitter to reflect various levels of public emotion. Gephi (http://gephi.github.io) was used for visualization, and every program used in text processing and analysis was written in Java. The contributions of this study are as follows: first, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches; second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot; finally, the approach used in this study can be generalized regardless of the type of text data.
The limitation of this study is that it is hard to establish that a word extracted by Emotion Trigger processing has a significant causal relationship with the emotional word in a sentence. Future work will clarify this causal relationship by comparing the extracted relationships with manually tagged ones. Furthermore, part of the text data used for Emotion Trigger comes from Twitter, which has a number of distinct characteristics that were not addressed in this study and will be considered in further work.
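
The paper's own tools were written in Java; the following Python sketch, assuming gensim 4.x, only illustrates the core Emotion Trigger idea: train Word2Vec on a tokenized corpus, query the neighbors of an emotional word, and keep the nouns as candidate triggers. The toy corpus, emotion word, and noun filter are placeholders for the paper's Korean POS-tagged pipeline.

```python
# Small sketch of the Emotion Trigger idea: train Word2Vec on a tokenized
# corpus, then look up words nearest to an "emotional word" and keep the nouns
# as candidate trigger factors. Assumes gensim 4.x; the toy corpus, emotion
# word, and noun list are placeholders, not the paper's Korean pipeline.
from gensim.models import Word2Vec

# Pre-tokenized sentences (a real pipeline would use Korean POS tagging here).
corpus = [
    ["professor", "research", "money", "angry", "scandal"],
    ["professor", "students", "angry", "recruitment"],
    ["doctor", "hospital", "rebate", "angry"],
]

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=50)

emotion_word = "angry"                     # adjective detected by POS tagging
nouns = {"research", "money", "scandal", "students", "recruitment", "hospital", "rebate"}

# Nearest neighbors of the emotion word, filtered to nouns = candidate triggers.
triggers = [w for w, _ in model.wv.most_similar(emotion_word, topn=10) if w in nouns]
print(triggers)
```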

A Study on Optimization of Support Vector Machine Classifier for Word Sense Disambiguation (단어 중의성 해소를 위한 SVM 분류기 최적화에 관한 연구)

  • Lee, Yong-Gu
    • Journal of Information Management / v.42 no.2 / pp.193-210 / 2011
  • This study examined context window sizes and term weighting methods to obtain the best word sense disambiguation performance with a support vector machine. The context window sizes tested were a 3-word window, a sentence window, a 50-byte window, and a document window around the target word. The weighting methods tested were binary, term frequency (TF), TF × inverse document frequency (IDF), and log TF × IDF. As a result, the 50-byte context window performed best, and the binary weighting method showed the best performance.
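
A small sketch of this kind of experiment, assuming scikit-learn: the same SVM classifier is trained on context windows under different term weighting schemes and the predictions compared. The example contexts, senses, and vectorizer choices are illustrative; the paper's 50-byte windowing and exact weighting variants are not reproduced.

```python
# Sketch of comparing term weighting schemes for SVM-based word sense
# disambiguation, assuming scikit-learn. Contexts and senses are toy examples.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Context windows around an ambiguous target word, with their sense labels.
contexts = [
    "deposit money in the bank account",
    "fishing on the bank of the river",
    "the bank raised its interest rate",
    "grass growing along the river bank",
]
senses = ["finance", "river", "finance", "river"]

weightings = {
    "binary": CountVectorizer(binary=True),   # binary term weighting
    "tf": CountVectorizer(),                  # raw term frequency
    "tf-idf": TfidfVectorizer(),              # TF x IDF weighting
}

for name, vectorizer in weightings.items():
    clf = make_pipeline(vectorizer, LinearSVC())
    clf.fit(contexts, senses)
    print(name, clf.predict(["opened an account at the bank"]))
```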

Design and Implementation of A Native ATM API (Native ATM API의 설계 및 구현)

  • Seong, Jong-Jin
    • The Transactions of the Korea Information Processing Society / v.4 no.5 / pp.1337-1348 / 1997
  • IP over ATM and LAN Emulation over ATM are common methods for applications to use ATM networks, but they can hardly provide full ATM services because of the legacy transport and network protocols they rely on. This paper presents the design and implementation of a Native ATM API that enables direct use of native ATM services. In our work, a Native ATM API specification that accommodates the ATM Forum's "Native ATM Services: Semantic Description" has been defined, and a Native ATM API has been implemented based on it. The implementation environment, software architecture, Native ATM API library functions, and application programming using our Native ATM API are described.

Practical and Verifiable C++ Dynamic Cast for Hard Real-Time Systems

  • Dechev, Damian;Mahapatra, Rabi;Stroustrup, Bjarne
    • Journal of Computing Science and Engineering / v.2 no.4 / pp.375-393 / 2008
  • The dynamic cast operation allows flexibility in the design and use of data management facilities in object-oriented programs. Dynamic cast has an important role in the implementation of the Data Management Services (DMS) of the Mission Data System Project (MDS), the Jet Propulsion Laboratory's experimental work for providing a state-based and goal-oriented unified architecture for testing and development of mission software. DMS is responsible for the storage and transport of control and scientific data in a remote autonomous spacecraft. Like similar operators in other languages, the C++ dynamic cast operator does not provide the timing guarantees needed for hard real-time embedded systems. In a recent study, Gibbs and Stroustrup (G&S) devised a dynamic cast implementation strategy that guarantees fast constant-time performance. This paper presents the definition and application of a cosimulation framework to formally verify and evaluate the G&S fast dynamic casting scheme and its applicability in the Mission Data System DMS application. We describe the systematic process of model-based simulation and analysis that has led to performance improvement of the G&S algorithm's heuristics by about a factor of 2. In this work we introduce and apply a library for extracting semantic information from C++ source code that helps us deliver a practical and verifiable implementation of the fast dynamic casting algorithm.

Development of the ISO 15926-based Classification Structure for Nuclear Plant Equipment (ISO 15926 국제 표준을 이용한 원자력 플랜트 기자재 분류체계)

  • Yun, J.;Mun, D.;Han, S.;Cho, K.
    • Korean Journal of Computational Design and Engineering / v.12 no.3 / pp.191-199 / 2007
  • In order to construct a data warehouse of process plant equipment, a classification structure should be defined first, identifying not only the equipment categories but also the attributes of each piece of equipment that represent its specifications. ISO 15926 Process Plants is an international standard dealing with the life-cycle data of process plant facilities. From the viewpoint of defining a classification structure, the Part 2 data model and the Reference Data Library (RDL) of ISO 15926 provide, respectively, a standard syntactic structure and a semantic vocabulary, facilitating the exchange and sharing of plant equipment's life-cycle data. Therefore, an equipment data warehouse with an ISO 15926-based classification structure has the advantage of easy integration among different engineering systems. This paper introduces ISO 15926 and then discusses how to define a classification structure with the ISO 15926 Part 2 data model and RDL. Finally, we describe the development of an ISO 15926-based classification structure for the equipment comprising the reactor coolant system (RCS) of the APR 1400 nuclear plant.
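
Purely as an illustration of what such a classification structure contains (categories in a hierarchy plus typed attributes per category), here is a small Python sketch; it does not implement the ISO 15926 Part 2 data model or RDL, and the class and attribute names are hypothetical.

```python
# Illustrative sketch only: a simple in-memory representation of an equipment
# classification structure (categories plus typed attributes). It does not
# implement ISO 15926 Part 2 or the RDL; names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AttributeDef:
    name: str          # e.g. "design pressure"
    unit: str          # e.g. "MPa"

@dataclass
class EquipmentClass:
    name: str
    parent: Optional[str] = None                       # superclass in the hierarchy
    attributes: List[AttributeDef] = field(default_factory=list)

pump = EquipmentClass(
    name="centrifugal pump",
    parent="pump",
    attributes=[AttributeDef("design pressure", "MPa"),
                AttributeDef("rated flow", "m3/h")],
)
print(pump)
```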

TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries

  • Song, Min
    • Journal of Information Science Theory and Practice / v.2 no.1 / pp.6-21 / 2014
  • This paper proposes a novel knowledge extraction system, TAKES (Two-step Approach for Knowledge Extraction System), which integrates advanced techniques from Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). In particular, TAKES adopts a novel keyphrase extraction-based query expansion technique to collect promising documents. It also uses a Conditional Random Field-based machine learning technique to extract important biological entities and relations. TAKES is applied to biological knowledge extraction, particularly retrieving promising documents that contain Protein-Protein Interaction (PPI) and extracting PPI pairs. TAKES consists of two major components: DocSpotter, which is used to query and retrieve promising documents for extraction, and a Conditional Random Field (CRF)-based entity extraction component known as FCRF. The present paper investigated research problems addressing the issues with a knowledge extraction system and conducted a series of experiments to test our hypotheses. The findings from the experiments are as follows: First, the author verified, using three different test collections to measure the performance of our query expansion technique, that DocSpotter is robust and highly accurate when compared to Okapi BM25 and SLIPPER. Second, the author verified that our relation extraction algorithm, FCRF, is highly accurate in terms of F-Measure compared to four other competitive extraction algorithms: Support Vector Machine, Maximum Entropy, Single POS HMM, and Rapier.
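
As a rough sketch of the CRF-based entity extraction step that FCRF performs, the following assumes the sklearn-crfsuite package; the token features and the tiny training set are illustrative and not TAKES' actual feature design.

```python
# A minimal sketch of CRF-based biomedical entity tagging in the spirit of the
# FCRF component, assuming the sklearn-crfsuite package; token features and
# the tiny training set below are illustrative placeholders.
import sklearn_crfsuite

def token_features(sentence, i):
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "is_upper": word.isupper(),        # crude protein-name cue
        "suffix3": word[-3:],
        "prev": sentence[i - 1].lower() if i > 0 else "<BOS>",
    }

def featurize(sentence):
    return [token_features(sentence, i) for i in range(len(sentence))]

train_sents = [["RAD51", "interacts", "with", "BRCA2"],
               ["The", "cell", "divides", "rapidly"]]
train_tags = [["B-PROT", "O", "O", "B-PROT"],
              ["O", "O", "O", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit([featurize(s) for s in train_sents], train_tags)

print(crf.predict([featurize(["TP53", "binds", "MDM2"])]))
```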