• Title/Summary/Keyword: Ontology Selection

Search Result 43

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.43-61 / 2019
  • Artificial intelligence technologies have developed rapidly with the Fourth Industrial Revolution, and AI research is being actively conducted in fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. Driven by recent interest in the technology and research on various algorithms, the field has achieved more technological advances than ever. The knowledge-based system is a sub-domain of artificial intelligence that aims to enable AI agents to make decisions using machine-readable and processable knowledge constructed from complex, informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and it has recently been used together with statistical artificial intelligence such as machine learning. More recently, knowledge bases have also been used to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. These knowledge bases support intelligent processing in various fields of artificial intelligence, such as the question answering systems of smart speakers. However, building a useful knowledge base is time-consuming and still requires considerable expert effort. In recent years, much research and technology in knowledge-based artificial intelligence has used DBpedia, one of the largest knowledge bases, which aims to extract structured content from Wikipedia. DBpedia contains various information extracted from Wikipedia, such as titles, categories, and links, but its most useful knowledge comes from Wikipedia infoboxes, user-created summaries of some unifying aspect of an article. This knowledge is created by the mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework. Because it generates knowledge from semi-structured infobox data created by users, DBpedia can expect high reliability in terms of accuracy. However, since only about 50% of all wiki pages in Korean Wikipedia contain an infobox, DBpedia is limited in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. To demonstrate the appropriateness of this method, we describe a knowledge extraction model that follows the DBpedia ontology schema and is trained on Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classifying documents into ontology classes, classifying the sentences suitable for triple extraction, and selecting values and transforming them into RDF triples. The structure of a Wikipedia infobox is defined by infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these infobox templates. Based on these mappings, we classify the input document according to infobox categories, which correspond to ontology classes. After determining the class of the input document, we classify the appropriate sentences according to the attributes belonging to that class. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples. To train the models, we generated a training data set from a Wikipedia dump by adding BIO tags to sentences, and trained models for about 200 classes and about 2,500 relations. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process. Through the proposed process, structured knowledge can be obtained by extracting knowledge from text documents according to the ontology schema. In addition, this methodology can significantly reduce the expert effort needed to construct instances according to the ontology schema.
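
The BIO tagging used to build the training data can be illustrated with a minimal Python sketch. The sentence, the infobox value, the property name `birthPlace`, and the `bio_tag` helper are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def bio_tag(tokens, value_tokens, label):
    """Tag the first occurrence of value_tokens in tokens with B-/I- tags.

    Tokens outside the matched span get "O". If the value does not occur,
    every token is tagged "O".
    """
    tags = ["O"] * len(tokens)
    n = len(value_tokens)
    if n == 0:
        return tags
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == value_tokens:
            tags[start] = f"B-{label}"
            for i in range(start + 1, start + n):
                tags[i] = f"I-{label}"
            break
    return tags


# Hypothetical example: an infobox states birthPlace = "New York City".
tokens = "She was born in New York City in 1950".split()
tags = bio_tag(tokens, ["New", "York", "City"], "birthPlace")
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Sequences tagged this way are the usual input format for sequence labelers such as the CRF and Bi-LSTM-CRF models compared in the paper.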

Signatures of positive selection underlying beef production traits in Korean cattle breeds

  • Edea, Zewdu;Jung, Kyoung Sub;Shin, Sung-Sub;Yoo, Song-Won;Choi, Jae Won;Kim, Kwan-Suk
    • Journal of Animal Science and Technology / v.62 no.3 / pp.293-305 / 2020
  • Differences in breeding programs and population history may have diversely shaped the genomes of Korean native cattle breeds. In the absence of phenotypic data, comparisons of breeds that have been subjected to different selective pressures can help identify genomic regions and genes controlling qualitative and complex traits. In this study, to decipher genetic variation and identify evidence of divergent selection, three Korean cattle breeds were genotyped using the recently developed high-density GeneSeek Genomic Profiler F250 (GGP-F250) array. The three Korean cattle breeds clustered according to their coat color phenotypes and breeding programs. The Heugu breed consistently showed a smaller effective population size at all generations considered. Across the autosomal chromosomes, 113 and 83 annotated genes were identified from the Hanwoo-Chikso and Hanwoo-Heugu comparisons, respectively, of which 16 genes were shared between the two pairwise comparisons. The most important signals of selection were detected on bovine chromosomes 14 (24.39-25.13 Mb) and 18 (13.34-15.07 Mb), containing genes related to body size and coat color (XKR4, LYN, PLAG1, SDR16C5, TMEM68, CDH15, MC1R, and GALNS). Some of the candidate genes are also associated with meat quality traits (ACSF3, EIF2B1, BANP, APCDD1, and GALM) and harbor quantitative trait loci (QTL) for beef production traits. Further functional analysis revealed that the candidate genes (DBI, ACSF3, HINT2, GBA2, AGPAT5, SCAP, ELP6, APOB, and RBL1) were involved in gene ontology (GO) terms relevant to meat quality, including fatty acid oxidation, biosynthesis, and lipid storage. Candidate genes previously known to affect beef production and quality traits could be used in beef cattle selection strategies.

A Decision Support System for Product Design Common Attribute Selection under the Semantic Web and SWCL (시맨틱 웹과 SWCL하의 제품설계 최적 공통속성 선택을 위한 의사결정 지원 시스템)

  • Kim, Hak-Jin;Youn, Sohyun
    • Journal of Information Technology Services / v.13 no.2 / pp.133-149 / 2014
  • To survive competition in the globalized market, firms must provide products that meet customers' needs and wants. This paper focuses on how to set levels for the attributes that compose a product so that firms can offer customers the best products. In particular, its main issues are how to determine the common attributes and the remaining attributes, with their appropriate levels, so as to maximize firms' profits, and how to construct a decision support system that eases decision makers' decisions about optimal common attribute selection using the Semantic Web and SWCL technologies. Parameter data in the problems and the relationships among the data are expressed as an ontology data model and a set of constraints using the Semantic Web and SWCL technologies. From these, the proposed system automatically generates a quantitative decision-making model, which is fed into a solver using the Logic-based Benders Decomposition method to obtain an optimal solution. The system finally provides the generated solution to the decision makers. This work suggests that the proposed system could be integrated with the broader structured data network and other decision-making tools, because of the easy data sharing, standardized data structure, and ease of machine processing offered by Semantic Web technology.

Combining Support Vector Machine Recursive Feature Elimination and Intensity-dependent Normalization for Gene Selection in RNAseq (RNAseq 빅데이터에서 유전자 선택을 위한 밀집도-의존 정규화 기반의 서포트-벡터 머신 병합법)

  • Kim, Chayoung
    • Journal of Internet Computing and Services / v.18 no.5 / pp.47-53 / 2017
  • In the past few years, high-throughput sequencing, big-data generation, cloud computing, and computational biology have been revolutionary. RNA sequencing is emerging as an attractive alternative to DNA microarrays. However, methods for constructing a Gene Regulatory Network (GRN) from RNA-Seq are severely lacking and urgently required. Because GRNs have received substantial attention in genomics and bioinformatics, an elementary requirement for a GRN has been to maximize the number of distinguishable genes. Although RNA sequencing techniques generate large amounts of data, there are few computational methods to exploit them. Therefore, we propose a novel gene selection algorithm combining Support Vector Machines and intensity-dependent normalization, which uses the log differential expression ratio in RNAseq. It is an extended variation of the support vector machine recursive feature elimination (SVM-RFE) algorithm. This algorithm achieves minimum relevancy with subsets of Big-Data, such as NCBI-GEO. The proposed algorithm was compared with the existing one, which uses gene expression profiling with DNA microarrays. The results show that the proposed algorithm is more convenient and quicker than the previous one because all the functions it uses are in an R package, and that it improves both classification accuracy based on gene ontology and running time on Big-Data. The comparison was performed based on the number of genes selected in RNAseq Big-Data.

Analysis of cross-population differentiation between Thoroughbred and Jeju horses

  • Lee, Wonseok;Park, Kyung-Do;Taye, Mengistie;Lee, Chul;Kim, Heebal;Lee, Hak-Kyo;Shin, Donghyun
    • Asian-Australasian Journal of Animal Sciences / v.31 no.8 / pp.1110-1118 / 2018
  • Objective: This study was intended to identify genes positively selected in Thoroughbred horses (THBs) that potentially contribute to their running performance. Methods: The genomes of THB and Jeju horses (JH, Korean native horse) were compared to identify genes positively selected in THB. We applied the cross-population extended haplotype homozygosity (XP-EHH) and cross-population composite likelihood ratio test (XP-CLR) statistical methods to whole genome resequencing data of 14 THB and 6 JH. Results: We identified 98 (XP-EHH) and 200 (XP-CLR) genes that are under positive selection in THB. Gene enrichment analysis identified 72 gene ontology biological process (GO BP) terms. The genes and GO BP terms explained some of THB's characteristics related to running performance, such as immunity, energy metabolism, and eye size and function. GO BP terms that play key roles in several cell signaling mechanisms affecting ocular size and visual function were identified. The GO BP term 'eye photoreceptor cell differentiation' is among the annotated terms presumed to affect eye size. Conclusion: Our analysis revealed some positively selected candidate genes in THB related to their racing performance. The genes detected are related to immunity, ocular size and function, and energy metabolism.

An Extension of SWCL to Represent Logical Implication Knowledge under Semantic Web Environment (의미웹 환경에서 조건부함축 제약 지식표현을 위한 SWCL의 확장)

  • Kim, Hak-Jin
    • Journal of the Korean Operations Research and Management Science Society / v.39 no.3 / pp.7-22 / 2014
  • With the publication of RDF and OWL, the Semantic Web has been established as a technology through which information on the Internet can be processed by machines. Since then, the focus of Semantic Web research has moved to how to provide more useful information to users for their decision making, beyond the simple use of structured data in ontologies. SWRL, which makes logical inference possible through rules, and SWCL, which formulates constraints in the Semantic Web environment, are among the many efforts toward that goal. A constraint represents a connection or relationship between individual data in an ontology. This paper extends SWCL by adding one more type of constraint, the implication constraint, to its repertoire. When users use binary variables to represent logical relationships in mathematical models, this requires knowledge of the solver used to solve the models. The use of implication constraints eases this difficulty. The need for the implication constraint, its definition, and the relevant technical description are presented using the optimal common attribute selection problem in product design.
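
As a hedged illustration of the binary-variable modeling that implication constraints are meant to spare users, a logical implication a => b over binary variables is commonly written as the linear inequality x_a <= x_b. The Python check below enumerates every 0/1 assignment to confirm that the inequality and the implication agree. This is a standard textbook encoding, not SWCL syntax and not the paper's definition; the function names are made up for this sketch.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth value of the logical implication a => b."""
    return (not a) or b


def satisfies_linear(x_a: int, x_b: int) -> bool:
    """Linear encoding of a => b over binary variables: x_a <= x_b."""
    return x_a <= x_b


# Compare the two formulations on every 0/1 assignment.
rows = [
    (x_a, x_b, implies(bool(x_a), bool(x_b)), satisfies_linear(x_a, x_b))
    for x_a, x_b in product((0, 1), repeat=2)
]
for x_a, x_b, logical, linear in rows:
    print(x_a, x_b, logical, linear)
```

Only the assignment x_a = 1, x_b = 0 violates both forms, which is exactly the case the implication forbids.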

A Design and Implementation of the Semantic Search Engine (시멘틱 검색 엔진 설계 및 구현)

  • Heo, Sun-Young;Kim, Eun-Gyung
    • Proceedings of the Korean Information Science Society Conference / 2008.06c / pp.331-335 / 2008
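  • The Semantic Web defines the meaning of information as concepts and expresses the relationships among those concepts, making more accurate and efficient information retrieval possible through semantic links rather than simple links between documents. For this vision of the Semantic Web to be realized, a system is essential that, by reasoning over semantic documents composed of semantic information based on a Web Ontology, can identify the relationships among the enormous amount of information on the web and retrieve the information users request more efficiently. OWL, proposed by the W3C, is a representative ontology language. To search OWL data efficiently on the Semantic Web, a well-organized storage schema must be built. Jena2 stores document information in a single table, so performance degrades for queries that require simple selection or join operations and for processing large volumes of OWL data. To solve this problem, this paper designs and implements a semantic search engine with multi-converter and OWL converter functions that classify the meaning of an OWL document into Class, Property, and Individual and store each kind of data in tables. In tests of the search engine, simple information retrieval queries were answered faster than when Jena2 stores data in a denormalized table structure, and fast search and query speeds could be guaranteed by resolving the join cost caused by the sizes of the two tables in join operations.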
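
The storage idea in this abstract, keeping classes, properties, and individuals in separate tables instead of one statement table, can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample data below are hypothetical; this is not the schema used by the paper or by Jena2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per kind of OWL entity (hypothetical layout).
    CREATE TABLE owl_class (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
    CREATE TABLE owl_property (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
    CREATE TABLE owl_individual (
        id INTEGER PRIMARY KEY,
        uri TEXT UNIQUE,
        class_id INTEGER REFERENCES owl_class(id)
    );
    """
)
conn.execute("INSERT INTO owl_class (id, uri) VALUES (1, 'ex:Person')")
conn.execute("INSERT INTO owl_class (id, uri) VALUES (2, 'ex:City')")
conn.execute("INSERT INTO owl_property (id, uri) VALUES (1, 'ex:livesIn')")
conn.executemany(
    "INSERT INTO owl_individual (uri, class_id) VALUES (?, ?)",
    [("ex:alice", 1), ("ex:bob", 1), ("ex:seoul", 2)],
)

# Simple selection: all individuals of one class, looked up through the
# small class table rather than a single table holding every statement.
people = [
    row[0]
    for row in conn.execute(
        "SELECT i.uri FROM owl_individual AS i "
        "JOIN owl_class AS c ON c.id = i.class_id "
        "WHERE c.uri = ? ORDER BY i.uri",
        ("ex:Person",),
    )
]
print(people)
```

Splitting by entity kind keeps each table small, which is the intuition behind the reduced selection and join costs the abstract reports.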

Selection signature reveals genes associated with susceptibility loci affecting respiratory disease due to pleiotropic and hitchhiking effect in Chinese indigenous pigs

  • Xu, Zhong;Sun, Hao;Zhang, Zhe;Zhang, Cheng-Yue;Zhao, Qing-bo;Xiao, Qian;Olasege, Babatunde Shittu;Ma, Pei-Pei;Zhang, Xiang-Zhe;Wang, Qi-Shan;Pan, Yu-Chun
    • Asian-Australasian Journal of Animal Sciences / v.33 no.2 / pp.187-196 / 2020
  • Objective: Porcine respiratory disease is one of the most important health problems, causing significant economic losses. To understand the genetic basis of susceptibility to swine enzootic pneumonia (EP) in pigs, we detected 102,809 single nucleotide polymorphisms in a total of 249 individuals based on genome-wide sequencing data. Methods: Genome comparison of three pig breeds susceptible to swine EP (Jinhua, Erhualian, and Meishan) with two western lines that are considered more resistant (Duroc and Landrace), using cross-population extended haplotype homozygosity and F-statistic (FST) approaches, identified 691 positively selected genes. Based on quantitative trait loci, gene ontology terms, and a literature search, we selected 14 candidate genes with convincing biological functions associated with swine EP or human asthma. Results: Most of these genes were tested by several methods, including transcription analysis and candidate gene association study. Among these genes: cytochrome P450 1A1 and catenin beta 1 (CTNNB1) are involved in fertility; transforming growth factor beta receptor 3 plays a role in meat quality traits; Wnt family member 2, CTNNB1 and transcription factor 7 take part in adipogenesis and fat deposition simultaneously; plasminogen activator, urokinase receptor (completely linked to AXL receptor tyrosine kinase, r2 = 1) plays an essential role in the successful ovulation of matured oocytes in pigs; colipase like 2 (strongly linked to SAM pointed domain containing ETS transcription factor, r2 = 0.848) is involved in male fertility. Conclusion: These adverse genes conferring susceptibility to swine EP may be selected while selecting for economic traits (especially reproduction traits) due to the pleiotropic and hitchhiking effects of linked genes. Our study provides a new point of view for understanding the genetic basis of susceptibility or resistance to swine EP in pigs, and thereby provides insight for designing sustainable breed selection programs. Finally, the candidate genes are crucial due to their potential roles in respiratory diseases in a large number of species, including humans.
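
The F-statistic (FST) named in the methods measures allele-frequency differentiation between populations. A minimal Python sketch of the classic definition FST = (HT - HS) / HT for one biallelic SNP and two equally weighted populations follows. The allele frequencies are made-up values, the function name is hypothetical, and the estimator and windowing actually used in the study are not stated in this abstract.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic locus and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Returns 0.0 when the locus is monomorphic overall (HT == 0).
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity in the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0
    return (h_total - h_within) / h_total


# Made-up allele frequencies for illustration only.
print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(1.0, 0.0))  # fixed for different alleles
```

Values near 0 indicate little differentiation at the locus, and values near 1 indicate that the populations are close to fixed for different alleles, which is the kind of signal a selection scan looks for.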

Development of the Efficient DAML+OIL Document Management System to support the DAML-S Services in the Embedded Systems (내장형 시스템에서 DAML-S서비스 지원을 위한 효율적인 DAML+OIL문서 관리 시스템)

  • Kim Hag Soo;Jung Moon-young;Cha Hyun Seok;Son Jin Hyun
    • Journal of KIISE:Computing Practices and Letters / v.11 no.1 / pp.36-49 / 2005
  • Recently, many researchers have paid close attention to semantic web services based on semantic web technology. While existing web services use the XML-based web service description language, WSDL, semantic web services use web service description languages such as DAML-S, which are based on ontology languages. Research on semantic web services generally focuses on web service discovery, web service invocation, web service selection and composition, and web service execution monitoring. In particular, semantic web service discovery, as the basis for achieving the ultimate semantic web service environment, has some properties that differ from previous information discovery areas. Hence, it is necessary to develop a storage system and discovery mechanism appropriate to the semantic web description languages. Even though some related systems have been developed, they are not appropriate for embedded system environments, such as intelligent robotics, in which there are limitations on memory, disk space, and computing power. In this regard, we have developed, for the embedded system environment, a document management system that efficiently manages web service documents described in DAML-S for the purpose of semantic web service discovery. In addition, we address the distinguishing characteristics of the developed system compared with related research.

Cataloguing of Anther Expressed Genes through Differential Slot Blot in Oriental Lily (Lilium Oriental Hybrid 'Acapulco') (아카풀코나리에서 Differential Slot Blot을 이용한 약발현 유전자 목록작성)

  • Suh, Eun-Jung;Yu, Hee Ju;Han, Bong Hee;Lim, Yong Pyo;Jeong, Mi-Jeong;Lee, Seong-Kon;Kim, Dong-Hern;Chang, An-Cheol;Yae, Byeong Woo
    • Horticultural Science & Technology / v.31 no.5 / pp.598-606 / 2013
  • The anther is the major floral organ responsible for reproduction and outward appearance. From an anther-specific cDNA library of Lilium Oriental Hybrid 'Acapulco', 2,000 expressed sequence tags (ESTs) were selected randomly. Differential slot blot analysis with cDNA probes from the anther and leaf was used to identify anther-expressed clones, and 570 non-redundant ESTs were obtained and sequenced. When compared against the GenBank database using the BLASTX algorithm, 191 clones showed significant similarity, but the others (66.5%) did not match known sequences. In the functional categorization by gene ontology (GO) annotation, a significant portion of the sequences represented proteins in the 'cell' and 'cell part' categories. A transcriptional analysis in seven different organs and developmental stages was performed using northern blots with thirty ESTs as putative anther-specific genes. This report suggests that selecting anther-expressed clones by differential slot blot is a very effective tool, and our study provides fundamental information on the lily anther, including pollen.