Search | Korea Science

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

Kim, JaeHun; Lee, Myungjin
- Journal of Intelligence and Information Systems / v.25 no.1 / pp.43-61 / 2019
The development of artificial intelligence technologies has accelerated with the Fourth Industrial Revolution, and AI research has been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. The field of artificial intelligence has achieved more technological advances than ever, owing to recent interest in the technology and research on various algorithms. The knowledge-based system is a sub-domain of artificial intelligence; it aims to enable artificial intelligence agents to make decisions by using machine-readable and processable knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and it has recently been used together with statistical artificial intelligence such as machine learning. More recently, the purpose of a knowledge base has been to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. These knowledge bases are used for intelligent processing in various fields of artificial intelligence, such as the question answering systems of smart speakers. However, building a useful knowledge base is a time-consuming task that still requires considerable effort from experts. In recent years, much research and technology in knowledge-based artificial intelligence has used DBpedia, one of the largest knowledge bases, which aims to extract structured content from the various information in Wikipedia. DBpedia contains various information extracted from Wikipedia, such as titles, categories, and links, but the most useful knowledge comes from Wikipedia infoboxes, which present a user-created summary of some unifying aspect of an article.
This knowledge is created by the mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework. In this way, DBpedia can expect high reliability in terms of the accuracy of knowledge, because it generates knowledge from semi-structured infobox data created by users. However, since only about 50% of all wiki pages in Korean Wikipedia contain an infobox, DBpedia has limitations in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. To demonstrate the appropriateness of this method, we explain a knowledge extraction model that follows the DBpedia ontology schema and is trained on Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classifying documents into ontology classes, classifying the sentences appropriate for extracting triples, and selecting values and transforming them into the RDF triple structure. The structures of Wikipedia infoboxes are defined as infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these infobox templates. Based on these mapping relations, we classify the input document according to infobox categories, which correspond to ontology classes. After determining the class of the input document, we classify the appropriate sentences according to the attributes belonging to that class. Finally, we extract knowledge from the sentences classified as appropriate and convert the knowledge into triples. To train the models, we generated a training data set from the Wikipedia dump by adding BIO tags to sentences, and we trained about 200 classes and about 2,500 relations for extracting knowledge. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process.
Through the proposed process, structured knowledge can be utilized by extracting knowledge from text documents according to the ontology schema. In addition, this methodology can significantly reduce the effort experts need to construct instances according to the ontology schema.
https://doi.org/10.13088/jiis.2019.25.1.043
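
The abstract above mentions adding BIO tags to sentences and converting extracted values into triples, without giving details. The following is only a minimal sketch of that idea, not the authors' implementation: the function names `bio_tag` and `decode_triples`, the whitespace tokenization, the exact-match rule for locating an infobox value, and the example relation name `country` are all assumptions made for illustration. The sequence model itself (CRF or Bi-LSTM-CRF) is not shown.

```python
from typing import List, Tuple


def bio_tag(tokens: List[str], value: List[str], relation: str) -> List[str]:
    """Tag the first exact occurrence of `value` in `tokens` with
    B-/I-<relation>; every other token gets O. (Hypothetical helper.)"""
    tags = ["O"] * len(tokens)
    n = len(value)
    if n == 0:
        return tags
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == value:
            tags[start] = f"B-{relation}"
            for i in range(start + 1, start + n):
                tags[i] = f"I-{relation}"
            break  # only the first match is tagged
    return tags


def decode_triples(subject: str, tokens: List[str],
                   tags: List[str]) -> List[Tuple[str, str, str]]:
    """Turn BIO-tagged tokens back into (subject, relation, value) triples.
    (Hypothetical helper; a tagger's predictions would be passed as `tags`.)"""
    triples = []
    relation, span = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if relation is not None:
                triples.append((subject, relation, " ".join(span)))
            relation, span = tag[2:], [token]
        elif tag.startswith("I-") and relation == tag[2:]:
            span.append(token)
        else:
            if relation is not None:
                triples.append((subject, relation, " ".join(span)))
            relation, span = None, []
    if relation is not None:
        triples.append((subject, relation, " ".join(span)))
    return triples


# Assumed example: the article subject is "Seoul" and its infobox
# lists the value "South Korea" for a relation named "country".
tokens = "Seoul is the capital of South Korea".split()
tags = bio_tag(tokens, ["South", "Korea"], "country")
triples = decode_triples("Seoul", tokens, tags)
```

In this sketch, `bio_tag` would produce training labels from infobox values, and `decode_triples` would be applied to a trained tagger's output to obtain triples that could then be serialized as RDF.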

A Reflection on the Consumer Culture in the Post-COVID 19 Era from the Lens of Christian Education: Learning from the Drama, Penthouse (포스트 코로나 시대의 소비문화에 대한 기독교교육의 성찰 : 드라마 「펜트하우스」를 중심으로)

Won, Shin-Ae
- Journal of Christian Education in Korea / v.66 / pp.113-145 / 2021
As a contemporary reading of Baudrillard's Simulation and Simulacra, this paper aims to reflect on the 'consumer culture' criticized by Baudrillard from the lens of Christian education, reading the drama Penthouse in relation to the notions of consumption ideology and the desire and violence of images in the post-COVID-19 era. Having recognized that the concept of simulation is rooted in the mass media of modern society, Baudrillard explains that the emergence of Simulation in mass media, or the process of Simulation, leads to the implosion of reality, which ends with the original reality vanishing. Baudrillard argues that the process of Simulation proceeds across various areas of contemporary society as they are manipulated by mass media. Simulation is the process of producing hyperreality, characterized by an excess of images that seem more real than the original reality, and it brings about Simulacra as excess reality or, consequently, exploding reality. Christian educators in the post-COVID-19 era must know how to deal with critical theory by considering positive ways of avoiding the question of how to articulate what the norm of universal consensus is in a specific situation. In other words, it should be noted that the nature of the ruling ideology and the ideology of consumption have been influenced or manipulated by mass media. Christian educators in particular have to help young people see the messages in the images of screens, television, soap operas, and commercial advertising, which turn reality into Simulacra that are more real than the original reality. When the medium becomes the message, the power of the medium keeps the consumer from communicating with it. This is the main reason for the controversy about the images in the television drama Penthouse and the impact of those images on people's minds.
As an exponent of McLuhan's belief that "the medium is the message", Baudrillard argues that, although the message and the subject of Simulacra (excessive reality) unexpectedly disappear, the medium itself vanishes through the silence of the image. However, the task of Christian education is to fuel how we teach, learn, share, and pass on the Word of God as the Message. Furthermore, it is worth noting that the Message of God cannot vanish or burst through implosion, but exists forever. With Baudrillard's ideas of Simulation and Simulacra in mind, Christian education, as an observation platform, can better engage in reflection on a consumer society that leaves the Church community and consumers unable to resist the fake world.
https://doi.org/10.17968/jcek.2021..66.004

An Analysis of Subject Competencies Applied in the Activity Tasks of the 'Human Development and Family' Area in High School Technology & Home Economics Textbook Based on the 2015 Revised National Curriculum (2015 개정 교육과정 고등학교 기술·가정 교과서 '인간 발달과 가족' 영역 활동과제에 반영된 교과역량 분석)

Lim, Mo Seop; Choi, Seong Youn
- Journal of Korean Home Economics Education Association / v.35 no.3 / pp.21-45 / 2023
The purpose of this study was to analyze the curriculum competencies of relationship-forming ability and practical problem-solving ability reflected in the activity tasks corresponding to the content elements of 'Love and marriage', 'Preparation for parenthood', 'Pregnancy and childbirth', 'Child care', and 'Family culture and intergenerational relationship' in the 2015 revised high school technology & home economics textbooks. The data are 330 activity tasks from 12 high school technology & home economics textbooks. The sub-factors of relationship-forming ability were selected as Respect for Diversity, Consideration and Care, Family Relationship and Community Spirit, Empathy Ability, Conflict Management, and Communication, and the sub-factors of practical problem-solving ability were selected as Practical Reasoning, Decision Making, Value Judgment, Critical Thinking, and Executive Power. Based on the analysis criteria, the results of the two analyses and the expert review are as follows. First, for both of the core concepts 'Development' and 'Relationship', the share of relationship-forming ability was relatively higher than that of practical problem-solving ability, and Conflict Management and Executive Power were the least reflected. For the core concept 'Development', Family Relationship and Community Spirit and Critical Thinking were the most reflected sub-factors, and for the core concept 'Relationship', Consideration and Care and Critical Thinking were the most reflected sub-factors. Second, in the case of relationship-forming ability, the examples of activity tasks across the sub-factors of each subject competency were devised to help students understand diverse opinions and sentiments and to develop competencies to care for each other and maintain healthy family relationships.
In the case of practical problem-solving ability, the tasks allowed students to objectively analyze the socio-cultural background underlying a real-life problem, explore alternatives, and apply them in their own lives.
https://doi.org/10.19031/jkheea.2023.9.35.3.21

Methodology for Identifying Issues of User Reviews from the Perspective of Evaluation Criteria: Focus on a Hotel Information Site (사용자 리뷰의 평가기준 별 이슈 식별 방법론: 호텔 리뷰 사이트를 중심으로)

Byun, Sungho; Lee, Donghoon; Kim, Namgyu
- Journal of Intelligence and Information Systems / v.22 no.3 / pp.23-43 / 2016
As a result of the growth of Internet data and the rapid development of Internet technology, "big data" analysis has gained prominence as a major approach for evaluating and mining enormous amounts of data for various purposes. In recent years in particular, people have tended to share their experiences of leisure activities while also reviewing others' inputs about such activities. By referring to others' leisure activity-related experiences, they are able to gather information that may lead to better leisure activities in the future. This phenomenon has appeared across many kinds of leisure activities, such as movies, traveling, accommodation, and dining. Apart from blogs and social networking sites, many other websites provide a wealth of information related to leisure activities. Most of these websites provide information on each product in various formats depending on different purposes and perspectives. Generally, most of the websites provide the average ratings and detailed reviews of users who actually used the products or services, and these ratings and reviews can support the decisions of potential customers purchasing the same products or services. However, the existing websites offering information on leisure activities only provide ratings and reviews based on a single stage of evaluation criteria. Therefore, to identify the main issue for each evaluation criterion, as well as the characteristics of the specific elements comprising each criterion, users have to read a large number of reviews. In particular, as most users search for the characteristics of the detailed elements of one or more specific evaluation criteria based on their priorities, they must spend a great deal of time and effort to obtain the desired information by reading more reviews and understanding their contents.
Although some websites break down the evaluation criteria and direct users to enter their reviews according to different levels of criteria, the excessive number of input sections makes the whole process inconvenient for users. Further, problems may arise if a user does not follow the instructions for the input sections or fills in the wrong input sections. Finally, treating the evaluation criteria breakdown as a realistic alternative is difficult, because identifying all the detailed criteria for each evaluation criterion is a challenging task. For example, when a review of a certain hotel is written, people tend to write only one-stage reviews for various components such as accessibility, rooms, services, or food. These may address the most frequently asked questions, such as the distance to the nearest subway station or the condition of the bathroom, but they still lack detailed information for these questions. In addition, if a breakdown of the evaluation criteria were provided along with various input sections, the user might fill in only the evaluation criterion for accessibility, or fill in the wrong information, such as information regarding rooms under the evaluation criterion for accessibility. Thus, the reliability of the segmented review would be greatly reduced. In this study, we propose an approach to overcome the limitations of the existing leisure activity information websites, namely, (1) the reliability of reviews for each evaluation criterion and (2) the difficulty of identifying the detailed contents that make up the evaluation criteria. In our proposed methodology, we first identify the review content and construct a lexicon for each evaluation criterion by using the terms that are frequently used for each criterion. Next, the sentences in the review documents containing the terms in the constructed lexicon are decomposed into review units, which are then reconstructed by using the evaluation criteria.
Finally, the issues of the constructed review units are derived for each evaluation criterion, and the summary results are provided. Apart from the derived issues, the review units themselves are also provided. This approach therefore aims to help users save time and effort, because they read only the relevant information they need for each evaluation criterion rather than going through the entire text of a review. Our proposed methodology is based on topic modeling, which is being actively used in text analysis. The review is decomposed into sentence units rather than being treated as a single document unit. After being decomposed into individual review units, the review units are reorganized according to each evaluation criterion and then used in the subsequent analysis. In this respect, this work differs substantially from existing topic modeling-based studies. In this paper, we collected 423 reviews from hotel information websites and decomposed these reviews into 4,860 review units. We then reorganized the review units according to six different evaluation criteria. By applying our methodology to these review units, we present the analysis results and demonstrate the utility of the proposed methodology.
https://doi.org/10.13088/jiis.2016.22.3.023
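
The decomposition and reorganization steps described in the abstract above could look roughly like the following. This is a hedged sketch rather than the paper's method: the function names, the punctuation-based sentence splitter, the simple word-overlap matching rule, and the tiny two-criterion lexicon are all assumptions for illustration. The topic modeling step that derives issues from the grouped units is not shown.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def split_into_units(review: str) -> List[str]:
    """Split a review into sentence-level review units
    (assumes sentences end with '.', '!' or '?')."""
    parts = re.split(r"(?<=[.!?])\s+", review)
    return [p.strip() for p in parts if p.strip()]


def group_units_by_criterion(reviews: List[str],
                             lexicon: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each review unit to every criterion whose lexicon shares
    at least one lowercase word with it. Units with no match are dropped."""
    grouped = defaultdict(list)
    for review in reviews:
        for unit in split_into_units(review):
            words = set(re.findall(r"[a-z]+", unit.lower()))
            for criterion, terms in lexicon.items():
                if words & terms:
                    grouped[criterion].append(unit)
    return dict(grouped)


# Hypothetical lexicon and review, for illustration only.
lexicon = {
    "accessibility": {"subway", "station"},
    "rooms": {"bathroom", "bed"},
}
reviews = ["The subway station is close. The bathroom was clean."]
grouped = group_units_by_criterion(reviews, lexicon)
```

Each list in `grouped` could then be treated as the document collection for one evaluation criterion in a subsequent topic model.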


