• Title/Abstract/Keyword: Web-based learning

Search Results: 1,315

Training Techniques for Data Bias Problem on Deep Learning Text Summarization (딥러닝 텍스트 요약 모델의 데이터 편향 문제 해결을 위한 학습 기법)

  • Cho, Jun Hee;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.7 / pp.949-955 / 2022
  • Deep learning-based text summarization models are not free from the biases of their training datasets. For example, a summarization model trained on a news summarization dataset is poor at summarizing other types of text, such as internet posts and academic papers. In this study, we define this phenomenon as the Data Bias Problem (DBP) and propose two training methods for solving it. The first is 'proper noun masking', which masks proper nouns; the second is 'length variation', which randomly inflates or deflates the length of the source text. Experiments show that our methods are effective at solving DBP. In addition, we analyze the experimental results and present directions for future development. Our contributions are as follows: (1) we discovered DBP and defined it for the first time; (2) we proposed two effective training methods and validated them in actual experiments; and (3) our methods can be applied to any summarization model and are easy to implement, making them highly practical.
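
A minimal sketch of the two training techniques, assuming a POS-tagged token list for the masking step and a sentence-split source text for length variation; the function names, mask probability, and length ratios are illustrative assumptions, since the abstract does not specify the exact procedure:

```python
import random

MASK_TOKEN = "[MASK]"

def mask_proper_nouns(tokens, pos_tags, mask_prob=0.5):
    """Proper noun masking: hide named entities so the model cannot
    overfit to dataset-specific names (hypothetical parameters)."""
    return [MASK_TOKEN if tag == "NNP" and random.random() < mask_prob else tok
            for tok, tag in zip(tokens, pos_tags)]

def length_variation(sentences, min_ratio=0.7, max_ratio=1.3):
    """Length variation: randomly deflate (drop sentences) or inflate
    (repeat sentences) the source text before training."""
    target = max(1, round(len(sentences) * random.uniform(min_ratio, max_ratio)))
    if target <= len(sentences):
        keep = sorted(random.sample(range(len(sentences)), target))
        return [sentences[i] for i in keep]
    extras = random.choices(sentences, k=target - len(sentences))
    return list(sentences) + extras

doc = ["Samsung reported record profit.", "Analysts were surprised.",
       "Shares rose sharply in Seoul."]
tokens = "Samsung reported record profit .".split()
tags = ["NNP", "VBD", "NN", "NN", "."]
print(mask_proper_nouns(tokens, tags, mask_prob=1.0))
print(length_variation(doc))
```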

Short-term Predictive Models for Influenza-like Illness in Korea: Using Weekly ILI Surveillance Data and Web Search Queries (한국 인플루엔자 의사환자 단기 예측 모형 개발: 주간 ILI 감시 자료와 웹 검색 정보의 활용)

  • Jung, Jae Un
    • Journal of Digital Convergence / v.16 no.9 / pp.147-157 / 2018
  • Since Google launched a prediction service for influenza-like illness (ILI), studies on ILI prediction based on web search data have proliferated worldwide. In this context, this study aims to build short-term predictive models for ILI in Korea using ILI and web search data and to measure their performance. In the proposed models, ILI surveillance data from the Korea CDC and Korean web search data from Google and Naver were used with the ARIMA model. Model 1 used only ILI data. Models 2 and 3 added Google and Naver search data, respectively, to the data of Model 1. Model 4 added a common query used in Models 2 and 3 to the data of Model 1. In the training period, the goodness of fit ($R^2$) of all predictive models was higher than 95%. In predictive periods 1 and 2, Model 1 yielded the best predictions (99.98% and 96.94%, respectively). Models 3(a), 4(b), and 4(c) achieved stable predictability above 90% in all predictive periods, but did not outperform Model 1. The proposed models, which yielded accurate and stable predictions, can be applied to early warning systems for influenza pandemics in Korea, with supplementary studies to improve their performance.
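
As a rough illustration of how web search data can enter such a model, the following sketch fits an ARIMA model with an exogenous search-volume regressor using statsmodels' SARIMAX; the synthetic series, column roles, and the (2, 1, 1) order are assumptions for illustration, not the paper's specification:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Toy weekly series standing in for KCDC ILI rates and a search-volume index.
rng = np.random.default_rng(1)
weeks = pd.date_range("2016-09-04", periods=104, freq="W")
search = pd.Series(50 + 30 * np.sin(np.arange(104) * 2 * np.pi / 52), index=weeks)
ili = 0.1 * search + rng.normal(0, 1, 104) + 5    # ILI loosely tracks searches

train_y, test_y = ili[:-8], ili[-8:]              # hold out the last 8 weeks
train_x, test_x = search[:-8], search[-8:]

model = SARIMAX(train_y, exog=train_x, order=(2, 1, 1))  # order is an assumption
fit = model.fit(disp=False)
pred = fit.forecast(steps=8, exog=test_x)          # short-term prediction
print(pred.round(2))
```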

Web Search Behavior Analysis Based on the Self-bundling Query Method (웹검색 행태 연구 - 사용자가 스스로 쿼리를 뭉치는 방법으로 -)

  • Lee, Joong-Seek
    • Journal of the Korean Society for Library and Information Science / v.45 no.2 / pp.209-228 / 2011
  • Web search behavior has evolved: people now search using diverse information devices in various situations. To monitor these scattered and shifting search patterns, improved methods of observation and analysis are needed. Traditional web search studies relied on server transaction logs and single-query analysis. Because people use multiple smart devices and search intermittently throughout the day, bundled-query research can examine the whole context of a search episode as well as the underlying search needs. To observe and analyze bundled queries, we developed a proprietary research software suite comprising a log catcher, a query bundling tool, and a bundle monitoring tool. In this system, users' daily search logs are sent to our analytic server; every night, each user logs on to our bundling tool to package his or her queries; a built-in web survey collects additional data; and our researchers conduct in-depth interviews on a weekly basis. Across the 90 participants in the study, a typical user generated 4.75 query bundles a day on average, and each bundle contained 2.75 queries. Query bundles were categorized as query refinement vs. topic refinement, with nine sub-categories.
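
A minimal sketch of how the two summary statistics above (bundles per day, queries per bundle) could be computed from self-bundled logs; the record layout and sample rows are illustrative assumptions, not the study's actual data format:

```python
from collections import defaultdict
from statistics import mean

# (user_id, date, bundle_id, query) records produced by the bundling tool
logs = [
    ("u1", "2011-03-01", "b1", "budget hotel seoul"),
    ("u1", "2011-03-01", "b1", "seoul hotel deals"),
    ("u1", "2011-03-01", "b2", "python csv module"),
    ("u2", "2011-03-01", "b3", "weather busan"),
]

bundles = defaultdict(list)            # (user, date, bundle) -> queries
for user, date, bundle, query in logs:
    bundles[(user, date, bundle)].append(query)

per_day = defaultdict(set)             # (user, date) -> bundle ids
for user, date, bundle in bundles:
    per_day[(user, date)].add(bundle)

print("avg bundles/day:", mean(len(b) for b in per_day.values()))
print("avg queries/bundle:", mean(len(q) for q in bundles.values()))
```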

Development of cardiopulmonary resuscitation nursing education program of web-based instruction (웹 기반의 심폐소생술 간호교육 프로그램 개발)

  • Sin, Hae-Won;Hong, Hae-Sook
    • Journal of Korean Biological Nursing Science / v.4 no.1 / pp.25-39 / 2002
  • The purpose of this study was to develop and evaluate a web-based instruction (WBI) program to help nurses improve their knowledge and skills in cardiopulmonary resuscitation. Using the WBI program design model of Rhu (1999), the study was carried out from February to April 2002 in five steps: analysis, design, data collection and reconstruction, programming and publishing, and evaluation. The results were as follows. 1) The goal of the program was to improve the accuracy of CPR knowledge and skills. The program covers the concepts and importance of cardiopulmonary resuscitation (CPR), basic life support (BLS), advanced cardiac life support (ACLS), CPR treatment, and nursing care after CPR. In the file-making step, photographs, drawings, and image files were collected and edited with a web editor (Namo), a scanner, and Adobe Photoshop, then modified and posted on the web via file transfer protocol (FTP). Finally, the program was demonstrated, revised once more based on the results, and completed. 2) For evaluation, questionnaires were distributed to 36 nurses at K university hospital in D city. The nurses gave high scores for the learning content (4.2±0.67), the structure and interactivity of the program (4.0±0.79), and overall satisfaction with the program (4.2±0.58). In conclusion, if the content of this WBI program is further upgraded based on analysis of the evaluation results, it can serve as an effective tool for continuing education within a life-long education system for nurses.

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.43-61 / 2019
  • With the Fourth Industrial Revolution, the development of artificial intelligence technologies has accelerated rapidly, and AI research has been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving, and recent interest in the technology and research on various algorithms have brought greater technological advances than ever. The knowledge-based system is a sub-domain of artificial intelligence that aims to enable AI agents to make decisions using machine-readable, processable knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it has been used together with statistical artificial intelligence such as machine learning. More recently, the purpose of a knowledge base has been to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. Such knowledge bases are used for intelligent processing in various fields of artificial intelligence, such as the question answering systems of smart speakers. However, building a useful knowledge base is time-consuming and still requires a great deal of expert effort. In recent years, much knowledge-based AI research and technology has used DBpedia, one of the largest knowledge bases, which aims to extract structured content from the varied information in Wikipedia. DBpedia contains information extracted from Wikipedia such as titles, categories, and links, but its most useful knowledge comes from Wikipedia infoboxes, which present user-created summaries of some unifying aspect of an article. This knowledge is created through the mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework. By generating knowledge from semi-structured, user-created infobox data, DBpedia can expect high reliability in terms of knowledge accuracy. However, since only about 50% of all wiki pages in the Korean Wikipedia contain an infobox, DBpedia has limitations in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to an ontology schema using machine learning. To demonstrate the appropriateness of this method, we describe a knowledge extraction model that follows the DBpedia ontology schema by learning from Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classifying documents into ontology classes, classifying the appropriate sentences from which to extract triples, and selecting values and transforming them into RDF triple structures. Wikipedia infobox structures are defined by infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these templates. Based on these mapping relations, we classify the input document into infobox categories, which correspond to ontology classes. After determining the classification of the input document, we classify the appropriate sentences according to the attributes belonging to that classification. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples.
To train the models, we generated a training dataset from a Wikipedia dump by adding BIO tags to sentences, covering about 200 classes and about 2,500 relations for knowledge extraction. Furthermore, we conducted comparative experiments between CRF and Bi-LSTM-CRF for the knowledge extraction step. Through the proposed process, structured knowledge can be extracted from text documents according to the ontology schema. In addition, this methodology can significantly reduce the effort experts must expend to construct instances according to the ontology schema.
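
As a concrete illustration of the BIO tagging used to build the training data, the sketch below tags a sentence's tokens with B-/I- labels where an attribute value occurs and O elsewhere; the helper function, the 'country' label, and the example sentence are illustrative assumptions, since the abstract does not give the exact label set:

```python
def bio_tag(tokens, value_tokens, label):
    """Tag a sentence's tokens: B-<label> on the first token of an
    attribute value, I-<label> on the rest, O everywhere else."""
    tags = ["O"] * len(tokens)
    n = len(value_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == value_tokens:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"
            break
    return list(zip(tokens, tags))

sentence = "Seoul is the capital of South Korea".split()
print(bio_tag(sentence, ["South", "Korea"], "country"))
# [('Seoul', 'O'), ..., ('South', 'B-country'), ('Korea', 'I-country')]
```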

Effects of Instructional Material Using ICT at High School Earth Science (고등학교 지구과학 수업에서 ICT 활용 수업자료의 효과)

  • Lee, Yong-Seob;Kim, Jong-Hee;Kim, Sang-Dal
    • Journal of the Korean Earth Science Society / v.25 no.5 / pp.336-347 / 2004
  • This study investigated whether instruction using a variety of ICTs affects self-directed learning capability, creativity, and problem-solving ability. To this end, web-based instruction (WBI) and instruction using CD-ROM titles were applied and analyzed for the unit 'the solar system and the galaxy', which belongs to the area of 'the earth' in the high school science subject. Instruction using WBI materials and CD-ROM titles was found to be effective in improving 'self-conception', 'creativity', 'future inclination', 'self-assessment ability', 'openness', and 'initiative', all of which belong to self-directed learning characteristics. It showed no meaningful effect, however, on improving 'learning eagerness' and 'responsibility'. Examining self-directed learning characteristics by prerequisite learning level, neither group showed an interaction effect. With respect to problem-solving ability, which is characteristic of instruction using ICTs, WBI proved more fruitful than instruction using CD-ROM titles in improving scholastic achievement. WBI was effective on 'fluency', 'originality', and 'resistance to premature closure', but of no use on 'abstraction of titles' and 'elaborateness'. These results stem from the following characteristics: WBI took effect on 'fluency' and 'originality' through the variety and vitality that are characteristic of the web, and in 'resistance to premature closure' it helped students organize learning content through the animated picture materials variously presented on web sites. The WBI questionnaire also indicated excellent responses regarding the display structure, quantity of information, indication and instruction, supplementary study, and further study.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia Pacific Journal of Information Systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have become available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine every document to determine whether it might be useful to them. For this reason, some online documents are accompanied by a list of keywords specified by the authors in an effort to guide users by facilitating the filtering process. A set of keywords is thus often considered a condensed version of the whole document and plays an important role in document retrieval, web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents do not yet benefit from keywords, including web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical task, extremely tedious and time-consuming and requiring a certain level of domain knowledge. It is therefore highly desirable to automate the keyword generation process. There are two main approaches to this aim: keyword assignment and keyword extraction. Both use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former, there is a given vocabulary, and the aim is to match its terms to the texts; that is, the keyword assignment approach selects the words from a controlled vocabulary that best describe a document. Although this approach is domain dependent and is not easy to transfer or expand, it can generate implicit keywords that do not appear in a document. In the latter, the aim is to extract keywords with respect to their relevance in the text, without a prior vocabulary. Here, automatic keyword generation is treated as a classification task, and keywords are commonly extracted with supervised learning techniques: keyword extraction algorithms classify candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords; as a result, extraction is limited to terms that appear in the document and cannot generate implicit keywords. According to Turney's experimental results, about 64% to 90% of author-assigned keywords can be found in the full text of an article. Conversely, this means that 10% to 36% of author-assigned keywords do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment likewise shows that 37% of author-assigned keywords are not included in the full text. This is why we adopted the keyword assignment approach.
In this paper, we propose a new approach to automatic keyword assignment, IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculate the vector length of each keyword set based on each keyword's weight; (2) preprocess and parse a target document that has no keywords; (3) calculate the vector length of the target document based on term frequency; (4) measure the cosine similarity between each keyword set and the target document; and (5) generate the keywords with high similarity scores. Two keyword generation systems were implemented using IVSM: an IVSM system for a web-based community service and a stand-alone IVSM system. The first was implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone system is dedicated to generating keywords for academic papers and has been tested on a number of papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between IVSM-generated keywords and author-assigned keywords. In our experiments, the precision of IVSM applied to the web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability, and IVSM shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in web-based communities and digital libraries.
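
A minimal sketch of the IVSM assignment steps (1)-(5) above, using term-frequency weights and cosine similarity; the toy keyword sets and weights are illustrative assumptions, not the authors' implementation:

```python
import math
from collections import Counter

keyword_sets = {                       # keyword set -> keyword weights (1)
    "logistics": {"port": 3.0, "shipping": 2.0, "distribution": 1.0},
    "retail":    {"fashion": 2.0, "store": 2.0, "customer": 1.0},
}

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign_keywords(text, k=1):
    doc = Counter(text.lower().split())          # (2)-(3) parse, TF weights
    scored = sorted(
        ((cosine(doc, kws), name) for name, kws in keyword_sets.items()),
        reverse=True)                            # (4) cosine similarity
    return [name for score, name in scored[:k] if score > 0]  # (5) top keywords

print(assign_keywords("The port handled record shipping volume"))
```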

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become a welcome emerging task owing to the explosion of data on the Web. In the big data age, there are many information sources to consult when making decisions. For example, when considering a trip to a city, a person may search for reviews through a search engine such as Google or on social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps users decide whether to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology has been widely adopted for text mining on the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs). It is used to determine the attitude, position, and sensibility of people who write about the various topics published on the Web. Regardless of their polarity, emotional reviews are very helpful material for analyzing customers' opinions, and sentiment analysis helps with instantly understanding what customers really want through automated text mining techniques that extract the subjective information in web text and determine the writer's attitude or position on a particular topic. In this study, we developed a model that selects hot topics from user posts on a Chinese online stock forum by using the k-means algorithm and the self-organizing map (SOM). In addition, we developed a detection model to predict hot topics using machine learning techniques such as logit, decision trees, and SVMs. We employed sentiment analysis, which computes a sentiment value for a document through contrast and classification against a polarity sentiment dictionary (positive or negative), to develop our model for the selection and detection of hot topics. The online stock forum was an attractive site because of its information on stock investment: users post numerous texts analyzing market movements in response to government policy announcements, market reports, reports from economic research institutes, and even rumors. We divided the forum's topics into 21 categories for the sentiment analysis, and 144 topics were initially selected among these categories. The posts were crawled to build a positive and negative text database, and after preprocessing the text from March 2013 to February 2015 we ultimately obtained 21,141 posts on 88 topics. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on these data. We developed decision tree models to detect hot topics with three algorithms, CHAID, CART, and C4.5; the results of CHAID were subpar compared to the others. We also employed SVMs to detect hot topics from the negative data, training the SVM models with the radial basis function (RBF) kernel and a grid search.
Detecting hot topics with sentiment analysis provides investors with the latest trends and hot topics in the stock forum, so they no longer need to search through vast amounts of information on the Web. Our proposed model is also helpful for rapidly gauging customers' signals or attitudes toward government policy and firms' products and services.
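
A minimal sketch of the detection step described above: an RBF-kernel SVM tuned by grid search, here with scikit-learn. The placeholder feature matrix stands in for the sentiment and interest-index values computed from the forum posts; the parameter grid is an illustrative assumption:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(88, 4))        # one row per topic: sentiment/interest features
y = rng.integers(0, 2, size=88)     # 1 = hot topic, 0 = not (placeholder labels)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # RBF grid search
search.fit(X, y)
print("best params:", search.best_params_)
print("cv accuracy:", round(search.best_score_, 3))
```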

Training Needs Assessment Based on the CEO Competency Model of Vocational Training Institutes (역량모델에 기초한 직업훈련기관장의 훈련요구분석)

  • Rim, Kyung-Hwa;Kim, Jeong-Il;Lee, Moon-Su;Kwon, Oh-Young
    • The Journal of Korean Institute for Practical Engineering Education / v.3 no.2 / pp.158-165 / 2011
  • The purpose of this study is to develop a competency model for training programs for CEOs of vocational training institutes, in terms of needs assessment. Data were collected from 230 public and designated vocational training institutes, including commercial learning facilities and life-long training centers, using a questionnaire distributed by web mail. The framework for assessing training needs was based on a model with three components: the importance of, proficiency in, and desire to learn the job skills and tasks required of vocational training institute CEOs. The methodologies used were a survey, focus group interviews (FGI), and case studies. The major results showed that the highest-priority training needs for vocational training institute CEO competency were: (1) competency in attracting talented training teachers; (2) competency in assessing the needs of the labor market and trainees; (3) problem-solving competency; and (4) leadership skills.

Design and Implementation of Education On Demand System based on SIP to support Conferencing Service (컨퍼런싱형 서비스를 지원하는 SIP 기반의 원격 교육 시스템의 설계 및 구현)

  • Park, Si-Yong;Yun, Byung-Nam;Chung, Ki-Dong
    • The KIPS Transactions: Part A / v.10A no.6 / pp.749-756 / 2003
  • The Internet has been adopted in many fields, and its development has accelerated accordingly. The field of education has likewise developed into a learner-centered environment by combining the existing educational method known as distance learning with the Internet. Learners can now receive educational services at the appropriate level anywhere and anytime they want. In this paper, we propose an EOD (Education on Demand) system based on SIP, a new conferencing-type system in the distance learning field. Using the Session Initiation Protocol (SIP) for the EOD system is more efficient than using H.323, particularly as the number of users increases, and it improves QoS. The EOD system is scalable and provides a conferencing service using SIP, which suits the web environment, and it supports multicast to utilize network bandwidth efficiently.
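
For context, session setup in such a system rests on SIP's INVITE transaction. The sketch below hand-builds a bare-bones INVITE and sends it over UDP; the addresses, tag, and Call-ID are placeholders, and a real deployment would use a full SIP stack rather than raw messages:

```python
import socket

# A minimal SIP INVITE per RFC 3261; all identifiers are illustrative.
INVITE = (
    "INVITE sip:student@edu.example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP lecture.example.org:5060;branch=z9hG4bK77asd\r\n"
    "Max-Forwards: 70\r\n"
    "To: <sip:student@edu.example.org>\r\n"
    "From: Lecturer <sip:lecturer@edu.example.org>;tag=49583\r\n"
    "Call-ID: eod-session-001@lecture.example.org\r\n"
    "CSeq: 1 INVITE\r\n"
    "Contact: <sip:lecturer@lecture.example.org>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(INVITE.encode("ascii"), ("127.0.0.1", 5060))  # 5060 = SIP default port
print(INVITE)
```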