Search | Korea Science

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
- Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple web sources in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the suitable documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence score. 3) Based on the predicate feature, extract the information from the suitable sentences and derive the overall confidence of the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker. The proposed system showed higher performance than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it, we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology extracted information from various types of unstructured documents more effectively than the baseline model. Previous research has the limitation that performance is poor when extracting information from document types that differ from the training data.
In addition, by predicting the suitability of documents and sentences for information extraction before the extraction step, this study prevents unnecessary extraction attempts on documents that do not contain the answer. It is meaningful that we provide a method by which precision can be maintained even in the actual web environment. Because information extraction for knowledge base expansion targets unstructured documents on the real web, there is no guarantee that a given document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in the real web environment. The limitations of this study and future research directions are as follows. The first limitation concerns data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open source Konlpy Python package, and extraction can fail when the morphological analysis is not performed properly. To improve the information extraction results, a more advanced morpheme analyzer needs to be developed. The second limitation is entity ambiguity. The information extraction system in this study cannot distinguish between identical names that refer to different entities. If several people with the same name appear in the news, the system may not extract information about the one the query intends.
Future research needs to find ways to distinguish people who share the same name. The third limitation concerns the evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We built an evaluation data set from 2,800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether each document contains a correct answer. To ensure the external validity of the study, it is desirable to evaluate the system on more queries, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for query-based information extraction from multi-source web documents so that results can be evaluated more objectively.
https://doi.org/10.13088/jiis.2018.24.4.111

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
- Journal of Intelligence and Information Systems / v.23 no.3 / pp.139-153 / 2017
Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Bankruptcies have increased through successive economic crises, and bankruptcy prediction models have become increasingly important. Corporate bankruptcy has therefore been regarded as one of the major research topics in business management, and it is also being studied actively in industry. Previous studies applied various methodologies, such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM), to improve bankruptcy prediction accuracy and to resolve the overfitting problem. These methods are based on statistics. More recently, researchers have used machine learning methodologies such as the Support Vector Machine (SVM) and the Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. As a result, many bankruptcy models have been developed and their performance has improved. In general, a company's financial and accounting information changes over time. The market situation changes as well, so it is difficult to predict bankruptcy using only information from a single point in time. However, although traditional research has the problem of ignoring the time effect, dynamic models have not been studied much. Ignoring the time effect produces biased results, so a static model may not be suitable for predicting bankruptcy. Using a dynamic model may therefore improve bankruptcy prediction. In this paper, we propose using a Recurrent Neural Network (RNN), one of the deep learning methodologies. The RNN learns from time series data and is known to perform well on it. Prior to the experiment, we selected non-financial firms listed on the KOSPI, KOSDAQ, and KONEX markets from 2010 to 2016 for estimating the bankruptcy prediction model and comparing forecasting performance.
To avoid the mistake of predicting bankruptcy from financial information that already reflects the deterioration of the company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We then defined bankruptcy as delisting due to sluggish earnings. We confirmed delistings on KIND, a corporate stock information website. We then selected variables from previous papers. The first set of variables consists of the Z-score variables, which have become traditional variables in bankruptcy prediction. The second set is a dynamic variable set. Finally, we selected 240 normal companies and 226 bankrupt companies for the first variable set, and 229 normal companies and 226 bankrupt companies for the second variable set. We created a model that reflects dynamic changes in time-series financial data, and by comparing it with existing bankruptcy prediction models we found that the suggested model could help improve the accuracy of bankruptcy predictions. We used financial data from KIS Value (a financial database) and selected Multivariate Discriminant Analysis (MDA), the Generalized Linear Model in the form of logistic regression (GLM), the Support Vector Machine (SVM), and the Artificial Neural Network (ANN) as benchmark models. The experimental results showed that the RNN performed better than the comparative models. The accuracy of the RNN was high for both sets of variables, and its Area Under the Curve (AUC) value was also high. In the hit-ratio table, the rate at which the RNN predicted a distressed company to be bankrupt was higher than that of the other comparative models. A limitation of this paper, however, is that overfitting occurs during RNN training.
We expect that the overfitting problem can be solved by selecting more training data and appropriate variables. Based on these results, we expect this research to contribute to the development of bankruptcy prediction by proposing a new dynamic model.
https://doi.org/10.13088/jiis.2017.23.3.139

Fuzzy Control of Smart TMD using Multi-Objective Genetic Algorithm (다목적 유전자알고리즘을 이용한 스마트 TMD의 퍼지제어)

Kang, Joo-Won;Kim, Hyun-Su
- Journal of the Computational Structural Engineering Institute of Korea / v.24 no.1 / pp.69-78 / 2011
In this study, an optimization method using a multi-objective genetic algorithm (MOGA) is proposed to develop a fuzzy control algorithm that can effectively control a smart tuned mass damper (TMD). A 76-story benchmark building subjected to wind load was selected as the example structure. The smart TMD consists of a 100 kN MR damper, and its natural period was tuned to the first-mode natural period of the example structure. A fuzzy logic controller adjusts the damping force of the MR damper to reduce the wind-induced responses of the example structure. The two input variables of the fuzzy logic controller are the acceleration of the 75th floor and the displacement of the smart TMD, and the output variable is the command voltage sent to the MR damper. A multi-objective genetic algorithm (NSGA-II) was used to optimize the fuzzy logic controller, with the acceleration of the 75th story and the displacement of the smart TMD as the objective functions. The optimization yielded a series of fuzzy logic controllers that appropriately reduce the wind responses of both the building and the smart TMD. The numerical results show that the control performance of the smart TMD is much better than that of the passive TMD, and in some cases even better than that of the sample active TMD.

Synthetic Trajectory Generation Tool for Indoor Moving Objects (실내공간 이동객체 궤적 생성기)

Ryoo, Hyung Gyu;Kim, Soo Jin;Li, Ki Joune
- Journal of Korean Society for Geospatial Information Science / v.24 no.4 / pp.59-66 / 2016
Performance experiments on database systems that manage moving objects require moving object trajectory data sets. For example, benchmark data sets of moving object trajectories are required for experiments on query processing in moving object databases. For this reason, several tools have been developed for generating moving objects in Euclidean spaces or road network spaces. Indoor space differs from outdoor space in many respects, and a moving object generator for indoor space should reflect these differences. Although some tools have been developed to produce virtual moving object trajectories in indoor space, the movements they generate are not realistic. In this paper, we present a moving object generation tool for indoor space. The tool generates trajectories for pedestrians in an indoor space. It provides parametric generation of trajectories that considers not only speed, the number of pedestrians, and the minimum distance between pedestrians but also the type of space, time constraints, and the type of pedestrian. We try to reflect the movement patterns of pedestrians in indoor space as realistically as possible. For interoperability, several geospatial standards are used in the development of the tool.
https://doi.org/10.7319/kogsis.2016.24.4.059

A Benchmark of Micro Parallel Computing Technology for Real-time Control in Smart Farm (MPICH vs OpenMP) (스마트 시설환경 실시간 제어를 위한 마이크로 병렬 컴퓨팅 기술 분석)

Min, Jae-Ki;Lee, DongHoon
- Proceedings of the Korean Society for Agricultural Machinery Conference / 2017.04a / pp.161-161 / 2017
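The control elements of a smart facility environment combine factors that directly regulate the environment, such as heaters, window opening and closing, water/nutrient-solution valve opening and closing, ventilation fans, and dehumidifiers, with elements that are indirectly related to control, such as communication for information exchange and user interfaces. Control based on mathematical logic, such as PID control, can coexist with control by nonlinear learning models based on the knowledge of expert managers. Conventional sequence-based control may be limited in its ability to link these diverse elements together. Determining the amount and timing of control from sufficient data acquired as a time series, as is done conventionally, may have the drawback of coping poorly with exceptional situations. Such exceptional situations can be divided into those that arise unavoidably from changes in natural conditions and those caused by system errors. In this study, we investigated a way to complement control systems built on mathematical, predictable logic by analyzing the various environmental factors in a facility environment in real time as they change and performing the corresponding control. High performance computing (HPC) in the past was an advanced technology that raised computing power by linking many computers over a high-speed network, and it required heavy investment in terms of cost and scale. With the development of mobile phones and mobile devices, small microprocessors have advanced, and application processors (APs) with clock speeds reaching 2 GHz have recently appeared. We studied how to apply APs, which offer low power consumption and a compact platform despite their relatively low performance, to real-time control of the facility environment. We compared the performance of AP-based micro clustering across three systems that differ in CPU clock, memory size, and number of cores as follows: 1) 1.5 GHz, 8 Processors, 32 Cores, 1GByte/Processor, 32Bit Linux(ARMv71). 2) 2.0 GHz, 4 Processors, 32 Cores, 2GByte/Processor, 32Bit Linux(ARMv71). 3) 1.5 GHz, 8 Processors, 32 Cores, 2GByte/Processor, 64Bit Linux(Arch64). MPICH (www.mpich.org) and Open-MP (www.openmp.org) were used as the development libraries for parallel computing. Finding the primes among the integers up to 2,500,000,000 took 1) 17 seconds, 2) 13 seconds, and 3) 3 seconds, and a two-dimensional FFT on a $12800 \times 12800$ matrix took 1) 10 seconds, 2) 8 seconds, and 3) 2 seconds, respectively. System 3 can be judged faster than a commercial desktop with a clock speed of about 3 GHz. The results were approximately the same for both libraries. When three-dimensional linear interpolation was performed at one-second intervals on three-dimensional measurement data obtained in a previous study, the systems gave nearly identical results, with only slight differences, when four or fewer cores were used, whereas with eight or more cores the results followed a trend similar to the earlier ones. We will continue to study AP-based micro clustering technology, taking into account field deployability, construction cost, and power consumption.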
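The prime-finding workload in the abstract above can be illustrated with a small single-process sketch. The abstract does not say which algorithm was used, so the sieve of Eratosthenes here is an assumption, as are the function names `count_primes` and `timed_count`. The sketch does not use MPICH or OpenMP and is meant for small limits, not the 2,500,000,000 range reported.

```python
import time

def count_primes(limit):
    """Count the primes <= limit with a sieve of Eratosthenes (assumed algorithm)."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Clear every multiple of p, starting at p*p.
            n_multiples = len(range(p * p, limit + 1, p))
            is_prime[p * p :: p] = bytearray(n_multiples)
        p += 1
    return sum(is_prime)

def timed_count(limit):
    """Return (prime count, elapsed seconds) for one run; a hypothetical helper."""
    start = time.perf_counter()
    count = count_primes(limit)
    return count, time.perf_counter() - start

if __name__ == "__main__":
    count, seconds = timed_count(1_000_000)
    print(f"{count} primes found in {seconds:.3f} s")
```

A parallel version would typically split the integer range across processes or threads and sum the partial counts, which is the kind of decomposition both libraries support.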

Plant-wide On-line Monitoring and Diagnosis Based on Hierarchical Decomposition and Principal Component Analysis (계층적 분해 방법과 PCA를 이용한 공장규모 실시간 감시 및 진단)

Cho Hyun-Woo;Han Chong-hun
- Journal of the Korean Institute of Gas / v.1 no.1 / pp.27-32 / 1997
Continual monitoring of abnormal operating conditions is a key issue in maintaining high product quality and safe operation, since an undetected process abnormality may lead to undesirable operation and ultimately to low-quality products or equipment breakdown. The recently highlighted statistical projection method has the advantage that a reference model can easily be built from historical measurement data collected in a state of statistical control, without requiring a detailed mathematical model or knowledge base of the process. As process complexity increases, however, there are more measurement variables and recycle streams. This not only causes frequent process perturbations but also makes it difficult to pinpoint the root cause, or even the responsible source unit, because of the many confusing candidates. Consequently, a dedicated technique for monitoring and diagnosis on a plant-wide scale is needed. In this paper, we propose a hierarchical plant-wide monitoring methodology based on hierarchical decomposition and principal component analysis for handling the complexity of, and interactions among, process units. This prevents special events in a specific sub-block from propagating to other sub-blocks, or at least delays the transfer of the undesired state, and so makes it possible to detect and diagnose process malfunctions quickly. To demonstrate the performance of the proposed methodology, we simulate the Tennessee Eastman benchmark process, which is operated continuously with 41 measurement variables across five major units. Simulation results show that the proposed methodology offers fast and reliable monitoring and diagnosis for a large-scale chemical plant.

VILODE : A Real-Time Visual Loop Closure Detector Using Key Frames and Bag of Words (VILODE : 키 프레임 영상과 시각 단어들을 이용한 실시간 시각 루프 결합 탐지기)

Kim, Hyesuk;Kim, Incheol
- KIPS Transactions on Software and Data Engineering / v.4 no.5 / pp.225-230 / 2015
In this paper, we propose an effective real-time visual loop closure detector, VILODE, which makes use of key frames and a bag of visual words (BoW) based on SURF feature points. To determine whether the camera has revisited a previously visited place, a loop closure detector has to compare each incoming image with all previous images collected at every visited place. As the camera passes through new places, the number of images to be compared keeps growing. For this reason, it is difficult for a visual loop closure detector to satisfy both the real-time constraint and high detection accuracy. To address the problem, the proposed system adopts a key frame selection strategy that selects and compares only distinct, meaningful frames from the images arriving continuously during navigation, which greatly reduces the number of image comparisons needed for loop detection. Moreover, to improve detection accuracy and efficiency, the system represents each key frame image as a bag of visual words and maintains indexes for them using the DBoW database system. Experiments with the TUM benchmark datasets demonstrate the high performance of the proposed visual loop closure detector.
https://doi.org/10.3745/KTSDE.2015.4.5.225
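The combination of key frame selection and bag-of-words comparison described in the VILODE abstract can be sketched roughly as follows. This is not the authors' implementation: it assumes each image has already been reduced to a list of visual-word ids (no SURF extraction or DBoW index is involved), it uses plain cosine similarity, and the names `cosine`, `detect_loops`, `key_thresh`, and `loop_thresh` and their default values are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two visual-word histograms (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def detect_loops(frames, key_thresh=0.8, loop_thresh=0.9):
    """frames: one list of visual-word ids per image, in arrival order.

    Returns (key_frame_indices, loops), where each loop is a pair
    (new_key_frame_index, earlier_key_frame_index).
    """
    keys, bows, loops = [], [], []
    for i, words in enumerate(frames):
        bow = Counter(words)
        # Key frame selection: drop frames too similar to the latest key frame.
        if bows and cosine(bow, bows[-1]) >= key_thresh:
            continue
        # Loop detection: compare only against older key frames
        # (the latest one was just found to be dissimilar).
        for k, old in zip(keys[:-1], bows[:-1]):
            if cosine(bow, old) >= loop_thresh:
                loops.append((i, k))
        keys.append(i)
        bows.append(bow)
    return keys, loops
```

Because only key frames are stored, the number of comparisons grows with the number of distinct views rather than with the number of incoming images, which is the saving the abstract attributes to key frame selection.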

A Scalable OWL Horst Lite Ontology Reasoning Approach based on Distributed Cluster Memories (분산 클러스터 메모리 기반 대용량 OWL Horst Lite 온톨로지 추론 기법)

Kim, Je-Min;Park, Young-Tack
- Journal of KIISE / v.42 no.3 / pp.307-319 / 2015
Current ontology studies use the Hadoop distributed storage framework to perform map-reduce algorithm-based reasoning over large-scale ontologies. In this paper, however, we propose a novel approach to scalable Web Ontology Language (OWL) Horst Lite ontology reasoning based on distributed cluster memories. Rule-based reasoning, which is frequently used for large-scale ontologies, iteratively applies ontology rules to triple-format data until no new data is inferred. When such reasoning is performed on computer hard drives, the ontology reasoner therefore suffers from performance limitations. To overcome this drawback, we propose an approach that loads the ontologies into distributed cluster memories using Spark (a memory-based distributed computing framework), which then executes the ontology reasoning. To implement an appropriate OWL Horst Lite ontology reasoning system on Spark, our method divides the large-scale ontologies into blocks, loads each block into the cluster nodes, and subsequently handles the data in the distributed memories. To evaluate the proposed methods experimentally, we used the Lehigh University Benchmark, which is used to evaluate ontology inference and search speed, and applied the methods to LUBM8000 (1.1 billion triples, 155 gigabytes). Compared with WebPIE, a representative map-reduce algorithm-based scalable ontology reasoner, the proposed approach showed a throughput improvement of 320% (62k/s) over WebPIE (19k/s).
https://doi.org/10.5626/JOK.2015.42.3.307
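The iterate-until-nothing-new behaviour of rule-based reasoning mentioned in the abstract above can be shown with a minimal single-machine sketch. It is only an illustration under stated assumptions: it applies just two RDFS-style rules (subclass transitivity and type inheritance) rather than the full OWL Horst Lite rule set, it keeps everything in one in-memory Python set rather than in Spark, and the function name `forward_chain` is hypothetical.

```python
SUB = "rdfs:subClassOf"
TYPE = "rdf:type"

def forward_chain(triples):
    """Apply two rules repeatedly until no new triple is inferred.

    triples: iterable of (subject, predicate, object) tuples.
    Returns the set of original plus inferred triples.
    """
    known = set(triples)
    while True:
        new = set()
        sub = [(s, o) for s, p, o in known if p == SUB]
        for a, b in sub:
            # Rule 1: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUB, c))
            # Rule 2: (x type a), (a subClassOf b) => (x type b)
            for s, p, o in known:
                if p == TYPE and o == a:
                    new.add((s, TYPE, b))
        new -= known
        if not new:
            # Fixpoint reached: another pass would add nothing.
            return known
        known |= new
```

Each pass rescans the whole triple set, which is why keeping the data in memory rather than on disk matters for this style of reasoning.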

A Qualitative Review of the Difficulties and Success Strategies of Workplace Health Management (사업장 보건관리의 어려움과 성공전략에 대한 질적 고찰)

Jung, Myung-Hee;Choi, Eun-Hi;Jung, Hye-Sun
- Journal of the Korean Applied Science and Technology / v.37 no.4 / pp.925-935 / 2020
This study aims to provide guidelines for the activities of workplace health managers by identifying their excellent health promotion activities and motivations. To this end, consent for the study was obtained from 21 workplace health managers who had worked at the same company for more than five years, and a semi-structured questionnaire was sent by email to collect data, which were then analyzed qualitatively. As a result, 17 categories and three topics were derived. The topics were the sense of reward and accomplishment as workplace health managers, the difficulties encountered as workplace health managers, and ways of solving those difficulties. The respondents answered that they feel a sense of reward and accomplishment when workers open their minds, change their daily lives, express gratitude, and when they themselves pioneer new fields. On the other hand, they have difficulty with unpredictable health problems, changes in the organizational culture, secrecy about disease, and people who think their job is easy. As ways of overcoming such difficulties, the respondents said that it is necessary to read the minds of workers, let them come to understand for themselves, use existing programs, win over the most difficult people, and publicize their achievements. The results of this study show that workplace health managers need to develop professional skills and to emphasize the importance of health management to the policy-makers and employees of their workplace by continuously reporting health management performance. In addition, they need to actively benchmark the success strategies of exemplary workplace health managers.
https://doi.org/10.12925/jkocs.2020.37.4.925

Application of Machine Learning to Predict Weight Loss in Overweight, and Obese Patients on Korean Medicine Weight Management Program (한의 체중 조절 프로그램에 참여한 과체중, 비만 환자에서의 머신러닝 기법을 적용한 체중 감량 예측 연구)

Kim, Eunjoo;Park, Young-Bae;Choi, Kahye;Lim, Young-Woo;Ok, Ji-Myung;Noh, Eun-Young;Song, Tae Min;Kang, Jihoon;Lee, Hyangsook;Kim, Seo-Young
- The Journal of Korean Medicine / v.41 no.2 / pp.58-79 / 2020
Objectives: The purpose of this study is to predict weight loss by applying machine learning to real-world clinical data from overweight and obese adults on a weight loss program in 4 Korean Medicine obesity clinics. Methods: From January 2017 to May 2019, we collected data from overweight and obese adults (BMI ≥ 23 kg/m²) who registered for a 3-month Gamitaeeumjowi-tang prescription program. Predictive analysis was conducted at each of three prescription time points, and the expected weight reduction rate and reduced weight at the next prescription were predicted as binary classifications (classification benchmark: highest quartile, median, lowest quartile). For the median, further analysis was conducted after applying a variable selection method. The data set sizes were 25,988 for the first analysis, 6,304 for the second, and 833 for the third. 5-fold cross validation was used to prevent overfitting. Results: Prediction accuracy increased from the 1st to the 2nd and 3rd analyses. After selecting variables based on the median, the artificial neural network showed the highest accuracy in the 1st (54.69%), 2nd (73.52%), and 3rd (81.88%) prediction analyses based on reduced rate. Prediction performance was additionally confirmed through AUC: Random Forest showed the highest values in the 1st (0.640), 2nd (0.816), and 3rd (0.939) prediction analyses based on reduced weight. Conclusions: Predicting weight loss with machine learning showed that accuracy improved when initial weight loss information was used. The approach could potentially be used to screen patients who need intensive intervention because their expected weight loss is low.
https://doi.org/10.13048/jkm.20015
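The 5-fold cross validation mentioned in the abstract above partitions the samples so that each one is held out exactly once. The sketch below shows only the index bookkeeping. The abstract does not say whether the folds were shuffled or stratified, so the contiguous, unshuffled split here is an assumption, and `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one; the first n_samples % k folds
    receive the extra sample.
    """
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

if __name__ == "__main__":
    # 833 is the size of the third data set reported in the abstract.
    for fold, (train, test) in enumerate(k_fold_indices(833), start=1):
        print(f"fold {fold}: {len(train)} train / {len(test)} test")
```

A model would be fitted on each `train` split and scored on the matching `test` split, with the five scores averaged to give the reported accuracy or AUC.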
