• Title/Summary/Keyword: deep machine learning


Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.43-61
    • /
    • 2019
  • With the Fourth Industrial Revolution, artificial intelligence technology has been developing rapidly, and AI research has been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. Thanks to recent interest in the technology and research on various algorithms, the field of artificial intelligence has achieved greater technological advances than ever before. The knowledge-based system is a sub-domain of artificial intelligence that aims to enable AI agents to make decisions using machine-readable, processable knowledge constructed from the complex, informal human knowledge and rules of various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it has been used together with statistical artificial intelligence such as machine learning. More recently, the purpose of a knowledge base has been to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. Such knowledge bases are used for intelligent processing in various fields of artificial intelligence, such as the question answering systems of smart speakers. However, building a useful knowledge base is a time-consuming task that still requires a great deal of expert effort. In recent years, much research and technology in knowledge-based artificial intelligence has used DBpedia, one of the largest knowledge bases, which aims to extract structured content from the various information in Wikipedia. DBpedia contains various information extracted from Wikipedia, such as titles, categories, and links, but the most useful knowledge comes from Wikipedia infoboxes, user-created summaries of some unifying aspect of an article.
This knowledge is created by the mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework. In this way, DBpedia can expect high reliability in terms of knowledge accuracy because it generates knowledge from semi-structured infobox data created by users. However, since only about 50% of all wiki pages in Korean Wikipedia contain an infobox, DBpedia has limitations in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. To demonstrate the appropriateness of this method, we describe a knowledge extraction model that follows the DBpedia ontology schema by learning from Wikipedia infoboxes. Our knowledge extraction model consists of three steps: document classification into ontology classes, classification of the sentences appropriate for triple extraction, and value selection and transformation into the RDF triple structure. The structures of Wikipedia infoboxes are defined by infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these infobox templates. Based on these mapping relations, we classify the input document into infobox categories, which correspond to ontology classes. After determining the classification of the input document, we classify the appropriate sentences according to the attributes belonging to that classification. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples. To train the models, we generated a training data set from a Wikipedia dump by adding BIO tags to sentences, and we trained about 200 classes and about 2,500 relations for knowledge extraction. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process.
Through this proposed process, structured knowledge can be utilized by extracting knowledge according to the ontology schema from text documents. In addition, this methodology can significantly reduce the effort required of experts to construct instances according to the ontology schema.
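The abstract above describes BIO-tagging sentences and converting the tagged values into RDF triples. As a rough illustration (the sentence, tag set, and `birthPlace` relation below are hypothetical, not taken from the paper), a minimal sketch of collapsing BIO-tagged tokens into a triple:

```python
# Minimal sketch: collapsing BIO-tagged tokens into an RDF-style triple.
# The sentence, tags, and the "birthPlace" relation are illustrative only.
def bio_to_spans(tokens, tags):
    """Group consecutive B-/I- tagged tokens into (label, text) spans."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans

tokens = ["Yi", "Sun-sin", "was", "born", "in", "Hanseong", "."]
tags   = ["B-SUBJ", "I-SUBJ", "O", "O", "O", "B-OBJ", "O"]
spans = dict(bio_to_spans(tokens, tags))
triple = (spans["SUBJ"], "birthPlace", spans["OBJ"])
print(triple)  # → ('Yi Sun-sin', 'birthPlace', 'Hanseong')
```

In the paper's pipeline, a trained CRF or Bi-LSTM-CRF would predict the tags; only the span-merging step is sketched here.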

Adverse Effects on EEGs and Bio-Signals Coupling on Improving Machine Learning-Based Classification Performances

  • SuJin Bak
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.10
    • /
    • pp.133-153
    • /
    • 2023
  • In this paper, we propose a novel approach to investigating brain-signal measurement technology using electroencephalography (EEG). Traditionally, researchers have combined EEG signals with bio-signals (BSs) to enhance the classification performance of emotional states. Our objective was to explore the synergistic effects of coupling EEG and BSs and to determine whether the combination EEG+BS improves the classification accuracy of emotional states compared to using EEG alone or combining EEG with pseudo-random signals (PS) generated arbitrarily by random generators. Employing four feature extraction methods, we examined four combinations: EEG alone, EEG+BS, EEG+BS+PS, and EEG+PS, utilizing data from two widely used open datasets. Emotional states (task versus rest) were classified using Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) classifiers. Our results revealed that when using SVM-FFT, which achieved the highest accuracy, the average error rates of EEG+BS were 4.7% and 6.5% higher than those of EEG+PS and EEG alone, respectively. We also conducted a thorough analysis of EEG+BS combined with numerous PSs. The error rate of EEG+BS+PS displayed a V-shaped curve, initially decreasing due to the deep double descent phenomenon and then increasing due to the curse of dimensionality. Consequently, our findings suggest that the combination EEG+BS may not always yield promising classification performance.
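The control condition above pairs EEG with pseudo-random signals of matching shape. A minimal sketch of how such channel combinations might be assembled (channel counts, sample length, and the uniform distribution are assumptions for illustration, not the paper's setup):

```python
import random

# Illustrative sketch of building the EEG / EEG+BS / EEG+BS+PS combinations
# described above by stacking channel groups into one feature matrix.
random.seed(0)

def make_channels(n_samples, n_channels):
    """Arbitrary pseudo-random 'signals' from a random generator."""
    return [[random.uniform(-1.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_channels)]

def concat_features(*channel_groups):
    """Stack channel groups (EEG, BS, PS, ...) into one list of channels."""
    return [ch for group in channel_groups for ch in group]

eeg = make_channels(128, 32)  # stand-in for 32 EEG channels of 128 samples
bs  = make_channels(128, 4)   # stand-in for 4 bio-signal channels
ps  = make_channels(128, 4)   # pseudo-random channels shaped like BS

print(len(concat_features(eeg)))          # EEG alone: 32 channels
print(len(concat_features(eeg, bs)))      # EEG+BS: 36 channels
print(len(concat_features(eeg, bs, ps)))  # EEG+BS+PS: 40 channels
```

The growing channel count is exactly what drives the dimensionality effects the abstract reports.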

Comparison of Artificial Intelligence Multitask Performance using Object Detection and Foreground Image (물체탐색과 전경영상을 이용한 인공지능 멀티태스크 성능 비교)

  • Jeong, Min Hyuk;Kim, Sang-Kyun;Lee, Jin Young;Choo, Hyon-Gon;Lee, HeeKyung;Cheong, Won-Sik
    • Journal of Broadcast Engineering
    • /
    • v.27 no.3
    • /
    • pp.308-317
    • /
    • 2022
  • Research is underway to efficiently reduce the size of the video data transmitted and stored in image analysis processes that use deep learning-based machine vision technology. MPEG (Moving Picture Experts Group) has established a new standardization project called VCM (Video Coding for Machines) and is conducting research on video encoding for machines rather than for humans. We study a multitask setting that performs various tasks from one input image. Rather than running the object detection that each task requires separately for every task, the proposed pipeline runs it only once and uses the result as the input for each task. In this paper, we propose a pipeline for efficient multitasking and perform comparative experiments on the compression efficiency, execution time, and result accuracy of the input image to check its efficiency. In the experiments, the size of the input image decreased by more than 97.5% while the accuracy of the results decreased only slightly, confirming the possibility of efficient multitasking.
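The structural idea above, running detection once and fanning its output out to every task, can be sketched as follows (the detector and the two downstream tasks are hypothetical stand-ins, not the paper's tasks):

```python
# Sketch of the single-pass multitask pipeline described above: object
# detection runs once, and every downstream task reuses its output.
def detect(image):
    """Stand-in detector: returns bounding boxes as (x, y, w, h, label)."""
    return [(10, 20, 50, 80, "person"), (100, 40, 30, 30, "car")]

def count_objects(boxes, label):
    return sum(1 for b in boxes if b[4] == label)

def total_foreground_area(boxes):
    return sum(w * h for (_, _, w, h, _) in boxes)

image = "frame_0001"          # placeholder for a decoded frame
boxes = detect(image)         # detection runs exactly once ...
tasks = {                     # ... and each task consumes the shared result
    "person_count": count_objects(boxes, "person"),
    "foreground_area": total_foreground_area(boxes),
}
print(tasks)  # → {'person_count': 1, 'foreground_area': 4900}
```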

Indoor Positioning System using Geomagnetic Field with Recurrent Neural Network Model (순환신경망을 이용한 자기장 기반 실내측위시스템)

  • Bae, Han Jun;Choi, Lynn;Park, Byung Joon
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.14 no.6
    • /
    • pp.57-65
    • /
    • 2018
  • Conventional RF signal-based indoor localization techniques, such as BLE- or Wi-Fi-based fingerprinting, show considerable localization errors even in small-scale indoor environments due to the unstable received signal strength (RSS) of RF signals. Therefore, it is difficult to apply existing RF-based fingerprinting techniques to large-scale indoor environments such as airports and department stores. In this paper, instead of RF signals we use the geomagnetic sensor signal for indoor localization, whose strength is more stable than RF RSS. Although similar geomagnetic field values exist in an indoor space, a moving object experiences a unique sequence of geomagnetic field signals as its movement continues. We use the recurrent neural network (RNN), a deep neural network model that is effective at recognizing time-varying sequences of sensor data, to track the user's location and movement path. To evaluate the performance of the proposed geomagnetic field-based indoor positioning system (IPS), we constructed a magnetic field map for a campus testbed of about 94 m × 26 m and trained the RNN using various potential movement paths and their location data extracted from the magnetic field map. By adjusting various hyperparameters, we achieved an average localization error of 1.20 m in the testbed.
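The key idea above is that a stream of geomagnetic readings becomes distinctive only as a sequence. A minimal sketch of windowing such a stream into RNN input sequences (the readings and window length are made up; a real system would likely use 3-axis magnetometer values):

```python
# Sketch: turning a stream of geomagnetic readings into fixed-length
# sequences for an RNN. Each window is one input sequence; the position
# at its last step would serve as the training label.
def make_sequences(readings, seq_len):
    return [readings[i:i + seq_len]
            for i in range(len(readings) - seq_len + 1)]

readings = [48.1, 48.3, 47.9, 47.2, 46.8, 46.5, 47.0]  # field magnitudes (uT)
seqs = make_sequences(readings, seq_len=4)
print(len(seqs))   # → 4 overlapping sequences
print(seqs[0])     # → [48.1, 48.3, 47.9, 47.2]
```

Two different locations may share a single field value, but rarely a whole window, which is why sequence models help here.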

A Study on Prediction of PM2.5 Concentration Using DNN (Deep Neural Network를 활용한 초미세먼지 농도 예측에 관한 연구)

  • Choi, Inho;Lee, Wonyoung;Eun, Beomjin;Heo, Jeongsook;Chang, Kwang-Hyeon;Oh, Jongmin
    • Journal of Environmental Impact Assessment
    • /
    • v.31 no.2
    • /
    • pp.83-94
    • /
    • 2022
  • In this study, DNN-based models were trained using the air quality measurement data for 2017, 2019, and 2020 provided by the National Measurement Network (Air Korea), and the models were evaluated using data from 2016 and 2018. Based on a Pearson correlation coefficient threshold of 0.2, four items (SO2, CO, NO2, PM10) were initially modeled as independent variables. To improve the prediction accuracy, separate monthly models were also built. The error was calculated using the root mean square error (RMSE); the initial model's RMSE was 5.78, about 46% better than the result of the national moving-average model (10.77). In addition, the independent monthly models improved on the initial model in every month except November. Therefore, this study confirms that DNN modeling is effective in predicting PM2.5 concentrations from air pollutant concentrations, and that the learning performance of the model can be improved by selecting additional independent variables.
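Two steps of the methodology above, screening predictors by Pearson correlation with PM2.5 and scoring predictions with RMSE, can be sketched directly (all numbers below are made-up illustrative values, not the study's data):

```python
import math

# Sketch: Pearson-correlation feature screening (threshold 0.2, as above)
# followed by RMSE scoring. Data values are illustrative only.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(candidates, target, threshold=0.2):
    return [name for name, values in candidates.items()
            if abs(pearson(values, target)) >= threshold]

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

pm25 = [20, 35, 30, 45, 50]
candidates = {
    "PM10": [40, 60, 55, 80, 90],   # strongly correlated with pm25
    "noise": [1, -2, 3, -1, 2],     # weakly correlated, filtered out
}
print(select_features(candidates, pm25))          # → ['PM10']
print(round(rmse([22, 33, 31, 44, 52], pm25), 2)) # → 1.67
```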

Evaluating the prediction models of leaf wetness duration for citrus orchards in Jeju, South Korea (제주 감귤 과수원에서의 이슬지속시간 예측 모델 평가)

  • Park, Jun Sang;Seo, Yun Am;Kim, Kyu Rang;Ha, Jong-Chul
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.20 no.3
    • /
    • pp.262-276
    • /
    • 2018
  • Models to predict leaf wetness duration (LWD) were evaluated using the meteorological and dew data observed at 11 citrus orchards in Jeju, South Korea from 2016 to 2017. The sensitivity and prediction accuracy of four models were evaluated: Number of Hours of Relative Humidity (NHRH), Classification And Regression Tree/Stepwise Linear Discriminant (CART/SLD), Penman-Monteith (PM), and Deep-learning Neural Network (DNN). Model sensitivity was evaluated with respect to rainfall and seasonal changes. When rainy days were excluded from the data set, the LWD models had a smaller average error (root mean square error (RMSE) of about 1.5 hours). The seasonal error of the DNN model had a similar magnitude (RMSE of about 3 hours) in all seasons except winter. The other models had their greatest error in summer (RMSE of about 9.6 hours) and their lowest in winter (RMSE of about 3.3 hours). The models were also evaluated by statistical error analysis and by the regression-based analysis of mean squared deviation. The DNN model had the best performance by statistical error, whereas the CART/SLD model had the worst prediction accuracy. The mean squared deviation (MSD) is a method of analyzing the linearity of a model with three components: squared bias (SB), nonunity slope (NU), and lack of correlation (LC). Better model performance was determined by lower SB and LC and higher NU. The results of the MSD analysis indicated that the DNN model would provide the best performance, followed by the PM, NHRH, and CART/SLD models, in that order. These results suggest that machine learning models can improve the accuracy of agricultural information derived from meteorological data.
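The MSD decomposition named above can be computed directly. One common formulation (Gauch's decomposition is assumed here; the simulated/observed values are illustrative) is MSD = SB + NU + LC, and the three components sum exactly to the mean squared deviation:

```python
# Sketch of the MSD decomposition into squared bias (SB), nonunity
# slope (NU), and lack of correlation (LC), assuming Gauch's formulation.
def msd_components(sim, obs):
    n = len(sim)
    mx, my = sum(sim) / n, sum(obs) / n
    sxx = sum((x - mx) ** 2 for x in sim) / n
    syy = sum((y - my) ** 2 for y in obs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(sim, obs)) / n
    b = sxy / sxx                 # slope of obs regressed on sim
    r2 = sxy ** 2 / (sxx * syy)   # squared correlation
    sb = (mx - my) ** 2           # squared bias
    nu = (1 - b) ** 2 * sxx       # nonunity slope
    lc = (1 - r2) * syy           # lack of correlation
    return sb, nu, lc

sim = [2.0, 4.0, 6.0, 8.0]        # illustrative model output
obs = [2.5, 4.5, 5.5, 8.5]        # illustrative observations
sb, nu, lc = msd_components(sim, obs)
msd = sum((x - y) ** 2 for x, y in zip(sim, obs)) / len(sim)
print(round(sb + nu + lc, 6) == round(msd, 6))  # → True: exact identity
```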

Efficient Deep Learning Approaches for Active Fire Detection Using Himawari-8 Geostationary Satellite Images (Himawari-8 정지궤도 위성 영상을 활용한 딥러닝 기반 산불 탐지의 효율적 방안 제시)

  • Sihyun Lee;Yoojin Kang;Taejun Sung;Jungho Im
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_3
    • /
    • pp.979-995
    • /
    • 2023
  • As wildfires are difficult to predict, real-time monitoring is crucial for a timely response. Geostationary satellite images are very useful for active fire detection because they can monitor a vast area with high temporal resolution (e.g., 2 min). Existing satellite-based active fire detection algorithms detect thermal outliers using threshold values based on statistical analysis of brightness temperature. However, the difficulty of establishing suitable thresholds hinders such methods' ability to detect low-intensity fires and to generalize. In light of these challenges, machine learning has emerged as a potential solution. Until now, relatively simple techniques such as random forest, vanilla convolutional neural networks (CNN), and U-Net have been applied to active fire detection. This study therefore proposed an active fire detection algorithm using state-of-the-art (SOTA) deep learning techniques on data from the Advanced Himawari Imager and evaluated it over East Asia and Australia. The SOTA model was developed by applying EfficientNet and the Lion optimizer, and its results were compared with a model using the vanilla CNN structure. EfficientNet outperformed the CNN with F1-scores of 0.88 and 0.83 in East Asia and Australia, respectively. Applying weighted loss, equal sampling, and image augmentation to mitigate the data imbalance further improved performance, yielding F1-scores of 0.92 in East Asia and 0.84 in Australia. It is anticipated that the timely responses facilitated by the SOTA deep learning-based approach to active fire detection will effectively mitigate the damage caused by wildfires.
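The weighted-loss idea above addresses the rarity of fire pixels by up-weighting their errors. A minimal sketch using binary cross-entropy (the weight value and toy labels are assumptions for illustration; the paper's exact loss is not specified in the abstract):

```python
import math

# Sketch: binary cross-entropy with an up-weighted positive (fire) class,
# a common way to implement the "weighted loss" named above.
def weighted_bce(y_true, y_pred, pos_weight=10.0):
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, 1e-7), 1 - 1e-7)       # clip for numerical stability
        w = pos_weight if t == 1 else 1.0     # rare class weighs more
        total += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

y_true = [0, 0, 0, 0, 1]                      # one rare fire pixel
y_pred = [0.1, 0.2, 0.1, 0.1, 0.6]
print(weighted_bce(y_true, y_pred) > weighted_bce(y_true, y_pred, 1.0))
# → True: errors on the fire pixel cost more under the weighted loss
```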

Korean Morphological Analysis Method Based on BERT-Fused Transformer Model (BERT-Fused Transformer 모델에 기반한 한국어 형태소 분석 기법)

  • Lee, Changjae;Ra, Dongyul
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.4
    • /
    • pp.169-178
    • /
    • 2022
  • Morphemes are the most primitive units in a language; they lose their original meaning when segmented into smaller parts. In Korean, a sentence is a sequence of eojeols (words) separated by spaces, and each eojeol comprises one or more morphemes. Korean morphological analysis (KMA) divides the eojeols of a given Korean sentence into morpheme units and assigns appropriate part-of-speech (POS) tags to the resulting morphemes. KMA is one of the most important tasks in Korean natural language processing (NLP), and improving its performance is closely tied to improving the performance of Korean NLP tasks in general. Recent research on KMA has begun to adopt the approach of machine translation (MT) models. MT converts a sequence (sentence) of units in one domain into a sequence of units in another domain; neural machine translation (NMT) refers to MT approaches that exploit neural network models. From the MT perspective, KMA transforms an input sequence of units in the eojeol domain into a sequence of units in the morpheme domain. In this paper, we propose a deep learning model for KMA. The backbone of our model is the BERT-fused model, which was shown to achieve high performance in NMT. The BERT-fused model combines Transformer, a representative NMT model, with BERT, a language representation model that has enabled significant advances in NLP. The experimental results show that our model achieves an F1-score of 98.24.
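The MT framing above, eojeol sequence in, morpheme/POS sequence out, can be made concrete with a toy example (the tiny lookup table below merely stands in for the BERT-fused Transformer; the sentence and Sejong-style tags are illustrative):

```python
# Toy illustration of KMA as sequence transduction: each space-separated
# eojeol maps to one or more morpheme/POS units. A hypothetical lexicon
# replaces the actual neural model for illustration.
LEXICON = {
    "나는": ["나/NP", "는/JX"],
    "학교에": ["학교/NNG", "에/JKB"],
    "갔다": ["가/VV", "았/EP", "다/EF"],
}

def analyze(sentence):
    """Map each eojeol to its morpheme/POS sequence (UNK if unseen)."""
    out = []
    for eojeol in sentence.split():
        out.extend(LEXICON.get(eojeol, [eojeol + "/UNK"]))
    return out

print(analyze("나는 학교에 갔다"))
# → ['나/NP', '는/JX', '학교/NNG', '에/JKB', '가/VV', '았/EP', '다/EF']
```

Note that the output sequence is longer than the input, and surface forms change (갔 → 가 + 았), which is why KMA resembles translation rather than simple segmentation.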

An Algorithm of Fingerprint Image Restoration Based on an Artificial Neural Network (인공 신경망 기반의 지문 영상 복원 알고리즘)

  • Jang, Seok-Woo;Lee, Samuel;Kim, Gye-Young
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.8
    • /
    • pp.530-536
    • /
    • 2020
  • The use of minutiae by fingerprint readers is robust against presentation attacks, but one weakness is a high mismatch rate; minutiae therefore tend to be used together with skeleton images. There have been many studies on security vulnerabilities in the characteristics of minutiae, but vulnerability studies on the skeleton are scarce, so this study analyzes the vulnerability of the skeleton to presentation attacks. To this end, we propose a learning-based method to recover the original fingerprint from the skeleton. The proposed method includes a new learning model that adds a latent vector to the existing Pix2Pix model, thereby generating natural fingerprints. In the experiments, the original fingerprint is restored using the proposed machine learning model, and the restored fingerprint is then given to a fingerprint reader as input, achieving a good recognition rate. This study thus verifies that fingerprint readers using the skeleton are vulnerable to presentation attacks. The approach presented in this paper is expected to be useful in a variety of applications concerning fingerprint restoration, video security, and biometrics.

Evaluating the groundwater prediction using LSTM model (LSTM 모형을 이용한 지하수위 예측 평가)

  • Park, Changhui;Chung, Il-Moon
    • Journal of Korea Water Resources Association
    • /
    • v.53 no.4
    • /
    • pp.273-283
    • /
    • 2020
  • Quantitative forecasting of groundwater levels is very important for assessing groundwater variation and vulnerability. To this end, various time series analysis and machine learning techniques have been used. In this study, we developed a prediction model based on LSTM (long short-term memory), one of the artificial neural network (ANN) algorithms, for predicting the daily groundwater level at 11 groundwater wells in Hankyung-myeon, Jeju Island. In general, the groundwater level in Jeju Island is highly correlated with tides and reflects the effects of precipitation, so the precipitation data for the corresponding period were added to the groundwater level data to construct the input and output variables. The LSTM neural network was trained using the initial 365 days of data, covering all four seasons, and the remaining data were used for verification to evaluate the fitness of the predictive model. The model was developed using Keras, a Python-based deep learning framework, and the NVIDIA CUDA architecture was used to accelerate training. When the groundwater level variation was learned and verified with the LSTM neural network, the coefficient of determination (R2) was 0.98 on average, indicating that the predictive model developed is very accurate.
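The supervised framing above, past groundwater levels plus precipitation predicting the next day's level, can be sketched as a sliding-window dataset builder (the window length and toy series are illustrative assumptions, not the study's configuration):

```python
# Sketch: building an LSTM-style supervised dataset where `window` days of
# (level, rain) pairs predict the next day's groundwater level.
def make_dataset(levels, rain, window=3):
    X, y = [], []
    for i in range(len(levels) - window):
        X.append(list(zip(levels[i:i + window], rain[i:i + window])))
        y.append(levels[i + window])
    return X, y

levels = [12.1, 12.0, 11.8, 11.9, 12.3, 12.6]  # daily groundwater level (m)
rain   = [0.0, 0.0, 5.5, 20.1, 3.2, 0.0]       # daily precipitation (mm)
X, y = make_dataset(levels, rain)
print(len(X), len(y))  # → 3 3
print(X[0])            # → [(12.1, 0.0), (12.0, 0.0), (11.8, 5.5)]
print(y[0])            # → 11.9
```

In the study itself, a Keras LSTM would consume such (samples, timesteps, features) arrays; only the data framing is sketched here.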