• Title/Summary/Keyword: MachineLearning

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.2 / pp.163-170 / 2019
  • Over the past decade, the growth of the Web has explosively increased the volume of data. Feature selection is an important step in extracting valuable information from large amounts of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with Word embedding to vector (Word2vec) and BOW (Bag-of-words). The FS methods adopted for this study are CFS (Correlation based FS) and IG (Information Gain). To select an optimal FS method, a number of classifiers were used, ranging from LR (logistic regression), NN (neural network), and NBN (naive Bayesian network) to RF (random forest), RS (random subspace), and ST (stacking). Empirical results with electronics and kitchen datasets showed that LR and ST classifiers combined with IG applied to BOW features yield the best performance in opinion mining. Results with laptop and restaurant datasets revealed that the RF classifier using IG applied to Word2vec features yields the best performance in opinion mining.
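As a rough illustration of ranking bag-of-words features by information gain, the sketch below scores each word by IG(Y; X) = H(Y) - H(Y | X) on a tiny corpus. The corpus, labels, and function names are hypothetical and are not taken from the paper, which does not state its exact IG implementation in the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_present, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one binary bag-of-words feature."""
    total = len(labels)
    conditional = 0.0
    for value in (True, False):
        subset = [y for x, y in zip(feature_present, labels) if x == value]
        if subset:  # skip empty partitions
            conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy corpus: four reviews with sentiment labels.
docs = ["great phone", "terrible battery", "great screen", "terrible service"]
labels = ["pos", "neg", "pos", "neg"]

vocabulary = sorted({word for doc in docs for word in doc.split()})
scores = {
    word: information_gain([word in doc.split() for doc in docs], labels)
    for word in vocabulary
}
# Words sorted from most to least informative about the label.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In a pipeline of the kind the abstract describes, only the top-ranked words would be kept as classifier inputs; how many to keep is a tuning choice not specified here.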

Landslide Susceptibility Prediction using Evidential Belief Function, Weight of Evidence and Artificial Neural Network Models (Evidential Belief Function, Weight of Evidence 및 Artificial Neural Network 모델을 이용한 산사태 공간 취약성 예측 연구)

  • Lee, Saro;Oh, Hyun-Joo
    • Korean Journal of Remote Sensing / v.35 no.2 / pp.299-316 / 2019
  • The purpose of this study was to analyze landslide susceptibility in the Pyeongchang area using Weight of Evidence (WOE) and Evidential Belief Function (EBF) as probability models and Artificial Neural Networks (ANN) as a machine learning model in a geographic information system (GIS). This study examined the widespread shallow landslides triggered by heavy rainfall during Typhoon Ewiniar in 2006, which caused serious property damage and significant loss of life. For the landslide susceptibility mapping, 3,955 landslide occurrences were detected using aerial photographs, and environmental spatial data such as terrain, geology, soil, forest, and land use were collected and constructed in a spatial database. Seventeen factors that could affect landsliding were extracted from the spatial database. All landslides were randomly separated into two datasets, a training set (50%) and validation set (50%), to establish and validate the EBF, WOE, and ANN models. According to the validation results of the area under the curve (AUC) method, the accuracy was 74.73%, 75.03%, and 70.87% for WOE, EBF, and ANN, respectively. The EBF model had the highest accuracy. However, all models had predictive accuracy exceeding 70%, the level that is effective for landslide susceptibility mapping. These models can be applied to predict landslide susceptibility in an area where landslides have not occurred previously based on the relationships between landslide and environmental factors. This susceptibility map can help reduce landslide risk, provide guidance for policy and land use development, and save time and expense for landslide hazard prevention. In the future, more generalized models should be developed by applying landslide susceptibility mapping in various areas.
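For readers unfamiliar with Weight of Evidence, the following sketch computes the positive weight, negative weight, and contrast for one class of a factor map from cell counts, using the commonly cited log-ratio formulation. The counts and the function name are hypothetical, and the study's exact variant is not given in the abstract.

```python
import math


def weights_of_evidence(class_landslide, class_total, all_landslide, all_total):
    """Return (W+, W-, contrast) for one factor class from cell counts.

    Assumes every count involved is non-zero; a zero count would need
    smoothing before taking logarithms.
    """
    # Cells inside the class, split by landslide / no landslide.
    in_slide = class_landslide
    in_stable = class_total - class_landslide
    # Cells outside the class, split the same way.
    out_slide = all_landslide - class_landslide
    out_stable = (all_total - all_landslide) - in_stable

    slide_cells = all_landslide
    stable_cells = all_total - all_landslide

    # W+ compares how often the class occurs among landslide vs stable cells.
    w_plus = math.log((in_slide / slide_cells) / (in_stable / stable_cells))
    # W- does the same for the absence of the class.
    w_minus = math.log((out_slide / slide_cells) / (out_stable / stable_cells))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts: 30 of 100 landslide cells fall in a class that
# covers 200 of 2000 cells in the study area.
w_plus, w_minus, contrast = weights_of_evidence(30, 200, 100, 2000)
```

A positive contrast suggests the class is spatially associated with landslides; summing the applicable weights over all factors gives a relative susceptibility score per cell.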

Causal inference from nonrandomized data: key concepts and recent trends (비실험 자료로부터의 인과 추론: 핵심 개념과 최근 동향)

  • Choi, Young-Geun;Yu, Donghyeon
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.173-185 / 2019
  • Causal questions are prevalent in scientific research, for example, how effective a treatment was for preventing an infectious disease, how much a policy increased utility, or which advertisement would give the highest click rate for a given customer. Causal inference theory in statistics interprets those questions as inferring the effect of a given intervention (treatment or policy) in the data generating process. Causal inference has been used in medicine, public health, and economics; in addition, it has received recent attention as a tool for data-driven decision making processes. Many recent datasets are observational, rather than experimental, which makes causal inference more complex. This review introduces key concepts and recent trends of statistical causal inference in observational studies. We first introduce the Neyman-Rubin potential outcome framework to formalize causal questions as average treatment effects, and discuss popular methods to estimate treatment effects such as propensity score approaches and regression approaches. For recent trends, we briefly discuss (1) conditional (heterogeneous) treatment effects and machine learning-based approaches, (2) the curse of dimensionality in the estimation of treatment effects and its remedies, and (3) Pearl's structural causal model for dealing with more complex causal relationships and its connection to the Neyman-Rubin potential outcome model.
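The quantities named in this abstract can be written compactly. The block below states the average treatment effect in the potential outcome notation, the propensity score, and the standard inverse-probability-weighting estimator; it assumes unconfoundedness and overlap (0 < e(x) < 1), and the symbols are conventional choices rather than notation taken from the review itself.

```latex
% Potential outcomes Y(1), Y(0); observed treatment T, covariates X.
\tau = \mathbb{E}\left[ Y(1) - Y(0) \right],
\qquad
e(x) = \Pr(T = 1 \mid X = x)

% Inverse-probability-weighting estimate of \tau from n observed units.
\hat{\tau}_{\mathrm{IPW}}
  = \frac{1}{n} \sum_{i=1}^{n}
    \left[ \frac{T_i Y_i}{e(X_i)} - \frac{(1 - T_i)\, Y_i}{1 - e(X_i)} \right]
```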

Application of Google Search Queries for Predicting the Unemployment Rate for Koreans in Their 30s and 40s (한국 30~40대 실업률 예측을 위한 구글 검색 정보의 활용)

  • Jung, Jae Un;Hwang, Jinho
    • Journal of Digital Convergence / v.17 no.9 / pp.135-145 / 2019
  • A prolonged recession has caused the youth unemployment rate in Korea to remain at a high level of approximately 10% for years. Recently, the number of unemployed Koreans in their 30s and 40s has shown an upward trend. To expand the government's employment promotion and unemployment benefits from youth-centered policies to diverse age groups, including people in their 30s and 40s, prediction models for different age groups are required. Thus, we aimed to develop unemployment prediction models for specific age groups (30s and 40s) using available unemployment rates provided by Statistics Korea and Google search queries related to them. We first estimated multiple linear regressions (Model 1) using a seasonal autoregressive integrated moving average approach with relevant unemployment rates. Then, we introduced Google search queries to obtain improved models (Model 2). Consequently, for both groups, Model 2, which additionally uses web queries, outperformed Model 1 during both the training and prediction periods. This result indicates that web search queries remain significant for improving unemployment prediction models for Koreans. For practical application, this study needs further work, but it will contribute to obtaining age-wise unemployment predictions.

Study on Anomaly Detection Method of Improper Foods using Import Food Big data (수입식품 빅데이터를 이용한 부적합식품 탐지 시스템에 관한 연구)

  • Cho, Sanggoo;Choi, Gyunghyun
    • The Journal of Bigdata / v.3 no.2 / pp.19-33 / 2018
  • Owing to the increase in FTAs, growth in food trade, and the diverse preferences of consumers, food imports have increased at a tremendous rate every year. While the inspection of imported food accounts for about 20% of total food imports, the budget and manpower available for the government's import inspection control are reaching their limits. Sudden imported-food accidents can cause enormous social and economic losses. Therefore, a predictive system that forecasts the compliance of food imports and supports preemptive measures would greatly improve the efficiency and effectiveness of import safety control management. A huge amount of data has already been accumulated from the past. Processed foods account for 75% of total food imports in the import food sector. Big data analysis and analytical techniques are also used to extract meaningful information from large amounts of data. Unfortunately, few studies have analyzed imported food and its implications using food import big data. In this context, this study applied a variety of classification algorithms from the field of machine learning and suggested a data preprocessing method that generates new derived variables to improve the accuracy of the model. In addition, the present study compared the performance of the predictive classification algorithms with the general base classifier. Among the various base classifiers, the Gaussian Naïve Bayes prediction model showed the best performance in detecting and predicting the nonconformity of imported food. In the future, applying the anomaly detection model based on Gaussian Naïve Bayes is expected to reduce the burden of imported food inspection and increase the non-conformity detection rate, which will have a great effect on the efficiency of food import safety control and the speed of import customs clearance.
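To make the Gaussian Naïve Bayes classifier mentioned above concrete, here is a minimal pure-Python sketch: it fits a per-class mean and variance for each feature and predicts the class with the highest log posterior. The two features, the four training rows, and the class names are hypothetical and are not the study's variables or data.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate per-class priors, feature means, and feature variances."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(members) / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


# Hypothetical features per shipment: [past rejection rate, risk score].
rows = [[0.1, 1.0], [0.2, 1.2], [0.8, 3.0], [0.9, 3.3]]
labels = ["conforming", "conforming", "nonconforming", "nonconforming"]
model = fit_gaussian_nb(rows, labels)
```

Calling `predict(model, [0.85, 3.1])` then classifies a new shipment; in practice a library implementation with proper validation would be used instead of this sketch.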

Construction of a Bark Dataset for Automatic Tree Identification and Developing a Convolutional Neural Network-based Tree Species Identification Model (수목 동정을 위한 수피 분류 데이터셋 구축과 합성곱 신경망 기반 53개 수종의 동정 모델 개발)

  • Kim, Tae Kyung;Baek, Gyu Heon;Kim, Hyun Seok
    • Journal of Korean Society of Forest Science / v.110 no.2 / pp.155-164 / 2021
  • Many studies have been conducted on developing automatic plant identification algorithms by applying machine learning to various plant features, such as leaves and flowers. Unlike other plant characteristics, bark shows little change across seasons and persists for a long period. Nevertheless, bark has a complex shape with large variation depending on the environment, and there are insufficient materials that can be utilized to train algorithms. Here, in addition to the previously published bark image dataset, BarkNet v.1.0, images of bark were collected, and a dataset consisting of 53 tree species that can be easily observed in Korea is presented. A convolutional neural network (CNN) was trained and tested on the dataset, and the factors that interfere with the model's performance were identified. For the CNN architecture, VGG-16 and VGG-19 were utilized. VGG-16 achieved 90.41% accuracy and VGG-19 achieved 92.62% accuracy. When tested on new tree images that do not exist in the original dataset but belong to the same genus or family, more than 80% of cases were successfully identified as the same genus or family. Meanwhile, the model tended to misclassify when there were distracting features in the image, including leaves, mosses, and knots. For these cases, we propose that random cropping and classification by majority vote are valid for reducing possible errors in training and inference.
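The random-cropping and majority-vote idea from this abstract can be sketched as follows: classify several random crops of one image and return the most frequent label, so that a distracting patch affects only a minority of votes. The `toy_classifier` is a hypothetical stand-in for the trained CNN, and the crop size and crop count are arbitrary assumptions.

```python
import random
from collections import Counter


def random_crops(image, crop_size, n_crops, rng):
    """Yield n_crops square crops from a 2-D image given as a list of rows."""
    height, width = len(image), len(image[0])
    for _ in range(n_crops):
        top = rng.randint(0, height - crop_size)
        left = rng.randint(0, width - crop_size)
        yield [row[left:left + crop_size] for row in image[top:top + crop_size]]


def majority_vote(predictions):
    """Return the most frequent label among per-crop predictions."""
    return Counter(predictions).most_common(1)[0][0]


def classify_by_majority(image, classify_crop, crop_size=2, n_crops=9, seed=0):
    """Classify random crops of one image and combine them by majority vote."""
    rng = random.Random(seed)
    crops = random_crops(image, crop_size, n_crops, rng)
    return majority_vote(classify_crop(crop) for crop in crops)


def toy_classifier(crop):
    """Hypothetical stand-in for a CNN: thresholds the mean pixel value."""
    pixels = [p for row in crop for p in row]
    return "species_a" if sum(pixels) / len(pixels) >= 0.5 else "species_b"


# A 6x6 "bark" image with a dark 2x2 distractor patch (e.g. moss) in a corner.
bark = [[1.0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        bark[r][c] = 0.0

prediction = classify_by_majority(bark, toy_classifier)
```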

Spatial Conservation Prioritization Considering Development Impacts and Habitat Suitability of Endangered Species (개발영향과 멸종위기종의 서식적합성을 고려한 보전 우선순위 선정)

  • Mo, Yongwon
    • Korean Journal of Environment and Ecology / v.35 no.2 / pp.193-203 / 2021
  • As the number of endangered species is gradually increasing due to land development by humans, it is essential to proactively secure sufficient protected areas (PAs). Therefore, this study identified priority conservation areas in order to select candidate PAs while considering the impact of land development. We determined the conservation priorities by analyzing four scenarios based on existing conservation areas and reflecting the development impact using MARXAN, decision-support software for conservation planning. The development impact was derived using the developed area ratio, population density, road network system, and traffic volume. The conservation areas of endangered species were derived using the occurrence points of birds, mammals, and herptiles from the 3rd National Ecosystem Survey. These two factors were used as input data to map conservation priority areas with the machine learning-based optimization methodology. The results identified many non-PA areas that were expected to play an important role in conserving endangered species. When the land development impact was considered, the areas with priority for conservation were found to be fragmented. Even when both the development impact and existing PAs were considered, the priority was higher in areas away from the current PAs because many road developments had already been completed around the current PAs. Therefore, it is necessary to consider areas other than the current PAs to protect endangered species and to seek alternative measures for fragmented conservation priority areas.

A Study on the Awareness and Preparation of the Fourth Industrial Revolution of Some Health Department College Students (일부 보건계열학과 대학생의 4차 산업혁명 인식 및 준비도 연구)

  • Cho, Hye-Eun
    • Journal of the Korea Convergence Society / v.11 no.12 / pp.291-299 / 2020
  • The purpose of this study was to provide basic data for developing a future-oriented curriculum in health. Awareness of and preparation for the fourth industrial revolution were surveyed among 280 college students in health departments that train medical technicians. A self-administered structured questionnaire was used for data collection. Overall awareness of the fourth industrial revolution was 2.74; awareness of 3D printing (3.59) was high, and awareness of neural network machine learning (2.33) was the lowest. Students majoring in Physiotherapy (3.00) had the highest awareness and those majoring in Dental engineering (2.37) had the lowest, and awareness of IoT differed by major (p=0.024). Of the students, 54.5% were preparing for the fourth industrial revolution, and lack of interest (42.9%) was the most common difficulty in preparing; 50.6% had related educational experience and 60.9% had VR&AR game experience. Regarding the era of the fourth industrial revolution, concern about job loss (38.7%) was high, and the most required competency was creative capacity (50.6%). Therefore, it is necessary to develop a curriculum related to the fourth industrial revolution and apply teaching methods that can increase the awareness and preparation of health college students in the era of the fourth industrial revolution.

Extraction of Important Areas Using Feature Feedback Based on PCA (PCA 기반 특징 되먹임을 이용한 중요 영역 추출)

  • Lee, Seung-Hyeon;Kim, Do-Yun;Choi, Sang-Il;Jeong, Gu-Min
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.6 / pp.461-469 / 2020
  • In this paper, we propose a PCA-based feature feedback method for extracting important areas of handwritten digit datasets and face datasets. The method extends the previous LDA-based feature feedback method. In the proposed method, the data are reduced to important feature dimensions by applying PCA, a dimension-reduction machine learning algorithm. Through the weights derived during the dimension reduction process, the important points of the data along each reduced dimensional axis are identified. Each dimension axis carries a different weight in the total data according to the size of its eigenvalue. Accordingly, each dimension axis is given a weight proportional to the size of its eigenvalue, and the important points of the data along each dimension axis are summed. The critical area of the data is calculated by applying a threshold to the result of this summation. After that, the important area of the derived data is mapped back to the original data, and the important area in the original data space is selected. Experimental results on the MNIST dataset are examined, and the effectiveness and potential of the pattern recognition method based on PCA-based feature feedback are verified by comparing the results with the existing LDA-based feature feedback method.
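One possible reading of the eigenvalue-weighted step in this abstract is sketched below: each principal axis contributes the absolute size of its loadings, weighted by its share of the total eigenvalue, and a threshold selects the important input positions. Using absolute loadings, the function names, and the example eigenpairs are all assumptions made for illustration; the abstract does not specify these details.

```python
def importance_map(eigenvalues, eigenvectors):
    """Sum absolute loadings per input position, weighted by eigenvalue share."""
    total = sum(eigenvalues)
    scores = [0.0] * len(eigenvectors[0])
    for eigenvalue, vector in zip(eigenvalues, eigenvectors):
        weight = eigenvalue / total  # weight proportional to the eigenvalue
        for index, loading in enumerate(vector):
            scores[index] += weight * abs(loading)
    return scores


def important_area(scores, threshold):
    """Return indices of input positions whose score reaches the threshold."""
    return [index for index, score in enumerate(scores) if score >= threshold]


# Hypothetical eigenpairs for a 4-pixel input reduced to 2 principal axes.
eigenvalues = [3.0, 1.0]
eigenvectors = [
    [0.8, 0.6, 0.0, 0.0],  # first axis loads on pixels 0 and 1
    [0.0, 0.0, 0.6, 0.8],  # second axis loads on pixels 2 and 3
]
scores = importance_map(eigenvalues, eigenvectors)
selected = important_area(scores, threshold=0.3)
```

Because the first axis has the larger eigenvalue, its pixels dominate the scores; the selected indices can then be mapped back to positions in the original image.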

On the Effect of Air-Simulated Side-Jets on the Aerodynamic Characteristics of a Missile by Multi-Fidelity Modeling (다충실도 모형화를 통한 공기로 모사된 측방제트가 유도무기의 공력특성에 미치는 영향 연구)

  • Kang, Shinseong;Kang, Dayoung;Lee, Kyunghoon
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.49 no.2 / pp.95-106 / 2021
  • Side-jets enable more immediate maneuvering of a missile than control surfaces; however, they may adversely affect aerodynamic coefficients because they interfere with the freestream. To determine the impact of side-jets on aerodynamic coefficients, we simulate side-jets as air and utilize multi-fidelity models to evaluate differences between aerodynamic coefficients obtained with and without side-jets. We computed differences in aerodynamic coefficients to investigate side-jet effects with respect to changes in Mach number, bank angle, and angle of attack. As a result, asymmetrically developed side-jets affect the longitudinal force and moment coefficients, and the lateral force and moment coefficients change drastically for bank angles between -30 and 30 degrees. In contrast, side-jets hardly influence the axial force coefficients. As for the axial moment coefficient, we could not determine the side-jet effect due to a lack of aerodynamic coefficient samples with respect to Mach number. All in all, we confirm that side-jets change a missile's attitude because they considerably alter the longitudinal and lateral aerodynamic coefficients.