• Title/Summary/Keyword: 기계 학습.훈련

Multi-Label Classification for Corporate Review Text: A Local Grammar Approach (머신러닝 기반의 기업 리뷰 다중 분류: 부분 문법 적용을 중심으로)

  • HyeYeon Baek;Young Kyun Chang
    • Information Systems Review / v.25 no.3 / pp.27-41 / 2023
  • Unlike the previous works focusing on the state-of-the-art methodologies to improve the performance of machine learning models, this study improves the 'quality' of training data used in machine learning. We propose a method to enhance the quality of training data through the processing of 'local grammar,' frequently used in corpus analysis. We collected a vast amount of unstructured corporate review text data posted by employees working in the top 100 companies in Korea. After improving the data quality using the local grammar process, we confirmed that the classification model with local grammar outperformed the model without it in terms of classification performance. We defined five factors of work engagement as classification categories, and analyzed how the pattern of reviews changed before and after the COVID-19 pandemic. Through this study, we provide evidence that shows the value of the local grammar-based automatic identification and classification of employee experiences, and offer some clues for significant organizational cultural phenomena.

Analysis of PBL for Korean Apprenticeship Program in Mechanical Engineering (기계분야 일학습병행제에서의 PBL 실태 분석)

  • Chang, Hea Jung;Kang, Seonae
    • Journal of Practical Engineering Education / v.13 no.3 / pp.515-532 / 2021
  • The purpose of this study was to analyze the use of PBL in the Korean Apprenticeship Program in mechanical engineering. The details of the study were as follows. First, perceptions of PBL in the Korean Apprenticeship Program were investigated. Second, the utilization and operational difficulties of PBL in the program were investigated. Third, a support system for PBL was suggested. The research methods were literature research, a questionnaire survey, and focus group interviews (FGI). The survey was conducted online from July 15 to August 14, 2021. A total of 515 respondents responded, of whom 108 were in mechanical engineering. The FGI was conducted with a total of 25 people who actually use PBL in the field of the Korean Apprenticeship Program. Conclusions and suggestions based on the results of this study are as follows. First, it is necessary to improve the utilization of PBL for the Korean Apprenticeship Program in industry. Second, PBL needs to be applied selectively according to the job and field situation. Third, a support system for evaluating PBL in the Korean Apprenticeship Program is necessary. Finally, a related operation model and guidelines need to be prepared for best practice.

Application of Machine Learning Algorithm and Remote-sensed Data to Estimate Forest Gross Primary Production at Multi-sites Level (산림 총일차생산량 예측의 공간적 확장을 위한 인공위성 자료와 기계학습 알고리즘의 활용)

  • Lee, Bora;Kim, Eunsook;Lim, Jong-Hwan;Kang, Minseok;Kim, Joon
    • Korean Journal of Remote Sensing / v.35 no.6_2 / pp.1117-1132 / 2019
  • Forest covers 30% of the Earth's land area and plays an important role in global carbon flux through its ability to store much greater amounts of carbon than other terrestrial ecosystems. The Gross Primary Production (GPP) represents the productivity of forest ecosystems according to climate change and its effect on the phenology, health, and carbon cycle. In this study, we estimated the daily GPP for a forest ecosystem using remote-sensed data from the Moderate Resolution Imaging Spectroradiometer (MODIS) and a machine learning algorithm, the Support Vector Machine (SVM). MODIS products covering 75% to 80% of the total study period were used to train the SVM model, which was validated using eddy covariance (EC) measurement data at six flux tower sites. We also compared the GPP derived from EC and MODIS (MYD17). The MODIS products were organized into two data sets: Processed MODIS, which included variables calculated by combining products (e.g., Vapor Pressure Deficit), and Unprocessed MODIS, which used MODIS products without any combined calculation. Statistical measures, including the Pearson correlation coefficient (R), mean squared error (MSE), and root mean square error (RMSE), were used to evaluate the outcomes of the model. In general, the SVM model trained on Unprocessed MODIS data from multiple sites (R = 0.77 - 0.94, p < 0.001) outperformed those trained at a single site (R = 0.75 - 0.95, p < 0.001). These results show that models trained on data including various events perform better, and suggest the possibility of using remote-sensed data without complex processing to estimate non-stationary ecological processes such as GPP.
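The three evaluation statistics named in this abstract (R, MSE, RMSE) can be sketched in a few lines. This is only an illustrative sketch using the Python standard library; the function names and the `observed`/`predicted` values are hypothetical and are not taken from the paper.

```python
import math

def pearson_r(obs, pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (dev_o * dev_p)

def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error (square root of the MSE)."""
    return math.sqrt(mse(obs, pred))

# Hypothetical daily GPP values (e.g., tower-observed vs. model-predicted).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(pearson_r(observed, predicted), mse(observed, predicted), rmse(observed, predicted))
```

RMSE is in the same units as GPP, which makes it easier to interpret than MSE, while R only measures how well predictions track the observations and says nothing about bias.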

A Comparative Study between Vocational Training Using Virtual Reality and Traditional Training: Focusing on Industrial Cranes (가상현실을 활용한 직업훈련과 전통적인 훈련과의 비교연구: 산업용크레인을 중심으로)

  • Seong-Yeon Mun;Hyun-Jung Oh;Sang-Joon Lee
    • Journal of Practical Engineering Education / v.16 no.4 / pp.529-540 / 2024
  • In industrial sites, experiential virtual training contents are partially used to replace high-risk and high-cost training, and the development of virtual training contents is becoming more active along with the increasing demand for non-face-to-face industries. Existing studies mainly focused on quantitative research through surveys, and only measured the change in users' learning commitment. This experimental study investigated whether combining theoretical education with virtual training improves actual job performance in a dual vocational training environment (work and learning in parallel) in which companies and schools participate. Pre- and post-evaluation results for traditional vocational training and for vocational training using virtual training contents were compared for 24 vocational trainees. The study demonstrated that the outcome of virtual training education was higher than that of traditional vocational training, and that theoretical education was more effective when combined with virtual reality-based education. This study suggests that virtual training content presents a new paradigm for industrial safety education, and interviews with trainees confirmed that virtual training can lead to a change in attitude toward safety beyond just knowledge transfer. This contributes to the prevention of safety accidents in industrial sites and provides important implications for improving the quality of vocational training.

Feature Selection for Anomaly Detection Based on Genetic Algorithm (유전 알고리즘 기반의 비정상 행위 탐지를 위한 특징선택)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.9 no.7 / pp.1-7 / 2018
  • Feature selection, a data preprocessing technique, is a major research area in many applications dealing with large datasets. It has been used in pattern recognition, machine learning, and data mining, and is now widely applied in a variety of fields such as text classification, image retrieval, intrusion detection, and genome analysis. The proposed method is based on a genetic algorithm, which is a meta-heuristic algorithm. There are two methods of finding feature subsets: the filter method and the wrapper method. In this study, we use the wrapper method, which evaluates feature subsets using a real classifier, to find an optimal feature subset. The training dataset used in the experiment has a severe class imbalance, which makes it difficult to improve classification performance for rare classes. After preprocessing the training dataset with SMOTE, we select features and evaluate them with various machine learning algorithms.
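The idea of wrapper-based feature selection with a genetic algorithm can be illustrated with a small sketch. Everything below is an assumption made for illustration: the toy data, the nearest-centroid classifier used as the wrapped model, and the GA settings (`pop_size`, `generations`, `p_mut`) are hypothetical and are not the paper's configuration. The SMOTE preprocessing step is omitted.

```python
import random

def nearest_centroid_accuracy(train, test, mask):
    """Wrapper fitness: accuracy of a nearest-centroid classifier
    that only sees the features switched on in `mask`."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(idx))
        for k, i in enumerate(idx):
            acc[k] += x[i]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
    correct = 0
    for x, y in test:
        pred = min(centroids, key=lambda c: sum(
            (x[i] - m) ** 2 for i, m in zip(idx, centroids[c])))
        correct += pred == y
    return correct / len(test)

def ga_select(train, test, n_features, pop_size=12, generations=15,
              p_mut=0.1, seed=0):
    """Tiny GA over bit masks: truncation selection, one-point
    crossover, bit-flip mutation, and elitism."""
    rng = random.Random(seed)
    fitness = lambda m: nearest_centroid_accuracy(train, test, m)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[:pop_size // 2]
        children = [pop[0][:]]  # elitism: keep the current best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best), history

def make_toy(n, rng):
    """Hypothetical data: features 0 and 1 carry the label, 2..5 are noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [y * 2.0 + rng.gauss(0, 0.5), -y * 2.0 + rng.gauss(0, 0.5)]
        x += [rng.gauss(0, 3.0) for _ in range(4)]
        data.append((x, y))
    return data

rng = random.Random(1)
train, test = make_toy(80, rng), make_toy(40, rng)
best_mask, best_acc, history = ga_select(train, test, n_features=6)
print(best_mask, best_acc)
```

Because the fitness function trains and scores a classifier for every candidate mask, a wrapper search like this is more expensive than a filter method, but it scores feature subsets by the metric that actually matters for the downstream model.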

Performance Evaluation of Machine Learning and Deep Learning Algorithms in Crop Classification: Impact of Hyper-parameters and Training Sample Size (작물분류에서 기계학습 및 딥러닝 알고리즘의 분류 성능 평가: 하이퍼파라미터와 훈련자료 크기의 영향 분석)

  • Kim, Yeseul;Kwak, Geun-Ho;Lee, Kyung-Do;Na, Sang-Il;Park, Chan-Won;Park, No-Wook
    • Korean Journal of Remote Sensing / v.34 no.5 / pp.811-827 / 2018
  • The purpose of this study is to compare machine learning and deep learning algorithms in crop classification using multi-temporal remote sensing data. For this, the impacts of (1) hyper-parameters and (2) training sample size on machine learning and deep learning algorithms were compared and analyzed for Haenam-gun, Korea and Illinois State, USA. In the comparison experiment, support vector machine (SVM) was applied as the machine learning algorithm and convolutional neural network (CNN) was applied as the deep learning algorithm. In particular, a 2D-CNN considering 2-dimensional spatial information and a 3D-CNN that extends the 2D-CNN with a time dimension were applied as CNNs. The experiment showed that, when various hyper-parameters were considered, the optimal hyper-parameter values of CNN in the two study areas were more similar than those of SVM. Based on this result, although it takes much time to optimize a CNN model, it is considered possible to apply transfer learning that extends an optimized CNN model to other regions. In the experiments with various training sample sizes, the impact of sample size on CNN was larger than on SVM. In particular, this impact was more pronounced in Illinois State, which has heterogeneous spatial patterns. In addition, the 3D-CNN showed the lowest classification performance in Illinois State, which is considered to be due to over-fitting caused by the complexity of the model. That is, the classification performance was relatively degraded due to heterogeneous patterns and the noise effect of input data, although the training accuracy of the 3D-CNN model was high. This result implies that a proper classification algorithm should be selected considering the spatial characteristics of study areas. Also, a large number of training samples is necessary to guarantee higher classification performance in CNN, particularly in 3D-CNN.

Development of Evaluation Model for Learning Company Participating Work-Study Parallel Program using AHP (AHP를 활용한 일학습병행 학습기업 평가모형 개발)

  • Dong-Wook Kim;Hwan Young Choi
    • Journal of Practical Engineering Education / v.15 no.3 / pp.671-679 / 2023
  • This study aims to establish an evaluation model by quantifying the evaluation index, as a follow-up study to the development of an evaluation index for work-study parallel learning companies. An evaluation model was established by verifying the second-level components based on the quantitative factors of the learning company, the qualitative factors, the competency factors of the person in charge, and the competency factors of the learning workers, which are the highest-level components derived from a previous study. For the evaluation of a learning company, an AHP survey was conducted with experts in charge of company consulting to derive the important factors that determine the quality of on-site education and training, and the evaluation model of the learning company was completed and grouping was carried out by calculating the weights among evaluation items. The work-study parallel program was promoted as a key policy to resolve the mismatch between industrial sites and school education and to realize a competency-centered society, and as of December 2022, 16,664 companies had participated in the training. Learning companies play a very important role as education and training supply organizations that conduct field training. It is expected that the support and consulting plan for each level of learning companies according to the evaluation model presented in this study will be used as basic data to improve the quality of the work-study parallel program.
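How AHP turns pairwise expert judgments into item weights can be shown with a short sketch. It uses the row geometric mean method and Saaty's consistency ratio; the comparison matrix below is hypothetical (the abstract does not report the survey values), and the four rows are simply assumed to stand for the four top-level factors.

```python
import math

def ahp_weights(matrix):
    """Priority weights from a pairwise comparison matrix
    using the row geometric mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, weights, random_index):
    """Saaty's consistency ratio CR = CI / RI, where
    CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / random_index

# Hypothetical judgments: entry [i][j] says how much more important
# factor i is than factor j (reciprocal matrix, perfectly consistent here).
comparisons = [
    [1.0,   2.0,  4.0, 8.0],
    [1 / 2, 1.0,  2.0, 4.0],
    [1 / 4, 1 / 2, 1.0, 2.0],
    [1 / 8, 1 / 4, 1 / 2, 1.0],
]

weights = ahp_weights(comparisons)
cr = consistency_ratio(comparisons, weights, random_index=0.90)  # RI for n = 4
print(weights, cr)
```

A consistency ratio below 0.1 is the conventional cutoff for accepting a set of judgments; real survey responses are rarely perfectly consistent, so the ratio is normally checked for each respondent before weights are aggregated.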

KOMPSAT-3A Urban Classification Using Machine Learning Algorithm - Focusing on Yang-jae in Seoul - (기계학습 기법에 따른 KOMPSAT-3A 시가화 영상 분류 - 서울시 양재 지역을 중심으로 -)

  • Youn, Hyoungjin;Jeong, Jongchul
    • Korean Journal of Remote Sensing / v.36 no.6_2 / pp.1567-1577 / 2020
  • Urban land cover classification plays a role in urban planning and management, so it is important to improve classification accuracy in urban locations. In this paper, two machine learning models, Support Vector Machine (SVM) and Artificial Neural Network (ANN), are proposed for urban land cover classification based on high resolution satellite imagery (KOMPSAT-3A). Training data were created from the satellite image on a 25 m rectangular grid, and the trained models were used to classify the test area. During the validation process, we presented a confusion matrix for each result with 250 Ground Truth Points (GTP). Of the four SVM kernels and the two ANN activation functions, the SVM polynomial kernel model had the highest accuracy, at 86%. In comparing the SVM and ANN using GTP, the SVM model was more effective than the ANN model for KOMPSAT-3A classification. Among the four classes (building, road, vegetation, and bare soil), the building class showed the lowest classification accuracy due to shadows cast by high-rise buildings.

Network Anomaly Detection Technologies Using Unsupervised Learning AutoEncoders (비지도학습 오토 엔코더를 활용한 네트워크 이상 검출 기술)

  • Kang, Koohong
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.4 / pp.617-629 / 2020
  • In order to overcome the limitations of rule-based intrusion detection systems due to changes in Internet computing environments, the emergence of new services, and the creativity of attackers, network anomaly detection (NAD) using machine learning and deep learning technologies has received much attention. Most existing machine learning and deep learning technologies for NAD use supervised learning methods to learn from a training data set labeled 'normal' and 'attack'. This paper presents the feasibility of applying the unsupervised learning AutoEncoder (AE) to NAD using data sets collected from secured network traffic without labeled responses. To verify the performance of the proposed AE model, we present experimental results in terms of accuracy, precision, recall, f1-score, and ROC AUC value on the NSL-KDD training and test data sets. In particular, we model a reference AE through a deep analysis of diverse AEs, varying hyper-parameters such as the number of layers and considering regularization and denoising effects. The reference model shows f1-scores of 90.4% and 89% for binary classification on the KDDTest+ and KDDTest-21 test data sets, respectively, based on a threshold at the 82nd percentile of the AE reconstruction error of the training data set.
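The thresholding step described in this abstract can be sketched as follows. The sketch assumes that reconstruction errors have already been computed by some autoencoder, and that records whose error exceeds the training-set percentile are the ones flagged as anomalous; the error values and function names are hypothetical, and the percentile uses linear interpolation between order statistics.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac

def flag_anomalies(train_errors, test_errors, q=82):
    """Set the threshold at the q-th percentile of training reconstruction
    errors and flag test records whose error exceeds it."""
    threshold = percentile(train_errors, q)
    return threshold, [e > threshold for e in test_errors]

# Hypothetical reconstruction errors (not real NSL-KDD values).
train_errors = [float(i) for i in range(1, 101)]
threshold, flags = flag_anomalies(train_errors, [10.0, 50.0, 90.0, 99.5])
print(threshold, flags)
```

Choosing the percentile trades recall against false alarms: a lower percentile flags more traffic as anomalous, a higher one flags less.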

Improvement of generalization of linear model through data augmentation based on Central Limit Theorem (데이터 증가를 통한 선형 모델의 일반화 성능 개량 (중심극한정리를 기반으로))

  • Hwang, Doohwan
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.19-31 / 2022
  • In machine learning, we usually divide the entire data into training data and test data, train the model using the training data, and use the test data to determine the accuracy and generalization performance of the model. In the case of models with low generalization performance, the prediction accuracy on new data is significantly reduced, and the model is said to be overfit. This study concerns a method of generating training data based on the central limit theorem, combining it with the existing training data to increase normality, and using this data to train models and increase generalization performance. To this end, data were generated using the sample mean and standard deviation of each feature by utilizing the characteristics of the central limit theorem, and new training data were constructed by combining them with the existing training data. To determine the degree of increase in normality, the Kolmogorov-Smirnov normality test was conducted, and it was confirmed that the new training data showed increased normality compared to the existing data. Generalization performance was measured through the difference in prediction accuracy between training data and test data. When the degree of increase in generalization performance was measured by applying this to K-Nearest Neighbors (KNN), Logistic Regression, and Linear Discriminant Analysis (LDA), it was confirmed that generalization performance improved for KNN, a non-parametric technique, and for LDA, which assumes normality in model building.
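One possible reading of the augmentation step is sketched below: draw synthetic rows from a normal distribution parameterized by each feature's sample mean and standard deviation, then append them to the existing training rows. This is an assumption-laden illustration, not the paper's procedure; in particular it ignores class labels and correlations between features, and the function names and data are hypothetical.

```python
import random
import statistics

def augment_with_normal_samples(rows, n_new, seed=0):
    """Append n_new synthetic rows whose features are drawn independently
    from Normal(sample mean, sample std) of each original feature."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    synthetic = [[rng.gauss(m, s) for m, s in zip(means, stds)]
                 for _ in range(n_new)]
    return [list(r) for r in rows] + synthetic

def generalization_gap(train_accuracy, test_accuracy):
    """Difference between training and test accuracy;
    a smaller gap suggests better generalization."""
    return train_accuracy - test_accuracy

# Hypothetical training rows with two features each.
original = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [4.0, 13.0]]
augmented = augment_with_normal_samples(original, n_new=6)
print(len(augmented), generalization_gap(0.95, 0.80))
```

In a labeled setting, one natural variant would be to compute the means and standard deviations separately per class so that each synthetic row can inherit a label, but whether the paper does this is not stated in the abstract.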