• Title/Summary/Keyword: 기계 학습.훈련 (machine learning, training)

Fuzzy Classification Algorithm for Incomplete Data (불완전 데이터 처리를 위한 퍼지 분류 알고리즘)

  • Lee, Chan-Hee;Park, Choong-shik;Woo, Young Woon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.05a / pp.387-390 / 2009
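  • Pattern classification is a very important research topic in machine learning. However, incomplete data not only occur very frequently in real-world settings but also lower how well a classification model can be trained. Many methods for handling incomplete data have been proposed, but most of them focus on the training phase. This paper proposes a classification algorithm for incomplete data that uses triangular fuzzy functions. In the proposed method, the missing values in incomplete feature vectors are inferred and learned, and the weights of the inferred data are applied to a triangular fuzzy function classifier. Experiments confirmed that the proposed method achieves a relatively high recognition rate.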
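
The abstract above does not give the classifier's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. `triangular_membership` is the standard triangular fuzzy membership function. `class_score` is a hypothetical helper, not taken from the paper, that combines per-feature memberships with per-feature weights, so that an inferred (originally missing) value can be given a lower weight than an observed one.

```python
def triangular_membership(x, a, b, c):
    """Degree in [0, 1] to which x belongs to a triangular fuzzy set.

    a and c are the feet of the triangle, b is its peak (a < b < c).
    """
    if not a < b < c:
        raise ValueError("expected a < b < c")
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


def class_score(features, fuzzy_sets, weights):
    """Weighted mean membership of one sample in one class.

    Assumption (not from the paper): each feature has one triangular set
    (a, b, c) per class, and inferred values carry a weight below 1.0.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        w * triangular_membership(x, *fs)
        for x, fs, w in zip(features, fuzzy_sets, weights)
    )
    return weighted / total


if __name__ == "__main__":
    sets_for_class = [(0, 5, 10), (0, 5, 10)]
    # Second feature was missing and has been inferred, so it gets weight 0.5.
    print(class_score([5, 2.5], sets_for_class, [1.0, 0.5]))
```

A sample would then be assigned to the class with the highest score. How the paper infers missing values and derives the weights is not stated in the abstract.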

Machine Learning Based State of Health Prediction Algorithm for Batteries Using Entropy Index (엔트로피 지수를 이용한 기계학습 기반의 배터리의 건강 상태 예측 알고리즘)

  • Sangjin, Kim;Hyun-Keun, Lim;Byunghoon, Chang;Sung-Min, Woo
    • Journal of IKEEE / v.26 no.4 / pp.531-536 / 2022
  • To manage batteries efficiently, it is important to accurately estimate and manage their SOH (State of Health) and RUL (Remaining Useful Life). Even batteries of the same type differ in characteristics such as facility capacity and voltage, and when the battery used to train the model differs from the battery whose state is predicted with the model, accuracy is limited. In this paper, we propose a generalized entropy index based on voltage distribution and discharge time. Four batteries are assigned alternately, one by one, as the training set and the test set, and the health status of the batteries is predicted through linear regression analysis, a machine learning technique. The proposed method showed a high accuracy of more than 95% as measured by MAPE (Mean Absolute Percentage Error).
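
For reference, MAPE as named in the abstract can be computed as in the sketch below. The function name `mape` and the sample values are illustrative only. Reading "accuracy of more than 95%" as 100 minus MAPE is an assumption, since the abstract does not define it.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for actual values of zero")
    errors = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(errors) / len(actual)


if __name__ == "__main__":
    # Illustrative numbers, not data from the paper.
    soh_true = [100.0, 200.0]
    soh_pred = [90.0, 210.0]
    error = mape(soh_true, soh_pred)
    print(f"MAPE = {error:.2f}%  (100 - MAPE = {100 - error:.2f}%)")
```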

Machine Learning-based Phase Picking Algorithm of P and S Waves for Distributed Acoustic Sensing Data (분포형 광섬유 센서 자료 적용을 위한 기계학습 기반 P, S파 위상 발췌 알고리즘 개발)

  • Yonggyu, Choi;Youngseok, Song;Soon Jee, Seol;Joongmoo, Byun
    • Geophysics and Geophysical Exploration / v.25 no.4 / pp.177-188 / 2022
  • Recently, the application of distributed acoustic sensors (DAS), which can replace geophones and seismometers, has significantly increased along with interest in micro-seismic monitoring, one of the CO2 storage monitoring techniques. A DAS monitoring system records a large amount of temporally and spatially continuous data, which calls for fast and accurate data processing techniques. Because event detection and seismic phase picking are the most basic data processing techniques, they should be performed on all data. In this study, a machine learning-based P and S wave phase picking algorithm was developed to compensate for the limitations of conventional phase picking algorithms, and it was modified using a transfer learning technique for application to DAS data, which consist of a single component with a low signal-to-noise ratio. Our model was constructed by modifying the convolution-based EQTransformer, which performs well in phase picking, to the ResUNet structure. Both the global earthquake dataset STEAD and an augmented dataset were used as training data to enhance prediction performance on unseen characteristics of the target dataset. The performance of the developed algorithm was verified using K-net and KiK-net data, whose characteristics differ from those of the training data. Additionally, after the trained model was adapted to DAS data using the transfer learning technique, its performance was verified by applying it to DAS field data measured in the Pohang Janggi basin.

A Model of Recursive Hierarchical Nested Triangle for Convergence from Lower-layer Sibling Practices (하위 훈련 성과 융합을 위한 순환적 계층 재귀 모델)

  • Moon, Hyo-Jung
    • Journal of Digital Contents Society / v.19 no.2 / pp.415-423 / 2018
  • In recent years, computer-based learning, such as machine learning and deep learning, has been attracting attention in the computer field. These methods start learning from the lowest level and propagate the result to the highest level to calculate the final result. Research literature has shown that systematic learning and growth can yield good results. However, systematic models based on systematic models are hard to find, compared to the various and extensive research attempts. To this end, this paper proposes the first TNT (Transitive Nested Triangle) model, a growth and fusion model that can be used in various aspects. This model can be said to be a recursive model in which each function formed through geometric forms an organic hierarchical relationship, and the results are used again as they grow and converge to the top. That is, it is an analytical method called 'Horizontal Sibling Merges and Upward Convergence'. This model is applicable to various aspects. In this study, we focus on explaining the TNT model.

Distributed Processing System Design and Implementation for Feature Extraction from Large-Scale Malicious Code (대용량 악성코드의 특징 추출 가속화를 위한 분산 처리 시스템 설계 및 구현)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • KIPS Transactions on Computer and Communication Systems / v.8 no.2 / pp.35-40 / 2019
  • Traditional malware detection is weak at detecting malware that has been modified by polymorphism or obfuscation techniques. By learning patterns embedded in malware code, machine learning algorithms can detect similar behaviors and replace current detection methods. Data must be collected continuously in order to learn malicious code patterns that change over time. However, storing and processing a large number of malware files incurs high space and time complexity. In this paper, an HDFS-based distributed processing system is designed to reduce space complexity and accelerate feature extraction. Using the distributed processing system, we extract two filtering-based API features, the 2-gram feature and the APICFG feature, and compare the generalization performance of ensemble learning models. In experiments, feature extraction was about 3.75 times faster than on a single computer, and space usage was about 5 times more efficient. The 2-gram feature gave the best classification performance among the features, but its learning time was long due to its high dimensionality.
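
As a rough illustration of what a 2-gram feature over API calls looks like, the sketch below counts adjacent call pairs in a sequence and maps them onto a fixed vocabulary. The function names, the sample call sequence, and the vocabulary are assumptions for illustration. The paper's filtering step and its HDFS pipeline are not reproduced here.

```python
from collections import Counter


def api_bigrams(calls):
    """Count adjacent pairs (2-grams) in a sequence of API call names."""
    return Counter(zip(calls, calls[1:]))


def to_vector(bigram_counts, vocabulary):
    """Project 2-gram counts onto a fixed, ordered vocabulary of pairs."""
    return [bigram_counts.get(pair, 0) for pair in vocabulary]


if __name__ == "__main__":
    # Illustrative call trace, not taken from the paper's dataset.
    trace = ["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "WriteFile"]
    counts = api_bigrams(trace)
    vocab = [("CreateFileW", "WriteFile"), ("WriteFile", "CloseHandle"),
             ("RegOpenKeyExW", "RegSetValueExW")]
    print(to_vector(counts, vocab))
```

With V distinct API names, the vocabulary can hold up to V * V pairs, which is consistent with the abstract's remark that the 2-gram feature is high-dimensional.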

Psalm Text Generator Comparison Between English and Korean Using LSTM Blocks in a Recurrent Neural Network (순환 신경망에서 LSTM 블록을 사용한 영어와 한국어의 시편 생성기 비교)

  • Snowberger, Aaron Daniel;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.269-271 / 2022
  • In recent years, RNNs with LSTM blocks have been used extensively in machine learning tasks that process sequential data. These networks have proven to be particularly good at sequential language processing tasks because they predict the next most likely word in a given sequence more accurately than traditional neural networks. This study trained an RNN/LSTM neural network on three different translations of 150 biblical Psalms, in both English and Korean. The resulting model is then fed an input word and a length, from which it automatically generates a new Psalm of the desired length based on the patterns it recognized during training. The results of training the network on both English text and Korean text are compared and discussed.

Dust Prediction System based on Incremental Deep Learning (증강형 딥러닝 기반 미세먼지 예측 시스템)

  • Sung-Bong Jang
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.301-307 / 2023
  • Deep learning requires building a deep neural network, collecting a large amount of training data, and then training the network for a long time. If training does not proceed properly or overfitting occurs, training fails. With the deep learning tools developed so far, collecting training data and training both take a lot of time. However, due to the rapid advent of the mobile environment and the increase in sensor data, demand is growing quickly for real-time deep learning technology that can dramatically reduce the time required for neural network training. In this study, a real-time deep learning system was implemented using an Arduino system equipped with a fine dust sensor. In the implemented system, fine dust data is measured every 30 seconds, and when up to 120 measurements are accumulated, learning is performed using the previously accumulated data and the newly accumulated data together as the dataset. The neural network was composed of one input layer, one hidden layer, and one output layer. To evaluate the performance of the implemented system, learning time and root mean square error (RMSE) were measured. In the experiment, the average learning error was 0.04053796, and the average learning time of one epoch was about 3,447 seconds.
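
The accumulate-then-retrain loop and the RMSE metric described above can be sketched as follows. `IncrementalBuffer` is a hypothetical helper, not the paper's implementation. Triggering retraining at exactly 120 new readings is an assumption based on the abstract's "up to 120", and the network itself is omitted.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


class IncrementalBuffer:
    """Collects sensor readings and releases a growing training set.

    Assumption: a retraining round starts once `batch_size` new readings
    have arrived, using all earlier readings plus the new ones.
    """

    def __init__(self, batch_size=120):
        self.batch_size = batch_size
        self.history = []   # readings already used for training
        self.pending = []   # new readings not yet used

    def add(self, reading):
        """Store one reading; return the full dataset when a batch completes."""
        self.pending.append(reading)
        if len(self.pending) < self.batch_size:
            return None
        self.history.extend(self.pending)
        self.pending = []
        return list(self.history)


if __name__ == "__main__":
    buffer = IncrementalBuffer(batch_size=3)  # small batch for demonstration
    for value in [12.0, 15.0, 11.0, 30.0, 28.0, 31.0]:
        dataset = buffer.add(value)
        if dataset is not None:
            print(f"retrain on {len(dataset)} readings")
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```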

A Learning Agent for Automatic Bookmark Classification (북 마크 자동 분류를 위한 학습 에이전트)

  • Kim, In-Cheol;Cho, Soo-Sun
    • The KIPS Transactions:PartB / v.8B no.5 / pp.455-462 / 2001
  • The World Wide Web has become one of the major services provided through the Internet. When searching the vast web space, users use bookmarking facilities to record the sites of interest encountered during navigation. One of the typical problems arising from bookmarking is that the list of bookmarks loses coherent organization when it becomes too lengthy, thus ceasing to function as a practical finding aid. To maintain the bookmark file in an efficient, organized manner, the user has to classify all bookmarks newly added to the file and update the folders. This paper introduces our learning agent, called BClassifier, which automatically classifies bookmarks by analyzing the contents of the corresponding web documents. The chief source of training examples is the set of bookmarks the user has already classified into bookmark folders according to their subject. Additionally, the web pages found under the top categories of the Yahoo site are collected and included in the training examples, to diversify the subject categories represented and to supply training examples for those categories. Our agent employs the naive Bayesian learning method, a well-tested, probability-based categorizing technique. In this paper, the outcome of some experimentation is also outlined and evaluated. A comparison of the naive Bayesian learning method with other learning methods such as k-Nearest Neighbor and TFIDF is also presented.
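
To make the naive Bayesian categorization step concrete, here is a minimal multinomial naive Bayes sketch with Laplace smoothing. It is not BClassifier itself: the class name `NaiveBayesText`, the tokenized toy documents, and the folder names are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial naive Bayes over tokenized documents (Laplace smoothing)."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per folder
        self.word_counts = defaultdict(Counter)  # word frequencies per folder
        self.vocab = set()

    def fit(self, docs, labels):
        for words, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)      # log prior
            for w in words:
                if w in self.vocab:                    # skip unseen words
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy pages already filed into folders by the user (illustrative only).
    pages = [["python", "code", "tutorial"],
             ["recipe", "pasta", "sauce"],
             ["python", "library"]]
    folders = ["programming", "cooking", "programming"]
    model = NaiveBayesText().fit(pages, folders)
    print(model.predict(["python", "tutorial"]))
```

Working in log space avoids numeric underflow when a page contains many words.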

A Study on Selective Sampling using SOM (SOM을 적용한 선택적 샘플링에 관한 연구)

  • Kim, Man-Sun;Yang, Hyung-Jeong;Kim, Jeong-Sik;Kim, Sun-Hee
    • Annual Conference of KIPS / 2007.11a / pp.38-41 / 2007
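  • Feeding the large volumes of data collected for data mining into machine learning without filtering not only requires a great deal of time and cost but is also inefficient in terms of storage space. Selective sampling is a method that generates new training data reflecting the characteristics of the original data as closely as possible, so that learning can be applied efficiently in such situations. In this study, we conduct experiments to effectively resolve the various selection problems that arise when performing selective sampling with SOM, a type of neural network. The experiments yielded two results: 1) a sufficiently large map size must be chosen for the map to reflect the implicit characteristics of the training data well, and 2) in the unit selection method for selective sampling, removing meaningless units improves classification performance.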

Comparison of Korean Classification Models' Korean Essay Score Range Prediction Performance (한국어 학습 모델별 한국어 쓰기 답안지 점수 구간 예측 성능 비교)

  • Cho, Heeryon;Im, Hyeonyeol;Yi, Yumi;Cha, Junwoo
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.133-140 / 2022
  • We investigate the performance of deep learning-based Korean language models on the task of predicting the score range of Korean essays written by foreign students. We construct a data set containing a total of 304 essays, which include essays discussing the criteria for choosing a job ('job'), conditions of a happy life ('happ'), the relationship between money and happiness ('econ'), and the definition of success ('succ'). These essays were labeled according to four letter grades (A, B, C, and D), and a total of eleven essay score range prediction experiments were conducted (i.e., five for predicting the score range of 'job' essays, five for predicting the score range of 'happ' essays, and one for predicting the score range of mixed-topic essays). Three deep learning-based Korean language models, KoBERT, KcBERT, and KR-BERT, were fine-tuned using various training data. Moreover, two traditional probabilistic machine learning classifiers, naive Bayes and logistic regression, were also evaluated. Experimental results show that the deep learning-based Korean language models performed better than the two traditional classifiers, with KR-BERT performing best at 55.83% overall average prediction accuracy. A close second was KcBERT (55.77%), followed by KoBERT (54.91%). The naive Bayes and logistic regression classifiers achieved 52.52% and 50.28%, respectively. Due to the scarcity of training data and the imbalance in class distribution, the overall prediction performance was not high for any of the classifiers. Moreover, the classifiers' vocabulary did not explicitly capture the error features that were helpful in correctly grading the Korean essays. By overcoming these two limitations, we expect the score range prediction performance to improve.