• Title/Summary/Keyword: 비지도 학습.

Search Result 225, Processing Time 0.024 seconds

A Spam Filter System Based on Maximum Entropy Model Using Co-training with Spamminess Features and URL Features (스팸성 자질과 URL 자질의 공동 학습을 이용한 최대 엔트로피 기반 스팸메일 필터 시스템)

  • Gong, Mi-Gyoung;Lee, Kyung-Soon
    • The KIPS Transactions:PartB
    • /
    • v.15B no.1
    • /
    • pp.61-68
    • /
    • 2008
  • This paper presents a spam filter system using co-training with spamminess features and URL features based on the maximum entropy model. Spamminess features are the emphasizing patterns or abnormal patterns in spam messages used by spammers to express their intention and to avoid being filtered by the spam filter system. Since spammers use URLs to give the details and make a change to the URL format not to be filtered by the black list, normal and abnormal URLs can be key features to detect the spam messages. Co-training with spamminess features and URL features uses two different features which are independent each other in training. The filter system can learn information from them independently. Experiment results on TREC spam test collection shows that the proposed approach achieves 9.1% improvement and 6.9% improvement in accuracy compared to the base system and bogo filter system, respectively. The result analysis shows that the proposed spamminess features and URL features are helpful. And an experiment result of the co-training shows that two feature sets are useful since the number of training documents are reduced while the accuracy is closed to the batch learning.

A Reconstruction of Classification for Iris Species Using Euclidean Distance Based on a Machine Learning (머신러닝 기반 유클리드 거리를 이용한 붓꽃 품종 분류 재구성)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.2
    • /
    • pp.225-230
    • /
    • 2020
  • Machine learning is an algorithm which learns a computer based on the data so that the computer can identify the trend of the data and predict the output of new input data. Machine learning can be classified into supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is a way of learning a machine with given label of data. In other words, a method of inferring a function of the system through a pair of data and a label is used to predict a result using a function inferred about new input data. If the predicted value is continuous, regression analysis is used. If the predicted value is discrete, it is used as a classification. A result of analysis, no. 8 (5, 3.4, setosa), 27 (5, 3.4, setosa), 41 (5, 3.5, setosa), 44 (5, 3.5, setosa) and 40 (5.1, 3.4, setosa) in Table 3 were classified as the most similar Iris flower. Therefore, theoretical practical are suggested.

A Study on Automatic Classification of Newspaper Articles Based on Unsupervised Learning by Departments (비지도학습 기반의 행정부서별 신문기사 자동분류 연구)

  • Kim, Hyun-Jong;Ryu, Seung-Eui;Lee, Chul-Ho;Nam, Kwang Woo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.9
    • /
    • pp.345-351
    • /
    • 2020
  • Administrative agencies today are paying keen attention to big data analysis to improve their policy responsiveness. Of all the big data, news articles can be used to understand public opinion regarding policy and policy issues. The amount of news output has increased rapidly because of the emergence of new online media outlets, which calls for the use of automated bots or automatic document classification tools. There are, however, limits to the automatic collection of news articles related to specific agencies or departments based on the existing news article categories and keyword search queries. Thus, this paper proposes a method to process articles using classification glossaries that take into account each agency's different work features. To this end, classification glossaries were developed by extracting the work features of different departments using Word2Vec and topic modeling techniques from news articles related to different agencies. As a result, the automatic classification of newspaper articles for each department yielded approximately 71% accuracy. This study is meaningful in making academic and practical contributions because it presents a method of extracting the work features for each department, and it is an unsupervised learning-based automatic classification method for automatically classifying news articles relevant to each agency.

3D Human Shape Deformation using Deep Learning (딥러닝을 이용한 3차원 사람모델형상 변형)

  • Kim, DaeHee;Hwang, Bon-Woo;Lee, SeungWook;Kwak, Sooyeong
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.25 no.2
    • /
    • pp.19-27
    • /
    • 2020
  • Recently, rapid and accurate 3D models creation is required in various applications using virtual reality and augmented reality technology. In this paper, we propose an on-site learning based shape deformation method which transforms the clothed 3D human model into the shape of an input point cloud. The proposed algorithm consists of two main parts: one is pre-learning and the other is on-site learning. Each learning consists of encoder, template transformation and decoder network. The proposed network is learned by unsupervised method, which uses the Chamfer distance between the input point cloud form and the template vertices as the loss function. By performing on-site learning on the input point clouds during the inference process, the high accuracy of the inference results can be obtained and presented through experiments.

Study on hole-filling technique of motion capture images using GANs (Generative Adversarial Networks) (GANs(Generative Adversarial Networks)를 활용한 모션캡처 이미지의 hole-filling 기법 연구)

  • Shin, Kwang-Seong;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.160-161
    • /
    • 2019
  • As a method for modeling a three-dimensional object, there are a method using a 3D scanner, a method using a motion capture system, and a method using a Kinect system. Through this method, a portion that is not captured due to occlusion occurs in the process of creating a three-dimensional object. In order to implement a perfect three-dimensional object, it is necessary to arbitrarily fill the obscured part. There is a technique to fill the unexposed part by various image processing methods. In this study, we propose a method using GANs, which is the latest trend of unsupervised machine learning, as a method for more natural hole-filling.

  • PDF

Artificial Intelligence for Clinical Research in Voice Disease (후두음성 질환에 대한 인공지능 연구)

  • Jungirl, Seok;Tack-Kyun, Kwon
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.33 no.3
    • /
    • pp.142-155
    • /
    • 2022
  • Diagnosis using voice is non-invasive and can be implemented through various voice recording devices; therefore, it can be used as a screening or diagnostic assistant tool for laryngeal voice disease to help clinicians. The development of artificial intelligence algorithms, such as machine learning, led by the latest deep learning technology, began with a binary classification that distinguishes normal and pathological voices; consequently, it has contributed in improving the accuracy of multi-classification to classify various types of pathological voices. However, no conclusions that can be applied in the clinical field have yet been achieved. Most studies on pathological speech classification using speech have used the continuous short vowel /ah/, which is relatively easier than using continuous or running speech. However, continuous speech has the potential to derive more accurate results as additional information can be obtained from the change in the voice signal over time. In this review, explanations of terms related to artificial intelligence research, and the latest trends in machine learning and deep learning algorithms are reviewed; furthermore, the latest research results and limitations are introduced to provide future directions for researchers.

Multi-view learning review: understanding methods and their application (멀티 뷰 기법 리뷰: 이해와 응용)

  • Bae, Kang Il;Lee, Yung Seop;Lim, Changwon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.1
    • /
    • pp.41-68
    • /
    • 2019
  • Multi-view learning considers data from various viewpoints as well as attempts to integrate various information from data. Multi-view learning has been studied recently and has showed superior performance to a model learned from only a single view. With the introduction of deep learning techniques to a multi-view learning approach, it has showed good results in various fields such as image, text, voice, and video. In this study, we introduce how multi-view learning methods solve various problems faced in human behavior recognition, medical areas, information retrieval and facial expression recognition. In addition, we review data integration principles of multi-view learning methods by classifying traditional multi-view learning methods into data integration, classifiers integration, and representation integration. Finally, we examine how CNN, RNN, RBM, Autoencoder, and GAN, which are commonly used among various deep learning methods, are applied to multi-view learning algorithms. We categorize CNN and RNN-based learning methods as supervised learning, and RBM, Autoencoder, and GAN-based learning methods as unsupervised learning.

Anomaly Detection In Real Power Plant Vibration Data by MSCRED Base Model Improved By Subset Sampling Validation (Subset 샘플링 검증 기법을 활용한 MSCRED 모델 기반 발전소 진동 데이터의 이상 진단)

  • Hong, Su-Woong;Kwon, Jang-Woo
    • Journal of Convergence for Information Technology
    • /
    • v.12 no.1
    • /
    • pp.31-38
    • /
    • 2022
  • This paper applies an expert independent unsupervised neural network learning-based multivariate time series data analysis model, MSCRED(Multi-Scale Convolutional Recurrent Encoder-Decoder), and to overcome the limitation, because the MCRED is based on Auto-encoder model, that train data must not to be contaminated, by using learning data sampling technique, called Subset Sampling Validation. By using the vibration data of power plant equipment that has been labeled, the classification performance of MSCRED is evaluated with the Anomaly Score in many cases, 1) the abnormal data is mixed with the training data 2) when the abnormal data is removed from the training data in case 1. Through this, this paper presents an expert-independent anomaly diagnosis framework that is strong against error data, and presents a concise and accurate solution in various fields of multivariate time series data.

A review of gene selection methods based on machine learning approaches (기계학습 접근법에 기반한 유전자 선택 방법들에 대한 리뷰)

  • Lee, Hajoung;Kim, Jaejik
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.5
    • /
    • pp.667-684
    • /
    • 2022
  • Gene expression data present the level of mRNA abundance of each gene, and analyses of gene expressions have provided key ideas for understanding the mechanism of diseases and developing new drugs and therapies. Nowadays high-throughput technologies such as DNA microarray and RNA-sequencing enabled the simultaneous measurement of thousands of gene expressions, giving rise to a characteristic of gene expression data known as high dimensionality. Due to the high-dimensionality, learning models to analyze gene expression data are prone to overfitting problems, and to solve this issue, dimension reduction or feature selection techniques are commonly used as a preprocessing step. In particular, we can remove irrelevant and redundant genes and identify important genes using gene selection methods in the preprocessing step. Various gene selection methods have been developed in the context of machine learning so far. In this paper, we intensively review recent works on gene selection methods using machine learning approaches. In addition, the underlying difficulties with current gene selection methods as well as future research directions are discussed.

Development of Security Anomaly Detection Algorithms using Machine Learning (기계 학습을 활용한 보안 이상징후 식별 알고리즘 개발)

  • Hwangbo, Hyunwoo;Kim, Jae Kyung
    • The Journal of Society for e-Business Studies
    • /
    • v.27 no.1
    • /
    • pp.1-13
    • /
    • 2022
  • With the development of network technologies, the security to protect organizational resources from internal and external intrusions and threats becomes more important. Therefore in recent years, the anomaly detection algorithm that detects and prevents security threats with respect to various security log events has been actively studied. Security anomaly detection algorithms that have been developed based on rule-based or statistical learning in the past are gradually evolving into modeling based on machine learning and deep learning. In this study, we propose a deep-autoencoder model that transforms LSTM-autoencoder as an optimal algorithm to detect insider threats in advance using various machine learning analysis methodologies. This study has academic significance in that it improved the possibility of adaptive security through the development of an anomaly detection algorithm based on unsupervised learning, and reduced the false positive rate compared to the existing algorithm through supervised true positive labeling.