• Title/Summary/Keyword: 비지도 학습.

Search Result 225, Processing Time 0.03 seconds

Food Recipe Clustering Model from the User's Perspective (사용자 관점에서의 음식 레시피 분류 모델에 관한 연구)

  • Lee, Woo-Hang;Choi, Soo-Yeun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.10
    • /
    • pp.1441-1446
    • /
    • 2022
  • Modern people can access various information about food recipes very easily on the Internet or social media. As the supply of food recipes increases, it is difficult to find a suitable recipe for each user in the overflowing information. As such, the need to provide information by reflecting users' requirements has increased, and research related to food recipes and cooking recommendations is becoming active. In addition, the Internet, video, and application markets using this are also rapidly activating. In this study, in order to classify recipes from the user's perspective of food recipe users, the user's review data was applied with the k-mean clustering technique, which is unsupervised learning, and a "food recipe classification model" was derived. As a result, it was classified into a total of 25 clusters including information needed by many users, such as specific purposes and cooking stages.

Design and Implentation of Body Fat Percentage Analysis Model using K-means and CNN (K-means와 CNN을 활용한 체지방율 분석 모델 설계 및 구현)

  • Lee, Taejun;Park, Chanmyeong;Kim, Changsu;Jung, Heokyung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.329-331
    • /
    • 2021
  • Recently, as various cases of using deep learning in the health-care field are increasing, functions such as electrocardiogram examination and body composition analysis through wearable device can be provided to provide rational decision-making and a process tailored to the individual. In order to utilize deep learning, it it most important to secure refined data, and this data is being made through human intervention or unsupervised learning. In this paper, we propose a model that conducts unsupervised learning by clusters according to gender and age using human body data such as chest and waist circumferences, which are easy to measure, and classifies them with CNN. For data, the 7th human body data provided by Korean Agency for Technology and Standards was used. Through this, it it thought that it can be applied to various application cases such as personalized body shape management service and obesity analysis.

  • PDF

Autoencoder-Based Anomaly Detection Method for IoT Device Traffics (오토인코더 기반 IoT 디바이스 트래픽 이상징후 탐지 방법 연구)

  • Seung-A Park;Yejin Jang;Da Seul Kim;Mee Lan Han
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.2
    • /
    • pp.281-288
    • /
    • 2024
  • The sixth generation(6G) wireless communication technology is advancing toward ultra-high speed, ultra-high bandwidth, and hyper-connectivity. With the development of communication technologies, the formation of a hyper-connected society is rapidly accelerating, expanding from the IoT(Internet of Things) to the IoE(Internet of Everything). However, at the same time, security threats targeting IoT devices have become widespread, and there are concerns about security incidents such as unauthorized access and information leakage. As a result, the need for security-enhancing solutions is increasing. In this paper, we implement an autoencoder-based anomaly detection model utilizing real-time collected network traffics in respond to IoT security threats. Considering the difficulty of capturing IoT device traffic data for each attack in real IoT environments, we use an unsupervised learning-based autoencoder and implement 6 different autoencoder models based on the use of noise in the training data and the dimensions of the latent space. By comparing the model performance through experiments, we provide a performance evaluation of the anomaly detection model for detecting abnormal network traffic.

Applying of SOM for Automatic Recognition of Tension and Relaxation (긴장과 이완상태의 자동인식을 위한 SOM의 적용)

  • Jeong, Chan-Soon;Ham, Jun-Seok;Ko, Il-Ju;Jang, Dae-Sik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.2
    • /
    • pp.65-74
    • /
    • 2010
  • We propose a system that automatically recognizes the tense or relaxed condition of scrolling-shooting game subject that plays. Existing study compares the changed values of source of stimulation to the player by suggesting the source, and thus involves limitation in automatic classification. This study applies SOM of unsupervised learning for automatic classification and recognition of player's condition change. Application of SOM for automatic recognition of tense and relaxed condition is composed of two steps. First, ECG measurement and analysis, is to extract characteristic vector through HRV analysis by measuring ECG after having the player play the game. Secondly, SOM learning and recognition, is to classify and recognize the tense and relaxed conditions of player through SOM learning of the input vectors of heart beat signals that the characteristic extracted. Experiment results are divided into three groups. The first is HRV frequency change and the second the SOM learning results of heart beat signal. The third is the analysis of match rate to identify SOM learning performance. As a result of matching the LF/HF ratio of HRV frequency analysis to the distance of winner neuron of SOM based on 1.5, a match rate of 72% performance in average was shown.

KB-BERT: Training and Application of Korean Pre-trained Language Model in Financial Domain (KB-BERT: 금융 특화 한국어 사전학습 언어모델과 그 응용)

  • Kim, Donggyu;Lee, Dongwook;Park, Jangwon;Oh, Sungwoo;Kwon, Sungjun;Lee, Inyong;Choi, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.191-206
    • /
    • 2022
  • Recently, it is a de-facto approach to utilize a pre-trained language model(PLM) to achieve the state-of-the-art performance for various natural language tasks(called downstream tasks) such as sentiment analysis and question answering. However, similar to any other machine learning method, PLM tends to depend on the data distribution seen during the training phase and shows worse performance on the unseen (Out-of-Distribution) domain. Due to the aforementioned reason, there have been many efforts to develop domain-specified PLM for various fields such as medical and legal industries. In this paper, we discuss the training of a finance domain-specified PLM for the Korean language and its applications. Our finance domain-specified PLM, KB-BERT, is trained on a carefully curated financial corpus that includes domain-specific documents such as financial reports. We provide extensive performance evaluation results on three natural language tasks, topic classification, sentiment analysis, and question answering. Compared to the state-of-the-art Korean PLM models such as KoELECTRA and KLUE-RoBERTa, KB-BERT shows comparable performance on general datasets based on common corpora like Wikipedia and news articles. Moreover, KB-BERT outperforms compared models on finance domain datasets that require finance-specific knowledge to solve given problems.

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.1-16
    • /
    • 2015
  • Traffic accident is one of the major cause of death worldwide for the last several decades. According to the statistics of world health organization, approximately 1.24 million deaths occurred on the world's roads in 2010. In order to reduce future traffic accident, multipronged approaches have been adopted including traffic regulations, injury-reducing technologies, driving training program and so on. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze relationship between traffic accident and related factors including vehicle design, road design, weather, driver behavior etc. Insight derived from these analysis can be used for accident prevention approaches. Traffic accident data mining is an activity to find useful knowledges about such relationship that is not well-known and user may interested in it. Many studies about mining accident data have been reported over the past two decades. Most of studies mainly focused on predict risk of accident using accident related factors. Supervised learning methods like decision tree, logistic regression, k-nearest neighbor, neural network are used for these prediction. However, derived prediction model from these algorithms are too complex to understand for human itself because the main purpose of these algorithms are prediction, not explanation of the data. Some of studies use unsupervised clustering algorithm to dividing the data into several groups, but derived group itself is still not easy to understand for human, so it is necessary to do some additional analytic works. Rule based learning methods are adequate when we want to derive comprehensive form of knowledge about the target domain. It derives a set of if-then rules that represent relationship between the target feature with other features. Rules are fairly easy for human to understand its meaning therefore it can help provide insight and comprehensible results for human. Association rule learning methods and subgroup discovery methods are representing rule based learning methods for descriptive task. These two algorithms have been used in a wide range of area from transaction analysis, accident data analysis, detection of statistically significant patient risk groups, discovering key person in social communities and so on. We use both the association rule learning method and the subgroup discovery method to discover useful patterns from a traffic accident dataset consisting of many features including profile of driver, location of accident, types of accident, information of vehicle, violation of regulation and so on. The association rule learning method, which is one of the unsupervised learning methods, searches for frequent item sets from the data and translates them into rules. In contrast, the subgroup discovery method is a kind of supervised learning method that discovers rules of user specified concepts satisfying certain degree of generality and unusualness. Depending on what aspect of the data we are focusing our attention to, we may combine different multiple relevant features of interest to make a synthetic target feature, and give it to the rule learning algorithms. After a set of rules is derived, some postprocessing steps are taken to make the ruleset more compact and easier to understand by removing some uninteresting or redundant rules. We conducted a set of experiments of mining our traffic accident data in both unsupervised mode and supervised mode for comparison of these rule based learning algorithms. Experiments with the traffic accident data reveals that the association rule learning, in its pure unsupervised mode, can discover some hidden relationship among the features. Under supervised learning setting with combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method that requires a lot of efforts to tune the parameters.

Multi Cycle Consistent Adversarial Networks for Multi Attribute Image to Image Translation

  • Jo, Seok Hee;Cho, Kyu Cheol
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.9
    • /
    • pp.63-69
    • /
    • 2020
  • Image-image translation is a technology that creates a target image through input images, and has recently shown high performance in creating a more realistic image by utilizing GAN, which is a non-map learning structure. Therefore, there are various studies on image-to-image translation using GAN. At this point, most image-to-image translations basically target one attribute translation. But the data used and obtainable in real life consist of a variety of features that are hard to explain with one feature. Therefore, if you aim to change multiple attributes that can divide the image creation process by attributes to take advantage of the various attributes, you will be able to play a better role in image-to-image translation. In this paper, we propose Multi CycleGAN, a dual attribute transformation structure, by utilizing CycleGAN, which showed high performance among image-image translation structures using GAN. This structure implements a dual transformation structure in which three domains conduct two-way learning to learn about the two properties of an input domain. Experiments have shown that images through the new structure maintain the properties of the input area and show high performance with the target properties applied. Using this structure, it is possible to create more diverse images in the future, so we can expect to utilize image generation in more diverse areas.

A Study on Automatic Classification Technique of Malware Packing Type (악성코드 패킹유형 자동분류 기술 연구)

  • Kim, Su-jeong;Ha, Ji-hee;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.28 no.5
    • /
    • pp.1119-1127
    • /
    • 2018
  • Most of the cyber attacks are caused by malicious codes. The damage caused by cyber attacks are gradually expanded to IoT and CPS, which is not limited to cyberspace but a serious threat to real life. Accordingly, various malicious code analysis techniques have been appeared. Dynamic analysis have been widely used to easily identify the resulting malicious behavior, but are struggling with an increase in Anti-VM malware that is not working in VM environment detection. On the other hand, static analysis has difficulties in analysis due to various packing techniques. In this paper, we proposed malware classification techniques regardless of known packers or unknown packers through the proposed model. To do this, we designed a model of supervised learning and unsupervised learning for the features that can be used in the PE structure, and conducted the results verification through 98,000 samples. It is expected that accurate analysis will be possible through customized analysis technology for each class.

A Novel of Data Clustering Architecture for Outlier Detection to Electric Power Data Analysis (전력데이터 분석에서 이상점 추출을 위한 데이터 클러스터링 아키텍처에 관한 연구)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Young Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.10
    • /
    • pp.465-472
    • /
    • 2017
  • In the past, researchers mainly used the supervised learning technique of machine learning to analyze power data and investigated the identification of patterns through the data mining technique. Data analysis research, however, faces its limitations with the old data classification and analysis techniques today when the size of electric power data has increased with the possible real-time provision of data. This study thus set out to propose a clustering architecture to analyze large-sized electric power data. The clustering process proposed in the study supplements the K-means algorithm, an unsupervised learning technique, for its problems and is capable of automating the entire process from the collection of electric power data to their analysis. In the present study, power data were categorized and analyzed in total three levels, which include the row data level, clustering level, and user interface level. In addition, the investigator identified K, the ideal number of clusters, based on principal component analysis and normal distribution and proposed an altered K-means algorithm to reduce data that would be categorized as ideal points in order to increase the efficiency of clustering.

Analysis of Characteristics of Clusters of Middle School Students Using K-Means Cluster Analysis (K-평균 군집분석을 활용한 중학생의 군집화 및 특성 분석)

  • Jaebong, Lee
    • Journal of The Korean Association For Science Education
    • /
    • v.42 no.6
    • /
    • pp.611-619
    • /
    • 2022
  • The purpose of this study is to explore the possibility of applying big data analysis to provide appropriate feedback to students using evaluation data in science education at a time when interest in educational data mining has recently increased in education. In this study, we use the evaluation data of 2,576 students who took 24 questions of the national assessment of educational achievement. And we use K-means cluster analysis as a method of unsupervised machine learning for clustering. As a result of clustering, students were divided into six clusters. The middle-ranking students are divided into various clusters when compared to upper or lower ranks. According to the results of the cluster analysis, the most important factor influencing clusterization is academic achievement, and each cluster shows different characteristics in terms of content domains, subject competencies, and affective characteristics. Learning motivation is important among the affective domains in the lower-ranking achievement cluster, and scientific inquiry and problem-solving competency, as well as scientific communication competency have a major influence in terms of subject competencies. In the content domain, achievement of motion and energy and matter are important factors to distinguish the characteristics of the cluster. As a result, we can provide students with customized feedback for learning based on the characteristics of each cluster. We discuss implications of these results for science education, such as the possibility of using this study results, balanced learning by content domains, enhancement of subject competency, and improvement of scientific attitude.