• Title/Abstract/Keywords: Unsupervised machine learning

135 search results (processing time 0.029 seconds)

Advanced insider threat detection model to apply periodic work atmosphere

  • Oh, Junhyoung;Kim, Tae Ho;Lee, Kyung Ho
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 13, No. 3
    • /
    • pp.1722-1737
    • /
    • 2019
  • We developed an insider threat detection model for organizations that repeat tasks at regular intervals. The model identifies the best combination of feature selection algorithm, unsupervised learning algorithm, and standard score. We derive a model optimized for a specific organization by evaluating each combination in terms of accuracy, AUC (Area Under the Curve), and TPR (True Positive Rate). To validate the model, we applied it to four years of logs from a system handling sensitive information from public institutions. Because the business process in the target system runs on a one-year cycle and roles are fixed for each person in charge, the user logs were analyzed monthly. To classify user behavior as abnormal, standard scores were calculated for each organization, and behavior was classified as abnormal when its score exceeded a certain threshold. Using this method, we proposed and verified a model optimized for the organization.
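The thresholding step described in this abstract can be illustrated with a minimal sketch. It assumes a single feature (a monthly access count per user), a population standard deviation, and a threshold of 2.0; the user IDs, counts, and threshold are hypothetical and not taken from the paper, which also combines this step with feature selection and unsupervised learning.

```python
from statistics import mean, pstdev

def standard_scores(values):
    """Return the standard score (z-score) of each value within its group."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nobody deviates from the group.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def flag_abnormal(monthly_counts, threshold=2.0):
    """Return the users whose standard score exceeds the threshold."""
    users = list(monthly_counts)
    scores = standard_scores([monthly_counts[u] for u in users])
    return [u for u, z in zip(users, scores) if z > threshold]

# Hypothetical access counts for one organization in one month.
march_counts = {"u01": 40, "u02": 42, "u03": 38, "u04": 41, "u05": 39,
                "u06": 43, "u07": 40, "u08": 37, "u09": 41, "u10": 120}
print(flag_abnormal(march_counts))
```

Scores are computed within one organization and one month, in line with the monthly, per-organization analysis the abstract describes.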

Determining Feature-Size for Text to Numeric Conversion based on BOW and TF-IDF

  • Alyamani, Hasan J.
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 1
    • /
    • pp.283-287
    • /
    • 2022
  • Machine learning is the most popular method used in data science. Data growth involves not only numeric data but also text data. Most supervised and unsupervised machine learning algorithms operate on numeric data, so text data must be converted into numeric form. Many techniques exist for this conversion, and researchers are often unsure which technique is best in which situation. In this work, BOW (Bag-of-Words) and TF-IDF (Term Frequency-Inverse Document Frequency) are studied with different numbers of features to determine the best method. In experiments on text data, both TF-IDF and BOW performed better with 100 to 150 features.
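To make the role of the feature size concrete, the following is a small sketch of BOW and TF-IDF conversion with a capped vocabulary. It assumes whitespace tokenization, a vocabulary of the `max_features` most frequent terms, and the plain `log(N / df)` form of IDF; the paper does not state its exact weighting, so these choices and the function names are illustrative only.

```python
import math
from collections import Counter

def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent terms across the corpus."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]

def bow_vectors(docs, vocab):
    """Bag-of-Words: raw term counts, one column per vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([counts[term] for term in vocab])
    return rows

def tfidf_vectors(docs, vocab):
    """TF-IDF: term count multiplied by log(N / document frequency)."""
    bow = bow_vectors(docs, vocab)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) if df else 0.0 for df in doc_freq]
    return [[tf * w for tf, w in zip(row, idf)] for row in bow]

docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab = build_vocabulary(docs, max_features=3)
print(vocab)
print(bow_vectors(docs, vocab))
print(tfidf_vectors(docs, vocab))
```

Raising or lowering `max_features` changes only the vocabulary cut-off, which is the quantity the abstract reports tuning.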

Fault Detection in LDPE Process using Machine Learning Techniques

  • 이창송;이규황;이호경
    • Korean Chemical Engineering Research
    • /
    • Vol. 58, No. 2
    • /
    • pp.224-229
    • /
    • 2020
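  • This paper introduces a technique that uses machine learning to detect faults in the LDPE (Low Density Polyethylene) process in advance and to predict equipment lifetime. To maximize safety and productivity, it is very important to detect and prevent unexpected faults in chemical processes in advance. Because the LDPE process is a high-pressure process pressurized to 3,000 kg/cm²g or more, an ESD (Emergency Shutdown) causes unexpected downtime, and the resulting longer maintenance period leads to productivity loss. Operating data for the main variables of the high-pressure process were collected, and an early detection model for ESD was developed using unsupervised machine learning. The model detected four ESD events 2.4 days in advance. In addition, we confirmed that the lifetime of high-pressure equipment can be predicted using physically meaningful key variables.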

Machine Learning Techniques for Speech Recognition using the Magnitude

  • Krishnan, C. Gopala;Robinson, Y. Harold;Chilamkurti, Naveen
    • Journal of Multimedia Information System
    • /
    • Vol. 7, No. 1
    • /
    • pp.33-40
    • /
    • 2020
  • Machine learning consists of supervised and unsupervised learning, of which supervised learning is used for speech recognition. Supervised learning is the data mining task of inferring a function from labeled training data. Speech recognition is a current trend that has gained focus over the decades, and most automation technologies use speech and speech recognition for various purposes. This paper gives an overview of the major technological perspectives and the fundamental progress of speech recognition, and outlines the methods developed at each stage of speech recognition using supervised learning. The project uses a DNN to recognize speech from magnitudes with large datasets.

A Study on the Search of Optimal Aquaculture farm condition based on Machine Learning

  • 강민수;정용규;장두환
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • Vol. 17, No. 2
    • /
    • pp.135-140
    • /
    • 2017
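  • The global seafood market is in a state of excess demand, and this trend is expected to keep accelerating. Aquaculture, for which seafood demand is growing, is an industry that can achieve high performance because, compared with fishing, production can be controlled and standardized with relatively few resources. However, traditional aquaculture suffers from low productivity caused by natural disasters, ecosystem pollution, and similar problems, so a new aquaculture system that can move to the optimal farming location needs to be developed. Finding the optimal location requires collecting and analyzing the necessary data, such as temperature and dissolved oxygen, in real time. For data analysis, machine-learning-based K-means clustering was applied so that, through repeated self-learning, the system can decide on its own when and where to move the farm. If the presented results are applied by fish farmers, farms could find the optimal farming location on their own and thereby solve the low-productivity problems caused by natural disasters and ecosystem pollution.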
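The clustering step in this abstract can be sketched with a basic K-means on two sensor features. Everything concrete here is an assumption made for illustration: the readings (temperature in °C, dissolved oxygen in mg/L), the target condition, and the helper `closest_cluster` are hypothetical, and the paper's real-time data collection and farm-relocation logic are not reproduced.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50, seed=0):
    """Basic K-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def closest_cluster(centroids, target):
    """Hypothetical helper: index of the cluster nearest a target condition."""
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], target))

# Hypothetical readings: (water temperature in deg C, dissolved oxygen in mg/L).
readings = [(18.0, 8.1), (18.5, 7.9), (17.8, 8.3),
            (27.0, 4.2), (26.5, 4.0), (27.4, 3.9)]
labels, centroids = kmeans(readings, k=2)
best = closest_cluster(centroids, target=(18.0, 8.0))
print(labels, centroids, best)
```

In practice the two features would likely need scaling before clustering, since K-means is sensitive to the units of each axis.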

Predicting Mental Health Risk based on Adolescent Health Behavior: Application of a Hybrid Machine Learning Method

  • 고은경;전효정;박현태;옥수열
    • Journal of the Korean Society of School Health
    • /
    • Vol. 36, No. 3
    • /
    • pp.113-125
    • /
    • 2023
  • Purpose: The purpose of this study is to develop a model for predicting mental health risk among adolescents based on health behavior information by employing a hybrid machine learning method. Methods: The study analyzed data on 51,850 domestic middle and high school students from the 2022 Youth Health Behavior Survey conducted by the Korea Disease Control and Prevention Agency. First, mental health risk levels (stress perception, suicidal thoughts, suicide attempts, suicide plans, experiences of sadness and despair, loneliness, and generalized anxiety disorder) were classified using the k-means unsupervised learning technique. Second, demographic factors (family economic status, gender, age), academic performance, physical health (body mass index, moderate-intensity exercise, subjective health perception, oral health perception), daily life habits (sleep time, wake-up time, smartphone use time, difficulty recovering from fatigue), eating habits (consumption of high-caffeine drinks, sweet drinks, late-night snacks), violence victimization, and deviance (drinking, smoking experience) data were used as inputs to develop a random forest model predicting mental health risk, and the model and its prediction performance were compared with logistic regression and XGBoost. Results: First, the subjects were classified into two mental health groups using k-means unsupervised learning, with the high mental health risk group constituting 26.45% of the total sample (13,712 adolescents). This mental health risk group included most of the adolescents who had made suicide plans (95.1%) or attempted suicide (96.7%). Second, the predictive performance of the random forest model for classifying mental health risk groups significantly outperformed that of the reference model (AUC=.94). Predictors of high importance were 'difficulty recovering from daytime fatigue' and 'subjective health perception'. Conclusion: Based on an understanding of adolescent health behavior information, it is possible to predict the mental health risk levels of adolescents and make interventions in advance.
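The hybrid idea in this abstract, unsupervised clustering to derive risk groups followed by a supervised model that predicts those groups from behavior data, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the study's pipeline: a plain logistic regression trained by stochastic gradient descent stands in for the random forest, the two-cluster K-means uses a deterministic start, and all indicator and behavior values are invented.

```python
import math

def two_means(rows, iterations=20):
    """Two-cluster K-means; label 1 is the cluster seeded from the largest-sum row."""
    centroids = [list(min(rows, key=sum)), list(max(rows, key=sum))]
    labels = []
    for _ in range(iterations):
        labels = [
            0 if sum((a - b) ** 2 for a, b in zip(r, centroids[0]))
            <= sum((a - b) ** 2 for a, b in zip(r, centroids[1])) else 1
            for r in rows
        ]
        for c in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by stochastic gradient descent (stand-in classifier)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return 1 (high-risk group) when the linear score is non-negative."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0

# Stage 1: invented risk indicators (stress level 1-5, sadness 0/1) -> risk groups.
indicators = [(1, 0), (2, 0), (1, 0), (2, 0), (5, 1), (4, 1), (5, 1), (4, 1)]
groups = two_means(indicators)

# Stage 2: invented behaviors (sleep deficit in hours, fatigue difficulty 0-1)
# are used to predict the stage-1 groups.
behaviors = [(0.0, 0.1), (0.5, 0.2), (0.2, 0.0), (0.4, 0.3),
             (2.0, 0.9), (2.5, 0.8), (1.8, 1.0), (2.2, 0.7)]
w, b = train_logistic(behaviors, groups)
print(groups, [predict(w, b, x) for x in behaviors])
```

The point of the two stages is that the cluster labels from stage 1 become the training targets for stage 2, so no hand-assigned risk label is needed.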

Problems and Solutions for Machine Learning

  • 임환희;김세준;이병준;김경태;윤희용
    • Korean Society of Computer Information: Conference Proceedings
    • /
    • Proceedings of the Korean Society of Computer Information 58th Summer Conference, 2018, Vol. 26, No. 2
    • /
    • pp.33-34
    • /
    • 2018
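  • Machine learning is a branch of artificial intelligence. It is the field of study that gives computers the ability to learn without being explicitly programmed; just as people learn, computers are given data to learn from so that they can acquire new knowledge. The main types of machine learning are Supervised Learning, Unsupervised Learning, and Reinforcement Learning. This paper examines the types of machine learning and the problems that arise when computers learn from data, and presents the kinds of problems and their solutions.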

Study of an algorithm for intelligent digital protective relaying

  • 신현익;이성환;강신준;김정한;김상철
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
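    • Institute of Control, Robotics and Systems, Proceedings of the 1996 Korea Automatic Control Conference (Domestic Session); Pohang University of Science and Technology, Pohang; 24-26 Oct. 1996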
    • /
    • pp.343-346
    • /
    • 1996
  • A new method for on-line induction motor fault detection is presented in this paper. The system uses an unsupervised-learning clustering algorithm, the Dignet, proposed by Thomopoulos et al., to learn the spectral characteristics of a good motor operating on-line. After a sufficient training period, the Dignet signals a one-phase ground fault or a potential failure condition when a new cluster is formed and persists for some time. Since a fault condition is found by comparison with a prior condition of the machine, on-line failure prediction is possible with this system without requiring information on the motor or load characteristics.

A Low Complexity PTS Technique using Threshold for PAPR Reduction in OFDM Systems

  • Lim, Dai Hwan;Rhee, Byung Ho
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 6, No. 9
    • /
    • pp.2191-2201
    • /
    • 2012
  • Traffic classification seeks to assign packet flows to an appropriate quality of service (QoS) class based on flow statistics without the need to examine packet payloads. Classification proceeds in two steps. Classification rules are first built by analyzing traffic traces, and then the classification rules are evaluated using test data. In this paper, we use a self-organizing map and K-means clustering as unsupervised machine learning methods to identify the inherent classes in traffic traces. Three clusters were discovered, corresponding to transactional, bulk data transfer, and interactive applications. The K-nearest neighbor classifier was found to be highly accurate for the traffic data and significantly better than a minimum mean distance classifier.
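The two classifiers compared in this abstract differ in what they measure distance to: individual training flows versus one mean per class. The sketch below shows both side by side. The two flow features (mean packet size in KB and log10 of flow duration in seconds) and every data value are hypothetical, and the toy data are too clean to reproduce the accuracy gap the abstract reports.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """K-nearest neighbor: majority label among the k closest training flows."""
    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def class_means(train):
    """Mean feature vector of each class."""
    sums, counts = {}, Counter()
    for features, label in train:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def min_mean_distance_predict(means, query):
    """Minimum mean distance: label of the class whose mean is closest."""
    return min(means, key=lambda label: sq_dist(means[label], query))

# Hypothetical flows: (mean packet size in KB, log10 of duration in seconds).
train = [
    ((0.20, 0.10), "transactional"), ((0.30, 0.20), "transactional"),
    ((0.25, 0.15), "transactional"),
    ((1.40, 2.00), "bulk"), ((1.50, 2.20), "bulk"), ((1.45, 1.90), "bulk"),
    ((0.10, 2.50), "interactive"), ((0.15, 2.80), "interactive"),
    ((0.12, 2.60), "interactive"),
]
means = class_means(train)
query = (1.40, 2.10)
print(knn_predict(train, query), min_mean_distance_predict(means, query))
```

A class mean summarizes each class with a single point, so it can misplace flows when a class is spread unevenly, whereas K-nearest neighbor follows the local shape of the training data.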

Improvement of Self Organizing Maps using Gap Statistic and Probability Distribution

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 8, No. 2
    • /
    • pp.116-120
    • /
    • 2008
  • Clustering is a method for unsupervised learning. General clustering tools have depended on statistical methods and machine learning algorithms. One of the popular clustering algorithms based on machine learning is the self-organizing map (SOM). SOM is a neural network model for clustering. SOM and extended SOM have been used in diverse classification and clustering fields such as data mining. However, SOM has had a problem determining the optimal number of clusters. In this paper, we propose an improvement of SOM using the gap statistic and probability distributions. The gap statistic was introduced to estimate the number of clusters in a dataset, and we use it to address this problem of SOM. In addition, in our research, the weights of feature nodes are updated by a probability distribution. After complete updating according to prior and posterior distributions, the weights of SOM have probability distributions for optimal clustering. To verify the improved performance of our work, we conduct experiments comparing it with other learning algorithms using simulated data sets.
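The gap statistic mentioned in this abstract compares the within-cluster dispersion of the data with that of reference data drawn without cluster structure. The sketch below computes it under explicit assumptions: a basic K-means stands in for the SOM, reference sets are uniform over the data's bounding box, each fit is a single K-means run, and the standard-error selection rule is omitted. The two Gaussian blobs are synthetic.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_dispersion(points, k, rng, iterations=30):
    """Run a basic K-means and return W_k, the within-cluster sum of squares."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (empty clusters stay put).
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return sum(sq_dist(p, centroids[c])
               for c, members in enumerate(clusters) for p in members)

def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Gap(k) = mean(log W_k of reference sets) - log W_k of the data."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps = {}
    for k in range(1, k_max + 1):
        log_wk = math.log(kmeans_dispersion(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            # Reference set: uniform over the bounding box of the data.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(kmeans_dispersion(ref, k, rng)))
        gaps[k] = sum(ref_logs) / n_refs - log_wk
    return gaps

# Synthetic data: two well-separated Gaussian blobs.
data_rng = random.Random(1)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(20)]
blob_b = [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(20)]
gaps = gap_statistic(blob_a + blob_b, k_max=3)
print(gaps)
```

A larger gap means the data are much tighter at that k than structureless data would be; for these two blobs the gap at k=2 is well above the gap at k=1.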