• Title/Abstract/Keywords: Secure Machine Learning

75 search results, processing time 0.03 s

Application of Machine Learning Techniques for the Classification of Source Code Vulnerability

  • 이원경;이민주;서동수
    • 정보보호학회논문지 / Vol. 30, No. 4 / pp. 735-743 / 2020
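  • Secure coding is a safe coding practice that provides robustness against malicious attacks and unexpected errors; with the support of static analysis tools, it finds vulnerable patterns or detects possible inflows of tainted data. Because secure coding relies heavily on static techniques, it has the drawback of depending on rule sets, and as static analysis tools grow more complex, accurate diagnosis becomes harder. To support secure coding, this paper uses DNN, CNN, and RNN neural networks to develop models that learn and classify patterns corresponding to the major security weaknesses listed in the secure software development guide, and analyzes the training results. The authors expect that machine learning techniques can complement static analysis in detecting security weaknesses.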

Comparative Analysis of Intrusion Detection Attack Based on Machine Learning Classifiers

  • Surafel Mehari;Anuja Kumar Acharya
    • International Journal of Computer Science & Network Security / Vol. 24, No. 10 / pp. 115-124 / 2024
  • Today, information is transmitted from one place to another using network communication technology. Because of this, networked systems require a highly secure environment. The main strategy for securing this environment is to correctly identify each packet and detect whether it contains malicious content or whether any illegal activity has occurred on the network. An intrusion detection system (IDS) is used for this purpose. Intrusion detection is a security technology designed to detect intrusions and automatically alert or notify a responsible person. However, building an efficient IDS faces a number of challenges, chiefly false detections and data with a large number of features. Many researchers currently use machine learning techniques to overcome these limitations and to classify packets as normal or malicious more efficiently. Many machine learning techniques are used in intrusion detection; the question is which classifiers have the potential to address the intrusion detection problem in a network security environment. Choosing appropriate machine learning techniques is required to improve the accuracy of an IDS. In this work, three machine learning classifiers are analyzed: Support Vector Machine, Naïve Bayes, and K-Nearest Neighbor. The algorithms were tested on the NSL-KDD dataset using a combination of Chi-square and Extra Tree feature selection methods, and Python was used to implement, analyze, and evaluate the classifiers. The experimental results show that the K-Nearest Neighbor classifier outperforms the other methods in categorizing packets as normal or malicious.
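
As a rough illustration of the K-Nearest Neighbor classification mentioned in the abstract above, the sketch below labels a record by majority vote among its nearest training records. It is not the authors' implementation: the two toy features, the invented values, the Euclidean distance, and the choice of k=3 are all assumptions made for the example, and no feature selection step is shown.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training records."""
    # Euclidean distance from the query to every training record.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    ]
    # Keep the k closest records and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalized records: (duration, failed_logins).
train_X = [(0.10, 0.00), (0.20, 0.10), (0.15, 0.05),
           (0.90, 0.80), (0.85, 0.90), (0.95, 0.70)]
train_y = ["normal", "normal", "normal",
           "malicious", "malicious", "malicious"]

print(knn_predict(train_X, train_y, (0.12, 0.02)))  # normal
print(knn_predict(train_X, train_y, (0.88, 0.85)))  # malicious
```

An odd k is used here so that a two-class vote cannot tie.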

Centralized Machine Learning Versus Federated Averaging: A Comparison using MNIST Dataset

  • Peng, Sony;Yang, Yixuan;Mao, Makara;Park, Doo-Soon
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 2 / pp. 742-756 / 2022
  • A flood of information has arrived with the rise of the internet and digital devices in the era of the fourth industrial revolution. Every millisecond, massive amounts of structured and unstructured data are generated; smartphones, wearable devices, sensors, and self-driving cars are just a few examples of devices that generate massive amounts of data in our daily lives. Machine learning has been adopted to recognize patterns in data in many areas, including healthcare, government, banking, and the military. However, the conventional machine learning model requires data owners to upload their information to one central location for model training. This has caused data owners to worry about the risks of transferring private information, because traditional machine learning requires their data to be pushed to the cloud for training. Furthermore, training machine learning and deep learning models requires massive computing resources. Many researchers have therefore turned to a new model known as "Federated Learning". Federated learning trains Artificial Intelligence models over distributed clients and keeps the data owner's private information protected. This paper implements Federated Averaging with a Deep Neural Network to classify handwritten images while protecting sensitive data, and compares the centralized machine learning model with federated averaging. The results show that the centralized model outperforms federated learning in terms of accuracy, but the classical model introduces other risks, such as privacy concerns, because the data is stored in a data center. The MNIST dataset was used in this experiment.
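
The aggregation step of Federated Averaging can be sketched as follows: each client trains locally, and the server averages the returned parameters, weighting each client by its number of training samples. This is a minimal sketch only. Flat lists stand in for model parameters, local training is omitted, and the function name and numbers are hypothetical rather than taken from the paper.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += w * size / total
    return averaged


# Hypothetical parameters returned by three clients after local training.
client_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
client_sizes = [10, 30, 60]  # number of local training samples per client

global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```

Only the parameters leave each client in this scheme; the raw training data stays where it was collected.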

An Integrated Accurate-Secure Heart Disease Prediction (IAS) Model using Cryptographic and Machine Learning Methods

  • Syed Anwar Hussainy F;Senthil Kumar Thillaigovindan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 2 / pp. 504-519 / 2023
  • Heart disease is becoming the leading cause of death around the world. Diagnosing cardiac illness is a difficult task that requires both expertise and extensive knowledge. Machine learning (ML) is becoming increasingly important in the medical field. Most previous work has concentrated on predicting cardiac disease; however, the precision of the results is low, and data integrity is uncertain. To address these difficulties, this research creates an Integrated Accurate-Secure Heart Disease Prediction (IAS) Model based on Deep Convolutional Neural Networks. First, heart-related medical data is collected and pre-processed. Second, features are extracted from two sources, signals and acquired data, and are then used to train the classifier. The Deep Convolutional Neural Network (DCNN) categorizes received sensor data as normal or abnormal. Furthermore, the results are safeguarded by an integrity validation mechanism based on a hash algorithm. The system's performance is evaluated by comparing the proposed model with existing models. The results show that the proposed cardiac disease diagnosis model surpasses previous techniques, attaining an accuracy of 98.5% for the maximum number of records, which is higher than the available classifiers.
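
A hash-based integrity check of the kind the abstract above mentions could look roughly like the sketch below. The abstract does not name the hash algorithm or the record format, so SHA-256, the JSON serialization, and the field names are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def digest_record(record):
    """Return a SHA-256 hex digest of a record in canonical JSON form."""
    # Sorting keys makes the serialization, and so the digest, deterministic.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_intact(record, expected_digest):
    """Check a record against a previously stored digest."""
    return hmac.compare_digest(digest_record(record), expected_digest)


# Hypothetical classification result to be protected.
result = {"patient_id": "demo-001", "prediction": "abnormal", "score": 0.97}
stored = digest_record(result)

tampered = dict(result, prediction="normal")
print(is_intact(result, stored))    # True
print(is_intact(tampered, stored))  # False
```

A bare digest only detects changes if it is stored somewhere an attacker cannot also rewrite; a keyed MAC or a signature would be needed otherwise.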

Application of the machine learning technique for the development of a condensation heat transfer model for a passive containment cooling system

  • Lee, Dong Hyun;Yoo, Jee Min;Kim, Hui Yung;Hong, Dong Jin;Yun, Byong Jo;Jeong, Jae Jun
    • Nuclear Engineering and Technology / Vol. 54, No. 6 / pp. 2297-2310 / 2022
  • A condensation heat transfer model is essential to accurately predict the performance of the passive containment cooling system (PCCS) during an accident in an advanced light water reactor. However, most existing models tend to predict condensation heat transfer well only for a specific range of thermal-hydraulic conditions. In this study, a new correlation for the condensation heat transfer coefficient (HTC) is presented using a machine learning technique. To secure sufficient training data, a large amount of pseudo data was produced using ten existing condensation models. Then, a neural network model was developed, consisting of a fully connected layer and a convolutional neural network (CNN) algorithm, DenseNet. Based on hold-out cross-validation, the neural network was trained and validated against the pseudo data. Thereafter, it was evaluated using experimental data that were not used for training. The machine learning model predicted better results than the existing models. A parametric study also confirmed that the machine learning model produces continuous and physical HTCs for various thermal-hydraulic conditions. By reflecting the effects of individual variables obtained from the parametric analysis, a new correlation was proposed. It yielded better results than the ten existing models for almost all experimental conditions.

A Study on Automatic Recommendation of Keywords for Sub-Classification of National Science and Technology Standard Classification System Using AttentionMesh

  • 박진호;송민선
    • 한국도서관정보학회지 / Vol. 53, No. 2 / pp. 95-115 / 2022
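  • The purpose of this study is to convert the sub-classification terms of the National Science and Technology Standard Classification System into technical keywords by applying a machine learning algorithm. To this end, AttentionMeSH was used as a learning algorithm suited to subject-term recommendation. The source data were research status files covering the four years from 2017 through 2020, refined by the Korea Institute of S&T Evaluation and Planning (KISTEP). Training used four attributes that represent the research content well: project title, research objectives, research content, and expected effects. As a result, an MiF of 0.6377 was obtained at a threshold of 0.5. To use machine learning in actual work and to secure technical keywords, it will be necessary to build a term management system and to obtain data with more diverse attributes.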
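
The reported metric can be made concrete with a small sketch of a micro-averaged F-measure (MiF) for multi-label keyword recommendation: scores at or above the threshold count as recommended keywords, and true positives, false positives, and false negatives are pooled over all (document, keyword) pairs. The toy labels and scores are invented, and treating a score exactly equal to the threshold as a positive is an assumption.

```python
def micro_f1(y_true, y_prob, threshold=0.5):
    """Micro-averaged F1 over all (document, label) pairs."""
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for truth, prob in zip(true_row, prob_row):
            predicted = prob >= threshold  # assumption: ties count as positive
            if predicted and truth:
                tp += 1
            elif predicted and not truth:
                fp += 1
            elif truth:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two hypothetical documents, three candidate keywords each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_prob = [[0.9, 0.6, 0.4], [0.2, 0.7, 0.1]]

print(micro_f1(y_true, y_prob, threshold=0.5))
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.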

Study on the Improvement of Machine Learning Ability through Data Augmentation

  • 김태우;신광성
    • 한국정보통신학회: Conference Proceedings / 한국정보통신학회 2021 Spring Conference / pp. 346-347 / 2021
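  • In pattern recognition for machine learning, performance improves as the amount of training data grows. However, for the kinds of patterns that must be detected in everyday settings, a large amount of training data cannot always be secured. It is therefore necessary to expand small datasets in a meaningful way for general machine learning. This study examines techniques for augmenting data so that machine learning can be performed. A representative way to perform machine learning with a small dataset is transfer learning, in which basic training is carried out on a general-purpose dataset and the target dataset is then applied at the final stage to obtain results. In this study, a model trained on a general-purpose dataset such as ImageNet is used as a feature extractor, together with the augmented data, to detect the desired patterns.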
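
The abstract above does not say which augmentation operations were used. As one plausible illustration, the sketch below expands a single image into several variants using a horizontal flip and 90-degree rotations; the nested-list image and the choice of transforms are assumptions, and the transfer learning step is not shown.

```python
def horizontal_flip(image):
    """Mirror each row of a 2-D image (a list of rows)."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate a 2-D image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus one flipped and three rotated variants."""
    variants = [image, horizontal_flip(image)]
    rotated = image
    for _ in range(3):  # 90, 180, and 270 degrees
        rotated = rotate_90(rotated)
        variants.append(rotated)
    return variants


# A hypothetical 2x2 grayscale image.
image = [[1, 2],
         [3, 4]]

for variant in augment(image):
    print(variant)
```

Whether such transforms preserve the label depends on the task; a flipped digit, for example, may no longer be a valid example of its class.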

Anomaly-Based Network Intrusion Detection: An Approach Using Ensemble-Based Machine Learning Algorithm

  • Kashif Gul Chachar;Syed Nadeem Ahsan
    • International Journal of Computer Science & Network Security / Vol. 24, No. 1 / pp. 107-118 / 2024
  • With the steady growth of technology, network usage requirements are expanding day by day. The majority of electronic devices are capable of communication, which calls for a secure and reliable network. A network-based intrusion detection system (NIDS) is a new method for protecting computers and networks from attacks and raising alerts. Machine learning is an emerging field that provides a variety of ways to implement effective NIDSs. Bagging and boosting are two ensemble ML techniques renowned for better performance in the learning and classification process. This paper provides a detailed review of past work and proposes a novel ensemble approach for developing a NIDS based on the voting method, using bagging and boosting ensemble techniques. The test results demonstrate that the ensemble of bagging and boosting through voting achieves the highest classification accuracy, 99.98%, and the lowest false positive rate (FPR) on both datasets. The model building time is only average, although this can be traded off against processor speed.
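
The voting step and the false positive rate reported above can be sketched as follows. This is a minimal sketch assuming hard (majority) voting over three already-trained models. The abstract does not say which base learners were used or whether voting was hard or soft, so the model names and predictions here are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard voting."""
    combined = []
    for sample_votes in zip(*predictions_per_model):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


def false_positive_rate(y_true, y_pred, positive="attack"):
    """FPR = FP / (FP + TN), treating `positive` as the alarm class."""
    fp = sum(1 for t, p in zip(y_true, y_pred)
             if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred)
             if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical predictions from one bagging and two boosting models.
bagging_preds = ["attack", "normal", "normal", "attack"]
boosting_a_preds = ["attack", "attack", "normal", "attack"]
boosting_b_preds = ["normal", "normal", "normal", "attack"]
y_true = ["attack", "normal", "normal", "attack"]

ensemble = majority_vote([bagging_preds, boosting_a_preds, boosting_b_preds])
print(ensemble)
print(false_positive_rate(y_true, ensemble))
```

In this toy case each model errs on a different sample, so the vote recovers the true label for every sample even though two of the three models individually make a mistake.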

Method for improving video/image data quality for AI learning of unstructured data

  • 김승희;류동주
    • 융합보안논문지 / Vol. 23, No. 2 / pp. 55-66 / 2023
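  • Recently, building on prior research into AI training data, efforts to raise the value of such data and to secure high-quality data have been increasing worldwide across all sectors of society. Quality management is therefore very important in projects that build high-quality datasets. This paper presents quality management measures for securing high-quality data when building AI training data, along with improvements for each stage of the construction process. In particular, for unstructured data built for AI training, more than 80% of data quality is determined during construction. Through quality inspection of unstructured image and video data, this paper proposes a way to secure high-quality data by resolving the inspection procedures and problem factors that arise in the acquisition, data cleaning, and labeling stages of construction. The proposed approach is expected to offer research groups and businesses involved in building AI training data an alternative for overcoming variation in data quality.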

A Survey on Privacy Vulnerabilities through Logit Inversion in Distillation-based Federated Learning

  • 윤수빈;조윤기;백윤흥
    • 한국정보처리학회: Conference Proceedings / 한국정보처리학회 2024 Spring Conference / pp. 711-714 / 2024
  • In the dynamic landscape of modern machine learning, Federated Learning (FL) has emerged as a compelling paradigm designed to enhance privacy by enabling participants to collaboratively train models without sharing their private data. Specifically, Distillation-based Federated Learning, like Federated Learning with Model Distillation (FedMD), Federated Gradient Encryption and Model Sharing (FedGEMS), and Differentially Secure Federated Learning (DS-FL), has arisen as a novel approach aimed at addressing Non-IID data challenges by leveraging Federated Learning. These methods refine the standard FL framework by distilling insights from public dataset predictions, securing data transmissions through gradient encryption, and applying differential privacy to mask individual contributions. Despite these innovations, our survey identifies persistent vulnerabilities, particularly concerning the susceptibility to logit inversion attacks where malicious actors could reconstruct private data from shared public predictions. This exploration reveals that even advanced Distillation-based Federated Learning systems harbor significant privacy risks, challenging the prevailing assumptions about their security and underscoring the need for continued advancements in secure Federated Learning methodologies.