• Title/Abstract/Keywords: Machine Learning Empirical Study

Search results: 92 items; processing time: 0.036 s

RISKY MODULE PREDICTION FOR NUCLEAR I&C SOFTWARE

  • Kim, Young-Mi;Kim, Hyeon-Soo
    • Nuclear Engineering and Technology / Vol. 44, No. 6 / pp. 663-672 / 2012
  • As software-based digital I&C (Instrumentation and Control) systems become more prevalent in nuclear plants, enhancing software dependability has become an important issue in the area of nuclear I&C systems. The critical attributes of software dependability are safety and reliability. These attributes are closely related to software failures caused by faults. Software testing and V&V (Verification and Validation) activities are hence important for enhancing software dependability. If the risky modules of safety-critical software can be predicted, testing and V&V activities can be focused more efficiently and effectively. It should also become possible to better allocate resources for regulation activities. We propose a technique for predicting risky software modules that adopts machine learning models based on software complexity metrics. An empirical study with various machine learning algorithms was carried out to compare prediction performance. Experimental results show that SVMs (Support Vector Machines) perform as well as or better than the other methods.

Finding the best suited autoencoder for reducing model complexity

  • Ngoc, Kien Mai;Hwang, Myunggwon
    • Smart Media Journal / Vol. 10, No. 3 / pp. 9-22 / 2021
  • Machine learning models use input data to produce results. Sometimes, the input data is too complicated for the models to learn useful patterns. Feature engineering is therefore a crucial data preprocessing step for constructing a proper feature set that improves the performance of such models. One of the most efficient methods for automating feature engineering is the autoencoder, which transforms the data from its original space into a latent space. However, certain factors, including the datasets, the machine learning models, and the number of dimensions of the latent space (denoted by k), should be carefully considered when using an autoencoder. In this study, we design a framework to compare two data preprocessing approaches, with and without an autoencoder, and to observe the impact of these factors on the autoencoder. We then conduct experiments using autoencoders with classifiers on popular datasets. The empirical results provide a perspective on the best suited autoencoder for these factors.

Prediction & Assessment of Change Prone Classes Using Statistical & Machine Learning Techniques

  • Malhotra, Ruchika;Jangra, Ravi
    • Journal of Information Processing Systems / Vol. 13, No. 4 / pp. 778-804 / 2017
  • Software has become an inseparable part of our lives. To meet ever more demanding customer needs, it has to evolve rapidly and absorb many changes. In this paper, we study the relationship between object-oriented metrics and the change proneness of a class. Prediction models based on this study can help identify the change-prone classes of a software system, so that testing effort can be focused on those classes to yield better-quality software. Previously, researchers have used statistical methods for predicting change-prone classes, but machine learning methods have rarely been used for this purpose. In our study, we evaluate and compare the performance of ten machine learning methods with that of the statistical method. The evaluation is based on two open source software systems developed in Java. We also validated the developed prediction models using another software data set from the same domain (3D modelling). The performance of the prediction models was evaluated using receiver operating characteristic analysis. The results indicate that the machine learning methods are on par with the statistical method for predicting change-prone classes. A further analysis showed that models constructed for one software system can also be used to predict the change proneness of classes in another software system in the same domain. This study would help developers perform effective regression testing at low cost and effort. It will also help developers design an effective model that results in fewer change-prone classes and hence better maintenance.

Kernel-Trick Regression and Classification

  • Huh, Myung-Hoe
    • Communications for Statistical Applications and Methods / Vol. 22, No. 2 / pp. 201-207 / 2015
  • The support vector machine (SVM) is a well-known kernel-trick supervised learning tool. This study proposes a working scheme for kernel-trick regression and classification (KtRC) as an SVM alternative. KtRC fits the model on a number of random subsamples and selects the best model. Empirical examples and a simulation study indicate that KtRC's performance is comparable to that of SVM.
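
The subsample-and-select idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's KtRC procedure: the base learner (kernel ridge regression with a Gaussian kernel), the selection criterion (mean squared error on the full training set), and every name and parameter value (`fit_on_subsamples`, `gamma`, `lam`, `m`, the sine toy data) are assumptions made for illustration.

```python
import math
import random

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0):
    """Fit kernel ridge regression and return a prediction function."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)
    return lambda x: sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X))

def mse(model, X, y):
    """Mean squared error of a prediction function on (X, y)."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def fit_on_subsamples(X, y, n_models=10, m=15, seed=0):
    """Fit one model per random subsample of size m and keep the best one."""
    rng = random.Random(seed)
    best_model, best_err = None, float("inf")
    for _ in range(n_models):
        idx = rng.sample(range(len(X)), m)
        model = fit_kernel_ridge([X[i] for i in idx], [y[i] for i in idx])
        err = mse(model, X, y)  # assumed criterion: error on the full training set
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err

# Toy data (made up for illustration): noiseless samples of sin(x).
X = [[i / 10] for i in range(60)]
y = [math.sin(x[0]) for x in X]
model, err = fit_on_subsamples(X, y)
```

Each fit solves only an m-by-m linear system rather than one over all training points, which is one plausible motivation for working with subsamples.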

Improving the Performance of Korean Text Chunking by Machine Learning Approaches Based on Feature Set Selection

  • 황영숙;정후중;박소영;곽용재;임해창
    • Journal of KIISE: Software and Applications / Vol. 29, No. 9 / pp. 654-668 / 2002
  • In this paper, we present an empirical study on improving Korean text chunking based on machine learning and feature set selection. We focus on two issues: selecting a feature set for Korean chunking and alleviating data sparseness. To select a proper feature set, we use a heuristic method that searches the space of feature sets, using the estimated performance of a machine learning algorithm as a measure of the "incremental usefulness" of a particular feature set. To smooth data sparseness, we suggest using a general part-of-speech tag set and selective lexical information that take the characteristics of the Korean language into account. Experimental results showed that chunk tags and lexical information within a given context window are important features and that spacing unit information is less important than the others, independently of the machine learning technique. Furthermore, using selective lexical information not only gives a smoothing effect but also reduces the feature space compared with using all lexical information. Korean text chunking based on memory-based learning and decision tree learning with the selected feature space achieved precision/recall of 90.99%/92.52% and 93.39%/93.41%, respectively.
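
A heuristic search over feature sets driven by estimated performance can be sketched as greedy forward selection. This is only one common form of such a search, and the abstract does not say it is the one the authors used. The function `forward_select`, the feature names, and the toy scoring function are all hypothetical; in practice `evaluate` would return something like the cross-validated score of a learner trained on the given features.

```python
def forward_select(candidates, evaluate, min_gain=1e-9):
    """Greedily add the feature that most improves evaluate(feature_list)."""
    selected = []
    best_score = evaluate(selected)
    while True:
        trials = [(evaluate(selected + [f]), f)
                  for f in candidates if f not in selected]
        if not trials:
            break
        score, feature = max(trials)
        # Stop when the best remaining feature adds no "incremental usefulness".
        if score - best_score <= min_gain:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score

# Hypothetical stand-in for estimated performance (values are made up).
TOY_GAIN = {"feat_a": 0.30, "feat_b": 0.20, "feat_c": 0.0, "feat_d": -0.05}

def toy_score(features):
    return 0.40 + sum(TOY_GAIN[f] for f in features)

chosen, score = forward_select(list(TOY_GAIN), toy_score)
```

With this toy scorer the search keeps `feat_a` and `feat_b` and stops, because neither remaining feature raises the score.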

A Study on the Factors Influencing a Company's Selection of Machine Learning: From the Perspective of Expanded Algorithm Selection Problem

  • 이영수;권민수;권오병
    • The Journal of Society for e-Business Studies / Vol. 27, No. 2 / pp. 37-64 / 2022
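  • As the social acceptance of artificial intelligence grows, more companies are applying machine learning techniques. Machine learning techniques have mostly been selected on technical criteria such as accuracy and interpretability. However, the success of machine learning adoption is also influenced by managerial factors such as the development department, the user department, leadership, and organizational culture. Unfortunately, there is almost no integrated research that examines the success factors of machine learning selection by considering technical and managerial factors together. The purpose of this paper is therefore to propose and empirically analyze an integrated technical-managerial model that combines John Rice's algorithm selection process model, task-technology fit, and the IS Success Model in order to understand machine learning selection within companies. A survey of 240 domestic companies that have adopted machine learning showed that higher algorithm quality and data quality have a stronger positive effect on problem-algorithm fit, and that problem-algorithm fit has a significant effect on organizational productivity and innovativeness. In addition, outsourcing and management support were found to have a positive effect on machine learning system quality, and organizational culture factors such as data-driven management and motivation were found to have a strong effect on utilization performance.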

Machine learning-enabled parameterization scheme for aerodynamic shape optimization of wind-sensitive structures: A proof-of-concept study

  • Shaopeng Li;Brian M. Phillips;Zhaoshuo Jiang
    • Wind and Structures / Vol. 39, No. 3 / pp. 175-190 / 2024
  • Aerodynamic shape optimization is very useful for enhancing the performance of wind-sensitive structures. However, shape parameterization, as the first step in the pipeline of aerodynamic shape optimization, still heavily depends on empirical judgment. If not done properly, the resulting small design space may fail to cover many promising shapes, and hence hinder realizing the full potential of aerodynamic shape optimization. To this end, developing a novel shape parameterization scheme that can reflect real-world complexities while being simple enough for the subsequent optimization process is important. This study proposes a machine learning-based scheme that can automatically learn a low-dimensional latent representation of complex aerodynamic shapes for bluff-body wind-sensitive structures. The resulting latent representation (as design variables for aerodynamic shape optimization) is composed of both discrete and continuous variables, which are embedded in a hierarchy structure. In addition to being intuitive and interpretable, the mixed discrete and continuous variables with the hierarchy structure allow stakeholders to narrow the search space selectively based on their interests. As a proof-of-concept study, shape parameterization examples of tall building cross sections are used to demonstrate the promising features of the proposed scheme and guide future investigations on data-driven parameterization for aerodynamic shape optimization of wind-sensitive structures.

Predicting Administrative Issue Designation in KOSDAQ Market Using Machine Learning Techniques

  • 채승일;이동주
    • Asia-Pacific Journal of Business / Vol. 13, No. 2 / pp. 107-122 / 2022
  • Purpose - This study aims to develop machine learning models that predict administrative issue designation in the KOSDAQ market using financial data. Design/methodology/approach - Applying four classification techniques (logistic regression, support vector machine, random forest, and gradient boosting) to a matched sample of five hundred and thirty-six firms over an eight-year period, the authors develop prediction models and explore the practicality of the models. Findings - The resulting four binary selection models show overall satisfactory classification performance in terms of various measures including AUC (area under the receiver operating characteristic curve), accuracy, F1-score, and top quartile lift, while the ensemble models (random forest and gradient boosting) outperform the others on most measures. Research implications or Originality - Although the assessment of firms' administrative issue potential is critical information for investors and financial institutions, detailed empirical investigation has lagged behind. The current research fills this gap in the literature by proposing parsimonious prediction models based on a few financial variables and validating the applicability of the models.
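
The four measures named in the abstract above can be computed as in the following sketch. The definition of top quartile lift used here (the positive rate among the 25% highest-scored cases divided by the overall positive rate) is a common one but is an assumption, since the abstract does not define it. The labels, scores, and the 0.5 threshold are made-up toy values, not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def top_quartile_lift(y_true, scores):
    """Positive rate in the top 25% of scores divided by the overall positive rate."""
    n_top = max(1, len(scores) // 4)
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate

# Toy labels and scores (made up for illustration).
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
area = auc(y_true, scores)
lift = top_quartile_lift(y_true, scores)
```

AUC and lift depend only on how the scores rank the cases, whereas accuracy and F1 also depend on the chosen threshold.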

Proposal and empirical study of web shell detection system (MWSDS) applying machine learning-based supervised learning and classification

  • 김기환;이상도;신용태
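    • Proceedings of the Korean Society of Computer Information Conference / Proceedings of the 69th Winter Conference of the Korean Society of Computer Information, 2024, Vol. 32, No. 1 / pp. 49-50 / 2024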
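  • This paper proposes MWSDS, an automated detection system that applies AI machine learning-based supervised learning and classification algorithms in order to classify web shell malware accurately and to detect web shells through fast, automatic classification and analysis, and validates the system on web shell experimental data. The proposal is expected to support not only the response to web shell malware attacks but also more effective and sustained responses through the establishment of a managerial information security framework.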

Improving learning outcome prediction method by applying Markov Chain

  • 황철현
    • The Journal of the Convergence on Culture Technology / Vol. 10, No. 4 / pp. 595-600 / 2024
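  • As artificial intelligence technologies such as machine learning are increasingly used in research on predicting learning outcomes and optimizing learning paths, the use of artificial intelligence in education is making steady progress. Such research is gradually evolving toward more advanced methods such as deep learning and reinforcement learning. This study aims to improve methods for predicting future learning outcomes from learners' past learning outcome and history data. To raise prediction performance, it proposes conditional probabilities derived by applying the Markov chain method. The method supplements classification by machine learning with the learner's learning history data in order to improve the classifier's prediction performance. To verify the effect of the proposed method, experiments were conducted on empirical data, the "teaching-aid-based early childhood education learning outcome data", comparing the classification performance metrics of existing classification algorithms with those of the proposed method. The results confirmed that the proposed method yields higher performance metrics than using a classification algorithm alone.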
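
As a rough illustration of using Markov chain conditional probabilities alongside a classifier, the sketch below estimates first-order transition probabilities from outcome histories and uses them to reweight a classifier's class probabilities. The combination rule (multiply and renormalize), the function names, the state labels, and the toy histories are all assumptions made for illustration; the abstract does not specify how the paper combines the two sources.

```python
from collections import Counter, defaultdict

def transition_probs(sequences):
    """Estimate P(next state | previous state) from observed outcome sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for state, row in counts.items()}

def combine(classifier_probs, markov_row):
    """Reweight classifier probabilities by the Markov conditional probabilities.

    Assumed rule: multiply per class and renormalize. If the product is zero
    for every class, fall back to the classifier's probabilities unchanged.
    """
    joint = {k: p * markov_row.get(k, 0.0) for k, p in classifier_probs.items()}
    total = sum(joint.values())
    if total == 0:
        return dict(classifier_probs)
    return {k: v / total for k, v in joint.items()}

# Toy learning histories (made up for illustration).
histories = [
    ["pass", "pass", "fail", "pass"],
    ["fail", "fail", "pass", "pass"],
    ["pass", "pass", "pass", "fail"],
]
P = transition_probs(histories)

# A classifier that is undecided, for a learner whose last outcome was "pass".
classifier_probs = {"pass": 0.5, "fail": 0.5}
adjusted = combine(classifier_probs, P["pass"])
```

Here the history shifts an undecided classifier toward "pass", because "pass" followed "pass" in four of the six observed transitions out of that state.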