• Title/Summary/Keyword: machine learning techniques

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm (머신러닝 알고리즘 기반의 의료비 예측 모델 개발)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / v.1 no.1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning models include Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of five types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of the regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.
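
Not the authors' Azure Machine Learning Studio experiment, but a minimal sketch of the same model-comparison idea, assuming scikit-learn equivalents of the five regression models and synthetic data in place of the hospital cost records:

```python
# Illustrative only: scikit-learn stand-ins for the five regression models
# compared in the study, run on a synthetic regression dataset.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error, r2_score

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "Bayesian linear": BayesianRidge(),
    "neural network": MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    "decision forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "boosted decision tree": GradientBoostingRegressor(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name:22s}  MAE={mean_absolute_error(y_test, pred):8.2f}  "
          f"R2={r2_score(y_test, pred):5.3f}")
```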

Implementing a Branch-and-bound Algorithm for Transductive Support Vector Machines

  • Park, Chan-Kyoo
    • Management Science and Financial Engineering / v.16 no.1 / pp.81-117 / 2010
  • Semi-supervised learning incorporates unlabeled examples, whose labels are unknown, as well as labeled examples into the learning process. Although the transductive support vector machine (TSVM), one of the semi-supervised learning models, was proposed about a decade ago, its application to large-scale data has been limited by its high computational complexity. Our previous research addressed this limitation by introducing a branch-and-bound algorithm for finding an optimal solution to TSVM. In this paper, we propose three new techniques to enhance the performance of the branch-and-bound algorithm. The first tightens the min-cut bound, one of the two bounding strategies. The second exploits a graph-based approximation to a support vector machine problem to avoid the most time-consuming step. The last tries to fix the labels of unlabeled examples whose labels can be obviously predicted from the labeled examples. Experimental results demonstrate that the proposed techniques drastically reduce the number of subproblems and, consequently, the computational time.
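
A toy sketch of the branch-and-bound structure referred to above, assuming a linear SVM and a simple bound in which the objective over the points labeled so far is used for pruning; the paper's min-cut bound, graph-based approximation, and label-fixing heuristics are not reproduced here:

```python
# Toy branch-and-bound over the labels of a handful of unlabeled points.
# Each node fixes labels for a prefix of the unlabeled set; an SVM trained on
# the points labeled so far yields the node's bound (unassigned points can
# only add non-negative hinge loss, up to solver inexactness).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(20, 2)) + np.array([[2, 2]] * 10 + [[-2, -2]] * 10)
y_lab = np.array([1] * 10 + [-1] * 10)
X_unl = rng.normal(size=(6, 2)) * 2            # unlabeled points (toy data)

C = 1.0
best = {"obj": np.inf, "labels": None}

def svm_objective(X, y):
    clf = LinearSVC(C=C, loss="hinge", dual=True, max_iter=10000).fit(X, y)
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    hinge = np.maximum(0, 1 - y * (X @ w + b)).sum()
    return 0.5 * w @ w + C * hinge

def branch(assigned):
    X = np.vstack([X_lab, X_unl[: len(assigned)]])
    y = np.concatenate([y_lab, assigned])
    bound = svm_objective(X, y)                # bound for this node
    if bound >= best["obj"]:
        return                                 # prune this subtree
    if len(assigned) == len(X_unl):            # complete assignment reached
        best["obj"], best["labels"] = bound, assigned.copy()
        return
    for label in (+1, -1):                     # branch on the next point
        branch(np.append(assigned, label))

branch(np.array([], dtype=int))
print("best objective:", round(best["obj"], 3), "labels:", best["labels"])
```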

Systematic Research on Privacy-Preserving Distributed Machine Learning (프라이버시를 보호하는 분산 기계 학습 연구 동향)

  • Min Seob Lee;Young Ah Shin;Ji Young Chun
    • The Transactions of the Korea Information Processing Society / v.13 no.2 / pp.76-90 / 2024
  • Although artificial intelligence (AI) can be utilized in various domains such as smart cities and healthcare, its use is limited by concerns about the exposure of personal and sensitive information. In response, the concept of distributed machine learning has emerged, wherein learning occurs locally before a global model is trained, mitigating the concentration of data on a central server. However, the overall learning phase, carried out collaboratively among multiple participants, still poses threats to data privacy. In this paper, we systematically analyze recent trends in privacy protection within the realm of distributed machine learning, considering factors such as the presence of a central server, the distribution environment of the training datasets, and performance variations among participants. In particular, we focus on key distributed machine learning techniques, including horizontal federated learning, vertical federated learning, and swarm learning. We examine the privacy protection mechanisms within these techniques and explore potential directions for future research.
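
As a rough illustration of the horizontal federated learning setting surveyed here, the sketch below runs a FedAvg-style loop for a linear model on made-up client data; only model parameters, never raw data, are exchanged. Real deployments layer secure aggregation, differential privacy, and similar protections on top of this skeleton.

```python
# Minimal FedAvg-style loop: each client refines a local copy of the global
# weights on its own data and the server averages the returned parameters.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])

# Horizontally partitioned data: same features, different samples per client.
clients = []
for _ in range(5):
    X = rng.normal(size=(100, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

def local_update(w, X, y, lr=0.05, steps=5):
    w = w.copy()
    for _ in range(steps):                       # local gradient steps on private data
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

global_w = np.zeros(3)
for _ in range(20):                              # communication rounds
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)         # server-side averaging

print("aggregated weights:", np.round(global_w, 3))
```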

Systematic Review of Bug Report Processing Techniques to Improve Software Management Performance

  • Lee, Dong-Gun;Seo, Yeong-Seok
    • Journal of Information Processing Systems / v.15 no.4 / pp.967-985 / 2019
  • Bug report processing is a key element of bug fixing in modern software maintenance. Bug reports are not processed immediately after submission and involve several processes, such as bug report deduplication and bug report triage, before bug fixing is initiated; however, this way of working is very inefficient because all these processes are performed manually. Software engineers have persistently highlighted the need to automate these processes, and as a result, many automation techniques have been proposed for bug report processing; however, the accuracy of the existing methods is not satisfactory. Therefore, this study surveys existing techniques for bug report processing with a view to improving their accuracy. The review of each method consists of a description, the techniques used, experiments, and comparison results. The results of this study indicate that research in the field of bug report deduplication is still lacking and requires further studies that integrate clustering and natural language processing. This study further indicates that although all studies in the field of triage are based on machine learning, results on deep learning are still insufficient.
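
To make the deduplication step concrete, the following is a small, hypothetical sketch that embeds bug report summaries with TF-IDF and reports the most similar pair; the example reports are invented, and this is only one of the many approaches the survey covers.

```python
# TF-IDF embedding of bug report summaries plus cosine similarity as a
# simple duplicate-candidate detector (toy report texts).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [
    "App crashes when opening the settings page",
    "App crash when the settings page is opened",
    "Login button does nothing after entering password",
    "Pressing the login button has no effect after entering a password",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(reports)
sim = cosine_similarity(vectors)
np.fill_diagonal(sim, 0.0)                      # ignore self-similarity

i, j = np.unravel_index(np.argmax(sim), sim.shape)
print(f"most similar pair ({sim[i, j]:.2f}):")
print(" ", reports[i])
print(" ", reports[j])
```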

Prediction Model of Software Fault using Deep Learning Methods (딥러닝 기법을 사용하는 소프트웨어 결함 예측 모델)

  • Hong, Euyseok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.4 / pp.111-117 / 2022
  • Many studies have been conducted on software fault prediction models over the decades, and models using machine learning techniques have shown the best performance. Deep learning techniques have become the most popular in the field of machine learning, but few studies have used them as classifiers in fault prediction models. Some studies have used deep learning to obtain semantic information from the model's input source code or syntactic data. In this paper, we built several models by varying the model structure and hyperparameters of an MLP with three or more hidden layers. In the model evaluation experiment, the MLP-based deep learning models showed performance similar to existing models in terms of Accuracy, but significantly better AUC. They also outperformed another deep learning model, the CNN model.
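
A hedged sketch of the kind of classifier the abstract describes, assuming a scikit-learn MLP with three hidden layers and synthetic software metrics in place of the paper's data, evaluated with both Accuracy and AUC:

```python
# MLP with three hidden layers as a fault / non-fault classifier on
# synthetic, imbalanced "module metrics" data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)       # 1 = faulty module (toy labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(64, 32, 16),   # three hidden layers
                    max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)

pred = mlp.predict(X_test)
proba = mlp.predict_proba(X_test)[:, 1]
print("Accuracy:", round(accuracy_score(y_test, pred), 3))
print("AUC:", round(roc_auc_score(y_test, proba), 3))
```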

Realtime Analysis of Sasang Constitution Types from Facial Features Using Computer Vision and Machine Learning

  • Abdullah;Shah Mahsoom Ali;Hee-Cheol Kim
    • Journal of information and communication convergence engineering / v.22 no.3 / pp.256-266 / 2024
  • Sasang constitutional medicine (SCM) is one of the best traditional therapeutic approaches used in Korea. SCM prioritizes personalized treatment that considers the unique constitution of an individual, encompassing their physical characteristics, personality traits, and susceptibility to specific diseases. Facial features are essential for diagnosing Sasang constitutional types (SCTs). This study aimed to develop a real-time artificial intelligence-based model for diagnosing SCTs from facial images by building an SCT prediction model based on machine learning. Facial features were extracted from all images using feature engineering and machine learning techniques, and the fusion of these features was used to train the AI model. We used four machine learning algorithms, namely random forest (RF), multilayer perceptron (MLP), gradient boosting machine (GBM), and extreme gradient boosting (XGB), to investigate SCTs. The GBM outperformed all the other models. The highest accuracy achieved in the experiment was 81%, indicating the robustness of the proposed model and its suitability for real-time applications.
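
Purely as an illustration of the classification stage (the facial feature extraction itself is omitted), the sketch below trains a gradient boosting model on stand-in feature vectors with four toy SCT classes; the feature values and labels are random and carry no clinical meaning.

```python
# Gradient boosting on stand-in "facial feature" vectors for a 4-class
# constitution-type prediction task (toy data only).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30))                 # stand-in facial features
y = rng.integers(0, 4, size=400)               # four SCT classes (toy labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
gbm = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print("toy accuracy:", round(accuracy_score(y_test, gbm.predict(X_test)), 3))
```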

Network Traffic Measurement Analysis using Machine Learning

  • Hae-Duck Joshua Jeong
    • Korean Journal of Artificial Intelligence / v.11 no.2 / pp.19-27 / 2023
  • In recent times, an exponential increase in Internet traffic has been observed as a result of the advancing development of the Internet of Things, mobile networks with sensors, and communication functions within various devices. Furthermore, the COVID-19 pandemic has inevitably led to an explosion of social network traffic. Within this context, considerable attention has been drawn to research on network traffic analysis based on machine learning. In this paper, we design and develop a new machine learning framework for network traffic analysis in which normal and abnormal traffic are distinguished from one another. To achieve this, we combine well-known machine learning algorithms with network traffic analysis techniques. Using the KDD CUP'99 dataset, one of the most widely used datasets, in the Weka and Apache Spark environments, we compare and investigate results obtained from time-series-type analysis of various aspects, including malicious code, feature extraction, data formalization, and network traffic measurement tool implementation. Experimental analysis showed that while both the logistic regression and support vector machine algorithms performed well, the logistic regression algorithm performed better. The quantitative analysis results show that our proposed machine learning framework is reliable and practical, and its performance is compared and analyzed against that of a system from a previously published paper. In addition, we found that the framework exhibits much faster processing in the Apache Spark environment than in Weka as the datasets used to build and apply machine learning models grow larger.
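
A simplified sketch of the classification comparison described above, with synthetic feature vectors standing in for the KDD CUP'99 records and scikit-learn implementations of logistic regression and the support vector machine:

```python
# Normal vs. abnormal traffic classification on synthetic features,
# comparing logistic regression with an SVM on a held-out split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

X, y = make_classification(n_samples=5000, n_features=30, weights=[0.7, 0.3],
                           random_state=42)     # 1 = abnormal traffic (toy labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("support vector machine", SVC())]:
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(f"{name:23s} accuracy={accuracy_score(y_test, pred):.3f} "
          f"F1={f1_score(y_test, pred):.3f}")
```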

Underwater Acoustic Research Trends with Machine Learning: General Background

  • Yang, Haesang;Lee, Keunhwa;Choo, Youngmin;Kim, Kookhyun
    • Journal of Ocean Engineering and Technology / v.34 no.2 / pp.147-154 / 2020
  • Underwater acoustics, the study of underwater wave propagation and its interaction with boundaries, has mainly been applied to the fields of underwater communication, target detection, marine resources, the marine environment, and underwater sound sources. Based on the scientific and engineering understanding of acoustic signals and data, recent studies combining traditional and data-driven machine learning methods have shown continuous progress. Machine learning, represented by deep learning, has achieved unprecedented success in a variety of fields, owing to big data, graphics processing unit (GPU) computing, and advances in algorithms. Although machine learning has not yet been implemented in every field of underwater acoustics, it will be used more actively in the future in line with the ongoing development and overwhelming achievements of this method. To understand the research trends of machine learning applications in underwater acoustics, the general theoretical background of several related machine learning techniques is introduced in this paper.

On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection

  • AKINYELU, Andronicus Ayobami;ADEWUMI, Aderemi Oluyinka
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.3 / pp.1348-1375 / 2018
  • Support Vector Machine (SVM) is a well-known machine learning classification algorithm that has been widely applied to many data mining problems with good accuracy. However, SVM classification speed decreases as dataset size increases. Some applications, like video surveillance and intrusion detection, require a classifier to be trained very quickly and on large datasets. Hence, this paper introduces two filter-based instance selection techniques for optimizing SVM training speed. Fast classification is often achieved at the expense of classification accuracy, and some applications, such as phishing and spam email classifiers, are very sensitive to even a slight drop in classification accuracy. Hence, this paper also introduces two wrapper-based instance selection techniques for improving SVM predictive accuracy and training speed. The wrapper-based and filter-based techniques are inspired by the Cuckoo Search Algorithm and the Bat Algorithm. The proposed techniques are validated on three popular e-fraud types: credit card fraud, spam email, and phishing email. In addition, the proposed techniques are validated on 20 other datasets provided by the UCI data repository. Moreover, statistical analysis is performed, and the experimental results reveal that the filter-based and wrapper-based techniques significantly improved SVM classification speed. The results also reveal that the wrapper-based techniques improved SVM predictive accuracy in most cases.
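
The following is a stripped-down illustration of wrapper-based instance selection, with a plain random search standing in for the Cuckoo Search and Bat Algorithm metaheuristics and a synthetic dataset in place of the e-fraud data: candidate subsets are scored by the validation accuracy of an SVM trained on them, with a mild penalty for larger subsets.

```python
# Wrapper-based instance selection: score instance subsets by the accuracy
# of an SVM trained on them (random search used as the optimizer here).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def fitness(mask):
    if mask.sum() < 10:                      # need enough instances to train
        return -1.0
    acc = SVC().fit(X_tr[mask], y_tr[mask]).score(X_val, y_val)
    return acc - 0.1 * mask.mean()           # reward accuracy, penalize size

best_mask, best_fit = None, -np.inf
for _ in range(30):                          # candidate subsets ("nests"/"bats")
    mask = rng.random(len(X_tr)) < 0.3       # keep roughly 30% of instances
    fit = fitness(mask)
    if fit > best_fit:
        best_mask, best_fit = mask, fit

print(f"full set:  {len(X_tr)} instances, "
      f"accuracy {SVC().fit(X_tr, y_tr).score(X_val, y_val):.3f}")
print(f"selected:  {best_mask.sum()} instances, "
      f"accuracy {SVC().fit(X_tr[best_mask], y_tr[best_mask]).score(X_val, y_val):.3f}")
```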

A Study on Machine Learning Based Anti-Analysis Technique Detection Using N-gram Opcode (N-gram Opcode를 활용한 머신러닝 기반의 분석 방지 보호 기법 탐지 방안 연구)

  • Kim, Hee Yeon;Lee, Dong Hoon
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.181-192 / 2022
  • The emergence of new malware is incapacitating existing signature-based malware detection techniques, and the application of various anti-analysis techniques makes such malware difficult to analyze. Recent studies on signature-based malware detection are limited in that malware creators can easily bypass them. Therefore, in this study, we build a machine learning model that can detect and classify the anti-analysis techniques of packers applied to malware, rather than relying on the characteristics of the malware itself. N-gram opcodes are extracted from malicious binaries to which various anti-analysis techniques of commercial packers have been applied, features are derived from them using TF-IDF, and each anti-analysis technique is then detected and classified. In this study, real-world malware samples packed with Themida and VMProtect using multiple anti-analysis techniques were used to train and test six machine learning models, and the best-performing model achieved 81.25% accuracy for Themida and 95.65% accuracy for VMProtect.
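
As a schematic illustration of the pipeline only (not the paper's implementation), the sketch below turns toy opcode sequences into n-gram TF-IDF features and trains a classifier; the opcode strings and the class names are invented for the example.

```python
# n-gram opcode TF-IDF features feeding a classifier that labels the
# anti-analysis behaviour of a (toy) opcode sequence.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Each sample is a whitespace-separated opcode sequence (invented examples).
opcode_seqs = [
    "push mov call pop ret push mov",
    "push mov call pop ret jmp call",
    "xor rdtsc cmp jne xor rdtsc cmp",
    "xor rdtsc cmp je xor cpuid cmp",
    "pushad popad jmp pushad popad call",
    "pushad popad jmp pushad popad jmp",
]
labels = ["none", "none", "anti-debug", "anti-debug", "packer-stub", "packer-stub"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), token_pattern=r"\S+"),  # opcode n-grams
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(opcode_seqs, labels)

print(model.predict(["xor rdtsc cmp jne cpuid"]))  # classify an unseen sequence
```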