• Title/Summary/Keyword: Machine Learning Models

Search Result 1,317, Processing Time 0.03 seconds

Generating Training Dataset of Machine Learning Model for Context-Awareness in a Health Status Notification Service (사용자 건강 상태알림 서비스의 상황인지를 위한 기계학습 모델의 학습 데이터 생성 방법)

  • Mun, Jong Hyeok;Choi, Jong Sun;Choi, Jae Young
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.1
    • /
    • pp.25-32
    • /
    • 2020
  • In the context-aware system, rule-based AI technology has been used in the abstraction process for getting context information. However, the rules are complicated by the diversification of user requirements for the service and also data usage is increased. Therefore, there are some technical limitations to maintain rule-based models and to process unstructured data. To overcome these limitations, many studies have applied machine learning techniques to Context-aware systems. In order to utilize this machine learning-based model in the context-aware system, a management process of periodically injecting training data is required. In the previous study on the machine learning based context awareness system, a series of management processes such as the generation and provision of learning data for operating several machine learning models were considered, but the method was limited to the applied system. In this paper, we propose a training data generating method of a machine learning model to extend the machine learning based context-aware system. The proposed method define the training data generating model that can reflect the requirements of the machine learning models and generate the training data for each machine learning model. In the experiment, the training data generating model is defined based on the training data generating schema of the cardiac status analysis model for older in health status notification service, and the training data is generated by applying the model defined in the real environment of the software. In addition, it shows the process of comparing the accuracy by learning the training data generated in the machine learning model, and applied to verify the validity of the generated learning data.

Application and evaluation of machine-learning model for fire accelerant classification from GC-MS data of fire residue

  • Park, Chihyun;Park, Wooyong;Jeon, Sookyung;Lee, Sumin;Lee, Joon-Bae
    • Analytical Science and Technology
    • /
    • v.34 no.5
    • /
    • pp.231-239
    • /
    • 2021
  • Detection of fire accelerants from fire residues is critical to determine whether the case was arson or accidental fire. However, to develop a standardized model for determining the presence or absence of fire accelerants was not easy because of high temperature which cause disappearance or combustion of components of fire accelerants. In this study, logistic regression, random forest, and support vector machine models were trained and evaluated from a total of 728 GC-MS analysis data obtained from actual fire residues. Mean classification accuracies of the three models were 63 %, 81 %, and 84 %, respectively, and in particular, mean AU-PR values of the three models were evaluated as 0.68, 0.86, and 0.86, respectively, showing fine performances of random forest and support vector machine models.

Machine learning models for predicting the compressive strength of concrete containing nano silica

  • Garg, Aman;Aggarwal, Paratibha;Aggarwal, Yogesh;Belarbi, M.O.;Chalak, H.D.;Tounsi, Abdelouahed;Gulia, Reeta
    • Computers and Concrete
    • /
    • v.30 no.1
    • /
    • pp.33-42
    • /
    • 2022
  • Experimentally predicting the compressive strength (CS) of concrete (for a mix design) is a time-consuming and laborious process. The present study aims to propose surrogate models based on Support Vector Machine (SVM) and Gaussian Process Regression (GPR) machine learning techniques, which can predict the CS of concrete containing nano-silica. Content of cement, aggregates, nano-silica and its fineness, water-binder ratio, and the days at which strength has to be predicted are the input variables. The efficiency of the models is compared in terms of Correlation Coefficient (CC), Root Mean Square Error (RMSE), Variance Account For (VAF), Nash-Sutcliffe Efficiency (NSE), and RMSE to observation's standard deviation ratio (RSR). It has been observed that the SVM outperforms GPR in predicting the CS of the concrete containing nano-silica.

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning (앙상블 기법을 활용한 RNA-Sequencing 데이터의 폐암 예측 연구)

  • Geon AN;JooYong PARK
    • Journal of Korea Artificial Intelligence Association
    • /
    • v.2 no.1
    • /
    • pp.7-14
    • /
    • 2024
  • In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predict lung cancer and treatment strategies for lung cancer, a leading cause of cancer mortality worldwide. The research utilizes Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to enhance predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers such as ADGRF5 is identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift towards more precise and personalized treatment approaches. The results advocate for further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in enhancing patient survival rates and quality of life. This study lays the groundwork for future research in the application of RNA-sequencing data and ensemble machine learning techniques in clinical settings.

Optimizing shallow foundation design: A machine learning approach for bearing capacity estimation over cavities

  • Kumar Shubham;Subhadeep Metya;Abdhesh Kumar Sinha
    • Geomechanics and Engineering
    • /
    • v.37 no.6
    • /
    • pp.629-641
    • /
    • 2024
  • The presence of excavations or cavities beneath the foundations of a building can have a significant impact on their stability and cause extensive damage. Traditional methods for calculating the bearing capacity and subsidence of foundations over cavities can be complex and time-consuming, particularly when dealing with conditions that vary. In such situations, machine learning (ML) and deep learning (DL) techniques provide effective alternatives. This study concentrates on constructing a prediction model based on the performance of ML and DL algorithms that can be applied in real-world settings. The efficacy of eight algorithms, including Regression Analysis, k-Nearest Neighbor, Decision Tree, Random Forest, Multivariate Regression Spline, Artificial Neural Network, and Deep Neural Network, was evaluated. Using a Python-assisted automation technique integrated with the PLAXIS 2D platform, a dataset containing 272 cases with eight input parameters and one target variable was generated. In general, the DL model performed better than the ML models, and all models, except the regression models, attained outstanding results with an R2 greater than 0.90. These models can also be used as surrogate models in reliability analysis to evaluate failure risks and probabilities.

An Ensemble Approach to Detect Fake News Spreaders on Twitter

  • Sarwar, Muhammad Nabeel;UlAmin, Riaz;Jabeen, Sidra
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.5
    • /
    • pp.294-302
    • /
    • 2022
  • Detection of fake news is a complex and a challenging task. Generation of fake news is very hard to stop, only steps to control its circulation may help in minimizing its impacts. Humans tend to believe in misleading false information. Researcher started with social media sites to categorize in terms of real or fake news. False information misleads any individual or an organization that may cause of big failure and any financial loss. Automatic system for detection of false information circulating on social media is an emerging area of research. It is gaining attention of both industry and academia since US presidential elections 2016. Fake news has negative and severe effects on individuals and organizations elongating its hostile effects on the society. Prediction of fake news in timely manner is important. This research focuses on detection of fake news spreaders. In this context, overall, 6 models are developed during this research, trained and tested with dataset of PAN 2020. Four approaches N-gram based; user statistics-based models are trained with different values of hyper parameters. Extensive grid search with cross validation is applied in each machine learning model. In N-gram based models, out of numerous machine learning models this research focused on better results yielding algorithms, assessed by deep reading of state-of-the-art related work in the field. For better accuracy, author aimed at developing models using Random Forest, Logistic Regression, SVM, and XGBoost. All four machine learning algorithms were trained with cross validated grid search hyper parameters. Advantages of this research over previous work is user statistics-based model and then ensemble learning model. Which were designed in a way to help classifying Twitter users as fake news spreader or not with highest reliability. User statistical model used 17 features, on the basis of which it categorized a Twitter user as malicious. New dataset based on predictions of machine learning models was constructed. And then Three techniques of simple mean, logistic regression and random forest in combination with ensemble model is applied. Logistic regression combined in ensemble model gave best training and testing results, achieving an accuracy of 72%.

A Study on the Insider Behavior Analysis Framework for Detecting Information Leakage Using Network Traffic Collection and Restoration (네트워크 트래픽 수집 및 복원을 통한 내부자 행위 분석 프레임워크 연구)

  • Kauh, Janghyuk;Lee, Dongho
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.13 no.4
    • /
    • pp.125-139
    • /
    • 2017
  • In this paper, we developed a framework to detect and predict insider information leakage by collecting and restoring network traffic. For automated behavior analysis, many meta information and behavior information obtained using network traffic collection are used as machine learning features. By these features, we created and learned behavior model, network model and protocol-specific models. In addition, the ensemble model was developed by digitizing and summing the results of various models. We developed a function to present information leakage candidates and view meta information and behavior information from various perspectives using the visual analysis. This supports to rule-based threat detection and machine learning based threat detection. In the future, we plan to make an ensemble model that applies a regression model to the results of the models, and plan to develop a model with deep learning technology.

Purchase Prediction by Analyzing Users' Online Behaviors Using Machine Learning and Information Theory Approaches

  • Kim, Minsung;Im, Il;Han, Sangman
    • Asia pacific journal of information systems
    • /
    • v.26 no.1
    • /
    • pp.66-79
    • /
    • 2016
  • The availability of detailed data on customers' online behaviors and advances in big data analysis techniques enable us to predict consumer behaviors. In the past, researchers have built purchase prediction models by analyzing clickstream data; however, these clickstream-based prediction models have had several limitations. In this study, we propose a new method for purchase prediction that combines information theory with machine learning techniques. Clickstreams from 5,000 panel members and data on their purchases of electronics, fashion, and cosmetics products were analyzed. Clickstreams were summarized using the 'entropy' concept from information theory, while 'random forests' method was applied to build prediction models. The results show that prediction accuracy of this new method ranges from 0.56 to 0.83, which is a significant improvement over values for clickstream-based prediction models presented in the past. The results indicate further that consumers' information search behaviors differ significantly across product categories.

Finding the best suited autoencoder for reducing model complexity

  • Ngoc, Kien Mai;Hwang, Myunggwon
    • Smart Media Journal
    • /
    • v.10 no.3
    • /
    • pp.9-22
    • /
    • 2021
  • Basically, machine learning models use input data to produce results. Sometimes, the input data is too complicated for the models to learn useful patterns. Therefore, feature engineering is a crucial data preprocessing step for constructing a proper feature set to improve the performance of such models. One of the most efficient methods for automating feature engineering is the autoencoder, which transforms the data from its original space into a latent space. However certain factors, including the datasets, the machine learning models, and the number of dimensions of the latent space (denoted by k), should be carefully considered when using the autoencoder. In this study, we design a framework to compare two data preprocessing approaches: with and without autoencoder and to observe the impact of these factors on autoencoder. We then conduct experiments using autoencoders with classifiers on popular datasets. The empirical results provide a perspective regarding the best suited autoencoder for these factors.

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases
    • /
    • v.86 no.3
    • /
    • pp.203-215
    • /
    • 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data: and set III, machine learning models imputing the missing data. Six machine learning models, the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM) were implemented. The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R2 and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III and 896 patients were included in set II. In set I, the R2 value was 0.27 and in set II, LightGBM was the best model with the highest R2 value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model with the highest R2 value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.