• Title/Summary/Keyword: Computational ROC


SPECT Image Analysis Using Computational ROC Curve Based on Threshold Setup

  • Kim, Moo-Sub;Shin, Han-Back;Kim, Sunmi;Shim, Jae Goo;Yoon, Do-Kun;Suh, Tae Suk
    • Progress in Medical Physics / v.28 no.3 / pp.77-82 / 2017
  • We propose an objective ROC analysis method, based on threshold setting, for evaluating single photon emission computed tomography (SPECT) images. The method applies a quantitative, computationally determined threshold value to each signal in the SPECT image. The SPECT images for this study were acquired using the Monte Carlo n-particle extended simulation code (MCNPX, Ver. 2.6.0, Los Alamos National Laboratory, USA). Basic SPECT detectors and a specific water phantom were modeled in the simulation, and the simulation results were obtained by running it. We analyzed the reconstructed images using the threshold-based objective ROC method, which yields accuracy information for the reconstructed region in the image. The technique can be helpful when weak signals in nuclear medicine (NM) images have to be evaluated. In this study, the proposed threshold-based computational ROC analysis method can provide better objectivity than the conventional ROC analysis method.
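
A threshold-based ROC analysis of an image can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function names `threshold_roc` and `roc_auc`, the toy 4x4 image, and the ground-truth mask are all hypothetical, and the abstract does not say how the threshold values were chosen.

```python
import numpy as np

def threshold_roc(image, truth_mask, thresholds):
    """Return (fpr, tpr) arrays with one ROC point per threshold.

    A pixel is called "signal" when its value is >= the threshold;
    truth_mask marks the pixels that really belong to the source region.
    """
    image = np.asarray(image, dtype=float)
    truth = np.asarray(truth_mask, dtype=bool)
    n_pos = truth.sum()
    n_neg = truth.size - n_pos
    fpr, tpr = [], []
    for t in thresholds:
        detected = image >= t
        tpr.append(np.sum(detected & truth) / n_pos)   # true positive rate
        fpr.append(np.sum(detected & ~truth) / n_neg)  # false positive rate
    return np.array(fpr), np.array(tpr)

def roc_auc(fpr, tpr):
    """Trapezoidal area under the ROC points, with (0,0) and (1,1) added."""
    x = np.concatenate(([0.0], fpr, [1.0]))
    y = np.concatenate(([0.0], tpr, [1.0]))
    order = np.lexsort((y, x))  # sort by fpr, then by tpr
    x, y = x[order], y[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical toy image: three bright source pixels on a noisy background.
image = np.array([[0, 1, 0, 1],
                  [1, 8, 9, 0],
                  [0, 7, 2, 1],
                  [1, 0, 1, 0]])
truth = np.zeros((4, 4), dtype=bool)
truth[1, 1] = truth[1, 2] = truth[2, 1] = True

fpr, tpr = threshold_roc(image, truth, thresholds=[1, 3, 5, 10])
print(roc_auc(fpr, tpr))
```

Because every ROC point comes from a fixed numerical threshold and not from an observer's rating, the same image always yields the same curve, which is the sense in which such an analysis is objective.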

L1-penalized AUC-optimization with a surrogate loss

  • Hyungwoo Kim;Seung Jun Shin
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.203-212 / 2024
  • The area under the ROC curve (AUC) is one of the most common criteria for measuring the overall performance of binary classifiers across a wide range of machine learning problems. In this article, we propose an L1-penalized AUC-optimization classifier that directly maximizes the AUC for high-dimensional data. To this end, we employ an AUC-consistent surrogate loss function and combine it with the L1-norm penalty, which enables us to estimate coefficients and select informative variables simultaneously. In addition, we develop an efficient optimization algorithm that adopts k-means clustering and proximal gradient descent and offers computational advantages in obtaining solutions for the proposed method. Numerical simulation studies demonstrate that the proposed method shows promising performance in terms of prediction accuracy, variable selectivity, and computational cost.
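
The general idea of L1-penalized AUC optimization by proximal gradient descent can be sketched as below. Several assumptions are made that are not in the abstract: the surrogate is taken to be the pairwise logistic loss (the abstract does not name its surrogate), the k-means speed-up is omitted so all positive-negative pairs are used, and the names `fit_l1_auc`, `soft_threshold`, and `empirical_auc` along with the values of `lam` and `step` are hypothetical.

```python
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1 (elementwise shrinkage)."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def fit_l1_auc(X, y, lam=0.05, step=0.1, n_iter=300):
    """Minimize mean pairwise logistic loss + lam * ||w||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    # All pairwise differences x_pos - x_neg, shape (n_pos * n_neg, p).
    D = (pos[:, None, :] - neg[None, :, :]).reshape(-1, X.shape[1])
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margins = D @ w
        # d/dm log(1 + exp(-m)) = -sigmoid(-m), computed stably.
        s = np.exp(-np.logaddexp(0.0, margins))
        grad = -(D * s[:, None]).mean(axis=0)
        # Gradient step on the smooth loss, then L1 proximal step.
        w = soft_threshold(w - step * grad, step * lam)
    return w

def empirical_auc(scores, y):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Hypothetical synthetic data: only the first of five features is informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
w = fit_l1_auc(X, y)
print(w, empirical_auc(X @ w, y))
```

The soft-thresholding step is what drives small coefficients to exactly zero, so coefficient estimation and variable selection happen in the same update. Forming every positive-negative pair scales with the product of the class sizes, which is presumably the cost the clustering step in the paper is meant to reduce.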

Functional Prediction of Hypothetical Proteins from Shigella flexneri and Validation of the Predicted Models by Using ROC Curve Analysis

  • Gazi, Md. Amran;Mahmud, Sultan;Fahim, Shah Mohammad;Kibria, Mohammad Golam;Palit, Parag;Islam, Md. Rezaul;Rashid, Humaira;Das, Subhasish;Mahfuz, Mustafa;Ahmeed, Tahmeed
    • Genomics & Informatics / v.16 no.4 / pp.26.1-26.12 / 2018
  • Shigella spp. constitute some of the key pathogens responsible for the global burden of diarrhoeal disease. With over 164 million reported cases per annum, shigellosis accounts for 1.1 million deaths each year. The majority of these cases occur among children in developing nations, and the emergence of multidrug-resistant Shigella strains in clinical isolates demands the development of better or new drugs against this pathogen. The genome of Shigella flexneri was extensively analyzed and found to encode 4,362 proteins, among which the functions of 674 proteins, termed hypothetical proteins (HPs), had not previously been elucidated. The amino acid sequences of all 674 HPs were studied, and functions were assigned to a total of 39 HPs with a high level of confidence. Here we utilized a combination of the latest versions of databases to assign precise functions to HPs for which no experimental information is available. These HPs were found to belong to various classes of proteins, such as enzymes, binding proteins, signal transducers, lipoproteins, transporters, virulence proteins, and other proteins. The performance of the various computational tools was evaluated using receiver operating characteristic curve analysis, and a high average accuracy of 93.6% was obtained. Our comprehensive analysis will help to gain a greater understanding for the development of novel potential therapeutic interventions to defeat Shigella infection.

Genetic Function Approximation and Bayesian Models for the Discovery of Future HDAC8 Inhibitors

  • Thangapandian, Sundarapandian;John, Shalini;Lee, Keun-Woo
    • Interdisciplinary Bio Central / v.3 no.4 / pp.15.1-15.11 / 2011
  • Background: Histone deacetylase (HDAC) 8, a member of the HDAC family, catalyzes the removal of acetyl groups from N-terminal lysine residues of histone proteins, thereby restricting transcription factors from being expressed. Inhibition of HDAC8 has become an emerging and effective anti-cancer therapy for various cancers. Applying computational methodologies may help identify the key components that can be used in developing future potent HDAC8 inhibitors. Results: To facilitate the discovery of novel and potential chemical scaffolds as starting points for future HDAC8 inhibitor design, quantitative structure-activity relationship models were generated with 30 training set compounds using genetic function approximation (GFA) and Bayesian algorithms. Six GFA models were selected based on the significant statistical parameters calculated during model development. A Bayesian model using fingerprints was developed with a receiver operating characteristic curve cross-validation value of 0.902. An external test set of 54 diverse compounds was used to validate the models. Conclusions: Two of the six models were selected as the final GFA models based on their predictive ability over the test set compounds. The Bayesian model displayed a high classifying ability on the same test set compounds and also revealed the positively and negatively contributing molecular fingerprints. The effectively contributing physicochemical properties and molecular fingerprints from a set of known HDAC8 inhibitors were identified and can be used in designing future HDAC8 inhibitors.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintenance and failure prevention through anomaly detection in ICT infrastructure are becoming increasingly important. System monitoring data are multidimensional time series data, and it is difficult to account for the characteristics of both multidimensional data and time series data at once. When dealing with multidimensional data, correlations between variables should be considered. Existing approaches, such as probability-based, linear, and distance-based methods, degrade because of the curse of dimensionality. In addition, time series data are preprocessed by applying the sliding window technique and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field; statistical methods and regression analysis were used in its early days, and there are now active studies applying machine learning and artificial neural networks to it. Statistically based methods are difficult to apply when the data are non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and then detect anomalies by comparing predicted values with actual values. Their performance degrades when the model is not robust or when the data contain noise or outliers, so they carry the restriction that training data free of noise and outliers should be used. An autoencoder built with artificial neural networks is trained to produce output as similar as possible to its input. It has many advantages over existing probability and linear models, cluster analysis, and supervised learning. It can be applied to data that do not satisfy a probability distribution or linearity assumption. 
In addition, it can be trained in an unsupervised manner, without labeled data. However, anomaly detection on multidimensional data is limited in its ability to identify local outliers, and the characteristics of time series data cause the dimension of the data to grow considerably. In this study, we propose a Conditional Multimodal Autoencoder (CMAE) that enhances anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitations of local outlier identification in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the autoencoder's bottleneck, through which the correlations between them are learned. In addition, a Conditional Autoencoder (CAE) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparing it with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoders for 41 variables was examined in the proposed model and the comparison models. Reconstruction performance differs by variable; reconstruction works well for the Memory, Disk, and Network modalities in all three autoencoder models, as their loss values are small. The Process modality did not show a significant difference among the three models, and the CPU modality showed excellent performance in CMAE. ROC curves were prepared to evaluate anomaly detection performance in the proposed model and the comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order of CMAE, MAE, and AE. 
In particular, the recall was 0.9828 for CMAE, which confirms that it detects nearly all of the anomalies. The accuracy of the model also improved to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. From a practical standpoint, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause can slow down inference. The proposed model is easy to apply to practical tasks in terms of inference speed and model management.
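
The indicators compared above (accuracy, precision, recall, F1-score) are typically obtained by thresholding each sample's reconstruction error. The following is a minimal sketch under assumptions that are not taken from the paper: the function name `detection_metrics`, the toy error values, and the threshold of 0.5 are all hypothetical, and how the authors chose their operating threshold is not stated in the abstract.

```python
import numpy as np

def detection_metrics(recon_error, is_anomaly, threshold):
    """Flag samples whose reconstruction error exceeds the threshold,
    then score the flags against ground-truth anomaly labels."""
    pred = np.asarray(recon_error, dtype=float) > threshold
    truth = np.asarray(is_anomaly, dtype=bool)
    tp = int(np.sum(pred & truth))    # anomalies correctly flagged
    fp = int(np.sum(pred & ~truth))   # normal samples wrongly flagged
    fn = int(np.sum(~pred & truth))   # anomalies missed
    tn = int(np.sum(~pred & ~truth))  # normal samples correctly passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / truth.size
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical reconstruction errors and labels (1 = anomaly).
errors = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7]
labels = [0, 0, 1, 1, 1, 0]
print(detection_metrics(errors, labels, threshold=0.5))
```

Sweeping the threshold over the range of observed errors and recording the true and false positive rates at each value gives the ROC curve from which AUC is computed.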