• Title/Summary/Keyword: Five-fold cross validation

Search Result 27, Processing Time 0.026 seconds

Survival Prediction of Rats with Hemorrhagic Shocks Using Support Vector Machine (지원벡터기계를 이용한 출혈을 일으킨 흰쥐에서의 생존 예측)

  • Jang, K.H.;Choi, J.L.;Yoo, T.K.;Kwon, M.K.;Kim, D.W.
    • Journal of Biomedical Engineering Research
    • /
    • v.33 no.1
    • /
    • pp.1-7
    • /
    • 2012
  • Hemorrhagic shock is a common cause of death in emergency rooms. Early diagnosis of hemorrhagic shock makes it possible for physicians to treat patients successfully. Therefore, the purpose of this study was to select an optimal survival prediction model using physiological parameters for the two analyzed periods: two and five minutes before and after the bleeding end. We obtained heart rates, mean arterial pressures, respiration rates and temperatures from 45 rats. These physiological parameters were used for the training and testing data sets of survival prediction models using an artificial neural network (ANN) and support vector machine (SVM). We applied a 5-fold cross validation method to avoid over-fitting and to select the optimal survival prediction model. In conclusion, SVM model showed slightly better accuracy than ANN model for survival prediction during the entire analysis period.

Application of Text-Classification Based Machine Learning in Predicting Psychiatric Diagnosis (텍스트 분류 기반 기계학습의 정신과 진단 예측 적용)

  • Pak, Doohyun;Hwang, Mingyu;Lee, Minji;Woo, Sung-Il;Hahn, Sang-Woo;Lee, Yeon Jung;Hwang, Jaeuk
    • Korean Journal of Biological Psychiatry
    • /
    • v.27 no.1
    • /
    • pp.18-26
    • /
    • 2020
  • Objectives The aim was to find effective vectorization and classification models to predict a psychiatric diagnosis from text-based medical records. Methods Electronic medical records (n = 494) of present illness were collected retrospectively in inpatient admission notes with three diagnoses of major depressive disorder, type 1 bipolar disorder, and schizophrenia. Data were split into 400 training data and 94 independent validation data. Data were vectorized by two different models such as term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning models for classification including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL) were applied to predict three psychiatric diagnoses. Five-fold cross-validation was used to find an effective model. Metrics such as accuracy, precision, recall, and F1-score were measured for comparison between the models. Results Five-fold cross-validation in training data showed DL model with Doc2vec was the most effective model to predict the diagnosis (accuracy = 0.87, F1-score = 0.87). However, these metrics have been reduced in independent test data set with final working DL models (accuracy = 0.79, F1-score = 0.79), while the model of logistic regression and support vector machine with Doc2vec showed slightly better performance (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and others with TF-IDF. Conclusions The current results suggest that the vectorization may have more impact on the performance of classification than the machine learning model. However, data set had a number of limitations including small sample size, imbalance among the category, and its generalizability. With this regard, the need for research with multi-sites and large samples is suggested to improve the machine learning models.

A novel method for predicting protein subcellular localization based on pseudo amino acid composition

  • Ma, Junwei;Gu, Hong
    • BMB Reports
    • /
    • v.43 no.10
    • /
    • pp.670-676
    • /
    • 2010
  • In this paper, a novel approach, ELM-PCA, is introduced for the first time to predict protein subcellular localization. Firstly, Protein Samples are represented by the pseudo amino acid composition (PseAAC). Secondly, the principal component analysis (PCA) is employed to extract essential features. Finally, the Elman Recurrent Neural Network (RNN) is used as a classifier to identify the protein sequences. The results demonstrate that the proposed approach is effective and practical.

Convolutional Neural Networks for Character-level Classification

  • Ko, Dae-Gun;Song, Su-Han;Kang, Ki-Min;Han, Seong-Wook
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.6 no.1
    • /
    • pp.53-59
    • /
    • 2017
  • Optical character recognition (OCR) automatically recognizes text in an image. OCR is still a challenging problem in computer vision. A successful solution to OCR has important device applications, such as text-to-speech conversion and automatic document classification. In this work, we analyze character recognition performance using the current state-of-the-art deep-learning structures. One is the AlexNet structure, another is the LeNet structure, and the other one is the SPNet structure. For this, we have built our own dataset that contains digits and upper- and lower-case characters. We experiment in the presence of salt-and-pepper noise or Gaussian noise, and report the performance comparison in terms of recognition error. Experimental results indicate by five-fold cross-validation that the SPNet structure (our approach) outperforms AlexNet and LeNet in recognition error.

Evaluation of Hemiplegic Gait Using Accelerometer (가속도센서를 이용한 편마비성보행 평가)

  • Lee, Jun Seok;Park, Sooji;Shin, Hangsik
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.66 no.11
    • /
    • pp.1634-1640
    • /
    • 2017
  • The study aims to distinguish hemiplegic gait and normal gait using simple wearable device and classification algorithm. Thus, we developed a wearable system equipped three axis accelerometer and three axis gyroscope. The developed wearable system was verified by clinical experiment. In experiment, twenty one normal subjects and twenty one patients undergoing stroke treatment were participated. Based on the measured inertial signal, a random forest algorithm was used to classify hemiplegic gait. Four-fold cross validation was applied to ensure the reliability of the results. To select optimal attributes, we applied the forward search algorithm with 10 times of repetition, then selected five most frequently attributes were chosen as a final attribute. The results of this study showed that 95.2% of accuracy in hemiplegic gait and normal gait classification and 77.4% of accuracy in hemiplegic-side and normal gait classification.

Deep-learning based In-situ Monitoring and Prediction System for the Organic Light Emitting Diode

  • Park, Il-Hoo;Cho, Hyeran;Kim, Gyu-Tae
    • Journal of the Semiconductor & Display Technology
    • /
    • v.19 no.4
    • /
    • pp.126-129
    • /
    • 2020
  • We introduce a lifetime assessment technique using deep learning algorithm with complex electrical parameters such as resistivity, permittivity, impedance parameters as integrated indicators for predicting the degradation of the organic molecules. The evaluation system consists of fully automated in-situ measurement system and multiple layer perceptron learning system with five hidden layers and 1011 perceptra in each layer. Prediction accuracies are calculated and compared depending on the physical feature, learning hyperparameters. 62.5% of full time-series data are used for training and its prediction accuracy is estimated as r-square value of 0.99. Remaining 37.5% of the data are used for testing with prediction accuracy of 0.95. With k-fold cross-validation, the stability to the instantaneous changes in the measured data is also improved.

Assessment of wall convergence for tunnels using machine learning techniques

  • Mahmoodzadeh, Arsalan;Nejati, Hamid Reza;Mohammadi, Mokhtar;Ibrahim, Hawkar Hashim;Mohammed, Adil Hussein;Rashidi, Shima
    • Geomechanics and Engineering
    • /
    • v.31 no.3
    • /
    • pp.265-279
    • /
    • 2022
  • Tunnel convergence prediction is essential for the safe construction and design of tunnels. This study proposes five machine learning models of deep neural network (DNN), K-nearest neighbors (KNN), Gaussian process regression (GPR), support vector regression (SVR), and decision trees (DT) to predict the convergence phenomenon during or shortly after the excavation of tunnels. In this respect, a database including 650 datasets (440 for training, 110 for validation, and 100 for test) was gathered from the previously constructed tunnels. In the database, 12 effective parameters on the tunnel convergence and a target of tunnel wall convergence were considered. Both 5-fold and hold-out cross validation methods were used to analyze the predicted outcomes in the ML models. Finally, the DNN method was proposed as the most robust model. Also, to assess each parameter's contribution to the prediction problem, the backward selection method was used. The results showed that the highest and lowest impact parameters for tunnel convergence are tunnel depth and tunnel width, respectively.

COVID-19 Diagnosis from CXR images through pre-trained Deep Visual Embeddings

  • Khalid, Shahzaib;Syed, Muhammad Shehram Shah;Saba, Erum;Pirzada, Nasrullah
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.5
    • /
    • pp.175-181
    • /
    • 2022
  • COVID-19 is an acute respiratory syndrome that affects the host's breathing and respiratory system. The novel disease's first case was reported in 2019 and has created a state of emergency in the whole world and declared a global pandemic within months after the first case. The disease created elements of socioeconomic crisis globally. The emergency has made it imperative for professionals to take the necessary measures to make early diagnoses of the disease. The conventional diagnosis for COVID-19 is through Polymerase Chain Reaction (PCR) testing. However, in a lot of rural societies, these tests are not available or take a lot of time to provide results. Hence, we propose a COVID-19 classification system by means of machine learning and transfer learning models. The proposed approach identifies individuals with COVID-19 and distinguishes them from those who are healthy with the help of Deep Visual Embeddings (DVE). Five state-of-the-art models: VGG-19, ResNet50, Inceptionv3, MobileNetv3, and EfficientNetB7, were used in this study along with five different pooling schemes to perform deep feature extraction. In addition, the features are normalized using standard scaling, and 4-fold cross-validation is used to validate the performance over multiple versions of the validation data. The best results of 88.86% UAR, 88.27% Specificity, 89.44% Sensitivity, 88.62% Accuracy, 89.06% Precision, and 87.52% F1-score were obtained using ResNet-50 with Average Pooling and Logistic regression with class weight as the classifier.

Automatic Detection of Type II Solar Radio Burst by Using 1-D Convolution Neutral Network

  • Kyung-Suk Cho;Junyoung Kim;Rok-Soon Kim;Eunsu Park;Yuki Kubo;Kazumasa Iwai
    • Journal of The Korean Astronomical Society
    • /
    • v.56 no.2
    • /
    • pp.213-224
    • /
    • 2023
  • Type II solar radio bursts show frequency drifts from high to low over time. They have been known as a signature of coronal shock associated with Coronal Mass Ejections (CMEs) and/or flares, which cause an abrupt change in the space environment near the Earth (space weather). Therefore, early detection of type II bursts is important for forecasting of space weather. In this study, we develop a deep-learning (DL) model for the automatic detection of type II bursts. For this purpose, we adopted a 1-D Convolution Neutral Network (CNN) as it is well-suited for processing spatiotemporal information within the applied data set. We utilized a total of 286 radio burst spectrum images obtained by Hiraiso Radio Spectrograph (HiRAS) from 1991 and 2012, along with 231 spectrum images without the bursts from 2009 to 2015, to recognizes type II bursts. The burst types were labeled manually according to their spectra features in an answer table. Subsequently, we applied the 1-D CNN technique to the spectrum images using two filter windows with different size along time axis. To develop the DL model, we randomly selected 412 spectrum images (80%) for training and validation. The train history shows that both train and validation losses drop rapidly, while train and validation accuracies increased within approximately 100 epoches. For evaluation of the model's performance, we used 105 test images (20%) and employed a contingence table. It is found that false alarm ratio (FAR) and critical success index (CSI) were 0.14 and 0.83, respectively. Furthermore, we confirmed above result by adopting five-fold cross-validation method, in which we re-sampled five groups randomly. The estimated mean FAR and CSI of the five groups were 0.05 and 0.87, respectively. For experimental purposes, we applied our proposed model to 85 HiRAS type II radio bursts listed in the NGDC catalogue from 2009 to 2016 and 184 quiet (no bursts) spectrum images before and after the type II bursts. As a result, our model successfully detected 79 events (93%) of type II events. This results demonstrates, for the first time, that the 1-D CNN algorithm is useful for detecting type II bursts.

An Experimental Study on Feature Ranking Schemes for Text Classification (텍스트 분류를 위한 자질 순위화 기법에 관한 연구)

  • Pan Jun Kim
    • Journal of the Korean Society for information Management
    • /
    • v.40 no.1
    • /
    • pp.1-21
    • /
    • 2023
  • This study specifically reviewed the performance of the ranking schemes as an efficient feature selection method for text classification. Until now, feature ranking schemes are mostly based on document frequency, and relatively few cases have used the term frequency. Therefore, the performance of single ranking metrics using term frequency and document frequency individually was examined as a feature selection method for text classification, and then the performance of combination ranking schemes using both was reviewed. Specifically, a classification experiment was conducted in an environment using two data sets (Reuters-21578, 20NG) and five classifiers (SVM, NB, ROC, TRA, RNN), and to secure the reliability of the results, 5-Fold cross-validation and t-test were applied. As a result, as a single ranking scheme, the document frequency-based single ranking metric (chi) showed good performance overall. In addition, it was found that there was no significant difference between the highest-performance single ranking and the combination ranking schemes. Therefore, in an environment where sufficient learning documents can be secured in text classification, it is more efficient to use a single ranking metric (chi) based on document frequency as a feature selection method.