• Title/Summary/Keyword: Cross-Validation Approach


Integrating Discrete Wavelet Transform and Neural Networks for Prostate Cancer Detection Using Proteomic Data

  • Hwang, Grace J.; Huang, Chuan-Ching; Chen, Ta Jen; Yue, Jack C.; Ivan Chang, Yuan-Chin; Adam, Bao-Ling
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.319-324 / 2005
  • An integrated approach for prostate cancer detection using proteomic data is presented. Because proteomic data are high-dimensional, the discrete wavelet transform (DWT) is used in the first stage for both data reduction and noise removal. After the DWT, the dimensionality is reduced from 43,556 to 1,599, so each sample of proteomic data can be represented by 1,599 wavelet coefficients. In the second stage, a voting method is used to select a common set of wavelet coefficients for all samples, producing a 987-dimensional subspace of wavelet coefficients. In the third stage, the Autoassociator algorithm reduces the dimensionality from 987 to 400. Finally, an artificial neural network (ANN) is applied to the 400-dimensional space for prostate cancer detection. The integrated approach is evaluated on 9 categories of 2-class experiments, as well as on 3- and 4-class experiments. All experiments used ten repetitions of ten-fold cross-validation (i.e., 10 partitions with 100 runs in total). For the 9 categories of 2-class experiments, the average testing accuracies are between 81% and 96%, and the average testing accuracies of the 3- and 4-way classifications are 85% and 84%, respectively. The integrated approach achieves promising results for the early detection and diagnosis of prostate cancer.
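The evaluation protocol above (10 random partitions, each split into 10 folds, for 100 train/test runs) can be sketched as follows. This is a minimal illustration of repeated k-fold cross-validation, not the authors' code; the function name `repeated_kfold`, the seeding, and the strided fold assignment are assumptions made for the example.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for `repeats` independent random k-fold partitions."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                           # a fresh random partition for each repetition
        folds = [idx[f::k] for f in range(k)]      # fold sizes differ by at most one
        for f, test in enumerate(folds):
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield r, f, sorted(train), sorted(test)
```

With the defaults, `list(repeated_kfold(n))` produces 100 splits; a reported "average testing accuracy" would then be the mean over those 100 runs. Within one repetition every sample is tested exactly once, while across repetitions the partitions differ.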


COVID-19 Diagnosis from CXR images through pre-trained Deep Visual Embeddings

  • Khalid, Shahzaib; Syed, Muhammad Shehram Shah; Saba, Erum; Pirzada, Nasrullah
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.175-181 / 2022
  • COVID-19 is an acute respiratory syndrome that affects the host's breathing and respiratory system. The first case of the novel disease was reported in 2019; within months it had created a worldwide state of emergency and been declared a global pandemic, and it has contributed to a global socioeconomic crisis. The emergency has made it imperative for professionals to take the necessary measures for early diagnosis of the disease. The conventional diagnostic test for COVID-19 is Polymerase Chain Reaction (PCR) testing. However, in many rural communities these tests are unavailable or take a long time to return results. Hence, we propose a COVID-19 classification system based on machine learning and transfer learning models. The proposed approach uses Deep Visual Embeddings (DVE) to identify individuals with COVID-19 and distinguish them from healthy individuals. Five state-of-the-art models (VGG-19, ResNet50, Inceptionv3, MobileNetv3, and EfficientNetB7) were used in this study, together with five different pooling schemes, to perform deep feature extraction. In addition, the features are normalized using standard scaling, and 4-fold cross-validation is used to validate performance over multiple splits of the validation data. The best results (88.86% UAR, 88.27% specificity, 89.44% sensitivity, 88.62% accuracy, 89.06% precision, and 87.52% F1-score) were obtained using ResNet-50 with average pooling and class-weighted logistic regression as the classifier.
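UAR (unweighted average recall) is the mean of the per-class recalls, so for a two-class problem it is the mean of sensitivity and specificity. As a consistency check on the reported figures, (89.44 + 88.27) / 2 = 88.855, which rounds to the reported 88.86%. The sketch below computes UAR from label lists; the function names are illustrative and not taken from the paper.

```python
def per_class_recall(y_true, y_pred):
    """Recall of every class that occurs in y_true."""
    recalls = {}
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls[c] = hits / len(idx)
    return recalls

def uar(y_true, y_pred):
    """Unweighted average recall: each class counts equally, regardless of its size."""
    r = per_class_recall(y_true, y_pred)
    return sum(r.values()) / len(r)
```

Unlike plain accuracy, UAR is not inflated by a majority class, which is why it pairs naturally with a class-weighted classifier on imbalanced data.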

Comparative Evaluation among Different Kriging Techniques applied to GOSAT CO2 Map for North East Asia (GOSAT 기반의 동북아시아 CO2 분포도에 적용된 크리깅 기법의 비교평가)

  • Choi, Jin Ho; Um, Jung-Sup
    • Journal of Environmental Impact Assessment / v.20 no.6 / pp.879-890 / 2011
  • GOSAT (Greenhouse gases Observing SATellite) data provide new opportunities for the most regionally complete and up-to-date assessment of $CO_2$. In practice, however, GOSAT records often suffer from missing values, mainly due to unfavorable meteorological conditions during specific periods of data acquisition. The aim of this research was to identify optimal spatial interpolation techniques to ensure the continuity of $CO_2$ estimates from samples taken in North East Asia. The accuracy of ordinary kriging (OK), universal kriging (UK), and simple kriging (SK) was compared by jointly considering the $R^2$ values, Root Mean Square Error (RMSE), and Mean Error (ME) for the variogram models. Cross-validation on 1312 randomly sampled points indicates that universal kriging (UK) is the best geostatistical method for spatial prediction of $CO_2$ in the East Asia region. The results of this study can be useful for selecting the optimal kriging algorithm to produce $CO_2$ maps of various landscapes. Data users may also benefit from a statistical approach that helps them better understand the uncertainty and limitations of the GOSAT sample data.
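A minimal sketch of how kriging is typically cross-validated: each sample is left out in turn, re-estimated from the remaining samples, and the errors are summarized as ME and RMSE. It implements ordinary kriging only and assumes an exponential variogram with fixed, hand-chosen parameters (`sill`, `vrange`, `nugget`), whereas the study fitted variogram models and also compared simple kriging (known constant mean) and universal kriging (trend in the mean), neither of which is shown here.

```python
import numpy as np

def exp_variogram(h, sill=1.0, vrange=1.0, nugget=0.0):
    """Exponential variogram model; gamma(0) is defined as 0."""
    g = nugget + sill * (1.0 - np.exp(-h / vrange))
    return np.where(h > 0, g, 0.0)

def ordinary_kriging(xy, z, target, **vparams):
    """Ordinary kriging estimate at `target` from samples z observed at xy (shape (n, 2))."""
    n = len(z)
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(dists, **vparams)
    A[n, n] = 0.0                              # border row/column enforce sum(weights) == 1
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(xy - target, axis=1), **vparams)
    sol = np.linalg.solve(A, b)                # kriging weights followed by the Lagrange multiplier
    return float(sol[:n] @ z)

def loo_cross_validation(xy, z, **vparams):
    """Leave-one-out errors summarized as Mean Error (ME) and RMSE."""
    errors = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        errors.append(ordinary_kriging(xy[keep], z[keep], xy[i], **vparams) - z[i])
    e = np.asarray(errors)
    return float(e.mean()), float(np.sqrt(np.mean(e**2)))
```

An ME near zero suggests an unbiased estimator, while RMSE measures overall accuracy; comparing these for several kriging variants and variogram models is the kind of selection described in the abstract.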

Non-destructive assessment of the three-point-bending strength of mortar beams using radial basis function neural networks

  • Alexandridis, Alex; Stavrakas, Ilias; Stergiopoulos, Charalampos; Hloupis, George; Ninos, Konstantinos; Triantis, Dimos
    • Computers and Concrete / v.16 no.6 / pp.919-932 / 2015
  • This paper presents a new method for assessing the three-point-bending (3PB) strength of mortar beams non-destructively, based on neural network (NN) models. The models use the radial basis function (RBF) architecture, and the fuzzy means algorithm is employed for training to boost prediction accuracy. Training data were collected from a series of experiments in which cement mortar beams were subjected to various bending loads and the resulting pressure stimulated currents (PSCs) were recorded. The input variables to the NN models were then calculated by describing the PSC relaxation process through a generalization of Boltzmann-Gibbs statistical physics known as non-extensive statistical physics (NESP). The NN predictions were evaluated using k-fold cross-validation and new data kept independent from training; the results indicate that the proposed method can successfully form the basis of a non-destructive tool for assessing bending strength. A comparison with a different NN architecture confirms the superiority of the proposed approach.
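The sketch below shows a Gaussian RBF network with linear output weights, scored by k-fold cross-validation RMSE. It is an illustration under stated assumptions: the fuzzy means algorithm used in the paper is swapped for a plain random subset of training inputs as centers, and the width, center count, and small ridge term are hypothetical choices.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian activations for every (sample, center) pair, plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-d2 / (2.0 * width**2))
    return np.hstack([phi, np.ones((X.shape[0], 1))])

def fit_rbf(X, y, n_centers, width, rng, ridge=1e-6):
    # Stand-in for fuzzy means: centers are a random subset of the training inputs
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    Phi = rbf_design(X, centers, width)
    # linear output weights by lightly regularized least squares
    w = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ y)
    return centers, w

def predict_rbf(X, centers, w, width):
    return rbf_design(X, centers, width) @ w

def kfold_rmse(X, y, k=5, n_centers=15, width=1.0, seed=0):
    """Pooled RMSE over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    sq_err = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(X)), test)
        centers, w = fit_rbf(X[train], y[train], n_centers, width, rng)
        sq_err.extend((predict_rbf(X[test], centers, w, width) - y[test]) ** 2)
    return float(np.sqrt(np.mean(sq_err)))
```

Because only the output layer is linear in the unknowns, training reduces to one least-squares solve once the centers are fixed; the choice of centers (here random, in the paper fuzzy means) is what mainly drives accuracy.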

Numerical study on the characteristics of the flow through injector orifice by multi-block computations (다중블럭계산에 의한 분사기 오리피스 유동특성 해석)

  • Kim, Yeong-Mok
    • Transactions of the Korean Society of Mechanical Engineers B / v.21 no.3 / pp.414-426 / 1997
  • Numerical computations were conducted to characterize the three-dimensional laminar flow through an injector orifice inclined at an angle of 30°. For this study, the incompressible Navier-Stokes equations in generalized curvilinear coordinates were solved, using a pseudocompressibility approach for the continuity equation. The computations used the implicit, approximately factored finite difference scheme of Beam and Warming on multi-block grids with complete continuity at the block interfaces. The multi-block computations were validated for the steady state by directly comparing multi-block solutions with equivalent single-block ones, including a 2-D 180° TAD and a 3-D 90° pipe bend. For further validation, the numerical solutions are compared with flow field measurements for a tube with a sudden contraction. The computational results show the complex nature of the flow field within the inclined injector orifice, including strong pressure-driven secondary flows in the cross-stream plane induced by streamline curvature. In addition, asymmetric secondary flows were induced at Reynolds numbers above the range for which the flow was assumed to be laminar. However, turbulence calculations and grid dependency studies are needed for more accurate computations.
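The core idea of an approximately factored implicit scheme can be shown on a much simpler model problem. The sketch below applies the delta-form factorization (I - Δt δxx)(I - Δt δyy) Δu = Δt (δxx + δyy) uⁿ to the 2-D heat equation u_t = u_xx + u_yy on a square grid with zero boundary values. It is an assumption-laden illustration of the factorization technique only, not the pseudocompressible Navier-Stokes solver of the paper, and it uses dense solves where a production code would use (block-)tridiagonal ones.

```python
import numpy as np

def second_difference(n, h):
    """Central second-difference matrix on n interior nodes with zero Dirichlet boundaries."""
    off = np.ones(n - 1)
    return (np.diag(-2.0 * np.ones(n)) + np.diag(off, 1) + np.diag(off, -1)) / h**2

def af_step(u, D, dt):
    """One delta-form, approximately factored implicit Euler step for u_t = u_xx + u_yy.

    u[i, j] holds the value at (x_i, y_j); on a square grid the same matrix D
    serves as delta_xx (acting on axis 0) and delta_yy (acting on axis 1).
    """
    A = np.eye(u.shape[0]) - dt * D          # one 1-D implicit factor, I - dt*delta
    rhs = dt * (D @ u + u @ D.T)             # explicit right-hand side dt*(delta_xx + delta_yy) u^n
    w = np.linalg.solve(A, rhs)              # x-sweep: (I - dt*delta_xx) w = rhs
    du = np.linalg.solve(A, w.T).T           # y-sweep: (I - dt*delta_yy) du = w
    return u + du

n = 31
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
D = second_difference(n, h)
u0 = np.outer(np.sin(np.pi * x), np.sin(np.pi * x))
u = u0.copy()
for _ in range(10):
    u = af_step(u, D, dt=0.01)
```

Factoring replaces one large 2-D implicit solve with a sequence of 1-D solves along grid lines; the price is an extra O(Δt²) splitting error, while the delta form keeps the steady-state solution unaffected by the factorization.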

Quantile regression using asymmetric Laplace distribution (비대칭 라플라스 분포를 이용한 분위수 회귀)

  • Park, Hye-Jung
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1093-1101 / 2009
  • Quantile regression has become an increasingly widely used technique for describing the distribution of a response variable given a set of explanatory variables. This paper proposes a novel model for quantile regression using a doubly penalized kernel machine with support vector machine iteratively reweighted least squares (SVM-IRWLS). For inference about the shape of a population distribution, the widely popular mean regression would be inadequate if the distribution is not approximately Gaussian. We present a likelihood-based approach to estimating regression quantiles that uses the asymmetric Laplace density.
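The likelihood-based view rests on a standard identity: the asymmetric Laplace density is f(y) = τ(1 - τ)/σ · exp(-ρ_τ((y - μ)/σ)) with check loss ρ_τ(u) = u(τ - 1{u < 0}), so maximizing its likelihood in μ is the same as minimizing the summed check loss, whose minimizer is the τ-th quantile. The sketch below verifies this for a constant location only; the paper's doubly penalized kernel machine and SVM-IRWLS estimator are not reproduced.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def ald_pdf(y, mu, sigma, tau):
    """Asymmetric Laplace density with location mu, scale sigma, and skewness tau."""
    return tau * (1 - tau) / sigma * np.exp(-check_loss((y - mu) / sigma, tau))

def ald_nll(mu, y, sigma, tau):
    """Negative log-likelihood: n*log(sigma/(tau*(1-tau))) plus summed check losses."""
    return -np.sum(np.log(ald_pdf(y, mu, sigma, tau)))

y = np.arange(1, 102, dtype=float)      # 101 observations: 1, 2, ..., 101
tau = 0.25
# for a constant location the minimizer is attained at a data point, so search those
best = y[np.argmin([ald_nll(m, y, 1.0, tau) for m in y])]
```

Here `best` is 26.0, the sample 0.25-quantile: 25 observations lie below it and 75 above it. Replacing the constant μ with a regression function gives likelihood-based quantile regression.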


Weight Estimation of the Sea Cucumber (Stichopus japonicas) using Vision-based Volume Measurement

  • Lee, Donggil; Kim, Seonghoon; Park, Miseon; Yang, Yongsu
    • Journal of Electrical Engineering and Technology / v.9 no.6 / pp.2154-2161 / 2014
  • Growth analysis and selection of sea cucumbers (Stichopus japonicas) are typically performed through length or weight measurements. However, because sea cucumbers continuously change shape depending on the external environment, weight measurement has been the preferred approach. Weight measurements require extensive time and labor; moreover, it is often difficult to weigh sea cucumbers accurately because of their wet surface. The present study measured sea cucumber features, including body length, width, and thickness, using a vision system, and used regression analysis to generate $R^2$ values from which a weight estimation algorithm was developed. The $R^2$ value between the actual volume and weight of the sea cucumbers was 0.999, which is relatively high. Cross-validation of this algorithm gave a root mean square error of 1.434 g and a worst-case prediction error of $\pm 5.879$ g. In addition, the study confirmed that the proposed weight estimation algorithm and a single slide rail device for weight measurement can weigh approximately 4,500 sea cucumbers per hour.
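The estimation pipeline (body dimensions → volume → weight by linear regression, evaluated by cross-validation RMSE and worst-case error) can be sketched as below. The ellipsoid volume formula is a hypothetical shape model, since the abstract does not say how volume was computed from the vision measurements, and leave-one-out is one assumed choice of cross-validation scheme.

```python
import math

def ellipsoid_volume(length, width, thickness):
    # Hypothetical shape model: treat the body as an ellipsoid with these full axes
    return math.pi / 6.0 * length * width * thickness

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def loocv_errors(x, y):
    """Leave-one-out RMSE and the signed worst-case prediction error."""
    errs = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errs.append(a * x[i] + b - y[i])
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return rmse, max(errs, key=abs)
```

Reporting the worst-case error alongside RMSE, as the abstract does, matters for grading applications where a single badly misweighed animal is costly.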

Updating finite element model using dynamic perturbation method and regularization algorithm

  • Chen, Hua-Peng; Huang, Tian-Li
    • Smart Structures and Systems / v.10 no.4_5 / pp.427-442 / 2012
  • An effective approach for updating finite element models is presented, which can provide reliable estimates of structural updating parameters from identified operational modal data. On the basis of the dynamic perturbation method, an exact relationship is developed between the perturbation of structural parameters, such as stiffness change, and the modal properties of the tested structure. An iterative solution procedure is then provided to solve for the structural updating parameters that characterise the modifications of structural parameters at element level, giving optimised solutions in the least squares sense without requiring an optimisation method. A regularization algorithm based on the Tikhonov solution, incorporating the generalised cross-validation method, is employed to reduce the influence of measurement errors in the vibration modal data and thus to produce stable and reasonable solutions for the structural updating parameters. The Canton Tower benchmark problem established by the Hong Kong Polytechnic University is employed to demonstrate the effectiveness and applicability of the proposed model updating technique. The results of the benchmark studies show that the proposed technique can successfully adjust the reduced finite element model of the structure using only a limited number of frequencies identified from recorded ambient vibration measurements.
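Tikhonov regularization with a generalized cross-validation (GCV) choice of the regularization parameter can be sketched for a generic linear system S x ≈ d. Here S, d, and x are hypothetical stand-ins for the sensitivity relation, measured modal changes, and updating parameters; the sketch does not include the paper's dynamic perturbation equations or iteration. It assumes S has full column rank and uses the SVD form x_λ = Σ f_i (u_iᵀ d / s_i) v_i with filter factors f_i = s_i² / (s_i² + λ²), and G(λ) = ‖S x_λ - d‖² / (m - Σ f_i)².

```python
import numpy as np

def _svd_parts(S, d, lam):
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    beta = U.T @ d                          # data expressed in the left singular basis
    f = s**2 / (s**2 + lam**2)              # Tikhonov filter factors
    return s, Vt, beta, f

def tikhonov(S, d, lam):
    """Minimizer of ||S x - d||^2 + lam^2 ||x||^2, computed through the SVD."""
    s, Vt, beta, f = _svd_parts(S, d, lam)
    return Vt.T @ (f * beta / s)

def gcv(S, d, lam):
    """GCV function: squared residual norm over the squared effective residual degrees of freedom."""
    _, _, beta, f = _svd_parts(S, d, lam)
    outside = max(float(d @ d - beta @ beta), 0.0)   # part of d outside the range of S
    resid2 = float(np.sum(((1.0 - f) * beta) ** 2)) + outside
    return resid2 / (S.shape[0] - np.sum(f)) ** 2

def choose_lambda(S, d, grid):
    """Pick the grid value of lambda that minimizes the GCV function."""
    scores = [gcv(S, d, lam) for lam in grid]
    return grid[int(np.argmin(scores))]
```

GCV needs no estimate of the noise level, which suits measured modal data where the error magnitude is not known in advance; small singular values, which would amplify measurement errors, are damped by their filter factors.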

Two-Stage Logistic Regression for Cancer Classification and Prediction from Copy-Number Changes in cDNA Microarray-Based Comparative Genomic Hybridization

  • Kim, Mi-Jung
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.847-859 / 2011
  • cDNA microarray-based comparative genomic hybridization (CGH) data include low-intensity spots, so a statistical strategy is needed to detect subtle differences between cancer classes. In this study, genes displaying a high frequency of alteration in one of the classes were selected from a set of pre-selected genes whose between-gene variation is relatively large compared to the total variation. Using the copy-number changes of the selected genes, this study proposes a statistical approach that improves prediction of patients' classes by pre-classifying patients with similar genetic alteration scores. A two-stage logistic regression model (TLRM) is proposed to pre-classify homogeneous patients and predict patients' classes for cancer prediction: a decision tree (DT) is combined with logistic regression on the set of informative genes. The TLRM was constructed on cDNA microarray-based CGH data from the Cancer Metastasis Research Center (CMRC) at Yonsei University. It predicted the patients' clinical diagnoses with perfect matches, except for one patient, among the patients classified as high-risk and low-risk, where prediction performance is critical because clinical treatment requires high sensitivity and specificity. The accuracy validated by leave-one-out cross-validation (LOOCV) was 83.3%, while CART and DT, evaluated for comparison, performed worse than the TLRM.
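Leave-one-out cross-validation refits the model once per patient, each time holding that patient out and scoring the prediction. The harness below is a minimal sketch with a pluggable `fit` function; the depth-1 decision tree (stump) is a hypothetical stand-in classifier, not the paper's two-stage model, which combines a decision tree with logistic regression.

```python
def fit_stump(X, y):
    """Hypothetical stand-in classifier: a depth-1 decision tree on 0/1 labels."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # candidate thresholds sit halfway between consecutive distinct values
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for left in (0, 1):
                acc = sum((left if row[j] <= t else 1 - left) == label
                          for row, label in zip(X, y))
                if best is None or acc > best[0]:
                    best = (acc, j, t, left)
    if best is None:                         # no split possible: predict the majority class
        majority = max(set(y), key=y.count)
        return lambda row: majority
    _, j, t, left = best
    return lambda row: left if row[j] <= t else 1 - left

def loocv_accuracy(X, y, fit):
    """Fraction of samples classified correctly when each one is held out in turn."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += model(X[i]) == y[i]
    return correct / len(X)
```

Any fitting routine with the same signature, including a two-stage tree-plus-logistic model, could be passed as `fit`. Gene selection should also be repeated inside the loop if it uses the class labels, otherwise the LOOCV estimate is optimistic.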

Mapping Biodiversity through optimized selection of input variables in decision tree models (의사결정나무 변수 선정 방법을 적용한 대축적 생물다양성 지도 구축)

  • Kim, Do Yeon; Heo, Joon; Kim, Chang Jae
    • Journal of Environmental Impact Assessment / v.20 no.5 / pp.663-673 / 2011
  • In the face of accelerating biodiversity loss and given its significance for our coexistence with nature, biodiversity is becoming increasingly crucial from a sustainable development perspective. Estimating future biodiversity provides valuable information for decision making, especially at the national level, but it first requires a quantitative approach that establishes a baseline of the present status. In this study, we developed a large-scale map of Plant Species Richness (PSR, a typical indicator of biodiversity) for the Young-dong and Pyung-chang provinces. Given the accessibility of appropriate data and advances in modelling techniques, a genetic algorithm was applied to reduce the number of input variables without deteriorating predictive power. In addition, predictive power was evaluated by the number of Correctly Classified Instances (CCI) under 10-fold cross-validation. As a fundamental baseline, this study will be beneficial for future land-related work, ecosystem restoration projects, and other relevant decision-making agendas.
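Genetic-algorithm variable selection can be sketched as a search over inclusion masks whose fitness is the number of correctly classified instances under 10-fold cross-validation. In this sketch a nearest-centroid classifier stands in for the paper's decision tree, and the GA settings (population size, tournament size 3, uniform crossover, bit-flip mutation, elitism) are illustrative assumptions rather than the authors' configuration.

```python
import random
from functools import lru_cache

def cci_kfold(X, y, mask, k=10, seed=0):
    """Correctly classified instances under k-fold CV, nearest-centroid on the selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        centroids = {}
        for c in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == c]
            centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        for i in test:
            def dist(c):
                return sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c]))
            correct += min(centroids, key=dist) == y[i]
    return correct

def ga_select(n_vars, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Genetic algorithm over boolean inclusion masks; returns the best mask found."""
    rng = random.Random(seed)
    score = lru_cache(maxsize=None)(fitness)   # masks are tuples, so fitness values can be cached
    pop = [tuple(rng.random() < 0.5 for _ in range(n_vars)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        nxt = [ranked[0]]                      # elitism: carry the best mask over unchanged
        while len(nxt) < pop_size:
            # tournament selection: each parent is the best of 3 random masks
            a = max(rng.sample(ranked, 3), key=score)
            b = max(rng.sample(ranked, 3), key=score)
            child = tuple(ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b))
            child = tuple((not g) if rng.random() < p_mut else g for g in child)
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)
```

A call such as `ga_select(n_vars, lambda m: cci_kfold(X, y, m))` returns a mask of retained variables. Because every fitness evaluation is a full cross-validation, caching repeated masks keeps the search affordable.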