• Title/Abstract/Keywords: Validation data set

Prediction of Tumor Progression During Neoadjuvant Chemotherapy and Survival Outcome in Patients With Triple-Negative Breast Cancer

  • Heera Yoen;Soo-Yeon Kim;Dae-Won Lee;Han-Byoel Lee;Nariya Cho
    • Korean Journal of Radiology / Vol. 24, No. 7 / pp. 626-639 / 2023
  • Objective: To investigate the association of clinical, pathologic, and magnetic resonance imaging (MRI) variables with progressive disease (PD) during neoadjuvant chemotherapy (NAC) and distant metastasis-free survival (DMFS) in patients with triple-negative breast cancer (TNBC). Materials and Methods: This single-center retrospective study included 252 women with TNBC who underwent NAC between 2010 and 2019. Clinical, pathologic, and treatment data were collected. Two radiologists analyzed the pre-NAC MRI. After random allocation to the development and validation sets in a 2:1 ratio, we developed models to predict PD and DMFS using logistic regression and Cox proportional hazard regression, respectively, and validated them. Results: Among the 252 patients (age, 48.3 ± 10.7 years; 168 in the development set; 84 in the validation set), PD occurred in 17 patients in the development set and 9 patients in the validation set. In the clinical-pathologic-MRI model, metaplastic histology (odds ratio [OR], 8.0; P = 0.032), Ki-67 index (OR, 1.02; P = 0.044), and subcutaneous edema (OR, 30.6; P = 0.004) were independently associated with PD in the development set. The clinical-pathologic-MRI model showed a higher area under the receiver-operating characteristic curve (AUC) than the clinical-pathologic model (AUC: 0.69 vs. 0.54; P = 0.017) for predicting PD in the validation set. Distant metastases occurred in 49 patients in the development set and 18 patients in the validation set. Residual disease in both the breast and lymph nodes (hazard ratio [HR], 6.0; P = 0.005) and the presence of lymphovascular invasion (HR, 3.3; P < 0.001) were independently associated with DMFS. The model consisting of these pathologic variables showed a Harrell's C-index of 0.86 in the validation set. Conclusion: The clinical-pathologic-MRI model, which considered subcutaneous edema observed on MRI, performed better than the clinical-pathologic model for predicting PD. However, MRI did not independently contribute to the prediction of DMFS.
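
The 2:1 random allocation described in this abstract can be sketched as follows. This is only an illustration: the function name `split_two_to_one`, the seed, and the placeholder patient identifiers are assumptions, and the abstract does not say how the authors performed the randomization.

```python
import random


def split_two_to_one(ids, seed=0):
    """Randomly allocate ids to development and validation sets in a 2:1 ratio."""
    shuffled = list(ids)
    # A seeded generator keeps the allocation reproducible (seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    n_dev = (2 * len(shuffled)) // 3
    return shuffled[:n_dev], shuffled[n_dev:]


# Placeholder identifiers standing in for the 252 patients in the abstract.
patient_ids = range(252)
development, validation = split_two_to_one(patient_ids)
print(len(development), len(validation))
```

With 252 patients this gives the 168 and 84 set sizes reported in the abstract.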

Validation of RELAP5 MOD3.3 code for Hybrid-SIT against SET and IET experimental data

  • Yoon, Ho Joon;Al Naqbi, Waleed;Al-Yahia, Omar S.;Jo, Daeseong
    • Nuclear Engineering and Technology / Vol. 52, No. 9 / pp. 1926-1938 / 2020
  • We validated the performance of the RELAP5 MOD3.3 code for the hybrid SIT against available experimental data. The concept of the hybrid SIT is to connect the pressurizer to the SIT so that the water inside the SIT can be used in the case of SBO or SB-LOCA combined with TLOFW. We investigated how well the RELAP5 code predicts the physical phenomena, in terms of equilibrium time, stratification, and condensation, against Separate Effect Test (SET) data. We also validated the RELAP5 code against Integrated Effect Test (IET) experimental data produced by the ATLAS facility. We followed the conventional approach to code validation against IET data, namely pre-test and post-test calculations. The RELAP5 results differ substantially as the number of nodes changes. Increasing the number of nodes tends to reduce the condensation rate at the interface between liquid and vapor inside the hybrid SIT. Environmental heat loss also contributes to the large discrepancy between the RELAP5 simulation results and the experimental data.

Developing a Molecular Prognostic Predictor of a Cancer based on a Small Sample

  • Kim Inyoung;Lee Sunho;Rha Sun Young;Kim Byungsoo
    • The Korean Statistical Society: Conference Proceedings / Proceedings of the Korean Statistical Society 2004 Conference / pp. 195-198 / 2004
  • One important problem in a cancer microarray study is to identify a set of genes from which a molecular prognostic indicator can be developed. A parallel problem is to validate the chosen set of genes. In this note we develop a K-fold cross validation procedure that combines a 'pre-validation' technique with a bootstrap resampling procedure in the Cox regression. The pre-validation technique predicts the microarray predictor of a case without having seen the true class label of the case. It was suggested by Tibshirani and Efron (2002) to avoid the possible over-fitting in a regression in which a microarray-based predictor is employed. The bootstrap resampling procedure for the Cox regression was proposed by Sauerbrei and Schumacher (1992) as a means of overcoming the instability of a stepwise selection procedure. We apply this K-fold cross validation to the microarray data of 92 gastric cancers, for which the experiment was conducted at the Cancer Metastasis Research Center, Yonsei University. We also share some of our experience with 'false positive' results caused by information leak.
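
The pre-validation idea in this abstract, where each case is predicted by a model fitted without that case's outcome, can be sketched as below. This is a minimal illustration under stated assumptions: a one-variable least-squares line stands in for the Cox regression, the bootstrap step is omitted, and the names `k_fold_indices`, `fit_line`, and `pre_validate` are hypothetical.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of nearly equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for the real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def pre_validate(xs, ys, k=5):
    """Predict every case from a model fitted on the other folds only."""
    n = len(xs)
    predictions = [None] * n
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            # The outcome ys[i] was never used to fit (a, b).
            predictions[i] = a + b * xs[i]
    return predictions


xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
preds = pre_validate(xs, ys, k=5)
```

Because no case contributes its own outcome to the model that predicts it, the pre-validated predictions can be used in a later regression without the information leak the abstract warns about.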

Robust Cross Validation Score

  • Park, Dong-Ryeon
    • Communications for Statistical Applications and Methods / Vol. 12, No. 2 / pp. 413-423 / 2005
  • Consider the problem of estimating the underlying regression function from a set of noisy data contaminated by a long-tailed error distribution. Several robust smoothing techniques exist, and these have turned out to be very useful for reducing the influence of outlying observations. However, whichever robust smoother we use, we must still choose the smoothing parameter, and relatively little attention has been paid to robust bandwidth selection. In this paper, we adopt the idea of robust location parameter estimation and propose robust cross validation score functions.
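
To illustrate why a robust score matters for bandwidth selection, the sketch below compares the usual mean-squared leave-one-out score with a median-of-absolute-residuals score for a Nadaraya-Watson smoother. The abstract does not give the paper's actual score functions, so the median-based score, the Gaussian kernel, and all function names here are assumptions chosen only to show the idea.

```python
import math
from statistics import mean, median


def loo_residuals(xs, ys, h):
    """Leave-one-out residuals of a Gaussian-kernel Nadaraya-Watson fit."""
    residuals = []
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            if j == i:
                continue  # leave the i-th observation out
            w = math.exp(-0.5 * ((xi - xj) / h) ** 2)
            num += w * yj
            den += w
        if den == 0.0:
            raise ValueError("bandwidth too small for these data")
        residuals.append(yi - num / den)
    return residuals


def cv_score(xs, ys, h):
    """Classical score: mean of squared leave-one-out residuals."""
    return mean(r * r for r in loo_residuals(xs, ys, h))


def robust_cv_score(xs, ys, h):
    """Assumed robust variant: median of absolute leave-one-out residuals."""
    return median(abs(r) for r in loo_residuals(xs, ys, h))


xs = [float(i) for i in range(20)]
ys = [0.5 * x for x in xs]
ys[10] += 50.0  # a single gross outlier

bandwidths = [0.5, 1.0, 2.0, 4.0]
best_classical = min(bandwidths, key=lambda h: cv_score(xs, ys, h))
best_robust = min(bandwidths, key=lambda h: robust_cv_score(xs, ys, h))
```

The classical score is dominated by the one squared residual at the outlier, while the median-based score largely ignores it, which is the motivation for replacing the mean with a robust location estimate.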

Comparison of EKF and UKF on Training the Artificial Neural Network

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / Vol. 15, No. 2 / pp. 499-506 / 2004
  • The Unscented Kalman Filter (UKF) is known to outperform the Extended Kalman Filter (EKF) in nonlinear state estimation, with the significant advantage that it does not require computation of the Jacobian. The EKF, however, has an advantage over the UKF in computation time. We compare both algorithms on training an artificial neural network. The validation data set is used to estimate parameters that are expected to give a better fit on the test data set. Experimental results indicating the performance of both algorithms are presented.

Prediction of the compressive strength of fly ash geopolymer concrete using gene expression programming

  • Alkroosh, Iyad S.;Sarker, Prabir K.
    • Computers and Concrete / Vol. 24, No. 4 / pp. 295-302 / 2019
  • Evolutionary algorithms based on conventional statistical methods such as regression and classification have been widely used in data mining applications. This work applies gene expression programming (GEP) to predict the compressive strength of fly ash geopolymer concrete, which is gaining increasing interest as an environmentally friendly alternative to Portland cement concrete. Based on 56 test results from the existing literature, a model was obtained relating the compressive strength of fly ash geopolymer concrete to the mix design parameters that significantly influence it. The predictions of the model in training and validation were evaluated. The coefficient of determination ($R^2$), mean ($\mu$), and standard deviation ($\sigma$) were 0.89, 1.0, and 0.12, respectively, for the training set, and 0.89, 0.99, and 0.13, respectively, for the validation set. The prediction error of the model was also evaluated and found to be very low. This indicates that the predictions of the GEP model are in close agreement with the experimental results, suggesting that GEP is a promising method for predicting the compressive strength of fly ash geopolymer concrete.

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods / Vol. 13, No. 1 / pp. 151-165 / 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate have been asserted to perform better for small samples than cross-validation estimators. We compare the two estimators empirically when the classification rule is so adaptive to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance. We also find that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
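
The comparison in this abstract can be illustrated with a classifier whose apparent error is zero. The sketch below uses a 1-nearest-neighbour rule on simulated one-dimensional data and computes the apparent error, the leave-one-out cross-validation error, and an out-of-bag bootstrap error. The classifier, the simulated data, the number of bootstrap replicates, and the function names are all assumptions; the abstract does not state which classifiers or bootstrap variants the authors compared.

```python
import random


def nn_predict(train, x):
    """Label of the training point nearest to x; train holds (value, label) pairs."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def error_rate(train, test):
    """Fraction of test points misclassified by 1-NN trained on train."""
    return sum(nn_predict(train, x) != y for x, y in test) / len(test)


def loo_cv_error(data):
    """Leave-one-out cross-validation estimate of the misclassification rate."""
    wrong = sum(
        nn_predict(data[:i] + data[i + 1:], x) != y
        for i, (x, y) in enumerate(data)
    )
    return wrong / len(data)


def oob_bootstrap_error(data, n_boot=200, seed=0):
    """Error on cases left out of each bootstrap sample, pooled over samples."""
    rng = random.Random(seed)
    n = len(data)
    wrong = total = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        train = [data[i] for i in idx]
        for i in range(n):
            if i not in chosen:
                wrong += nn_predict(train, data[i][0]) != data[i][1]
                total += 1
    return wrong / total if total else float("nan")


# Simulated two-class data (an assumption, not the paper's data).
rng = random.Random(1)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(15)]
data += [(rng.gauss(1.5, 1.0), 1) for _ in range(15)]

apparent = error_rate(data, data)
loo = loo_cv_error(data)
oob = oob_bootstrap_error(data)
print(apparent, loo, oob)
```

The apparent error is zero because each point is its own nearest neighbour, which is exactly the highly adaptive situation the abstract studies.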

Recovery the Missing Streamflow Data on River Basin Based on the Deep Neural Network Model

  • Le, Xuan-Hien;Lee, Giha
    • Korea Water Resources Association: Conference Proceedings / Korea Water Resources Association 2019 Conference / pp. 156-156 / 2019
  • In this study, a gated recurrent unit (GRU) network is constructed based on a deep neural network (DNN) with the aim of restoring missing daily flow data in river basins. Lai Chau hydrological station, located upstream in the Da River basin (Vietnam), is selected as the target station. The model inputs are observed daily flows for 24 years, from 1961 to 1984 (before the Hoa Binh dam was built), at 5 hydrological stations: 4 gauge stations downstream in the basin and the target station to be restored (Lai Chau). The available data are divided into sections for different purposes. The 23-year data set (1961-1983) was used for training and validation, with 80% for training and 20% for validation. The remaining one-year data set (1984) was used for testing, to verify the performance and accuracy of the model objectively. Although only a modest amount of input data is required and the Lai Chau hydrological station is located upstream on the Da River, the results calculated with the proposed model are in satisfactory agreement with the observed data, with a Nash-Sutcliffe efficiency (NSE) higher than 95%. These findings illustrate the outstanding performance of the GRU network model in recovering the missing flow data at Lai Chau station. DNN models, including GRU network models, therefore have great potential for application in hydrology and hydraulics.
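
The Nash-Sutcliffe efficiency and the 80%/20% training and validation split mentioned in this abstract can be sketched as follows. The NSE formula is the standard one; the chronological (unshuffled) split and the function names are assumptions, since the abstract does not say how the 23-year record was divided.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def split_train_validation(records, train_fraction=0.8):
    """Chronological split: the first part trains, the remainder validates."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


# Toy daily flows (placeholder values, not data from the study).
flows = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0, 18.0, 16.0, 13.0, 12.0]
train, valid = split_train_validation(flows)
print(len(train), len(valid))
print(nash_sutcliffe([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

An NSE of 1 means a perfect match, and an NSE of 0 means the model is no better than predicting the observed mean, so the value above 95% reported in the abstract indicates a close fit.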

Face Detection Based on Incremental Learning from Very Large Size Training Data

  • 박지영;이준호
    • Journal of the Korean Information Science Society: Software and Applications / Vol. 31, No. 7 / pp. 949-958 / 2004
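  • This study proposes a new method that allows additional learning from new data during the training of a face detection classifier that uses a very large training data set. The goal of incremental learning is to learn new information from the added data and update the knowledge already acquired. In designing a classifier based on this kind of learning, it is very important that the final classifier reflect the characteristics of the entire training data set. To obtain an optimized final classifier, the proposed algorithm generates a validation set that represents the global characteristics of the training set and determines the weights of the intermediate classifiers from their classification performance on this set. Each intermediate classifier is the result of learning from an individual data set, and the intermediate classifiers are combined into the final classifier by a weight-based combination scheme. Repeated experiments confirmed that the face detection classifier trained with the proposed algorithm achieves better detection performance than classifiers based on AdaBoost and Learn++.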
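
The weight-based combination described in this abstract can be sketched as below. The abstract does not give the weighting formula, so using each intermediate classifier's accuracy on the validation set as its weight is an assumption, as are the toy threshold classifiers and the function names.

```python
def validation_accuracy(classifier, validation_set):
    """Fraction of validation cases the classifier labels correctly."""
    hits = sum(classifier(x) == y for x, y in validation_set)
    return hits / len(validation_set)


def combine(classifiers, validation_set):
    """Build a weighted-vote classifier; weights are validation accuracies."""
    weights = [validation_accuracy(c, validation_set) for c in classifiers]

    def final(x):
        # Each classifier votes +1 (face) or -1 (non-face), scaled by its weight.
        score = sum(w * (1 if c(x) == 1 else -1)
                    for w, c in zip(weights, classifiers))
        return 1 if score > 0 else 0

    return final, weights


# Toy intermediate classifiers on a scalar feature (hypothetical stand-ins).
classifiers = [
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.2),
    lambda x: int(x > 0.9),
]
validation_set = [(0.1, 0), (0.3, 0), (0.6, 1), (0.8, 1), (0.95, 1)]

final, weights = combine(classifiers, validation_set)
print(weights, final(0.3), final(0.6))
```

Classifiers that perform poorly on the shared validation set receive smaller weights, so the final decision leans on the intermediate classifiers that best reflect the whole training set.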

Assessment of the Near Real-Time Validation for the AQUA Satellite Level-2 Observation Products

  • Yang Min-Sil;Lee Jeongsoon;Lee Chol;Park Jong-Seo;Kim Hee-Ah
    • Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing, Proceedings of ISRS 2004 / pp. 35-38 / 2004
  • We developed a Near Real-Time Validation System (NRVS) for the Level-2 products of the AQUA satellite. The AQUA satellite is the second largest project of NASA's Earth Observing System (EOS) mission. It provides information on the water cycle of the entire earth in many different forms. Among its products, we used five kinds of Level-2 geophysical parameters: rain rate, sea surface wind speed, skin surface temperature, atmospheric temperature profile, and atmospheric humidity profile. To use these products for scientific purposes, reasonable quantification is indispensable. In this paper we explain the near real-time validation process and its detailed algorithms, and we analyze the simulation results quantitatively. In-situ meteorological measurements, periodically gathered and provided by the Korea Meteorological Administration (KMA), are processed as the reference data set. Both site-specific and time-series analyses of the validation results are presented.
