External Validation of a Clinical Scoring System for Hematuria
Seung Bae Lee 1 , Hyung Suk Kim 2 , Myong Kim 2 , Ja Hyeon Ku 2 *

BACKGROUND
The aim of this study was to evaluate the accuracy of a new scoring system in Korean patients with hematuria at high risk of bladder cancer.

MATERIALS AND METHODS
A total of 319 consecutive patients who presented with painless hematuria and had no history of bladder cancer between August 2012 and February 2014 were analyzed. All patients underwent clinical examination, and 22 patients with incomplete data were excluded from the final validation data set. The scoring system included four clinical parameters: age (≥50 = 2 vs. <50 = 1), gender (male = 2 vs. female = 1), history of smoking (smoker/ex-smoker = 4 vs. non-smoker = 2) and nature of the hematuria (gross = 6 vs. microscopic = 2).

RESULTS
The area under the receiver-operating characteristic curve (95% confidence interval) of the scoring system was 0.718 (0.655-0.777). The calibration plot demonstrated a slight underestimation of bladder cancer probability, but the model had reasonable calibration. Decision curve analysis revealed that the use of the model was associated with net benefit gains over the treat-all strategy. The scoring system performed well across a wide range of threshold probabilities (15%-45%).

CONCLUSIONS
The scoring system developed is a highly accurate predictive tool for patients with hematuria. Although further improvements are needed, utilization of this system may assist primary care physicians and other healthcare practitioners in determining a patient's risk of bladder cancer.

Introduction
Hematuria is the most common presenting symptom in patients with bladder cancer. Since screening is not currently recommended, most patients are diagnosed after presenting with hematuria to their clinician. Roughly 4% of patients with microscopic hematuria and up to 40% of patients with gross hematuria could be harboring malignancy (Grossfeld et al., 2001). However, the prevalence of urinary tract cancer in the general population is low, which translates to a low prior probability of disease and explains the lack of screening. Moreover, although the American Urological Association (AUA) best practice policy recommendations and the most recent AUA guideline include urine testing, imaging, and cystoscopy (Grossfeld et al., 2001; Davis et al., 2012), these recommendations do not perform well in identifying which patients are most likely to have malignant tumors. Therefore, alternative criteria are needed to better identify patients who truly require further evaluation, so that patients with hematuria can be managed effectively and efficiently.
It has been demonstrated that an individual patient's risk for bladder cancer depends on several factors, including age, tobacco use and chemical exposures (Nielsen et al., 2007). However, few models have been developed to quantify the individual risk of having bladder cancer after presenting with hematuria (Nielsen et al., 2007; Davis et al., 2012). Furthermore, as the existing models rely on nuclear matrix protein-22 (NMP22) (Lotan et al., 2009) or immunocytology (Cha et al., 2012a), they are not practical for a daily clinical environment. Recently, Tan et al. (2013) developed a new scoring system using four clinical variables (age, gender, smoking status, and nature of the hematuria) to stratify patients with hematuria into high or low risk of having bladder cancer.
The aim of this study was to evaluate the accuracy of this scoring system in Korean patients with hematuria.

Materials and Methods
This study was carried out after obtaining approval from the institutional review board of Seoul National University Hospital. Patients visiting our institution from August 2012 to February 2014 were included in this study, and 319 consecutive patients presenting with painless hematuria without a history of bladder cancer were analyzed. Patient information such as age, gender, smoking history, and nature of the hematuria was recorded. All patients underwent clinical examination, including CT urography and cystourethroscopy, with biopsy of any suspicious lesions. Microscopic hematuria was defined as three or more erythrocytes visible per high-power field under white-light microscopy. Mid-stream urine specimens were collected, immediately processed, and examined cytologically. Cytologic examination was performed by trained personnel. A total of 22 patients were excluded from the final validation data set due to incomplete data. Consequently, 297 patients were available for final analyses. The demographic data for the external validation cohort are shown in Table 1.
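For illustration, the four-item score evaluated here can be sketched in a few lines of Python, using the point values listed in the abstract. The function and argument names are hypothetical, and the rule that a score at or above the cut-off counts as high risk is an assumption, since the text does not state whether the comparison is inclusive.

```python
def hematuria_score(age_years, is_male, ever_smoker, gross_hematuria):
    """Sum the four point values of the scoring system (possible range 6 to 14)."""
    score = 2 if age_years >= 50 else 1        # age: >=50 = 2, <50 = 1
    score += 2 if is_male else 1               # gender: male = 2, female = 1
    score += 4 if ever_smoker else 2           # smoker/ex-smoker = 4, non-smoker = 2
    score += 6 if gross_hematuria else 2       # gross = 6, microscopic = 2
    return score


def is_high_risk(score, cutoff=12):
    # Assumption: scores at or above the cut-off are flagged as high risk.
    return score >= cutoff


# A 62-year-old male non-smoker with gross hematuria: 2 + 2 + 2 + 6 points.
example = hematuria_score(62, is_male=True, ever_smoker=False, gross_hematuria=True)
```

Under these assumptions the example patient scores 12 and would be flagged at a cut-off of 12.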
The overall predictive value of the scoring system was assessed using several criteria. Discrimination refers to the ability of the risk prediction model to distinguish those with the event from those without the event. Our measure of discrimination was the area under the receiver-operating characteristic (ROC) curve. A value of 1 indicates that the model can perfectly discriminate between the patients who have bladder cancer and those who do not, while a value of 0.5 indicates that the model has no discriminative ability. The area under the ROC curve estimates were internally validated using 500 bootstrap samples. Calibration refers to how closely the predicted probabilities reflect the actual risk. General calibration was assessed using a calibration plot, in which the relationship between the model-derived and actual outcomes was explored graphically. The validation was performed using 200 bootstrap resamples to decrease overfit bias. Finally, decision curve analysis (DCA) was used to explore the clinical value of the model (Vickers and Elkin, 2006). DCA is a method for evaluating the clinical net benefit of prediction models; one sums the benefits (true positives) and subtracts the harms (false positives), with the harms weighted by the odds of the threshold probability.
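As a minimal sketch of the discrimination measure described above, the area under the ROC curve can be computed as the proportion of patient pairs (one with bladder cancer, one without) in which the patient with cancer has the higher score, with ties counted as one half. This is an illustration in Python, not the R code used in the study, and the function name is hypothetical.

```python
def roc_auc(scores_with_event, scores_without_event):
    """AUC as the fraction of event/non-event pairs ranked correctly (ties count 0.5)."""
    concordant = 0.0
    for s_event in scores_with_event:
        for s_no_event in scores_without_event:
            if s_event > s_no_event:
                concordant += 1.0      # patient with cancer scored higher
            elif s_event == s_no_event:
                concordant += 0.5      # tied scores carry no ranking information
    return concordant / (len(scores_with_event) * len(scores_without_event))
```

With this definition, identical score distributions in both groups give 0.5 and complete separation gives 1, matching the interpretation given in the text.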
For all statistical analyses, a two-sided p<0.05 was regarded as significant. Models, statistics, and figures were prepared using R 2.13.2 (http://www.cran.r-project.org).

Results
The bootstrap-corrected predictive accuracy of the model was good (Figure 1). The area under the ROC curve (95% confidence interval) of the scoring system was 0.718 (0.655-0.777). In our validation cohort, a score of 12 was found to be the best score threshold (Table 2). Using the cut-off value of 12, the scoring system had a sensitivity of 69.6% and a specificity of 69.7%. The positive predictive and negative predictive values were 45.5% and 86.4%, respectively.
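The four measures reported at the cut-off of 12 all derive from a single 2×2 table, as the Python sketch below shows. The counts used here are not reported in the text: they are illustrative values, back-calculated so that they sum to the 297 analyzed patients and reproduce the reported percentages, and should be treated as an assumption rather than as study data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # detected cancers / all cancers
        "specificity": tn / (tn + fp),   # correct negatives / all non-cancers
        "ppv": tp / (tp + fp),           # cancers among positive results
        "npv": tn / (tn + fn),           # non-cancers among negative results
    }


# Assumed, illustrative counts (NOT taken from the paper), total = 297.
metrics = diagnostic_metrics(tp=55, fn=24, fp=66, tn=152)
percentages = {name: round(value * 100, 1) for name, value in metrics.items()}
```

The sketch also makes clear why the positive predictive value is much lower than the sensitivity: it depends on how common bladder cancer is in the cohort, not only on the performance of the score.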
To assess the agreement between the predicted and actual outcomes, a calibration curve was generated (Figure 2). The dashed diagonal line represents the performance of an ideal model, where the predicted outcome would correspond perfectly with the actual outcome. The performance of the current model is plotted as the solid line. The calibration plot demonstrated a slight underestimation of bladder cancer probability, but the model was reasonably calibrated. Although the calibration curve did not perfectly match the line of identity (the line at a 45° angle), the deviation was visually minimal, as the solid line was close to the dashed line of the ideal model.
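The idea behind a calibration plot can be sketched as follows: patients are grouped by predicted risk, and the mean predicted probability in each group is compared with the observed bladder cancer rate. This Python sketch uses equal-width risk groups, which is an assumption made for illustration; the study's plot was produced in R with bootstrap resampling, and the function name is hypothetical.

```python
def calibration_table(predicted, observed, n_bins=5):
    """Return (mean predicted risk, observed event rate, group size) per risk group."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        index = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last group
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue                               # skip empty risk groups
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        rows.append((mean_predicted, observed_rate, len(members)))
    return rows
```

A group whose observed rate exceeds its mean predicted risk lies above the diagonal, which corresponds to the underestimation described in the text.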
Figure 3 presents the result of the DCA. DCA revealed that the use of the model was associated with net benefit gains over the treat-all strategy. The scoring system performed well across a wide range of threshold probabilities (15%-45%).
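The comparison with the treat-all strategy rests on the net benefit of Vickers and Elkin (2006): the proportion of true positives minus the proportion of false positives, the latter weighted by the odds of the threshold probability. The Python sketch below illustrates the calculation. The counts are illustrative values that are not reported in the text, and applying one fixed score cut-off at a given threshold probability is a simplification of the full decision curve.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - (FP/n) * threshold / (1 - threshold)."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(n_events, n, threshold):
    # If every patient is investigated, all events are true positives
    # and all non-events are false positives.
    return net_benefit(n_events, n - n_events, n, threshold)


# Assumed, illustrative counts (NOT taken from the paper): 297 patients,
# 79 with bladder cancer, 55 true and 66 false positives at the cut-off.
model_nb = net_benefit(tp=55, fp=66, n=297, threshold=0.30)
treat_all_nb = net_benefit_treat_all(n_events=79, n=297, threshold=0.30)
```

The treat-none strategy has a net benefit of zero by definition, so a model is useful at a given threshold probability only if its net benefit exceeds both zero and the treat-all value.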

Discussion
In 2012, bladder cancer accounted for 7% of new cancer diagnoses and 3% of cancer deaths in American men (Siegel et al., 2012). Approximately 25% of the patients with bladder cancer present with muscle-invasive disease at diagnosis. Early detection and timely treatment of the cancer are critical, since delays in diagnosis and treatment have been shown to adversely affect survival in patients with bladder cancer (Gore et al., 2009). Traditional screening tools (e.g., cystoscopy) are not practical for population-based screening because they are invasive and not cost-efficient (Botteman et al., 2003). In addition, cystoscopy has been associated with pain or discomfort in about a third of cases (Van Der Aa et al., 2008). Due to concerns about the number of unnecessary diagnostic evaluations as well as the low prevalence of bladder cancer in the general population (Sutton, 1990), screening is not currently recommended.
Presenting with the clinical sign of hematuria is not trivial, as most patients with bladder cancer are only diagnosed after the development of this symptom. Hematuria is a common urological presentation, estimated to constitute 4-20% of all urological visits (Mariani et al., 1989). In the adult population, it can have different causes, such as urinary tract infections, urolithiasis, benign prostatic hyperplasia, urologic malignancies and secondary tumor invasion from other sites (e.g., cervical cancer). In gynecology, it was demonstrated that hematuria can be used as a screening test to detect urinary bladder mucosal infiltration of cervical cancer (Chuttiangtum et al., 2012). It has been reported that urological cancer is found in approximately 5% of the patients presenting with microscopic hematuria, and in around 20% of the patients with gross hematuria (Khadra et al., 2000; Cohen and Brown, 2003). Therefore, screening for bladder cancer may have different performance characteristics in selected high-risk populations. In other areas of urology, e.g., patients with raised prostate-specific antigen levels, sophisticated algorithms and nomograms have been developed that stratify the patient's risk of cancer. With this information available, some patients avoid a prostate biopsy whilst other urological conditions can be identified and managed.
However, little research has been carried out on the investigation of hematuria (Khadra et al., 2000; Lotan et al., 2009; Cha et al., 2012a). Using age, gender, race, smoking history, hematuria, and NMP22 findings, Lotan et al. (2009) developed a logistic regression model-based nomogram to predict the presence of bladder cancer in high-risk patients. External validation of this nomogram gave an AUC of more than 80%. Cha et al. (2012a) developed another nomogram with an extremely high level of accuracy (bootstrap-corrected AUC: 90.8%) in patients with painless hematuria. This nomogram incorporated immunocytology into a multivariable prediction model for the detection of bladder cancer. Cha et al. (2012b) also developed a highly accurate, well-calibrated nomogram to predict the individual risk of bladder cancer for a patient presenting with asymptomatic hematuria. In this nomogram, age, gender, smoking history, degree of hematuria, and urine cytology define the risk. Although these models have been shown to be highly accurate, they are difficult to apply in a daily clinical environment. Recently, the impact of reproductive factors (menopausal status, parity, age at first delivery and age at last delivery) on the prediction of bladder cancer risk has been evaluated in women with hematuria (Yavuzcan et al., 2013).
Tan's scoring system demonstrated adequate discrimination in the first study (area under the ROC curve = 80.4%) (Tan et al., 2013). Nevertheless, this model needs to be validated in a secondary dataset. Such external validation is essential, as a predictive tool is only useful if it is both accurate and generally applicable. The aim of the present study was to externally validate the previously developed model in the prediction of bladder cancer in a patient cohort with hematuria. We report the first external validation of this model in a contemporary single-institution cohort of patients. Good predictability was obtained by the clinical scoring system in this study; the area under the ROC curve was 71.8%. The correspondence seen between the actual and ideal model predictions in the calibration plot suggested good calibration of the scoring system in the validation cohort. Furthermore, the model demonstrated a meaningful net benefit gain. In addition, we observed a score of 12 as the best score threshold in our validation cohort. Tan et al. (2013) reported that a threshold of 10 gives the scoring system a sensitivity of 90% and a specificity of 55.7%. Using the suggested cut-off value of 10 in our cohort, the scoring system had a sensitivity of 86.1% but a specificity of only 36.7%, while the cut-off value of 12 yielded a sensitivity of 69.6% and a specificity of 69.7%.
In general, model performance tends to be lower when external validation is performed, for several reasons. First, the construction of a valid model relies on the appropriate selection of variables for the analysis. For example, smoking history was categorized as "current", "history of" or "never". Quantification of tobacco use could have improved the accuracy of the model. Second, inclusion of other variables might have improved the performance of the model. Therefore, addition of occupational or other chemical exposures would also likely improve the model's predictive accuracy. Third, in Tan's scoring system, the cut-off age of 50 years was chosen. Although many studies have considered ages above this to be a risk factor for bladder cancer (Messing et al., 1987; Alishahi et al., 2002; Madeb and Messing, 2008), Shariat et al. (2010) demonstrated that the incidence of bladder cancer increases more significantly after 65 years of age. Finally, combinations of molecular markers may improve the prediction of bladder cancer in patients with hematuria. The ability of biomarker data to improve model predictions has already been confirmed (Shariat et al., 2008). However, the limited availability of data on such molecular markers might restrict the widespread application of such models. In addition, since racial variation in the expression of molecular markers may also be present, validation of the models would need to be performed in studies including different races.
The study was limited by several factors. The main drawback is that the data were analyzed retrospectively. Because 22 patients whose clinical information was incomplete or unavailable had to be excluded, selection bias may have been introduced. Finally, the study cohort came from a single institution and included a relatively small number of patients. However, bootstrap-corrected predictive accuracy reduces overfit bias, offers the possibility of internal validation without sample size limitations and provides the most bias-free estimates of discrimination properties.
In conclusion, the scoring system proved to be a highly accurate predictive tool for patients with hematuria. This routine checkup for age, gender, history of smoking, and nature of the hematuria is universally applicable, adds no additional costs, and is not time-consuming. These results encourage the application of this scoring system for predicting bladder cancer. Although further improvements are needed, utilization of this scoring system may assist primary care physicians and other healthcare practitioners in determining a patient's risk of bladder cancer.

Figure 2. Calibration Plot. The calibration curve demonstrates the relation between the predicted and observed bladder cancer rates. The diagonal line represents the performance of an ideal model. The solid line represents the actual model performance that compares the predicted and observed bladder cancer rates (using 200 bootstrap samples). Points situated below the diagonal line correspond to overprediction, whereas points situated above the diagonal line correspond to underprediction.

Figure 3. Decision Curve Analysis. In decision curve analysis, the y-axis measures net benefit, calculated by summing the benefits (true positives) and subtracting the harms (false positives). The straight line represents the assumption that all patients will have bladder cancer, and the horizontal line represents the assumption that no patients will have bladder cancer. The dotted line indicates the net benefit of using the new model.