Estimating the Completeness of Lung Cancer Registry in Ardabil, Iran with a Three-Source Capture-Recapture Method

Cancer registration is an important component of a comprehensive cancer control program, providing timely data for research and administrative use. Capture-recapture methods have been used to investigate the completeness of cancer registry data. This study aimed to estimate the completeness of lung cancer registration in the Ardabil Population Based Cancer Registry (APBCR) with a three-source capture-recapture method. Data on all new cases of lung cancer reported to the APBCR by three sources (pathology reports, death certificates, and medical records) for 2006 and 2008 were obtained. Duplicate cases shared among the three sources were identified based on similarity of first name, last name and father's name. A log-linear model was used to estimate the number of missed cases and to control for dependency among sources. A total of 218 new cases of lung cancer were reported by the three sources after removing duplicates. The estimated completeness calculated by the log-linear method was 26.4% for 2006 and 27.1% for 2008. Completeness differed by gender: in men it was 26.0% for 2006 and 28.1% for 2008, and in women it was 36.5% for 2006 and 46.9% for 2008. In conclusion, none of the three sources can be considered a reliable source for accurate cancer incidence estimation.


Introduction
Lung cancer is one of the most important public health problems because of its high incidence rate, rapid progression, and poor prognosis (Montazeri et al., 2001). It is the leading cause of cancer death among men in 87 countries and among women in 26 countries, the latter largely restricted to high-income countries (Torre et al., 2015). Lung cancer is one of the five leading cancers in Iran, and its incidence has been increasing steadily in both men and women (Hosseini et al., 2009). Accurate cancer incidence data are essential for planning, monitoring and evaluating national and regional cancer control programs (Kamo et al., 2007). The purpose of population-based cancer registries is to estimate the cancer burden in the area covered, to observe trends and regional differences, and to provide a database for epidemiological research (Schmidtmann, 2008).
Decision makers in health authorities need to know how reliable the data are on which they base their policies. Completeness of registration is therefore used as one of the measures of the quality of a cancer registry (Schmidtmann, 2008). Completeness is defined as the proportion of incident cancer cases that is registered (Schmidtmann, 2008), and assessing it is one of the main parts of quality control in cancer registration (Mosavi-Jarrahi et al., 2013). Several methods for evaluating completeness are described in the literature; they fall into two categories, qualitative and quantitative. The qualitative methods include mortality/incidence (M:I) ratios and the proportion of microscopically verified cases; the quantitative methods include capture-recapture, the death certificates and M:I ratios method, and the flow method (Castro, 2011). Since most cancer registries employ more than one data source for case finding, capture-recapture methods may be used to estimate the number of incident cases in the population, and hence to assess completeness of case ascertainment (Robles et al., 1988). Capture-recapture is widely used in wildlife population censuses (Suwanrungruang et al., 2011). Another important application is in epidemiology, for estimating the prevalence of a particular disease and the completeness of ascertainment of disease registers. In principle, the method can be applied to any situation where there are two or more incomplete lists (Poorolajal et al., 2010). Two assumptions have to be made when using the simple capture-recapture method: first, the sources are independent, and second, all individuals within the same source have an equal chance of being included (Parkin and Bray, 2009; Suwanrungruang et al., 2011).
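As an illustration of the simple two-list case, the sketch below uses the classical Lincoln-Petersen estimator, which is the usual form of two-source capture-recapture but is not named in this paper. The function names and all counts are hypothetical and are not taken from the study.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-source estimate of the total number of cases.

    n1, n2: cases found by source 1 and by source 2; m: cases found by both.
    Valid only under the two assumptions stated above (independent
    sources, equal chance of inclusion within a source).
    """
    if m <= 0:
        raise ValueError("the sources share no cases; the estimate is undefined")
    return n1 * n2 / m


def completeness(observed: int, estimated_total: float) -> float:
    """Proportion of the estimated total that was actually registered."""
    return observed / estimated_total


# Hypothetical counts, not taken from the study.
n1, n2, m = 60, 50, 30
observed = n1 + n2 - m               # distinct cases seen by either source
total = lincoln_petersen(n1, n2, m)  # estimated true number of cases
print(total, completeness(observed, total))
```

If the two sources are positively dependent, the overlap m is inflated and this estimator understates the total, which is one reason to move to three sources and log-linear models.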
Capture-recapture methods are very efficient for reducing the costs of disease registration, as well as for reducing bias in incidence estimates and in comparisons of population subgroups (Mosavi-Jarrahi et al., 2013).
A pathology-based cancer registry has operated in Ardabil province since 1999, and in 2003 the first population-based cancer registry in Iran was established in Ardabil. The Ardabil cancer registry (ACR) actively collects information on cancer incidence from pathology-based and hospital-based sources and from death certificates. The main goal of the ACR is to measure cancer incidence and mortality in residents of Ardabil province (Babaei et al., 2010). Because there is no screening test for lung cancer, patients are typically detected at the end stage of disease; together with the poor prognosis and rapid progression of the disease, this means that incident cases may not be completely registered in the pathology source and may be registered only in death certificates or hospital records. This study therefore aims to estimate the completeness of lung cancer registration in each source of the registry (pathology, hospital records and death certificates) and to estimate the incidence of lung cancer in Ardabil province using a three-source capture-recapture method.

Materials and Methods
This study was conducted in Ardabil Province, which is located in the north-west of Iran. All new cases of lung cancer reported to the Ardabil population-based cancer registry in 2006 and 2008 by three sources (pathology reports, death certificates and medical records) were enrolled in this study. Duplicate cases within each of the three sources were identified and removed using Excel software. Characteristics such as name, surname, father's name, date of birth and the ICD codes for the cancer type were used to identify the cases common to the three sources. After the common cases had been identified by data linkage, the incidence rate of lung cancer was estimated by the capture-recapture method and log-linear models.
To use the capture-recapture method, two main assumptions must hold: the sources of information should be independent, and all cases in a given data source should have an equal chance of being included (Parkin and Bray, 2009; Suwanrungruang et al., 2011). In most human populations and medical studies, however, these assumptions are not met and the sources are not independent. Therefore, a three-source capture-recapture analysis with log-linear models was used to account for the interactions among the three sources and to estimate completeness and a more accurate incidence rate of lung cancer in Ardabil province. With three registers, there are eight possible combinations of these registers in which cases do or do not appear. The general model uses eight parameters: the common parameter (the logarithm of the number expected to be in all lists), three 'main effects' parameters (the log odds ratios against appearing in each list for cases who appear in the others), three 'two-way interaction' or second order effect parameters (the log odds ratios between pairs of lists for cases who appear in the other), and a 'three-way' interaction parameter.
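The eight combinations can be made concrete by tabulating, for each linked case, whether it appears in each list. The following is a minimal sketch that assumes record linkage has already assigned one identifier per person; the function name and the identifiers are hypothetical and do not come from the study data.

```python
from collections import Counter


def capture_histories(pathology, hospital, death):
    """Count cases by presence (1) or absence (0) in each of three lists.

    Keys are (pathology, hospital, death) tuples. Only seven of the eight
    combinations can be observed: cases missed by all three lists,
    (0, 0, 0), never appear and must be estimated by a model.
    """
    pathology, hospital, death = set(pathology), set(hospital), set(death)
    counts = Counter()
    for case in pathology | hospital | death:
        key = (int(case in pathology), int(case in hospital), int(case in death))
        counts[key] += 1
    return counts


# Hypothetical linked identifiers, one per person.
cells = capture_histories(pathology={1, 2, 3}, hospital={2, 3, 4}, death={3, 5})
for history, count in sorted(cells.items(), reverse=True):
    print(history, count)
```

The resulting seven counts are the observed cells of the incomplete 2 x 2 x 2 table to which the log-linear models are fitted.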
For three registers, A with i levels, B with j levels and C with k levels, the natural logarithm (ln) of the expected frequency $F_{ijk}$ is

$$\ln F_{ijk} = \theta + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{BC}_{jk} + \lambda^{ABC}_{ijk}$$

where $\theta$ is the common parameter, $\lambda^{A}$, $\lambda^{B}$ and $\lambda^{C}$ are the main effect parameters, $\lambda^{AB}$, $\lambda^{AC}$ and $\lambda^{BC}$ are the second order effect (two-way interaction) parameters and $\lambda^{ABC}$ is the highest order effect (three-way interaction) parameter. The value of this last three-way interaction parameter cannot be tested from the study data and is assumed to be zero. To assess how well the various log-linear models fit the data and to select the best model, we used the log likelihood-ratio statistic, also known as $G^2$ or deviance, Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The likelihood-ratio statistic can be expressed as

$$G^2 = -2 \sum_{j} Obs_j \ln\left(\frac{Exp_{ji}}{Obs_j}\right)$$

where $Obs_j$ is the observed number of individuals in each cell j, and $Exp_{ji}$ is the expected number of individuals in each cell j under model i. The AIC is

$$AIC = G^2 - 2[df]$$

where the first term, $G^2$, is a measure of how well the model fits the data and the second term, $2[df]$, with df the degrees of freedom of the model, is a penalty for the addition of parameters (and hence model complexity). The BIC is

$$BIC = G^2 - \left[\ln\left(\frac{N_{obs}}{2\pi}\right)\right][df]$$

where $N_{obs}$ is the total number of observed individuals. The lower the values of $G^2$, AIC and BIC, the better the fit of the model. AIC is the criterion most commonly used by researchers for model selection (Hook and Regal, 1997; Hook and Regal, 2000; Motevalian et al., 2007). We therefore used these criteria to evaluate goodness of fit, and the model with the lowest AIC was chosen as the best model. The estimated incidence rate was calculated as the estimated number of new cases of lung cancer (from the selected log-linear model) in a given period divided by the population at risk in Ardabil province in that period. All incidence rates are reported per 100,000 population. Completeness was also calculated by age group and calendar time. In all stages of this study the individual's information such as name, surname and other
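The fit statistics described in this section can be sketched as follows. The residual-degrees-of-freedom convention and the ln(Nobs/2π) form of BIC follow the capture-recapture literature cited here (Hook and Regal) and are an assumption about the exact form the authors used. The function names are hypothetical.

```python
import math


def g_squared(observed, expected):
    """Likelihood-ratio statistic: G2 = 2 * sum(Obs * ln(Obs / Exp)).

    Cells with zero observed count contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def aic(g2, df):
    """AIC = G2 - 2*df, with df the residual degrees of freedom (assumed)."""
    return g2 - 2.0 * df


def bic(g2, df, n_obs):
    """BIC = G2 - ln(n_obs / (2*pi)) * df (assumed Hook and Regal form)."""
    return g2 - math.log(n_obs / (2.0 * math.pi)) * df


def incidence_per_100k(estimated_cases, population_at_risk):
    """Estimated incidence rate per 100,000 population."""
    return estimated_cases / population_at_risk * 100_000
```

Under this convention a model with more parameters has fewer residual degrees of freedom, so less is subtracted from G2. Extra parameters are therefore only worthwhile if they reduce G2 enough to offset the loss.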

Results
After duplicate cases among the three sources had been identified and removed, a total of 218 new cases of lung cancer had been reported to the Ardabil population-based cancer registry in 2006 and 2008. The pathology source, hospital records and death certificates reported 71, 81 and 113 new cases of lung cancer, respectively. Of the 218 subjects, 163 (74.8%) were male. The mean age of participants was 66.1 (±13.4) years overall, 66.3 (±13.3) years for men and 65.5 (±13.9) years for women. The Venn diagram shows the cases common to pathology reports, hospital records and death certificates (Figure 1). In the three-source capture-recapture analysis with log-linear models, the model in which the pathology and medical records sources were mutually dependent and the death certificates source was independent was chosen as the best model, with the lowest values of Akaike's Information Criterion and the Bayesian Information Criterion, 43.3 and 43.1, respectively (Table 2).
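For a model of this structure (two sources dependent on each other and jointly independent of the third), the unobserved cell has a closed-form estimate. The two dependent sources are pooled into one composite list, and the two-source estimator is applied to that list and the death certificates. The sketch below illustrates this standard result. The cell counts are hypothetical, because the Venn diagram counts are not reproduced in the text, and the function name is an assumption.

```python
def estimate_missing_cell(cells):
    """Estimate cases missed by all three sources under the model in which
    pathology and hospital records are dependent and death certificates
    are independent of both.

    cells maps (pathology, hospital, death) 0/1 tuples to observed counts.
    """
    death_only = cells[(0, 0, 1)]
    # Cases found by pathology and/or hospital records, split by whether
    # they also appear on a death certificate.
    without_death = cells[(1, 0, 0)] + cells[(0, 1, 0)] + cells[(1, 1, 0)]
    with_death = cells[(1, 0, 1)] + cells[(0, 1, 1)] + cells[(1, 1, 1)]
    if with_death == 0:
        raise ValueError("no overlap with death certificates; estimate undefined")
    return death_only * without_death / with_death


# Hypothetical cell counts, not the study's data.
cells = {
    (1, 1, 1): 5, (1, 1, 0): 20,
    (1, 0, 1): 10, (1, 0, 0): 30,
    (0, 1, 1): 15, (0, 1, 0): 40,
    (0, 0, 1): 60,
}
observed = sum(cells.values())
missing = estimate_missing_cell(cells)
completeness = observed / (observed + missing)
print(observed, missing, completeness)
```

Completeness is then the number of observed cases divided by the sum of the observed cases and the estimated missed cases.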

Discussion
The completeness of lung cancer registration in this study was estimated by the capture-recapture method and log-linear models. The mean age of all subjects was 66.1 ± 13.4 years (66.3 ± 13.3 for men and 65.5 ± 13.9 for women). The age distribution did not differ between men and women and is consistent with studies conducted in other parts of Iran, which reported average ages of 59 to 67 years (Montazeri et al., 2001; Hosseini et al., 2009). In the log-linear analysis, we selected the best model using the AIC, BIC and G2 statistics. The selected model was the one in which the pathology and hospital records sources are dependent on each other and independent of the death certificates source. The dependency between pathology reports and hospital records in the selected model also seems plausible, as it reflects how these two sources are related in practice. The completeness of registration in the population-based cancer registry of Ardabil province, after removing duplicate cases among pathology reports, hospital records and death certificates, was 26.4% for 2006 and 27% for 2008. The completeness of registration in men and women, after removing duplicates among the three sources, was 26.01% and 36.54% in 2006 and 28.08% and 46.87% in 2008, respectively.
The completeness estimated by log-linear analysis in our study is much lower than that reported in other countries, which ranged from 96% to 99.6% for all types of cancer overall (Robles et al., 1988; Crocetti et al., 2001; Gajalakshmi et al., 2001) and also for lung cancer (Larsen et al., 2009). There is no evidence on the completeness of lung cancer registration in other parts of Iran, but a similar study conducted in Tehran, which used the capture-recapture method to evaluate the cancer registry system, indicated that the completeness of gastric and esophageal cancer registration was 29.9% and 30.8%, respectively (Aghaei et al., 2013). In Ardabil province, the completeness of registration for gastric and esophageal cancer was 36.35% and 37.76% (Khodadost et al., 2014). These results are consistent with our study and confirm that the quality of cancer registration in Iran is inadequate and needs more attention to improve it.