Comparison of Bayesian Spatial Ecological Regression Models for Investigating the Incidence of Breast Cancer in Iran, 2005-2008

Background: Breast cancer is the most prevalent kind of cancer among women in Iran. Given the importance of cancer prevention and the considerable variation of breast cancer incidence in different parts of the country, it is necessary to identify regions with a high incidence of breast cancer and to evaluate the role of potential risk factors using advanced statistical models. The present study focussed on the incidence of breast cancer in Iran at the province level and also explored the impact of some prominent covariates using Bayesian models. Materials and Methods: All patients diagnosed with breast cancer in Iran from 2005 to 2008 were included in the study. Smoking, fruit and vegetable intake, physical activity, obesity and the Human Development Index (HDI), measured at the province level, were considered as potential modulating factors. Gamma-Poisson, lognormal and BYM models were used to estimate the relative risk of breast cancer in this ecological investigation, with and without adjustment for the covariates. Results: The unadjusted BYM model had the best fit among the applied models. Without adjustment, Isfahan, Yazd, and Tehran had the highest incidences and Sistan-Baluchestan and Chaharmahal-Bakhtiari had the lowest. With the adjusted model, Khorasan-Razavi, Lorestan and Hamedan had the highest and Ardebil and Kohgiluyeh-Boyerahmad the lowest incidences. A significant direct association was found between breast cancer incidence and HDI. Conclusions: The BYM model has a better fit because it contains parameters that allow for effects from neighboring provinces. Since HDI is a significant variable, it is also recommended that HDI be considered in future investigations. This study showed that Yazd, Isfahan and Tehran provinces have the highest crude incidences of breast cancer.


Introduction
Ecological analysis is an exploratory epidemiological method that investigates the relation between disease prevalence and risk factors measured in groups rather than in individuals. The simplest method of risk estimation is the standardized mortality rate (SMR), obtained by dividing the observed frequency ($Y_i$) by the expected frequency ($E_i$). This is the maximum likelihood estimator of the relative risk. However, the estimator has two shortcomings: first, the SMR does not account for spatial correlation and, second, it gives overdispersed estimates.
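As a minimal illustration of the SMR calculation described above, the following Python sketch divides observed by expected counts for a few areas. The counts and the function name `smr` are hypothetical and are not taken from the study data.

```python
def smr(observed, expected):
    """Return the crude relative-risk estimate Y_i / E_i for each area."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return [y / e for y, e in zip(observed, expected)]


# Hypothetical counts for three areas (illustration only).
observed = [30, 4, 120]
expected = [20.0, 8.0, 100.0]
ratios = smr(observed, expected)
print(ratios)
```

In this toy example the second area has a small expected count, so a difference of a few cases changes its ratio substantially. This instability is one reason smoothed Bayesian estimates are often preferred to the raw SMR.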

Ecological models
The following models were used to study the incidence and to assess the association between incidence and the covariates. In these models, $\{Y_i, i=1,\dots,n\}$ and $\{E_i, i=1,\dots,n\}$ represent the observed and expected numbers of breast cancer cases for province $i$, respectively. It is assumed that $Y_i$ has a Poisson distribution with rate $\mu_i = q_i E_i$, where $q_i$ represents the relative risk for province $i$. In the following models, $\sum_{h=1}^{H} \beta_h x_{ih}$ represents the linear combination of risk factors $x_1, x_2, \dots, x_H$ with corresponding coefficients $\beta_h$. In all following equations, $\alpha$ is the overall level of relative risk.

-Gamma-Poisson regression model
This model relates incidence to the risk factors in each province through a log-linear model. Its most important problem is that it disregards spatial correlation. In many cases the ecological variables cannot account for all of the variability, so the observed variation exceeds that expected under the Poisson model. Because overdispersion arises in this situation, it is incorporated into the model. The degree of overdispersion depends on the amount of heterogeneity among the relative risks. In the Bayesian framework, $q_i$ is assigned a Gamma prior distribution in this model, so the resulting posterior distribution is also Gamma (Mahaki et al., 2011; Jafari-Koshki et al., 2014).
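The Gamma-to-Gamma update mentioned above can be sketched in a few lines of Python. The sketch assumes the covariate-free case with a Gamma(a, b) prior (shape a, rate b) on the relative risk and a Poisson count with mean q_i E_i, for which the posterior is Gamma(a + Y_i, b + E_i). The prior values and the function name are hypothetical, not those used in the study.

```python
def gamma_poisson_posterior(y, e, a, b):
    """Posterior of the relative risk q under q ~ Gamma(a, b), Y | q ~ Poisson(q * e).

    Returns (shape, rate, posterior mean); b is a rate parameter.
    """
    if e <= 0 or a <= 0 or b <= 0:
        raise ValueError("e, a and b must be positive")
    shape = a + y
    rate = b + e
    return shape, rate, shape / rate


# Hypothetical area: 4 observed cases, 8 expected, prior mean a / b = 1.
shape, rate, post_mean = gamma_poisson_posterior(y=4, e=8.0, a=2.0, b=2.0)
print(shape, rate, post_mean)
```

With these illustrative numbers the crude ratio is 4/8 = 0.5, while the posterior mean is pulled toward the prior mean of 1. The pull is stronger when the expected count is small.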

-Lognormal model
Although the Gamma prior distribution is mathematically convenient for the relative risk, the Gamma-Poisson model has some restrictions: adjusting for covariates is difficult, and incorporating spatial correlation between the rates of regions is impossible. The lognormal model is more flexible for formulating the relative risk (Asmarian et al., 2013).
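The defining equation of the lognormal model is not reproduced in this text. A common form, written in the notation introduced under "Ecological models" and consistent with the later description of $u_i$ as non-structural heterogeneity, would be the following. The variance symbol $\sigma_u^2$ is an assumption introduced here for illustration.

```latex
\log q_i = \alpha + \sum_{h=1}^{H} \beta_h x_{ih} + u_i,
\qquad u_i \sim \mathrm{N}\!\left(0, \sigma_u^2\right)
```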

-BYM model
In addition to the independent variables, this model considers two sources of variation to account for heterogeneity in the incidence rate across regions, and the incidence rate is modelled accordingly ([…] et al., 2013).
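The BYM equation itself is not reproduced in this text. A standard way of writing the model, using the notation introduced under "Ecological models" together with an unstructured term $u_i$ and a spatially structured term $v_i$, is sketched below. The symbols $\sigma_u^2$, $\sigma_v^2$, $n_i$ (number of neighbors of province $i$) and $\delta_i$ (set of neighbors of province $i$) are assumptions introduced here, and the intrinsic conditional autoregressive form shown for $v_i$ is the usual choice rather than a specification confirmed by the source.

```latex
\log q_i = \alpha + \sum_{h=1}^{H} \beta_h x_{ih} + u_i + v_i,
\qquad u_i \sim \mathrm{N}\!\left(0, \sigma_u^2\right),
\qquad v_i \mid v_{j \neq i} \sim \mathrm{N}\!\left(\frac{1}{n_i}\sum_{j \in \delta_i} v_j,\ \frac{\sigma_v^2}{n_i}\right)
```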
Given the ageing of the Iranian population, the rising occurrence of cancer and its considerable geographical variation across the country, it is necessary to identify the regions with high incidence and to assess the role of the most significant risk factors using advanced statistical models, in order to improve prevention.
The present study investigates the incidence of breast cancer in Iran at the province level and explores the impact of several covariates, including smoking, overweight or obesity, physical activity, fruit and vegetable intake, and the Human Development Index, using Bayesian models.

Materials and Methods
All registered patients with breast cancer in Iran from 2005 to 2008 were included in this study. The data were extracted from the reports published annually by the Noncommunicable Diseases Management Center of the Iranian Ministry of Health and Medical Education.
Data on four risk factors (smoking, fruit and vegetable intake, physical activity and obesity) were extracted from the annual reports of the Non-communicable Disease Risk Factors Surveillance System of the Ministry of Health. Smoking was defined as the product of the percentage of smokers and the mean number of cigarettes smoked daily in each province. Overweight or obesity was the proportion of the population in each province with a body mass index above 25 (BMI>25). The fruit and vegetable variable was the sum of the daily consumption of fruit and vegetables in each province. Physical activity was calculated using a combined index called Metabolic Equivalent (MET). The Human Development Index is the geometric mean of each province's achievement in three main dimensions of human development: a long and healthy life, access to knowledge, and a decent standard of living. It was extracted from the annual reports of the Central Bank and ranges from zero to one (Akbari et al., 2008).
DOI: http://dx.doi.org/10.7314/APJCP.2015.16.14.5669
In the BYM model, $u_i$ and $v_i$ represent non-structural and structural heterogeneity, respectively. They are assumed to follow a normal distribution and a conditional autoregressive (CAR) normal distribution, respectively. Under the conditional autoregressive prior, the incidence rate in each region is assumed to depend on the incidence rates in its neighboring regions (Lawson et al., 1999; Lawson et al., 2003).
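To make the dependence on neighboring regions concrete, the Python sketch below computes the conditional mean and variance of a structured effect under the commonly used intrinsic CAR prior, in which the conditional mean is the average of the neighbors' effects and the conditional variance is a common variance divided by the number of neighbors. The adjacency map, effect values, variance and function name are all hypothetical, and the intrinsic form is an assumption rather than a detail stated in the text.

```python
def car_conditional(v, neighbors, region, sigma2_v):
    """Conditional mean and variance of v[region] given its neighbours.

    Assumes the intrinsic CAR form: mean = average of neighbouring effects,
    variance = sigma2_v / (number of neighbours).
    """
    nbrs = neighbors[region]
    if not nbrs:
        raise ValueError("region has no neighbours; the conditional is undefined")
    n_i = len(nbrs)
    cond_mean = sum(v[j] for j in nbrs) / n_i
    cond_var = sigma2_v / n_i
    return cond_mean, cond_var


# Hypothetical map of four regions and illustrative structured effects.
neighbors = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"], "D": ["B"]}
v = {"A": 0.2, "B": -0.1, "C": 0.5, "D": 0.0}

mean_a, var_a = car_conditional(v, neighbors, "A", sigma2_v=0.4)
mean_d, var_d = car_conditional(v, neighbors, "D", sigma2_v=0.4)
print(mean_a, var_a, mean_d, var_d)
```

Regions with more neighbors receive a smaller conditional variance, so their structured effect is tied more tightly to the local average.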
These models were fitted using OpenBUGS version 3.2.1. Convergence was checked using Brooks-Gelman-Rubin plots. The significance of parameters was assessed using Bayesian credible intervals (CrI), which play a role analogous to p-values.
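As a rough illustration of the idea behind the convergence check, the Python sketch below computes the classical variance-based Gelman-Rubin potential scale reduction factor from several chains of one parameter. This is a simplified stand-in: the diagnostic plotted by OpenBUGS may differ in detail, and the chains and function name here are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Classical potential scale reduction factor for equal-length chains."""
    m = len(chains)
    if m < 2:
        raise ValueError("at least two chains are required")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return sqrt(pooled / within)


# Hypothetical chains: the first pair agree, the second pair do not.
r_agree = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
r_apart = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
print(r_agree, r_apart)
```

Values close to 1 suggest that the chains are sampling the same distribution, whereas values well above 1 indicate that they have not yet mixed.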

Results
A total of 25152 breast cancer cases were recorded in Iran from 2005 to 2008. The largest numbers of cases were found in Isfahan (2214 cases) and Khorasan-Razavi (2218 cases). The fewest cases were recorded in Kohgiluyeh-Boyerahmad (67 cases).
According to the results in Table 1, all risk factors were significant in the Gamma-Poisson model. However, this model ignores possible correlation among provinces, so its results may be misleading.
In the lognormal model, which accounts for non-structural heterogeneity, the effects of the variables were adjusted, and overweight and the Human Development Index were significant. This indicates that higher levels of overweight and HDI are associated with a higher incidence of cancer.
Only HDI was significant as an explanatory variable in the BYM model, indicating that the risk of breast cancer is higher in regions with higher HDI.
Figure 1A shows the relative risks of the provinces in the Gamma-Poisson model, accounting for non-structural heterogeneity and without adjustment for risk factors. According to this map, Yazd and Tehran provinces have the highest incidence risk of breast cancer and Sistan-Baluchestan province has the lowest. Generally, central provinces have a higher risk of breast cancer incidence. Figure 1B shows the relative risks of the provinces in the lognormal model, accounting for non-structural heterogeneity and without adjustment for risk factors. This map shows that northwestern and southeastern provinces have a lower incidence risk of breast cancer, with Sistan-Baluchestan and Kohgiluyeh-Boyerahmad having the lowest. Figure 1C shows the relative risks of the provinces in the lognormal model, accounting for non-structural heterogeneity and with adjustment for risk factors. This map shows that Khorasan-Razavi and Hamedan provinces have the highest incidence risk of breast cancer and Kohgiluyeh-Boyerahmad, Sistan-Baluchestan and Ardebil provinces have the lowest. Figure 1D shows the relative risks of the provinces in the BYM model, without adjustment for risk factors and accounting for both structural and non-structural heterogeneity. According to this map, the highest incidence risk of breast cancer was seen in Isfahan and Tehran provinces and the lowest in Sistan-Baluchestan and Kohgiluyeh-Boyerahmad provinces. Figure 1E shows the relative risks adjusted for risk factors in the BYM model, which accounts for structural and non-structural heterogeneity. According to this map, Yazd, Ghazvin, Ardebil and Khorasan-Shomali had the lowest incidence risk of breast cancer and Khorasan-Razavi, Lorestan and Hamedan had the highest.
Table 1 also compares the goodness of fit of the Gamma-Poisson, lognormal and BYM models, with and without risk factors, using the deviance information criterion (DIC). Without risk factors, the BYM model has the best fit because it includes both structural and non-structural heterogeneity, and the Gamma-Poisson model has the worst fit. With risk factors, the BYM and lognormal models have nearly the same fit. Including risk factors does not change the fit of the lognormal model. The Gamma-Poisson model fits poorly because it disregards the spatial correlation among the provinces. Therefore, the BYM model appears preferable to the other ecological regression models for ecological analysis.
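For readers unfamiliar with DIC, the Python sketch below shows one way to compute it for a Poisson model from posterior draws of the area means: the posterior mean deviance plus the effective number of parameters, where the latter is the mean deviance minus the deviance at the posterior mean. The draws, the counts and the function names are hypothetical, and the plug-in point used by OpenBUGS may differ from the simple posterior mean of the Poisson means used here.

```python
from math import lgamma, log


def poisson_deviance(y, mu):
    """Deviance -2 * log-likelihood of counts y under Poisson means mu."""
    loglik = sum(yi * log(mi) - mi - lgamma(yi + 1) for yi, mi in zip(y, mu))
    return -2.0 * loglik


def dic(y, mu_draws):
    """Return (DIC, pD) from a list of posterior draws of the mean vector."""
    s = len(mu_draws)
    if s == 0:
        raise ValueError("at least one posterior draw is required")
    d_bar = sum(poisson_deviance(y, mu) for mu in mu_draws) / s
    mu_bar = [sum(col) / s for col in zip(*mu_draws)]   # posterior mean of each mu_i
    p_d = d_bar - poisson_deviance(y, mu_bar)            # effective number of parameters
    return d_bar + p_d, p_d


# Hypothetical counts for two areas and two posterior draws of their means.
y = [11, 19]
dic_value, p_d = dic(y, [[10.0, 20.0], [12.0, 18.0]])
print(dic_value, p_d)
```

A smaller DIC indicates a better trade-off between fit and complexity, which is how the models in Table 1 are ranked.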

Discussion
The BYM model without adjustment for risk factors showed that the central provinces of Isfahan, Yazd and Tehran have the highest incidence of breast cancer, followed by Northern provinces including Fars, Khuzestan and Khorasan-Shomali. North-eastern and south-western provinces have the lowest incidence risk among all provinces. These results are in accordance with those reported by Jafari et al. (2014) and Mahaki et al. (2011). However, the results of Jafari et al. were crude and were not adjusted for risk factors. After adjustment for risk factors, Khorasan-Razavi, Lorestan and Hamedan have the highest incidence of breast cancer and Ardebil and Kohgiluyeh-Boyerahmad have the lowest. In general, southwestern provinces have a lower incidence of breast cancer. The results show that the incidence of breast cancer is higher in provinces with higher HDI.
A comparison of maps B and D, and of maps C and E, shows that although the estimated relative risks differ, the ranks of the relative risks of the provinces are identical. That is, provinces with a higher relative risk in map B also have a higher relative risk in map D, and the same holds for maps C and E. This is in accordance with the results of Clayton and Kaldor (Clayton et al., 1987).
As shown in Table 1, a direct effect of HDI on breast cancer incidence is confirmed. This is consistent with the results of Yost et al. (Yost et al., 2001). Barrios et al. also found a positive relation between HDI and breast cancer incidence; in their study, the correlation coefficient between HDI and breast cancer incidence was 0.68 (Barrios et al., 2013).
This association could be due to air pollution and greater exposure to carcinogens in urban regions, to the diagnosis of more cancer cases in provinces with higher HDI, or to the spread of a western lifestyle and the increase in pollution that accompany rising HDI. A higher HDI can also improve the detection of cancer. Although the study by Abbastabar et al. (Abbastabar et al., 2013) showed a significant relation between breast cancer and fruit and vegetable intake, no such relation was found in the present study. This difference could arise because the present investigation was conducted at the ecological (province) level, whereas their study was conducted at the individual level. The results of this study should not be interpreted at the individual level, because doing so may be misleading owing to the phenomenon known as the ecological fallacy. Since the urban lifestyle is spreading and a direct association has been shown between HDI and the incidence rate of breast cancer, measures are needed to raise public awareness and improve lifestyle in order to prevent a further rise in the incidence of breast cancer.
Air pollution, family history, neonatal feeding and other covariates were not available at the province level. We therefore suggest that further ecological research address these factors as well.